DM Review | Covering Business Intelligence, Integration & Analytics
Information Management:
World-Class Business Intelligence

  Column published in DMReview.com
March 1, 2005
  By Bill Inmon

Every few years, the corporate information factory (CIF) is extended as architectural and technological advances occur in the industry. The highlights of the 2004/2005 additions to the CIF are:

  • Unstructured data,
  • Unstructured ETL,
  • Unstructured visualization, and
  • The virtual operational data store.

Unstructured data has been around for a long time. It includes e-mail, spreadsheets, text files, Word documents and more; typically, it is what you find on the desktop. There is a large world of structured data and a large world of unstructured data, but very little intersection between the two. That is a shame, because a great deal of valuable data sits in the unstructured world. Now that unstructured ETL technology exists, the two worlds have the potential to intersect at last.

One of the most intriguing new possibilities is unstructured visualization. Visualization today is really visualization of numbers and quantities. There are summarizations, drill down, drill across, detailed analysis and KPIs. All of this manipulation and visualization is based on the properties of numerical data. However, the fiber of unstructured data is made up of text, not numbers. Now there is unstructured visualization, based on text, which is the business intelligence of the unstructured world.

Perhaps the most interesting new addition to the CIF is that of the virtual operational data store (VODS). For a long time there has been talk of the virtual data warehouse. There have been many manifestations of the virtual data warehouse, the most prominent of which is the federated data warehouse. However, anything virtual in the world of data warehousing is pie in the sky. Because data warehousing requires a tangible, real foundation if it is going to do what it needs to do, "virtual" and "data warehousing" do not mix at all.

However, operational data stores (ODSs) are fundamentally different from data warehouses. The ODS reflects information only as of a single moment in time; it holds transitory data, not permanent data. Because of this fundamental difference, the ODS is architecturally different from the data warehouse, and a virtual ODS is an entirely acceptable thing to build.

To see the difference, consider an example. You run a query against a data warehouse at 10:32 a.m. and get an answer of $4,981.07. Then you run an identical query against the same data warehouse at 7:18 p.m. What result should you get? It should be $4,981.07, not a penny more or less. Now consider a query against an ODS. You run a query at 11:15 a.m. and get an answer of $5,119.06. The same query at 4:13 p.m. yields an answer of $6,510.74. Is this a problem? Not at all. In the ODS environment, the data underlying the query can change from one instant to the next, so the answer to the query can change as well.
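The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the warehouse is modeled as an append-only list of time-stamped rows, the ODS as a single current value overwritten in place, and every name and figure other than the three dollar amounts quoted above is hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical warehouse: append-only, time-stamped history (rows are never updated).
warehouse_sales = [
    (datetime(2005, 2, 27), Decimal("1981.07")),
    (datetime(2005, 2, 28), Decimal("3000.00")),
]

# Hypothetical ODS: holds only the current value, overwritten in place.
ods = {"open_orders_total": Decimal("5119.06")}


def warehouse_total(rows, start, end):
    """Sum the amounts recorded in the half-open period [start, end)."""
    return sum((amount for ts, amount in rows if start <= ts < end), Decimal("0"))


def ods_total(store):
    """Return whatever the current value is at the moment of the query."""
    return store["open_orders_total"]


february = (datetime(2005, 2, 1), datetime(2005, 3, 1))

# Morning queries.
morning_dw = warehouse_total(warehouse_sales, *february)
morning_ods = ods_total(ods)

# During the day: new history is appended to the warehouse,
# while the ODS value is simply replaced.
warehouse_sales.append((datetime(2005, 3, 1), Decimal("250.00")))
ods["open_orders_total"] = Decimal("6510.74")

# Evening queries, identical to the morning ones.
evening_dw = warehouse_total(warehouse_sales, *february)
evening_ods = ods_total(ods)

print(morning_dw, evening_dw)    # same answer both times
print(morning_ods, evening_ods)  # answer moves with the underlying data
```

In this sketch the warehouse answer is stable because the query is bound to a fixed period and history is only ever added, never changed; the ODS answer moves because there is nothing but the current value to read.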

Because of this transitory nature of ODS data, it is possible to have a virtual ODS. In a standard physical ODS, the data is gathered ahead of time into the physical structure known as the ODS. In a virtual ODS, the data needed for the query is gathered at the time the query is made, and there is no physical infrastructure. This means that a virtual ODS is fast to build and is highly flexible.

The primary difference between a physical ODS and a virtual ODS is where resources are spent. In a physical ODS, resources are spent in building an infrastructure. When it comes time to make a query against a physical ODS, the query consumes very few resources. In a virtual ODS, there is no time spent in building an infrastructure, but the query of a virtual ODS consumes many more resources than a query against a physical ODS. In essence, the virtual ODS has to be rebuilt every time a query is made.
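A rough way to picture where the resources go is to count how often the underlying systems are read. The sketch below is an assumption-laden toy, not a description of any product: `Source`, `PhysicalODS` and `VirtualODS` are hypothetical classes, amounts are held as integer cents, and "resources" is reduced to a simple read counter.

```python
class Source:
    """Hypothetical operational system that counts how often it is read."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self.reads = 0

    def read_all(self):
        self.reads += 1
        return dict(self.balances)


class PhysicalODS:
    """Pays up front: copies source data into its own store."""

    def __init__(self, sources):
        self.sources = sources
        self.store = {}

    def refresh(self):
        # Building the infrastructure touches every source once.
        self.store = {}
        for src in self.sources:
            for customer, cents in src.read_all().items():
                self.store[customer] = self.store.get(customer, 0) + cents

    def query(self, customer):
        # Answered from the local store; no source is touched.
        return self.store.get(customer, 0)


class VirtualODS:
    """Pays per query: gathers the data from every source each time."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, customer):
        return sum(src.read_all().get(customer, 0) for src in self.sources)


def total_reads(sources):
    return sum(src.reads for src in sources)


sources = [
    Source("billing", {"c1": 10000, "c2": 2500}),
    Source("orders", {"c1": 4000}),
]

physical = PhysicalODS(sources)
physical.refresh()
build_reads = total_reads(sources)

physical_answers = [physical.query("c1") for _ in range(3)]
physical_query_reads = total_reads(sources) - build_reads

virtual = VirtualODS(sources)
before = total_reads(sources)
virtual_answers = [virtual.query("c1") for _ in range(3)]
virtual_query_reads = total_reads(sources) - before

print(build_reads, physical_query_reads, virtual_query_reads)
```

Both designs return the same answer here; what differs is when the sources are read. The physical ODS reads them once at build time and not at all per query, while the virtual ODS reads every source on every query.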

There are other important differences between a virtual ODS and a physical ODS as well. The physical ODS is much less versatile than the virtual ODS. However, the virtual ODS is subject to a series of limitations:

  • What if the data underlying the virtual query is not integrated? With a virtual ODS, there is the possibility of getting really strange results.
  • What if the underlying data is being accessed by a process that will not share the resources, such as a reorganization? It is possible that a query can take a very long time.
  • What if a vendor of an underlying resource one day decides to not cooperate with the other technologies? Anyone thinking that Larry Ellison is going to happily coordinate with IBM or Microsoft hasn't been paying a lot of attention.
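The first of these limitations, non-integrated data, is easy to reproduce in miniature. In the following sketch the two sources, their field names, key formats and units are all invented for illustration: one source keys customers as "C-0042" and stores dollars, the other keys them as 42 and stores cents.

```python
from decimal import Decimal

# Hypothetical sources that were never integrated: different keys, different units.
billing_rows = [{"cust_id": "C-0042", "balance_usd": Decimal("150.00")}]
orders_rows = [{"customer": 42, "open_amount_cents": 12500}]


def naive_virtual_total(cust_key):
    """Combine the sources as they are, with no integration step."""
    total = Decimal("0")
    for row in billing_rows:
        if row["cust_id"] == cust_key:
            total += row["balance_usd"]
    for row in orders_rows:
        if str(row["customer"]) == cust_key:
            total += row["open_amount_cents"]  # cents added as if they were dollars
    return total


def integrated_total(cust_number):
    """Same question, after reconciling key format and units."""
    total = Decimal("0")
    for row in billing_rows:
        # "C-0042" -> 42
        if int(row["cust_id"].split("-")[1]) == cust_number:
            total += row["balance_usd"]
    for row in orders_rows:
        if row["customer"] == cust_number:
            total += Decimal(row["open_amount_cents"]) / 100
    return total


print(naive_virtual_total("C-0042"))  # misses the orders source entirely
print(naive_virtual_total("42"))      # misses billing and treats cents as dollars
print(integrated_total(42))           # both sources, one key, one unit
```

Neither naive query fails; each quietly returns a plausible-looking number, which is the kind of strange result the first bullet warns about. A physical ODS would normally resolve these differences once, when the data is loaded, whereas a virtual ODS has to get them right inside every query.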


Bill Inmon is universally recognized as the father of the data warehouse. He has more than 35 years of database technology management experience and data warehouse design expertise. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.