Information Management:
DW 2.0 - Architecture for the Next Generation of Data Warehousing

  Column published in DM Review Magazine
April 2006 Issue
  By Bill Inmon

Data warehousing began in the 1980s as a response to the lack of information provided by the many online application systems that were being built. Online applications served the needs of a limited community of users, and they were rarely integrated with each other. Additionally, online applications had no appreciable amount of historical data because they jettisoned their historical data as quickly as possible in the name of high performance. Thus, corporations had lots of data and very little information. Data warehousing began as a way to reduce users' frustration with their inability to get integrated, reliable, accessible data.

The definition for a data warehouse was and still is today: A source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management's decision processes.
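
Two of the terms in this definition can be illustrated with a small sketch. It assumes a common reading of the terms: "nonvolatile" means that rows are only added after loading, never updated in place or deleted, and "time variant" means that every row is tied to a point in time, so the state of the data as of a past date can be reconstructed. The class name `SnapshotStore`, its methods and the sample keys are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from datetime import date


class SnapshotStore:
    """Append-only history per key: rows are added, never updated or deleted."""

    def __init__(self):
        # key -> list of (effective_date, value), kept in increasing date order
        self._history = {}

    def load(self, key, effective, value):
        """Append a dated snapshot; earlier snapshots are left untouched."""
        rows = self._history.setdefault(key, [])
        if rows and effective <= rows[-1][0]:
            raise ValueError("snapshots must be loaded in increasing date order")
        rows.append((effective, value))

    def as_of(self, key, when):
        """Return the value in effect on the given date, or None if none existed."""
        rows = self._history.get(key, [])
        idx = bisect_right([d for d, _ in rows], when)
        return rows[idx - 1][1] if idx else None


store = SnapshotStore()
store.load("customer:42", date(2005, 1, 1), {"city": "Denver"})
store.load("customer:42", date(2006, 3, 1), {"city": "Boulder"})

# The older snapshot is still available after the newer one is loaded.
print(store.as_of("customer:42", date(2005, 12, 31)))
print(store.as_of("customer:42", date(2006, 4, 1)))
```

An online application would typically overwrite the customer's city; here the earlier value remains queryable, which is the property the definition calls for.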

Since the 1980s, data warehousing has gone from being a concept that was derided by the database theoreticians of the day to conventional wisdom. Now everybody has data warehousing in one form or another.

Once the idea of the data warehouse became popular, vendors and consultants latched onto the concept as a good way to sell their products. In short order, there were many vendors and consultants who claimed their product was an embodiment of a data warehouse. As a result, there was much confusion over what was and was not a data warehouse.

For 15 years, people built different manifestations of data warehouses, and for the most part they were happy with the results. Data warehousing fulfilled a real need in the marketplace.

In many ways, a data warehouse was the first attempt at architecture that most organizations had ever encountered. Prior to data warehousing, everything had been a new application; however, it became apparent that applications were not going to get the organization where it needed to go over time. The solution was to build an architecture or at least the first fledgling steps of an architecture.

Fast forward to today. There is still a great deal of confusion as to what a data warehouse really is. One problem is that the term "data warehouse" was never trademarked or copyrighted. As a result, anyone can call anything a data warehouse. You can call a Volkswagen a data warehouse if you want, and there is no legal barrier to doing so. In a word, data warehousing has lost its integrity - if it ever had any to begin with.

Over time, the underlying architecture for a data warehouse has evolved, even though the original definition remains the same. Figure 1 shows DW 2.0 - the architecture for the next generation of data warehousing.

Figure 1: DW 2.0™ Architecture


There are several substantial differences between the first generation of data warehouses and DW 2.0:

  • The lifecycle of data. As data ages, its characteristics change. As a consequence, the data in DW 2.0 is divided into different sectors based on the age of the data. In the first generation of data warehouses, there was no such distinction.
  • Unstructured data is a valid part of the data warehouse. Unstructured data is email, spreadsheets, documents and so forth. Some of the most valuable information in the corporation resides in unstructured data. The first generation of data warehouses did not recognize that there was valuable data in the unstructured environment and that the data belonged in the data warehouse.
  • The way unstructured data is treated. Unstructured data exists in several forms in DW 2.0 - actual snippets of text, edited words and phrases, and matching text. The most interesting of these forms in the DW 2.0 environment is easily the matching text. In the structured environment, matches are made positively and surely. Not so with unstructured data. In DW 2.0, when matches are made, either between unstructured data and unstructured data or between unstructured data and structured data, the match is said to be probabilistic. The match may or may not be valid, and a probability of an actual match can be calculated or estimated. The concept of a probabilistic match is hard to fathom for a person who has dealt only with structured systems, but it represents the proper way to link structured and unstructured data.
  • The need for close incorporation of metadata into the data warehouse. Metadata is the glue that holds the data together over its different states. Amazingly, the first generation of data warehousing omitted metadata as part of the infrastructure.
  • The need for different levels of metadata. Metadata is found in many places today - multidimensional technology, data warehouses, database management system catalogs, spreadsheets, documents, and extract, transform and load (ETL) tools. There is little or no coordination of metadata from one architectural construct to another; however, there is still a need for a global repository. These sets of needs are recognized and addressed architecturally in DW 2.0.
  • The recognition of the need for integrity of data as data passes from online processing to integrated processing. Because data is constantly changing (or at least subject to change), there is only fleeting integrity of data at the online level.
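
The probabilistic match described in the list above can be sketched in a few lines. This is a minimal illustration, not the DW 2.0 method, which the column does not specify. It assumes that a string-similarity ratio from Python's standard `difflib` module can stand in for an estimated match probability; a similarity ratio is not a calibrated probability. The customer records, the function names and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical structured records (customer master rows); not from the article.
CUSTOMERS = {
    101: "Jonathan Smith",
    102: "Joan Smythe",
    103: "Maria Gonzalez",
}


def match_score(a: str, b: str) -> float:
    """Similarity in [0, 1], used here as a rough stand-in for match probability."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def probabilistic_matches(mention: str, threshold: float = 0.6):
    """Return (customer_id, score) pairs at or above the threshold, best first."""
    scored = [(cid, match_score(mention, name)) for cid, name in CUSTOMERS.items()]
    return sorted(
        [(cid, s) for cid, s in scored if s >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


# A name as it might appear in the text of an email (unstructured data).
for cid, score in probabilistic_matches("Jon Smith"):
    print(cid, round(score, 2))
```

Unlike a key-based join in the structured environment, the result is a ranked list of candidates with scores. More than one record can clear the threshold, and the top-scoring record is not guaranteed to be the right one, which is why the match is carried with its estimated probability rather than treated as certain.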

One other important distinction with DW 2.0 is that because DW 2.0 is trademarked, it enjoys legal protection. There is a strict and clearly stated definition of the architecture for DW 2.0, and no one except the original authors and architects can change the specifications. There is integrity, then, in the definition of DW 2.0. This architecture is fully described on the Web site www.inmoncif.com. All access to the Web site and all noncommercial usage of the material on the Web site is free. All commercial usage of the material is strictly prohibited.


The advantages of the DW 2.0 architecture include the ability to:

  • Hold data at the lowest detail,
  • Hold data to infinity (or at least to your retirement),
  • Not cost huge amounts of money,
  • Have integrity of data and still have online high-performance transaction processing,
  • Link structured data and unstructured data,
  • Tightly couple metadata to the data warehouse environment,
  • Support different kinds of processing without sacrificing response time, and
  • Support changes of data over time.

One natural question is: what happens to the work that has already been done on a first-generation data warehouse? Some minor rework may be required, but for the most part DW 2.0 can be built as a natural extension of a first-generation data warehouse. Extending a first-generation data warehouse to a DW 2.0 architecture is an evolutionary process.



Bill Inmon is universally recognized as the father of the data warehouse. He has more than 35 years of database technology management experience and data warehouse design expertise. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.