Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search
advertisement

RESOURCE PORTALS
View all Portals

WEB SEMINARS
Scheduled Events

RESEARCH VAULT
White Paper Library
Research Papers

CAREERZONE
View Job Listings
Post a job

Advertisement

INFORMATION CENTER
DM Review Home
Newsletters
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

GENERAL RESOURCES
Bookstore
Buyer's Guide
Glossary
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

DM REVIEW
About Us
Press Releases
Awards
Advertising/Media Kit
Reprints
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Information Management: Charting the Course:
Fact Tables in the Warehouse

  Column published in DM Review Magazine
November 1999 Issue
 
  By Bill Inmon

The data warehouse is the center of the decision-making universe. The data warehouse supports data marts. The data warehouse supports exploration warehouses. The data warehouse interacts with operational data stores. The data warehouse supports data mining warehouses. In short, there are a whole host of environments that the data warehouse supports. The very essence of the data warehouse is reusability of data.

At the heart of the reusability of data is the granularity of data that is found there. The very fine granularity of data found in the data warehouse means that the data coming out of the data warehouse is very flexible. Finance can look at the granular data in the data warehouse one way. Accounting can look at granular data in the data warehouse another way. Marketing can add up the granular data yet another way while engineering can add up the data warehouse data in another way. Through it all, the granular data residing in the data warehouse remains in a reconcilable state. If management wants to see why marketing and sales disagree, there is a single definitive source to which the organization can turn.

In short, there are some very pleasant side effects of the granularity of the data that resides in the data warehouse.

Fact Tables Inside a Warehouse?

There is an interesting question that arises when examining the structure of the data warehouse: Can a data warehouse contain fact tables?

A fact table is a table that contains different kinds of data in the same physical structure. A fact table integrates different forms and occurrences of granular data into a single physical structure. The different units of data that are integrated into a fact table have one common theme - their reliance (direct or indirect) on the primary key of the fact table.

For example, the primary key of a fact table might be order number. Based on usage and requirements, this fact table might contain data about:

  • The part that was ordered
  • The cost of the part that was ordered
  • The shipment specifications of the order
  • The customer who made the order
  • The name of the customer who placed the order
  • The place where the order was made
  • The phone number of the order
  • The vendor who supplies the part that was ordered
  • The back order status as of the moment of order
  • The number of parts in inventory as of the date of the order
  • The available colors for the part under normal circumstances
  • Any substitute parts order number
  • The clerk's name who took the order
  • The available discounts at the moment of the placing of the order

Not only do fact tables store data so that it can be accessed efficiently, but the fact table often positions the data so that it has a natural look and feel to it. The end user looks at the fact table and recognizes the data and its structure.

In a word, a fact table combines data so that the granularity of the data is lost. Lots of granular data goes into the creation of a fact table. In doing so, the identity and the structure of the fact data is lost.

Different Levels of Granularity

A fact table can be created at a low level of granularity or at a level that contains summarized data. In either case, the fact table contains different types of data that are merged together into a single, physically contained data structure.

Why would anyone want to create a fact table? And why would anyone want to create a fact table inside a data warehouse? Doesn't a fact table ruin the granularity of the data found inside the warehouse? And once the granularity of data is compromised, isn't the purpose of the data warehouse destroyed? What's going on here anyway?

The main reason you would want to create a fact table inside a data warehouse is that the data contained inside a fact table is very easy and efficient to access. Think about it. If you were the system, which would you rather do: go out and grab a single record of data with everything you need inside the record; or go out and grab a lot of records, each with a small amount of data in the record, and then have to turn around and glue all those pieces of data together?

Of course, it is much easier and much more efficient to grab a single record than a bunch of records. Therefore, the way to go is to create a fact table. Or is that really the way to go? With that brief explanation of what a fact table is and why a person might want one, what are the solutions?

Indeed, fact tables can be created inside a data warehouse (see Figure 1). There is no argument (or even any real discussion) as to whether fact tables can be created inside a data warehouse. The issue is not can they be created inside a data warehouse. The issue is should they be created there.

Figure 1: Fact tables can be placed inside a data warehouse.
Figure 1: Fact tables can be placed inside a data warehouse.

Under some conditions, fact tables belong inside a data warehouse and under other conditions they do not belong in a data warehouse. When an organization - as an organization - always looks at data in the same way, and that way entails different types of granular data, then a fact table is in order inside the data warehouse. When data is organized such that everyone looks at data element A when they look at data element B, C and D, then it is absolutely safe to put A, B, C and D in the same fact table inside the data warehouse. But what if everyone does not look at the data in the same way? What if some people look at data elements B, F, Z and A together?

If 95 percent of the accesses of the data are for elements A, B, C and D, then it still is probably safe to create a fact table. However, if only 25 percent of the accesses of the data look at A, B, C and D together, it is probably a good idea to separate the data elements. Why? Because if A, B, C and D are placed together when only a few people need to see them together, it is very inefficient and very artificial for people who don't want to see the elements glued together to have to access them together, disengage them and then recombine them with other data elements as desired. In other words, the only time it is safe to create a fact table inside a data warehouse is when the vast preponderance of the accesses of the data will be made in exactly the same way. Whether or not it is a good idea to build a fact table inside a data warehouse is an issue of the probability of access of data - plain and simple.

The Great Unknown

Unfortunately in a DSS data warehouse environment, the pattern of access to data is usually ill-defined. One day people want to see the data one way; the next day they want to see the data another way. Under most circumstances, how data is to be accessed is a great unknown. And when that is the case, creating a fact table inside a data warehouse is a very awkward, very unnatural thing to do.

Fact Tables in the Data Mart

The place where fact tables fit very nicely is at the data mart level. In a data mart, requirements are carefully gathered before the data is structured. In a data mart, you know how the data is going to be used before the data mart is built. Under these circumstances, a fact table makes imminent sense.

However, is it possible to have your cake and eat it too? Is it possible to build a fact table and store data at a low level of granularity inside the data warehouse? It is possible to store data element A by itself and then store data element A in a fact table along with B, C and D? Yes, this is a definite possibility. But what about the redundancy of data? The essence of a data warehouse is that a minimal amount of data be used because data warehouses get to be big enough without any help. Any place where you can squeeze unnecessary data out of a data warehouse is welcome. When you create one place for data element A and another place for data elements A, B, C and D, you are creating redundancy of data - exactly what you don't want to do.

Another issue is that of update. In an ideal world, you don't need to update a data warehouse. But sometimes people build data warehouses under less than ideal circumstances. Occasionally people build data warehouses where update needs to occur. In those circumstances, keeping data element A in sync with data elements A, B, C and D may be asking a lot.

As you can see, there are some circumstances where it makes sense to create fact tables in a data warehouse. Under normal circumstances, however, such a structure is much better suited for the data mart environment where the fit is natural and efficient.

...............................................................................

For more information on related topics visit the following related portals...
DW Design, Methodology.

Bill Inmon is universally recognized as the father of the data warehouse. He has more than 35 years of database technology management experience and data warehouse design expertise. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.

Solutions Marketplace
Provided by IndustryBrains

Data Validation Tools: FREE Trial
Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

Speed Databases 2500% - World's Fastest Storage
Faster databases support more concurrent users and handle more simultaneous transactions. Register for FREE whitepaper, Increase Application Performance With Solid State Disk. Texas Memory Systems - makers of the World's Fastest Storage

Manage Data Center from Virtually Anywhere!
Learn how SecureLinx remote IT management products can quickly and easily give you the ability to securely manage data center equipment (servers, switches, routers, telecom equipment) from anywhere, at any time... even if the network is down.

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

Free EII Buyer's Guide
Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

Click here to advertise in this space


View Full Issue View Full Magazine Issue
E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Advertisement
advertisement
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.