|Sign-Up for Free Exclusive Services:||Portals|||||eNewsletters|||||Web Seminars|||||dataWarehouse.com|||||DM Review Magazine|
|Covering Business Intelligence, Integration & Analytics||Advanced Search|
Information Management: Charting the Course:
The data warehouse is the center of the decision-making universe. The data warehouse supports data marts. The data warehouse supports exploration warehouses. The data warehouse interacts with operational data stores. The data warehouse supports data mining warehouses. In short, there are a whole host of environments that the data warehouse supports. The very essence of the data warehouse is reusability of data.
At the heart of the reusability of data is the granularity of data that is found there. The very fine granularity of data found in the data warehouse means that the data coming out of the data warehouse is very flexible. Finance can look at the granular data in the data warehouse one way. Accounting can look at granular data in the data warehouse another way. Marketing can add up the granular data yet another way while engineering can add up the data warehouse data in another way. Through it all, the granular data residing in the data warehouse remains in a reconcilable state. If management wants to see why marketing and sales disagree, there is a single definitive source to which the organization can turn.
In short, there are some very pleasant side effects of the granularity of the data that resides in the data warehouse.
There is an interesting question that arises when examining the structure of the data warehouse: Can a data warehouse contain fact tables?
A fact table is a table that contains different kinds of data in the same physical structure. A fact table integrates different forms and occurrences of granular data into a single physical structure. The different units of data that are integrated into a fact table have one common theme - their reliance (direct or indirect) on the primary key of the fact table.
For example, the primary key of a fact table might be order number. Based on usage and requirements, this fact table might contain data about:
Not only do fact tables store data so that it can be accessed efficiently, but the fact table often positions the data so that it has a natural look and feel to it. The end user looks at the fact table and recognizes the data and its structure.
In a word, a fact table combines data so that the granularity of the data is lost. Lots of granular data goes into the creation of a fact table. In doing so, the identity and the structure of the fact data is lost.
A fact table can be created at a low level of granularity or at a level that contains summarized data. In either case, the fact table contains different types of data that are merged together into a single, physically contained data structure.
Why would anyone want to create a fact table? And why would anyone want to create a fact table inside a data warehouse? Doesn't a fact table ruin the granularity of the data found inside the warehouse? And once the granularity of data is compromised, isn't the purpose of the data warehouse destroyed? What's going on here anyway?
The main reason you would want to create a fact table inside a data warehouse is that the data contained inside a fact table is very easy and efficient to access. Think about it. If you were the system, which would you rather do: go out and grab a single record of data with everything you need inside the record; or go out and grab a lot of records, each with a small amount of data in the record, and then have to turn around and glue all those pieces of data together?
Of course, it is much easier and much more efficient to grab a single record than a bunch of records. Therefore, the way to go is to create a fact table. Or is that really the way to go? With that brief explanation of what a fact table is and why a person might want one, what are the solutions?
Indeed, fact tables can be created inside a data warehouse (see Figure 1). There is no argument (or even any real discussion) as to whether fact tables can be created inside a data warehouse. The issue is not can they be created inside a data warehouse. The issue is should they be created there.
Figure 1: Fact tables can be placed inside a data warehouse.
Under some conditions, fact tables belong inside a data warehouse and under other conditions they do not belong in a data warehouse. When an organization - as an organization - always looks at data in the same way, and that way entails different types of granular data, then a fact table is in order inside the data warehouse. When data is organized such that everyone looks at data element A when they look at data element B, C and D, then it is absolutely safe to put A, B, C and D in the same fact table inside the data warehouse. But what if everyone does not look at the data in the same way? What if some people look at data elements B, F, Z and A together?
If 95 percent of the accesses of the data are for elements A, B, C and D, then it still is probably safe to create a fact table. However, if only 25 percent of the accesses of the data look at A, B, C and D together, it is probably a good idea to separate the data elements. Why? Because if A, B, C and D are placed together when only a few people need to see them together, it is very inefficient and very artificial for people who don't want to see the elements glued together to have to access them together, disengage them and then recombine them with other data elements as desired. In other words, the only time it is safe to create a fact table inside a data warehouse is when the vast preponderance of the accesses of the data will be made in exactly the same way. Whether or not it is a good idea to build a fact table inside a data warehouse is an issue of the probability of access of data - plain and simple.
Unfortunately in a DSS data warehouse environment, the pattern of access to data is usually ill-defined. One day people want to see the data one way; the next day they want to see the data another way. Under most circumstances, how data is to be accessed is a great unknown. And when that is the case, creating a fact table inside a data warehouse is a very awkward, very unnatural thing to do.
The place where fact tables fit very nicely is at the data mart level. In a data mart, requirements are carefully gathered before the data is structured. In a data mart, you know how the data is going to be used before the data mart is built. Under these circumstances, a fact table makes imminent sense.
However, is it possible to have your cake and eat it too? Is it possible to build a fact table and store data at a low level of granularity inside the data warehouse? It is possible to store data element A by itself and then store data element A in a fact table along with B, C and D? Yes, this is a definite possibility. But what about the redundancy of data? The essence of a data warehouse is that a minimal amount of data be used because data warehouses get to be big enough without any help. Any place where you can squeeze unnecessary data out of a data warehouse is welcome. When you create one place for data element A and another place for data elements A, B, C and D, you are creating redundancy of data - exactly what you don't want to do.
Another issue is that of update. In an ideal world, you don't need to update a data warehouse. But sometimes people build data warehouses under less than ideal circumstances. Occasionally people build data warehouses where update needs to occur. In those circumstances, keeping data element A in sync with data elements A, B, C and D may be asking a lot.
As you can see, there are some circumstances where it makes sense to create fact tables in a data warehouse. Under normal circumstances, however, such a structure is much better suited for the data mart environment where the fit is natural and efficient.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 35 years of database technology management experience and data warehouse design expertise. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.
|View Full Magazine Issue|
|E-Mail This Column|