-
Marketplace
-
Channel Resources
Articles from this Site
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
Where is the best place for a BI application to reside?
Are there best practices or tools for creating a data dictionary?
When do you use a star schema and when to use a snowflake schema?
What are dimension mapping and fact mapping?
White Papers
Data Warehouses: What are they and how will they benefit your organization?
Advances in Data Warehouse Performance
Books
Designing Fact Tables for OLAP Cubes
The purpose of this article is to explain how to design a dimensional model for differing grains of fact tables and provide a comprehensive strategy for cube implementation.
In almost all data warehousing applications, there is a need for analytical processing based on the comparison of allocated amounts and incurred amounts. What this means is that businesses are interested in comparing facts at a level on which allocation is done to the facts rolled up to this level from the lowest atomic level. To further elaborate, most data warehouse business users invariably want to know how much cost has been allocated, how much cost has actually been incurred and their variance. Most operational systems feeding data warehouses do not maintain activity-based costing. Allocated cost information is maintained at a much higher grain than incurred cost information, which is maintained at the lowest atomic level. The intention of this article is to provide a design solution to effectively answer this need.
For example, Figure 1 shows a sales hierarchy that will be used in this discussion. The marketing budget of company ABC, which sells a large number of products, is allocated once a year. Budget allocation is done at the regional level for each product. The marketing manager for the product in the respective region is responsible for tracking marketing expenses incurred at each level within the region.

Figure 1
The companys senior manager, who happens to be key user of the data warehouse, is not only interested in knowing the marketing expense for product A in state B this year, but also the marketing budget allocated versus the marketing expense incurred at regional level for each product this year.
The answer to the first question is simple and can be obtained by adding expenses incurred for product A for all cities within state B. The answer to the second question can be obtained by comparing the budget allocated at the regional level to the expenses rolled up from all states within that region (expenses at the state level can be determined by rolling up all cities within that state). This kind of requirement sends tremors to data modelers and database designers.
One design solution is to provide a multiple grain fact table that contains measures at differing grains. This approach not only makes fact table population logic unnecessarily complicated for ETL programmers, but also severely degrades query performance. Another design solution is to allocate costs at the lowest level, but this approach requires a paradigm shift in the way source systems are designed. (Users will have to allocate costs at lower levels just to make the technical design feasible.)
The third design solution is to provide two fact tables at different grains, one at the lowest level of atomicity (expense at city) and the other at a higher level (allocation at region). It is important to note that all the dimensions for both these fact tables are conformed dimensions; otherwise this design approach will not be feasible. Once the dual fact table design is done, a database view is created which is treated as a single fact by the OLAP tools. Lets see the design of this requirement.

Figure 2
Clearly, the two fact tables in Figure 2 are at different grains. Cube developers will have questions such as:
- What facts will appear in the cube?
- Will there be one or two cubes?
- How can we incorporate data from both fact tables in a single cube?
- What will the facts (measures) look like at the lowest level, in case of a single cube design?
The answer to some/all of the questions depends on the way the user community is comfortable looking at the information. In most situations, all facts will appear in the cube (otherwise it most likely wouldnt have made it to the fact table). Having two different cubes one having the "allocated" facts and the other "incurred" facts will be out of question because the user will not want to keep switching from one cube to the other to compare and find the variances in facts at each level of hierarchy. It is against the industry-accepted navigational rules. Hence, we agree that one cube is not only desired but is required. A database view containing facts (measures) from both the fact tables provides the answer to the third question. The answer to the fourth question might raise some acceptability issues from the users. At the lowest level (city) of atomicity, the cube is going to hold both "allocation" and "incurred" facts data, but the reality is that "allocation" data is not available at the city level (according to the design). So, what does the user see in the allocation column within the cube? It really depends on the OLAP tool being used because most OLAP tools in the market today will distribute the allocation amount down the hierarchy either to a single line or equally to all lines. This needs to be done in order to maintain additivity when traversing the hierarchy. If the OLAP tools provide a feature to hide facts (measures) depending upon the level in the hierarchy being traversed, users would see only the incurred amount at the city and state levels; and when they roll up to the regional level, they would see both allocation and incurred amounts. So, for now, the user community will have to be educated so they do analytical processing in a way that makes most sense. As long as they are aware that information in allocation amount does not mean anything at levels below region, it should not be a problem.
Another alternative that involves doing the lower-level allocation programatically, but this approach does not add any value because it will be an attempt to distribute allocations by creating dummy records at each of the lower levels of the hierarchy. It is nothing but forcing the grains of both the fact tables to be at the same level. But, from the cube perspective, it does not add any value as the lower level distribution will still be arbitrary and the numbers will not make sense to the users. The biggest downside of this approach is that it increases work for the ETL programmers, which is ultimately going to reflect in the project plan and project time lines.
Manish Varma is a data warehouse consultant who specializes in the design and implementation of data warehouse architecture, data modeling and database administration of large Oracle-based systems. He has successfully designed, implemented and facilitated the development of the Corporate Information Factory for Fortune 100 clients in the U.S.
For more information on related topics, visit the following channels:


