-
Marketplace
-
Channel Resources
Articles from this Site
A Scoring Model and Choice Model for Multistage Cross Selling in the Insurance Industry, Part 2
Lawrence Technological University Uses Metastorm ProVision
Information Builders to Extend WebFOCUS to Predictive Analysis
A Statistical Stocking Stuffer for the Holidays
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
White Papers
Best Practices: Eight Tips for Improving Your Professional Services Business
Metadata Management for Enterprise Applications
UML for C#
PHP Code Design
Domain-Specific Modeling: 10x Faster than UML
Web Seminars
Modeling Unstructured Data
Creative Strategies for Achieving 24/7 Uptime
Closing the Loop: Real-Time Event Detection and Response
Books
Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management
Data Modeler's Workbench: Tools and Techniques for Analysis and Design
The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models
Data Mining Using SAS Applications
Data Mining: Concepts, Models, Methods and Algorithms
DW Design in the Real World, Part 4: Hierarchical Relationships
Building Business Intelligence
William wishes to thank Mike Cross (mcross@racenter.com), data warehouse director at Rent-A-Center, for his contribution to this month's column.
Data Warehouse Design in the Real World, Part 1
Data Warehouse Design in the Real World, Part 2: Abstract Design
Data Warehouse Design in the Real World, Part 3: Event-State Management
This is the fourth in a series of articles on data warehousing concepts and best practices. In this article, we will address a new area that is fundamental to good data warehouse design - hierarchical relationships.
Parent-child or hierarchical relationships are a quintessential element of every organization and every data warehouse. They define the relationship between two entities (whether it be a subpart to a part, a worker to her manager or a store to a market) and underlie almost all BI reporting. For example, Rent-A-Center has more than 3,500 stores that report through markets and regions to 12 divisions. For home office use, our BI group aggregates and reports the data at the division level. Because data warehouses usually store data at a much lower aggregation level than how it is reported, designing, representing and traversing hierarchies is fundamental to the success of the warehouse.
There are multiple ways to architect hierarchical relationships within data warehouses and data marts, all of which have advantages and disadvantages. Before looking at the modeling techniques, there are some basic assumptions about the data and hierarchical relationships. We will use the term "parent and child" to imply a hierarchical relationship, but realize that most entities will be both children and parents, depending upon the level you are at within the hierarchy. The first and foremost assumption is that at any given point in time, a child may have only one parent; second, with the exception of the top parent, a.k.a. head, all children have a parent; and third, data is maintained at the leaf level, that is, a child that is not a parent.
There are two basic modeling techniques: a flattened (horizontal) hierarchy and a relative (vertical) hierarchy. The first technique simply employs a table with one column for each potential level within an organization and one record for each leaf entity. The table is then populated horizontally either top down (head first) or bottom up (leaf first). It is often the case where some entities will not all have the same number of hierarchical levels as other entities (such as an employee table) and the horizontal approach will create a "ragged" alignment.
Vertical hierarchies, because of their abstract nature, are more powerful and can replicate any hierarchy that meets the just-mentioned assumptions. They are, however, much more difficult to maintain, query and use for reporting. A vertical hierarchy simply defines a parent-child relationship between two entities. To determine the full ancestry of a given entity, you must recursively find the parent of a parent until there is no parent. Conversely, to find all descendants of an entity, you must recursively find all children of all children until there are no children. For database management systems that have recursive capabilities, this is relatively easy; for those that do not, it is not so easy.
A more robust form of the vertical hierarchy goes beyond parent-child and links all ancestors while providing a relative distance between the two. For example, in a simple parent-child structure where Mike's parent is John, Chad's parent is John, John's parent is Robert and Robert has no parent, four records would be created. In a robust vertical hierarchy that fully links all ancestors, two additional records would be added showing a relationship between Mike and Robert, and Chad and Robert both with a relative distance of two. This expanded technique, while unnecessary, greatly eases reporting and eliminates the need to recursively traverse the hierarchy.
An additional consideration for all hierarchical relationships is a time factor. You want to be able to show how a relationship looks not only now, but also in the past and possibly in the future. To accomplish this, simply add a start time and an end time to each record. The start time becomes part of the key, but should not be included as part of the reference or foreign key for parents in vertical hierarchies. End times are only populated when relationships cease to exist or are replaced by new relationships. If you want to do an end-of-year report at the division level that includes all that were open any time during the year, you simply need to find the alignment where the end date is between January 1 and December 31 or the start date is before December 31 and the end date is blank.
Both hierarchical techniques have their uses within data warehouses and data marts. For relationships that have well-defined levels that all organizational entities fall within, the horizontal hierarchy is by far the easiest to implement and understand, but may create significant challenges if your assumptions change. For hierarchies that are variable in depth, a vertical approach should be taken. At Rent-A-Center, we maintain five different pure parent-child hierarchies within the data warehouse. This technique was chosen because of its capability to adapt to almost any operational change despite promises that "this will never change" (it has), it is the easiest for us to maintain and because it ultimately consumes the least amount of storage. Within the data marts, these hierarchies are transformed into horizontal and robust vertical hierarchies, depending upon the needs of the reporting tool and user community.
William McKnight is partner, Information Management, at Lucidity Consulting Group. William functions as strategist, lead enterprise information architect and program manager for complex, high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality and operational business intelligence. Many of his clients have gone public with their success story. McKnight is a Southwest Entrepreneur of the Year Finalist, a frequent best practices judge, has authored more than 150 articles and white papers and given over 150 international keynotes and public seminars. His teams implementations from both IT and consultant positions have won Best Practices awards. He is a former IT VP of a Fortune company, a former engineer of DB2 at IBM and holds an MBA. He can be reached at wmcknight@luciditycg.com
For more information on related topics, visit the following channels:


