-
Marketplace
-
Channel Resources
Articles from this Site
A Scoring Model and Choice Model for Multistage Cross Selling in the Insurance Industry, Part 2
Lawrence Technological University Uses Metastorm ProVision
Information Builders to Extend WebFOCUS to Predictive Analysis
A Statistical Stocking Stuffer for the Holidays
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
White Papers
Best Practices: Eight Tips for Improving Your Professional Services Business
Metadata Management for Enterprise Applications
UML for C#
PHP Code Design
Domain-Specific Modeling: 10x Faster than UML
Web Seminars
Modeling Unstructured Data
Creative Strategies for Achieving 24/7 Uptime
Closing the Loop: Real-Time Event Detection and Response
Learning from Others: Best Practices for Data Governance
Data Quality Getting Started the Right Way
Books
Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management
Data Modeler's Workbench: Tools and Techniques for Analysis and Design
The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models
Data Mining Using SAS Applications
Data Mining: Concepts, Models, Methods and Algorithms
DW Design in the Real World, Part 2: Abstract Design
Building Business Intelligence
William wishes to thank Mike Cross (mcross@racenter.com), data warehouse director at Rent-A-Center, for his contributions to this month's column.
The one overriding constant about data warehousing is that the data warehouse will change. Data designs that you spend months perfecting will become obsolete overnight, and unforeseen business requirements will require a different view of the data. If you want to be a successful data warehouse architect, you can either become very astute at accommodating change or you can design for the unknown.
One way to architect your data warehouse for the unknown is abstract design. Abstract design allows for and welcomes change without impacting the overall structure or design of your data warehouse. The primary benefit of abstract design is flexibility. This design technique can represent the data more naturally, is easily understood by end users, allows for unforeseen changes, requires less knowledge of the data and data relationships by the end user, and prevents the carrying forward of legacy data elements. Additionally, database indexing techniques can be fully exploited to make querying abstract designs much faster than straight normalized or dimensional designs.
As the name implies, abstract design removes most of the rigidity of traditional data design and replaces it with one or more levels of abstraction. Abstract design is characterized by heavy use of supertypes and subtypes, surrogate keys representing natural keys, very simple elements - such as amount, count and date - and lookup tables that define element types and relationships between element types.
As an example, consider a simplified daily income table for a convenience store. Traditionally, it may look something like Figure 1.

Figure 1: Simplified Daily Income Table
One of the problems with this design is that if additional income categories appear, as they inevitably will, you will need to add them to the table structure. After several iterations of adding, renaming and subtracting, the table transforms itself into a very inefficient structure. Other problems with this design are that developers need to understand the history of all changes to use it effectively, and you will need to include all income columns for every store, even if it does not apply to that store. An abstract design for this same structure would look like Figure 2.

Figure 2: Abstract Design Daily Income Table
This structure, combined with the lookup table, which is shown in Figure 3, allows for new income categories without changing the existing structure and provides the added benefit of defining categories and showing relationships between categories.

Figure 3: Lookup Table
An Example of Abstract Design
Consider the following. You have the new capability to break fuel sales into diesel and unleaded. Traditionally, you would need to add two additional columns to the income table and begin populating these. Developers would need to know when this change was made and query the table accordingly. (Because this change would probably not happen in all stores at the same time, more complicated logic will probably be necessary.) In the abstract design, you would simply add two income types whose parents are fuel sales. If developers were querying the data at the lowest possible level, they would not even have to be cognizant of this change and would automatically receive the lowest level of detail data available, regardless of the implementation schedule.
The use of surrogate keys means you are not tied to current names and allows for the renaming of types without changing the structure, breaking existing queries or needing to notify developers or end users. The use of recursive parent-child relationships allows for the simple aggregation of types without having to know the children or the number of generations below the parent. Again, developers simply need to write code once that anticipates these situations.
The heavy use of supertypes and subtypes aids in abstract data design. By combining multiple related entity types under a single supertype, you can exploit the abstract design to show relationships between items that are not within the same subtype without having to worry about tracking which subtype they belong to or having to perform "tricks" if an entity belongs to multiple subtypes. The fundamental principle that subtypes must be exclusive and exhaustive is great in theory, but in practice becomes very limiting, if not impossible to implement. In addition, there is no implied relationship or hierarchy between subtypes. This is defined elsewhere if necessary.
In an insurance example, there are agents, adjusters, agencies and companies. An abstract design would create a supertype of organization, containing the core information about each organization (such as name, address and tax ID) and the subtypes of agent, adjuster, agency and company, which have specific information pertinent to each subtype. With this design, a person would be defined once as an organization, possibly once as an agent and once as an adjuster; all three would have the same surrogate key. A relationship table could then be created that relates agents to agencies and agencies to companies. However, because there is not an implied relationship or order within the subtypes, an agent could be the child of a company and an agency could be the child of another agency. The only relationship not allowed is a child being its own parent.
Before tackling abstract design, a thorough understanding of the underlying data and its relationship with other data is necessary. Relational and dimensional tables, though inefficient at times, do not require as much understanding of the data. The elements are grouped according to a certain granularity level (i.e., the primary key), and all relationships between the elements below the key are left to the end user to understand and implement accordingly. Abstract design, however, requires the complete understanding of data for its full potential to be realized.
Abstract design is not the answer to all data modeling challenges and should not be taken to an extreme, but it is a very powerful technique that allows the data warehouse to accommodate unforeseen changes without having to be redesigned, because we all know that change happens.
William McKnight is partner, Information Management, at Lucidity Consulting Group. William functions as strategist, lead enterprise information architect and program manager for complex, high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality and operational business intelligence. Many of his clients have gone public with their success story. McKnight is a Southwest Entrepreneur of the Year Finalist, a frequent best practices judge, has authored more than 150 articles and white papers and given over 150 international keynotes and public seminars. His teams implementations from both IT and consultant positions have won Best Practices awards. He is a former IT VP of a Fortune company, a former engineer of DB2 at IBM and holds an MBA. He can be reached at wmcknight@luciditycg.com
For more information on related topics, visit the following channels:


