FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

Super Star

Design Challenge

A star schema is designed to efficiently answer metric business questions, such as those questions that return amounts, quantities and counts. For example, "What are our total sales by customer and product over the last six months?" Stars are generally small in scope, with a handful of dimensions surrounding a single fact table. But one day, a colleague stops by your desk with a printout of a star schema that makes you jump right out of your seat. It is a sales super star! More than 1,000 data elements from the sales area are on this model. You learn that this star was produced by placing every metric data element from sales in the fact table and every nonmetric data element in a dimension. The fact table contains more than 100 different metrics and there are more than 30 dimensions! You are even more shocked to learn that this star schema is in production!

The Challenge

The sales star schema is experiencing some performance and space issues, and your colleague would like advice on how best to restructure this model. What is your response?

The Response

Many database problems, such as performance and space, have as their root a gap in analysis. In this scenario, the gap is not fully understanding the scope or subject matter of the original requirements. Once the requirements are fully understood and properly scoped, a variety of different modeling techniques can enter the picture to reduce the super star into a series of fast and user-friendly stars.

Understanding the Requirements

A star needs to be defined by a set of business questions and not by a broad functional area such as sales. Ideally, a representative set of the business questions within the sales area can be identified and documented, and the metrics behind these questions can be grouped together into efficient stars. The metrics can be grouped by usage or by business area.

Grouping metrics by usage means assigning metrics to stars based on common reporting and queries. Nishant Upadhyay, data architect, would analyze usage by studying the reports generated from this super star. "This will enable me to categorize the reports into sales lifecycle reports, sales process reports, pipeline sales reports, etc. The super star can be reviewed to see if it can be broken up into these subject areas to make them more manageable and focused." Michael Workman, systems analyst, would interview business users and create a series of stars according to different audiences.

Grouping metrics by business area means breaking the sales function down into its business processes. Maurice Frank, data modeler, would begin by investigating the sales business processes. "Sales is very general. Break it down into smaller business processes and map the measures and dimensions to each process which would have its own fact table." Mike Nicewarner, data analyst, supports this approach by saying, "Divide and conquer! Instead of just sales, you have tables named outside sales or marketing efforts."

Applying Different Modeling Techniques

After understanding the requirements, design challengers suggested a number of modeling techniques to restructure this super star, including shifting simple calculations to the reporting tool, selectively applying indexes, re-evaluating history and restructuring dimensions.

Both Mustufa Kapadia, BI analyst, and Eve Halberg, BI team lead, stress the need to shift simple calculations to the reporting tool. Eve says, "I would review all metric elements to determine whether they are required. Base metrics are required; calculated metrics may be better handled by reporting tools, such as averages."

Indexes can be a quick fix to a performance issue when applied properly. Mustufa Kapadia would analyze indexes on both the facts and dimensions. He would also look for opportunities to reduce or offload the amount of history in the sales fact table. Another approach would be to store the oldest sales fact records at a higher level of granularity to save space and increase performance.

After understanding the requirements, the dimensions themselves can undergo a restructuring. Eve Halberg says, "Dimensions that can be merged are status or code dimensions. For performance, I usually look to create what I call 'key dimensions' for large dimension tables. For example, I broke my customer dimension into a customer dimension and customer key dimension. The key dimension contains the elements that are frequently accessed in reports. Also, sometimes I've seen metrics in a sense repeated, such as checking account deposit amount and savings account deposit amount. I would replace these with one amount and create a key dimension that types the amount."

If you would like to become a design challenger and have the opportunity to submit modeling solutions, go to www.stevehoberman.com/designchallenge.htm and add your email address. If you have a challenge for us, please email me a description of the scenario at me@stevehoberman.com.


Steve Hoberman has worked as a business intelligence and data management practitioner and trainer since 1990. He is a Certified Business Intelligence Professional (CBIP), having achieved mastery level certification in data analysis and design. He is a popular and frequent presenter at industry conferences, both nationally and internationally.  Hoberman is a columnist and frequent contributor to industry publications, as well as the author of  Data Modeler's Workbench and Data Modeling Made Simple (available for purchase through the DM Review bookstore). He is the founder of the Design Challenges group, inventor of the Data Model Scorecard and a recognized innovator and thought leader in the field of data modeling. He can be reached at me@stevehoberman.com.

Graeme Simsion's latest book is out! Data Modeling Theory and Practice. Here's a link where you can read more about the book and purchase it at a discounted price.

For more information on related topics, visit the following channels:



Industry Vendors