FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

What is the purpose and role of dimensional modeling during data warehouse development process?

Q: What is the purpose and role of dimensional modeling during data warehouse development process and what are the strengths and weaknesses of dimensional modeling?

Clay Rehm's Answer:

The main purpose is to identify common business terms used by your users that are used as query identifiers when they run queries against the data.

The weakness is that you may think you have captured all of the requirements when in reality you may have not. However, being aware of this fact may prevent this from happening. The key to dimensional modeling is to have all of the business users involved - not just the ones managing the effort and not just the ones who will be using it. This is a perfect time to get all of the possible stakeholders together and find out the questions they intend to ask of your data warehouse.

Anne Marie Smith's Answer:

Dimensional modeling is the type of modeling used in data warehouse/data mart design since it is intended to provide the capabilities of end-user querying that is the purpose of a data warehouse. A dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds to one of the components of the multipart key in the fact table. This construct is frequently called a "star schema" since the graphical representation can resemble a star. One strength of the star schema (dimensional model) is that it can accommodate various types of user queries so the design for the data warehouse database can be performed without concern for the exact types of queries that users may employ. Another advantage of the star schema is that the model can be extended simply by adding additional rows to both the fact and dimension tables, no need to add new tables or rejoin existing tables. Weaknesses of the dimensional model are that they need to be built after having completed traditional ER (entity-relationship) models so that you have a proper view of the data and its usage in the source systems, and many organizations do not take this essential first step - they go right to dimensional modeling and lose the insight that having an ER model would give. Also, learning how to do a good dimensional model takes time, and some techniques of DM are different from ER modeling, so the modeler has to remember that they are working on a dimensional model so they don't fall back into a more traditional ER design.

Joe Oates' Answer:

I would say that the main purpose of dimensional modeling is to make queries easier to write and more efficient than an equivalent third normal form (3NF) design. It should be pointed out that 3NF designs are suited for transaction processing where the goal is the efficiency of insert as opposed to efficiency of reporting.

Multidimensional models consist of "snowflake schemas" as well as star schemas. Typically, data marts with a single system has the source can use a star schema. But when there are multiple source systems with, for example, multiple product structures and multiple people or organizations that can be involved in a transaction, a simple star schema is not sufficient. Ralph Kimball categorizes these real-world situations as the heterogeneous data problem and data requiring "helper" tables.

The key strengths that I have found for multidimensional modeling and implementation include:

  1. It is much easier to write a query using a dimensional design with six or eight tables (or views masking an underlying snowflake schema) that includes two or three hierarchies than writing a query for the equivalent 3NF design that may have 20 or more tables
  2. It is much easier to talk to business people using a multidimensional design than a more complicated 3NF design
  3. Many query tools expect a multidimensional design
  4. In my experience, it's much easier to add to a multidimensional design and it is to a 3NF design.

And Ralph Kimball's The Data Warehouse Lifecycle Tool Kit, he describes the strengths of the dimensional design on page 147.

Chuck Kelley's Answer:

This depends on your technical architecture. If your data warehouse is based on a grouping of stars as data marts (Ralph Kimball style), then dimensional modeling plays a huge role in the development of the data warehouse. If your data warehouse is based on a data warehouse where access is primarily from the data marts (Bill Inmon style), then dimensional modeling plays a lesser role, but still quite important because it is where the largest communities of our business users access the data.


Anne Marie Smith is a highly acclaimed author and speaker in the fields of data stewardship, data governance, data warehousing, data modeling and metadata management. She holds a doctorate in Management Information Systems and has taught at LaSalle University. Smith serves on the board of directors of DAMA International and is an expert advisor to DM Review's Ask the Experts. Smith is the director of education at EWSolutions, a GSA schedule partner and systems integrator dedicated to providing companies and government agencies with best-in-class business intelligence solutions using data warehousing, enterprise architecture and managed metadata environment technologies (www.EWSolutions.com). She may be reached directly via email at AMSmith@EWSolutions.com.

Joe Oates is an internationally known speaker, author and consultant on data warehousing. Oates has more than 30 years of experience in the successful management and technical development of business, real-time and data warehouse applications for industry and government clients. He has designed or helped design and implement more than 30 successful data warehouse projects.

Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.

Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, metadata and systems analysis, design and development. He has worked in multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and is a co-author of the book, Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP), a certified Project Management Professional (PMP), holds a Bachelors of Science degree in Computer Science and a Masters Degree in Software Engineering from Carroll College. He can be reached at clay.rehm@rehmtech.com.

For more information on related topics, visit the following channels:



Industry Vendors