Data Warehousing Lessons Learned:
Avoiding Data Warehousing Religious Wars
High-performance data warehousing applications begin with a consistent and unified data model ? a design for performance. Design is on the critical path of high performance data warehousing. Different approaches to designing a data model for a data warehouse have frequently translated into religious wars between the followers of the star schema school and those of the operational data store (ODS) approach. However, in fact, both approaches are valid within context and represent different perspectives on the same domain or problem space. Here is how to reconcile the two different approaches.
One reason the data warehouse was invented was to avoid performance problems when decision-support queries were executed against operational systems. The long-running summaries characteristic of decision support presented logjams to the quick updates needed by the transactional systems. The situation also applies in reverse ? a highly normalized data model design is not optimal for business intelligence that aggregates data across multiple transactional systems, the latter used to run the business on a day-to-day basis. Data warehousing application performance requirements are different in interesting ways from those of classical online transaction processing (OLTP) databases. OLTP applications such as SAP R/3, PeopleSoft Human Resources, Oracle Financial or Siebel operational customer relationship management (CRM) require frequent updates to what are usually highly normalized data structures. In contrast, the requirements for decision support include aggregating and summarizing of large amounts of data in order to get a business intelligence perspective on trends for customer segments and product brands, and are performed across large time horizons. For example, knowing the lifetime value of a customer requires aggregating a lifetime of detailed transactional data. Call it what you will, that mass of detail ? often collected from multiple operational systems ? is a data warehousing function. These differences translate into different design approaches.
The star schema works best in the micro process ? designing a particular data model for summarizing a join of customer, product and geographic data. The ODS works best in the macro process ? integrating, managing and processing large volumes of data that represent the lifeblood of the enterprise ? the information factory. From the ODS perspective, an individual star schema is a functionally dependent data mart ? a local solution (albeit a highly useful one). In contrast, from the perspective of the star schema, the ODS is a staging area in which operational data from inconsistent transactional systems can be reconciled and consolidated. The star schema is a true third-normal form (3NF) data structure, though the points of the star are sometimes denormalized if the customer or product dimensions extend to millions of rows.
Performance is a function of the entire technology stack ? hardware, software, network, database, data model, business design and application, and the architecture that ties them all together. High-performance data warehousing applications begin with a consistent and unified data model ? a design for performance. Developers who think data warehousing performance issues are best left to post-implementation monitoring and tuning are in for a shocking disappointment. High-performance data warehousing begins with a design for performance, including the design of the data model supporting the application. Of course, one can and must apply performance and tuning principles after any significant implementation or upgrade, but that misses the bigger picture. Regardless of whether an enterprise has a centralized or distributed architecture and regardless of the many details of the technology stack, the single most important thing an enterprise can do to promote long-term scalability is design consistent, unified representations of product, customer, geography, calendar and other data dimensions intrinsic to the firm in question.
Avoid religious wars. Apply and use the star schema and the ODS as the micro and the macro approaches, respectively, to data warehousing. Use the star schema when designing a particular subject-area domain, individual aggregate or related data model. Performance in building summaries and aggregates and performing joins is often optimized in this way. Use the ODS when designing a workflow back and forth between operational and decision support systems that must match-merge, consolidate and process data from heterogeneous transactional systems. As data volumes (and related performance challenges) ramp upward, it is often useful to introduce a concept such as the ODS to make manageable the tidal wave of information that is being pumped through the systems. As particular business questions arise about subject-area domains, it is likewise useful to deploy a star schema to represent the same data in different ways to accommodate the different perspectives of marketing, finance, inventory control and your business process of choice.
For more information on related topics visit the following related portals...
DW Administration, Mgmt., Performance,
DW Design, Methodology and
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to email@example.com.
Provided by IndustryBrains
|Bowne Global Solutions: Language Services|
World's largest language services firm offers translation/localization, interpretation, and tech writing. With offices in 24 countries and more than 2,000 staff, we go beyond words with an in depth understanding of your business and target markets
|Award-Winning Database Administration Tools|
Embarcadero Technologies Offers a Full Suite of Powerful Software Tools for Designing, Optimizing, Securing, Migrating, and Managing Enterprise Databases. Come See Why 97 of the Fortune 100 Depend on Embarcadero!
|Online Backup and Recovery for Business Servers|
Fully managed online backup and recovery service for business servers. Backs up data to a secure offsite facility, making it immediately available for recovery 24x7x365. 30-day trial.
|Data Mining: Strategy, Methods & Practice|
Learn how experts build and deploy predictive models by attending The Modeling Agency's vendor-neutral courses. Leverage valuable information hidden within your data through predictive analytics. Click through to view upcoming events.
|Test Drive the Standard in Data Protection|
Double-Take is more affordable than synchronous mirroring and enables you to recover from an outage more quickly than tape backup. Based upon the Northeast blackout and the west coast wild fires, can you afford to be without it?
|Click here to advertise in this space|