Data Warehousing Lessons Learned:
The Data Warehouse Was Built, Not Designed!
In last month's column, I reported on a client complaining to Giga that their data warehouse lacked data: it could not access the upstream sources of operational, enterprise data. This month, a client reports a subtly different but intimately related complaint: "...(in our installation) the data warehouse contains the 'raw' data - mostly extracted from operational systems. There was very limited design; little effort was spent on rolling up, or summarizing, the data. It is very complicated to collect all the needed information and to integrate and aggregate it. As a result, every new source takes a lot of effort, which has prevented maximum use and business benefit from the warehouse." The client continues, offering a perceptive diagnosis of the predicament: "The data warehouse was built, not designed. We keep way too much detail and do not summarize enough data into prebuilt tables. Too many of our queries are ad hoc and take a long time to run because of the detail level in the tables."
The folktale of the three little pigs and the wolf comes to mind. Only one of the pigs designed his house. When trouble arose - which in this analogy might be all users simultaneously clicking the submit icon for a complex ad hoc query - the designed solution was able to handle a stressful situation that was not necessarily envisioned in advance.

Giga's assessment: The data warehouse discussed in my January column lacked data; this one lacks information. Unlike the first example, there is plenty of data - but not enough information. The irony is that because not enough effort was expended at the front end to design aggregates, extra effort must be incurred at the back end to collect, integrate and aggregate all the information needed. The good news is that because the detailed data is available, designing and building consistent, unified representations of the customer, product and other essential data dimensions is feasible. Some rework will be needed - but the effort already expended is not a total loss. In addition, given the detailed data, building aggregates ought to be relatively easy. Indeed, the 20 percent of the data that gets used 80 percent of the time is often not obvious until an installation gets some experience under its belt. Although this is not a discussion about tools, technology to monitor database usage can be helpful at this point.

Aggregates are one of the chief performance enhancers for any data warehouse; therefore, justifying their creation and maintenance is relatively easy. Once the same business analyst has submitted an ad hoc query three times, it is not really ad hoc anymore. It is predictable and definable in advance, even if parameters such as customer number vary from day to day. Such a query should be captured, tuned, optimized and made part of a regularly scheduled process. In any case, adding business views and summaries (e.g., by time period, business unit, product family, etc.) should be a priority.
This too is a part of the design effort.
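The roll-up the column recommends - prebuilt summaries by time period, business unit or product family, so analysts need not scan detail rows on every query - can be sketched as follows. This is a minimal illustration, not any client's actual schema; the column names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (month, business_unit, product_family, amount).
detail_rows = [
    ("2004-01", "East", "Widgets", 120.0),
    ("2004-01", "East", "Widgets", 80.0),
    ("2004-01", "West", "Gadgets", 200.0),
    ("2004-02", "East", "Widgets", 150.0),
]

def build_aggregate(rows, key_fn):
    """Roll detail rows up into a prebuilt summary table keyed by key_fn."""
    summary = defaultdict(float)
    for month, unit, family, amount in rows:
        summary[key_fn(month, unit, family)] += amount
    return dict(summary)

# A summary by month and product family - the kind of prebuilt table
# that turns a long-running detail scan into a quick lookup.
by_month_family = build_aggregate(detail_rows, lambda m, u, f: (m, f))
```

Once a repeated "ad hoc" query is recognized, refreshing a summary like this on a schedule is exactly the capture-tune-optimize step the column describes.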
Look at the big picture from the beginning, even if only a part of the big picture is being implemented. In other words, if a consistent, unified representation of customers, products, channels and other data dimensions were available, it would not matter whether there were one or many data stores, because they would all use the same definitions of the basic data structures and thus be able to interoperate. However, if each data mart uses a different definition of customer, comparisons across the marts are rendered much more complex or even impossible. They become silos of data that cannot communicate. Many firms have struggled as they try to decide whether to build data marts or a data warehouse. Absent a consistent design, even detailed data is meaningless. If it were an urban landscape, the result would be a trailer park instead of a subdivision.

Through a painful process of trial and error, such as that described here, many practitioners and analysts, including those at Giga, arrived at the method of designing the data dimensions at the enterprise level, but iteratively implementing the fact structures according to line-of-business processes at the data mart level. It is possible some rework will have to occur, and one or more of the data marts may have to be discarded as a prototype that does not conform to the canonical design. Because the detailed data is available, it will be possible to design and rebuild the key dimensions and the aggregates dependent on them, though a penalty is implied in terms of extra effort (and cost). Without such effort, building and operating data warehousing systems is at risk of surprises such as schedule slips, disappointed customers, unsatisfied business requirements, solving the wrong problem and budget overruns. In contrast, a successful data warehouse provides infrastructure on the critical path to high-impact business applications in CRM, supply chain management, demand planning, forecasting and business intelligence.
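The conformed-dimension idea above - one enterprise-level definition of customer shared by every mart - can be sketched in miniature. All table names, records and measures here are hypothetical; the point is only that two marts keyed to the same dimension can be compared, where two marts with private customer definitions could not.

```python
# One shared definition of "customer" - the conformed dimension.
customer_dim = {
    1: {"name": "Acme Corp", "segment": "Enterprise"},
    2: {"name": "Bitty LLC", "segment": "SMB"},
}

# Two line-of-business marts, each keyed by the shared customer id.
sales_mart = [
    {"customer_id": 1, "revenue": 500.0},
    {"customer_id": 2, "revenue": 75.0},
]
support_mart = [
    {"customer_id": 1, "tickets": 3},
    {"customer_id": 2, "tickets": 9},
]

def by_segment(fact_rows, measure):
    """Aggregate any mart's facts by the shared customer segment."""
    totals = {}
    for row in fact_rows:
        seg = customer_dim[row["customer_id"]]["segment"]
        totals[seg] = totals.get(seg, 0) + row[measure]
    return totals

# Because both marts conform to the same dimension, the results
# line up segment-for-segment and can be compared meaningfully.
revenue_by_seg = by_segment(sales_mart, "revenue")
tickets_by_seg = by_segment(support_mart, "tickets")
```

Had each mart carried its own incompatible customer table, the two aggregates would not share keys, and the cross-mart comparison the column calls for would be impossible.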
Lesson learned: Don't just build a data warehouse, design it.
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to firstname.lastname@example.org.