Data Warehousing Lessons Learned:
Don't Let This Happen to Your Data Warehouse
During the summer and autumn of 2001, Giga Information Group asked data warehousing people at financial service institutions, insurance companies, retailers, manufacturers, transportation companies, and organizations in the private sector and education to let Giga take a look at the weaknesses in data warehousing solutions that they had implemented. Giga also asked what they would do differently if they had the chance. The result is a revealing, sometimes painful and often humorous look at the uses and abuses of data warehousing in the real world.
It would be an exaggeration to say that in every instance Giga has an easy solution. However, in every instance, there is a lesson learned and a best practice that can be inferred. In every case, it is important for firms building or operating data warehouses to understand what can go wrong. This enables action to be taken to reduce the risk of surprises such as schedule slips, disappointed customers or budget overruns due to technology miscalculations. This, in turn, enables implementation of state-of- the-art data warehouses that support high-impact business applications in CRM, supply chain management and business intelligence.
One data warehouse implementer reports: The major weakness of our data warehouse is its lack of data. This organization has many different kinds of data, but the data warehouse focuses only on customer billing information. There is no order, delivery, warranty or detailed line-item data. The main reason for not having this data, according to the client, is lack of justification for expending the IT effort to access the data stores.
A data warehouse is no better than its access to upstream transactional systems. The situation described by this client vignette occurs when the data warehouse lacks access to the diverse operational systems that hold the transactions for order completion, delivery and detailed line items. The client suffers from the latter. There also is evidence of organizational obstacle ? possibly territorial behavior on the part of some operational systems whereby access to the required data is denied. The direct approach is to analyze the business problem which, in this case, highlights the additional data required to get a more complete view of the customer. Next, design and implement dimensions that capture and forward the information from the operational systems to the data warehouse. Finally, use that information to address the business issue (not explicitly stated here) such as selling additional products to those whose warranty data indicates the product is at the end of its useful life. Lesson learned: A basic architectural principle can usefully be employed here ? the data without the application is meaningless, and the application without the data is empty.
In a related but slightly different problem, a respondent complains: The data warehouse contains the "raw" data ? mostly extracted from operational systems. We had very limited design and little effort was spent on rolling up or summarizing the data. It is very complicated to collect all the needed information ? and to integrate and aggregate it. As a result, every new source takes a lot of effort, which has prevented maximum use and business benefit from the warehouse. Along the same lines: The data warehouse was built, not designed. We keep way too much detail and do not summarize enough data into prebuilt tables. Too many of our queries are ad hoc and take long times to run because of the detail level in the tables. The folktale of the three little pigs and the wolf comes to mind. All three of the pigs built their houses ? only one designed his.
This is a slightly different issue, but related to the first. In the first example, the data warehouse lacked data. In the second, the data warehouse lacks information. Unlike the first example, there is plenty of data ? but not enough information. The irony is that not enough effort was expended at the front end to design aggregates; therefore, extra effort must be incurred at the back end to collect, integrate and aggregate all the information needed. The good news is that if the detailed data is available, building aggregates should be relatively easy. Indeed, knowing that 20 percent of the data gets used 80 percent of the time is often not obvious until an installation gets some experience under its belt. Although this is not a discussion about tools, getting technology to monitor database usage can be helpful at this point. Aggregates are one of the chief performance enhancers for any data warehouse; therefore, justifying their creation and maintenance is relatively easy to do. Adding business views and summaries (e.g., by time period, business unit, product family, etc.) should be a priority. Also, the introduction and use of data mining technology/tools to discover unforeseen relationships and correlative factors would greatly enhance the perception of value, use of the warehouse and provide better insight into the business. Lesson learned: Design the data warehouse; don't just build it.
For more information on related topics visit the following related portals...
DW Administration, Mgmt., Performance,
DW Design, Methodology and
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to firstname.lastname@example.org.
Provided by IndustryBrains
|Bowne Global Solutions: Language Services|
World's largest language services firm offers translation/localization, interpretation, and tech writing. With offices in 24 countries and more than 2,000 staff, we go beyond words with an in depth understanding of your business and target markets
|Award-Winning Database Administration Tools|
Embarcadero Technologies Offers a Full Suite of Powerful Software Tools for Designing, Optimizing, Securing, Migrating, and Managing Enterprise Databases. Come See Why 97 of the Fortune 100 Depend on Embarcadero!
|Online Backup and Recovery for Business Servers|
Fully managed online backup and recovery service for business servers. Backs up data to a secure offsite facility, making it immediately available for recovery 24x7x365. 30-day trial.
|Data Mining: Strategy, Methods & Practice|
Learn how experts build and deploy predictive models by attending The Modeling Agency's vendor-neutral courses. Leverage valuable information hidden within your data through predictive analytics. Click through to view upcoming events.
|Test Drive the Standard in Data Protection|
Double-Take is more affordable than synchronous mirroring and enables you to recover from an outage more quickly than tape backup. Based upon the Northeast blackout and the west coast wild fires, can you afford to be without it?
|Click here to advertise in this space|