Data Warehousing Lessons Learned:
Data Warehouse Size Depends on the Size of the Business Problem
The primary determinant of the size and shape of the data warehouse is the size and shape of the business problem, not the size of the company building the system. If ever a question was squarely in the "it depends" area, this is it. Still, it is possible to make some generalizations about the functional dependencies on which "it depends." The dependencies include:
- The business problem being addressed and solved.
- The experience and maturity of the industry as to the use of business intelligence for decision support and competitive advantage.
- The costs, risks and benefits of designing, implementing and operating a data warehouse in the particular operational environment of the hosting enterprise.
First, consider how the business problem itself shapes the warehouse. Think of three different enterprises, each with $3 billion in revenue. One is a telecommunications company with 3 million customers to whom it wants to cross-sell and up-sell additional telecom products and services. The second is a consumer packaged goods (CPG) firm that wants to improve logistics and distribution and to reduce inventory through superior demand planning. The third is a high-technology manufacturer (not necessarily in information technology) that wants to improve quality and manage relations with its suppliers, the users of its products and regulatory oversight agencies.
The telecom warehouse will be the largest because of the volume of detailed transactional data to be aggregated. The CPG warehouse will hold less data but may be more computationally intensive and complex, because it must generate a multiplicity of forecasts by product; less of its cost will go to disk storage and more to process design. The manufacturing warehouse will probably hold even less data, but capturing and representing that data, and evaluating it against key performance indicators (KPIs), will be extremely challenging. While data and schema integration challenges are significant in all three cases, the third case presents special challenges that require considerable domain expertise in the manufacturing processes in question. After a year of operation, the telecommunications warehouse will hold 1 terabyte or more of detailed transactional data, the CPG warehouse a couple of hundred gigabytes of shipment and forecasting data (depending on the number of products), and the manufacturing warehouse perhaps 50 to 100 gigabytes.
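To make the telecom sizing reasoning concrete, here is a back-of-envelope sketch. The 3 million customers come from the example above; the ten call-detail records per customer per day and roughly 100 bytes per record are hypothetical assumptions chosen only to show the arithmetic, not figures from any real carrier, and the function name `detail_bytes` is likewise illustrative. The estimate covers raw detail rows only and excludes indexes, summary tables and staging space.

```python
# Back-of-envelope sizing of raw detail data in a warehouse.
# All parameter values except the customer count are hypothetical
# assumptions for illustration, not figures from the article.

TB = 10**12  # decimal terabyte


def detail_bytes(entities: int, events_per_entity_per_day: int,
                 bytes_per_row: int, days: int) -> int:
    """Raw (uncompressed, unindexed) size of the detailed transaction rows."""
    return entities * events_per_entity_per_day * bytes_per_row * days


# Telecom case: 3 million customers (from the text), plus ASSUMED
# ~10 call-detail records per customer per day at ~100 bytes each.
telecom = detail_bytes(3_000_000, 10, 100, 365)

print(f"telecom detail after one year: {telecom / TB:.1f} TB")
```

Under these assumptions the detail rows come to roughly 1.1 TB after 365 days, in line with the "1 terabyte or more" cited for the telecom case. Because the estimate is a simple product, doubling any one factor doubles the result, which is why transaction volume per customer dominates a case like this one.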
Second, vertical industry dynamics influence the business problem being addressed by the data warehouse and, in turn, the size and shape of the data warehouse. If an industry makes extensive use of data warehousing, as retail and financial services do, an enterprise in that vertical that does not build a data warehouse (or find another way of getting the same answers) risks information asymmetries that put it at a competitive disadvantage.
The size and shape of the data warehouse in a given enterprise is a function of the experience and maturity of the industry as to the use of business intelligence for decision support and competitive advantage. Early adopters of data warehousing include market research firms such as A.C. Nielsen and marketing-driven enterprises across retail and consumer goods. The basic question of data warehousing is: Which customers are buying and using which products or services, and when and where (through which channel) are they doing so?
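The basic question above maps naturally onto an aggregation over a sales fact table keyed by customer, product, date and channel, the shape of a typical star-schema query. The sketch below is a minimal, hypothetical illustration: the `Sale` record, its field names, the sample rows and the function `who_bought_what_when_where` are all invented for the example, and an in-memory dictionary stands in for a warehouse database. In SQL the same step would be a `GROUP BY` on the four keys with `SUM(amount)`.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


# Hypothetical fact row: one line per sale. Field names are illustrative only.
@dataclass(frozen=True)
class Sale:
    customer_id: str
    product_id: str
    sale_date: date
    channel: str  # e.g. "store", "web", "call center"
    amount: float


def who_bought_what_when_where(sales):
    """Total sales by (customer, product, month, channel).

    Equivalent to a GROUP BY over a fact table joined to customer,
    product, time and channel dimensions, summing the amount.
    """
    totals = defaultdict(float)
    for s in sales:
        month = (s.sale_date.year, s.sale_date.month)  # roll dates up to month
        totals[(s.customer_id, s.product_id, month, s.channel)] += s.amount
    return dict(totals)


# Hypothetical sample data.
sales = [
    Sale("C1", "P1", date(2004, 1, 5), "web", 20.0),
    Sale("C1", "P1", date(2004, 1, 20), "web", 30.0),
    Sale("C2", "P1", date(2004, 1, 7), "store", 15.0),
    Sale("C1", "P2", date(2004, 2, 3), "store", 40.0),
]

for key, total in sorted(who_bought_what_when_where(sales).items()):
    print(key, total)
```

Each printed key answers one instance of the question: which customer bought which product, in which month, through which channel, and for how much.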
Retailers have many questions of this form, and those that operate a data warehouse also have the answers. That is now largely the case in customer-facing industries. CPG firms and manufacturers have exploited data warehousing to reduce inventory and optimize distribution through superior demand planning. Transportation (notably airlines) and the hotel industry have exploited data warehousing to build loyalty through frequent flyer and related programs. Banking and financial services exploit data warehousing once they overcome the challenge of identifying the customer behind the many accounts in which he or she is hidden. Insurance is a mixed bag: property and casualty insurers are committed users of the technology, while the healthcare industry remains ambivalent about the entire concept. The pharmaceutical domain and related suppliers of medical devices do indeed have sophisticated, complex and substantial data requirements, but they have been relatively late adopters of data warehousing (as have the public sector and education). The reasons for this are many, but they relate closely to the power of suppliers and the amount of regulatory oversight. In these late-adopting sectors, data warehousing often amounts to good, solid data management practices, without particular reference to infrastructure for business intelligence, and "data management" is often what these clients mean when they use the term "data warehouse." Data warehousing is also sometimes confused with data archiving, but that is a different story for another month.
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to firstname.lastname@example.org.