Meta Data & Knowledge Management:
There is a severe disease that has spread to epidemic proportions throughout our society. This disease is particularly dangerous as its effects are not readily identifiable at the time of infection. However, if this condition goes untreated, it can be debilitating and even terminal. This disease is not hepatitis, but rather "independent" data marts. While this imagery may seem a bit dramatic, unfortunately it reflects the reality in many of today's companies. For example, at EWSolutions, we have a large client that has many multiterabyte data warehouses. We have estimated that they have 75 to 200 independent data marts. The cost to this company for data warehousing is greater than $500 million annually. Sadly, their situation is not unique. If you work for a government agency or a Global 2000 company, it is highly likely that your data warehouse architecture is that of independent data marts.
This column is the first of a three-part series on migrating from independent data marts to an architected data warehouse solution. This installment will address the characteristics of independent data marts, the flaws in their architecture and the reasons they exist. Part two will address specifically how a company can migrate from the independent data mart architecture to an architected data warehouse solution.
Independent data marts are characterized by several traits. First, each data mart is sourced directly from the operational systems without the structure of a data warehouse to supply the architecture necessary to sustain and grow the data marts. Second, these data marts are typically built independent of one another by autonomous teams. Typically, these teams will utilize varying tools, software, hardware, procedures, standards and processes.
Possibly the most visually descriptive trait of a company that has constructed independent data marts is that once the company maps out a schema of its data warehousing systems, the result resembles a "spaghetti" chart (see Figure 1).* Disturbingly, a number of companies have said that this chart resembles their current data warehousing architecture.
Figure 1: Independent Data Mart Architecture
*It is important to note that this chart is an actual client's data warehousing architecture schematic. I'm proud to say that they are no longer on this architecture.
The architecture in Figure 1 is not an architecture at all. Instead, it is a series of stovepipe data mart systems. This architecture greatly differs from that of an architected data warehouse (see Figure 2).
Figure 2: Architected Data Warehousing System
The purpose of this column is to discuss independent data marts and the process for migrating to an architected solution; however, we will briefly touch on the topic of data warehouse architecture. We will not go into a detailed discussion of top-down versus bottom-up approaches (we will save that topic for a future column), except to say that the "classic" top-down approach is a more scalable and logical approach for constructing a data warehousing system. It is surprising how often the top-down methodology is mistaken for a "galactic" approach. This is a misunderstanding as the top-down approach is best used iteratively and incrementally to build the data warehousing system. When used in this fashion, the cost for building a data warehouse that feeds "dependent" data marts becomes highly comparable to the cost of building independent data marts.
Redundant Data: As the number of independent data marts grows, the amount of redundant data begins to grow uncontrollably across the enterprise. This redundancy occurs because each of the independent data marts requires its own, typically duplicated, copy of the detailed corporate data. Often, a great deal of this detailed data is not really required in the data marts, which typically provide summarized views.
It would be enlightening if a study were conducted to calculate the costs of maintaining unnecessary redundant data for Fortune 1000 companies. The end total would be in the billions of dollars in expenses and lost opportunity. Certainly, it has been my experience working with large government agencies and Global 2000 companies that needless duplicate data is running rampant throughout our industry. As a result, IT budgets are straining under this weight.
Redundant Processing: A data warehouse provides the architecture to centralize the data integration and cleansing activities common to all of the data marts of a company. Without the data warehouse, all of these data integration and cleansing processes need to be duplicated for all of the independent data marts. This greatly increases the number of support staff required to maintain the data warehousing system as these tasks are the largest and most costly data warehousing activities.
Separate teams will typically build each of the independent data marts in isolation. As a result, these teams do not leverage one another's standards, processes, knowledge and lessons learned. This results in a great deal of rework and reanalysis.
These autonomous teams will commonly select differing tools, software and hardware. This forces the enterprise to retain skilled employees to support each of these technologies. In addition, a great deal of financial savings is lost as standardization on these tools doesn't occur. Often, a software, hardware or tool contract can be negotiated to provide considerable discounts for enterprise licenses, which can be phased in. These economies of scale can provide tremendous cost savings to the organization.
Scalability: Independent data marts directly read operational system files and/or tables, which greatly limits the data warehousing system's ability to scale. For example, if a company has five independent data marts, it is likely that each data mart would require customer information. Therefore, there would be five separate extracts pulled from the same customer tables in the operational system of record. Most operational systems have limited batch windows and cannot support this number of extracts. With a data warehouse, only one extract is required from the operational system of record.
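The extract arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five mart names and the source tables they read are hypothetical, and the functions simply count extract jobs rather than model any real ETL tool.

```python
from collections import Counter

# Hypothetical example: five marts and the operational tables each one needs.
MARTS = {
    "sales": {"customer", "orders"},
    "marketing": {"customer", "campaigns"},
    "finance": {"customer", "invoices"},
    "service": {"customer", "tickets"},
    "risk": {"customer", "accounts"},
}


def extracts_per_table_independent(marts):
    """Independent marts: every mart extracts each table it needs itself."""
    counts = Counter()
    for tables in marts.values():
        counts.update(tables)
    return counts


def extracts_per_table_warehouse(marts):
    """Architected warehouse: each distinct table is extracted once,
    and the dependent marts are fed from the warehouse."""
    return Counter(set().union(*marts.values()))


independent = extracts_per_table_independent(MARTS)
warehouse = extracts_per_table_warehouse(MARTS)

print("customer extracts, independent marts:", independent["customer"])
print("customer extracts, data warehouse:   ", warehouse["customer"])
print("total extract jobs, independent marts:", sum(independent.values()))
print("total extract jobs, data warehouse:   ", sum(warehouse.values()))
```

With these assumed inputs, the customer table is read five times by the independent marts and once by the warehouse, which is the pressure on the batch window that the paragraph describes.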
Non-Integrated: As previously discussed, each independent data mart is built by an autonomous team, typically working for a separate department. As a result, these data marts are not integrated, and none of them contains an enterprise view of the corporation. Therefore, if the CEO asks the IT department to provide him with a "listing of our most profitable customers," each data mart will offer a different answer. Having worked with a company that had experienced this exact situation, I can attest that the CIO is rarely pleased to have to explain why his department cannot answer this seemingly simple question. In this company's case, the CIO and his directors were removed from their positions.
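A small Python sketch shows how two non-integrated marts can give different answers to the same question. Everything in it is an assumption made for illustration: the customer names, the figures and the two departmental definitions of "profit" are invented, not taken from any client.

```python
# Hypothetical departmental marts. Each holds its own copy of customer data
# and applies its own definition of "profit". All names and figures are invented.
SALES_MART = {
    "Acme": {"revenue": 900, "discounts": 100},
    "Globex": {"revenue": 700, "discounts": 0},
}
FINANCE_MART = {
    "Acme": {"revenue": 800, "cost_to_serve": 650},
    "Globex": {"revenue": 700, "cost_to_serve": 300},
}


def most_profitable_sales(mart):
    """Sales definition (assumed): profit = revenue - discounts."""
    return max(mart, key=lambda c: mart[c]["revenue"] - mart[c]["discounts"])


def most_profitable_finance(mart):
    """Finance definition (assumed): profit = revenue - cost to serve."""
    return max(mart, key=lambda c: mart[c]["revenue"] - mart[c]["cost_to_serve"])


sales_answer = most_profitable_sales(SALES_MART)
finance_answer = most_profitable_finance(FINANCE_MART)

print("Sales mart says:  ", sales_answer)
print("Finance mart says:", finance_answer)
```

Note that the two marts disagree even on Acme's revenue, because each loaded and cleansed its own copy of the data. An architected data warehouse would hold one integrated customer record and one agreed definition, so dependent data marts would report the same answer.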
With all of these architectural flaws, it would seem surprising that so many companies have built their data warehousing systems around this architecture. There are several reasons why this aberration has occurred.
Complexity: When the decision support craze spread, most companies were looking to build a data warehouse of their own. Unfortunately, the task of building a well-architected and scalable business intelligence system is complicated and requires sophisticated software, expensive hardware and a highly skilled and experienced team. Finding data warehouse architects and project leaders who truly understand data warehouse architecture is a daunting challenge, both in the corporate and consulting ranks.
To construct a data warehouse, a corporation must truly come to terms with its data and the business procedures that the data represents. While this task is challenging, it is a necessary step and one from which the true value of the data warehousing process is derived.
Independent Data Mart Shortcut: Building independent data marts is less expensive than building an architected data warehousing system. In addition, independent data marts can be constructed fairly quickly and do not require a company to understand its data beyond the level of individual departments (as a data warehouse requires). These points have been effectively used to sell the concept of constructing independent data marts. Unfortunately, it is this lack of thorough analysis and long-term planning that prevents independent data marts from being an effective business intelligence system.
Inappropriate Vendor Messages: Many vendors (both consulting and software) have developed tools and methodologies that are effective for building small departmental independent data marts. In their rush to market with these tools, these companies have worked very hard at selling the independent data mart concept (of course, it is never worded as such). The reasons are obvious. These companies can significantly reduce their sales cycles because only one department is involved in the software purchasing decision. In addition, their software requires much less sophistication because it merely needs to build a standalone data store.
The second part of this three-part series will take an in-depth look at how to migrate from this flawed architecture. It will present the two approaches for migrating from independent data marts, identify necessary initial corporate decisions, give methods for identifying the migration path to the architected solution and walk through an independent data mart migration case study.
David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40." He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.