Meta Data & Knowledge Management:
Independent Data Marts: Stranded on Islands of Data, Part 1

Column published in DM Review Magazine, April 2003 Issue
By David Marco

There is a severe disease that has spread to epidemic proportions throughout our society. This disease is particularly dangerous as its effects are not readily identifiable at the time of infection. However, if this condition goes untreated, it can be debilitating and even terminal. This disease is not hepatitis, but rather "independent" data marts. While this imagery may seem a bit dramatic, unfortunately it reflects the reality in many of today's companies. For example, at EWSolutions, we have a large client that has many multiterabyte data warehouses. We have estimated that they have 75 to 200 independent data marts. The cost to this company for data warehousing is greater than $500 million annually. Sadly, their situation is not unique. If you work for a government agency or a Global 2000 company, it is highly likely that your data warehouse architecture is that of independent data marts.

This column is the first of a three-part series on migrating from independent data marts to an architected data warehouse solution. This installment will address the characteristics of independent data marts, the flaws in their architecture and the reasons they exist. Part two will address specifically how a company can migrate from the independent data mart architecture to an architected data warehouse solution.

Independent data marts are characterized by several traits. First, each data mart is sourced directly from the operational systems without the structure of a data warehouse to supply the architecture necessary to sustain and grow the data marts. Second, these data marts are typically built independently of one another by autonomous teams. These teams will typically use varying tools, software, hardware, procedures, standards and processes.

Possibly the most visually descriptive trait of a company that has constructed independent data marts is that once it maps out a schematic of its data warehousing systems, the result resembles a "spaghetti" chart (see Figure 1).* Disturbingly, a number of companies have said that this chart resembles their current data warehousing architecture.


Figure 1: Independent Data Mart Architecture
*It is important to note that this chart is an actual client's data warehousing architecture schematic. I'm proud to say that they are no longer on this architecture.

The architecture in Figure 1 is not an architecture at all. Instead, it is a series of stovepipe data mart systems. This architecture greatly differs from that of an architected data warehouse (see Figure 2).


Figure 2: Architected Data Warehousing System

The purpose of this column is to discuss independent data marts and the process for migrating to an architected solution; however, we will briefly touch on the topic of data warehouse architecture. We will not go into a detailed discussion of top-down versus bottom-up approaches (we will save that topic for a future column), except to say that the "classic" top-down approach is a more scalable and logical approach for constructing a data warehousing system. It is surprising how often the top-down methodology is mistaken for a "galactic" approach. This is a misunderstanding as the top-down approach is best used iteratively and incrementally to build the data warehousing system. When used in this fashion, the cost for building a data warehouse that feeds "dependent" data marts becomes highly comparable to the cost of building independent data marts.

Problems With Independent Data Marts

Redundant Data: As the number of independent data marts grows, the amount of redundant data begins to grow uncontrollably across the enterprise. This redundancy occurs because each of the independent data marts requires its own, typically duplicated, copy of the detailed corporate data. Often, a great deal of this detailed data is not really required in the data marts, which typically provide summarized views.

It would be enlightening if a study were conducted to calculate the costs of maintaining unnecessary redundant data for Fortune 1000 companies. The end total would be in the billions of dollars in expenses and lost opportunity. Certainly, it has been my experience working with large government agencies and Global 2000 companies that needless duplicate data is running rampant throughout our industry. As a result, IT budgets are straining under this weight.
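The redundancy described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a sizing method: the function names and all of the figures (500 GB of detailed data, 20 GB of summaries per mart, five marts) are made up for illustration. It assumes each independent mart keeps its own full copy of the detailed data, while an architected warehouse stores the detail once and feeds summaries to dependent marts.

```python
def stored_gb_independent(detail_gb: float, summary_gb_per_mart: float, marts: int) -> float:
    """Every independent mart holds its own copy of the detail plus its summaries."""
    return marts * (detail_gb + summary_gb_per_mart)


def stored_gb_architected(detail_gb: float, summary_gb_per_mart: float, marts: int) -> float:
    """The warehouse holds the detail once; dependent marts hold only summaries."""
    return detail_gb + marts * summary_gb_per_mart


# Illustrative figures only (assumptions, not measurements).
DETAIL_GB = 500.0
SUMMARY_GB = 20.0
MARTS = 5

independent_total = stored_gb_independent(DETAIL_GB, SUMMARY_GB, MARTS)
architected_total = stored_gb_architected(DETAIL_GB, SUMMARY_GB, MARTS)
redundant_gb = independent_total - architected_total

print(f"independent: {independent_total:.0f} GB")
print(f"architected: {architected_total:.0f} GB")
print(f"redundant:   {redundant_gb:.0f} GB")
```

Under these assumed numbers the redundant volume grows linearly with the number of marts, which is the pattern the column describes.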

Redundant Processing: A data warehouse provides the architecture to centralize the data integration and cleansing activities common to all of the data marts of a company. Without the data warehouse, all of these data integration and cleansing processes need to be duplicated for all of the independent data marts. This greatly increases the number of support staff required to maintain the data warehousing system as these tasks are the largest and most costly data warehousing activities.

Separate teams will typically build each of the independent data marts in isolation. As a result, these teams do not leverage the other's standards, processes, knowledge and lessons learned. This results in a great deal of rework and reanalysis.

These autonomous teams will commonly select differing tools, software and hardware. This forces the enterprise to retain skilled employees to support each of these technologies. In addition, a great deal of financial savings is lost as standardization on these tools doesn't occur. Often, a software, hardware or tool contract can be negotiated to provide considerable discounts for enterprise licenses, which can be phased in. These economies of scale can provide tremendous cost savings to the organization.

Scalability: Independent data marts directly read operational system files and/or tables, which greatly limits the data warehousing system's ability to scale. For example, if a company has five independent data marts, it is likely that each data mart would require customer information. Therefore, there would be five separate extracts pulled from the same customer tables in the operational system of record. Most operational systems have limited batch windows and cannot support this number of extracts. With a data warehouse, only one extract is required in the operational system of record.
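The five-mart example above can be sketched as a simple count of extract jobs. This is an illustrative sketch only: the mart names, table names and function names are hypothetical, and it assumes one extract job per source table per consumer.

```python
def extracts_independent(mart_sources: dict[str, set[str]]) -> int:
    """Each mart pulls its own extract from every operational table it needs."""
    return sum(len(tables) for tables in mart_sources.values())


def extracts_architected(mart_sources: dict[str, set[str]]) -> int:
    """The warehouse pulls each distinct table once; marts then read the warehouse."""
    return len(set().union(*mart_sources.values()))


# Five hypothetical marts, each of which needs customer information.
marts = {
    "sales": {"customer", "orders"},
    "marketing": {"customer", "campaigns"},
    "finance": {"customer", "invoices"},
    "service": {"customer", "tickets"},
    "risk": {"customer", "invoices"},
}

# How many separate extracts hit the customer table in each design.
customer_extracts_independent = sum("customer" in tables for tables in marts.values())
customer_extracts_architected = 1

print("customer extracts, independent:", customer_extracts_independent)
print("customer extracts, architected:", customer_extracts_architected)
print("total extracts, independent:", extracts_independent(marts))
print("total extracts, architected:", extracts_architected(marts))
```

In this sketch the load on the operational systems depends on the number of distinct source tables rather than on the number of marts, which is why the architected design fits more easily inside a limited batch window.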

Non-Integrated: As previously discussed, each independent data mart is built by an autonomous team, typically working for a separate department. As a result, these data marts are not integrated, and none of them contains an enterprise view of the corporation. Therefore, if the CEO asks the IT department for a "listing of our most profitable customers," each data mart will offer a different answer. Having worked with a company that had experienced this exact situation, I can attest that the CIO is rarely pleased to have to explain why his department cannot answer this seemingly simple question. In this company's case, the CIO and his directors were removed from their positions.
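A small sketch shows how the "most profitable customers" question can produce conflicting answers. Everything here is assumed for illustration: the customer names, the figures, and the idea that a sales mart defines profit as revenue minus cost of goods while a finance mart also subtracts support costs. The column does not say which definitions the real marts used.

```python
from collections import defaultdict
from typing import Callable

# Made-up records: (customer, revenue, cost_of_goods, support_cost)
transactions = [
    ("Acme", 1000.0, 600.0, 350.0),
    ("Birch", 800.0, 500.0, 20.0),
    ("Cedar", 900.0, 700.0, 10.0),
]


def top_customer(profit_of: Callable[[float, float, float], float]) -> str:
    """Return the customer with the highest total profit under one mart's definition."""
    totals: dict[str, float] = defaultdict(float)
    for customer, revenue, cogs, support in transactions:
        totals[customer] += profit_of(revenue, cogs, support)
    return max(totals, key=lambda name: totals[name])


# Hypothetical sales mart: profit = revenue - cost of goods.
sales_answer = top_customer(lambda revenue, cogs, support: revenue - cogs)

# Hypothetical finance mart: profit = revenue - cost of goods - support cost.
finance_answer = top_customer(lambda revenue, cogs, support: revenue - cogs - support)

print("sales mart says:  ", sales_answer)
print("finance mart says:", finance_answer)
```

Both marts read the same underlying transactions, yet they disagree because nothing forces them to share one definition of profit. An architected warehouse gives the enterprise a single place to settle that definition.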

Why Do Independent Data Marts Exist?

With all of these architectural flaws, it would seem surprising that so many companies have built their data warehousing systems around this architecture. There are several reasons why this aberration has occurred.

Complexity: When the decision support craze spread, most companies were looking to build a data warehouse of their own. Unfortunately, the task of building a well-architected and scalable business intelligence system is complicated and requires sophisticated software, expensive hardware and a highly skilled and experienced team. Finding data warehouse architects and project leaders who truly understand data warehouse architecture is a daunting challenge, both in the corporate and consulting ranks.

In order to construct a data warehouse, a corporation must truly come to terms with their data and the business procedures that the data represents. While this task is challenging, it is a necessary step and one from which the true value of the data warehousing process is derived.

Independent Data Mart Shortcut: Building independent data marts is less expensive than building an architected data warehousing system. In addition, independent data marts can be constructed fairly quickly and do not require a company to really understand its data beyond that of individual departments (as a data warehouse requires). These points have been effectively used to sell the concept of constructing independent data marts. Unfortunately, it is this lack of thorough analysis and long-term planning that prevents independent data marts from being an effective business intelligence system.

Inappropriate Vendor Messages: Many vendors (both consulting and software) have developed tools and methodologies that are effective for building small departmental independent data marts. In their rush to market with these tools, these companies have worked very hard at selling the independent data mart concept (of course, it is never worded as such). The reasons are obvious. These companies can significantly reduce their sales cycles because only one department is involved in the software purchasing decision. In addition, their software requires much less sophistication because it merely needs to build a standalone data store.

The second part of this three-part series will take an in-depth look at how to migrate from this flawed architecture. It will present the two approaches for migrating from independent data marts, identify necessary initial corporate decisions, give methods for identifying the migration path to the architected solution and walk through an independent data mart migration case study.

David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.