Published in DM Review in May 2003.

Meta Data & Knowledge Management: Independent Data Marts: Stranded on Islands of Data, Part 2

by David Marco

Independent data marts have spread like a disease through many of today's best and most advanced corporations. The devastating nature of this disease is that it is not easily detected in its initial stages; however, if it is not treated, the patient's condition will steadily deteriorate. The "cure" is to migrate the independent data marts to a structured data warehousing architecture.

There are two general approaches for migration: big bang and iterative. Figure 1 summarizes the advantages and disadvantages of each approach.

With the big-bang approach, all of the independent data marts are reengineered simultaneously into a structured data warehousing architecture. There are a couple of advantages to this approach. First, it can provide the fastest path for migration. Often, companies need to change their data warehousing architecture as quickly as possible (regardless of risk) because: 1) additional data warehousing projects are required to meet government regulations, 2) there are data warehousing projects that promise a high return on investment (ROI), and/or 3) funds are currently available for the integration effort that might not be available at a later date. Second, this approach delivers immediate economies of scale rather than the gradual economies of the iterative method. The disadvantages of the big-bang approach are that it is labor-intensive and requires tremendous coordination. In addition, it is the more complex of the two to implement and thus carries the highest exposure. Many companies have failed in their attempts at big-bang integrations.

The big-bang approach is best when the independent data mart problem is relatively small and not highly complex.

The iterative approach involves reengineering the independent data marts in manageable phases (one or two data marts at a time). It has several advantages. First, it allows a government agency or company to manage and reduce risk: the migration is accomplished in phases, thereby increasing the probability of success. Second, as each project phase is executed, lessons are learned and leveraged for subsequent phases.

The major disadvantage to the iterative approach is that it takes longer to fully complete. This approach is best used when the independent data mart problem is large and too complex to tackle in a big-bang manner. Having conducted both big-bang and iterative independent data mart migrations, I strongly prefer the iterative approach.

Many companies fail in their migration efforts well before they start. The chief reason is a lack of initial planning and sponsorship. Attaining executive sponsorship is one of the most important tasks at the onset of the project. It is critical because each of the independent data marts has typically been constructed by an autonomous team in a different corporate department. Therefore, having a project champion who has cross-departmental authority is essential for dealing with the political challenges.

Big Bang (waterfall)
  Advantages:
    1. Provides the potentially fastest path for migration.
    2. Allows immediate economies of scale.
  Disadvantages:
    1. Labor intensive.
    2. Requires tremendous coordination.
    3. Complex parallel testing.
    4. Very risky.
  Best used when the independent data mart problem is not very pervasive or when conformity to government regulations is required.

Iterative
  Advantages:
    1. Dramatically reduces risk.
    2. Lessons learned are leveraged.
    3. Provides eventual economies of scale.
  Disadvantages:
    1. Potential migration time is elongated.
    2. Multiple development efforts need to be managed and coordinated.
  This approach is superior to the waterfall in the vast majority of cases.

Figure 1: Big-Bang vs. Iterative Approach

During the initial planning phases, it is important to plan for a meta data repository that can support future data warehousing development efforts and provide a semantic layer between the business users and the data warehousing system. The data mart migration provides an outstanding opportunity to implement the repository. Before the migration begins, it is best to standardize the data-naming nomenclature for the data warehousing system. A standard nomenclature will aid the system's maintenance and yield cleaner, more understandable meta data.
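A naming standard is only useful if it is enforced. The sketch below shows one way such a check might look in practice; the SUBJECT_ENTITY_CLASSWORD convention and the approved class-word list are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a data-naming standard check, assuming a hypothetical
# convention of SUBJECT_ENTITY_CLASSWORD (e.g., CUST_ORDER_AMT).
APPROVED_CLASS_WORDS = {"AMT", "CNT", "DT", "CD", "NM", "ID", "FLG"}

def check_column_name(name: str) -> list[str]:
    """Return a list of naming-standard violations for a column name."""
    issues = []
    parts = name.split("_")
    if name != name.upper():
        issues.append("name is not upper case")
    if len(parts) < 2:
        issues.append("name lacks a subject-area prefix")
    if parts[-1] not in APPROVED_CLASS_WORDS:
        issues.append(f"'{parts[-1]}' is not an approved class word")
    return issues

print(check_column_name("CUST_ORDER_AMT"))  # []
print(check_column_name("customerTotal"))   # three violations
```

Running such a check against each mart's data model before migration surfaces naming conflicts early, when they are still cheap to resolve.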

A great deal of research needs to be conducted on the independent data marts before a migration is possible (Figure 2 summarizes these tasks). The most important research activity is to understand the business needs that each independent data mart is meeting. Typically, multiple independent data marts exist to meet the same or similar business needs. Such overlaps are common and suggest a path for migration. The results of this research will also reveal which independent data marts will be the most difficult to migrate.

Independent Data Mart Research

Available meta data (both technical and business)
ROI of the data mart
Amount of data (raw and total) in the data mart
Data refresh/update criteria
Archive criteria
Business users' requirements of the data mart
Transformation rules
Error-checking and data-cleansing processes
Key users of the data mart
ETL process construction (tool vs. custom)

Figure 2: Independent Data Mart Research
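The research results in Figure 2 become far more useful when captured in a structured form that the migration team can compare across marts. Below is one possible sketch of such an inventory record; the field names, mart names and sample values are illustrative assumptions.

```python
# A sketch of capturing Figure 2's research results as a simple inventory
# record per data mart; all field names and values are illustrative.
from dataclasses import dataclass, field

@dataclass
class MartProfile:
    name: str
    business_needs: set[str]              # requirements the mart satisfies
    source_systems: set[str]              # legacy systems feeding the mart
    etl_style: str                        # "tool" or "custom"
    rules_documented: bool                # are transformation rules documented?
    raw_gb: float                         # raw data volume
    key_users: list[str] = field(default_factory=list)

def overlapping_needs(a: MartProfile, b: MartProfile) -> set[str]:
    """Business needs served by both marts -- candidates for consolidation."""
    return a.business_needs & b.business_needs

finance = MartProfile("Finance", {"revenue reporting", "budgeting"},
                      {"GL", "AP"}, "tool", True, 120.0)
marketing = MartProfile("Marketing", {"revenue reporting", "campaign analysis"},
                        {"GL", "CRM"}, "custom", False, 80.0)
print(overlapping_needs(finance, marketing))  # {'revenue reporting'}
```

Marts that share business needs, as Finance and Marketing do here, are natural candidates to consolidate in the same migration phase.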

An independent data mart migration is also an excellent time to standardize on hardware and software for the data warehousing project (see Figure 3). A company needs trained personnel to support each distinct software or hardware platform; therefore, by limiting redundant software and hardware, the corporation reduces the support strain on its IT staff. In addition, standardizing allows software and hardware purchasing economies of scale to be achieved.

Hardware/Software Classification

Hardware (UNIX, Mainframe, AS/400)
Hardware Architecture (SMP, MPP)
Desktop Computers
Notebook Computers
Database (Oracle, SQL Server, DB2)
ETL Tool (Ascential Software, Informatica)
Meta Data Integration Tool (Computer Associates Advantage, ASG Viasoft Rochade)
OLAP Access Tool (BusinessObjects, Cognos, MicroStrategy)
Data Quality Tool (Firstlogic, Trillium)
CASE Tool (ERwin, PowerDesigner, etc.)

Figure 3: Hardware/Software Classification

The central covenant of any independent data mart migration effort is: never deliver less functionality to the business users than they have today. Generally, business users do not react well to spending money on infrastructure because they don't initially see its value. The key business users need to be educated that a bad system architecture leads to a non-scalable, inflexible system that will eventually need to be rewritten at a very high cost. Therefore, during migration, the users must be assured that they will not receive less functionality (information, ease of use and response time) than they currently have.

It is necessary to conduct several activities before a migration path will be evident. First, diagram the current data warehousing architecture. This is critical for identifying which legacy systems are feeding which independent data marts and for showing the problems with this architecture (or lack thereof).

Often, independent data marts are sourced from the same legacy systems. By targeting data marts that share source data, multiple independent data marts can often be retired with minimal extra effort. Identifying redundant data frequently suggests a migration path.
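Once the source-to-mart feeds have been diagrammed, finding shared sources is a simple inversion of that mapping. The sketch below illustrates the idea; the mart and legacy-system names are illustrative assumptions, not drawn from any particular environment.

```python
# A sketch of spotting redundant sourcing: invert a mart -> legacy-source map
# to find groups of marts fed by the same systems. Names are illustrative.
from collections import defaultdict

mart_sources = {
    "Finance":   {"GL", "AP", "Orders"},
    "Marketing": {"Orders", "CRM"},
    "Sales":     {"Orders", "GL"},
}

# Invert the mapping: for each legacy source, which marts does it feed?
source_to_marts = defaultdict(set)
for mart, sources in mart_sources.items():
    for src in sources:
        source_to_marts[src].add(mart)

# Sources feeding more than one mart point at candidate marts to migrate first.
shared = {src: marts for src, marts in source_to_marts.items() if len(marts) > 1}
for src, marts in sorted(shared.items()):
    print(f"{src}: shared by {sorted(marts)}")
```

In this example, the "Orders" system feeds all three marts, so migrating those marts together eliminates the most redundant extraction work per phase.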

Figure 4 illustrates existing independent data marts for a company. In the schematic, both the finance and marketing data marts are being sourced from the same legacy systems. This suggests that it might be wise to target both of these data marts for initial migration (assuming the iterative approach is being used).

Figure 4: Identifying Redundant Data Sources

It is important to target those independent data marts whose data will most likely be used in future data warehousing efforts. Targeting these data marts first will ease the task of keeping all new data warehousing development activity in the new architected environment.

The next step is to identify those data marts whose transformation rules are known and documented. Understand that even the best-documented transformation rules will have gaps. Moreover, even marts built with extract, transform and load (ETL) tools have meta data (documentation) gaps. For example, ETL tools often provide the ability to call user exits: hand-coded programs whose processing is not captured in the ETL tool's meta data stores. If documentation does not exist for a mart, programmers will need to manually analyze each ETL program's code to extract the transformation rules, which is a very time-consuming and expensive activity.
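A first pass at sizing those meta data gaps can be automated by scanning the ETL job source for user-exit invocations. The sketch below assumes a hypothetical call syntax, CALL_USER_EXIT('name'); real ETL tools each have their own conventions, so the pattern would need adapting per tool.

```python
# A rough sketch of flagging undocumented logic in ETL jobs: scan job source
# for user-exit calls that an ETL tool's meta data store would not capture.
# The "CALL_USER_EXIT('name')" syntax is a hypothetical convention.
import re

EXIT_PATTERN = re.compile(r"CALL_USER_EXIT\(\s*'([^']+)'")

def find_user_exits(etl_source: str) -> list[str]:
    """Return the names of user-exit programs invoked by an ETL job."""
    return EXIT_PATTERN.findall(etl_source)

job = """
MAP src.CUST_ID -> tgt.CUSTOMER_ID
CALL_USER_EXIT('scrub_addresses')
MAP src.AMT -> tgt.ORDER_AMT
CALL_USER_EXIT('derive_fiscal_period')
"""
print(find_user_exits(job))  # ['scrub_addresses', 'derive_fiscal_period']
```

Each flagged exit represents transformation logic that must be documented by hand before its mart can be safely migrated.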

It will be critical to obtain support from the current independent data mart IT teams and business users. Identify the data mart teams most likely to work cooperatively with the centralized data warehousing team, and recognize the strengths and weaknesses of those that can and will provide the most aid. If a particular data mart team or its business users are unwilling to assist with the migration effort, it is best to work around them by delaying the migration of their data mart. If that is not an option, utilize your executive sponsor to "motivate" the group to provide its support.

Keep in mind that any team will have its stronger and weaker areas of knowledge and skill. As much as possible, keep your team's areas of weakness off the critical path. Any mission-critical team weaknesses need to be shored up with internal members from the other data mart teams or with outside vendors.

Next month I will present a case study illustrating how a corporation can migrate from independent data marts into an architected solution.

David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40." He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100.

Copyright 2005, SourceMedia and DM Review.