DM Review Published in DM Review in March 1998.
Printed from

Managing Meta Data

by David Marco

Summary: Since constructing a meta data repository is critical for accessing and maintaining a DSS, a full life cycle approach to defining requirements, creating, controlling and accessing a meta data repositroy is provided.

Everyone agrees that constructing a meta data repository is critical for accessing and maintaining a decision support system (DSS). Without meta data, the warehouse becomes a modern day version of a "stovepipe application." Since we all understand this concept, why isn't anyone doing it? Rarely do companies successfully implement an active meta data repository that their users can employ to access the reports, queries and data located in the DSS system. In fact, most companies' meta data repository can be accurately described as a disaster. It is common to see a non-architected, non-accessible repository containing a small, inaccurate subset of the actual meta data that the corporation really needs.

What I would like to accomplish in this article is to provide a complete, full life cycle approach to defining requirements, creating, controlling and accessing a meta data repository.

The concept of meta data is not new; however, meta data's role and its importance in the decision support environment is certainly new. First, the differences between business and technical meta data and its uses will be discussed. Second, a generic meta data integration strategy to manage all meta data associated with a data warehouse will be presented. This strategy addresses the proliferation of meta data across multiple platforms and defines an approach to migrate meta data to the repository. Lastly, a proven approach to accessing the meta data will be provided.

Technical vs. Business Meta Data

There are two types of meta data that the repository will contain--technical and business. Technical meta data provides the developers and technical users of the decision support system the confidence that the data in the data warehouse is accurate. (See Figure 1 for examples.) In addition, technical meta data is absolutely critical for the ongoing maintenance and growth of the warehouse. Without technical meta data, the task of analyzing and implementing changes to a decision support system is significantly more difficult and time consuming.

Examples of Technical Meta Data
  • User report and query access patterns, frequency and execution time
  • Audit controls and balancing information
  • The system of record feeding the data warehouse
  • Mappings and transformation from the system of record to the data warehouse
  • The data model, both physical and logical
  • Data warehouse table names, keys and indexes
  • Data warehouse table structures and table attribution
  • Job dependencies
  • Program names and descriptions
  • Security
  • Purge Criteria

Figure 1: Examples of Technical Meta Data

The business meta data is the link between the data warehouse and the business users. Business meta data provides these users with a road map for access to the data in the data warehouse and its data marts. The business users are primarily executives or business analysts and tend to be less technical; therefore, they need to have the DSS system defined for them in business terms. The business meta data presents in business terms what reports, queries and data are in the data warehouse, location of the data, reliability of the data, context of the data, what transformation rules were applied and from which legacy systems the data was sourced. (See Figure 2 for examples of business meta data.)

Examples of Business Meta Data
  • The structure of data as known to the business analyst
  • Common access routines for the data in the warehouse/mart
  • Table names and their business definitions
  • Attribute names and their business definitions
  • Data warehouse field mappings, trnasformations and summarization
  • Rules for drill down, drill up, drill across and drill through
  • Data warehouse refresh dates

Figure 2: Examples of Business Meta Data

Meta Data Consumers

Meta data users can be broadly placed into the categories of business users and technical users. (See Figure 3.) Both of these groups contain a wide variety of users of the data warehouse meta data. They all need meta data to identify and effectively use the information in the DSS system.

Figure 3: Meta Data Users

Business Users

The majority of business users are not very technical. These users get their information from the DSS system's predefined queries and reports. The meta data needed by these users primarily concerns business names and definitions of the warehouse tables and columns and descriptions of existing reports. This allows the business users to identify and locate the information they need in the data warehouse. In addition, these users are interested in receiving estimates of query and report execution times.

Technical Users

This classification contains many different users--data acquisition developers, data access developers, data modelers, senior analysts and the data administration staff. The systems staff who build, maintain and administer the data warehouse have a different need for meta data than do the business users. They need to understand how the programs that extract, transform and load the data into the data warehouse and data marts work. They need to determine which programs, files and tables are affected by changes to the warehouse. The technical meta data allows the data warehouse staff to more efficiently and accurately plan for additional development releases of the DSS system. The data administrator needs to know the data warehouse users' patterns of access, frequency and report/query execution time. This meta data is most commonly captured by the vendor access tool (i.e., Brio, Business Objects, MicroStrategy, Cognos and Seagate) being employed to acquire the information. The meta data repository supplies the vital information that allows the data warehouse staff to effectively maintain and grow the DSS system over time.

Meta Data Integration Architecture

It would be wonderful to be able to say that there are tools available today to seamlessly integrate all of your company's sources of meta data into one integrated and architected repository; however, that utopia just doesn't exist. Today's meta data integration architecture does not involve seamlessly integrated products, but rather a series of proprietary vendor products attached to one another with chewing gum, string and paper clips. Now before you give up hope, the new Microsoft Repository extensions offer the promise to remedy this situation. The Microsoft Repository is a broadly supported set of repository extensions that will enable data warehousing products from different vendors to share information. The extensions are the result of extensive collaboration with the various data warehousing industry leaders. These new repository extensions should help expand the data warehousing market by providing an open, common infrastructure across all data warehouse tool vendors. This issue of meta data integration is one of the chief mitigating factors that has prevented most organizations from achieving successful data warehouse and data mart implementations.

Until the vendors adopt and implement these standards, an involved integration strategy will be needed to bring these different sources of meta data together. Figure 4 illustrates an actual integration strategy I implemented at one of my client sites. This client purchased a vendor tool to integrate their various sources of meta data.

Figure 4: Sample Meta Data Architecture

As you can see, this client has quite an array of meta data sources; however, the process for integrating all of these sources left us even more light headed. On the positive side, we made sure that the repository tool was certified with the CASE tool and the integration tool that we used. On the other hand, integrating the data dictionary was not quite as easy. The data dictionary was located in a third-party application in a proprietary database format. The answer to this problem was to design and write a complex program to manipulate the data dictionary into a format that could be integrated by the repository tool. This process took one fully dedicated programmer one month to accomplish. The last source of meta data was the data steward's spreadsheet. This spreadsheet did not exist; therefore, we had to manually create it. On a final note, we used an OLAP tool to access the information in the meta data repository. This tool captured access patterns and frequency which the data warehouse staff used to guide them on future data warehouse development phases.

Types of Meta Data Sources

There are two broad types of meta data sources--formal and informal. These sources comprise the business and technical meta data for an organization.

Formal meta data sources are those sources of meta data that have been discussed, documented and agreed upon by the decision-makers of the enterprise. Formal meta data is commonly stored in tools or documents that are maintained, distributed and recognized throughout the organization. These formal meta data sources populate both technical and business meta data. (See Figure 5 for examples of formal meta data sources.)

Examples of Formal Meta Data Sources
Types of Formal Meta Data Common Locations
Extraction/Transformation Tool Prism Warehouse Executive, WMARK, DataStage, D2K Tapestry, ETI*Extract, Apertus Carleton
Logical & Physical Data Models CASE Tools (Cayenne, PowerDesigner, ERwin) and Vendor Applications (SAP, Peoplesoft)
Business Policies MS Word, MS Excel
Data Dictionary CASE Tool, MS Acess, MS Excel

Figure 5:Examples of Formal Meta Data Sources

Informal meta data consists of corporate knowledge, policies and guidelines that are not in a standard form. This is the information that people "just know." This type of information is located in the "company consciousness" or it could be on a note on a key employee's desk. It is not formally documented or agreed upon; however, this knowledge is every bit as valuable as that in the formal meta data sources. Often informal meta data provides some of the most valuable information since it tends to be business related. It is important to note that often much of the business meta data is informal. As a result, it is critical that this meta data is captured, documented, formalized and reflected in the data warehouse. By doing this you are taking an informal source of meta data and transforming it into a formal source. Since every organization differs, it is difficult to say where your informal sources of meta data are; however, following is a list of the most common types of informal meta data:

  • Data Stewardship
  • Business Rules
  • Business Definitions
  • Competitor Product Lists
  • Some Transformations and Summarizations

In Figure 4 the data steward's spreadsheet did not exist within this company; however, most of the employees knew who was responsible for the legacy systems data. So we gathered all of the key participants in a room and formalized this list. This is a typical example of a previous informal source of meta data becoming a formal source.

Meta Data for the Masses

Figure 6: DSS Web Architecture

Throwing an OLAP or reporting tool onto a desktop and distributing it to thousands of users does not necessarily deliver the benefits of business intelligence to the ever-increasing end-user community. The new trend is to move reporting and OLAP functions to the Web within an Internet or, more commonly, an intranet structure. (See Figure 6.) By moving these functions from the desktop to the Web, it changes the architecture from "client centric" to "server centric." This allows the business users to use their familiar Web browsers to access the meta data to navigate them through the warehouse and its data marts. This n-tier architecture provides for thinner clients and logic distributed among multiple communicating servers. A Web client (browser) can access the Web server to go through the CGI gateway or other Internet service to act as a middle tier. It's important to note that the Web-enabled OLAP tools are not quite as mature as their desktop brothers; however, most of the vendors are working at bringing their OLAP tool suites up to speed as quickly as possible.Web access comes in two broad flavors: static and dynamic. Static Web access uses a standard Web browser (Netscape or Microsoft Explorer) to generate static pages of HTML on an event-driven or time-driven basis. This method of access is very reliable, cost effective and easy to implement. Dynamic Web access references the OLAP and ad hoc query forms of business-user access. These access types require longer response times and a more sophisticated toolset which is still maturing in the marketplace.

Data warehouse and data mart projects need to have a meta data repository as part of their core objectives from the very onset of the project. This repository needs to be built with the business users in mind and created on a technologically sound architecture. In addition, the meta data needs to have a delivery vehicle that the non-technical business users can employ for access and navigation into the repository. Even with the immature state of the meta data marketplace, the alternative of not building a repository will not satisfy the needs of the business users or the data warehouse staff that will need to maintain the DSS system over time. The meta data repository significantly aids the DSS system in making the information in the warehouse and data marts more visible, understandable and accessible to the masses. In short, the repository can make or break the data warehouse.

David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at

Copyright 2005, SourceMedia and DM Review.