Published in DM Review in August 2004.


Meta Data & Knowledge Management: Managed Meta Data Environment: A Complete Walk-Through, Part 5

by David Marco

This column is adapted from the book Universal Meta Data Models by David Marco and Michael Jennings, John Wiley & Sons.

In the last several columns, I presented the first component of a managed meta data environment (MME), the meta data sourcing layer. This installment in the series on the six architectural components of an MME walks through the second and third major components: the meta data integration layer and the meta data repository.

Meta Data Integration Layer

The meta data integration layer takes the various sources of meta data, integrates them and loads them into the meta data repository (see Figure 1). This approach differs slightly from the common techniques used to load data into a data warehouse, where the transformation (what we call integration) process is clearly separated from the load process. In an MME, these steps are combined because the volume of meta data is far smaller than the volume of data in a data warehouse. As a general rule, an MME holds between 5 and 20 gigabytes of meta data; however, as MMEs begin to target data audit-related meta data, storage can grow into the 20-75 gigabyte range. Over the next few years, you will see some MMEs reach the terabyte range.


Figure 1: Meta Data Integration Layer

The specific steps in this process depend on whether you are building a custom process or using a meta data integration tool to assist your effort. If you decide to use a meta data integration tool, the specific tool you select can also greatly affect this process.
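As a minimal sketch of a custom process in which integration and loading happen in a single pass, consider the following Python example. Everything in it is an assumption made for illustration: the two sources (a modeling tool and a DBMS catalog), their field names, the meta_data table and the integrate and integrate_and_load functions do not come from any particular tool or from the book.

```python
import sqlite3

# Hypothetical source extracts; real sources and their fields will differ.
MODELING_TOOL_ROWS = [
    {"entity": "Customer", "definition": "A buyer of our products"},
]
DBMS_CATALOG_ROWS = [
    {"TABLE_NAME": "CUSTOMER ", "DB": "SALES_DB"},
]


def integrate(source_name, row):
    """Map one source-specific row onto a common (name, source, detail) shape."""
    if source_name == "modeling_tool":
        return (row["entity"].strip().upper(), source_name, row["definition"])
    if source_name == "dbms_catalog":
        return (row["TABLE_NAME"].strip().upper(), source_name, row["DB"])
    raise ValueError(f"unknown meta data source: {source_name}")


def integrate_and_load(conn, sources):
    """Integrate each row and load it immediately, with no separate staging step."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta_data "
        "(name TEXT, source TEXT, detail TEXT)"
    )
    loaded = 0
    with conn:  # one transaction for the whole run
        for source_name, rows in sources.items():
            for row in rows:
                conn.execute(
                    "INSERT INTO meta_data VALUES (?, ?, ?)",
                    integrate(source_name, row),
                )
                loaded += 1
    return loaded


if __name__ == "__main__":
    repository = sqlite3.connect(":memory:")
    count = integrate_and_load(
        repository,
        {"modeling_tool": MODELING_TOOL_ROWS, "dbms_catalog": DBMS_CATALOG_ROWS},
    )
    print(count, "rows loaded")
```

Because each row is written to the repository as soon as it has been integrated, there is no intermediate staging area. That is a reasonable trade-off only while meta data volumes stay small, as discussed above.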

Meta Data Repository

A meta data repository is a fancy name for a database designed to gather, retain and disseminate meta data. The meta data repository is responsible for the cataloging and persistent physical storage of the meta data.

The meta data repository should be generic, integrated, current and historical. Generic means that the physical meta model stores meta data by meta data subject area rather than by application. For example, a generic meta model will have an attribute named DATABASE_PHYS_NAME that holds the physical database names within the company. An application-specific meta model would name this same attribute ORACLE_PHYS_NAME. The problem with application-specific meta models is that meta data subject areas change. To return to our example, today Oracle may be our company's database standard. Tomorrow, we may switch the standard to SQL Server for cost or compatibility advantages. With an application-specific meta model, this switch would force needless additional changes to the physical meta model (see reference 1).
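The difference can be sketched in a few lines of Python using the standard sqlite3 module. Only the attribute name DATABASE_PHYS_NAME comes from the example above; the physical_database table, the DBMS_PLATFORM column, the sample rows and the switch_platform function are hypothetical and are not taken from the physical meta models in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic model: the platform is data in a column, not part of an attribute name.
conn.execute(
    """
    CREATE TABLE physical_database (
        DATABASE_PHYS_NAME TEXT NOT NULL,  -- attribute name from the text
        DBMS_PLATFORM      TEXT NOT NULL   -- hypothetical column for the vendor
    )
    """
)
conn.executemany(
    "INSERT INTO physical_database VALUES (?, ?)",
    [("SALES_DB", "Oracle"), ("HR_DB", "Oracle")],
)


def switch_platform(conn, db_name, new_platform):
    """A change of database standard is a row update, not a schema change."""
    with conn:
        conn.execute(
            "UPDATE physical_database SET DBMS_PLATFORM = ? "
            "WHERE DATABASE_PHYS_NAME = ?",
            (new_platform, db_name),
        )


switch_platform(conn, "SALES_DB", "SQL Server")

# The column list is unchanged after the switch.
columns = [row[1] for row in conn.execute("PRAGMA table_info(physical_database)")]
```

Had the attribute been named ORACLE_PHYS_NAME, the same switch would have required renaming the column and touching every query that reads it.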

A meta data repository also provides an integrated view of the enterprise's major meta data subject areas. The repository should allow the user to view all entities within the company, not just entities loaded in Oracle or entities that are only in the customer relationship management (CRM) applications.

Third, the meta data repository contains current and future meta data, meaning that the meta data is periodically updated to reflect the current and future technical and business environment. Keep in mind that a meta data repository is constantly being updated - and it needs to be in order to be truly valuable.

Lastly, meta data repositories are historical. A good repository will hold historical views of the meta data, even as it changes over time. This allows a corporation to understand how its business has changed over time. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or a CRM application. For example, assume the business meta data definition of customer is "anyone who has purchased a product from our company in one of our stores or through our catalog." A year later, a new distribution channel is added to the strategy. The company constructs a Web site to allow customers to order products. At that point in time, the business meta data definition for customer would be modified to "anyone who has purchased a product from our company in one of our stores, through our mail order catalog or via the Web." A good meta data repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data).

One final recommendation: implement your meta data repository component on an open, relational database platform, as opposed to a proprietary database engine.
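One way to keep both customer definitions is to store each version with the period during which it applied. The sketch below illustrates the idea and is not the repository design from the book: the DefinitionVersion class, the definition_as_of function and all of the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DefinitionVersion:
    term: str
    text: str
    effective_from: date          # first day this wording applies
    effective_to: Optional[date]  # None means the wording is still current


# Dates are invented for illustration; the article does not give any.
HISTORY: List[DefinitionVersion] = [
    DefinitionVersion(
        "customer",
        "anyone who has purchased a product from our company "
        "in one of our stores or through our catalog",
        date(2003, 1, 1),
        date(2003, 12, 31),
    ),
    DefinitionVersion(
        "customer",
        "anyone who has purchased a product from our company in one of "
        "our stores, through our mail order catalog or via the Web",
        date(2004, 1, 1),
        None,
    ),
]


def definition_as_of(history, term, as_of):
    """Return the version of a term's definition that applied on a given date."""
    for version in history:
        ended = version.effective_to is not None and as_of > version.effective_to
        if version.term == term and version.effective_from <= as_of and not ended:
            return version
    return None  # no definition was in effect on that date


if __name__ == "__main__":
    old = definition_as_of(HISTORY, "customer", date(2003, 6, 1))
    new = definition_as_of(HISTORY, "customer", date(2004, 8, 1))
    print(old.text)
    print(new.text)
```

An analyst working with data from the earlier period would retrieve the first definition, while anyone analyzing later data would get the second. Neither version is overwritten.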

Reference:

1. See Chapters 4-8 of Universal Meta Data Models (David Marco & Michael Jennings, Wiley, 2004) for various physical meta models.


David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40." He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.

Copyright 2005, SourceMedia and DM Review.