Meta Data Integration: Fitting Square Pegs in Round Holes

  Article published in DM Review Magazine
September 1998 Issue
 
  By David Marco

Meta data integration is one of the least understood and most misrepresented topics in the entire decision support arena. It's astounding how often we're told that a tool seamlessly integrates every source of meta data.


This article will present a "real-world" example of meta data integration and the challenges of today's meta data integration tools market. Before we begin reviewing the strengths and limitations of these tools, keep in mind that it is much easier to purchase such a tool and work around its limitations than to manually build programs to populate and support the meta data repository.

The State of the Market

One of the chief reasons why these tools are not cleanly integrated is that there isn't a globally adopted meta model standard (the meta model refers to the physical data model of the repository). Once an appropriate meta model standard is defined and adopted, the data warehousing products from different vendors will be able to share information much more easily.

The Microsoft Repository provides the best hope to date for defining such an industry-wide standard for the meta model. While Microsoft definitely has the clout to bring the industry around to a standard meta model, industry analysts are still taking a "wait-and-see" attitude. Even assuming the Microsoft Repository becomes the standard, it will take two to three years before its meta model is mature enough for most organizations and tool vendors to integrate it into their software.

Meta Data Integration Tools

There is a dizzying array of meta data repository tools, and most of them claim to seamlessly integrate all meta data sources into one architected repository. However, reality offers a different message. It is true that most of these tools do a good job of integrating formal meta data sources (e.g., CASE tools, extraction/transformation tools and other repositories), which they tend to certify. On the other hand, a majority of the meta models being offered do not provide an adequate foundation for business meta data and lack an overall vision for the complete data administration process.

It is critical that the meta model adopted as a standard be as strong on the business side of the meta data equation as it is on the technical side. Unfortunately, most meta models are inadequate in this respect. Common gaps include the entities and attributes needed to support business meta data for user access patterns, frequency of use, transformation rules in business terms, DSS (decision support system) report business definitions and source system business definitions. On the technical meta data side, many of these models do not support warehouse balancing information and data quality metrics for each of the data warehouse/data mart load runs.

As meta data requirements are defined, the data administration team will likely have to extend the meta model to accommodate those needs. Therefore, a prerequisite for any tool is a fully extensible meta model. Most tools do come with their own meta model and provide the capability to extend it as needed. However, the task of extending the meta model is not trivial; it is equivalent to modifying a data model in a decision support system. As a result, it is typical, if not a prerequisite, to have a full-time data modeler on the data administration team.

Few of the repository tools provide a Web-enabled access layer that would satisfy a business user. Typically, the access piece needs to be provided by a "true" reporting tool (e.g., MicroStrategy, Cognos or Business Objects). Whichever access tool is selected, it needs to be fully Web-enabled and capture user access patterns along with the frequency of report and table use. This meta data can then be extracted and stored directly in the repository.
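As a rough illustration of the usage meta data such an access tool might feed back into the repository, the sketch below aggregates report access counts from a hypothetical usage log. The log format, function name and report names are all assumptions for illustration, not any particular vendor's export format.

```python
from collections import Counter

# Hypothetical usage log: (user, report) pairs. A real reporting tool
# would export richer records (timestamps, tables touched, and so on).
def frequency_of_use(usage_log):
    """Aggregate how often each report was accessed -- the kind of
    frequency-of-use meta data to be stored in the repository."""
    return dict(Counter(report for _user, report in usage_log))

log = [("ann", "sales_summary"), ("bob", "sales_summary"), ("ann", "inventory")]
print(frequency_of_use(log))  # → {'sales_summary': 2, 'inventory': 1}
```

The data administration staff could then load these counts into the repository's frequency-of-use attributes to guide future development phases.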

Meta Data Source Integration

Each of the repository tools integrates meta data through differing interfaces. However, three broad categories can be defined to classify the manner in which these interfaces integrate different sources of meta data. Each of these categories requires varying levels of integration complexity and meta model changes.

Certified Sources

Certified sources of meta data are those that the tool can directly read, properly interpret and load into the correct attributes of the meta model. These sources are easily integrated and do not require an extension to the base meta model. In addition, because the tool is designed to accept these sources, they do not require additional programming or analysis. Common examples of certified meta data sources include technical meta data from CASE tools and transformation rules from extraction/transformation engines. Normally a repository tool is certified for several vendor tools in each of these categories.

Generic Sources

Most tools allow for one or more generic meta data sources. Generic meta data sources are those in a common format (e.g., CDIF, tab delimited, space delimited, comma delimited) that the tool can read. The challenge with these sources is that while the tool can easily read the source, programming is still needed to map the source elements to the correct attributes in the meta model. As a result, it is important that the repository tool has an interface that can be easily changed to map these sources. In addition, it is somewhat common for these sources to require extensions to the meta model. Extending the model can range from a simple change (adding an attribute to an existing table) to a quite complex one (adding new tables, along with the foreign keys other tables need to reference them). If the repository is built on an object database, the task of administering changes to the model is greatly simplified. Common examples of generic sources are technical and business meta data sources in databases and spreadsheets, which are easily extracted into these formats.
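A minimal sketch of the mapping step described above: reading a comma-delimited generic source and remapping its columns to meta model attribute names. The source column names, target attribute names and mapping table are all illustrative assumptions, not any vendor's actual layout.

```python
import csv
import io

# Assumed mapping from the columns of a comma-delimited meta data source
# to attribute names in the repository's meta model (both illustrative).
COLUMN_TO_ATTRIBUTE = {
    "ENT_NAME": "entity_name",
    "ATTR_NAME": "attribute_name",
    "BUS_DEF": "business_definition",
}

def map_generic_source(delimited_text, delimiter=","):
    """Read a generic (delimited) source and remap each row's columns
    to the meta model attributes the repository expects."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    return [
        {COLUMN_TO_ATTRIBUTE[col]: value
         for col, value in row.items() if col in COLUMN_TO_ATTRIBUTE}
        for row in reader
    ]

sample = "ENT_NAME,ATTR_NAME,BUS_DEF\nCustomer,cust_id,Unique customer identifier"
print(map_generic_source(sample))
```

In practice this mapping would live in the repository tool's configurable interface rather than in custom code, which is exactly why an easily changed mapping interface matters.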

Non-Supported Sources

Some sources of meta data are neither certified nor in a generic format. These non-supported sources may require sophisticated analysis, design and programming. They present all of the challenges of generic sources, plus a potentially complicated programming step to transform the non-supported source into a generic one. Non-supported sources are common for informal business meta data sources and for meta data stored in vendor applications.

Meta Data Integration Architecture

Before meta data integration can begin, it is critical that a meta data readiness assessment be conducted and that the meta data requirements and their meta data sources have been clearly identified and documented.

Figure 1 illustrates an actual integration strategy. This organization purchased a vendor tool to integrate its various sources of meta data (listed in Figure 2). The process for integrating all of these sources presented a challenge. After determining that the repository tool was certified with the CASE tool and the integration tool, the next step was integrating the data dictionary. The data dictionary was located in a third-party application in a proprietary database format; as a result, its format was not supported by the integration tool. The answer was to design and write two complex programs to extract and manipulate the data dictionary into a format (comma delimited) that the repository tool could integrate. This process took one fully dedicated programmer one month to accomplish. The last source of meta data was the data steward's spreadsheet. This source was housed in MS Excel and could be extracted in space-delimited format. An OLAP tool was used to access the information in the meta data repository. This tool captured user access patterns and frequency of use, which the data administration staff used to guide future repository development phases.
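The extract-and-reformat step described above, turning a proprietary dictionary into a comma-delimited file the repository tool can integrate, can be sketched as follows. The fixed-width layout and field names are assumptions for illustration; the two programs the organization actually wrote were far more involved.

```python
import csv
import io

# Assumed fixed-width layout of a proprietary data dictionary dump:
# cols 0-19 entity name, 20-49 attribute name, 50+ business definition.
FIELDS = [("entity", 0, 20), ("attribute", 20, 50), ("definition", 50, None)]

def dictionary_to_csv(fixed_width_lines):
    """Transform a non-supported fixed-width dictionary extract into
    the comma-delimited format the repository tool can load."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([name for name, _, _ in FIELDS])  # header row
    for line in fixed_width_lines:
        writer.writerow([line[start:end].strip() for _, start, end in FIELDS])
    return out.getvalue()

record = "Customer".ljust(20) + "cust_id".ljust(30) + "Unique customer identifier"
print(dictionary_to_csv([record]))
```

Once the source is in this generic format, the remaining work is the same column-to-attribute mapping required for any generic source.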

Meta Data Source | Meta Data Description | Type | Model Extension
CASE Tool | Physical & logical models, domain values, technical entity definitions and technical attribute definitions | Certified | No
Extraction/Transformation | Technical transformation rules | Certified | No
Custom Data Dictionary | Business attribute and entity definitions | Non-supported | No
MS Excel | Data steward's list | Generic | Yes
Reporting Tool | Access patterns and frequency of use | Generic | Yes

Figure 2: Meta Data Sources

It is critical for the data administration team to reside at the enterprise level of the organization. This allows the team to set standards for capturing the different meta data sources that need to be integrated into the repository across all of an organization's DSS projects. This is important because most Fortune 1000 companies have multiple DSS projects. If these standards are not in place, the task of integrating each of the meta data sources becomes far more difficult. The standards should be kept simple, easily communicated and understandable; if they become a bottleneck for the other DSS teams, they will not be followed.

A meta data repository built with the business users in mind and created on a technologically sound architecture lifts the data warehouse from a stovepipe application to a true business intelligence system. Even with the immature state of the meta data repository marketplace, the alternative of not building a repository will not satisfy the needs of the business users or of the data warehouse staff who must maintain the DSS system over time. The difficulty of meta data integration is one of the chief factors that has prevented most organizations from achieving successful data warehouse and data mart implementations.


David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.
