Meta Data and Data Administration:
Implementing Data Quality Through Meta Data, Part 1

Column published in DMReview.com, April 1, 2000
By David Marco

A special thanks to Mike Jennings for his invaluable contribution to this month's column.

How are you addressing the single most difficult problem facing data warehouses today? Data quality. When the quality of data is compromised, incorrect interpretation and use of information from your data warehouse can destroy the confidence of its customers, your users. Once that confidence is eroded, it is only a matter of time before your system will no longer exist.

This data quality quandary often results from system architectures that fail to identify "bad" data before it is loaded into the data warehouse. This missed opportunity leads to a dramatic increase in the time and costs that companies expend to reconcile and audit information in the warehouse. Insertion of technical meta data "tags" directly into the data warehouse's dimensional data model design and the extraction, transformation and loading (ETL) processes corrects this situation by providing a practical means to measure data quality precisely at a table row level of granularity.

This article is the first portion of a two-part series on implementing data quality through meta data. This installment examines the role meta data can have in the data warehouse model and data acquisition designs for information content and quality. Part two of the series will examine the beneficial technical meta data tags that can be incorporated into an architecture to measure data quality and provide flexibility to the system design.

The warehouse developers use technical meta data as a method to build a tighter relationship between the repository and the data warehouse. This is accomplished by incorporating technical meta data directly into the data warehouse design and ETL processes. This technique is used to extend the design and architecture of the data warehouse to provide increased processing optimizations for data acquisitions, maintenance activities and data quality measurement opportunities. These technical meta data tags, unlike information stored in a meta data repository, are referenced at a row level of granularity in the data warehouse. This direct association of meta data to each row of information in the data warehouse is a key distinction of extending meta data into the architecture.

During ETL processing, each row of data from the source systems is tagged with technical meta data. The meta data tags on each row in the warehouse provide a clearer semantic meaning to the data by placing the information in context with the repository. As an example, consider a client dimension table that derives its information from two operational sources. Client information is extracted in priority order from a sales force automation application, an enterprise resource planning (ERP) application, or both, depending on the availability and stability of the data. Without technical meta data in the dimension table, the information would have to be used without knowing which source system(s) provided it. Technical meta data tagging lets you determine the origin(s) of information in dimension tables: whether a row originated from one or more sources can be determined quickly through its technical meta data tag.
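The priority-ordered tagging described above can be sketched as follows. This is a minimal, hypothetical illustration; the source codes (`SFA`, `ERP`), the tag column name `md_source_system`, and the record fields are all assumptions, not anything prescribed by the article.

```python
# Hypothetical sketch of ETL source tagging. The tag column name
# "md_source_system" and the source codes are illustrative only.
SOURCE_SFA = "SFA"  # sales force automation application
SOURCE_ERP = "ERP"  # enterprise resource planning application

def tag_client_row(sfa_record, erp_record):
    """Build a client dimension row, preferring SFA data when both
    sources supply it, and record the origin(s) in a meta data tag."""
    if sfa_record and erp_record:
        row = {**erp_record, **sfa_record}      # SFA values take priority
        row["md_source_system"] = f"{SOURCE_SFA}+{SOURCE_ERP}"
    elif sfa_record:
        row = dict(sfa_record)
        row["md_source_system"] = SOURCE_SFA
    elif erp_record:
        row = dict(erp_record)
        row["md_source_system"] = SOURCE_ERP
    else:
        return None                             # no source data available
    return row

row = tag_client_row({"client_id": 42, "name": "Acme"},
                     {"client_id": 42, "region": "Midwest"})
print(row["md_source_system"])  # SFA+ERP
```

Because the tag travels with the row, any later query or audit can group or filter by origin without consulting an external mapping.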

A clear, consistent method of tagging the data originating from the operational systems needs to be developed and agreed upon by both the technical and business users of the data warehouse. Any technical meta data tied to the row must be applicable to the entire row of data, not just the majority of columns in the table.

I like to keep technical meta data tagging to a minimum in simple dimensional data model designs, such as schemas with only one or two fact tables or warehouses fed by a single operational system. I prefer to increase its use in very complex schema designs with multiple fact tables and several conformed dimension tables, or when integration of numerous source systems is required. These more complicated cases make mapping of technical meta data from the source systems more challenging.

The data warehouse team is responsible for reviewing the design, construction and maintenance impact that technical meta data has on the repository, ETL processes, data model design, database sizing and front-end data access tools. For example, some relational online analytical processing (ROLAP) tools require strict adherence to the design of the data warehouse model in order to function properly or fully; this may preclude the use of some, or all, meta data tags on certain tables such as the time dimension. Certain types of technical meta data can also lengthen ETL processing times and may interfere with tight processing windows. For example, ETL processing times need careful consideration when a meta data tag is used in the warehouse to indicate whether production keys, or natural keys, are still active in the source system.
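The active-key tag mentioned above illustrates the processing cost involved: each refresh must check every warehouse row's natural key against the current source extract. A minimal sketch, assuming hypothetical field names (`natural_key`, `md_key_active`) not taken from the article:

```python
# Hypothetical sketch of maintaining an "active in source" meta data
# flag. A periodic ETL pass checks each row's natural (production) key
# against the keys present in the latest source extract.
def refresh_active_flags(warehouse_rows, active_source_keys):
    """Set each row's md_key_active tag to whether its natural key still
    exists in the source system. This per-row check is the kind of work
    that can stretch a tight ETL processing window."""
    active = set(active_source_keys)            # O(1) membership lookups
    for row in warehouse_rows:
        row["md_key_active"] = row["natural_key"] in active
    return warehouse_rows

rows = [{"natural_key": "C-100"}, {"natural_key": "C-200"}]
refresh_active_flags(rows, ["C-100"])
print(rows[0]["md_key_active"], rows[1]["md_key_active"])  # True False
```

Building the key set once keeps the pass linear in the number of rows, which matters when the check runs inside a nightly load window.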

The benefits of technical meta data use include source system identification, data quality measurement, and improved management of ETL processes and database administration. These technical meta data tags offer warehouse administrators and business users a means of measuring the content quality of the data in the warehouse. Adding these types of meta data tags to the warehouse model and ETL processes can help you reconcile data quality issues in your environment. Such reconciliation increases data integrity, benefiting both the technical and business users by raising their confidence in the quality of the information content.
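As one sketch of the measurement benefit, row-level tags can be rolled up into a simple content-quality profile. The tag names (`md_source_system`, `md_quality_flag`) are hypothetical, assumed for illustration:

```python
# Hypothetical sketch: summarizing row-level meta data tags into a
# content-quality report, e.g. row counts per source system and the
# number of rows flagged as failing a quality rule during ETL.
from collections import Counter

def quality_profile(rows):
    """Summarize rows by their md_source_system and md_quality_flag tags."""
    by_source = Counter(r["md_source_system"] for r in rows)
    failed = sum(1 for r in rows if r.get("md_quality_flag") == "FAIL")
    return {"rows_by_source": dict(by_source), "failed_rows": failed}

rows = [
    {"md_source_system": "SFA", "md_quality_flag": "PASS"},
    {"md_source_system": "ERP", "md_quality_flag": "FAIL"},
    {"md_source_system": "SFA", "md_quality_flag": "PASS"},
]
print(quality_profile(rows))
```

Because the tags live on every row, the same report can be produced for any slice of the warehouse without re-running the source reconciliation.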

Michael F. Jennings has more than seventeen years of information technology experience and is the manager for data warehouse infrastructure at Hewitt Associates LLC. He is a regular columnist for the "Real World Decision Support" newsletter, and his talks on decision support can be heard at many of the major data warehousing conferences in the U.S. and abroad.



David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.
