Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events

White Paper Library
Research Papers

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Meta Data and Data Administration:
Implementing Data Quality Through Meta Data, Part 2

  Column published in DM Review Magazine
May 2000 Issue
  By David Marco

A special thanks to Mike Jennings for his invaluable contribution to this month's column.

This article is the concluding portion of a two-part series on implementing data quality through meta data. The first installment examined the role meta data can have in the data warehouse model and data acquisition designs for information content and quality. This segment will examine real world examples of technical meta data tags that can be incorporated into your designs to facilitate measurement of data quality and promote user confidence in the informational content of the warehouse. This meta data provides a semantic layer of knowledge about the information in your warehouse that is highly valuable to both business users and information technology (IT) development staff.

Load Date: The most commonly used technical meta data tag is the load date column. It indicates when (date and/or time) a row of information was loaded into the warehouse. This "snapshot" date maintains temporal integrity of the data in the warehouse. The column can be referenced by warehouse administrators to identify candidate rows for archival/ purge processing or by users to reconcile/audit information in the data warehouse with the source systems.

Update Date: Another common technical meta data column is the update date. This column denotes when a row was last updated in the warehouse. Like the load date, this column maintains the historical meaning (temporal integrity) of information in the data warehouse. It is routinely used in dimensional models that implement slowly changing dimensions (SCD), type one or three processing methods, to identify when the row was refreshed or updated. For those not familiar with the implementation of SCD, type one maintains a single row per production key(s) in the dimension table which is updated as required, overwriting any history about the row. Type three also maintains one row per production key(s) but doubles the number of columns to keep both a current and previous view of the information. The column is used in administration activities such as archival/purge processing or reconciliation/audit by end users.

Load Cycle Identifier: One more technical meta data tag a data warehouse development team can incorporate is the load cycle identifier. This column is a sequential identifier assigned during each load cycle to the data warehouse regardless of the refresh frequency (e.g., daily, weekly, monthly, etc.). It can be used to easily remove data from a particular load cycle run if data corruption or other data quality issues arise. The load cycle identifier is typically used in conjunction with a meta data repository that describes other operational statistics about the load cycle. Using the repository alone, you could determine how many and when load cycles occurred against the warehouse. Now, by tying the repository statistics to the actual warehouse content, you know exactly which rows were loaded and when.

Current Flag Indicator: The current flag indicator tag identifies the latest version of a row in a table. It facilitates quick identification of the latest version of a row as compared to performing date comparisons. This flag is especially useful for managing dimension tables using SCD, type two, where history of a production record is maintained. SCD 2 is used to model a dimension table when changes to relevant columns need to be captured over time. This technique relies on the production key(s) in the dimension table not changing. New surrogate keys are assigned to the dimension table when changes to relevant columns are detected during a batch load cycle. Comparisons are made between the previously loaded production key(s) of the dimension table and the new load cycle data. Changes to relevant columns in the new load cycle data for matching production keys are loaded with new surrogate key assignments. This tag is also very useful in non-star-like data model schema designs such as an atomic (most granular) data warehouse where structures tend to conform closer to third- normal form. Instead of querying a table for the latest date field, the ETL process assigns a "Y" to the latest record loaded for a production/natural key (s) while setting any previously loaded record to an "N."

Operational System Identifier: One of the most useful technical meta data tags is the operational system identifier. This tag is used to track the originating source(s) of a data row in the warehouse. This tag allows identification of each row in a warehouse table to the sources used in its construction. This provides the user, repository architect and data acquisition developer with a powerful means for identifying and measuring the quality of the data received from an operational source. In the common case where your ETL process is required to extract and merge data from multiple sources, the tag is assigned a value that represents this particular integration. For example, where a row of data is integrated from more than one operational source system (client information from a trouble ticket, order management and billing information), a column value indicating the combination of these systems is assigned.

Active Operational System Flag: This tag is used to indicate whether the production keys (rows of data) in a warehouse table are still active in the originating operational system or systems. The active operational system flag provides an analysis alternative to queries posed to the data warehouse. This column can be used effectively in a variety of analysis activities to identify dormant data or data that should be constrained in reporting.

Confidence Level Indicator: One of the more intriguing technical meta data tags is the confidence level indicator. This column is used to indicate how business rules or assumptions were applied during the ETL processes for a particular row of data through application of a ranking value. This tag provides a means of measure to a user as to the credibility level of a data row based on the transformation processing performed. It is used to identify potential problems with data quality from source systems and to facilitate correcting these issues. For example, data from a stable source such as customer was loaded at the highest level. Data more volatile, easy to cleanse or relatively moderate to define was loaded at the second level. The third level of data was considered more problematic to define, such as planning or forecasting data. The fourth level consisted of data not originating from internal operational systems but provided by management in the form of a spreadsheet. The fifth level was used to tag the data from external sources such as news services or commercial sources.

Using these technical meta data tags allows your corporation the ability to vastly improve the overall data content quality by tying the meta data repository and decision support data models closer together.

Michael F. Jennings has more than seventeen years of information technology experience and is the unit manager for business intelligence and data warehousing infrastructure at Hewitt Associates LLC. Jennings can be reached at (847) 295- 5000 or mfjennin@hewitt.com.


For more information on related topics visit the following related portals...
Data Quality and Meta Data.

David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.

Solutions Marketplace
Provided by IndustryBrains

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

Verify Data at the Point of Collection: Free Trial
Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

Integrated Revenue Management Software
Maximize, protect, produce and analyze revenue by consolidating customer data from multiple disparate sources across the enterprise and associating it with real-time operational data for a complete and unique view of your customers and business.

Free EII Buyer's Guide
Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

Data Mining: Levels I, II & III
Learn how experts build and deploy predictive models by attending The Modeling Agency's vendor-neutral courses. Leverage valuable information hidden within your data through predictive analytics. Click through to view upcoming events.

Click here to advertise in this space

View Full Issue View Full Magazine Issue
E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.