DM Review | Covering Business Intelligence, Integration & Analytics

Knowledge Integrity:
Data Standards and Data Models

Column published in DM Review Magazine
January 2004 Issue

By David Loshin

A standard defines a frame of reference that encourages confidence between interacting parties. For example, when you fill your car's tank at a gas station, the standard definition of a "gallon" assures you that you are getting the amount of gas you think you are. In turn, the standard definition of a "dollar" assures the gas station owner that you are paying him the appropriate price for the gas you are purchasing. In essence, a standard is an agreement between interacting parties about the context of their interaction.

When two (or more) parties wish to share information, there must be a way to describe what that information "looks like" so that when a data set arrives at its target location, the receiving party can actually do something with it. A data standard provides the guidelines by which interacting parties can confidently exchange information.

The goal of a data standard is to enable the sharing or exchange of information between multiple parties in a way that guarantees that the interacting parties share the same understanding of what is represented within that information. When the exchanged information consists of structured data, a data standard provides the description of that structure. At the very least, a data standard defines entity names, data element names, descriptions, definitions and formatting rules. In addition, a data standard may include procedures, implementation guidelines and usage directives. As more information is exchanged across different operating environments, the need for defined data standards is becoming more acute. Particularly in environments where many separate organizations (each with its own data definition peculiarities) have agreed to exchange data, there is a need to coordinate that information exchange in a way that provides the most benefit to all participants.
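To make that list of contents concrete, here is a minimal Python sketch of how a data standard's element definitions and formatting rules might be written down and checked. The `ElementDefinition` class, the `PERSON_STANDARD` list, the `violations` function and the regular-expression formatting rules are all hypothetical illustrations, not drawn from any published standard.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementDefinition:
    """One data element in a hypothetical exchange standard."""
    entity: str      # standardized entity name
    name: str        # standardized data element name
    definition: str  # agreed meaning of the element
    pattern: str     # formatting rule, expressed as a regular expression

# Hypothetical standard covering two elements of a "Person" entity.
PERSON_STANDARD = [
    ElementDefinition("Person", "last_name", "Family name of the person", r"[A-Za-z' -]{1,40}"),
    ElementDefinition("Person", "birth_date", "Calendar date of birth", r"\d{4}-\d{2}-\d{2}"),
]

def violations(record: dict, standard: list[ElementDefinition]) -> list[str]:
    """Return names of elements that are missing or break their formatting rule."""
    bad = []
    for element in standard:
        value = record.get(element.name)
        # A missing value or one whose whole text fails the pattern is a violation.
        if value is None or not re.fullmatch(element.pattern, value):
            bad.append(element.name)
    return bad
```

Under this sketch, a record carrying "1977-02-28" as its birth date passes, while one carrying "02/28/77" is flagged on `birth_date`.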

Data models and data standards are related, yet they differ subtly. A data model is a formal structured representation of real-world entities focused on the definition of an object and its associated attributes. For example, a data model representing people might capture all attributes relevant to the description of a person: last name, first name, weight, height, birth date, hair color, eye color, etc. In addition, a data model captures how individual entities are related, such as documenting all line items associated with a customer's order.
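A data model of this kind might be sketched with Python dataclasses, as below. The class and attribute names are hypothetical; the point is only that the model records which attributes each entity has and how entities relate (an `Order` refers to its customer and holds its line items).

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    """Entity: a person, described by its attributes (hypothetical names)."""
    last_name: str
    first_name: str
    weight_kg: float
    height_cm: float
    birth_date: str  # the model fixes the type (a string), not the format
    hair_color: str
    eye_color: str

@dataclass
class LineItem:
    """Entity: one line of a customer's order."""
    product: str
    quantity: int

@dataclass
class Order:
    """Entity with relationships: one customer, many line items."""
    customer: Person
    line_items: list[LineItem] = field(default_factory=list)
```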

The data model, however, is mostly concerned with the structure of the representation and not necessarily with all the details of the content held in that structure. We can specify that each instance of a person object carries that person's birth date and that the birth date attribute is represented as a character string, but the model does not specify whether that birth date is expressed as a month name followed by the day of the month and then the year (e.g., February 28, 1977), in the MM/DD/YY format (e.g., 02/28/77) or in any other date format.
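The gap between structure and content shows up in a small sketch. Assuming a hypothetical `Person` model that declares `birth_date` only as a string, a purely structural check (the hypothetical `conforms_to_model` function) accepts every one of these differently formatted dates.

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Person:
    last_name: str
    birth_date: str  # declared as a character string; no format is specified

def conforms_to_model(instance) -> bool:
    """Structural check only: every attribute holds a value of its declared type."""
    hints = get_type_hints(type(instance))
    return all(isinstance(getattr(instance, name), tp) for name, tp in hints.items())

people = [
    Person("Smith", "February 28, 1977"),
    Person("Jones", "02/28/77"),
    Person("Brown", "28.02.1977"),
]
```

All three records satisfy the model, because the model constrains only the type of the value, not its format.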

Regardless of the format used for that date, as long as the representation is valid within the operating context (i.e., meets the needs of those working with that data), the value will conform to the model's directive. This may be fine as long as the people using the data in that model understand this to be true. However, as soon as anyone wants to share data stored under that model with someone in a different organization, the variety of date formats can make it harder to migrate the information from its source to its next destination.
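One common way to expose this kind of variety before migrating data is format-pattern profiling: reduce each value to its "shape" and count the distinct shapes. The sketch below illustrates that general data-profiling technique; it is not a method described in this column, and `format_pattern`, `profile` and the sample values are hypothetical.

```python
import re
from collections import Counter

def format_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)

def profile(values: list[str]) -> Counter:
    """Count how many values share each format pattern."""
    return Counter(format_pattern(v) for v in values)

birth_dates = ["February 28, 1977", "02/28/77", "03/15/81", "1977-02-28"]
```

Here `profile(birth_dates)` finds three distinct patterns in a single birth-date column, a signal that a downstream system with strict date requirements will need a conversion step.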

For example, in contrast to the source data model's laxness about date representation, the next user of that data set may have strict requirements about date formats. This mismatch arises because each participant sharing the information may have his or her own data model, and embedded within each data model is information about the data types that populate each field. Therefore, while one data model (built using one vendor's database system) may allow dates to be stored as character strings, another data model (built using a different vendor's database system) might use a built-in system type for representing dates. When the target system attempts to load a record whose values do not conform to the specified type, an exception occurs that may prevent the participant from using the offending record (or the entire set of records).
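The load failure can be sketched as follows, assuming a hypothetical target whose date type accepts only YYYY-MM-DD input. `to_target_date`, `load_batch` and `load_per_record` are illustrative names; which of the two loading behaviors a real system exhibits depends on that system.

```python
from datetime import date, datetime

def to_target_date(text: str) -> date:
    """Hypothetical target rule: dates must arrive as YYYY-MM-DD."""
    return datetime.strptime(text, "%Y-%m-%d").date()  # raises ValueError otherwise

def load_batch(records):
    """All-or-nothing load: one nonconforming value rejects the whole batch."""
    return [{**r, "birth_date": to_target_date(r["birth_date"])} for r in records]

def load_per_record(records):
    """Lenient load: keep conforming records and set aside the violators."""
    loaded, rejected = [], []
    for r in records:
        try:
            loaded.append({**r, "birth_date": to_target_date(r["birth_date"])})
        except ValueError:
            rejected.append(r)
    return loaded, rejected

source = [
    {"last_name": "Smith", "birth_date": "1977-02-28"},
    {"last_name": "Jones", "birth_date": "02/28/77"},
]
```

With this input, `load_batch(source)` raises `ValueError` and nothing is loaded, while `load_per_record(source)` loads Smith and rejects Jones.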

The solution to this problem is to use a data standard for information exchange. The standard may correspond to the source data model or the target data model, or it may specify a format that is foreign to both models. The actual format selected is irrelevant; what matters is that the participants agree to use the selected format whenever they exchange data. This is not to say that a data standard must be developed in isolation from the data models associated with the applications that use the exchanged data. On the contrary, it is sometimes very important to develop the data standard in concert with the data model. However, it is important to be aware that there is a difference between a data model and a data standard.
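A sketch of the exchange path under an agreed standard follows. It assumes, purely for illustration, that the participants chose ISO 8601 calendar dates (YYYY-MM-DD) as the exchange format; as noted above, the specific choice matters less than the agreement. The source-side `export_date` and target-side `import_date` functions and the `SOURCE_DATE_FORMATS` tuple are hypothetical.

```python
from datetime import date, datetime

# Hypothetical agreed exchange standard: dates travel as "YYYY-MM-DD".
EXCHANGE_DATE_FORMAT = "%Y-%m-%d"

# Local formats the hypothetical source system is known to contain
# (%B assumes the default C locale, i.e. English month names).
SOURCE_DATE_FORMATS = ("%B %d, %Y", "%m/%d/%y")

def export_date(source_value: str) -> str:
    """Source side: translate whichever local format was used into the standard."""
    for fmt in SOURCE_DATE_FORMATS:
        try:
            return datetime.strptime(source_value, fmt).strftime(EXCHANGE_DATE_FORMAT)
        except ValueError:
            continue  # try the next known local format
    raise ValueError(f"unrecognized source date: {source_value!r}")

def import_date(exchanged_value: str) -> date:
    """Target side: parse the standard format into the target's native date type."""
    return datetime.strptime(exchanged_value, EXCHANGE_DATE_FORMAT).date()
```

Each side keeps its own data model; only the exchanged representation is fixed. Note that a two-digit year such as "77" is ambiguous: here it is resolved by Python's `%y` convention (69 to 99 map to 1969 to 1999, 00 to 68 map to 2000 to 2068), which is exactly the kind of detail a data standard should pin down explicitly.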


David Loshin is the president of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Enterprise Knowledge Management: The Data Quality Approach (Morgan Kaufmann, 2001) and Business Intelligence: The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. Loshin may be reached at loshin@knowledge-integrity.com.

Thomson Media

© 2004 The Thomson Corporation and DMReview.com. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.