DM Review | Covering Business Intelligence, Integration & Analytics
Knowledge Integrity: Abstracting Access

Column published in DM Review Magazine, February 2005 Issue
By David Loshin

One nice thing about meta data is that it describes the data elements within a data set at a level of abstraction, and because of this, we are also able to abstract the details of how the data set is instantiated. Conceptually, we can distinguish between the data elements along with their corresponding attributes and the formats in which sets of those data elements are collected, stored, exchanged, presented or grouped together. Consider an example using a simplistic meta data description of a customer:

  • Customer ID: a 10-digit number
  • First Name: variable length character string not to exceed 25 characters
  • Middle Name: variable length character string not to exceed 25 characters
  • Last Name: variable length character string not to exceed 30 characters
  • Date of Birth: fixed-length formatted field using the format MMDDYYYY

Each data element is provided with a simple data type and size description. While this description does accurately provide information about a grouping of data elements, it does not specify whether we are talking about records in an RDBMS (relational database management system), rows in a flat file, a grouping of elements in an XML document, a row in a spreadsheet or any number of other possible materializations. However, any business rules that apply to the elements within a single instance, to a set of data instances, or to a set of these records compared to some other described data set, will still apply, regardless of what the actual physical representation is.
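To make the point concrete, here is a minimal Python sketch of the customer description held purely as meta data, with a check that works on any record once its values are available by element name. The names `ElementSpec`, `CUSTOMER` and `violations` are hypothetical, as are the type tags. The sketch also assumes that a "10-digit number" means exactly 10 digits and that values arrive as strings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ElementSpec:
    """Meta data for one data element (hypothetical structure)."""
    name: str
    kind: str            # "digits", "string" or "date" (assumed type tags)
    max_length: int
    fixed: bool = False  # True when the value must be exactly max_length long


# The customer description from the column, expressed as meta data only.
CUSTOMER = (
    ElementSpec("Customer ID", "digits", 10, fixed=True),
    ElementSpec("First Name", "string", 25),
    ElementSpec("Middle Name", "string", 25),
    ElementSpec("Last Name", "string", 30),
    ElementSpec("Date of Birth", "date", 8, fixed=True),  # MMDDYYYY
)


def _is_mmddyyyy(value):
    """True when value is eight ASCII digits forming a real calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        date(int(value[4:]), int(value[:2]), int(value[2:4]))
    except ValueError:
        return False
    return True


def violations(values, specs=CUSTOMER):
    """Return the names of elements whose values break their description.

    `values` maps element names to strings; where those strings came from
    (database row, flat file, XML, spreadsheet) is irrelevant here.
    """
    bad = []
    for spec in specs:
        value = values.get(spec.name, "")
        if spec.fixed:
            size_ok = len(value) == spec.max_length
        else:
            size_ok = len(value) <= spec.max_length
        if spec.kind == "digits":
            type_ok = value.isascii() and value.isdigit()
        elif spec.kind == "date":
            type_ok = _is_mmddyyyy(value)
        else:
            type_ok = True
        if not (size_ok and type_ok):
            bad.append(spec.name)
    return bad
```

Nothing in `violations` knows whether the values came from a table, a file or a document, which is the property the description above relies on.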

This introduces an interesting question: If our business rules apply to the abstraction as described by the meta data, then can we abstract the application of business rules as well? I have discussed business rules as meta data in previous columns, but this month I am interested in a more basic question regarding data access: How do we manipulate data instances that may be represented in different ways? Approached through direct access, the problem is complex. From a programming point of view, however, using an object-oriented approach to develop an interface provides a way to mirror, and consequently to exploit, the meta data abstraction.

Figure 1: A Sample Class Hierarchy

Let's focus solely on the question of data access. Presume that we have a set of records and that we want a means for successively accessing those records so that we could (in practice) apply a predefined business rule to each record. We know that each collection of data elements constitutes a single instance. We also know that any means through which the data is accessed is likely to contain a set of data instances. Therefore, regardless of the physical representation, we can define a simple interface that expects to be able to:

  • Instantiate a pointer to the beginning of the data set
  • Determine if there are still data instances in the set
  • Access the next data instance in the set
  • Apply some business rule to the data instance
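The four expectations above can be sketched as a small Python interface. This is an illustration under assumptions, not the implementation promised for next month's column: the names `DataSet`, `InMemoryDataSet`, `reset`, `has_next`, `next_instance` and `apply_rule` are all hypothetical, and a business rule is assumed to be any function that returns True when an instance conforms.

```python
from abc import ABC, abstractmethod


class DataSet(ABC):
    """Hypothetical interface: what any physical representation must offer."""

    @abstractmethod
    def reset(self):
        """Point at the beginning of the data set."""

    @abstractmethod
    def has_next(self):
        """Report whether there are still data instances in the set."""

    @abstractmethod
    def next_instance(self):
        """Return the next data instance in the set."""


class InMemoryDataSet(DataSet):
    """Simplest possible implementation, backed by a Python list."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._position = 0

    def reset(self):
        self._position = 0

    def has_next(self):
        return self._position < len(self._instances)

    def next_instance(self):
        if not self.has_next():
            raise IndexError("no more data instances")
        instance = self._instances[self._position]
        self._position += 1
        return instance


def apply_rule(data_set, rule):
    """Apply a rule to every instance and collect the ones that fail it."""
    data_set.reset()
    failures = []
    while data_set.has_next():
        instance = data_set.next_instance()
        if not rule(instance):
            failures.append(instance)
    return failures
```

Because `apply_rule` only calls the three interface methods, it would work unchanged with any other class that implements `DataSet`.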

I am an object-oriented programmer at heart, so my inclination is to describe everything in terms of classes, objects, attributes and methods. In this month's column, I will provide some high-level descriptions of classes that can be used; in next month's column, I will provide more detail as to a sample code implementation, purely as a guideline for understanding approaches to enterprise information integration. There will be a class representing data instances and a hierarchy of classes representing collections of data instances.

A data instance should be able to publish the names and data types of the elements composing the instance, as well as present the value of any of the data elements contained within. Should we desire to allow modification of the values, we might provide a method for updating a data element. The value of defining a standard data instance interface is that we can program our data set classes to always deliver records in the same class, which in turn simplifies the application.
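One possible shape for such a data instance class is sketched below in Python. The class name `DataInstance` and its method names are hypothetical, and the sketch assumes that element types can be represented by ordinary Python types.

```python
class DataInstance:
    """Hypothetical standard representation of a single record."""

    def __init__(self, schema, values):
        # schema maps element names to Python types; values maps the same
        # names to the element values.
        if set(schema) != set(values):
            raise ValueError("values must supply exactly the schema's elements")
        self._schema = dict(schema)
        self._values = {}
        for name in self._schema:
            self.set(name, values[name])

    def element_names(self):
        """Publish the names of the elements composing the instance."""
        return list(self._schema)

    def element_type(self, name):
        """Publish the data type of one element."""
        return self._schema[name]

    def get(self, name):
        """Present the value of one element."""
        return self._values[name]

    def set(self, name, value):
        """Update one element, rejecting values of the wrong type."""
        expected = self._schema[name]  # raises KeyError for unknown names
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}")
        self._values[name] = value
```

An application written against `element_names`, `element_type`, `get` and `set` never needs to know which kind of data set produced the instance.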

A data set should have high level summary information, such as the number of records in the set, the maximum and minimum data instance sizes, etc., as well as some kind of iterating pointer that can be reset to the first data instance in the set. One should be able to determine whether there are still more data instances in the set, and if so, be able to access the next data instance available. Lastly, each data set class should transparently deliver data instances in the standard object representation described in the previous paragraph.
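A minimal Python sketch of a data set with summary information and a resettable iterating pointer follows. The class name `SummarizedDataSet` and its methods are hypothetical. "Instance size" is assumed here to mean the total number of characters across an instance's values, which is only one of several reasonable definitions.

```python
class SummarizedDataSet:
    """Hypothetical data set holding records as dictionaries in memory."""

    def __init__(self, records):
        self._records = [dict(record) for record in records]
        self._cursor = 0

    @staticmethod
    def _size(record):
        # Assumed definition of size: characters across all element values.
        return sum(len(str(value)) for value in record.values())

    # Summary information
    def record_count(self):
        return len(self._records)

    def max_instance_size(self):
        return max((self._size(r) for r in self._records), default=0)

    def min_instance_size(self):
        return min((self._size(r) for r in self._records), default=0)

    # Iterating pointer
    def reset(self):
        """Move the pointer back to the first data instance."""
        self._cursor = 0

    def has_next(self):
        return self._cursor < len(self._records)

    def next_instance(self):
        if not self.has_next():
            raise IndexError("no more data instances")
        record = self._records[self._cursor]
        self._cursor += 1
        return record
```

A short usage example:

```python
customers = SummarizedDataSet([
    {"First Name": "Ada", "Last Name": "Lovelace"},
    {"First Name": "Alan", "Last Name": "Turing"},
])
customers.reset()
while customers.has_next():
    print(customers.next_instance()["Last Name"])
```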

Conceptually this is great, until you recall the many different ways that we store or exchange data sets. By developing an application interface based on the functionality described, however, we can structure a class hierarchy that implements that interface and still remains transparent; this is the magic of the object-oriented approach. We might see data sets in RDBMSs, flat files or XML documents. Those flat files might be separated-element files, such as comma-separated value (CSV) files, or they may be fixed-format files; we may see a variety of database systems as well. Yet we can derive classes within a logical hierarchy that allows us to target each potential data source without requiring a significant code implementation, as is seen in Figure 1.
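As a rough illustration of how little code each derived class might need, the Python sketch below covers only the flat-file branch of such a hierarchy. The class names `FlatFileDataSet`, `CsvDataSet` and `FixedWidthDataSet` and the helper `read_all` are hypothetical and are not taken from Figure 1. The sketch reads from in-memory text rather than real files, and it omits the RDBMS and XML classes entirely.

```python
import csv
from abc import ABC, abstractmethod


class FlatFileDataSet(ABC):
    """Shared iteration logic; subclasses only say how a line is split."""

    def __init__(self, text, names):
        self._lines = [line for line in text.splitlines() if line.strip()]
        self._names = list(names)
        self._cursor = 0

    def reset(self):
        self._cursor = 0

    def has_next(self):
        return self._cursor < len(self._lines)

    def next_instance(self):
        if not self.has_next():
            raise IndexError("no more data instances")
        line = self._lines[self._cursor]
        self._cursor += 1
        # Every subclass delivers the same representation: name -> value.
        return dict(zip(self._names, self._split(line)))

    @abstractmethod
    def _split(self, line):
        """Turn one line of text into a list of element values."""


class CsvDataSet(FlatFileDataSet):
    def _split(self, line):
        return next(csv.reader([line]))


class FixedWidthDataSet(FlatFileDataSet):
    def __init__(self, text, names, widths):
        super().__init__(text, names)
        self._widths = list(widths)

    def _split(self, line):
        fields, start = [], 0
        for width in self._widths:
            fields.append(line[start:start + width].strip())
            start += width
        return fields


def read_all(data_set):
    """Client code that depends only on the shared interface."""
    data_set.reset()
    records = []
    while data_set.has_next():
        records.append(data_set.next_instance())
    return records
```

Each concrete class overrides a single method, and `read_all` cannot tell which one it was given.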

Next month, we will look at these classes a little more carefully and begin to see how creating standardized class representations simplifies information exchange as well as application of business rules. 




David Loshin is the president of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Enterprise Knowledge Management - The Data Quality Approach (Morgan Kaufmann, 2001) and Business Intelligence - The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. Loshin may be reached at loshin@knowledge-integrity.com.



SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.