The Enterprise:
Technologies for EAI: Data Content Analysis

Column published in DM Review Magazine
November 2000 Issue
By Clive Finkelstein

One of the biggest problems facing enterprises today is integrating application systems and databases within and across enterprises. These may be legacy databases and systems that were developed years ago for a specific purpose and are still used very effectively in the enterprise. Or they may be recently developed, but difficult to integrate with other databases and systems that contain much the same data. All of these may hold redundant versions of the same data, each of which must be kept up to date with every change so that all versions remain current.

In this and the following columns, we will discuss various technologies, including XML, that assist with enterprise application integration (EAI).

The first approach applies data content analysis to live databases or files to reverse engineer normalized, third normal form (3NF) data structures and database designs directly from the live data content. EAI can then be achieved more effectively than by working with the unnormalized databases and files. The second approach is based on XML, which is used to expose all aspects of databases, including business rules; this is called inter-enterprise data integration. The third approach analyzes the implicit relationships between tables in databases developed by the enterprise, as well as in databases developed by enterprise resource planning (ERP) vendors such as SAP, Baan and others; this is called hyperrelational analysis. We will start with data content analysis.

Data Content Analysis

We have all encountered legacy databases and application systems that were developed many years ago, but whose database and application designs were never documented. Other databases were originally documented, but changes have since been made to the applications or the databases, and those changes were never reflected in the documentation. As a result, little is known today about the structure of these databases.

Of course, it is possible to reverse engineer these undocumented legacy databases with CASE modeling tools to determine their structure. These tools extract details about the tables and columns that make up a database from the database catalog. With this knowledge of the database structure, legacy database designs can be integrated with other databases and then reengineered for new database environments. The problem becomes more complex, however, when the databases to be reengineered were deliberately left unnormalized for performance.
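As a rough illustration of what catalog-based reverse engineering can recover, here is a minimal sketch that uses SQLite as a stand-in for a legacy DBMS. It reads the `sqlite_master` catalog table and `PRAGMA table_info` to list each table's columns, declared types and primary-key flags. The function name `describe_schema` is hypothetical; CASE tools query their own DBMS's catalog, not necessarily in this way.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str, bool]]]:
    """Read table and column metadata from SQLite's system catalog.

    Returns {table: [(column, declared_type, is_primary_key), ...]}.
    """
    schema: dict[str, list[tuple[str, str, bool]]] = {}
    # sqlite_master lists every table; other DBMSs expose the same kind of
    # metadata through their own catalog tables or views.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            schema.setdefault(table, []).append((name, col_type, pk > 0))
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical denormalized legacy table.
    conn.execute(
        "CREATE TABLE legacy_orders (order_no INTEGER PRIMARY KEY, cust_name TEXT, "
        "cust_address TEXT, product TEXT, price REAL)"
    )
    print(describe_schema(conn))
```

The catalog reports that `cust_address` is a TEXT column of `legacy_orders`, but says nothing about whether it depends on `order_no` or on `cust_name`. That missing dependency information is exactly what makes unnormalized databases hard to reengineer from the catalog alone.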

You know the problem. Many legacy databases did not store customer, order or product details only in the relevant customer, order or product tables as normalized data. Instead, these details were combined in common tables as unnormalized data, in the hope of avoiding perceived performance problems. This may indeed have improved database performance, but often at the expense of creating redundant data versions throughout the enterprise. The problem emerges when redundant data changes. For example, if a customer's address or a product price changes, every redundant version has to be updated so that all versions reflect the same status of the data.

EAI brings all of these redundant data versions together, so that relevant customer, product or other details exist in only one place, yet can be shared throughout the enterprise. When a change occurs, it only needs to be made once. The single, updated version is then immediately available at its latest status to everyone who is authorized to use it.
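The difference between redundant and single data versions can be shown in miniature within one in-memory SQLite database. In this sketch (table names, customer and addresses are all hypothetical), the unnormalized table copies the customer's address onto every order, so an update that reaches only one copy leaves two versions. In the normalized design the address is stored once, and one update is seen by every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: the customer's address is copied onto every order row.
conn.execute("CREATE TABLE orders_flat (order_no INTEGER PRIMARY KEY, customer TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Acme", "100 Fillmore Street"), (2, "Acme", "100 Fillmore Street")],
)
# An update that reaches only one copy leaves two versions of the same fact.
conn.execute("UPDATE orders_flat SET address = '200 Market Street' WHERE order_no = 1")
flat_versions = conn.execute(
    "SELECT COUNT(DISTINCT address) FROM orders_flat WHERE customer = 'Acme'"
).fetchone()[0]

# Normalized: the address is stored once, in the customer table.
conn.execute("CREATE TABLE customer (customer TEXT PRIMARY KEY, address TEXT)")
conn.execute(
    "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, "
    "customer TEXT REFERENCES customer (customer))"
)
conn.execute("INSERT INTO customer VALUES ('Acme', '100 Fillmore Street')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Acme"), (2, "Acme")])
# One update; every order sees the new address through the join.
conn.execute("UPDATE customer SET address = '200 Market Street' WHERE customer = 'Acme'")
joined = conn.execute(
    "SELECT o.order_no, c.address FROM orders AS o "
    "JOIN customer AS c ON o.customer = c.customer ORDER BY o.order_no"
).fetchall()

print(flat_versions)  # 2: one customer, two conflicting addresses
print(joined)
```

Across separate systems the same principle applies, but the single version has to be shared through integration rather than through a join inside one database.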

Consider what happens when these problems occur together: unnormalized data versions that exist redundantly throughout the enterprise, plus no documentation of those unnormalized database designs. Resolving this requires an enormous effort. Examining the database catalogs and the live data content to infer data dependencies and derive normalized database designs for EAI is largely a manual task.

Fortunately, new technologies based on data content analysis are emerging to assist this analysis. Products such as Axio from Evoke Software analyze live databases to infer data dependencies. First, all the data values in a column are analyzed for consistency and data quality. For example, the same address column may contain values that appear to differ but refer to the same address, such as 100 Fillmore and 100 Fillmore Street. When such quality problems are detected, the differing values can be changed so that only consistent values exist (for example, using only 100 Fillmore Street).
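A column-level consistency check of this kind can be sketched as follows. The rules are assumptions chosen for illustration, not Axio's: values are grouped by a hypothetical match key that ignores case, extra spaces and a trailing "Street" or "St." suffix, and each group with more than one spelling is mapped to its most frequent spelling, breaking ties in favor of the longest one.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rule: a trailing "Street", "St" or "St." is ignored when matching.
_SUFFIX = re.compile(r"\s+(street|st\.?)$", re.IGNORECASE)


def match_key(value: str) -> str:
    """Reduce an address to a comparison key (case, spacing and street suffix ignored)."""
    collapsed = " ".join(value.split()).lower()
    return _SUFFIX.sub("", collapsed)


def find_variants(values: list[str]) -> dict[str, str]:
    """Map each inconsistent spelling in a column to a chosen canonical spelling.

    The canonical form is the most frequent spelling in its group; ties go to
    the longest (most complete) spelling.
    """
    groups: dict[str, Counter[str]] = defaultdict(Counter)
    for v in values:
        groups[match_key(v)][v] += 1
    fixes: dict[str, str] = {}
    for spellings in groups.values():
        if len(spellings) < 2:
            continue  # only one spelling in this group: already consistent
        canonical = max(spellings, key=lambda s: (spellings[s], len(s)))
        for s in spellings:
            if s != canonical:
                fixes[s] = canonical
    return fixes


if __name__ == "__main__":
    column = ["100 Fillmore", "100 Fillmore Street", "100 Fillmore Street", "9 Pine Road"]
    print(find_variants(column))
```

For that sample column, `find_variants` proposes replacing 100 Fillmore with 100 Fillmore Street, the consistent value used in the example above. Real address standardization needs far richer rules (street types, abbreviations, unit numbers), and proposed changes should be reviewed before they are applied to live data.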

Many products are available to assist with this data quality analysis. Evoke Axio, however, takes the analysis further. It also examines the data values in each row of a table to identify columns whose values depend on the values of other columns in the same row. This dependency analysis identifies possible primary and foreign keys and enables those columns to be normalized to 3NF. It eliminates data redundancy by deriving 3NF database designs and 3NF data models from the live data content of the database. The end result is the automatic generation of 3NF data definition language (DDL) schema scripts that install the 3NF databases using appropriate database management system (DBMS) products. In this way it resurrects undocumented legacy databases whose designs may otherwise have been lost forever and, in turn, enables more accurate EAI.
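The row-level dependency analysis can be sketched in Python under several simplifying assumptions; the column does not describe Axio's internals, so this is not its algorithm. The sketch treats the first column whose values are all distinct as the primary key. It tests every other pair of columns for a single-column functional dependency (each value of A always occurs with the same value of B). Each group of columns determined by a non-key column is moved into its own table, and CREATE TABLE DDL is emitted with a foreign key back to it. The sample rows, the `_table` naming convention and the type mapping are all hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Hypothetical sample of a denormalized legacy table.
SAMPLE: list[Row] = [
    {"order_no": 1, "cust_name": "Acme", "cust_address": "100 Fillmore Street", "product": "Widget", "price": 2.50},
    {"order_no": 2, "cust_name": "Acme", "cust_address": "100 Fillmore Street", "product": "Gadget", "price": 4.00},
    {"order_no": 3, "cust_name": "Bolt Co", "cust_address": "9 Pine Road", "product": "Widget", "price": 2.50},
    {"order_no": 4, "cust_name": "Bolt Co", "cust_address": "9 Pine Road", "product": "Bracket", "price": 2.50},
]


def determines(rows: list[Row], lhs: str, rhs: str) -> bool:
    """True if each value of lhs occurs with exactly one value of rhs in these rows."""
    seen: dict[Any, Any] = {}
    for row in rows:
        a, b = row[lhs], row[rhs]
        if a in seen and seen[a] != b:
            return False  # same lhs value paired with two different rhs values
        seen[a] = b
    return True


def is_unique(rows: list[Row], col: str) -> bool:
    """A column with no repeated and no missing values is a candidate key."""
    values = [row[col] for row in rows]
    return None not in values and len(set(values)) == len(values)


def sql_type(value: Any) -> str:
    """Crude, assumed mapping from a sample Python value to a SQL column type."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"


def propose_tables(table: str, rows: list[Row]) -> str:
    """Infer single-column dependencies from the rows and emit CREATE TABLE DDL."""
    columns = list(rows[0])
    types = {c: sql_type(rows[0][c]) for c in columns}
    key: Optional[str] = next((c for c in columns if is_unique(rows, c)), None)
    if key is None:
        raise ValueError("no single-column candidate key found in the sample")

    moved: dict[str, list[str]] = {}  # non-key determinant -> columns it determines
    done = {key}                      # key, determinants, and columns already moved
    for lhs in columns:
        if lhs in done or is_unique(rows, lhs):
            continue  # keys (including alternate keys) trivially determine everything
        deps = [rhs for rhs in columns
                if rhs not in done and rhs != lhs and determines(rows, lhs, rhs)]
        if deps:
            moved[lhs] = deps
            done.update(deps)
            done.add(lhs)

    ddl: list[str] = []
    for det, deps in moved.items():
        cols = [f"  {det} {types[det]} PRIMARY KEY"] + [f"  {d} {types[d]}" for d in deps]
        ddl.append(f"CREATE TABLE {det}_table (\n" + ",\n".join(cols) + "\n);")

    gone = {d for deps in moved.values() for d in deps}
    main_cols = []
    for c in columns:
        if c in gone:
            continue  # now stored once, in its determinant's table
        if c == key:
            main_cols.append(f"  {c} {types[c]} PRIMARY KEY")
        elif c in moved:
            main_cols.append(f"  {c} {types[c]} REFERENCES {c}_table ({c})")
        else:
            main_cols.append(f"  {c} {types[c]}")
    ddl.append(f"CREATE TABLE {table} (\n" + ",\n".join(main_cols) + "\n);")
    return "\n".join(ddl)


if __name__ == "__main__":
    print(propose_tables("legacy_orders", SAMPLE))
```

On the hypothetical sample, the sketch finds that `cust_name` determines `cust_address` and `product` determines `price`. It proposes `cust_name_table`, `product_table` and a slimmed `legacy_orders` that references both. It handles only single-column determinants and does not untangle longer dependency chains, so the result is a starting point for review rather than a guaranteed 3NF design. Dependencies inferred from a sample also hold only for that sample; a designer still has to confirm each one before installing the generated DDL.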



Clive Finkelstein, the father of information engineering (IE), is an international consultant and an instructor. He is the managing director of Information Engineering Services Pty Ltd (IES) in Australia. You may contact Clive Finkelstein by e-mail at cfink@ies.aust.com.


