Data Warehousing Lessons Learned:
The Impact and Cost of Information Quality Lapses

Column published in DM Review Magazine, October 2004 issue

By Lou Agosta

Responses to the TDWI-Forrester Quarterly Technology Survey (May 2004) indicate that information and data quality defects fall into a handful of broad categories (see Figure 1). Respondents identified three or four of these issues as applying to their own firms, and the issues correspond to the major dimensions of Forrester's working definition of information quality: information = objective (data) + usable (data) + trustworthy (data). As attributes of the data are structured by a defined transformation process along the three high-level dimensions of objectivity, usability and trustworthiness, information quality improves in precisely those dimensions; conversely, the definition becomes vivid when quality breaks down along one of them. Abstracting from and interpreting the survey data, the top three groups of information quality issues are:


Figure 1: Responses to TDWI-Forrester Quarterly Technology Survey

Loss of objective validity. The data in the system does not agree with what is the case in the world; it is not accurate. Mail and packages returned due to incorrect customer contact data are reported by some 20 percent of respondents (Figure 1). Lack of objectivity also shows up as empty, missing or garbage-filled data elements in master files and as corrupted databases, cited by a whopping 50 percent and 25 percent of respondents, respectively. Duplicate and overlapping entries in customer and product master files weigh in at 37 percent and are also a clear misfit with objective reality: there is one customer "out there," but the system contains multiple representations, rows or instances. Data integrity is the most important product of the data administration function, and its loss is felt most keenly as the loss of objective validity.
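To make the duplicate problem concrete, the following minimal sketch (not from the column; the field names and normalization rules are invented for the example) collapses trivially different representations of the same customer onto one match key:

```python
# Illustrative sketch: flag duplicate and overlapping customer-master
# entries by building a normalized match key. Field names and rules
# are assumptions for the example, not survey artifacts.
import re
from collections import defaultdict

def match_key(record):
    """Collapse case, punctuation and whitespace so trivially different
    representations of the same customer collide on one key."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    zip5 = (record.get("zip") or "")[:5]
    return (name, zip5)

def find_duplicates(records):
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    # Any key with more than one record is a candidate duplicate cluster.
    return [cluster for cluster in groups.values() if len(cluster) > 1]

customers = [
    {"id": 1, "name": "Acme Corp.", "zip": "60601-1234"},
    {"id": 2, "name": "ACME CORP",  "zip": "60601"},
    {"id": 3, "name": "Widget Inc", "zip": "10001"},
]
for cluster in find_duplicates(customers):
    print("possible duplicates:", [r["id"] for r in cluster])
```

Production master-file matching adds fuzzy comparison and survivorship rules, but even this crude key exposes the "one customer, many rows" mismatch.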

Loss of procedural controls. For example, the data in the file is perfectly accurate, but it is loaded into the database twice. This is the case where a defined procedure exists but is incorrectly carried out in practice, undermining the framework on which information quality rests. The high percentages reported for loading duplicate files and for rerunning jobs due to incorrect input (some 33 percent and 53 percent, respectively) are shocking. Though common and extremely costly at such volumes, these errors are remediable with modest attention to basic operating methods; correcting them is low-hanging fruit. Defining, scripting and automating procedures in data center operations will significantly reduce or eliminate clumsy errors in handing off data between systems and within processes.
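One such automated safeguard can be sketched in a few lines. This is an illustrative, assumed procedure (the registry file and load hook are hypothetical, not anything the survey describes): a load script that refuses to process a byte-identical input file twice.

```python
# Assumed sketch of an idempotent load guard: record a checksum of
# every file already processed, and skip any file whose checksum
# has been seen before.
import hashlib
import os

REGISTRY = "loaded_files.log"  # hypothetical checksum registry

def checksum(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def already_loaded(digest):
    if not os.path.exists(REGISTRY):
        return False
    with open(REGISTRY) as f:
        return digest in {line.strip() for line in f}

def load_file(path, load_fn):
    digest = checksum(path)
    if already_loaded(digest):
        print(f"skip {path}: identical file already loaded")
        return
    load_fn(path)                 # the actual load step
    with open(REGISTRY, "a") as f:
        f.write(digest + "\n")    # recorded only after a successful load
```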

Loss of trust. System users report loss of trust after unfavorable experiences with the reliability of the data. Inconsistent and conflicting data from different reporting processes damages the credibility of the entire IT organization. When different reports from different systems say different things about the same set of transactions, the opposite of information is produced: uncertainty, that is, chaos.
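To tie the three failure modes back to the working definition above (information = objective + usable + trustworthy), here is a minimal scoring sketch; the toy rows, garbage tokens and per-dimension checks are assumptions for illustration only:

```python
# Minimal sketch of the working definition, with assumed checks:
# each dimension is scored 0..1 over a toy customer table, so
# "information quality" degrades exactly when a dimension degrades.
GARBAGE = {"", "N/A", "XXX", "?"}

rows = [
    {"name": "Acme Corp", "zip": "60601", "source_a_total": 100, "source_b_total": 100},
    {"name": "XXX",       "zip": "",      "source_a_total": 250, "source_b_total": 190},
]

def objectivity(rows):
    # Share of rows whose key fields hold real, non-garbage values.
    ok = sum(1 for r in rows if r["name"] not in GARBAGE and r["zip"] not in GARBAGE)
    return ok / len(rows)

def usability(rows):
    # Share of rows whose ZIP parses as a five-digit code.
    ok = sum(1 for r in rows if r["zip"].isdigit() and len(r["zip"]) == 5)
    return ok / len(rows)

def trustworthiness(rows):
    # Share of rows on which two reporting systems agree.
    ok = sum(1 for r in rows if r["source_a_total"] == r["source_b_total"])
    return ok / len(rows)

for dim in (objectivity, usability, trustworthiness):
    print(f"{dim.__name__}: {dim(rows):.2f}")
```

Scored this way, each survey category drags down exactly one dimension: garbage fields reduce objectivity, unparseable fields reduce usability, and disagreeing systems reduce trustworthiness.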

Information Quality Defects Are Expensive

The impact of these unscheduled system events is significant: the flow of information is disrupted and, with it, business is interrupted. Though the survey did not capture explicit dollar costs, the precise and granular scope of the issues makes those costs straightforward to infer:

The cost of inaccurate data. The loss of data integrity and database corruption are showstoppers. They consume the time of valuable staff (database administrators) who must perform system archaeology to discover the source of the corruption, or it will keep happening at even greater cost. In another example, postal penalties for returned packages and misdirected mail are well documented and readily calculated. The 50 percent of respondents who report data elements that are empty or garbage-filled do not use those fields in their applications; such fields are perfect examples of digital scrap, wasting disk space and processor cycles every time the meaningless data is backed up or reorganized.
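How much of a master file is digital scrap can be measured directly. A hedged profiling sketch, with an assumed list of garbage tokens and invented sample rows:

```python
# Illustrative profiling sketch (field list and garbage tokens are
# assumptions): measure the fraction of empty or garbage-filled
# values per field to size the "digital scrap" in a master file.
GARBAGE_TOKENS = {"", "N/A", "NULL", "XXX", "UNKNOWN"}

def scrap_rate(rows, fields):
    """Return the fraction of empty/garbage values per field."""
    rates = {}
    for field in fields:
        bad = sum(1 for r in rows
                  if str(r.get(field, "")).strip().upper() in GARBAGE_TOKENS)
        rates[field] = bad / len(rows)
    return rates

master = [
    {"cust_id": 1, "phone": "312-555-0101", "fax": ""},
    {"cust_id": 2, "phone": "N/A",          "fax": "XXX"},
    {"cust_id": 3, "phone": "212-555-0199", "fax": ""},
]
for field, rate in scrap_rate(master, ["phone", "fax"]).items():
    print(f"{field}: {rate:.0%} scrap")
```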

The cost of uncontrolled procedures. Rerun jobs consume processor (CPU) cycles, disk and network resources, and the time of the IT operations staff who must disentangle the failed processes and then babysit the recovery effort. The cost of restoring the integrity of a database into which duplicates have been selectively loaded can be prohibitive; in some instances, it is necessary to write a one-shot, custom application to undo the damage.
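The shape of such a one-shot repair, sketched under the assumption (made for the example) that the duplicated batch rows are identical and keyed by a transaction ID:

```python
# Hedged sketch of a one-shot repair: a batch was loaded twice, so
# every logical row appears twice; keep the first copy of each and
# drop the rest. The key fields are assumptions for the example.
def dedupe(rows, key_fields):
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            continue          # duplicate from the second load; drop it
        seen.add(key)
        kept.append(row)
    return kept

loaded = [
    {"txn_id": 9001, "amount": 25.00},
    {"txn_id": 9002, "amount": 40.00},
    {"txn_id": 9001, "amount": 25.00},  # second load of the same batch
    {"txn_id": 9002, "amount": 40.00},
]
clean = dedupe(loaded, ["txn_id"])
print(len(loaded), "->", len(clean), "rows")  # 4 -> 2
```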

The cost of loss of trust. The loss of trust in reports results in the marching and countermarching of large numbers of people in multiple departments as decisions are delayed, meetings are called and analysis paralysis looms. In serious cases, the cost risks growing coextensive with the value of the entire enterprise as the time and effort of staff, managers and executives dissolves into thrashing, finger-pointing and hand-wringing over lost opportunities.

Though clients rarely call Forrester to volunteer horror stories about the costs of their most damaging information quality mistakes, we estimate that the difference between an entry-level enterprise that addresses information quality through heroics and an enterprise with a defined, repeatable, metrics-enabled process for continuous information quality improvement can be as much as a full order of magnitude.

...............................................................................

For more information on related topics, visit the Data Quality portal.

Lou Agosta is the lead industry analyst at Forrester Research, Inc. covering data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to lagosta@acm.org.

 

 
