Data Warehousing Lessons Learned:
The Costs of Information and Data Quality Defects

Column published in DM Review Magazine, May 2003 issue

By Lou Agosta

Calculating the costs of information and data quality defects is important in assessing the priority of quality improvement initiatives. The costs of poor quality information line up with the three main categories of defects: representational, procedural and judgmental.

Representational information quality refers to how accurately the IT system represents the business reality it is supposed to capture. Poor-quality information and defective data simply do not line up with that reality. Redundant data storage and processing costs are often cited as the paradigm of representational costs within the IT data center. If the system contains a date field that is ambiguous because the data is incompletely represented (e.g., a missing century), one obvious cost is that of fixing the system to represent the context accurately. The Y2K date defect reportedly required approximately $600 billion to correct. Rework, digital scrap and inefficiencies in data center operations are significant contributors to these costs.
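To make the century ambiguity concrete, here is a minimal Python sketch of the kind of windowing (pivot-year) rule Y2K remediation projects applied; the pivot value of 50 and the sample values are illustrative assumptions, not figures from any particular system.

# A two-digit year is a representational defect: the stored value "03"
# cannot say whether it means 1903 or 2003. A windowing rule picks a
# century based on an assumed pivot year (50 here, purely illustrative).

def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year into a four-digit year via a pivot rule."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= pivot else 2000 + yy

print(expand_two_digit_year(3))   # -> 2003
print(expand_two_digit_year(67))  # -> 1967

The rule resolves the ambiguity only by convention; the underlying cost remains the fact that the century was never captured in the first place.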

Naturally, the cost and impact of poor data quality vary widely by industry, by company within an industry, by business process and by how one partitions the problem. Still, some generalizations are possible and useful to consider. The costs of data creation and maintenance are relatively easy to capture but are incomplete. Much of the value of data escapes narrow cost analysis and becomes visible only when, for example, the lack of a backup puts the company out of business. In calculating costs, it is essential to look beyond the representational issues and include the consequences of the inaccurate information. Hence, it is necessary to examine the procedural and judgmental aspects of defective data as well.

If an input file is loaded twice into the same database (double posted because of a faulty procedure), the consequences can be very costly in terms of both the effort to recover the database and its unavailability during the recovery period. If it is an order-entry database, the firm may effectively be "out of business" for that period. For firms that move physical goods through a supply chain, the cost of a package returned because of inaccurate shipping data is the paradigm. If the information is unusable because of the way it is presented at the GUI, the cost of the staff time spent designing a work-around, plus the extra time incurred daily in using it, is chargeable to the poor information quality. In an extreme case at NASA, the cost of a data quality defect was the entire mission: data was entered in English rather than metric units, causing the $100 million spacecraft to crash.
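As one illustration of guarding against the double-posting scenario, the following sketch records a fingerprint of each input file and refuses to load a file it has already seen; the table name, the use of SQLite and the helper names are assumptions made for the example, not a reference to any specific product.

import hashlib
import sqlite3

def file_fingerprint(path: str) -> str:
    # Hash the raw bytes of the input file to identify it uniquely.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def load_once(conn: sqlite3.Connection, path: str) -> bool:
    # The load_log table acts as an idempotency guard: a file whose
    # fingerprint is already recorded is skipped rather than reposted.
    conn.execute("CREATE TABLE IF NOT EXISTS load_log (fingerprint TEXT PRIMARY KEY)")
    try:
        conn.execute("INSERT INTO load_log (fingerprint) VALUES (?)",
                     (file_fingerprint(path),))
    except sqlite3.IntegrityError:
        return False  # already posted; do not load the records again
    # ... load the file's records into the target tables here ...
    conn.commit()
    return True

Called a second time on the same file, load_once returns False instead of double posting, which is far cheaper than recovering the database afterward.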

Costly consequences of inaccurate data abound. Management must make decisions based on inaccurate information and may not even know it. After enough representational and procedural defects have occurred, the accumulated history of bad experiences with the system in question reaches a critical mass at which the system is no longer credible. If the information quality issues are chronic, the customer will take the business elsewhere. Even if there is no product or service substitute, the customer may document complaints to the responsible regulatory agencies, resulting in increased customer service costs and costly investigations or regulatory proceedings.

If clients receive inaccurate billing statements, the cost is delayed collections: customers generally will not pay an inaccurate or unintelligible bill. In addition, the enterprise will incur increased customer service costs as clients inquire about the statements and initiate disputes. If enough payments are delayed, the firm may have to draw down its short-term credit, incurring additional interest expense that is directly traceable to the inaccuracies and quite tangible.
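A back-of-the-envelope calculation shows how tangible that interest expense is; the receivables balance, delay and borrowing rate below are illustrative assumptions, not figures from the column.

delayed_receivables = 2_000_000   # dollars tied up in disputed or delayed bills
average_delay_days = 45           # extra days before collection
annual_borrowing_rate = 0.06      # rate on the short-term credit line

# Interest expense on the credit drawn to cover the delayed collections.
interest_cost = delayed_receivables * annual_borrowing_rate * (average_delay_days / 365)
print(f"Interest expense traceable to billing defects: ${interest_cost:,.0f}")
# -> about $14,795 for this example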

In general, the cost and impact are greater the more business the firm does with a given account or customer, and greater still when the firm generates an error that hits all of its customers, or a large subset of them, across the board. The value of a million-dollar error with one customer and a $1 error with a million customers is the same; however, the coordination costs of correcting the million-person error are greater if it is necessary to communicate with each person individually. Measuring the cost of a customer lost to data defects requires a measure of the lifetime value of the customer (or account), which in turn requires aggregating a lifetime of customer transactions. For firms that have consolidated, integrated customer history in a data warehouse, this is feasible.
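The aggregation itself is straightforward once the transaction history has been consolidated; the record layout and the contribution margin in this sketch are assumptions for illustration only.

from collections import defaultdict

transactions = [                 # (customer_id, order_amount) from the warehouse
    ("C001", 1200.00), ("C001", 860.00),
    ("C002", 95.00),   ("C001", 430.00),
]
contribution_margin = 0.25       # assumed margin applied to revenue

# Lifetime value as the margin-weighted sum of each customer's history.
lifetime_value = defaultdict(float)
for customer_id, amount in transactions:
    lifetime_value[customer_id] += amount * contribution_margin

for customer_id, value in sorted(lifetime_value.items()):
    print(customer_id, round(value, 2))
# C001 622.5, C002 23.75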

When examining the costly consequences of data and information defects, remember the story "For Want of a Nail" from the days when horses and riders were used to deliver messages. For want of a nail, the horse lost its shoe, the message was not delivered, the battle was lost, the empire fell and the king was beheaded. That's a severe consequence, and all for want of a nail. Likewise, small data quality errors can sometimes produce results that are disproportionate to their size. Therefore, exercise caution in making hasty generalizations about the value of a single data element. Caution is also appropriate in guarding against data quality paranoia. If a negative scenario is highly improbable (so to speak, an uninsurable risk in a given market), management may be justified in omitting it from the design. However, this decision should be made with eyes open, mindful of possible consequences.


Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to lagosta@acm.org.

 

 
