DM Review | Covering Business Intelligence, Integration & Analytics
Plain English About Information Quality:
Defining and Measuring Accuracy

Column published in DM Review Magazine, July 2003 Issue
By Larry English

I have been amazed in recent months to find how many people have an inaccurate understanding of the information quality (IQ) characteristic of "accuracy" and how to measure it. I am not referring to practitioners, but to consultants, authors and educators who write and teach about it.

In this column, I address how to define accuracy, measure accuracy, design accuracy measurement tests and solve accuracy problems.

Accuracy is one of the most fundamental and important of all IQ characteristics. Without accuracy of data values, some processes may operate acceptably, but other processes will fail. The meaning of accuracy is, or should be, crystal clear.

Information, whether electronic or on paper, is simply a representation of real world objects or events. Data elements hold values that are facts that represent some attribute of a real world object or event. Therefore, the definition is: Accuracy is the degree to which data correctly reflects the real world object or event being described. [1]

Either the value of the attribute is correct or it is not. It is that simple. Some analog attributes, such as the weight or latitude/longitude of an object, may be correct within some allowable variation or tolerance, but that tolerance is a measure of a separate IQ characteristic: the "precision" of the value. For example, the U.S. official time provided by the United States Naval Observatory and the National Institute of Standards and Technology via www.time.gov is accurate to within 0.2 seconds. At the exact moment the screen displayed 11:49:00 CDT, the real U.S. official time could have been anywhere from 11:48:59.8 CDT to 11:49:00.2 CDT. For binary (right-or-wrong) data such as birth date and product (selling) price, however, the value is either right or wrong when compared to the object or event.
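The distinction between a tolerance check and a right-or-wrong check can be sketched in a few lines of Python. This is only an illustration: the function names, the calendar date and the birth dates are hypothetical, and only the 0.2-second tolerance and the clock times come from the example above.

```python
from datetime import date, datetime, timedelta


def within_tolerance(recorded: datetime, actual: datetime, tolerance: timedelta) -> bool:
    """Analog attribute: correct if it differs from reality by no more than the tolerance."""
    return abs(recorded - actual) <= tolerance


def exactly_correct(recorded, actual) -> bool:
    """Right-or-wrong attribute: correct only if it equals the real-world value."""
    return recorded == actual


# The calendar date is hypothetical; the clock times mirror the example above.
displayed = datetime(2003, 7, 1, 11, 49, 0)
actual = datetime(2003, 7, 1, 11, 48, 59, 800000)  # 11:48:59.8
tolerance = timedelta(seconds=0.2)

print(within_tolerance(displayed, actual, tolerance))  # True: inside the tolerance

# Hypothetical birth dates: one day off is simply wrong, not "nearly right".
print(exactly_correct(date(1960, 5, 17), date(1960, 5, 18)))  # False
```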

Kaoru Ishikawa, the great Japanese quality guru who gave us the fishbone diagram tool we use for cause-and-effect analysis, also provides the key to measuring "accuracy." When you take a sample of manufactured goods from an entire lot of products produced, you measure its quality by comparing the characteristics of the product to the data (the product specification data). [2] Therefore, for physical manufactured object quality, you measure the object and compare the measurements to the data.

You measure data accuracy by comparing the data values to the real world object or event. Accuracy of nearly all business attributes, such as person name, birth date and marital status, cannot be measured electronically with software. It can only be measured by going to the object itself, or to an observation or recording of an event, to confirm that the data values match the characteristic of the object or event. [3]

Some examples illustrate this. When in London, Diane and I frequently attend classical concerts. On a trip a few years ago, we noticed in Time Out (a weekly calendar of events) that Placido Domingo, the great tenor, was singing that Friday. Diane purchased tickets from the local ticketing service. When we arrived at the Royal Albert Hall on Friday to pick up the tickets, the hall was strangely silent. Only after we entered did we find out the concert had been on Thursday, the night before. The concert date in the calendar of events was not accurate. The date listed was a valid and reasonable date, but we missed the concert regardless. In another example, an assessment of 2,000 persons found no invalid values for marital status. However, when the persons were contacted, 23.3 percent (nearly one out of four) of those valid values of marital status were not accurate.

One technique for attempting to measure "accuracy" is to compare data to other reference data, such as postal address or change of address data, or other transaction data collected by third-party information sources. Technically, however, this does not measure accuracy as it is reflected in the real world object, but as reflected in some reference or surrogate source "considered" to be accurate. The reliability of this assessment therefore depends on the accuracy of the data in that reference source. You must know the accuracy level of the reference data to understand the confidence level and bounds (margin of error) of the accuracy level of your own data. One of my students used a data cleansing service to "cleanse" name and address data against such reference data. Afterward, a physical accuracy assessment of the results showed that 12 percent of the "cleansed" addresses still had inaccuracies, from address number errors to people no longer living at stated addresses.
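To see why the reference source's own accuracy matters, consider a rough worst-case and best-case calculation. This sketch is not taken from the column. It assumes each attribute has a single true value, so a record that matches a correct reference value is correct, and a record that disagrees with a correct reference value is wrong. The function name and the rates used are hypothetical.

```python
from typing import Tuple


def accuracy_bounds(match_rate: float, reference_accuracy: float) -> Tuple[float, float]:
    """Bound the true accuracy of your data given only two known rates.

    match_rate:         share of your records that agree with the reference source
    reference_accuracy: share of reference records that are correct to reality

    Assumption (not from the column): one true value per attribute, so
    "matches a correct reference" means correct, and "disagrees with a
    correct reference" means wrong.
    """
    # Worst case: the wrong reference records overlap the matches as much as possible.
    lower = max(0.0, match_rate + reference_accuracy - 1.0)
    # Best case: the wrong reference records overlap the mismatches as much as possible.
    upper = 1.0 - abs(match_rate - reference_accuracy)
    return lower, upper


# Hypothetical rates: 95% of records match a reference source that is 90% accurate.
lower, upper = accuracy_bounds(0.95, 0.90)
print(f"{lower:.2f} to {upper:.2f}")  # 0.85 to 0.95
```

Even with a 95 percent match rate, the data could be as little as 85 percent accurate under these assumptions, which is why a match rate against reference data should not be reported as an accuracy level.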

The message is clear. Validity, meaning conformance to a set of valid values and to defined business rules, can be measured electronically. However, accuracy cannot be measured electronically; it can only be measured through physical inspection.
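A minimal sketch of the difference, using invented records: the validity test needs only the data and a list of valid values, while the accuracy test needs values confirmed by contacting each person. The field names, the valid-value set and the sample data are all hypothetical.

```python
# Hypothetical set of valid values for the attribute.
VALID_MARITAL_STATUSES = {"single", "married", "divorced", "widowed"}


def validity_rate(records, valid_values):
    """Electronic test: share of records whose value is in the valid-value set."""
    if not records:
        raise ValueError("no records to assess")
    return sum(r["marital_status"] in valid_values for r in records) / len(records)


def accuracy_rate(records, confirmed):
    """Physical test: share of checked records that match the confirmed value.

    confirmed maps person_id to the status obtained by contacting the person;
    the software can only compare, it cannot produce these values itself.
    """
    checked = [r for r in records if r["person_id"] in confirmed]
    if not checked:
        raise ValueError("no confirmed values to compare against")
    correct = sum(r["marital_status"] == confirmed[r["person_id"]] for r in checked)
    return correct / len(checked)


# Invented data: every value is valid, but one no longer reflects reality.
records = [
    {"person_id": 1, "marital_status": "married"},
    {"person_id": 2, "marital_status": "single"},
    {"person_id": 3, "marital_status": "married"},
    {"person_id": 4, "marital_status": "widowed"},
]
confirmed = {1: "married", 2: "married", 3: "married", 4: "widowed"}

print(validity_rate(records, VALID_MARITAL_STATUSES))  # 1.0
print(accuracy_rate(records, confirmed))               # 0.75
```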

Measurement of accuracy is complicated and expensive, but it must be done. Accuracy (to reality) tests require physical comparison of the data to the real world object or event. Accuracy tests will be determined by the different categories of objects or events. People must be contacted via telephone, mail or e-mail. For physical objects, one must extract samples and measure them. For locations, one must survey and inspect the actual location. Events must be observed in real time (this measures the current process effectiveness) or recorded so the data can be confirmed. For example, measuring the accuracy of a medical insurance claim requires a qualified person to review the actual patient file at the medical provider's office. For details on how to measure accuracy, see Improving Data Warehouse and Business Information Quality, pp. 182-188.

For the best of both worlds, measure validity electronically to exploit the efficiency of electronic tests. However, also design accuracy tests and apply them to a small yet statistically valid sample, so that you measure accuracy and know the difference between the validity and the accuracy of the data. You may conduct accuracy assessments less frequently to reduce costs, but you must conduct them periodically.
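As a rough guide to what "small yet statistically valid" can mean, the sketch below uses the standard normal approximation for a proportion. It is one common approach, not necessarily the procedure described in the book cited in this column. It assumes a simple random sample from a large population, and all function names and figures are hypothetical.

```python
import math
import random
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, expected_rate=0.5):
    """Records to inspect so the estimate is within +/- margin_of_error.

    expected_rate=0.5 is the most conservative guess (largest sample).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * expected_rate * (1 - expected_rate) / margin_of_error ** 2)


def accuracy_interval(accurate_count, sample_size, confidence=0.95):
    """Confidence interval for the accuracy rate found in the inspected sample."""
    p = accurate_count / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def draw_sample(record_ids, n, seed=None):
    """Pick n distinct records at random for physical inspection."""
    return random.Random(seed).sample(list(record_ids), n)


n = required_sample_size(margin_of_error=0.05)
print(n)  # 385 records for +/- 5 points at 95% confidence

to_inspect = draw_sample(range(100_000), n, seed=42)

# Hypothetical result: 372 of 400 inspected records matched reality.
low, high = accuracy_interval(accurate_count=372, sample_size=400)
print(f"accuracy between {low:.3f} and {high:.3f}")
```

The interval, not just the point estimate, is what belongs in the assessment report, labeled as accuracy and kept separate from any validity figures.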

When you report your assessment findings, you must differentiate and correctly label validity and accuracy assessments. If knowledge-workers misinterpret a validity measurement as a measure of accuracy, they may have false expectations of the quality of their data.

The real solution to information quality problems is to conduct root-cause analysis on the types of problems you have, and then implement process improvements to eliminate recurrence of defective data. Implement processes that minimize information quality decay. Decay is the phenomenon in which characteristics of real world objects change without the corresponding data being updated in your database.

A contributing cause of our missing the Placido Domingo concert was not verifying the date of the concert when we ordered the tickets. The defect-prevention technique we now use is to verify the information (date, time and location) with the source to assure the information we have is accurate.

What do you think?

1. English, Larry. Improving Data Warehouse and Business Information Quality (IDW&BIQ). New York: Wiley & Sons, 1999. p. 147.
2. Ishikawa, Kaoru. Guide to Quality Control. Tokyo: Asian Productivity Organization, 1982. p. 109.
3. English. p. 184.



Larry P. English is president and principal of INFORMATION IMPACT International, Inc., Brentwood, Tennessee, and the author of the widely acclaimed book, Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. English is cofounder of the International Association for Information and Data Quality (www.iaidq.org). English is an internationally recognized speaker, teacher, consultant and author and may be reached at larry.english@infoimpact.com or through his Web site at www.infoimpact.com. For more on how to improve your IQ principles and techniques, and prevent your organization from wasting millions in information scrap and rework, join the IAIDQ (visit www.iaidq.org).

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.