DM Review | Covering Business Intelligence, Integration & Analytics
Ask the Experts Question and Answer


Q:  

I'm trying to benchmark data quality software. What are the most important characteristics to compare?

A:  

Sid Adelman's Answer: Data quality encompasses many characteristics of the data, including compliance with business rules, conformance to valid values, completeness (especially for mandatory fields), timeliness and referential integrity. Data should be understandable, non-conflicting and non-redundant.

As a starting point, a data quality evaluation should look for the following problems:

  • Data elements that do not correspond to the valid values
  • Missing values in mandatory fields
  • Other missing values
  • Non-unique values in fields where the values should be unique
  • Violations of business rules (for example, a negative number of dependents, or a year of birth later than the current year)
  • Invalid data types (for example, a "character" type that should be "packed decimal")
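
The checks listed above can be sketched in a few lines of code, which may help when comparing what a packaged tool does out of the box. This is only a minimal illustration, not how any particular product works: the field names (customer_id, state, dependents, birth_year), the set of valid states and the list of mandatory fields are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical rule set for illustration; none of these names come from a real tool.
VALID_STATES = {"CA", "NY", "TX"}
MANDATORY = ("customer_id", "state")


def check_records(records, today=None):
    """Return a list of (row_index, issue) pairs for a list of dict records."""
    current_year = (today or date.today()).year
    issues = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Missing values in mandatory fields
        for field in MANDATORY:
            if rec.get(field) in (None, ""):
                issues.append((i, f"missing mandatory field: {field}"))
        # Values that are not among the valid values
        state = rec.get("state")
        if state not in (None, "") and state not in VALID_STATES:
            issues.append((i, f"invalid state: {state}"))
        # Non-unique values in a field that should be unique
        cid = rec.get("customer_id")
        if cid not in (None, ""):
            if cid in seen_ids:
                issues.append((i, f"duplicate customer_id: {cid}"))
            seen_ids.add(cid)
        # Invalid data type, then a business rule on the same field
        deps = rec.get("dependents")
        if deps is not None:
            if not isinstance(deps, int) or isinstance(deps, bool):
                issues.append((i, "dependents is not an integer"))
            elif deps < 0:
                issues.append((i, "negative number of dependents"))
        # Business rule: year of birth cannot be later than the current year
        birth_year = rec.get("birth_year")
        if isinstance(birth_year, int) and birth_year > current_year:
            issues.append((i, "birth year is in the future"))
    return issues
```

Calling check_records on a list of dictionaries returns one entry per problem found, so the counts per issue type can serve as a simple baseline when benchmarking tools against the same sample data.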

Joe Oates' Answer: The tool should include these basic capabilities:

  • Domain value checking
    • Domain value checking determines whether the values in a particular column conform to formal and/or logical value rules. Many products allow the user to specify these rules. Examples of conditions the tool should flag include:
      • The column contains anything other than the valid values that have been predefined for it;
      • A social security number consists of all zeroes;
      • A retired person's age is recorded as 17.
  • Data type
    • Are there alphabetic characters in a numeric field, or vice versa?
  • Frequency counts
    • If most of a company's customers are in the United States, then a frequency count of the country column should show the U.S. as the most common value.
  • Statistical counts
    • Min
    • Max
    • Average
  • Pattern checking
    • Telephone numbers in North America should have three digits for the area code, three digits for the exchange and four digits for the subscriber number.
  • Interdependency between certain fields
    • Postal code is dependent on country, state/province and city.
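
To make these capabilities concrete, here is a rough sketch of data type checking, frequency and statistical counts, pattern checking and a field interdependency check. Everything in it is an assumption for illustration: the NNN-NNN-NNNN phone format, the tiny postal code lookup table and the function names are hypothetical and do not come from any vendor's product.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical pattern: North American numbers written as NNN-NNN-NNNN.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")

# Hypothetical lookup of valid postal codes per (country, state, city).
POSTAL_LOOKUP = {("US", "NY", "New York"): {"10001", "10002"}}


def profile_column(values):
    """Frequency counts for all values, plus min, max and average of the numeric ones."""
    numeric = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    stats = None
    if numeric:
        stats = {"min": min(numeric), "max": max(numeric), "avg": mean(numeric)}
    return {"frequencies": Counter(values), "stats": stats}


def is_numeric_text(value):
    """Data type check: True only when the string consists entirely of digits 0-9."""
    return isinstance(value, str) and re.fullmatch(r"[0-9]+", value) is not None


def phone_matches_pattern(value):
    """Pattern check: area code, exchange and subscriber number in the assumed format."""
    return PHONE_RE.fullmatch(value) is not None


def postal_code_consistent(rec):
    """Interdependency check: postal code must belong to the country, state and city."""
    key = (rec["country"], rec["state"], rec["city"])
    return rec["postal_code"] in POSTAL_LOOKUP.get(key, set())
```

For example, profile_column(["US", "US", "CA"])["frequencies"].most_common(1) gives the most frequent country and its count, which is the kind of result a frequency count in a profiling tool would report. A real tool would need a full postal reference file rather than a hand-built lookup.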

There are other things that certain tools check, but these are the basics.




Thomson Media

2005 The Thomson Corporation and DMReview.com. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.