DM Review | Covering Business Intelligence, Integration & Analytics
Ask the Experts Question and Answer


Q:  

We have a large customer database and are interested in identifying "duplicate customers." We want to establish a metric for a tolerable level of duplicates so that we can build processes for combining records and balance their cost against meeting that metric goal. We have multiple sources of customer information and need to search for and identify the appropriate customer record to process an update. What information is available on creating good metrics?

A:  

Sid Adelman's Answer: What counts as a tolerable level of duplicates depends on three factors:

  1. How automated is the process? A number of software tools can do the majority of the job.
  2. What is the cost of finding duplicates? It may be reasonably inexpensive to find 95 percent of the duplicates but very expensive to find the last 5 percent.
  3. What is the cost of not finding the duplicates? Duplicate mailings are a known cost, but the information lost by not having a complete picture of a customer could be far more expensive, and the perception of your organization as disorganized can also hurt you.
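The cost trade-off in factors 2 and 3 can be made concrete with a back-of-the-envelope calculation. All figures below are hypothetical assumptions for illustration, not data from any real project:

```python
# Hedged sketch: weighing detection cost against the cost of residual duplicates.
# Every number here is an assumed, illustrative figure.
annual_duplicates = 50_000
cost_per_residual_duplicate = 2.50   # duplicate mailings, support time, lost insight (assumed)

scenarios = [
    # (fraction of duplicates found, detection cost)
    (0.95, 40_000),    # automated tooling catches the easy 95 percent (assumed cost)
    (1.00, 140_000),   # chasing the last 5 percent requires manual research (assumed cost)
]

results = []
for coverage, detection_cost in scenarios:
    residual = annual_duplicates * (1 - coverage)
    total = detection_cost + residual * cost_per_residual_duplicate
    results.append(total)
    print(f"coverage={coverage:.0%}  total annual cost=${total:,.2f}")
```

Under these assumed numbers, stopping at 95 percent coverage is far cheaper overall, which is the point of factor 2: the tolerable duplicate level is where the marginal cost of detection exceeds the cost of the duplicates that remain.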

Joe Oates' Answer: The duplicate customer is just one of many data quality problems found in both data warehouses and transaction processing systems. I am not sure what you mean by establishing a metric.

Several companies, as well as some software products, specialize in finding duplicate customers. All of them match on certain fields and combinations of fields from the customer records. These typically include the full name or name components, the address or address components, tax ID (or Social Security number for individuals), date of birth (for individuals), etc.
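As a rough illustration of how a matching tool might score candidate pairs on those fields, here is a minimal sketch using Python's standard-library difflib. The field weights and threshold are assumptions for illustration, not values from any particular product:

```python
import difflib

def field_score(a, b):
    """Similarity of two field values in [0, 1]; empty fields score 0."""
    if not a or not b:
        return 0.0
    return difflib.SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def duplicate_score(rec_a, rec_b, weights):
    """Weighted similarity across the fields typically used for matching."""
    total = sum(weights.values())
    return sum(w * field_score(rec_a.get(f, ""), rec_b.get(f, ""))
               for f, w in weights.items()) / total

# Hypothetical weights: exact identifiers such as tax ID matter most.
WEIGHTS = {"name": 2.0, "address": 1.5, "tax_id": 3.0, "dob": 1.0}

a = {"name": "John Q. Smith", "address": "12 Oak St",
     "tax_id": "123-45-6789", "dob": "1960-01-02"}
b = {"name": "Jon Smith", "address": "12 Oak Street",
     "tax_id": "123-45-6789", "dob": "1960-01-02"}

score = duplicate_score(a, b, WEIGHTS)
# Pairs above some tuned threshold (e.g., 0.85, an assumption) get flagged as likely duplicates.
print(round(score, 2))
```

Commercial tools use far more sophisticated techniques (phonetic codes, address standardization, probabilistic matching), but the weighted field-comparison structure is the common core.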

However, as good as the above-mentioned companies and software products are, all of them leave some customer records that cannot be automatically resolved. These records must be individually researched by your own employees. Many of them will have to be altered or updated and then reentered into your customer database.

In my experience, the best you can expect from an automated solution is about 80 percent. Many organizations' results are less, sometimes much less, than 80 percent. This may be the metric you are referring to.
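One way to turn that figure into a working metric is to track the automated resolution rate of each matching run. The totals here are hypothetical:

```python
# Hypothetical totals from one run of a duplicate-matching tool (assumed numbers).
candidate_pairs = 10_000                 # pairs flagged as possible duplicates
auto_resolved = 8_100                    # merged or rejected automatically with high confidence
manual_queue = candidate_pairs - auto_resolved

auto_resolution_rate = auto_resolved / candidate_pairs
print(f"Automated resolution rate: {auto_resolution_rate:.0%}")
print(f"Pairs needing manual research: {manual_queue}")
```

Tracking this rate over time shows whether process changes are moving you toward, or away from, the roughly 80 percent ceiling described above.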






© 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.