Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events
Archived Events

White Paper Library
Research Papers

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Building Business Intelligence:
The Perfect Match - Seven Steps to a Master Data Match, Part 1

  Column published in DM Review Magazine
January 2006 Issue
  By William McKnight

The following article is excerpted from the white paper, "The Perfect Match: 7 Steps to a Match" written by Cory Shouse. Visit http://www.csiwhq.com/news/whitepaper_requests.asp for a copy of the full paper.

Master data management (MDM) is a series of processes put in place to ensure that reference data is kept up to date and coordinated across an enterprise. A solid MDM approach can have impacts on the business and quality of your systems that make it well worth the investment; however, it is no trivial undertaking. Perhaps one of the most complex and underestimated tasks associated with master data management is matching. Matching is at the core of MDM. For our purposes, let's define a "match" as finding similar customers, vendors or products in multiple disparate systems. Preparing for the match, performing the match and perfecting the results of the match can significantly increase the quality of the reference data within your organization.

In a simple example, Customer A may be represented in two disparate systems. In system 1, the customer may be known by its trade style (i.e., Paws-N-Claws Pet Sitting); however, in system 2 it may be known by its legal name (i.e., PNC, Inc.). If we want to get a good understanding of Customer A's spend across both systems, then we must have a way to match the trade style with the legal name to identify this customer as the same entity.

The value of understanding this relationship goes beyond business intelligence (BI) solutions. Enterprise resource planning (ERP) systems, customer relationship management (CRM) applications and other IT applications can all benefit from a dependable matching approach.

Preparing for the Match

Preparing for a match involves investigating the conditions of the source data, standardizing common elements and, when available, finding a partner to leverage as an external master reference.

Step 1: Clean Your Data

Clean data increases the probability of a match. Attempting a match with dirty data is similar to searching for a friend on Google by using only his or her first name. You may get results, but culling the numerous pages of results will take time. To ensure the same does not happen with MDM, take the time to investigate and understand your data source(s). Investigation of data can range from a simple count of the nulls by column to a complex statistical analysis finding tendencies in the data. A good picture of the primary keys, relationships, constraints and domain values (distincts/ranges) will later help define the rules for the match.

Step 2: Standardize Common Elements

The next step in preparing for a match is to identify and standardize common elements that will be leveraged for matching. In order for the elements to be standardized, they must exist in all files you are trying to match. These elements will be the core set used in comparing across the files. "Customer" is often the first subject area that companies attack due to the high value associated with acquiring a single view of the customer. In the case of a customer, seven elements are often leveraged as the standardized columns. These elements, known as the "Super 7," include:

  1. Business Name,
  2. Address,
  3. Industry Classification,
  4. SIC Code,
  5. Sales,
  6. Number of Employees, and
  7. Phone Number.

In the case of item/product, the standard elements might include SKU (for internal systems) or UPC, ISBN and EAN for external matching. In addition, product description, weight or even generic name can be included as required columns for matching.

Step 3: Identify External Reference Source

Finally, where available, identify an external source that can provide a common industry key. Continuing with the customer example, a Social Security number for consumers or DUNS number for companies is common. In the case of a product, it can be more difficult, as products are often unique; however, many industries have established standards. For example, the FDA created the national drug code (NDC) number to uniquely identify pharmaceutical drugs. Some important questions to consider as you look for the right partner include:

  • How often is your database updated?
  • What are the sources for the data?
  • What is coverage of the file for the specific set of data?
  • What formats are available?
  • What other value-added elements are available?

To achieve an optimal match, it may be necessary to attempt matching against multiple vendors. Provide the vendors a set of your data and see which one provides the best match. Leverage that vendor as your primary, and use the secondary vendor in cases where no match was found in the first case.

William wishes to thank Cory Shouse for his contribution to this month's column.

Cory Shouse is a senior architect with Conversion Services International. With more than 10 years of experience in business intelligence, Shouse specializes in helping companies establish, organize and deliver value to the business. He has assisted a number of Fortune 500 companies define quality assurance programs, organizational and staffing plans, change control procedures, and appropriate information and technical architectures. He may be reached at cshouse@csiwhq.com or (469) 939-5385.

In the next column, we will cover performing the match with steps four through six.



For more information on related topics visit the following related portals...
Business Intelligence (BI) and Data Quality.

William McKnight has architected and directed the development of several of the largest and most successful business intelligence programs in the world and has experience with more than 50 business intelligence programs. He is senior vice president, Data Warehousing for Conversion Services International, Inc. (CSI), a leading provider of a new category of professional services focusing on strategic consulting, data warehousing, business intelligence and information technology management solutions. McKnight is a Southwest Entrepreneur of the Year Finalist, keynote speaker, an international speaker, a best practices judge, widely quoted on BI issues in the press, an expert witness, master's level instructor, author of the Reviewnet competency exams for data warehousing and has authored more than 80 articles and white papers. He is the business intelligence expert at www.searchcrm.com. McKnight is a former Information Technology Vice President of a Best Practices Business Intelligence Program and holds an MBA from Santa Clara University. He may be reached at (214) 514-1444 or wmcknight@csiwhq.com.

View Full Issue View Full Magazine Issue
E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.