DM Review Published in DM Review in January 2006.
Printed from

Building Business Intelligence: The Perfect Match - Seven Steps to a Master Data Match, Part 1

by William McKnight  and by Cory Shouse

The following article is excerpted from the white paper, "The Perfect Match: 7 Steps to a Match" written by Cory Shouse. Visit for a copy of the full paper.

Master data management (MDM) is a series of processes put in place to ensure that reference data is kept up to date and coordinated across an enterprise. A solid MDM approach can have impacts on the business and quality of your systems that make it well worth the investment; however, it is no trivial undertaking. Perhaps one of the most complex and underestimated tasks associated with master data management is matching. Matching is at the core of MDM. For our purposes, let's define a "match" as finding similar customers, vendors or products in multiple disparate systems. Preparing for the match, performing the match and perfecting the results of the match can significantly increase the quality of the reference data within your organization.

In a simple example, Customer A may be represented in two disparate systems. In system 1, the customer may be known by its trade style (i.e., Paws-N-Claws Pet Sitting); however, in system 2 it may be known by its legal name (i.e., PNC, Inc.). If we want to get a good understanding of Customer A's spend across both systems, then we must have a way to match the trade style with the legal name to identify this customer as the same entity.

The value of understanding this relationship goes beyond business intelligence (BI) solutions. Enterprise resource planning (ERP) systems, customer relationship management (CRM) applications and other IT applications can all benefit from a dependable matching approach.

Preparing for the Match

Preparing for a match involves investigating the conditions of the source data, standardizing common elements and, when available, finding a partner to leverage as an external master reference.

Step 1: Clean Your Data

Clean data increases the probability of a match. Attempting a match with dirty data is similar to searching for a friend on Google by using only his or her first name. You may get results, but culling the numerous pages of results will take time. To ensure the same does not happen with MDM, take the time to investigate and understand your data source(s). Investigation of data can range from a simple count of the nulls by column to a complex statistical analysis finding tendencies in the data. A good picture of the primary keys, relationships, constraints and domain values (distincts/ranges) will later help define the rules for the match.

Step 2: Standardize Common Elements

The next step in preparing for a match is to identify and standardize common elements that will be leveraged for matching. In order for the elements to be standardized, they must exist in all files you are trying to match. These elements will be the core set used in comparing across the files. "Customer" is often the first subject area that companies attack due to the high value associated with acquiring a single view of the customer. In the case of a customer, seven elements are often leveraged as the standardized columns. These elements, known as the "Super 7," include:

  1. Business Name,
  2. Address,
  3. Industry Classification,
  4. SIC Code,
  5. Sales,
  6. Number of Employees, and
  7. Phone Number.

In the case of item/product, the standard elements might include SKU (for internal systems) or UPC, ISBN and EAN for external matching. In addition, product description, weight or even generic name can be included as required columns for matching.

Step 3: Identify External Reference Source

Finally, where available, identify an external source that can provide a common industry key. Continuing with the customer example, a Social Security number for consumers or DUNS number for companies is common. In the case of a product, it can be more difficult, as products are often unique; however, many industries have established standards. For example, the FDA created the national drug code (NDC) number to uniquely identify pharmaceutical drugs. Some important questions to consider as you look for the right partner include:

  • How often is your database updated?
  • What are the sources for the data?
  • What is coverage of the file for the specific set of data?
  • What formats are available?
  • What other value-added elements are available?

To achieve an optimal match, it may be necessary to attempt matching against multiple vendors. Provide the vendors a set of your data and see which one provides the best match. Leverage that vendor as your primary, and use the secondary vendor in cases where no match was found in the first case.

William wishes to thank Cory Shouse for his contribution to this month's column.

Cory Shouse is a senior architect with Conversion Services International. With more than 10 years of experience in business intelligence, Shouse specializes in helping companies establish, organize and deliver value to the business. He has assisted a number of Fortune 500 companies define quality assurance programs, organizational and staffing plans, change control procedures, and appropriate information and technical architectures. He may be reached at or (469) 939-5385.

In the next column, we will cover performing the match with steps four through six.


William McKnight has architected and directed the development of several of the largest and most successful business intelligence programs in the world and has experience with more than 50 business intelligence programs. He is senior vice president, Data Warehousing for Conversion Services International, Inc. (CSI), a leading provider of a new category of professional services focusing on strategic consulting, data warehousing, business intelligence and information technology management solutions. McKnight is a Southwest Entrepreneur of the Year Finalist, keynote speaker, an international speaker, a best practices judge, widely quoted on BI issues in the press, an expert witness, master's level instructor, author of the Reviewnet competency exams for data warehousing and has authored more than 80 articles and white papers. He is the business intelligence expert at McKnight is a former Information Technology Vice President of a Best Practices Business Intelligence Program and holds an MBA from Santa Clara University. He may be reached at (214) 514-1444 or

Cory Shouse is a senior architect with Conversion Services International. With more than 10 years experience in business intelligence, Shouse specializes in helping companies establish, organize and deliver value to the business. He has assisted a number of Fortune 500 companies define quality assurance programs, organizational and staffing plans, change control procedures, and appropriate information and technical architectures. He can be reached at (469) 939-5385 or

Copyright 2006, SourceMedia and DM Review.