DM Review | Covering Business Intelligence, Integration & Analytics
Data Integration:
Householding Basics: How to Define Relationships

Column published in DMReview.com
September 9, 2004
By Greg Mancuso and Al Moreno

A couple of months ago we wrote about the concept of householding. At that time we discussed the major concepts at a high level. This month, we'd like to expand on the first step of the householding process: defining the business rules that determine what makes up a "home." While the answer seems obvious, if you don't take the time to define the rules adequately and plan the process carefully, you will end up with records grouped incorrectly, unique records purged or no records matched at all. None of these outcomes will satisfy your project sponsor.

Before you can define the rules, you have to analyze the purpose of grouping records into a "household." Note that there is absolutely no reason a given record cannot belong to multiple households. For example, John Smith is married with two children. One household could be made up of all four people. There may also be a reason to group John Smith with his "immediate extended" family: John, his wife, his parents and his wife's parents. Further, households are not limited to the physical homes we dwell in; they may include the places where we work, shop or congregate. Step one, then, is to define the scope of the householding effort: the why.
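Because one record can belong to several households at once, household membership is naturally many-to-many rather than a single household column on each record. The following Python sketch illustrates that idea under stated assumptions: the household ids, type labels and placeholder member names (such as "Smith child 1") are hypothetical stand-ins, not data from this column.

```python
from collections import defaultdict

# Hypothetical households: id -> (household type, set of member names).
# Placeholder names stand in for family members the column does not name.
households = {
    "H1": ("nuclear family", {"John Smith", "Jane Smith", "Smith child 1", "Smith child 2"}),
    "H2": ("immediate extended family", {
        "John Smith", "Jane Smith",
        "John's father", "John's mother",
        "Jane's father", "Jane's mother",
    }),
}


def memberships(households):
    """Invert the household map so each person lists every household they belong to."""
    index = defaultdict(list)
    for household_id, (_kind, members) in sorted(households.items()):
        for person in members:
            index[person].append(household_id)
    return dict(index)


index = memberships(households)
print(index["John Smith"])     # ['H1', 'H2']
print(index["Smith child 1"])  # ['H1']
```

Storing memberships this way lets the same person appear in a nuclear-family grouping, an extended-family grouping or a workplace grouping without duplicating the person's record.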

Once the team understands the scope of the householding task, two groups need to be organized. The first is the core group of technology and business users who will be directly responsible for defining and implementing the householding rules. The second is an extended group composed mainly of business users and other subject matter experts from the organization, who will be responsible for validating the business rules, verifying the resulting household assignments and supporting the core team as issues arise. Summarized below are some of the types of rules that should be considered regardless of the project scope. How each of these rules is defined, however, depends directly on the project scope.

Rule 1: Duplicate Records

Identifying and eliminating duplicates (merge/purge) is the first step in any householding implementation. But there are multiple levels of duplicates. For personal (not business) householding, the most obvious rule for identifying duplicates is First Name|Last Name|Address. This rule identifies many duplicates and will allow you to consolidate the information. However, there are times when it is not sufficient, such as when a person moves and the addresses no longer match. Keep in mind that even in the best of times, address change notification and standardization processes may lag by several months. Therefore, John Smith residing at 123 Main Street, New York, NY and John Smith residing at 4101 West 75th Street, New York, NY will not match as the same individual. Adding another slowly changing data element, such as e-mail address, to the rule can provide an increased level of accuracy. In this case, John Smith residing at 123 Main Street, New York, NY with the e-mail address jsmith@stockbroker.com and John Smith residing at 4101 West 75th Street, New York, NY with the e-mail address jsmith@stockbroker.com will match as the same individual. Rules defining duplicates can be made increasingly complex to yield greater accuracy, but processing time increases as well, so the trade-off becomes processing time versus percent accuracy.
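The two match rules above can be sketched in Python as match keys: records sharing First Name|Last Name|Address are merged, and, optionally, so are records sharing First Name|Last Name|E-mail. Matched records are clustered with a simple union-find so that chains of matches collapse into one individual. This is a minimal illustration under stated assumptions: the `Person` fields and the `norm` helper (lowercase plus whitespace collapsing) are hypothetical, and real address standardization and name matching are far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    rec_id: int
    first: str
    last: str
    address: str
    email: str = ""


def norm(value: str) -> str:
    """Hypothetical, crude normalization: lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def find_duplicates(records, use_email=True):
    """Cluster records sharing First|Last|Address or, optionally, First|Last|E-mail."""
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # match key -> id of the first record that produced it
    for r in records:
        keys = [("address", norm(r.first), norm(r.last), norm(r.address))]
        if use_email and r.email:
            keys.append(("email", norm(r.first), norm(r.last), norm(r.email)))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], r.rec_id)
            else:
                first_seen[key] = r.rec_id

    clusters = {}
    for r in records:
        clusters.setdefault(find(r.rec_id), []).append(r.rec_id)
    return sorted(sorted(ids) for ids in clusters.values())


records = [
    Person(1, "John", "Smith", "123 Main Street, New York, NY", "jsmith@stockbroker.com"),
    Person(2, "John", "Smith", "4101 West 75th Street, New York, NY", "jsmith@stockbroker.com"),
    Person(3, "Jane", "Smith", "4101 West 75th Street, New York, NY"),
]
print(find_duplicates(records, use_email=False))  # [[1], [2], [3]]
print(find_duplicates(records))                   # [[1, 2], [3]]
```

Each additional key widens what counts as the same individual and adds work per record, which is where the processing-time versus accuracy trade-off comes from.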

Rule 2: Acceptable Duplicate Match Rate

Determine ahead of time what level of accuracy your householding project requires. The previous example demonstrates how the level of accuracy can be increased and how it depends directly on the quality of your data and the sophistication of your matching rules. The extra accuracy gained in the example above by adding the e-mail address may not be required in your circumstances. Therefore, it is incumbent on the business users on the core team to drive the unmatched rate requirement. This requirement can be based on a cost estimate (the savings achieved by reducing duplicate mailings) or on a desire to reach a given percentage of responses, which is driven by the percentage of mailings sent.
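One way the business users might check a rule against their required rate is to have the extended group hand-validate a sample of record pairs and measure how many of the confirmed duplicates the rule actually caught. The Python sketch below assumes that workflow; the function names, the sample pairs, the 90% target and the 45-cent unit cost are hypothetical, and the savings estimate simply counts one avoided mail piece per merged duplicate.

```python
def evaluate_rule(predicted_clusters, confirmed_pairs):
    """Return (match_rate, unmatched_rate) for a duplicate rule on a validated sample.

    predicted_clusters: lists of record ids that the matching rule merged together.
    confirmed_pairs: (id_a, id_b) pairs reviewers confirmed are the same individual.
    """
    if not confirmed_pairs:
        raise ValueError("need at least one confirmed duplicate pair")
    cluster_of = {}
    for idx, cluster in enumerate(predicted_clusters):
        for rec_id in cluster:
            cluster_of[rec_id] = idx
    caught = sum(
        1
        for a, b in confirmed_pairs
        if a in cluster_of and cluster_of[a] == cluster_of.get(b)
    )
    match_rate = caught / len(confirmed_pairs)
    return match_rate, 1 - match_rate


def mailing_savings_cents(total_records, distinct_individuals, cost_per_piece_cents):
    """Pieces avoided = records - distinct individuals; integer cents avoid float rounding."""
    return (total_records - distinct_individuals) * cost_per_piece_cents


# Hypothetical sample: reviewers confirmed (1, 2) and (4, 5) are duplicates,
# but the rule under test only merged records 1 and 2.
clusters = [[1, 2], [3], [4], [5]]
rate, missed = evaluate_rule(clusters, {(1, 2), (4, 5)})
print(rate, missed)  # 0.5 0.5

TARGET_MATCH_RATE = 0.9  # hypothetical requirement set by the business users
print("meets target" if rate >= TARGET_MATCH_RATE else "refine the rule")
print(mailing_savings_cents(1_000_000, 940_000, 45))  # 2700000
```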

Rule 3: Grouping Records

Once the records can be assumed to be unique, the next step in the process is to define what makes up a household, keeping in mind that a household is simply a grouping of related records. It may be a family unit, unrelated people living at the same address, members of a club, employees at a company or subsidiaries of a large corporation. For the family unit case, an obvious rule would be Last Name|Address. This rule is valid in the vast majority of circumstances: applied to the Smiths residing at 4101 West 75th Street, New York, NY, it will group both John and his wife, Jane. But what if Jane's mother, Janet Doe, lives with her daughter and son-in-law? Because last name is part of the match rule, Janet Doe would not be grouped in the Smith household. Data elements may need to be added to or removed from the rule based on your unique business requirements.
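The effect of the grouping key can be sketched in Python by passing the key in as a function. The two keys below, Last Name|Address and Address alone, and the lowercase normalization are illustrative assumptions rather than a complete grouping engine; the records mirror the Smith/Doe example.

```python
from collections import defaultdict

# (first name, last name, address) records, assumed already de-duplicated.
people = [
    ("John", "Smith", "4101 West 75th Street, New York, NY"),
    ("Jane", "Smith", "4101 West 75th Street, New York, NY"),
    ("Janet", "Doe", "4101 West 75th Street, New York, NY"),
]


def last_name_and_address(person):
    """Family-unit key: Last Name|Address."""
    _first, last, address = person
    return (last.lower(), address.lower())


def address_only(person):
    """Looser key: everyone at the same address."""
    return person[2].lower()


def group_households(records, key):
    """Group records by the given key function; returns key -> list of full names."""
    households = defaultdict(list)
    for record in records:
        households[key(record)].append(f"{record[0]} {record[1]}")
    return dict(households)


print(len(group_households(people, last_name_and_address)))  # 2: Janet Doe is split off
print(len(group_households(people, address_only)))           # 1: all three grouped
```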

Rule 4: Acceptable Grouping Rate

As with duplicate record processing, the level of accuracy achievable when grouping related records will also vary with the sophistication of the grouping rules and the quality of your data. In the example above, you may not want to group Janet Doe with the Smiths. That decision belongs to the business owners and depends on the reasons for the household groupings. Here too, it is incumbent on the business users on the core team to drive the unmatched rate requirement. If the purpose of householding is to reduce the number of mail pieces for a marketing campaign, you may want to eliminate the Last Name qualification: mail pieces are expensive, and the return may not justify the expense of two mailings to the same address, where the incoming mail is likely seen by all residents. However, if the mailing is for a political campaign, it may be desirable to send mail to each unique name at an address.
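How the campaign's purpose changes the result can be sketched in Python by counting mail pieces under the two policies described above. The policy names and the 45-cent unit cost are hypothetical; the sketch only shows that dropping the Last Name qualification reduces the pieces sent to a shared address.

```python
# (first name, last name, address) records: the Smith/Doe household from Rule 3.
people = [
    ("John", "Smith", "4101 West 75th Street, New York, NY"),
    ("Jane", "Smith", "4101 West 75th Street, New York, NY"),
    ("Janet", "Doe", "4101 West 75th Street, New York, NY"),
]

COST_PER_PIECE_CENTS = 45  # hypothetical unit cost


def mail_pieces(records, policy):
    """Count mail pieces under a mailing policy.

    "household": one piece per address (marketing; Last Name qualification dropped).
    "individual": one piece per unique name at each address (e.g. political mail).
    """
    if policy == "household":
        return len({address.lower() for _first, _last, address in records})
    if policy == "individual":
        return len({
            (first.lower(), last.lower(), address.lower())
            for first, last, address in records
        })
    raise ValueError(f"unknown policy: {policy!r}")


for policy in ("household", "individual"):
    pieces = mail_pieces(people, policy)
    print(policy, pieces, pieces * COST_PER_PIECE_CENTS)
# household 1 45
# individual 3 135
```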

Rule 5: Data Confidence Factors

This is probably the most nebulous of the rule categories. It is also likely to prove the most difficult rule to define, if not to implement. Data confidence is based on the concept that any data element used in the householding process may be invalid. To define it, you first have to consider the source of each data record. Names and addresses can often be cleansed and standardized. However, the accuracy of other types of data, such as demographic data, is not as easily guaranteed. The sources of data must be considered and a confidence factor assigned to each data record. For example, one of the best-known householders of business information is Dun & Bradstreet. If an organization is registered with D&B, it is given a unique identifier, the DUNS number, at the headquarters level. This DUNS number is provided only after D&B has contacted the company and verified the data provided. Every company location is then assigned that same DUNS number, so all office locations are known and can be matched by a unique identifier. Further, subsidiaries are given their own unique DUNS numbers, but they are also linked to their parent organization and its DUNS number. This is useful when developing householding processes that group employees of different subsidiaries under a master company by way of the parent company's DUNS number. Such verification processes let you be very confident that grouping logic built on this data will be accurate.
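One way to put confidence factors to work is to weight field agreement by how far the sources can be trusted. The Python sketch below is a hypothetical scheme, not the authors' method: the source labels, confidence values, field weights, the choice to discount by the lower of the two sources' confidence and the 0.7 threshold are all assumptions a core team would need to set and tune.

```python
# Hypothetical confidence factor per data source (0.0 to 1.0).
SOURCE_CONFIDENCE = {
    "verified_registry": 0.95,  # data verified by the provider, as D&B does
    "cleansed_crm": 0.85,       # names/addresses cleansed and standardized in-house
    "purchased_demographics": 0.5,
}

# Hypothetical weight of agreement on each field; the weights sum to 1.0.
FIELD_WEIGHTS = {"name": 0.4, "address": 0.4, "income_band": 0.2}

MATCH_THRESHOLD = 0.7  # hypothetical cut-off for treating two records as related


def match_score(rec_a, rec_b):
    """Sum weights of agreeing fields, discounted by the less trustworthy source."""
    confidence = min(SOURCE_CONFIDENCE[rec_a["source"]], SOURCE_CONFIDENCE[rec_b["source"]])
    agreement = sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if rec_a.get(field) is not None and rec_a.get(field) == rec_b.get(field)
    )
    return agreement * confidence


crm = {"source": "cleansed_crm", "name": "john smith",
       "address": "4101 west 75th street, new york, ny", "income_band": "C"}
crm_copy = dict(crm)
purchased = dict(crm, source="purchased_demographics")

print(match_score(crm, crm_copy) >= MATCH_THRESHOLD)   # True
print(match_score(crm, purchased) >= MATCH_THRESHOLD)  # False
```

Here identical field values still fall below the threshold when one side comes from a low-confidence source, which is the behavior a data confidence rule is meant to produce.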

While there are other categories of rules to consider when planning your householding process, this column has focused on the initial rules of duplicate record identification and common record grouping. In future columns we will look at the more sophisticated rules that must be defined to complete the process. Defining the rules, and having the ability in place to verify, validate and use the grouped information, is the key to successful householding logic.



Greg Mancuso and Al Moreno are principals with Sinecon, a business intelligence consultancy specializing in data integration and BI/DW solution architecture design. Together they have more than 29 years of data warehouse and business intelligence experience and have implemented many large-scale solutions in both the U.S. and European markets. They may be reached at gmancuso@sinecon-llc.com or amoreno@sinecon-llc.com.

(c) 2006 DM Review and SourceMedia, Inc. All rights reserved.