Plain English About Information Quality:
Thanks to a reader who inspired this month's column by asking about the meaning of the terms "standardization" and "remediation" as applied to the work of creating data format standards for the data element "Organization Name."
Two problems exist with an attribute such as Organization Name. (While this column discusses Organization Name, the principles also apply to other formatted text fields such as Product Name, Asset Name, Facility Name and other structured text attributes.)
The problems this creates are horrific. Databases with 20 to 40 percent duplication are not unheard of. The real problem is the organization's inability to understand its customers or partners, along with the process failures and complexity that result from having multiple records represent a single real-world object.
The problems of matching and consolidating records across multiple source files are exceedingly complex, and they are compounded when the source files also store the data in different lengths and formats.
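One common way to attack the matching problem is to reduce each name to a normalized match key and group records that share a key; the share of redundant records then gives a rough duplication rate. The sketch below assumes deliberately simple normalization rules (case, punctuation, "&", and a hypothetical list of legal-form suffixes). Real matching usually needs much more, such as abbreviation tables and fuzzy comparison, which is precisely why it is so complex.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form tokens to ignore when matching; a real
# enterprise standard would define this list and its abbreviations.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "ltd", "limited", "llc"}

def match_key(name: str) -> str:
    """Reduce an organization name to a crude key for duplicate detection."""
    # Lowercase, spell out '&', and turn any other punctuation into spaces.
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    tokens = text.split()
    # Drop trailing legal-form tokens such as "Inc" or "Corp".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def group_duplicates(records):
    """Group (record_id, name) pairs by match key; keep groups with >1 member."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[match_key(name)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

def duplication_rate(records) -> float:
    """Fraction of records that are redundant copies of another record's key."""
    if not records:
        return 0.0
    distinct_keys = {match_key(name) for _, name in records}
    return (len(records) - len(distinct_keys)) / len(records)
```

For example, "Acme Corp.", "ACME Corporation" and "Acme, Inc." all reduce to the key "acme", so three records collapse to one real-world organization.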
The reader's question was this: when is creating a common name value for the different occurrences really "standardization," and when is it "remediation"? To answer this, we must define the following terms:
Standardization means defining the data, and the process that creates it, so that the resulting data minimizes subsequent scrap and rework. To accomplish this, we may establish "standards" in one area of control, such as a data warehouse, and then improve the upstream processes to comply with the new standards.
To qualify as true data standards, they must apply across the entire enterprise or scope of the data impact. If each business area has different "standards" for creating organization name, does the enterprise have data standards? Let's look at a business analogy. If every business area created its own standard budget accounts for developing and reporting on its budget expenditures, does the enterprise have a standard general ledger chart of accounts? Of course not. Is there an enterprise standard format for defining new budget codes? You bet. Do business area managers follow that when creating codes for new accounts? Absolutely.
The same is true for data standards. Data "standards" are true standards only if they apply to the full scope of the organization affected by that information type, whether it is organization, product or asset.
The root cause of inconsistent organization-name formats is twofold: 1) failure to recognize that organization name is not a single-value fact, and therefore to model and design the database properly; and 2) failure to standardize the data-creation processes and to provide standards and guidelines, including training, for correct data creation.
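One way to read the first root cause is that a single free-text name field bundles several facts (the name proper, its legal form, any trade names). A minimal sketch, assuming these particular components and a hypothetical code list for legal forms, stores the parts separately and composes the standard display form from them, so every application produces the same string instead of inventing its own.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class LegalForm(Enum):
    # Hypothetical code list; an enterprise standard would define its own.
    CORPORATION = "Corp."
    INCORPORATED = "Inc."
    LIMITED = "Ltd."
    LLC = "LLC"

@dataclass
class OrganizationName:
    base_name: str                            # the name proper, e.g. "Acme"
    legal_form: Optional[LegalForm] = None    # stored as a code, not free text
    trade_names: list[str] = field(default_factory=list)  # "doing business as" names

    def standard_form(self) -> str:
        """Compose the standard display form from the separately stored parts."""
        if self.legal_form is None:
            return self.base_name
        return f"{self.base_name} {self.legal_form.value}"
```

With this design, `OrganizationName("Acme", LegalForm.CORPORATION).standard_form()` yields "Acme Corp." everywhere, and a variant such as "Acme Corporation" can only arise if someone bypasses the model.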
Some questions to ask:
If the answer to both of these questions is yes, work with this group to create or revise data standards and guidelines, including format structure.
If the answer to either of these questions is no, form a data standards team to develop universal and global data standards and guidelines.
Once data standards have been defined and adopted by representatives of the information stakeholders, document them in an enterprise repository or data dictionary that is accessible to all knowledge-workers, and train information producers and knowledge-workers on the data standards and guidelines.
If legacy processes and applications cannot support the new standards, analyze how to capture data as close to the standardized form as possible, and map each captured value to its standard value when migrating data to downstream standardized databases.
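The mapping step can be as simple as a crosswalk table applied during migration. The sketch below assumes a hypothetical crosswalk for the legal-form component of the name; values with no entry are set aside for review by data stewards rather than guessed, so the downstream database receives only standard values.

```python
# Hypothetical crosswalk from legacy legal-form spellings to the standard value.
LEGACY_TO_STANDARD = {
    "corp": "Corp.",
    "corp.": "Corp.",
    "corporation": "Corp.",
    "inc": "Inc.",
    "inc.": "Inc.",
    "incorporated": "Inc.",
}

def migrate_values(legacy_values):
    """Map legacy values to standard ones; collect unmapped values for review."""
    migrated, unmapped = [], []
    for value in legacy_values:
        # Compare on a trimmed, lowercased form so trivial variants still match.
        standard = LEGACY_TO_STANDARD.get(value.strip().lower())
        if standard is None:
            unmapped.append(value)              # do not guess; route to stewards
        else:
            migrated.append((value, standard))  # keep the legacy value for audit
    return migrated, unmapped
```

Keeping the original legacy value next to its standard value makes the conversion auditable if a mapping later turns out to be wrong.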
When data standards are in place, prioritize the correction of existing data based on the costs of the nonstandard data (process failure, inability to reconcile data, required rework, miscommunication, etc.), the cost of standardizing it and the resources available to convert it.
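The three factors in this paragraph can be combined into a simple ranking. A minimal sketch, assuming each candidate data set has an estimated cost of leaving it nonstandard and an estimated conversion cost (both hypothetical figures an organization would have to measure), keeps only candidates whose nonstandard cost exceeds their conversion cost, ranks them by benefit-to-cost ratio, and fills a fixed conversion budget greedily. Greedy selection is a heuristic, not a guaranteed optimal allocation.

```python
from dataclasses import dataclass

@dataclass
class CorrectionCandidate:
    name: str
    nonstandard_cost: float  # estimated cost of leaving the data as-is (failure, rework, ...)
    conversion_cost: float   # estimated cost of standardizing it

def prioritize(candidates, budget):
    """Select candidates by descending benefit-to-cost ratio within a budget."""
    def ratio(c):
        # A free conversion with any benefit ranks first.
        return c.nonstandard_cost / c.conversion_cost if c.conversion_cost else float("inf")

    worthwhile = [c for c in candidates if c.nonstandard_cost > c.conversion_cost]
    selected, remaining = [], budget
    for c in sorted(worthwhile, key=ratio, reverse=True):
        if c.conversion_cost <= remaining:
            selected.append(c.name)
            remaining -= c.conversion_cost
    return selected
```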
Data standards and guidelines for creating data values are part of the "information product specification" along with the definition and business rules. They are an important part of data definition for information quality improvement.
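To make the idea concrete, a specification entry can hold the definition, the format standard and executable business rules side by side, so the same entry that is documented in the data dictionary can also check values at the point of creation. The field names, definition text and rules below are hypothetical illustrations, not a published standard.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataElementSpec:
    name: str
    definition: str
    format_standard: str  # human-readable guideline for information producers
    rules: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def violations(self, value: str) -> list[str]:
        """Return the names of the business rules the value fails, in rule order."""
        return [rule for rule, check in self.rules.items() if not check(value)]

# Hypothetical specification entry; the definition and rules are illustrative only.
ORG_NAME_SPEC = DataElementSpec(
    name="Organization Name",
    definition="The name by which an organization is legally registered.",
    format_standard="Base name followed by a standard legal-form abbreviation.",
    rules={
        "not_blank": lambda v: bool(v.strip()),
        "no_leading_or_trailing_space": lambda v: v == v.strip(),
        "max_length_100": lambda v: len(v) <= 100,
        "no_double_spaces": lambda v: "  " not in v,
    },
)
```

Calling `ORG_NAME_SPEC.violations(" Acme  Corp.")` reports the leading-space and double-space rules, giving the information producer immediate feedback instead of downstream scrap and rework.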
What do you think?
Larry P. English is president and principal of INFORMATION IMPACT International, Inc., Brentwood, Tennessee, and the author of the widely acclaimed book, Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. English is cofounder of the International Association for Information and Data Quality (www.iaidq.org). English is an internationally recognized speaker, teacher, consultant and author and may be reached at firstname.lastname@example.org or through his Web site at www.infoimpact.com. For more on how to improve your IQ principles and techniques, and prevent your organization from wasting millions in information scrap and rework, join the IAIDQ (visit www.iaidq.org).