Business Intelligence: The Dirty (and Costly) Little Secret of Bad Data
Poor data quality is insidious. The phrase "garbage in, garbage out" summarizes the problem, but the real-world extent of data corruption and the red-ink trail it leaves behind are not always obvious. Worse, the systems that are undermined by poor data are often those responsible for masking the problem. While a sophisticated business intelligence (BI) system can tell in detail what data says, it cannot tell if data is lying. Data quality is a necessary element to successful BI.
Any company can produce trending and segmentation reports that look complete, but if individual customers are counted multiple times because of data discrepancies, the reports will be inaccurate. Cost analyses can be skewed because of delayed billing, returns or erroneous shipping due to data inaccuracies. Customer relationship management (CRM) and marketing functions can be negatively impacted by the inability to profile customers or identify them across transactions.
Ironically, the enterprise application integration (EAI) that can make BI so effective can actually speed the rate of corruption in databases. For example, Web data is notoriously less accurate and more complex and varied than call center data. Commingling of bad data across systems will increase data corruption in relatively accurate databases. Experts estimate that about two percent of accurate records can become corrupt each month with no data quality system in place. That's the equivalent of 20,000 corrupt records monthly in a 1 million record database. With data pulled from the Web, the numbers can grow dramatically. According to some industry experts, including the META Group, some e-businesses estimate that only 10 percent of their records are accurate. If those records are populating the same repository that accounting systems access, the impact on billing and revenue could be dramatic.
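To make the arithmetic concrete, the sketch below projects the two percent monthly figure over time. It assumes, purely for illustration, that each month two percent of the records that are still accurate go bad and that nothing is repaired; the function name and the compounding model are assumptions, not part of the experts' estimate.

```python
def corrupted_after(total_records: int, monthly_rate: float, months: int) -> int:
    """Estimate corrupted records after `months`, assuming the given share of
    still-accurate records goes bad each month and nothing is repaired."""
    still_accurate = total_records * (1 - monthly_rate) ** months
    return round(total_records - still_accurate)


# One month at 2 percent on a 1 million record database.
first_month = corrupted_after(1_000_000, 0.02, 1)

# The same rate left unchecked for a full year.
first_year = corrupted_after(1_000_000, 0.02, 12)
```

Under these assumptions the first month reproduces the 20,000-record figure, and a year without a data quality process leaves more than a fifth of the database corrupt.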
Little or none of the underlying corruption is likely to be discernible in BI reports. Moreover, because end users tend to request reports along narrow lines of responsibility, few will have the opportunity to observe inconsistencies that might appear in cross-departmental reports. The more flexible, specific and user-friendly the report, the less likely it is that the end user will deduce data-level corruption.
To prevent data quality from undermining BI, a data quality solution must be a major component of any BI initiative. Many organizations deprioritize data quality issues because they lack perceived value, according to a Gartner report on business intelligence published last year. In reality, however, bad data is almost certainly undermining the advantage of BI, unless the organization has an enterprise-class data quality solution in place.
Trouble at Every Touchpoint
Most companies don't fully grasp the extent to which bad data jeopardizes business performance. Although 81 percent of managers surveyed in 1999 said data quality was the top IT priority for the next year, experts continue to estimate that up to 70 percent of data warehousing projects fail because users reject them as unreliable. Generally these failures are not technical, but practical. The technology works, but data warehouse users can't trust data output; therefore, projects fail to meet expectations.
In a 2001 data management survey by PricewaterhouseCoopers, fully 75 percent of respondents reported significant problems, costs or losses related to bad data.
The statistics are surprising only if you underestimate the degree to which most data sources inherently allow corruption and how many of those sources exist. In reality, low-quality data is pervasive and insidious, and the sources of bad data are irrepressible. From the moment it is input, data tends to meet only operational requirements, which often are completely different from strategic or tactical requirements. Consider just a few common ways that bad data enters systems:
- Call center operators take shortcuts, entering default or fake values in tracking systems to increase call turnover rates.
- Online customers submit false information in Web forms to protect their privacy.
- Data from multiple sources and formats conforms to varying metadata standards.
- Point-of-sale systems submit U.S. and European dates in different formats (i.e., mm/dd/yy vs. dd/mm/yy) to a centralized BI system.
- Separate operational systems contain customer data with conflicting spellings and varied information subsets.
- Third-party (vendor) data contains outdated customer addresses and phone numbers.
In fact, many organizations have far more customer numbers than actual customers (as well as more vendor numbers than vendors and even more employee numbers than employees), although they might never know it, according to a report by the Cutter Consortium in 2000. Without a data quality solution, duplicate records remain identified as separate customers and outdated duplicates might never be reconciled.
Even systems that prevent strict duplication can easily be fooled by slight variations in customer information. Duplicate customer records fracture the customer view, spurring redundant mailings, erroneous analysis and, perhaps most significantly, higher fraud incidence. Only the ability to consistently identify customers between interactions and reconcile a single, unified customer view from varying records can prevent the increasingly high operational costs associated with a fractured view.
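As a rough illustration of why exact-match checks miss these duplicates, the sketch below compares records with a fuzzy string score from Python's standard difflib module. The field names, sample records and the 0.85 threshold are hypothetical, and commercial data quality tools use far more sophisticated matching than this.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the name and address fields."""
    fields = ("name", "address")
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def likely_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical records: the first two differ only by spelling and abbreviation.
customers = [
    {"name": "Jonathan Smith", "address": "12 Main Street"},
    {"name": "Jonathon Smith", "address": "12 Main St."},
    {"name": "Maria Alvarez", "address": "400 Oak Avenue"},
]
duplicate_pairs = likely_duplicates(customers)
```

An exact comparison would treat all three records as distinct customers; the fuzzy comparison flags the first two as a likely match for review. Comparing every pair is only practical for small samples, so a real system would narrow the candidates first.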
Data quality problems are dramatically worse in online data sources. Information from Internet sources is typically more complex and unpredictable than from other sources. As noted earlier, some dot-coms that collect consumer data online have reported that only one in ten of their user accounts are valid, the balance being duplicates caused by re-registrations and deliberate misinformation to obscure private information, notes META Group analyst Doug Laney in a report from 2000. Database marketing, data warehousing and enterprise integration solutions that enable CRM to benefit business practices also introduce new data quality challenges. Each new data entry can degrade the accuracy of all organizational data. Once bad data enters a system it can be practically impossible to extract.
For example, a date entered as 6/1/02 could indicate June or January, depending on whether the information was input in the U.S. or Europe. The source system would interpret the date correctly, but once the data is integrated into a data warehouse, the date might be subject to different business rules. The potential misinterpretation could dramatically impact analysis by skewing sales reports for all months, reflecting inaccurate promotion-response rates and more.
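The ambiguity is easy to reproduce. The following sketch, which assumes each record carries a hypothetical region tag identifying its source system, parses the same string under both conventions and converts it to an unambiguous ISO 8601 date before it reaches the warehouse.

```python
from datetime import datetime

# Assumed mapping from source region to its date convention.
DATE_FORMATS = {
    "US": "%m/%d/%y",  # month/day/year
    "EU": "%d/%m/%y",  # day/month/year
}


def parse_source_date(raw: str, region: str) -> str:
    """Parse a regional date string and return an ISO 8601 date string."""
    parsed = datetime.strptime(raw, DATE_FORMATS[region])
    return parsed.date().isoformat()


us_date = parse_source_date("6/1/02", "US")  # "2002-06-01"
eu_date = parse_source_date("6/1/02", "EU")  # "2002-01-06"
```

Without the region tag there is no way to recover the intended date from the string alone, which is why the conversion has to happen at or near the source system.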
Incorrect business data (i.e., non-name/address customer data) is particularly devastating to BI. Even the most sophisticated analysis will be wrong if it is based on inaccurate customer counts, sales or marketing information, demographic or psychographic data, industry classification or any other bad information.
Ultimately this can lead businesses to ignore profitable market segments, waste revenue marketing to customers that don't exist, dismiss cross- and up-sell opportunities and miss important trends in customer attrition and churn. Over time, data inaccuracies that lead to relatively minor tactical errors can snowball into long-term segmentation inefficiencies and trending errors that produce major strategic missteps.
Functional Criteria
Without high data quality, BI cannot reflect the realities of business. Specifically, analytical processes must be able to recognize statistically significant patterns in vast volumes of data to provide useful information. A data quality solution creates the accurate, consistent and unified views of customers and their relationships (for example, multiple contacts within a business or household) that are the foundation of analytical pattern recognition.
Because BI is most effective as an enterprise-wide process, it demands enterprise-class capabilities in a data quality solution. For holistic customer knowledge based on diverse customer touchpoints and channels, the data quality process must address disparate, even incompatible data sources across the enterprise. Empowering BI with data that supports accurate business and analytical processes requires a data quality solution that can meet a variety of key functional criteria:
- Accurate, drill-down data analysis and reporting,
- Real-time processing to ensure incoming data meets organizational data quality standards,
- Replicable processes to facilitate enterprise rollouts without substantial services investments, and
- User-customizable business rules to ensure ongoing correlation of the data quality to evolving data processes.
These capabilities are critical, because even minor data quality problems can add up to major inefficiencies. For example, senior managers at a large telecommunications company moved from a project-oriented to an enterprise-wide data quality perspective when they fully understood the stakes. The manager of the company's integrated customer view initiative during its data quality implementation noted that in a market with potentially 100 million customers, a 99 percent success rate still means you have 1 million inaccurate records.
Beyond the data profiling, cleansing, reengineering and relationship identification that make up a robust data quality solution, the three key functional criteria needed to deliver data quality enterprise-wide are batch and real-time data processing, international data quality processing, and business and customer data processing.
Batch and Real-Time Data Processing
High-performance batch processing creates a single source of high quality data available throughout the enterprise by quickly processing millions of records to create a clean central data file. Real-time online data processing prevents new data from corrupting the reliable data source by cleansing data as it enters the company through various channels. Together, these operations create and maintain a consistently accurate, relevant and reliable data foundation for all enterprise activities.
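One way to picture how the two modes fit together is a single set of cleansing rules applied both to the existing file and to each new record on arrival. The sketch below is illustrative only: the class, function and field names and the two standardization rules are assumptions, not a description of any particular product.

```python
def cleanse(record: dict) -> dict:
    """Apply the same illustrative standardization rules to one record."""
    cleaned = dict(record)
    # Collapse stray whitespace and standardize capitalization of the name.
    cleaned["name"] = " ".join(record["name"].split()).title()
    # Keep only the digits of the phone number.
    cleaned["phone"] = "".join(ch for ch in record["phone"] if ch.isdigit())
    return cleaned


def batch_cleanse(records: list) -> list:
    """Batch mode: cleanse an existing file or table in one pass."""
    return [cleanse(r) for r in records]


class CustomerStore:
    """Real-time mode: every incoming record is cleansed before storage."""

    def __init__(self, existing: list):
        # Start from a batch-cleansed central file.
        self.records = batch_cleanse(existing)

    def add(self, record: dict) -> dict:
        cleaned = cleanse(record)
        self.records.append(cleaned)
        return cleaned


store = CustomerStore([{"name": "  ann   LEE ", "phone": "(555) 010-1234"}])
new_record = store.add({"name": "bob  jones", "phone": "555.010.9999"})
```

Because both paths call the same cleanse function, records loaded in bulk and records arriving one at a time end up in the same standardized form.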
International Data Processing
International data quality processing supports global growth and allows BI to reflect the entire business, even international data sources. As businesses cross more borders, their ability to gain a unified view of customers from around the world, understand local languages and cultural differences and acquire street-level address verification plays a growing role in the organization's ability to understand the business itself.
True international data processing requires both a content- and context-oriented approach to data comprehension. The ability to recognize an alphabet (i.e., character set or script) is obviously a prerequisite for data processing, making Unicode enablement a necessary component of international data quality. Unicode allows software to represent the world's major languages and scripts, from English to Chinese to Hebrew to Cyrillic and others. However, Unicode alone cannot guarantee data comprehension; contextual understanding is equally important. Only through context can a data quality application discern which of seven meanings for a particular Japanese character is correct. Only contextual comprehension can reveal the inaccuracy of data elements due to their relative placement.
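As a small illustration of the content side only, the sketch below uses Python's standard unicodedata module to report which scripts appear in a field, relying on the fact that Unicode character names begin with the script name. The function name and sample values are hypothetical, and nothing here addresses the contextual interpretation described above.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Return the script names found among the letters of `text`."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue
        # Character names start with the script, e.g. "CYRILLIC SMALL LETTER A".
        found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


latin = scripts_in("Main Street")
cyrillic = scripts_in("Москва")
hebrew = scripts_in("\u05e9\u05dc\u05d5\u05dd")  # the Hebrew word "shalom"
# Looks like "Moscow", but the second letter is the Cyrillic "о" (U+043E).
mixed = scripts_in("M\u043escow")
```

A field that mixes scripts, as in the last example, is one simple signal that a value may have been mistyped or deliberately disguised, although recognizing the script is only the first step toward understanding the data.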
Business Data Processing
Sophisticated BI systems rely on exceptionally diverse data. In fact, business data such as marketing and sales codes, phone numbers and account numbers are often just as important to total business comprehension as customer name and address data.
Business data processing, the ability to ensure the high quality of data beyond names and addresses, provides a more flexible and complete customer and business understanding. By providing more and more reliable information about customers and accounts, it promotes more granular segmentation and facilitates more revealing analytics.
Companies relying on data understand that an accurate analysis of customers is impossible if it is based on inaccurate information. A complete, complex and correct customer view based on information from across the enterprise is indispensable in BI for a variety of reasons. Clear views of customers help consistently identify a single partner, reseller or supplier with multiple individual contacts; they also help identify individual purchasing entities based on multiple departmental contacts, as well as individual customers with complex buying patterns through multiple channels.
While it is virtually impossible to eliminate the variation that causes data discrepancies, prevent data-entry errors or guarantee third-party data quality, it is possible to ensure that all data entering a company's data systems can contribute to the organization's total business value. A robust data quality solution provides the complete, accurate and multilevel customer view that is the basis for realistic decision-making and accurate customer assessments.
ROI and Data Quality
The return on investment (ROI) from effective enterprise-wide data quality flows from every point in the enterprise that uses data. A data quality solution aligns customer views and customers, even as customers, the organization and its operations and markets change. By having a strong data foundation based on accurate records, businesses are empowered to communicate effectively, target accurately and enact fully.
Beyond underpinning BI, data quality supports more effective CRM, enables more accurate invoicing and shipping, provides more potent fraud detection (particularly in e-business) and delivers many other immediate and long-term benefits. For example, data quality within a company's billing system can impact both revenue collection and reporting. Mergers and acquisitions often change customer-billing contacts without regard for the errors that may be introduced when integrating data from multiple sources, which in turn could have a direct impact on invoicing and payment receipts. In an instance like this, the value of the invoiced amount is not the only loss. Billing disputes can easily and immediately alienate previously loyal customers; meanwhile, resolving disputes can cost more than the invoice is worth.
Another important area to consider for data accuracy is customer relationship management (CRM). Fundamentally, the data quality argument for customer intelligence mirrors that of data quality and BI. Without a data quality solution in place, a company's executives cannot assume they are getting a complete and accurate customer view. Virtually all CRM functions, from direct mailings to personalized customer support to consistency in meeting customer privacy requirements, can be made more accurate, more efficient and less costly with support from a unified customer view across accounts and across the enterprise.
Similarly, in customer service centers and call centers, a good customer experience for both the company and customer is one that takes place smoothly, accurately and quickly. More accurate, complete and relevant customer data provided by a data quality solution can minimize the cost of call center service, improving the ROI per call center operator by providing better response times for customer requests, reduced cost of sales, more responsive technical support, increased understanding of buying patterns and motivations, and a stronger recognition of cross- and up-sell windows.
In the area of fraud detection, the need for clean data is crucial. In certain industries, such as financial services, fraud is a major liability. In the U.S. alone, false credit card applications accounted for more than $41 million in lost revenue in 1999. The ability to identify an individual across visits and touchpoints despite variations in key identity elements can be a major component of a fraud reduction process. Within banking, insurance and other industries, identification of name variations and aliases, for example, is required by law to uphold U.S. sanctions (OFAC compliance). Through potent record matching, relationship identification and identity recognition, enterprise-wide data quality (EWDQ) makes both fraud detection and OFAC compliance much more effective.
While most companies would agree they need clean, accurate data to populate their business intelligence systems, it's not always easy to understand how to get there. For starters, companies that have no systems in place for data quality should start researching to find a solid solution. First, it is important to ensure that any solution employed has the capability to support EWDQ. Some systems use a callable, component-based architecture that engages customer information for batch, transaction and online processing. It's also critical to ensure universal access to data and application files so the information can easily be shared throughout the organization.
Another important consideration is whether the system is capable of processing data from multiple channel sources, which allows the company to interact with its customers in the ways they prefer. Also important is the ability to conduct real-time data quality management, allowing companies to identify, standardize, cleanse and enrich data as it arrives and before bad data can enter the corporation's databases. Other considerations include whether the system is flexible and customizable to meet a company's changing needs, and whether it is scalable to handle large and complex data files and process data from multiple parallel data sources. For companies with plans to expand internationally or that have international operations, international data support is also crucial.
While it is clear that even minor data quality problems can add up to major inefficiencies in a company's ongoing BI initiatives, the industry has tools available today that can put those worries aside so companies can spend more time focusing on their core business.
Len Dubois is the vice president of Marketing for the Trillium Software division of Harte-Hanks LLC. He has been with Harte-Hanks for 10 years. Dubois is responsible for the development and execution of worldwide marketing initiatives for Trillium Software. He has authored numerous articles on data quality and CRM and presents frequently at national and international conferences. Dubois can be reached at ldubois@trilliumsoftware.com.