Data Warehousing Lessons Learned:
The Meaning of Data Quality
Philosophers, marketing executives, linguists and scientists have struggled with the distinctions between data, information and knowledge for decades, if not centuries. The suggestion here is to take a practical approach to defining these distinctions, but to do so in a way that preserves consistency with both logic and experience. We know that lack of data quality costs money: misdirected mail is returned, effort is wasted, rework is incurred, sales and customers are lost and inventory outages occur. Quality implies differences, differences imply distinctions of value and distinctions of value imply market value. Market value implies dollar value. Like so many things, information quality is a bootstrap operation requiring iteration, a process of learning from one's mistakes and commitment to business results.
When stated out of context, "data quality" is a misnomer. Data in itself is meaningless. Data is what is given; it is basic raw material. Whether its content is structured or unstructured, it is still data. Data itself is worthless; it is what you do with the data that has value. Data is the content, and when it is structured in such a way as to reduce uncertainty, it has value as information. Thus, data plus structure produces information. Information provides differences and distinctions that reduce uncertainty.
A simple example is that the attribute of gender tells us something about a customer. If I am confident that the customer is either male or female but I am not sure which one, then I have not reduced my uncertainty one bit. I do not have any more information than when I started. Whereas if I have the distinction male/female and, literally, the bit of information that the customer is male, then I will plan on selling him a tie rather than a dress. The data without the structure is meaningless; the structure without the data is empty. The structure - the simple male/female distinction - is not information in itself. The application of the structure to the data yields information and provides a reduction in uncertainty.
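The "bit of information" in the gender example can be made concrete with a small calculation. The sketch below uses Shannon entropy, which the article does not name, as one common way to measure uncertainty in bits. It assumes, purely for illustration, that male and female are equally likely before the attribute is known; the function name entropy_bits is hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Assumption: before the attribute is populated, male and female are equally likely.
before = entropy_bits([0.5, 0.5])

# After the structure (male/female) is applied to the data, one outcome is certain.
after = entropy_bits([1.0])

# The reduction in uncertainty is the information gained.
information_gained = before - after
print(f"uncertainty before: {before} bit(s), after: {after} bit(s)")
print(f"information gained: {information_gained} bit(s)")
```

Under the equal-likelihood assumption, the uncertainty drops from one bit to zero, which matches the article's point that knowing the customer is male supplies exactly one bit of information.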
Figure 1: From Data to Information
Figure 1 depicts a working definition of information and shows how dumb data is transformed into quality information. As the attributes of the data are structured according to a defined process for transforming the data along the three high-level dimensions of objectivity, usability and trustworthiness, the information quality improves in precisely those dimensions. In particular, information = objective (data) + usable (data) + trustworthy (data). Knowledge is not on the same continuum as data and information. The commitment needed to reach knowledge might be represented as a point in one of the quadrants or as a circle encompassing the entire diagram. Knowledge = commitment (information).
From a business perspective, knowledge is qualitatively different from information. There is a gap separating information, no matter how high the quality, from knowledge. The "best available information" never results in knowledge without something additional. That something is commitment: commitment to goals relevant to the business enterprise such as customer service, launching a new product or attaining operational excellence. (Knowledge = commitment (information).)
Data, information and knowledge are overlapping categories that describe different aspects of the world of business. They are different ways of describing the same phenomena. One person's data is another's information and vice versa. Yet the distinctions are valid or they would not exist in the first place. Data is what is given - subjective, uncertain and unclear in its use or interpretation. Add structure to data in the interest of reducing uncertainty and the result is information. (Information = structure (data).)
Information is built out of data by applying structure, categories and processes - including data models, functional transformations (ETL), queries and representation - in a process that generates increasing objectivity, usability and certainty. Each of these dimensions is further decomposed. Objectivity includes aspects such as accuracy, existence, causality, consistency, timeliness, completeness, unambiguousness and precision. Usability includes ease of interpretation, availability and security. Trustworthiness includes credibility, believability and the accumulated lessons of experience.
Start by employing data profiling to build an inventory of data assets and evaluate the state of information quality within the enterprise on a system-by-system basis but from an enterprise perspective. Be prepared for "roll-up-the-sleeves" hard work. This is likely to be both a top-down and bottom-up task because the impact on information quality of relations between systems can only be evaluated by including both sides of the interface.
Thus, information quality is improved. However, regardless of how much it is improved and how certain it is, information is still not knowledge. To get knowledge from information, something else - a commitment to a business decision - must be added.
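The data profiling step recommended above can be sketched in a few lines. This is a minimal illustration rather than a description of any particular profiling tool: the function profile_column, the sample customers records and the chosen measures (completeness, distinct values, most common values) are all assumptions made for the example.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: how complete it is and which values it holds."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing (an assumption of this sketch).
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "column": column,
        "row_count": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct_values": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical customer records from a single source system.
customers = [
    {"id": 1, "gender": "M", "postal_code": "60601"},
    {"id": 2, "gender": "F", "postal_code": ""},
    {"id": 3, "gender": "m", "postal_code": "60614"},
    {"id": 4, "gender": None, "postal_code": "60601"},
]

# A small inventory of data assets: one profile per column of interest.
inventory = [profile_column(customers, c) for c in ("gender", "postal_code")]
for entry in inventory:
    print(entry)
```

In this made-up sample, the gender column shows three distinct values ("M", "F" and "m") where the male/female structure allows only two, and both columns have missing entries. These are the kinds of consistency and completeness findings a profiling pass is meant to surface before any cleanup begins.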
Lou Agosta, Ph.D., joined IBM WorldWide Business Intelligence Solutions in August 2005 as a BI strategist focusing on competitive dynamics. He is a former industry analyst with Giga Information Group, has served as an enterprise consultant with Greenbrier & Russel and has worked in the trenches as a database administrator in prior careers. His book The Essential Guide to Data Warehousing is published by Prentice Hall. Agosta may be reached at LoAgosta@us.ibm.com.