Data Quality - It Is All About Not Being Worse Than Anyone Else
At my job, data quality has suddenly become an issue. For years we have ignored this question, even though we knew that our data quality was poor and resulted in less than optimal customer service. Our salespeople, for example, are often very frustrated. They feel that it is embarrassing and unprofessional to meet a customer and give them incorrect information. This happens because our systems simply do not handle the data correctly. Nevertheless, we just said for years that it was not a major problem, as we were not worse than anyone else in our business. We were not better either.
All of a sudden we have decided that our data quality should be improved. Our marketing division finally got tired of having 20 percent of our mailings returned because of incorrect customer addresses. Nor were they pleased to see that the average age of our customers is 335 years. Fifteen percent of our customers have the default birth date 00000101, as we do not know the real date. This is enough to screw up the average completely. Furthermore, according to our systems, 25 percent of our customers do not have a sex. They may have sex, though. And we have customers paying, but we do not know for what product. This is at least a dream situation: customers who simply pay, period.
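For readers who wonder how an average age can reach 335, here is a minimal Python sketch of the arithmetic. Only the default date 00000101 and the 15 percent share come from the column. The reference year 2005, the 1965 birth year for everyone else, and the function names are assumptions made up for illustration.

```python
SENTINEL = "00000101"  # default birth date mentioned in the column

def age_in_years(birth_date, reference_year):
    """Rough age: reference year minus the YYYY part of a YYYYMMDD string."""
    return reference_year - int(birth_date[:4])

def average_age(birth_dates, reference_year, skip_sentinel=False):
    """Mean age; optionally ignore records that carry the default date."""
    ages = [
        age_in_years(d, reference_year)
        for d in birth_dates
        if not (skip_sentinel and d == SENTINEL)
    ]
    return sum(ages) / len(ages) if ages else None

# Hypothetical sample: 15 of 100 customers carry the default date,
# the other 85 are all assumed to be born in 1965.
sample = [SENTINEL] * 15 + ["19650101"] * 85
REFERENCE_YEAR = 2005  # assumed, not stated in the column

naive = average_age(sample, REFERENCE_YEAR)
cleaned = average_age(sample, REFERENCE_YEAR, skip_sentinel=True)
print(naive, cleaned)
```

Under these assumptions the naive average lands close to the 335 years quoted above, while leaving out the default dates brings it back to a human lifespan.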
Now that we have decided to finally fix our data quality, how will we go about it? We have created an operational data store (ODS), and we check that no data entering it violates certain business rules. For example, we do not accept wrong dates, or empty fields where a value is mandatory. When this happens nevertheless, we will send a message back to the people responsible for entering the data, telling them to do it correctly. This process has been worked out in great detail but not yet put in place. And we have not told our business users about it. Won't they be surprised when they get the messages telling them to correct their data! No change management here. We simply hope that the purpose is so good that all our business users will automatically accept more work, correcting the data for sales reps they do not know (we are a pretty big organization). Heck, why bother telling our business users when we do not know if this whole thing will work in the first place? We will wait, crossing our thumbs and holding our toes. Or just be on vacation once we launch the whole business. In any case, we know that we are not worse than anyone else in our business when it comes to data quality.
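The kind of check described above might look roughly like the following Python sketch. It is an illustration only, not our actual ODS code: the field names, the YYYYMMDD date format, and the function name `validate_record` are all assumptions.

```python
from datetime import datetime

# Hypothetical list of mandatory fields; the real business rules are not given.
MANDATORY_FIELDS = ("customer_id", "birth_date", "product_code")

def validate_record(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []

    # Rule 1: mandatory fields must not be missing or blank.
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"{field}: mandatory field is empty")

    # Rule 2: a birth date, when present, must be a real calendar date.
    birth_date = str(record.get("birth_date") or "").strip()
    if birth_date:
        try:
            if len(birth_date) != 8 or not birth_date.isdigit():
                raise ValueError("expected exactly eight digits")
            datetime.strptime(birth_date, "%Y%m%d")  # rejects year 0000, Feb 30, ...
        except ValueError:
            errors.append(f"birth_date: {birth_date!r} is not a valid YYYYMMDD date")

    return errors

good = {"customer_id": "C1", "birth_date": "19650101", "product_code": "P9"}
bad = {"customer_id": "C2", "birth_date": "00000101", "product_code": ""}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
for message in bad_errors:
    print(message)  # these would be sent back to whoever entered the data
```

Returning the violations as a list, rather than stopping at the first one, means the person who entered the record gets one message with everything to fix instead of a series of rejections.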
This whole data quality project will be done without any benchmark for the level of quality we must reach in order to please our different business communities. In other words, what should the tolerance toward unclean data be? Actually, we have not even thought about the fact that different activities demand different levels of data quality. In any case, we intend to cleanse 100 percent of the data that goes into our ODS. Given that most of our transactional data is destined to pass through the ODS and then move on to data marts and other data warehouses, we should get all the data very clean. This should please the BI users who will work with this data. Please do not tell us that we are over-ambitious in our data quality project. And please do not ask us if we have a metadata tool for all this. If we document our data quality activities in Microsoft Word, then it becomes a metadata tool, right?
Now that we have finally decided to undertake a data quality project, we do not want to be discouraged. Therefore, we just look at the positive side and stay optimistic. As we have not even thought about the tolerable level of dirty data and how it may differ between business units, we just say that we shall clean it all. And as we have not thought about these tolerable levels, we have not gotten around to asking ourselves who actually owns the data. Instead, the burden will fall on all the business users to clean whatever dirty data they may have entered. Given the importance of data quality, we think that data quality responsibilities should be widespread. Also, it makes us more comfortable if everyone and no one is responsible once our sales reps in the field start to cry again about poor data quality.
Anyway, should we not be able to reach our ambitious targets, let the salespeople complain! They are used to this situation and as all IT people know, the users will always complain, no matter what. We shall just have to tell them that we are not worse than anyone else. Or better.
Gabriel Fuchs is a senior consultant with IBM. His column Reality IT takes an ironic look at what real-world IT solutions often look like - for better or for worse. The ideas and thoughts expressed in this column are based on Fuchs's own personal experience and imagination, and do not reflect the situation at IBM. He can be reached at firstname.lastname@example.org.