Ask the Experts Archive
ARCHIVE OF QUESTIONS & ANSWERS FOR DATA QUALITY
Please define data reconciliation and database reconciliation. I also need a definition of data synchronization.
Could you please provide me with some resources/direction for data profiling tools that will give me a better understanding of the quality of my data?
I'm trying to benchmark data quality software. What are the most important characteristics to compare?
It has been suggested that we can skip user acceptance testing of the Dimension tables in order to speed up implementation. This is regardless of whether the tables are new or
existing. I am adamantly opposed to the idea. I would appreciate your
thoughts on the subject.
If a piece of information traverses several applications before
placement in the DW, is the data quality of that information more
in question than data that arrives directly from the
operational source? If so, is there research that supports this?
If a given DW field/element/column can be updated from several
operational sources, it would seem that the risk of erroneous
data is heightened. Is this true, and if so, is there an
authoritative source that I can quote?
According to Mr. Inmon, data cleansing is the most important phase of data warehousing. He says that it is almost 60 percent of the whole DWH project. Do you have any special approach to handling the data cleansing process?
Is there a "known" list of data quality principles?
As part of a corporate-wide, one-instance implementation of ERP (SAP R/3), we are rethinking master data maintenance and quality: both the process and how to organize to implement the process. Initially, it would seem major changes are needed in the areas of customer, material and related financial data. I am familiar with Larry English's work. What other sources are there to guide such efforts? Is there any body of knowledge of real-life lessons learned?
We are working on a data warehouse project where we extract data from the source (ETL) and create fact tables and cubes depending on them. The end users query our cubes through their UI, Cognos. I need quality assurance and I need to test the application. Is there any tool available which can be useful for this? If not, how should I approach the problem at hand of quality assurance?
It seems that there is very little academic research done in the
data warehouse area in spite of the need for this by the
business community. What areas in data warehousing can benefit
from academic research?
What kind of tools can you recommend to perform data auditing
and quality analysis?
Can you please direct me to a resource that touches on how you place value, preferably financial value, on data, intellectual capital and information quality management?
My question is: what is "recommended" for the hardware configuration of the production environment? In the client/server development arena, I am used to the classical Production, Quality and Development environments, versioning, etc. However, since we are "fixing" so many data issues, getting the data correct in "one" environment is challenge enough. What is typical? Are there research sources you can point me to?
Data quality is a critical issue affecting data warehouses (DWs), yet there appears to be no sustainable solution with
regard to quality input into operational systems that ultimately
feed DWs. Is there any evidence to suggest tying bonuses or
other forms of incentive to pre-determined data quality hurdle
rates improves data quality at the input end, thereby
alleviating some of the dependency on, and additional expense of, DW data cleansing tools?
Could you please give me an idea of the typical range, on a per-record basis, for a deduplication (data cleansing) project?
When we are cleaning data from a relational source and some values cannot be cleaned, should they be put in an exception/error table? What do you suggest the structure of this table be?
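For illustration only (this is not from the archived answer), here is a minimal sketch of one possible shape for such an exception/error table, expressed as a Python record; all field names are hypothetical and assume a simple reject-and-review workflow:

```python
# Purely illustrative sketch of one possible exception/error record layout;
# field names are hypothetical and assume a reject-and-review workflow.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CleansingException:
    source_system: str      # operational system the row came from
    source_table: str       # table (or file) the row was extracted from
    source_key: str         # natural/business key of the offending row
    column_name: str        # column whose value could not be cleaned
    rejected_value: str     # raw value exactly as received from the source
    rule_violated: str      # name or ID of the cleansing/validation rule
    error_message: str      # human-readable reason for rejection
    load_batch_id: str      # ETL batch that produced the exception
    detected_at: datetime = field(default_factory=datetime.utcnow)
    resolved: bool = False  # flipped once a data steward corrects the source
```

The idea is that each rejected value becomes one row keyed back to its source, so data stewards can fix the originating system and the ETL can reprocess the batch.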
Can you please provide me with articles/white papers that discuss the quantified benefits of application services such as enterprise information portals, data cleansing and data duplication?
Should data be changed in the data warehouse and/or data mart making it different from the source system it came from? What are the pros and cons of making the change in the data warehouse/mart vs. the source system?
Where can we find information about the types of tests to apply in the databases of the data warehouse?
Is there a rule of thumb that would give me a 95 to 99 percent confidence level that the data in our warehouse is correct?
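Purely as an illustrative aside (not from the archived answer): one common way to frame a 95 to 99 percent confidence claim is to sample and verify warehouse records, sizing the sample with the standard proportion formula n = z^2 * p * (1 - p) / e^2. A rough Python sketch, assuming simple random sampling and illustrative inputs for the expected error rate and margin of error:

```python
# Rough sketch of the classic sample-size formula n = z^2 * p * (1 - p) / e^2,
# assuming simple random sampling and a normal approximation; the example
# inputs (2% expected error rate, +/-1% margin) are illustrative only.
import math

def sample_size(confidence: float, expected_error_rate: float, margin: float) -> int:
    z_scores = {0.95: 1.960, 0.99: 2.576}  # common two-sided z values
    z = z_scores[confidence]
    p = expected_error_rate
    return math.ceil((z ** 2) * p * (1 - p) / margin ** 2)

print(sample_size(0.95, 0.02, 0.01))  # ~753 records to verify at 95% confidence
print(sample_size(0.99, 0.02, 0.01))  # ~1301 records to verify at 99% confidence
```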
Do I need a data warehouse to do data mining?
We are interested in measuring the quality of the data populated in our data warehouse. Are there publications out there from which we may borrow some ideas?
How can we justify the startup cost of a data warehouse infrastructure (for migration, transformation, administration tools, reporting technologies and OLAP tools)?
What are the quality edits and procedures for loading warehouses and verifying data before releasing it to the business groups?
Could you let me know how to tackle such large data warehousing design issues?
What is the typical rate within data warehousing for promoting changes that affect transformation and data cleansing for existing subject areas that are in production?
Where can I find a product that extracts data from any platform, cleanses the data and reformats it in a consistent manner?