Would you recommend deleting inconsistent data in our sources?
Q: Some of the data in our sources is inconsistent. Would you recommend deleting it, and would the deletion be best done in the ETL (extract, transform and load) process?
Sid Adelman's Answer:
You need to find out what the data owner wants to do with this inconsistent data. The data owner may want it available even though it is inconsistent.
Chuck Kelley's Answer:
It depends on whether you need that data or not. If the data is not needed, then you can delete it. However, if it is not needed, why is it in your data structures? I would, therefore, either fix the data in the source system and allow the data to move into the data structures or delete the row in the source system and have that drive the "deletion" from the data structures.
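One way to picture a source-driven "deletion" is to compare keys on each load and drop any warehouse row whose key no longer exists in the source. The sketch below is only an illustration under assumptions: the `customer_id` key and both function names are hypothetical, and the rows are plain in-memory dictionaries rather than real tables.

```python
def keys_to_delete(source_keys, warehouse_keys):
    """Return warehouse keys whose rows no longer exist in the source."""
    return set(warehouse_keys) - set(source_keys)


def propagate_deletions(source_rows, warehouse_rows, key="customer_id"):
    """Keep only the warehouse rows whose key is still present in the source.

    `key` is a hypothetical business key; both inputs are lists of dicts.
    Returns the surviving rows and the sorted list of removed keys.
    """
    source_keys = {row[key] for row in source_rows}
    stale = keys_to_delete(source_keys, [row[key] for row in warehouse_rows])
    kept = [row for row in warehouse_rows if row[key] not in stale]
    return kept, sorted(stale)


if __name__ == "__main__":
    source = [{"customer_id": 1}, {"customer_id": 3}]
    warehouse = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
    kept, removed = propagate_deletions(source, warehouse)
    print("kept:", kept)
    print("removed keys:", removed)
```

Returning the removed keys alongside the surviving rows keeps a record of what the source deletion caused, which can be logged or reviewed.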
Clay Rehm's Answer:
I would not delete the source data. I would involve your business users in identifying and correcting the data. If this is not possible, or time does not permit, then your last resort would be to remove the data, but only after it has been backed up and documented, and only after your users and management have approved this information and direction.
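The last-resort path described above (back up, document, get approval, then remove) might be sketched as follows. Everything here is an assumption made for illustration: the function name, the `approved_by` and `reason` parameters, and the JSON archive format are hypothetical, and a real process would write the archive to durable storage.

```python
import json
from datetime import datetime, timezone


def archive_then_remove(rows, is_inconsistent, approved_by, reason):
    """Split rows into kept and removed, documenting the removed ones.

    Refuses to remove anything unless an approver is named. Returns the
    kept rows and a JSON archive entry holding the removed rows.
    """
    if not approved_by:
        raise ValueError("removal requires a named approver")
    removed = [row for row in rows if is_inconsistent(row)]
    kept = [row for row in rows if not is_inconsistent(row)]
    archive_entry = {
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "approved_by": approved_by,
        "reason": reason,
        "rows": removed,
    }
    return kept, json.dumps(archive_entry)


if __name__ == "__main__":
    rows = [{"id": 1, "state": "NJ"}, {"id": 2, "state": "??"}]
    kept, archive = archive_then_remove(
        rows, lambda r: r["state"] == "??", "data owner", "invalid state code"
    )
    print(kept)
    print(archive)
```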
Adrienne Tannenbaum's Answer:
Here's the first question: How do you know which data is "right"? That will totally determine what to do with the data that is "wrong."
Here's the second question: Does the "right" data have "right" data associated with it (logically and physically - such as a purchasing record for the "right" customer)? That also determines your answer.
Here's the third question (which I put my bets on): Does the "right data" sometimes have "right data" associated with it and sometimes have "wrong data" associated with it? Are the answers inconsistent across the sources? I think your answer is probably "yes."
If this is the case, then you must set up a "master" or "correct" view. Your ETL process would then convert as much of the "wrong data" as possible to this view. You determine which specific fields must be correct within an incoming record, and the ETL process then evaluates each record to decide what is "wrong" or "not worth it."
In general, any data that is too "wrong" for inclusion or for translation is flagged and kept aside. Someone is usually responsible for dealing with this data and decides whether to correct or delete it.
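As a rough illustration of that flow, the sketch below converts incoming records to a small "master" view and sets aside anything it cannot translate. It is a minimal sketch under assumptions, not a prescribed design: the `MASTER_STATE` mapping, the `REQUIRED_FIELDS` tuple, the field names, and both function names are hypothetical.

```python
# Hypothetical master view: variant spellings mapped to one agreed value.
MASTER_STATE = {
    "nj": "NJ", "n.j.": "NJ", "new jersey": "NJ",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}

# Hypothetical list of fields that must be present in an incoming record.
REQUIRED_FIELDS = ("customer_id", "state")


def route_record(record):
    """Return ("load", cleaned_record) or ("quarantine", record_with_reason)."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        return "quarantine", {**record, "reason": "missing " + ", ".join(missing)}
    state = MASTER_STATE.get(str(record["state"]).strip().lower())
    if state is None:
        # Too "wrong" to translate: flag it and keep it aside for review.
        return "quarantine", {**record, "reason": "unmapped state"}
    return "load", {**record, "state": state}


def run_etl(records):
    """Split records into those loaded in master form and those set aside."""
    loaded, quarantined = [], []
    for record in records:
        outcome, row = route_record(record)
        (loaded if outcome == "load" else quarantined).append(row)
    return loaded, quarantined


if __name__ == "__main__":
    incoming = [
        {"customer_id": 1, "state": " New Jersey "},
        {"customer_id": 2, "state": "Jersey"},
        {"customer_id": None, "state": "NY"},
    ]
    loaded, quarantined = run_etl(incoming)
    print("loaded:", loaded)
    print("quarantined:", quarantined)
```

Nothing is deleted in this sketch. Quarantined rows carry a reason, so the person responsible for them can decide whether to correct or delete each one.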
Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or through his Web site at www.sidadelman.com.
Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, has co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.
Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, metadata and systems analysis, design and development. He has worked on multiple platforms, and his experience spans operational and data warehouse environments. Rehm is a technical book editor and a co-author of the book Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP) and a certified Project Management Professional (PMP), and he holds a Bachelor of Science degree in Computer Science and a master's degree in Software Engineering from Carroll College. He can be reached at clay.rehm@rehmtech.com.
Adrienne Tannenbaum is president of Database Design Solutions, Inc. (www.dbdsolutions.com), a New Jersey-based consulting firm specializing in the revitalization of corporate data. The firm focuses on data issues within large organizations and supports all data reconstruction efforts with a solid meta data backbone. Tannenbaum is the author of two popular meta data-focused books: Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand (2001, Addison Wesley) and Implementing a Corporate Repository (1994, Wiley).