DM Review | Covering Business Intelligence, Integration & Analytics

What is the best solution for a change data capture problem where the source tables do not record proper timestamps for updates?

Ask The Experts, published in DMReview.com
April 10, 2006
By Tom Haughey and Chuck Kelley

Q: We are in the design phase of a data warehouse and are stuck on a change data capture issue. The source system is JDE, and the source tables do not record reliable timestamps for inserts and updates, which deprives us of a proper incremental load. Using the source transaction logs for change data capture is not an option, and a full-table comparison carries too much overhead, so that is ruled out as well. What would be the best solution other than these three options? Do you have any suggestions regarding the "check sum" method? How feasible is it, and how would we implement it?

Chuck Kelley's Answer:

I did some testing and wrote about the check sum method about three years ago. It works, and it may be the only viable option you have left. You could also use a different hashing algorithm (I like MD5) for this. Also, full-table comparison should not be ruled out entirely: there are effective techniques for dealing with it, unless you are running on a Windows server.
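To make the MD5 idea concrete, here is a minimal sketch of hashing a source row over only the columns whose changes matter. The column names and the dict-based row are purely illustrative, not part of any JDE schema; the delimiter guards against collisions from naive concatenation (e.g., 'ab'+'c' vs. 'a'+'bc').

```python
import hashlib

# Hypothetical set of columns whose changes we want to detect.
TRACKED_COLUMNS = ["status", "amount", "ship_date"]

def row_checksum(row, columns=TRACKED_COLUMNS):
    """Return an MD5 hex digest over the tracked columns of `row` (a dict).

    A unit-separator delimiter between values prevents two different
    rows from concatenating to the same string.
    """
    payload = "\x1f".join(str(row.get(c, "")) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Any stable digest (SHA-1, SHA-256) works the same way; MD5 is adequate here because the goal is change detection, not security.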

Tom Haughey's Answer:

Figure 1 is a summary of different change data capture methods, modified from Microsoft. The check sum method is not among them. Perhaps you can find an appropriate method that will suit your needs.

Figure 1: Data Capture Methods

In the check sum method, you generate a hash check-sum for the existing records and stage it. You only need to stage the key and the check-sum of the existing records of, say, a dimension. A check-sum is then generated for each incoming record, based on the columns whose changes you care about; you can generate the check-sum over the entire record if any change to the record matters. The new check-sum is compared with the staged one: a key with no staged check-sum is an insert, and a key whose check-sums differ is an update. Clearly, if the input extract is huge, generating the incoming check-sums carries some performance cost. If, however, the keys on both sides are indexed, with the check-sum included in a covering index, the performance of the comparison improves considerably. The bottleneck is the volume of input records and the number of columns used to generate the check-sum, which should be limited to those whose changes you are interested in.
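The compare step described above can be sketched as follows. This is a toy in-memory version, assuming the stage is a key-to-checksum mapping and the incoming extract is a key-to-row mapping; in practice both sides would be indexed tables joined in the database, as the answer notes.

```python
import hashlib

def row_checksum(row, columns):
    """MD5 over the tracked columns, delimiter-separated."""
    payload = "\x1f".join(str(row.get(c, "")) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def classify_changes(staged, incoming, columns):
    """Compare incoming rows against the staged (key -> checksum) table.

    Returns (inserts, updates): keys absent from the stage are inserts;
    keys whose check-sums differ are updates. Unchanged rows are skipped,
    which is the point of staging only the key and the check-sum.
    """
    inserts, updates = [], []
    for key, row in incoming.items():
        new_cs = row_checksum(row, columns)
        old_cs = staged.get(key)
        if old_cs is None:
            inserts.append(key)
        elif old_cs != new_cs:
            updates.append(key)
    return inserts, updates
```

Note that deletes require the reverse pass (staged keys missing from the extract), which only works if the extract is a full snapshot rather than a partial feed.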



Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects over more than two decades. Haughey was formerly CTO for the Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.

Chuck Kelley is a senior architect in the business intelligence practice of Hitachi Consulting (www.HitachiConsulting.com), a globally recognized leader in delivering value-based business and IT solutions. Kelley is an internationally known expert in database and data warehousing technology, with 30 years of experience designing and implementing operational/production systems and data warehouses. He has worked on some facet of the design and implementation of more than 50 data warehouses and data marts. He also teaches seminars, has co-authored three books on data warehousing, and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@hitachiconsulting.com.


SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.