What is the best solution for a data capture issue, where the source tables don't have proper timestamp capture for the updates?
Q: We are in the design phase of a data warehouse, stuck with a change data capture issue, where the source system is JDE and the source tables don't have proper timestamp capture for the updates and inserts, which deprives us of proper incremental load capture. It is not an option to use the source transaction logs for change data capture. A third option of full-table comparison leads to major overhead, so even that is ruled out. What would be the best solution other than the three options above? Do you have any suggestions regarding the "check sum" method? How feasible is the method and how would we achieve it? Chuck Kelley's Answer:
I did some testing and wrote about the check sum method about three years ago. I think that it works and may be the only solution you have left to do. You may be able to use a different algorithm (I like the MD5) to do this. Also, the full table comparison should not be ruled out. There are great techniques that can deal with this, unless you are running on a Windows server. Tom Haughey's Answer:
Figure 1 is a summary of different change data capture methods, modified from Microsoft. The check sum method is not among them. Perhaps you can find an appropriate method that will suit your needs.
Figure 1: Data Capture Methods
In the check sum method, you create a hash check-sum for the existing records and stage it. You only need to stage the key and the check-sum for the existing records, say a dimension. A check-sum is generated for each incoming record, based on columns whose change you care about. It is possible to generate a check-sum on the entire record as long as you care about any change in the entire record. The new check-sum is compared with the staged one. Clearly, if the input extract is huge, there is some level of performance issue in generating the incoming check-sum. If however the keys of both are indexed, and the check-sum included in a covering index in both, performance of the compare can be improved. The bottleneck is the volume of input records and the number of columns you need to use to generate the check-sum, which could only be those whose change you are interested in.
...............................................................................
For more information on related topics visit the following related portals...
Data Management and
DW Administration, Mgmt., Performance.
Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.
Chuck Kelley is a senior architect in the business intelligence practice for Hitachi Consulting (www.HitachiConsulting.com), a globally recognized leader in delivering value-based business and IT Solutions. Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored three books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@hitachiconsulting.com.
 | Solutions Marketplace
Provided by IndustryBrains
|  | Data Validation Tools: FREE Trial Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

| Speed Databases 2500% - World's Fastest Storage Faster databases support more concurrent users and handle more simultaneous transactions. Register for FREE whitepaper, Increase Application Performance With Solid State Disk. Texas Memory Systems - makers of the World's Fastest Storage

| Design Databases with ER/Studio: Free Trial ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

| Data Mining: Levels I, II & III Learn how experts build and deploy predictive models by attending The Modeling Agency's vendor-neutral courses. Leverage valuable information hidden within your data through predictive analytics. Click through to view upcoming events.

| Free EII Buyer's Guide Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

| Click here to advertise in this space |
|
|