-
Marketplace
-
Channel Resources
Articles from this Site
With respect to ETL integration from one system to the other, what steps needs to be considered when planning an aquisition or merger?
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
How can I enhance the productivity of my fact table load?
Please give some suggestions to prepare an estimation effort for our ETL process.
How important is data reconciliation for an ETL application?
White Papers
Sunopsis Integration Suite: An Evaluation by Bloor Research
Third Generation ETL: Delivering the Best Performance
Advanced ETL with Pentaho Data Integration
Evaluating Real-Time Data Integration Solutions
To V or Not To V: Business Intelligence Gets Virtual
Books
Should the DW use current day as the effective date?
| Q: |
Question:When extracting daily from a source table that does not maintain history, should the DW (normalized model, not mart) use current day as the effective date (YMD), date time, a day or day time for the actual extract day (see note below) or a modified date on the source record? Do we use a date or a date and time?
|
| A: |
Evan Levy's Answer: I typically prefer to include two date/time fields in the table: load date/time and effective date/time. It's always important to identify when the data was loaded. The effective date/time field should reflect the origination or creation date of the data. It's important to remember that you may not be able to load a single day's data within the expected timeframe. There's always reasons this could happen; system outages, file transfer problems, source system difficulties are just a few. I worked at a client that pulled data from five different time zones throughout the day. Because of operational or network issues, a day's data sometimes sometimes loaded two or three days later. Having both date/time fields allowed for these loading circumstances - and kept data loading and management simple. Les Barbusinski's Answer: I assume you are talking about daily "snapshots." In such cases, you should use the date of the "snapshot," not a timestamp. A timestamp would vary for each row captured by the snapshot, making it difficult to tie together all of the rows in a particular snapshot. Also, the "snapshot" date should reflect the start of the nightly schedule rather than the actual time of the "snapshot" ... thus avoiding confusion if-and-when the ETL process spans the midnight hour. Hope this helps. |
Les Barbusinski is vice president of technology and co-founder of Digital Symmetry, LLC, a consulting firm that specializes in the design and development of data warehousing and business intelligence solutions. He has more than 20 years of experience in data warehouse and operational systems development and provides hands-on expertise in data warehouse design, development and project management. Les can be reached at dwexpert@dsym.com.
Evan Levy is a partner and co-founder of Baseline Consulting Group, a multivendor systems integration and consulting firm. As the partner in charge of Baselines largest practice, Levy leads both executives and practitioners in delivering technology solutions that help business users make better decisions. He has led strategic technology implementations at commercial and public sector organizations and advises vendors on their product development and delivery strategies. Levy has been published in a wide array of industry magazines and has lectured on a range of technology delivery experiences at leading conferences and vendor events. He has been a featured speaker at the Marcus Evans Analytical CRM symposium, DCIs Data Warehousing conference, the CRM Association, DAMA International, the AMA and the Data Warehousing Institute. His current work involves delivering and lecturing extensively on the topic of data integration. You can contact him at evanlevy@baseline-consulting.com.
For more information on related topics, visit the following channels:


