-
Marketplace
-
Channel Resources
Articles from this Site
With respect to ETL integration from one system to the other, what steps needs to be considered when planning an aquisition or merger?
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
How can I enhance the productivity of my fact table load?
Please give some suggestions to prepare an estimation effort for our ETL process.
How important is data reconciliation for an ETL application?
White Papers
Sunopsis Integration Suite: An Evaluation by Bloor Research
Third Generation ETL: Delivering the Best Performance
Advanced ETL with Pentaho Data Integration
Evaluating Real-Time Data Integration Solutions
Books
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
Question: My question is twofold, but related to the use of tools as opposed to code. We have a small business intelligence (BI) team, and the members are very technical and very good analysts and developers. They do not believe that an extract, transform and load (ETL) or a data modeling tool makes sense in our environment. Could you please provide your views on the advantages and/or disadvantages of the use of a tool for ETL and data modeling versus code to provide the same functional capabilities?
Tom Haugheys Answer: This is really three entirely different questions, so I will address it as such.
1. Do you need an ETL tool or can you write your own ETL?
If an organization already has a large body of transformation programs written in COBOL or some other language, then I would stick with that strategy for now. If you have such a body of code, you probably have either procedures in place or a code generator that simplifies code creation. Many organizations successfully use what I call roll your own (RYO) code to do ETL. This can be a very effective solution in many situations. The advantages of ETL tools are:
- Retention of metadata.
If you do not have an existing body of ETL code and other templates in place, I would recommend going to the ETL tool. However, in my opinion, as long as you handle metadata efficiently in your current environment, I would consider sticking with your current RYO solution. This point needs emphasis. Good metadata (in whatever form it takes) is essential to the usability of a data warehouse. As long as this is provided, I see no immediate benefit in going to an ETL tool. However, consider also your future plans and the growth plans for the DW. If you anticipate a large growth in BI applications and in data sources, then conversion to an ETL too would be a sound investment and should be added to your DW strategy.
2. Do you need a data modeling tool in general?
-
There is no question about it. If you intend to be a serious data management organization, or especially a world-class data management organization, there is no other way. Im not saying which one, or even which type of one, or which features you need, but you absolutely need one or you cannot maturely deal with management of data and with database design. A smaller organization may be able to survive well with a simpler tool like Visio; a larger more mature organization needs a tool like PowerDesigner, ERwin, ERStudio, Visible or others.
3. Do you need a data modeling tool for ETL?
-
It depends on how you plan to do staging and transformation in your ETL. It is always useful to model any data structure, whether it is a permanent data store or a temporary data structure. Some ETL staging structures have special purposes in that they are not used for transaction processing or reporting but for the transformation and preparation of data. It is often acceptable to use various file structures rather than relational tables. There may be a good deal of inevitable redundancy and variability to these file structures, for example when the data is being sorted and aggregated. Consequently, data modeling may not seem like a natural choice. But I stick to my earlier point, it is always useful to model any data structure.
-
However, some structures really should be modeled, such as, those that are used in RI checking, shaping history, generating aggregates and managing surrogate keys. These should be modeled because there are significant relationships within these structures and between them and the output tables.
Sid Adelmans Answer: Data modeling tools are inexpensive and writing code to provide the same function is pointless. You have too much work to do to try to duplicate this capability.
With ETL tools, it depends on the complexity of your cleansing and transformations and the volatility of expected changes in the source and the target. If your ETL process is, as Larissa Moss puts it, just suck and plunk, you dont need an ETL tool. If you plan to make significant transformations, aggregations and cleansing, or if your sources are constantly changing and you are making serious changes to your data warehouse (DW) database, you will want an ETL tool and some ETL tools can help with performance when you are loading very large tables.
Chuck Kelleys Answer: Most likely they are young and/or do not do much documentation. Almost every site I go into (or have been to) suffers from the same view (ETL more than data modeling). Ask them why they are using SQL or C or Java or any other language for that matter. Why should they not just write it in machine code? The answer is that while in the short term it might take longer to develop, maintenance and documentation will be easier moving forward. Unless they want to be maintenance programmers the rest of their lives, I would think you should get them to start using a tool.
Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.
Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or sid@sidadelman.com. Visit his Web site at www.sidadelman.com.
Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.
For more information on related topics, visit the following channels:


