Ask the Experts Archive
ARCHIVE OF QUESTIONS & ANSWERS FOR ETL
We are building a data warehouse with many source systems that provide data feeds. To reduce the volume of data, we apply only the changes (Change, New, Delete) to the data warehouse. We have two choices:
1. Every source system provides a change image file, or
2. Every source system provides the current image file. A data warehouse process would compare the current image file and previous image file and output the change image. Do you have any recommendation on which approach is preferable?
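The compare-and-diff process in option 2 is straightforward to prototype. Below is a minimal sketch, assuming both feeds arrive as CSV snapshots with a header row; the file names and the key column are hypothetical:

    import csv

    def diff_snapshots(previous_path, current_path, key_field):
        """Compare two full-image extracts and emit (action, record) pairs."""
        def load(path):
            with open(path, newline="") as f:
                return {row[key_field]: row for row in csv.DictReader(f)}

        previous, current = load(previous_path), load(current_path)
        changes = []
        for key, row in current.items():
            if key not in previous:
                changes.append(("NEW", row))      # key only in current image
            elif row != previous[key]:
                changes.append(("CHANGE", row))   # key present but fields differ
        for key, row in previous.items():
            if key not in current:
                changes.append(("DELETE", row))   # key vanished from current image
        return changes

    # e.g., diff_snapshots("cust_prev.csv", "cust_curr.csv", "customer_id")

The trade-off it illustrates: option 2 centralizes the delta logic in one warehouse-side process, at the cost of holding two full images in memory (or sorting and merging them for very large feeds).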
I am in a migration project which involves data extraction from Oracle tables to flat files for different systems. I need to decide on the header and footer section for the flat files for the completeness of the extract. What is mandatory for those sections: timestamp, total number of records, etc.?
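Control headers and footers exist mainly so the loader can verify completeness. A minimal sketch of one common layout, with hypothetical field choices (source name and extract timestamp in the header, record count in the trailer):

    from datetime import datetime, timezone

    def write_extract(path, records, source_name):
        """Write a pipe-delimited extract with a control header and trailer.

        Each record is assumed to be a sequence of string fields.
        """
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
        with open(path, "w") as f:
            f.write(f"HDR|{source_name}|{stamp}\n")
            for rec in records:
                f.write("|".join(rec) + "\n")
            # The record count lets the target detect a truncated file.
            f.write(f"TRL|{len(records)}\n")

A sum total over a key numeric column is another common trailer field when a plain record count is not considered strong enough.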
Please define data reconciliation and database reconciliation. I also need a definition of data synchronization.
Could you clarify the difference between the terms: data hub and staging area and identify where they fit in a corporate architecture?
I have been asked to do a performance test of the ETL process of a data warehouse. The testing needs to be done in isolation for all the stages of the ETL, as well as an end-to-end test at the end of the testing cycle. The EDW handles some 4 TB of data, and I have to do a volume test. Please suggest some ways in which I can go about this.
We are extracting data from a legacy system that contains summary data we will need in the DW. Assuming that the legacy data has some degree of integrity, should we re-derive these counts as part of the ETL process to ensure accuracy?
Can you point me to a white paper or case study where adding an ETL layer to an ODS (containing retail store POS data) has enabled speedier nightly data quality processing work on a high volume of records, so that retail store registers can have cleansed records on time at the start of each business day, as opposed to an ongoing backlog?
What aspects do I need to consider in an ETL tool for a strategic data warehousing development project in the banking domain? The existing warehouse is more than 3 terabytes and is growing at an average rate of 17.5 percent per year.
We are looking for an ETL tool to replace our current tool, which was developed internally. I'm under the gun to find an ETL tool that will support Unicode (for a new customer in Asia). Where can I find reviews of the major players?
The new trend has been demanding real-time data from the data warehouse: 1) to generate operational reports at high frequency, and 2) to provide a golden source (real-time reference data) for multiple OLTP applications. Which is the right ETL tool to extract data from multiple OLTP sources with no latency and, importantly, no performance impact on the OLTP systems? The RDBMSs are Informix, Oracle and DB2 UDB.
Our situation is that we need to build a system having: stage (loaded daily) --> ODS (loaded monthly) --> DW (loaded monthly).
The stage is being fed from flat files and houses tables whose structure is exactly the same as the flat files. The DW is a dimensional structure with common dimensions. We have no idea about the best way to model the ODS in this scenario. The ODS needs to act as a stage for the DW and also take care of operational reporting needs. Can you validate this architecture and suggest the best ways to model the ODS? An example would be really helpful.
Any time I get to a data-related project, I have a choice of an ETL tool vs. a homegrown solution. Over the years I have created my own methodology for homegrown ETL solutions. This solution "actively" uses meta data to derive transformation logic and apply data quality routines. I have some production processes which I have not modified in three years; all data changes have been handled by business analysts modifying the meta data. This approach cuts out the entire ETL development stage while maintaining the DW. Only the BAs, QA and operations are involved, and this leads to cost savings. To me this approach has huge advantages over commercial ETL tools, especially given their steep purchase price tags. Your opinions, please!
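For readers unfamiliar with the approach the poster describes, the core idea is that transformation rules live in editable meta data rather than in code, and a generic engine interprets them at run time. A minimal sketch, with an entirely hypothetical rule format:

    # Hypothetical rule table a business analyst could maintain: each entry
    # maps a target column to a source column plus a named transformation.
    RULES = [
        {"target": "cust_name", "source": "name", "transform": "upper"},
        {"target": "balance",   "source": "bal",  "transform": "to_float"},
    ]

    TRANSFORMS = {
        "upper": str.upper,
        "to_float": float,
        "none": lambda v: v,
    }

    def apply_rules(source_row, rules=RULES):
        """Build a target row by interpreting meta data, not hand-coded logic."""
        return {
            r["target"]: TRANSFORMS[r["transform"]](source_row[r["source"]])
            for r in rules
        }

    # apply_rules({"name": "acme corp", "bal": "12.50"})
    # -> {"cust_name": "ACME CORP", "balance": 12.5}

Changing a mapping then means editing a RULES entry rather than redeploying code, which is where the claimed maintenance savings come from.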
I am also trying to determine the feasibility of using an ETL tool (DataStage) in our environment and don't see any benefits. Am I the only one not seeing them?
I will be extracting data from SAP R/3 and flat files and loading into my SAP BW system. How do I go about evaluating ETL tools, and which ones would you suggest?
We can have a staging area between extract and transform, or we can have a staging area between transform and load. I would like to know three to four advantages and disadvantages of each.
We are using a bottom-up approach in our project. How do I integrate two data marts that are on different SQL Servers at different locations for the same company, having the same business context but with slight variations in data mart structure, and what data mart update/load strategy should we follow? Is it appropriate to load data from one data mart into another and make a single data mart covering both locations?
I am currently managing the feasibility study phase of a data warehousing project. However, I am having a hard time persuading the IT head of the benefits of using an ETL tool (Ascential or Informatica) over hand-coded programs to perform the extract, transformation and load process. How should I go about this, particularly in terms of demonstrating real cost savings or other benefits?
Which are the ETL tools that are generally recommended to be used with MicroStrategy?
We have an LDM in ERwin representing a comprehensive data repository expected to collect data from many sources. We are looking for a tool to store information linking the source data, the transformation logic and the target attributes in the LDM. What are the advantages of an ETL tool (e.g., Informatica) vs. a meta data repository for this?
We are working on a data warehouse project where we extract data from the source (ETL) and create fact tables and cubes depending on them. The end users query our cubes through their Cognos UI. I need quality assurance and I need to test the application. Is there any tool available which can be useful for this? If not, how should I approach the problem at hand of quality assurance?
Are you aware of any tools that analyze COBOL programs to produce a traceability/transformation map that could be used to migrate to a commercial ETL tool?
I am providing information for a proof of concept (my first time) on an ETL tool for a major data conversion project. Could you tell me the most important issues I should be exploring within the proof of concept?
I would like to know how to make effort estimations for ETL, query and reporting, and data mining. Are there any standard templates for computing effort estimates? Please also suggest some reference Web sites and publications.
Can we, and should we, adopt different data modeling schemas during the extraction, transformation and staging steps? What consistency needs to be maintained during these stages?
What strategy do you suggest for updating dimensions using flat files as a source that do not have a time stamp? I've tried comparing the source record to the dimension and updating the dimension if the two are different, but performance drops as more fields are compared, so I have moved to an "update all records" strategy regardless of changes. Is there an alternative?
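One common alternative is to store a hash of the comparable columns on each dimension row, so the nightly compare touches one stored value per row instead of every field. A sketch, assuming rows are dictionaries of column values:

    import hashlib

    def row_hash(row, columns):
        """Hash the concatenated comparison columns; store this on the dimension."""
        payload = "|".join(str(row[c]) for c in columns)
        return hashlib.md5(payload.encode()).hexdigest()

    # incoming = {"cust_id": 1, "name": "Acme", "city": "Boston"}
    # if row_hash(incoming, ["name", "city"]) != stored_hash:
    #     ...update (or version) the dimension row...

The hash is computed once at load time, so comparison cost no longer grows with the number of fields compared.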
Industry standards state that the typical star schema will have about six dimension tables and the associated fact tables, depending on grain, etc. Standards also say the dimensions will have a surrogate key which becomes one of the set of keys in the fact table(s). However, is there any reason that these keys cannot be foreign keys through an indirect relationship, thereby producing a fact table with no actual key?
Are there sample documents on what to test in a data warehouse environment? What are some test strategies for data warehouses?
We have Oracle as our RDBMS for all our applications. Because the infrastructure for Notes already exists, they want to use a Notes database to enter customer information. For reporting purposes, the Notes data has to be moved to the Oracle database. Do we really need an ETL tool to take the data from Lotus Notes to Oracle, or can we use some simpler way of doing this (e.g., SQL*Loader)?
We are looking to implement a new messaging/middleware tool before our existing middleware tool causes havoc or grinds to a halt. What are the pros and cons of EAI vs. ETL as a messaging/middleware tool? My gut feeling is EAI, but I need to back up my instincts with something more concrete.
How important is it to have a separate ETL tool when our target RDBMS supplies ETL capabilities, e.g., PL/SQL in Oracle and DTS in MS SQL Server?
We have looked at Oracle Warehouse Builder, and although it seems it will handle simple ETL, it seems to suffer with more complex transformation logic. My question is whether you have used the product successfully and whether you could recommend any other ETL products that may fit the bill.
I would like to know how best to test the ETL for the warehouse if we are not using any ETL tools (everything has been written as hand-coded programs). If we are using ETL tools like Informatica, how do you draw up test cases for the ETL process?
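Whether the ETL is hand-coded or tool-generated, most test cases reduce to reconciliations between source and target: row counts, control totals over key numeric columns, and spot checks of transformed values. A minimal sketch of the first two, with hypothetical inputs (in practice the rows would come from queries against the source system and the warehouse):

    def reconcile(source_rows, target_rows, amount_field="amount"):
        """Basic ETL acceptance checks: row count and a control total."""
        src_total = round(sum(float(r[amount_field]) for r in source_rows), 2)
        tgt_total = round(sum(float(r[amount_field]) for r in target_rows), 2)
        return {
            "row_count_matches": len(source_rows) == len(target_rows),
            "control_total_matches": src_total == tgt_total,
        }

Each documented transformation rule then gets its own test case: feed a known input row through and assert the expected output row.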
I am looking for benchmark data indicating the average cost per gigabyte for implementing a data warehouse.
I work in a big data warehouse and we are using ETI*EXTRACT as ETL tool. I know EXTRACT very well, but I'd like to get an independent opinion about Informatica. Would it be a good idea to change to Informatica?
Do you have any recommendations as to the effectiveness of building your own ETL tool or buying an existing tool?
We are planning to move legacy data from IMS/MVS, DB2, etc. Can you suggest good extraction and transformation tools for doing this?
Should data be changed in the data warehouse and/or data mart making it different from the source system it came from? What are the pros and cons of making the change in the data warehouse/mart vs. the source system?
What tools would you recommend for extracting and manipulating large data sets?
When should we use an ETL tool, and when should we develop one by ourselves?
What are the best practices when it comes to deciding between a mainframe and NT environment for a data warehouse?
I was asked to define a method to determine the growth of our data warehouse. How can I achieve this in the most accurate way?
Are there any benchmark tests that have compared ETL tools?
I read in an article that using ERP as the main source and doing dimensional modeling for data warehousing is difficult. Is that true?
What would you recommend as a good tool to build a data mart or star schema?
Do you think it is a clever approach to extract information from the network traffic and let users combine it with data they can extract from whatever database they have?
Can I use QMF for Windows to access our data from the Oracle database? Would we have to modify our present queries to do so?
Is there information about salary averages for data warehousing professionals and/or individuals trained in ETL and OLAP tools?
Is there a rule of thumb that would give me a 95 to 99 percent confidence level that the data in our warehouse is correct?
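There is no blanket rule of thumb, but random-sample auditing supports exactly this kind of statement. The classic proportion sample-size formula n = z^2 * p(1-p) / e^2 says how many rows to verify against the source; a small sketch:

    import math

    def sample_size(z, margin, p=0.5):
        """Rows to audit for a given confidence (z) and error margin (e).

        z is roughly 1.96 for 95% confidence and 2.576 for 99%; p=0.5 is
        the worst-case assumption about the underlying error rate.
        """
        return math.ceil(z**2 * p * (1 - p) / margin**2)

    # sample_size(1.96, 0.01) -> 9604 rows for +/-1% at 95% confidence

Verifying that many randomly chosen warehouse rows against their source values bounds the overall error rate at the stated confidence level.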
Could you please give me some suggestions about how we can pick the right ETL product and what we should consider?
What exactly is the role of meta data, and what is its relationship to an ETL tool?
What would be the best solution to link databases?
What are the quality edits and procedures for loading warehouses and verifying data before releasing it to the business groups?
I am a consultant conducting an evaluation of several business intelligence companies, and I am looking for information or resources regarding specific criteria.
Is there technology available to do incremental loading (just the new transactions) and populating of a fact table?
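The usual pattern behind such incremental loads is a high-water mark: persist the largest transaction key (or timestamp) loaded so far, and select only rows beyond it on the next run. A minimal sketch, assuming each transaction carries a monotonically increasing txn_id (a hypothetical field name):

    def incremental_batch(transactions, last_loaded_id):
        """Select only transactions newer than the stored high-water mark."""
        batch = [t for t in transactions if t["txn_id"] > last_loaded_id]
        # Advance the mark only as far as what was actually selected.
        new_mark = max((t["txn_id"] for t in batch), default=last_loaded_id)
        return batch, new_mark

Most ETL tools and change data capture features implement some variant of this, with the mark stored in a control table between runs.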
What is the typical rate within data warehousing for promoting changes that affect transformation and data cleansing for existing subject areas that are in production?
How do you measure SAP's solution against other, more "traditional" tools?
Where can I find a product that extracts data from any platform, cleanses the data and reformats it in a consistent manner?