-
Marketplace
-
Channel Resources
Articles from this Site
Netezza Enters Location Intelligence Market
Data Warehousing Meets Data Archiving in Information Lifecycle Management
The Role of Packaged BI Applications in Delivering Enterprise-Wide BI and DW
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
Where is the best place for a BI application to reside?
White Papers
Data Warehouses: What are they and how will they benefit your organization?
Advances in Data Warehouse Performance
Books
What is the difference between data warehouse testing and conventional testing?
What is the difference between data warehouse testing and conventional testing? In data warehouse testing what type of testing should we use (sanity, load, UAT)?
Evan Levy's Answer: Actually, traditional application testing doesn't look at both acceptance testing of an application and the data content. A data warehouse project must actually ensure that acceptance testing is executing against the data (loading, value accuracy, transformation, completeness, etc.) as well as the application functionality (reporting accuracy, display completeness, response time, etc.).
Joe Oates' Answer: Testing for data warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data type, edge values, etc.
Testing for data warehouse, on the other hand, cannot and should not duplicate all of the error checks done in the source system. Even though there are some data quality improvements, such as making sure postal codes are associated with the correct city and state that are practical to do, data warehouse implementations must pretty much take in what the OLTP system has produced.
Testing for data warehouse falls into three general categories. These are testing for ETL, testing that reports and other artifacts in the data warehouse provide correct answers and lastly that the performance of all the data warehouse components is acceptable
Here are some main areas of testing that should be done for the ETL process:
- Making sure that all the records in the source system that should be brought into the data warehouse actually are extracted into the data warehouse: no more, no less.
- Making sure that all of the components of the ETL process complete successfully
- All of the extracted source data is correctly transformed into dimension tables and fact tables
- All of the extracted and transformed data is successfully loaded into the data warehouse
Once the extracted and transformed data is loaded into the data warehouse, testing should be done to ensure that all scripts, reports and aggregations produce correct results from the detailed data in the data warehouse.
Once the above has been validated by testing, the next step is to concentrate on performance testing. If the ETL process takes longer than the allowable time window, then it must be tuned to meet the time window allowed.
If reports take an unacceptably long time to run, use of aggregation or other tuning techniques will be necessary to bring them to an acceptable level of performance.
In testing for data warehouse, it is critical to make sure that things are done properly and get the right answer first. Only then should you try to optimize performance. Trying to do both at the same time is usually not successful.
The explanations here are by no means exhaustive but should provide a good overview of the differences between testing in a data warehouse environment versus a conventional OLTP environment.
Joe Oates is an internationally known speaker, author and consultant on data warehousing. Oates has more than 30 years of experience in the successful management and technical development of business, real-time and data warehouse applications for industry and government clients. He has designed or helped design and implement more than 30 successful data warehouse projects.
Evan Levy is a partner and co-founder of Baseline Consulting Group, a multivendor systems integration and consulting firm. As the partner in charge of Baselines largest practice, Levy leads both executives and practitioners in delivering technology solutions that help business users make better decisions. He has led strategic technology implementations at commercial and public sector organizations and advises vendors on their product development and delivery strategies. Levy has been published in a wide array of industry magazines and has lectured on a range of technology delivery experiences at leading conferences and vendor events. He has been a featured speaker at the Marcus Evans Analytical CRM symposium, DCIs Data Warehousing conference, the CRM Association, DAMA International, the AMA and the Data Warehousing Institute. His current work involves delivering and lecturing extensively on the topic of data integration. You can contact him at evanlevy@baseline-consulting.com.
For more information on related topics, visit the following channels:


