Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events

White Paper Library
Research Papers

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

I would like to know how best to test the warehouse in ETL if we are not using any tools for ETL (everything has been written in programs)?

    Ask The Experts published in DMReview.com
February 7, 2002
  By Sid Adelman and David Marco and Clay Rehm


I would like to know how best to test the warehouse in ETL if we are not using any tools for ETL (everything has been written in programs)? If we are using ETL tools like Informatica, how do you draw up test cases for the ETL process?


Sid Adelman’s Answer: Whether or not you are using an ETL tool, you need to test and validate that the data has been created correctly. There are a few areas that can and should be validated. If your source data had a million rows, and if you kept them all, the data warehouse should also have a million rows. If the source system has $3,684,679.96 in sales for Region 6, the data warehouse should have that exact dollar amount. If the source system has 85,832 customers in Region 3, the data warehouse should have that exact number of customers. This testing and validation is not a one-time activity. It needs to be performed every time the ETL process is run. Some of the ETL tools have these validation capabilities.

David Marco’s Answer: You draw up test cases for a custom ETL process in the exact same way you do for a tool-driven ETL process. A good test states the beginnings state of the data/process, the processing that should occur and then the end state of the data/process. Make sure to construct test cases for fatal problem errors and for duplicate data.

Clay Rehm’s Answer: Testing has absolutely nothing to do with what tool you use. I am sure you will get differences of opinion here, but the fact of the matter is that no matter what tool you used to load the database, you need to be able to test and validate the data loaded. Period. An ETL tool is an ETL tool. Testing is such an overlooked and underrated function of the data warehouse that books could be written on “data warehouse testing – preventing failure”. Also, ETL means Extract, Transform and Load. This means you can write ETL in any programming language – COBOL, SAS, VB, C, UNIX Shell script, BAL, etc. You get the point. It is that some languages or applications are better than others in terms of ease of use and the other functionality it provides. The biggest benefit of an “official” ETL tool is that it automatically creates meta data. Somewhat cryptic meta data, but it is meta data and it is stored in a relational database management system. Drawing up the test cases requires a test strategy, a test plan, test scripts, testers (people who are committed to the project and who will actually test), documentation, and oh, more documentation. I noticed you did not specify what type of testing you want to perform. Are you going to development test, unit test, system test, integration test, acceptance test, regression test, performance test, etc??? Most data warehouse projects do not include time in their project plans/schedules for the time needed to iteratively test and retest and get the darn thing right. Development testing is made up of unit testing and the combination of units. Unit testing examines individual modules, subroutines or procedures. After the units have been tested individually, they are combined or integrated into larger and larger objects until the complete function has been built. System testing determines if a system satisfies its requirements and if applications within a system operate effectively as a whole. System testing verifies the functional, performance, interface, security and other system requirements in a simulated production environment. This includes system acceptance testing of third-party software and includes performance and regression testing. Acceptance testing verifies that a completed system meets the original business objectives and that the client finds the production version acceptable for use by the intended audience. End to end testing assesses the extent to which the interrelated data warehouse systems support a core business process or activity in an operational environment. This testing covers the complete data warehouse and ensures that any data that is passed anywhere in the system is passed with appropriate values.


For more information on related topics visit the following related portals...

Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts. His new book, Data Strategy, is scheduled for publication this year. He can be reached at 818-783-9634 or sidadelman@aol.com.  Visit his Web site at www.sidadelman.com.

David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40."  He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.

Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, meta data and systems analysis, design and development. He has worked in multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and is a coauthor of the book Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP), a certified Project Management Professional (PMP), holds a Bachelors of Science degree in Computer Science from Carroll College and is currently working on his Masters degree in Software Engineering. He can be reached at clay.rehm@rehmtech.com.

Solutions Marketplace
Provided by IndustryBrains

Data Validation Tools: FREE Trial
Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

Recover SQL Server or Exchange in minutes
FREE WHITE PAPER. Recover SQL Server, Exchange or NTFS data within minutes with TimeSpring?s continuous data protection (CDP) software. No protection gaps, no scheduling requirements, no backup related slowdowns and no backup windows to manage.

Manage Data Center from Virtually Anywhere!
Learn how SecureLinx remote IT management products can quickly and easily give you the ability to securely manage data center equipment (servers, switches, routers, telecom equipment) from anywhere, at any time... even if the network is down.

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

Free EII Buyer's Guide
Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

Click here to advertise in this space

E-mail This Ask The Experts E-Mail This Ask The Experts
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.