Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events

White Paper Library
Research Papers

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Enterprise Architecture View:
Performance Impacts on the Data Warehouse Environment

  Column published in DMReview.com
December 1, 2004
  By Michael Jennings

How are you detecting and measuring performance of your data warehouse?

The complexities and cross dependencies of today's IT infrastructures pose continuous challenges for maintaining acceptable and consistent performance in the data warehouse environment. Data warehouse processing activities (reports; analytics; extract, transform and load [ETL]; queries; etc.) that initially performed optimally may begin to degrade over time. The operational impact to the data warehouse environment may not be readily apparent due to the rate of change or the seemingly sporadic occurrence of the these types of issues. These changes in the environment can be caused by a variety of factors that are directly or indirectly associated with the warehouse.

This column is the first portion of a two-part series examining various data and infrastructure issues that may affect performance in the data warehouse environment. Techniques for measuring and/or detecting performance issues in the data warehouse environment will be examined in part two.

Environment Factors

You deployed your data warehouse to the production environment for user access after extensive testing of all processing activities. Stress testing was performed against the data warehouse environment to ensure that adequate performance would be maintained during periods of high demand. What other factors could affect warehouse performance? To start, it is likely that none of your testing was done on an exact copy of your production infrastructure. Typically, due to cost, the testing environments do not mirror the infrastructure components found in production. The number of server tiers (Web, application, database, firewalls) or the redundancy of parallel data centers for load balancing and failover is not available in the testing infrastructure. Additionally, there may be differences in network bandwidth between the testing and production environments.

How complete is your view of processing activity occurring on the data warehouse infrastructure? What effect does your ETL processing cycle have on front-end performance of the data warehouse? Did you measure front-end performance of the data warehouse during an ETL processing cycle to gauge its impact? ETL processing typically consumes significant CPU and disk I/O resources from the servers. If your ETL product resides on the same application and/or database servers servicing user queries, there can be an increase in processing times for these requests.

Other database activities in the environment can also affect warehouse performance. If other databases reside with the data warehouse, their activities will consume server resources. Ideally, transactional applications should be on separate servers from the warehouse to ensure optimal performance of both environments. If transactional and data warehouse databases must be on the same server, they should be in separate database instances in order to take advantage of DBMS and database tuning parameters specific to their use. Depending on growth and usage changes in the data warehouse, the DBMS instance and database may require additional tuning in order to maintain performance.

Database and log file backups can also be potential factors affecting overall performance of your warehouse environment. These activities should be scheduled during off-peak hours to minimize impact on the environment.

Shared Infrastructure Factors

Many data warehouse environments are taking advantage of the various shared storage device options available today. These shared storage devices allow for data growth for all servers on the network, eliminating the need and maintenance of dedicated disk for specific servers. Application and database server resources are freed from data storage processing activities because data does not reside on them. Network attached storage (NAS) and storage area network (SAN) are examples of these shared storage devices. While these storage options provide overall maintenance and growth advantages to the entire enterprise, they can affect performance of the data warehouse in ways that are not easily detectable. Because a shared device server is shared in the enterprise, processing activity from a variety of application servers can be requesting resources from the same storage server that the data warehouse accesses. During periods of heavy data processing from multiple applications, the shared storage server can reach saturation points that affect the processing performance provided to the data warehouse and other applications. From a data warehouse perspective, the degraded performance will not be apparent or measurable on the warehouse servers because of the use of shared storage. The degraded performance may be sporadic and reached only during certain periods when the combined application processing requests all simultaneously hit the shared storage server.

In data warehouse environments that are deployed across multiple data centers for availability and load balancing of application processing, the interconnect infrastructure, sometimes referred to as the fabric, can cause performance issues with the data warehouse that are not easy to detect. Degraded data transfer media and/or interconnect components can affect data processing activities of the warehouse, resulting in slower performance. Again, the cause of the performance issue on the data warehouse will not be apparent due to the use of this shared infrastructure service of the enterprise.

As you can see, many direct and indirect factors may affect overall performance of your data warehouse environment. The impact these factors will have on data warehouse performance cannot typically be measured during development and testing phases. Getting a complete picture of all processing that can impact performance of the warehouse will require a total understanding of all data warehouse administration, maintenance, tuning and shared services activities. Next month, we will investigate various methods for detecting and measuring performance of the data warehouse for these types of issues.


For more information on related topics visit the following related portals...
DW Administration, Mgmt., Performance and Enterprise Achitecture.

Michael Jennings is a recognized expert with more than 20 years of information technology experience and speaks frequently on business intelligence/architecture issues at major industry conferences and has been an instructor at the University of Chicago's Graham School. He is a co-author of the book Universal Meta Data Models and a contributing author of the book Building and Managing the Meta Data Repository. He works for EWSolutions, a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing, enterprise architecture and managed meta data environment technologies (www.EWSolutions.com). He may be reached directly via e-mail at MJennings@EWSolutions.com.

Solutions Marketplace
Provided by IndustryBrains

Data Validation Tools: FREE Trial
Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

Speed Databases 2500% - World's Fastest Storage
Faster databases support more concurrent users and handle more simultaneous transactions. Register for FREE whitepaper, Increase Application Performance With Solid State Disk. Texas Memory Systems - makers of the World's Fastest Storage

Backup SQL Server or Exchange Continuously
FREE WHITE PAPER. Recover SQL Server, Exchange or NTFS data within minutes with TimeSpring?s continuous data protection (CDP) software. No protection gaps, no scheduling requirements, no backup related slowdowns and no backup windows to manage.

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

Free EII Buyer's Guide
Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

Click here to advertise in this space

E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.