Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events
Archived Events

White Paper Library

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Real-Time Data Warehousing:
Hardware and Software

    Column published in DMReview.com
February 12, 2004
  By Simon Terr

In this final installment of the real-time data warehousing building blocks series, I am going to discuss hardware, software and staff.

The challenge with real-time data warehousing is to ensure that data flows continuously. Let's consider the plumbing through which the data flows starting with the source system and ending with the customer's desktop:

  • Source system
  • ETL server
  • Network
  • Data warehouse server
  • Reporting server

Source System

There are two principal challenges associated with source systems. The first challenge is how to extract the initial load and the second challenge is how to capture changed data without impacting the source system. The first challenge typically requires the source data to be copied to an intermediary location from which the data can be prepared and loaded into the data warehouse. Although the initial load is a one-time requirement, this requirement repeats itself for each new acquisition. Hence, having a dedicated server that can be configured appropriately to handle all anticipated initial loads may be a necessity, especially in organizations that have multiple source systems.

Meeting the second challenge depends largely on the ETL strategy for capturing changed data. In the case of non-invasive ETL strategies, such as those utilizing changed data views or log tables, the impact is typically limited to ensuring the source system can handle the additional stresses of the changed data read operations. For invasive ETL strategies, such as those utilizing triggers, the resource impacts can be more extensive. If captured data is stored in the source system, the source system has to have the storage capacity to hold the captured data. If triggers are used to capture data, depending on their complexity, the triggers may require additional processing resources.

ETL Server

The job of processing the initial load and ongoing changed data capture is relegated to the ETL servers. The ETL servers have to be able to perform the initial load fast enough so the changed data does not accumulate to unmanageable volumes. The change data capture ETL job has to process changed data, including the accumulated data during the initial load, fast enough to live up to the "real time" in real-time data warehousing.

To accomplish all of the above, the ETL servers need the hardware and software to handle the initial load and changed data requirements. The servers need to have enough CPUs, memory and storage to perform their job. In addition, the servers have to be scalable and allow for additional CPUs, memory and/or storage as needed.

The ETL software has to have the features that will allow it to take full advantage of the ETL servers' hardware. If the ETL servers begin to reach high utilization rates and more processing power is needed, the ETL software needs to have the features to allow adding more ETL servers so processing can be distributed seamlessly across the ETL servers with minimal, if any, down time.


Moving data between servers requires network bandwidth. The network must be able to handle the initial load and changed data bandwidth requirements, especially during peak hours. The network should be tested to see if it can handle the data bandwidth requirements. Interestingly enough, sometimes the network is not the issue although it may appear to be so at first blush. For instance, a bad network card on the source server may give the appearance that the network is the culprit. In some instances, a long haul network segment such as a transatlantic connection may be the bottleneck and upgrading the segment may be too costly. Hence, prioritizing the network traffic may be a solution. The message here is if network performance is an issue, the network needs to be analyzed to find out exactly what is causing the performance issues and why.

Data Warehouse Server

Since there are so many flavors of CPUs, memory, storage devices and software available for servers, choosing a particular combination of hardware and software for a data warehouse server can be a daunting task. The first order of business is to capacitate the data warehouse server by calculating the volumetrics. For example, consider Acme Corporation that has a point-of-sale (POS) system that has about five years of data, about 100GB. Acme expects its sales to go grow about five percent per year which translates to about 21GB/year or 58MB/day. Volumetrics should drive the performance requirements the data warehouse needs to meet, and the data warehouse should be capacitated accordingly.

Reporting Server

Although the data warehouse typically stores data in a format that is easy to report on, rarely is that sufficient to satisfy all reports which is the crux of the problem. To illustrate what I mean, reports that go against OLAP cubes that are built overnight from data retrieved from the data warehouse - the data in the data warehouse may be available in real time, but the OLAP cubes negate this timeliness by being built nightly.

Real-time data warehousing adds requirements that the reporting server may not be able to meet either today or tomorrow. Consequently, reporting volumetrics need to be computed and utilized to either purchase or upgrade a reporting server solution.


There are a lot of moving parts to real-time data warehousing and these parts are impacted differently by the ebb and flow of business. The staff must not only have the technical skills to respond effectively to these impacts but also have the fortitude, self-motivation, attitude and persistence to initiate and manage change when needed. Growth in the business is bound to tax systems. Mergers and acquisitions are bound to change the architecture. New functional requirements will evolve systems.

Source system, ETL server, network, data warehouse server and reporting server are the major components in a real- time data warehousing framework. That is a lot of moving parts and understanding the role of each component ensures that it can be appropriately selected or tuned to let data flow continuously. Having the right staff ensures data flows continuously.


For more information on related topics visit the following related portals...
DW Basics and Real-Time Enterprise.

Simon Terr is CEO and founder of KnowledgeSoft, Inc., a provider of products and services in the areas of real time data warehousing and business intelligence. Simon Terr can be reached at knowledgesoft@att.net.

Solutions Marketplace
Provided by IndustryBrains

SAP Software Migration for Customers
If your current applications are at risk, SAP Safe Passage provides a clear roadmap for solution migration with maintenance support & integration technology. View free demos now!

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

SAP Master Data Management
Learn about SAP Master Data Management to easily consolidate, augment, store & distribute master data w/ NetWeaver. Manage complex & rich product, customer, supplier, & employee info w/ SAP.

Scrub Bad Addresses from Your Customer Database
Protect against fraud, waste and excess marketing costs by cleaning your customer database of inaccurate, incomplete or undeliverable addresses. Add on phone check, name parsing and geo-coding as needed. FREE trial of Data Quality dev tools here.

Save on Business Intelligence and Data Warehousing
Leverage Open Source database software and PC-based commodity hardware for an unsurpassed price/performance value. ExtenDB transforms the economics in developing a Business Intelligence infrastructure.

Click here to advertise in this space

E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.