Pick Your Data Warehouse Architecture

  Article published in DM Review Magazine
October 2006 Issue
 
  By Ashok Nayak

One of the primary reasons data warehousing projects fail is the lack of an appropriate architecture. This article describes a new architectural framework for data flow from the source systems all the way through the presentation layer and outlines the questions one must ask before selecting a particular component within the framework.

The 5x5 Architectural Framework

Figure 1 shows an outline of the 5x5 architectural framework for an enterprise data warehouse in a medium to large organization. This framework contains five stages of data movement and storage combined with five sets of processes to move data from one logical stage to the next. Of course, the number of components and the size of each component will vary from one project to another based on a number of factors, including, but not limited to:

  • Size and regionalization of the organization,
  • Number and size of the source systems,
  • Strategy for future growth,
  • Type and distribution of operational data across the organization, and
  • Willingness and level of sponsorship from senior management in support of decision support systems in general.


Figure 1: The 5x5 Architectural Framework

The Five Stages of Data Movement and Storage

The following list details the five stages and offers typical questions one must ask before deciding whether a stage should exist.

Starting Point

The operational databases and other external systems constitute the starting point in the overall data flow. Source systems could include tables from relational database management systems, flat files from mainframes, spreadsheets and other unstructured data.

Stage One

This is the first stage of data movement from the source systems. There is rarely any data modeling activity performed at this stage. Data in this area can be used for a variety of purposes, such as operational reporting. There is no need to carry out any transformations from the source to the staging area.


Figure 2: Stage One

Stage Two

At this stage, data still resides in a normalized form closer to the source system data model. The data granularity is exactly the same as the source systems. No summarization or aggregation is required at this stage. Cross-application reporting is performed here because data from different sources is available in its original form and quality. True operational reporting is available here.

If summarization is needed, it can be achieved in two ways: 1) by creating a set of programs that run at specified intervals and compute summaries before storing them in summary tables, or 2) by using relational database management system (RDBMS) utilities, such as materialized views in Oracle.
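The first option can be sketched in a few lines. This is a minimal, hypothetical illustration using SQLite in place of a production RDBMS; the table and column names (`sales_detail`, `sales_summary`) are invented for the example, not taken from the article.

```python
import sqlite3

# Sketch of option 1: a scheduled program recomputes a summary table
# from detail-level data. In production this would run at specified
# intervals (e.g., nightly) against the Stage Two database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?)",
                 [("EAST", 100.0), ("EAST", 50.0), ("WEST", 75.0)])

# Rebuild the summary in full each run; an incremental refresh is the
# kind of work a materialized view (option 2) would handle for you.
conn.execute("CREATE TABLE sales_summary (region TEXT, total REAL)")
conn.execute("DELETE FROM sales_summary")
conn.execute("""INSERT INTO sales_summary
                SELECT region, SUM(amount)
                FROM sales_detail GROUP BY region""")
conn.commit()

print(sorted(conn.execute("SELECT * FROM sales_summary")))
# [('EAST', 150.0), ('WEST', 75.0)]
```

The trade-off between the two options is essentially ownership: hand-written summary programs give full control over timing and logic, while materialized views delegate refresh scheduling and consistency to the database engine.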

Smaller organizations and data warehousing projects may skip this stage if the online transaction processing (OLTP) database can also be used for operational reporting purposes and no cross-domain reporting is required. Whether the OLTP database can be used for reporting depends on a number of factors, including server features, the utilization level of the transactional system and the available resources within the server, network and database.


Figure 3: Stage Two

Stage Three

This is one of the key stages in the data warehouse framework. Typically, the data model within the data warehouse is normalized. Data integration takes place within this stage. This is the central data repository for the enterprise. No reporting activities are allowed from the data warehouse; data needed by data marts and other entities within the organization is extracted from the central data warehouse. No summarizations or aggregations are performed at this stage.


Figure 4: Stage Three

Stage Four

This stage consists of a number of data marts with data extracted, transformed and loaded from the central data warehouse. Each data mart could consist of data related to one or more subject areas. The data marts are designed based on the business requirements. Summarized data is available within the data marts.


Figure 5: Stage Four

Stage Five

This stage is necessitated by the business intelligence (BI) tool in use. Typically, if the data mart contains multiple subject areas, this stage can be used to create multidimensional databases (MDDBs), also called cubes, for each of the subject areas used by a specified user community. MDDBs can also be created with data spanning multiple subject areas if such requirements exist.


Figure 6: Stage Five

The Five Processes

The following sections discuss the five processes involved in the 5x5 architecture framework.

Process One

This is the process of moving data from internal as well as external sources to the staging area. The tool used will probably depend on the number of source systems, the database platform and operating system of the source systems, the volume of data being extracted and the extraction rules being applied.

As shown in Figure 7, the extraction and load mechanisms are straightforward; no data transformation is needed here.

The process can be either replication (where near real-time operational reporting is required) or a batch extract, transform and load (ETL) process in other situations.
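In its batch form, the process above amounts to copying source records into staging verbatim. The following sketch assumes a hypothetical flat-file source and staging table (`stg_orders` and its columns are invented for illustration); SQLite stands in for the staging database.

```python
import csv
import io
import sqlite3

# Sketch of Process One as a batch extract and load: a flat-file source
# is copied as-is into a staging table with no transformation. A
# StringIO stands in for a real mainframe extract file.
source_file = io.StringIO("order_id,amount\n1001,250.00\n1002,99.50\n")

staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT)")

# Columns are loaded untyped and untouched; even type conversion is
# deferred to later stages of the framework.
reader = csv.DictReader(source_file)
staging.executemany("INSERT INTO stg_orders VALUES (?, ?)",
                    [(r["order_id"], r["amount"]) for r in reader])
staging.commit()

print(staging.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0])
# 2
```

Keeping staging columns as plain text is a deliberate choice in this sketch: a load that applies no conversions cannot reject source rows, which matches the no-transformation character of this process.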


Figure 7: Process One

Process Two

During this process, data is fed from staging into the operational data store (ODS). Complex transformations may not be needed. Storing historical data for up to a year is very common. Historical changes (transformations similar to slowly changing dimensions) are not required because the ODS caters to the operational reporting for the organization.

Additional columns may be available within each of the tables to store information such as when the rows were loaded or last updated. This helps design the incremental loading strategy from ODS to the central data warehouse.
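The incremental loading strategy those audit columns enable can be sketched as follows. This is a simplified, assumed design: rows changed since the last successful load are selected by comparing a hypothetical `last_updated` column against a stored high-water mark.

```python
from datetime import datetime

# Hypothetical ODS rows carrying a last_updated audit column
# (column names are illustrative, not from the article).
ods_rows = [
    {"id": 1, "value": "a", "last_updated": datetime(2006, 10, 1)},
    {"id": 2, "value": "b", "last_updated": datetime(2006, 10, 3)},
    {"id": 3, "value": "c", "last_updated": datetime(2006, 10, 5)},
]

def incremental_extract(rows, high_water_mark):
    """Return rows updated after the last successful load,
    plus the new high-water mark to persist for the next run."""
    changed = [r for r in rows if r["last_updated"] > high_water_mark]
    new_mark = max((r["last_updated"] for r in changed),
                   default=high_water_mark)
    return changed, new_mark

changed, mark = incremental_extract(ods_rows, datetime(2006, 10, 2))
print([r["id"] for r in changed])  # [2, 3]
```

The high-water mark returned by each run is saved and passed to the next, so each load picks up exactly the rows touched since the previous one.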

If the intent is to use ODS for near real-time reporting, then a data replication tool can be used. If both staging and ODS are present within the architectural framework, then the same replication instance can be used to push data from the source systems to the staging area and the ODS.


Figure 8: Process Two

Process Three

This process involves the data load from the ODS to the central data warehouse. Complex ETL processes are involved, and organizations typically turn to commercial off-the-shelf software capable of handling the volume of data. This is where transformations related to slowly changing dimensions are carried out. Data is stored at the most granular level possible. The volume of historical data to be stored depends on the business needs of the organization. Lately there have been legal mandates in various business sectors to retain data for more than five years for a number of reasons. Batch ETL processes are used here.
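One common slowly changing dimension transformation is the Type 2 pattern, in which a changed attribute expires the current dimension row and inserts a new version, preserving history. The sketch below is a minimal illustration under assumed names (`customer_id`, `city`, the 9999-12-31 high date); a real ETL tool would implement this far more robustly.

```python
from datetime import date

# Sentinel end date marking the current version of a dimension row.
HIGH_DATE = date(9999, 12, 31)

def scd2_apply(dimension, source_row, load_date):
    """Apply one source record to a dimension table (Type 2 SCD):
    expire the current version on change and insert a new one."""
    for row in dimension:
        if (row["customer_id"] == source_row["customer_id"]
                and row["end_date"] == HIGH_DATE):
            if row["city"] == source_row["city"]:
                return  # no change to the tracked attribute
            row["end_date"] = load_date  # expire the current version
            break
    dimension.append({"customer_id": source_row["customer_id"],
                      "city": source_row["city"],
                      "start_date": load_date,
                      "end_date": HIGH_DATE})

dim = [{"customer_id": 42, "city": "Austin",
        "start_date": date(2005, 1, 1), "end_date": HIGH_DATE}]
scd2_apply(dim, {"customer_id": 42, "city": "Dallas"}, date(2006, 10, 1))
print(len(dim))  # 2: one expired row, one current row
```

After the load, both the Austin and Dallas versions of the customer exist, each bounded by its effective dates, which is what lets the warehouse report history at the most granular level.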


Figure 9: Process Three

Process Four

Process four moves data from the central data warehouse to the data marts. During this process, summarizations and aggregations are performed as required by the business needs of the end-user community each data mart caters to.

The process may include simple data extracts as required by various business units within the organization.

Often there is no aggregation or summarization for a data mart if the intent is to obtain detailed information from it and report execution speed is not an issue (for example, when reports can be scheduled to run at night and the results cached for viewing the next day).
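When aggregation is required, it typically means rolling warehouse facts up to the grain the mart's users need. The sketch below assumes a hypothetical daily sales fact being summarized to a monthly grain; the field names are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain facts extracted from the central warehouse.
warehouse_facts = [
    {"day": date(2006, 9, 1),  "product": "P1", "units": 3},
    {"day": date(2006, 9, 15), "product": "P1", "units": 2},
    {"day": date(2006, 10, 2), "product": "P1", "units": 7},
]

def to_monthly_mart(facts):
    """Roll daily facts up to the (year, month, product) grain a
    sales mart might require."""
    totals = defaultdict(int)
    for f in facts:
        totals[(f["day"].year, f["day"].month, f["product"])] += f["units"]
    return dict(totals)

print(to_monthly_mart(warehouse_facts))
# {(2006, 9, 'P1'): 5, (2006, 10, 'P1'): 7}
```

Note the grain decision is driven entirely by the business requirements of the mart's user community; a different community might need the same facts at weekly or regional grain instead.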


Figure 10: Process Four

Process Five

This process is typically driven by the BI/OLAP tool being used. The tool connects to the data mart, creates MDDBs as needed and stores them on a server for access by the users at a later date. Depending on the architecture of the tool being used, this MDDB may or may not contain data. If no data is present, the data mart is accessed on the fly to get the latest information.


Figure 11: Process Five

This is a standard generic framework for an enterprise data warehousing architecture. Criteria for including or excluding a specific component within the overall framework have been discussed. I hope that this will guide data warehouse/BI architects to pick the best architecture for their data warehousing efforts. 


Ashok Nayak has more than 10 years of experience in the IT industry specializing in the areas of data management, data warehousing and business intelligence. He has designed, architected and managed a number of data warehousing and business intelligence projects for several Fortune 500 companies. He is a project management professional (PMP(R)), and is currently working as a senior manager with Accenture Information Management Services. He may be reached at ashok.k.nayak@accenture.com.



SourceMedia (c) 2007 DM Review and SourceMedia, Inc. All rights reserved.