Portals eNewsletters Web Seminars dataWarehouse.com DM Review Magazine
DM Review | Covering Business Intelligence, Integration & Analytics
   Covering Business Intelligence, Integration & Analytics Advanced Search

View all Portals

Scheduled Events
Archived Events

White Paper Library
Research Papers

View Job Listings
Post a job


DM Review Home
Current Magazine Issue
Magazine Archives
Online Columnists
Ask the Experts
Industry News
Search DM Review

Buyer's Guide
Industry Events Calendar
Monthly Product Guides
Software Demo Lab
Vendor Listings

About Us
Press Releases
Advertising/Media Kit
Magazine Subscriptions
Editorial Calendar
Contact Us
Customer Service

Empowering the Information Enterprise:
The Information Repository

  Column published in DM Review Magazine
September 2004 Issue
  By Jonathan Wu

Advances in software and computing technologies have led to great efficiencies by automating and streamlining manual processes while amassing large volumes of data. Entity-relationship modeling, the database design technique often used in these applications, organizes data by reducing data redundancy in a manner called data normalization. While this technique works well for transactional applications, performance and functionality issues arise as these databases are accessed for reporting, analysis and decision-making purposes because entities have indexes to optimize data entry and processing as opposed to data extraction. Because redundancy is eliminated, multiple entities must be accessed to obtain the desired information. As the number of relationships increases, so does the complexity of those relationships, which also impacts performance. Functionality issues occur because data needed for reporting and analysis is often stored in disparate databases, thereby causing individuals to access multiple databases to obtain the data they need. These issues spawned the movement toward the creation of data structures and information repositories.

Data Structures

Data needed for reporting and analysis can be arranged and stored in numerous structures including data files, denormalized tables, OLAP/multidimensional cubes, operational data stores (ODSs), data marts and data warehouses (DWs). Each is vastly different, and each serves a specific purpose. Data files are typically provided to external parties requesting a set of detailed information or are used internally for analysis by another application. Denormalized tables are designed to contain redundant information and do not adhere to the rules of data normalization. They are often created as a quick solution to satisfying multiple reporting and analysis needs for the same set of data. Similar to a denormalized table, an OLAP cube contains highly redundant data. It also contains dimensions, which are used for trend or comparative analysis. ODSs, data marts and data warehouses are data structures that use relational database management systems. Through their database schema, they provide broader capabilities by containing both detailed and summarized data while providing extensive reporting and analysis possibilities. To create an organized and comprehensive information repository for your organization, an understanding of the role of each data structure is needed.

Information Repository Architectures

The organizational structure of your company often influences whether the information repository architecture is centralized or decentralized. In a centralized architecture, an enterprise perspective can be created, thereby providing management with a broad range of analytics for monitoring, assessing and decision-making purposes. In a decentralized architecture, information repositories are created for each sponsoring business unit or group. This architecture is often reflective of standalone operations and satisfies the information needs of that respective group.

Within an information repository, there are several approaches to organizing data structures and designing technical architectures. The three common approaches presented here hold both benefits and drawbacks.

Federated Data Warehousing

The federated approach to data warehousing enables virtual access to disparate data sources as if they were one. In disparate information environments, individuals often seek an architecture that seamlessly integrates information without needing to tackle internal issues of data stewardship, nomenclature and standardization.

However, the practical realities of this approach highlight the issues of performance, integration and functionality required for an information repository. Current data integration technologies have made advances but have not yet become enterprise platforms for information. In addition, meta data standards among transactional systems and business intelligence (BI) applications hamper the effectiveness of federated data warehousing.

Dimensional Warehousing

The technique for designing a star schema or dimensional model is the foundation of dimensional warehousing. It is an elegant approach to achieving the performance and functionality required for reporting, analysis and decision-making purposes. Conforming dimensions are incorporated into several star schemas, thereby reducing the number of similar data structures.

As a rule of thumb, dimensional warehousing is best suited for less than three unique source systems. This is due to the increased complexities of integrating data on the same subject matter from multiple unique source systems. For example, integrating customer data by matching customer name from two unique source systems is challenging but feasible in a dimensional model. As the number of unique source systems increases beyond two, the data integration challenges grow exponentially. To address the complex data integration challenges from multiple unique source systems, we have found that the corporate information factory (CIF) architecture works best.

Corporate Information Factory

The CIF is the most comprehensive and effective information repository architecture. It provides a framework for organizing data in a manner that supports the integration of subject areas and data supplied by various transactional systems. In a CIF, the data warehouse and/or operational data store are designed using entity-relationship modeling techniques. Data marts are then created from the DW or ODS. The design used to create the data marts is dimensional modeling, which provides the rapid extraction and functionality needed by information consumers.

For organizations with less than three unique source systems, the CIF architecture may be more work than is necessary to support the information needs of the organization.

An information repository contains different types of data structures to satisfy the needs of information consumers. Organizing and cataloging these data structures creates an information repository architecture that fosters efficiencies and data consistencies needed for reporting, analysis and decision-making purposes. The CIF and dimensional warehousing are the two best architectures to consider.


For more information on related topics visit the following related portals...
Business Intelligence and DW Administration, Mgmt., Performance.

Jonathan Wu is a senior principal with Knightsbridge Solutions. He has extensive experience designing, developing and implementing information solutions for reporting, analysis and decision-making purposes. Serving Fortune 500 organizations, Knightsbridge delivers actionable and measurable business results that inform decision making, optimize IT efficiency and improve business performance.  Focusing exclusively on the information management disciplines of data warehousing, data integration, information quality and business intelligence, Knightsbridge delivers practical solutions that reduce time, reduce cost and reduce risk. Wu may be reached at jwu@knightsbridge.com.

Solutions Marketplace
Provided by IndustryBrains

Extract Greater Value from Your Data
Your data is the engine that drives your business. Let the Experts at IBM help you put the information you need right at your fingertips. Click to register for more information.

Design Databases with ER/Studio: Free Trial
ER/Studio delivers next-generation data modeling. Multiple, distinct physical models based on a single logical model give you the tools you need to manage complex database environments and critical metadata in an intuitive user interface.

Eden - Industry Leading Help Desk Software
Offering cutting edge Help Desk and IT Service Management software solutions, built to leverage your existing IT infrastructure

Free EII Buyer's Guide
Understand EII - Trends. Tech. Apps. Calculate ROI. Download Now.

Use MS Word as your Report Generator
Create reports in PDF, RTF, HTML, TXT, XLS & more. Use MS Word to design the reports and reduce development time by 90%. Easy-to-use custom secure report generation - Fast! Free Demo.

Click here to advertise in this space

View Full Issue View Full Magazine Issue
E-mail This Column E-Mail This Column
Printer Friendly Version Printer-Friendly Version
Related Content Related Content
Request Reprints Request Reprints
Site Map Terms of Use Privacy Policy
SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.