DM Review | Covering Business Intelligence, Integration & Analytics

Data Warehousing Lessons Learned:
Hub-and-Spoke Architecture Favored

  Column published in DM Review Magazine
March 2005 Issue
  By Lou Agosta

A data warehousing architect at a large insurance company told Forrester: "When it comes to architecture, keep it simple. A few forms using plain geometry should be enough. The idea is to see through the complexity to the underlying pattern. Of course, the real world will be messier; but the value of architecture is to see through the intricacies to the underlying simplicity to empower planning, design, and implementation." Inspired by these and many similar remarks, Forrester surveyed 213 practitioners at the Data Warehousing Institute San Diego Conference in August 2004 (see Figure 1). The most common approaches to data warehousing architecture are:

  • Centralized with hub and spokes. The data warehousing architecture reported most frequently is the data warehouse with attached data marts (42% of respondents chose this option).
  • Centralized, pure and simple. A special case of the centralized architecture is one that implements the central data warehouse only, an option chosen by 18% of the practitioners.
  • Decentralized. Independent data marts without consistent design are reported by 19%. Independent data marts often form a distributed ("decentralized") architecture.
  • Federated. Data marts built to a consistent, conformed design are reported by 15% of practitioners. The key term "conformed" signals dimensional modeling.
  • Virtual data warehousing. The real loser is virtual data warehousing, which has been superseded by enterprise information integration and registers barely 1% of respondents.

Figure 1: Survey Results
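
As a rough check on the survey figures, the reported shares can be turned into approximate head counts. This is only a sketch: the counts are derived by rounding the published percentages against the 213 respondents and were not reported in the survey itself, and the dictionary keys are labels chosen here for illustration. The five shares listed sum to 95%, so the remainder is left unaccounted for rather than guessed at.

```python
RESPONDENTS = 213  # practitioners surveyed, as stated in the column

# Reported shares in percent; the key names are illustrative labels.
shares = {
    "hub_and_spoke": 42,
    "centralized": 18,
    "decentralized": 19,
    "federated": 15,
    "virtual": 1,
}


def approximate_counts(shares, total):
    """Convert percentage shares to rounded head counts (approximate)."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = approximate_counts(shares, RESPONDENTS)

# Both centralized variants together (with and without attached marts).
centralized_total = shares["hub_and_spoke"] + shares["centralized"]

# Share of respondents not covered by the five listed options.
unaccounted = 100 - sum(shares.values())

print(counts)
print(centralized_total, unaccounted)
```

Read this way, the two centralized variants together account for 60% of respondents, which is the sense in which the centralized approach is favored.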

In the real world, firms that are highly centralized in geography and governance should pursue a centralized data warehouse architecture to reap the greatest operational efficiencies and business benefits. In practice, the hub and spokes are implemented on different platforms and database instances, but there is no reason that this must be so. In some cases, both could be on the same platform - though it would have to be a large one if the number of combinations is also large. In contrast, those firms that are highly decentralized will prefer a distributed architecture, and those with a mixed organizational pattern should implement a federated one.
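
The rule of thumb in this paragraph can be written down as a simple lookup. The sketch below is illustrative only: the `Governance` and `Architecture` names are hypothetical, and a real decision would weigh far more than a single organizational attribute.

```python
from enum import Enum


class Governance(Enum):
    """How the firm is organized (hypothetical three-way classification)."""
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    MIXED = "mixed"


class Architecture(Enum):
    """Data warehousing architectures named in the column."""
    CENTRALIZED = "centralized data warehouse"
    DISTRIBUTED = "distributed architecture"
    FEDERATED = "federated architecture"


# Governance pattern -> architecture the column recommends for it.
RECOMMENDED = {
    Governance.CENTRALIZED: Architecture.CENTRALIZED,
    Governance.DECENTRALIZED: Architecture.DISTRIBUTED,
    Governance.MIXED: Architecture.FEDERATED,
}


def recommend(governance):
    """Return the architecture suggested for a governance pattern."""
    return RECOMMENDED[governance]


print(recommend(Governance.MIXED).value)
```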

It should be noted that the survey did not ask about data modeling philosophy, and this survey is perfectly consistent with practitioners implementing dimensional models in different architectures - centralized, hub-and-spoke, as well as "conformed" designs. When the definitions of key structures such as customer, product and related data entities are specified by a consistent, centralized design but implemented according to the local priorities of the individual lines of business, the result is a federated architecture. These are also described as pure, conformed data marts. However, note that a careful reading of Ralph Kimball reveals that he emphasizes a centralized design and sees no requirement for a persisting, centralized database (source: Ralph Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, 1996).
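
A minimal sketch may help make the federated pattern concrete: one centrally specified customer dimension, adopted by data marts that each line of business loads on its own schedule. Everything named here (`DimensionSpec`, `DataMart`, `drill_across`, the auto and home insurance lines, the premium figures) is hypothetical and chosen for illustration; it is not taken from the survey or from Kimball's text. Note that there is no central database in the sketch, only a central design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionSpec:
    """Centrally specified definition of a shared dimension."""
    name: str
    key: str
    attributes: tuple


# One central design for "customer", shared by every line of business.
CUSTOMER = DimensionSpec("customer", "customer_key", ("name", "segment"))


class DataMart:
    """A locally implemented mart that adopts centrally designed dimensions."""

    def __init__(self, owner, dimensions):
        self.owner = owner
        self.dimensions = {d.name: d for d in dimensions}
        self.facts = []

    def load(self, row):
        # Reject fact rows that lack the key of any adopted dimension.
        for dim in self.dimensions.values():
            if dim.key not in row:
                raise ValueError(f"{self.owner}: missing {dim.key}")
        self.facts.append(row)


def is_conformed(marts, dim_name):
    """True when every mart uses the identical spec for dim_name."""
    specs = {m.dimensions.get(dim_name) for m in marts}
    return len(specs) == 1 and None not in specs


def drill_across(marts, dim, measure):
    """Sum a measure per dimension key across marts sharing the dimension."""
    if not is_conformed(marts, dim.name):
        raise ValueError(f"{dim.name} is not conformed across these marts")
    totals = defaultdict(float)
    for mart in marts:
        for row in mart.facts:
            totals[row[dim.key]] += row.get(measure, 0.0)
    return dict(totals)


auto = DataMart("auto", [CUSTOMER])
home = DataMart("home", [CUSTOMER])
auto.load({"customer_key": 1, "premium": 900.0})
home.load({"customer_key": 1, "premium": 1200.0})
home.load({"customer_key": 2, "premium": 700.0})

totals = drill_across([auto, home], CUSTOMER, "premium")
print(totals)
```

Because both marts share the same customer key, results can be combined across them without a persisting central store, which is the point made about centralized design above.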

What Does it All Mean?

Data warehousing systems distribute control over information for decision making in the enterprise. Therefore, the form of the data warehouse architecture aligns with how the modern corporation is governed, with centralized decision making radiating from the center to the periphery. This means:

No data warehousing architecture is right or wrong in itself. Enterprises have succeeded with all the alternative architectures surveyed - and individual cases of failure have also occurred. The data warehousing architecture will often mirror the form of the enterprise that implements it. Thus, highly centralized enterprises such as financial services, telecommunications and airline transportation will find that a centralized data warehouse is the line of least resistance. Those enterprises with distributed operations will get the best results with distributed data warehouses, while those with a mixed pattern of governance will do best with a federated approach.

Data marts are a useful but limiting compromise. The proliferation of data marts in an otherwise centralized architecture means that enterprises plan centrally but end up making compromises. Data marts often represent a compromise forced on a centralized design by the need for an interim deliverable or incremental result, by a powerful political constituency that wants its own system, or by performance considerations.

Virtual data warehousing is dead, long live EII. In spite of being an interesting idea, virtual data warehousing is not getting traction in the market. This is because on-the-fly data integration is computationally complex, requiring significant bandwidth for data movement and horsepower to perform JOIN operations. Except for small volumes of data, a strictly limited number of data stores, or an exceptional workaround, real-world data warehousing applications are too intricate for the virtual data warehouse: complexity, the unmet need for schema integration, and performance all limit its use. Enterprise information integration has replaced virtual data warehousing.
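
To see why on-the-fly integration is costly, consider a toy version of a join executed at query time across two source systems. The sketch below is an assumption-laden illustration, not a description of any product: `virtual_join` is a hypothetical helper, the sources are plain in-memory lists, and "rows moved" simply counts every row that must be read from a source for a single query.

```python
def virtual_join(left_rows, right_rows, key):
    """Hash join run at query time; returns (joined rows, rows moved)."""
    rows_moved = 0

    # Build phase: every row of the right-hand source is fetched and indexed.
    index = {}
    for row in right_rows:
        rows_moved += 1
        index.setdefault(row[key], []).append(row)

    # Probe phase: every row of the left-hand source is fetched and matched.
    joined = []
    for row in left_rows:
        rows_moved += 1
        for match in index.get(row[key], []):
            joined.append({**match, **row})

    return joined, rows_moved


# Two hypothetical source systems that share a customer identifier.
policies = [
    {"customer_id": 1, "policy": "A"},
    {"customer_id": 2, "policy": "B"},
]
claims = [
    {"customer_id": 1, "claim": 100},
    {"customer_id": 1, "claim": 50},
    {"customer_id": 3, "claim": 5},
]

joined, rows_moved = virtual_join(policies, claims, "customer_id")
print(len(joined), rows_moved)
```

Every query repeats this movement of all source rows, whereas a persisted warehouse pays the integration cost once at load time. The sketch also assumes the two sources already agree on the join key, which is exactly the schema integration that is often missing in practice.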



Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to lagosta@acm.org.



SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.