Data Warehousing Lessons Learned:
Hub-and-Spoke Architecture Most Popular for Data Warehousing
According to a recent survey conducted with our partners at The Data Warehousing Institute (TDWI), the most frequently implemented architecture is the data warehouse with attached data marts (43 percent of respondents). This corresponds most closely to what is also described as a hub-and-spoke architecture: a central data store with attached (dependent) data marts. Also of interest are a large group (19 percent) that is committed to centralized data warehousing pure and simple, and a significant number (11 percent) who are not sure what they have.
Interestingly, five percent of respondents report that "other" best describes their architecture. The first five options, which range from federated data warehouse and marts through non-conformed (highly distributed) data marts, are exhaustive distinctions. Therefore, if an enterprise claims to have a hybrid of these, it really does not know what to call the spaghetti-like structures that are its implied architecture.
These results suggest that enterprises are planning centrally, but occasionally decide, or are forced, to make compromises. Sometimes data marts represent a compromise forced on a central design by issues such as the need for an interim deliverable, an incremental result, a response to a powerful political constituency that wants its own system, or performance considerations.
Data warehousing architectural options are many and varied. However, five main patterns are found both logically and practically. A firm can operate a distributed data warehouse architecture, in effect, by moving and translating data between nodes in a network, the nodes of which are data stores (data marts only). A firm can build only a centralized consolidation data warehouse or operational data store (ODS), or it can build a centralized data warehouse in a hub-and-spoke form with attached data marts. A firm can build a federated system of distributed warehouses that, in its more successful implementations, sometimes also exploits a data hub, but without a persistent centralized data store. Or a firm can try to avoid physical data warehousing altogether and address decision support issues by deriving business intelligence (BI) directly from operational, transactional systems.
The last of these is sometimes described as a "virtual data warehouse." In the so-called virtual data warehouse, there is no persistent physical implementation. Data is transformed again each time it is requested instead of being transformed once and stored. This has sometimes resulted in performance issues, which provide at least a partial explanation of the low turnout for virtual data warehousing. The survey results suggest that virtual data warehousing is not gaining traction: less than two percent of respondents report virtual warehouses as their architecture.
The hub-and-spoke approach is not the only approach, but it is no accident that it is popular. The number of times the data must be transformed is optimal for a majority of scenarios involving many-to-many relationships between nodes in a network of source and target data stores. The critical path to success lies through the design and implementation of unified and consistent data dimensions relating to products, customers, promotions, channels, costs, revenues and other dimensions important to a given business.
The single most important action a business can take from an architectural point of view is to design consistent and unified definitions of product, customer, channel, etc. It can then implement either federated data marts or a centralized ODS, or some combination of the two. This also enables management of diverse online analytical processing applications as dependent data marts rather than as disconnected and dysfunctional silos. The data stores will interoperate, and the design will be sufficiently robust to support a flexible architecture that accommodates business requirements that business managers cannot necessarily foresee today.
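The point about the number of transformations can be illustrated with a little arithmetic. The sketch below is a simplified model, not something taken from the survey: it assumes that in a point-to-point design every source-target pair needs its own transformation, while in a hub-and-spoke design each store needs only one transformation into or out of the hub's common format. The function names are hypothetical and chosen only for illustration.

```python
def point_to_point_transformations(sources: int, targets: int) -> int:
    """Assumed model: one dedicated transformation per source-target pair."""
    return sources * targets


def hub_and_spoke_transformations(sources: int, targets: int) -> int:
    """Assumed model: one transformation per store, into or out of the hub."""
    return sources + targets


if __name__ == "__main__":
    # Compare the two designs for a few network sizes.
    for sources, targets in [(1, 1), (2, 2), (5, 4), (10, 8)]:
        p2p = point_to_point_transformations(sources, targets)
        hub = hub_and_spoke_transformations(sources, targets)
        print(f"{sources} sources, {targets} targets: "
              f"point-to-point={p2p}, hub-and-spoke={hub}")
```

Under these assumptions the hub does not help for a single source and a single target, breaks even at two of each, and pulls ahead as the network grows, which is consistent with the claim that it is optimal for a majority of many-to-many scenarios rather than for all of them.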
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to email@example.com.