The Data Warehousing Satisfaction Survey, Part 3: A Single Fact is Worth a Thousand Opinions
The IBM Data Warehousing Satisfaction Survey (2007) invited some 200 end-user enterprises to participate in an anonymous, Web-based survey about data warehousing architecture, latency, size and related business issues. Invitations were sent regardless of the data warehousing platform in use, and respondents spanned the full range of platforms on the market at the time, including IBM, Microsoft, Netezza, Oracle and Teradata. The emphasis was on surfacing trends that apply regardless of the specific data warehousing platform. Here is a look at some of the initial results of the survey. [1]
Part 1 appeared in DM Direct Special Report on October 2, 2007.
Part 2 appeared in DM Direct Special Report on October 9, 2007.
The unavoidable conclusion to be drawn from a survey of volume points for data warehousing is that the big are getting bigger. This result is a bold statement of the obvious: data volumes are exploding. If there was any doubt, this surely confirms it. What is interesting to consider is that most of the size categories at 10TB or under are expected to show a decline, obviously not in the volume of the individual warehouses, but in their share of the total number of data warehouses. In other words, there will be more data warehouses in the 10 to 20TB range and fewer in the 4 to 10TB range. This is consistent with those that are 4TB today growing to about 10TB within 18 months, sustaining the rule-of-thumb metric that volumes double about every 18 months. (Incidentally, this is not a statement of Moore's law, which applies to computing power, not data volumes.) It is not clear from the survey question alone whether the anticipated volume growth includes, as one of its variables, the integration of unstructured and quasi-structured data, for example, through XML in dynamic data warehousing. However, the popularity of XML as a new data type, as called out in Table 4, suggests the answer is "yes." Enterprises are planning to integrate at least some forms of unstructured data using XML as part of the emerging trend toward dynamic data warehousing.
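To make the rule of thumb concrete, here is a minimal sketch of what "doubling about every 18 months" implies for a warehouse of a given size. The function name and the fixed 18-month doubling period are illustrative assumptions, not survey results; real growth will vary by enterprise.

```python
def projected_volume_tb(current_tb: float, months: float,
                        doubling_months: float = 18.0) -> float:
    """Project a future volume, assuming it doubles every `doubling_months`.

    The 18-month default is the article's rule of thumb, not a measured rate.
    """
    return current_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # A hypothetical 4TB warehouse projected over three years.
    for months in (0, 18, 36):
        size = projected_volume_tb(4.0, months)
        print(f"after {months:2d} months: {size:.1f}TB")
```

Under strict doubling, a 4TB warehouse reaches 8TB after 18 months and 16TB after 36 months. That places it near the top of the 4 to 10TB band at 18 months and in the 10 to 20TB band at 36 months, which is roughly the upward shift described above.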
A brief explanation of how to read the percentage change figures in Table 2 is appropriate. Of those surveyed today in Table 1, 50 percent of those reporting a data warehousing volume point of under (<) 500TB expect to no longer be in that category but to have shifted upward to one of the other categories. It does not mean there will be 50 percent less data (which would be a misreading of the number). It means that data warehouses in the 10 to 20TB range will become more common. The same holds for those in the 20 to 100TB range and those above 100TB: bigger is expected to be more common.

Table 1

Table 2
Diverse, Heterogeneous Data is Common
Data integration and information integration are, and remain, a growth industry, and the data warehouse is one of the key tools in the data integration portfolio, along with service-oriented architecture and message brokers. Enterprises have made progress in capturing and managing new forms of diverse, heterogeneous data. These include packaged applications in customer relationship management (CRM), supply chain management (SCM) and enterprise resource planning (ERP) systems, Web logs, eXtensible Markup Language (XML) and message brokers such as IBM's MQ Series, Microsoft MSMQ or TIBCO Rendezvous. When industry analysts ask data warehouse practitioners what it is that they really do, the common thread that emerges after all the buzzwords about CRM, enterprise application integration (EAI) and real-time B2B is two words: data integration. This activity remains critical in overcoming the silos of operational data in transactional ERP, CRM and SCM systems.
Given that data integration is a growth industry, the leading edge of growth is expected to occur in three areas: XML, Web logs and message queues (e.g., EAI). These are data sources that have so far not been widely tapped as data warehousing input, but which are now coming on stream. This makes perfect sense, since most enterprises started by integrating the obvious transactional data sources such as mainframes, relational databases and packaged applications.
One unambiguously positive development disclosed by the survey is that the rate of growth of spreadsheets (e.g., Excel) is expected to fall by 50 percent. What Wayne Eckerson of The Data Warehousing Institute (TDWI) calls "spread marts" are in decline, at least in terms of their rate of expansion.

Table 3 (Percentages will be more than 100 percent due to multiple responses.)

Table 4 (Percentages will be more than 100 percent due to multiple responses.)
A brief explanation of how to read the "% Change" category in Table 4 will be useful. It refers to the percent change between Table 3 and Table 4. For example, as stated in Table 4, the rate of growth of new relational data sources is expected to decline by 63 percent. This is reasonable in that most relational data sources are already being integrated into the data warehouse. Likewise, the 75 percent decline in the rate of new growth of mainframe data sources does not mean that mainframes will be taken offline, but rather that most of them are already being tapped as sources for data warehousing systems. It is a decline in the rate of growth. Again, an example will be useful. Today, 78 percent of data warehousing respondents extract from a mainframe data source for their data warehouse. Over the next 18 months, another 20 percent of respondents expect to bring mainframe data on stream, whereas 56 percent expect to do the same with XML, some 22 percent with Web logs, 32 percent with message queues (e.g., EAI) and 32 percent with packaged applications.
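The mainframe example can be checked with a few lines of arithmetic. The sketch below assumes that "% Change" is the relative change from the share of respondents using a source today (Table 3) to the share expecting to add it within 18 months (Table 4); that reading is inferred from the text, and the function name is purely illustrative.

```python
def percent_change(today_pct: float, next_18_months_pct: float) -> float:
    """Relative change, in percent, from today's share to the anticipated share.

    Assumed reading of the "% Change" column: (new - today) / today * 100.
    """
    return (next_18_months_pct - today_pct) / today_pct * 100.0


if __name__ == "__main__":
    # Figures quoted in the text: 78 percent extract from mainframes today,
    # and another 20 percent expect to add mainframe sources within 18 months.
    change = percent_change(78.0, 20.0)
    print(f"mainframe % change: {change:.0f}")
```

Under that assumption the mainframe figure works out to about negative 74 percent, in line with the roughly 75 percent decline cited above. It describes a slowdown in new adoption, not a reduction in mainframe use.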
Latency Shrinks, Dynamic Warehousing Grows
Daily update will continue to be the predominant approach in the majority of enterprises. It is so common that it will still occur in about 80 percent of installations even if the expectations expressed by survey respondents are realized. Even so, data warehousing latency continues to shrink, as demonstrated by comparing Table 5 ("Frequency of Update or Load Today") and Table 6 ("Anticipated Frequency of Update or Load 18 Months from Today"). Clearly, daily update and loading of the data warehouse is the state of the art today, with 88 percent of enterprises reporting that this is what they do (Table 5). As indicated in Table 6, daily, weekly and monthly updates are anticipated to decline by about 11 percent, 11 percent and 17 percent, respectively.
This leads us to the big news. Near real-time updating of the data warehouse and multiple daily updates will take off if enterprises behave as they say they expect to. From an admittedly modest base, the update frequency within these "dynamic" categories grows by 117 and 200 percent, respectively. Thus, IBM concludes that dynamic data warehousing is reaching an inflection point in the market; it is reaching takeoff speed. Those enterprises still operating legacy data warehouses, trying to use a trickle utility to manage a fire hose of data, will increasingly be at a disadvantage.

Table 5 (Percentages will exceed 100 percent due to multiple responses.)

Table 6 (Percentages will exceed 100 percent due to multiple responses.)
Editor's Note:
1. Phase 2 of the IBM Data Warehousing Satisfaction Survey is now live! Interested readers are invited to participate by clicking on the following URL and spending 20 minutes answering some 23 questions in this anonymous, Web-based survey. https://www14.software.ibm.com/iwm/web/swg-dwss/entry.shtml
Lou Agosta is an independent industry analyst in data warehousing. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, data mining and data quality. He can be reached at LAgosta@acm.org.
Kevin Modreski is manager of the Information Warehousing Competitive team. Modreski has spoken and written extensively on the competitive dynamics in the data warehousing market and related topics. He can be reached at kmodresk@us.ibm.com.