Reality IT: Who Needs a Data Warehouse?
Column published in DMReview.com, September 1, 2005

By Gabriel Fuchs
Our data warehouse is so big, it only comes into work when it feels like it.
At my job, we have realized that we need a data warehouse. It was actually quite easy to make this decision - everyone else has one, so we should obviously have one as well. In order to be better than everyone else, however, we did not stop at a data warehouse. We decided to go for the whole kit: an operational data store (ODS) that feeds an enterprise data warehouse that feeds a load of domain-specific data marts. Now that we have a killer infrastructure, we just have to figure out what we want out of all this stuff.
At the risk of being sacrilegious, I would nevertheless like to ask the following: how many organizations actually need the data warehouse architecture they have implemented? Yes, reports and analyses may be quicker to produce, but given the limited number of power users, IT is still largely responsible for providing the bulk of reports. Consequently, many organizations are very much where they were 20 years ago. It may not be SQL programming anymore, and the Web has certainly improved the distribution of reports and analyses, but IT is still very much called upon whenever new reports and analyses are needed. And if you want more data in a report or for preparing an analysis, IT is definitely involved in extracting that data from the legacy systems.
So what happens when new data needs to be added to a data warehouse (or an ODS, or a data mart)? After all, some necessary data does tend to be forgotten and left out during the initial definition and modeling of a data warehouse. Well, the user goes to IT with a request, and IT replies with an estimate of when it can be done. In reality, updating a data warehouse often takes some time. If the data warehouse is fed through an ODS, more work is needed. If data marts are involved, these need to be fed as well. There is a lot of feeding going on in such a situation. I am not going to tell you what happens when the source systems that feed the ODS change. Let's just say that it may take some time, resources and yet more feeding to get whatever you want out of the whole hodgepodge.
The point here is to ask whether a "complete" data warehouse architecture, i.e., ODS, enterprise data warehouse and specific data marts, is actually more cost-efficient than a smaller solution, e.g., a single data warehouse. As an example, take the customer dimension - something that is likely to exist in several data marts, at least if the organization is interested in its customers. Modify it in one data mart and it will have to be modified everywhere else as well (or you can forget about having one version of the truth). Modify it directly in the enterprise data warehouse, and its repercussions on, and validity in, the affected data marts still need to be verified. Domain-specific data marts tend to depend on each other in one way or another, so the more data marts there are, the more complex maintenance risks becoming.
A data warehouse is nice, but it will almost always increase the workload of the IT department. If the resulting benefits do not outweigh the extra expenses, the data warehouse has failed. Count the users who actually benefit from the whole setup and ask yourself whether they are now so much more efficient that this covers the extra expenses.
As cool as a complex data warehouse architecture may be, the number of data marts it produces is not the indicator of success. More data marts will not necessarily mean a higher return on investment. More is not necessarily more. Instead, more can often be less. More is cooler, though.
At my job, we are back to square one. Whenever we need data that is not in our big and, therefore, cool data warehouse system, we have to wait for this data to be integrated. And wait. Often, it would have been quicker and more efficient to get the data directly from the source systems, but we cannot do that. What would be the use of our data warehouse if we were allowed to circumvent it? Our data warehouse system has locked us all into a situation where, whenever the requested data is not in it, the data has to be put there - no matter how much time that may take. The result is that, in some cases, the whole system makes us less efficient than we were before we were endowed with this rich and cool architecture.
So, do not get into a situation where your data warehouse has become so big that you are all a part of it, and it is all a part of you.
And if your data warehouse is so big that Stephen Hawking has a theory about it, then you might really need to ask yourself questions about its efficiency.
Gabriel Fuchs is a senior consultant with IBM. His column Reality IT takes an ironic look at what real-world IT solutions often look like - for better or for worse. The ideas and thoughts expressed in this column are based on Fuchs's own personal experience and imagination, and do not reflect the situation at IBM. He can be reached at gabriel.fuchs@ch.ibm.com.