Single View of the Truth
Marketing Systems
One of the bedrock goals of data warehouse projects is a single version of the truth. Yet truth is rarely so simple. In the classic example, a customer is one thing to a salesperson, another to the shipping department and something else to accounts payable. Good data warehouse designers recognize this and build different definitions into their systems, so users can access whichever version they need whenever they need it.
Yet the notion of a single version of the truth persists - and warehouse teams invest huge resources negotiating shared business models to define it. Why?
Dueling Spreadsheets
The problem is often described as dueling spreadsheets, where managers argue over whose data is correct. This is apparently something to avoid at all costs.
Personally, I love a good debate over data. But if you want to prevent those arguments, you have to understand what causes them. Just providing a single version of the truth won't do the trick. Any data warehouse rich enough to be useful will contain enough variations of the truth that it, too, can produce conflicting results.
Let's start at the beginning. Managers rely on whatever data sources they have available. In the absence of a warehouse, these are usually their local operational systems. Managers use these systems not just because they are handy, but also because they understand their contents. Because learning about a data set is often the hardest part of an analytical project, it is perfectly reasonable for managers to rely on the data they know.
Dueling spreadsheets happen because each manager's local data set is an incomplete view of the entire problem. Call center managers can see call center information and might do an analysis that shows how to minimize call center costs. But the service manager will see service costs that result from poor call center treatments, such as dispatching repair people for problems that could have been resolved over the phone. Each manager can analyze her own data correctly, yet the two can reach opposite conclusions about the best course of action.
Putting all that data into one warehouse wouldn't solve the problem. The single version of the truth (that is, a shared data model) will include both call center data and service data. If each manager simply extracts her own department's information, the two will still end up with conflicting results.
The only thing that will change this is if both managers pull both departments' data. Indeed, each manager really needs all the relevant data, which probably comes from many departments. Here is where the central data warehouse truly adds value: it makes all that data accessible in an integrated format.
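To make the point concrete, here is a minimal Python sketch. Every name and number in it is hypothetical, invented purely for illustration: two call-handling policies, each with a per-call cost as the call center system might record it and a downstream dispatch cost as the service system might record it. Each department's own data favors a different policy; only the combined total shows which is cheaper for the company as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One way of handling calls. All figures are made up for illustration."""
    name: str
    call_center_cost: float  # per-call cost visible in the call center system
    service_cost: float      # per-call repair dispatch cost visible in the service system


# Hypothetical policies: short calls are cheap to answer but trigger more dispatches.
POLICIES = [
    Policy("short_calls", call_center_cost=4.0, service_cost=30.0),
    Policy("thorough_calls", call_center_cost=9.0, service_cost=12.0),
]


def best_policy(policies, cost):
    """Return the name of the policy with the lowest cost under the given view."""
    return min(policies, key=cost).name


# Each manager looks only at the column her own system holds.
call_center_view = best_policy(POLICIES, lambda p: p.call_center_cost)
service_view = best_policy(POLICIES, lambda p: p.service_cost)

# The integrated view adds both departments' costs together.
combined_view = best_policy(POLICIES, lambda p: p.call_center_cost + p.service_cost)

print(call_center_view, service_view, combined_view)
```

With these invented figures, the call center data favors short calls, the service data favors thorough calls, and the combined view sides with thorough calls. Both departmental analyses are correct as far as they go; they simply answer different questions.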
But this brings us back to the original problem. Managers will use the data they find most familiar. Even if they have access to a comprehensive warehouse, pulling data from all the different departments requires understanding where to find that data and how to combine it. Managers are unlikely to take the time to learn how to do this. Instead, they'll either go back to their familiar local sources or pull the equivalent data from the central warehouse. Either way, they get the same incomplete result.
This problem can be mitigated but not really solved. It can't be solved because managers don't have the time or inclination to learn about proper warehouse procedures. Mitigation means making it easy for managers to see the data they really need, even if they didn't think to look for it. It also means making it at least as easy to get that data from the warehouse as from local systems.
Waiting for the IT department to build a new data cube definitely does not count as easy, particularly if the manager could already pull it from the local system for herself. It is tempting to solve the problem by mandating use of warehouse information, but that probably won't work. Many managers will just do without the information rather than invest major time in learning a new system. Or they'll look at data from the local system and not show it to anyone else. Or they'll ask an analyst to do the work for them, but only when it is worth the extra time, cost and trouble. Remember, we're talking here about managers who have considerable discretion in how they do their jobs.
Analysts are another story. Learning new systems is part of their job, and, if they've made the right career choice, it is something they enjoy. Moreover, they are part of a community of other analysts that should enforce its own standards for the quality of work. Therefore, it is perfectly plausible to require that analysts draw their data from a warehouse. Not that this should be a problem: all possible data assembled in one place is an analyst's fondest dream come true. Requiring them to use a warehouse should be about as hard as requiring a kid to visit a candy shop.
In short, the warehouse is an analyst's tool. Getting managers to use it adds requirements that companies may or may not decide to meet. Achieving a single version of the truth takes more than a unified data model: it means a thorough change in how companies analyze their data. Warehouse teams that fail to recognize this can never meet expectations.
David M. Raab is president of ClientXClient, a consulting and software firm specializing in customer value management. He may be reached at draab@clientxclient.com.