-
Marketplace
-
Channel Resources
Articles from this Site
Meijer Selects QuantiSense for Retail Business Intelligence
Ford Motor Company Selects Endeca
Concurrent Use of Business Analytics by Multiple Workers Soars in Popularity
Gallup Organization Selects Oracle Business Intelligence Suite
London NHS Foundation Trust Improves with Information Builders
White Papers
HP ERP Business Intelligence
Business Intelligence for Tax Planning: Value, Strategy, and Vision
Single Sign-On for Webintelligence
A Structured Method for Specifying Business Intelligence Reporting Systems
Business Intelligence in a Real-Time World
Web Seminars
Looking for speed and accuracy in your financial planning and budgeting?
Hyperion Visual Explorer: Improve Visibility into Performance Management
Reducing the Cost of Deploying and Managing Data
Combining Microsoft Business Intelligence with the Teradata Warehouse
Espresso Shot Web Seminar: Uncorking the Data Bottleneck with Operational BI
Books
Ask the Experts
Look for more responses to your frequently asked questions by our expert panel at www.DMReview.com.
Q: In order to implement aggregate awareness, what are the precautions to be taken?
Tom Haughey's Answer:
First, understand the capabilities of your technology. This could be you RDBMS or your OLAP tools. While they almost all purport to support it, not all support it equally or equally well.
How does your technology maintain the aggregates and achieve its aggregate awareness? Generally there are two choices for keeping the aggregate up-to-date: immediate or deferred. Immediate updates the aggregates each time the underlying data changes. There can be ongoing performance implications in this. The deferred option updates the data in batch at a specific time. Deferred usually requires the entire table to be recalculated and reloaded. Does the aggregate update change only the affected rows or do you have to reload the entire aggregate table? Does the DBMS restrict you from joining the aggregate data to data not contained in the aggregate? Or can you join the result set from query aggregates to other data?
Second, one precaution I strongly recommend is this: don't take at face value everything the vendor tells you. They will almost all tell you they can do everything you want. The solution to this? 1. Do your homework and research the vendor. 2. Get the vendor to demo it using examples you provide. Their examples will all be stacked to work perfectly. 3. Test it on your own.
Third, evaluate your need for aggregates, regardless of aggregate awareness. Here are some criteria.
There are two main reasons to aggregate, namely, to:
- Improve performance or
- Provide consistent numbers for further usage.
Given this, aggregate where:
- There is a large volume of original data. If the volume is low, why aggregate?
- You can achieve a reduction of the data of at least 90% (from 10 to 1). Even smaller reduction ratios will often help but this is a good guideline.
- Aggregate for queries that are run often (collect sample day's queries and do predicate analysis).
- Aggregate data that is stable. This is the problem of restatement. If the underlying data is not stable and would require restatement, then the aggregates would have to be redone. This is very costly. For example, a territory is moved to a new branch office. Your business rule is that the last two years of sales for the territory are moved to the new branch office. If you have aggregates at the Branch level, you will have to revise them. This can be hugely costly because you have to go back over two years of data to do so. This could easily be more costly than creating a weekly aggregate in the first place.
- Aggregate for data that is reused frequently. Say you have a data mart that supports Activity Based Costing. Your costing routines regularly use the following values for each customer: number of orders, number of shipments, number of contacts, etc. It is pointless not to precalculate these They should be calculated and stored so that each algorithm can simply refer to them.
Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.
For more information on related topics, visit the following channels:


