DM Review | Covering Business Intelligence, Integration & Analytics
Volume Analytics:
Transform Early and Often

Column published in DMReview.com
February 16, 2006
By Guy Creese

We're all familiar with the ETL acronym (extract, transform and load), which has been around since business intelligence (BI) as a discipline was invented. Yet the underlying assumption, that BI systems must derive their refined data from records already stored in operational systems, is a product of a 10-year-old worldview.

In the mid-1990s, the only place data turned up was in operational systems, so it made a lot of sense to distill information from enterprise resource planning (ERP) and customer relationship management (CRM) systems. But advances in raw processing power are starting to call this bedrock assumption into question. Why not write data in a BI-friendly format at the moment it is captured and be done with it? A number of vendors are starting to do just that. Perhaps not enough to call it a wave, but certainly enough to call it an interesting ripple.
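To make "write the BI-friendly format immediately" concrete, here is a minimal Python sketch that updates a summary table as each point-of-sale event arrives, instead of extracting it later from an operational system. The `PosEvent` shape and the `DailySalesFact` table keyed by date and product are hypothetical; no particular vendor's design is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class PosEvent:
    """One point-of-sale line item as it arrives (hypothetical shape)."""
    sold_at: datetime
    product: str
    quantity: int
    unit_price: float


class DailySalesFact:
    """Hypothetical BI-ready summary, updated on write and keyed by (date, product)."""

    def __init__(self):
        self.units = defaultdict(int)
        self.revenue = defaultdict(float)

    def ingest(self, e: PosEvent) -> None:
        # Transform at capture time: no later extract step from an ERP system.
        key = (e.sold_at.date(), e.product)
        self.units[key] += e.quantity
        self.revenue[key] += e.quantity * e.unit_price


facts = DailySalesFact()
facts.ingest(PosEvent(datetime(2006, 2, 16, 9, 30), "widget", 2, 4.50))
facts.ingest(PosEvent(datetime(2006, 2, 16, 14, 5), "widget", 1, 4.50))
key = (date(2006, 2, 16), "widget")
print(facts.units[key], facts.revenue[key])  # 3 13.5
```

The summary is queryable the instant an event lands; the trade-off is that every new report shape needs its own on-write transform.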

From Collating Snapshots to Monitoring Streams

This change in BI architecture is occurring due to a general trend I've talked about in the past - the shift from snapshots to streams. Because it was too difficult to do otherwise, companies made do with snapshots: surveying their customer base once per year or writing a customer record only when a really significant operational event occurred, such as the customer buying a product. An enterprise didn't track a customer's offhand comment to a sales rep because it was too much work.

Today, however, because so much activity is digital, companies can track a whole stream of events: a point of sale (POS) system logs what products a customer purchased, and a Web analytics system records the pages a customer visited on the company's Web site. These details are no longer a series of isolated snapshots of behavior, but a steady stream of information. With processing power increasing and storage costs dropping, vendors are starting to ask, "Why analyze a huge set of snapshots when we can monitor the stream in real time as it goes by?" This shift in viewpoint is giving rise to a variety of specialized network appliances that do exactly that.

A few examples will clarify what I mean. I've omitted the vendor names to drive home the point that the important takeaway is the architectural principle, not who's selling what.

Two Examples

One solution monitors visitor behavior on a Web site. It watches the back-and-forth HTTP stream via a sniffer installed on a network port or at the Web server, using a set of rules to rapidly convert the huge data stream into tiny "signatures." Using a process of semantic compaction, the software can compress a pattern of 40 file downloads - signifying that the user looked at two Web pages, for example - into a short code meaning "Closing an Account - Step 1." These signatures track both customer actions (e.g., placed item in shopping cart) and intent (e.g., spent more than one minute reading the product page).
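The semantic-compaction idea can be sketched as a small rule engine. This is a minimal Python illustration, not the vendor's method: it assumes that only `text/html` responses count as page views, and the URL paths, rules and integer signature codes are all hypothetical. In the usage lines, 40 captured requests for two pages collapse into a single code, and a dwell-time rule captures intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    path: str          # requested URL path
    content_type: str  # MIME type of the response
    t: float           # seconds since session start


# Hypothetical signature vocabulary: short integer codes replace raw traffic.
CLOSE_ACCOUNT_STEP_1 = 1
ADDED_TO_CART = 2
READ_PRODUCT_PAGE = 3


def page_views(requests):
    """Keep only HTML documents; images and scripts are treated as page furniture."""
    return [r for r in requests if r.content_type == "text/html"]


def compact(requests):
    """Apply simple rules to one session's requests and return signature codes."""
    views = page_views(requests)
    codes = []
    i = 0
    while i < len(views):
        view = views[i]
        nxt = views[i + 1] if i + 1 < len(views) else None
        # Action rule spanning two pages: settings page, then close-account page.
        if view.path == "/account/settings" and nxt is not None and nxt.path == "/account/close":
            codes.append(CLOSE_ACCOUNT_STEP_1)
            i += 2
            continue
        # Action rule: a request to the add-to-cart endpoint.
        if view.path.startswith("/cart/add"):
            codes.append(ADDED_TO_CART)
        # Intent rule: over a minute on a product page before the next page view.
        elif view.path.startswith("/product/") and nxt is not None and nxt.t - view.t > 60:
            codes.append(READ_PRODUCT_PAGE)
        i += 1
    return codes


# Two page views plus 38 embedded files: 40 requests in total.
session = [Request("/account/settings", "text/html", 0.0)]
session += [Request(f"/static/a{n}.png", "image/png", 0.2) for n in range(19)]
session += [Request("/account/close", "text/html", 12.0)]
session += [Request(f"/static/b{n}.js", "application/javascript", 12.2) for n in range(19)]
print(len(session), compact(session))  # 40 [1]
```

Only the returned codes would be stored; the raw requests can be discarded once the rules have run.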

By storing the compact signatures rather than the raw data, the software is extremely space-efficient. For example, it can store the results of 100 million sessions in only 500GB of disk space, or roughly 5KB per session.
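The quoted figures imply a budget of about 5,000 bytes per session (500 x 10^9 bytes divided by 100 x 10^6 sessions, assuming decimal gigabytes). The Python sketch below checks that arithmetic and shows how many signatures a hypothetical fixed-width record layout could hold within that budget; the layout is an assumption for illustration, not the product's storage format.

```python
import struct

# Figures quoted in the column; decimal gigabytes are assumed.
SESSIONS = 100_000_000
DISK_BYTES = 500 * 10**9

budget_per_session = DISK_BYTES / SESSIONS
print(f"Budget per session: {budget_per_session:.0f} bytes")  # Budget per session: 5000 bytes

# Hypothetical layout: header = 8-byte session id, 4-byte start time, 2-byte count;
# each signature = 2-byte code plus 2-byte offset in seconds. "<" disables padding.
HEADER = struct.Struct("<QIH")
SIGNATURE = struct.Struct("<HH")


def encode_session(session_id, start_epoch, signatures):
    """Pack a session's (code, offset_seconds) signatures into a compact byte record."""
    out = bytearray(HEADER.pack(session_id, start_epoch, len(signatures)))
    for code, offset in signatures:
        out += SIGNATURE.pack(code, offset)
    return bytes(out)


record = encode_session(42, 1_139_000_000, [(17, 0), (3, 65), (88, 140)])
print(len(record))  # 26  (14-byte header + 3 x 4-byte signatures)

max_signatures = int((budget_per_session - HEADER.size) // SIGNATURE.size)
print(max_signatures)  # 1246
```

Under these assumptions a session could carry over a thousand signatures before exceeding its share of the disk, which suggests why compaction rather than raw capture makes the quoted density plausible.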

A second example is a solution that can replay an online user's actions. By storing the data that streamed past the network tap, it can recreate a user session whenever the business asks it to. For example, business analysts can 1) replay a user's online session or 2) watch multiple sessions at a critical process step, such as the search-results page.
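Session replay depends on keeping the raw traffic and indexing it by session. This minimal Python sketch assumes the tap has already reassembled traffic into per-session events; `Captured`, `SessionStore` and the substring match used to find sessions at a given step are hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Captured:
    session_id: str
    t: float        # capture timestamp, seconds
    direction: str  # "request" or "response"
    payload: str    # raw HTTP text as seen at the tap (abbreviated here)


class SessionStore:
    """Hypothetical store of raw traffic copied from a network tap, indexed by session."""

    def __init__(self):
        self._by_session = defaultdict(list)

    def record(self, event: Captured) -> None:
        self._by_session[event.session_id].append(event)

    def replay(self, session_id):
        """Yield a session's captured traffic in the order it happened."""
        yield from sorted(self._by_session.get(session_id, []), key=lambda e: e.t)

    def sessions_at_step(self, marker):
        """Return ids of sessions whose traffic contains `marker`, e.g. a results-page path."""
        return sorted(sid for sid, events in self._by_session.items()
                      if any(marker in e.payload for e in events))


store = SessionStore()
store.record(Captured("s1", 2.0, "response", "HTTP/1.1 200 OK"))
store.record(Captured("s1", 1.0, "request", "GET /search?q=boots HTTP/1.1"))
store.record(Captured("s2", 1.5, "request", "GET /home HTTP/1.1"))
print([e.payload for e in store.replay("s1")])  # ['GET /search?q=boots HTTP/1.1', 'HTTP/1.1 200 OK']
print(store.sessions_at_step("GET /search"))  # ['s1']
```

Unlike the signature approach, this keeps everything, which is what makes replay possible and also what makes it far more storage-hungry.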

These capabilities help an enterprise understand the user actions behind metrics such as "Shopping Cart Abandonment: 32 Percent," and "Fraud: 8 Percent." Analysts can put themselves in the user's shoes, and figure out how to make the site easier for loyal customers to navigate - and harder for fraudsters to fool.
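A metric such as shopping cart abandonment can be computed directly from stored signatures. The sketch below assumes one common definition (sessions that added an item to the cart but never completed checkout, divided by all sessions that added an item); the signature codes and the sample sessions are made up for illustration.

```python
# Hypothetical signature codes.
ADDED_TO_CART = 2
COMPLETED_CHECKOUT = 4


def cart_abandonment_rate(sessions):
    """Share of cart-adding sessions that never reached checkout (0.0 if none added to cart)."""
    carts = [codes for codes in sessions if ADDED_TO_CART in codes]
    if not carts:
        return 0.0
    abandoned = sum(1 for codes in carts if COMPLETED_CHECKOUT not in codes)
    return abandoned / len(carts)


# Illustrative data only: each inner list is one session's signature codes.
sample = [[2, 4], [2], [3, 2], [3], [2, 4]]
print(f"Shopping Cart Abandonment: {cart_abandonment_rate(sample):.0%}")  # Shopping Cart Abandonment: 50%
```

The headline number comes from the signatures; replaying the abandoned sessions is what explains it.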

Because these applications capture customer behavior in real time, they aren't the classic, "wait-an-hour, wait-a-week, wait-a-month for insight" BI solutions that we're used to. If it wishes to, an enterprise can analyze visitor behavior seconds after it has occurred - an ability that is crucial if the business is focused on preventing fraud, for example.
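Analysis "seconds after" an event implies evaluating rules as each event streams by rather than in a later batch. As a hedged example, this Python sketch flags a session the moment it exceeds a hypothetical limit of failed logins inside a sliding time window; the event name, limit and window are assumptions, not a recommended fraud rule.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Hypothetical real-time rule: flag a session with more than `limit` failed logins in `window` seconds."""

    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self._recent = defaultdict(deque)  # session id -> timestamps of recent failures

    def observe(self, session_id, t, event):
        """Process one event as it streams by; return True if the session should be flagged now."""
        if event != "LOGIN_FAILED":
            return False
        q = self._recent[session_id]
        q.append(t)
        # Drop failures that have slid out of the time window.
        while q and t - q[0] > self.window:
            q.popleft()
        return len(q) > self.limit


monitor = FailedLoginMonitor()
events = [("s9", 0.0, "LOGIN_FAILED"), ("s9", 10.0, "LOGIN_FAILED"),
          ("s9", 20.0, "LOGIN_FAILED"), ("s9", 25.0, "LOGIN_FAILED")]
alerts = [monitor.observe(*e) for e in events]
print(alerts)  # [False, False, False, True]
```

A deque per session keeps each update cheap, since only expired timestamps are removed, so the check can run inline on the stream.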

In Short, Instant Insight by Storing Only the Essentials

What we're witnessing is that systems themselves - not just humans - are starting to suffer from a case of information overload. By screening out extraneous data and storing only the salient points, these specialized appliances are tightly compressing the extraction and transformation steps, leading to virtually instant insight for the corporation. In this "gotta have the info now" world, this stream-oriented analysis capability is not a bad thing.



Guy Creese is an analyst with the Burton Group, covering content management and search. Creese has worked in the high tech industry for 25 years, at both Fortune 500 companies and small startups, in positions ranging from programmer to product manager to customer support engineer.  He can be reached at gcreese@burtongroup.com.

(c) 2006 DM Review and SourceMedia, Inc. All rights reserved.