Data Warehousing Lessons Learned:
Data Warehousing After the Bubble Burst

Column published in DM Review Magazine, February 2003 Issue

By Lou Agosta

During the Internet bubble, clickstream data warehousing was an early and innovative development. It introduced the first significant new type of data source since the relational database: the Web log. It was supposed to be possible to get inside the Web visitor's head in a way not previously imagined and thereby create revenue opportunities. Clickstream data warehousing emerged accordingly. However, while it brought forth a tidal wave of new data points about customer clicks, it did little to support the consumption of that data and its integration and transformation into accurate, usable, trustworthy information. That transformation required a series of data processing steps which are now part of the standard IT repertoire. The enduring truth of the post-Internet era is that data warehousing was not a paradigm shift; therefore, it did not participate in much of the hype and collapse that characterized the dot-com meltdown.

Fast forward to today, and successful e-tailers such as Amazon do indeed operate with terabyte clickstream data structures that are the source of significant analysis of customer behavior, CRM-type promotions and collaborative filtering. That material is now mainstream, and the surviving Web e-tailers understand the practices. Their enterprise data models (and implementations) now include key data dimensions and attributes essential to the Web such as page hierarchies, sessions, user IDs and shopping carts.
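As an illustration of how raw Web log records become the sessions mentioned above, here is a minimal sketch in Python. It assumes a simplified Apache-style log line; the field names, the regular expression, and the 30-minute session timeout are illustrative choices, not a standard mandated by any vendor.

```python
import re
from datetime import datetime, timedelta

# Illustrative pattern for a simplified Apache-style access log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<page>\S+) \S+"'
)

def parse_log_line(line):
    """Reformat one raw Web server log record into a dict, or None if malformed."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

def sessionize(records, timeout=timedelta(minutes=30)):
    """Group clicks into sessions: same IP, with gaps no longer than `timeout`."""
    sessions = []
    last_seen = {}  # ip -> (session index, timestamp of most recent click)
    for rec in sorted(records, key=lambda r: r["ts"]):
        ip = rec["ip"]
        if ip in last_seen and rec["ts"] - last_seen[ip][1] <= timeout:
            idx = last_seen[ip][0]
            sessions[idx].append(rec)
        else:
            idx = len(sessions)
            sessions.append([rec])
        last_seen[ip] = (idx, rec["ts"])
    return sessions

raw = [
    '10.0.0.1 - - [01/Feb/2003:10:00:00 +0000] "GET /home HTTP/1.0"',
    '10.0.0.1 - - [01/Feb/2003:10:05:00 +0000] "GET /cart HTTP/1.0"',
    '10.0.0.1 - - [01/Feb/2003:11:30:00 +0000] "GET /home HTTP/1.0"',
]
records = [r for r in (parse_log_line(l) for l in raw) if r]
sessions = sessionize(records)
print(len(sessions))  # the 85-minute gap starts a second session
```

In practice, the same grouping would also use cookies or user IDs rather than IP addresses alone, since many visitors share an IP behind a proxy.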

The process of handling clickstream data is now well understood, though not necessarily simple. It requires activities such as reformatting the Web server logs, parsing log event records, resolving IP addresses, matching sessions, identifying pages and user IDs, performing match-merge processing, and identifying customers, products, abandoned shopping carts and click-throughs. The extent to which such processes have become a de facto industry standard is indicated by the ready availability of Web log extractors from the ETL vendors. The process is often facilitated by using a connector or adapter from one of the best-of-breed ETL tools such as Ascential, Hummingbird, Informatica or SAS. As indicated, building an intelligent information integration process to get from a click on a Web page to a relationship with a customer was indeed the right thing to do, and those Web-oriented firms that did not do so (regardless of the reason) no longer exist. The outcome is that the clickstream is similar to other data in its life cycle: it starts out as transactional data and, through various transformations along the information supply chain, becomes decision support and a source of analytic insights. In data warehousing, the new realities are the old data management realities:

Perception of business value migrates in the direction of the user interface. If successful, all the work of upstream data integration will result in an "Aha!" experience as the business analyst gains an insight about customer relations, product offerings or market dynamics. However, a new or better user interface is not in itself the cause of the breakthrough. Without the work of integrating the upstream data, the result would not have been possible.

Data integration requires schema integration. Data integration is arguably a trend, with many of the enterprise application integration (EAI), extract, transform and load (ETL) and customer data integration (CDI) service vendors leading the charge. This is useful and valid, but as a trend it is subject to marketing hype that leaves users with incomplete results. A schema is a database model (structure) that represents the data accurately and meaningfully. To compare entities such as customers, products, sales or store geography across different data stores, the schemas must be reconciled as to consistency and meaning. If the meanings differ, then translation (transformation) rules must be designed and implemented. The point is that IT developers cannot "plug into" data integration by purchasing a plug-in for a tool without also undertaking the design work to integrate (i.e., map and translate) the schemas representing the targets and sources.
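The design work described above can be made concrete with a small sketch. It assumes two hypothetical source systems whose customer records disagree on field names and code values; the mapping functions below are exactly the kind of translation rules that no tool plug-in can supply on its own.

```python
# Hypothetical sources: system A encodes status as 1/0 in "cust_status";
# system B encodes it as "Y"/"N" in "active". The target schema unifies both.

def from_source_a(row):
    """Translate a source-A customer record into the unified schema."""
    return {
        "customer_id": row["cust_no"],
        "name": row["cust_name"].strip().title(),
        "is_active": row["cust_status"] == 1,   # translation rule for codes
    }

def from_source_b(row):
    """Translate a source-B customer record into the unified schema."""
    return {
        "customer_id": row["id"],
        "name": row["full_name"].strip().title(),
        "is_active": row["active"] == "Y",      # translation rule for codes
    }

unified = [
    from_source_a({"cust_no": 101, "cust_name": " alice smith ", "cust_status": 1}),
    from_source_b({"id": 202, "full_name": "BOB JONES", "active": "N"}),
]
print(unified)
```

The essential point is that someone had to decide that `cust_status == 1` and `active == "Y"` mean the same thing; that decision is schema integration, and it lives outside any tool.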

Design consistent and unified definitions of product, customer, channel, sales, store geography, etc. This is the single most important action an IT department can undertake regarding a data warehousing architecture. Key data dimensions and attributes now also include those relevant to the Web, such as page hierarchies, sessions, user IDs and shopping carts. Every department (finance, marketing, inventory, production) wants the same data in a different form; that is why the star schema design and its data warehouse implementation were invented. Extensive research is available on how to avoid the religious wars between data warehouses and data marts by means of a flexible data warehouse design.

Though this author was no better at timing the April 2000 bursting of the bubble than anyone else, this prediction, that the clickstream was destined to become just another enterprise data source, is a call I made early and often. In summary, the Web (e-commerce logs, Web-sourced e-mail submissions, etc.) is now just another enterprise data source. The transformation of the clickstream into a source of insight about fundamental business imperatives, namely which customers are buying which products or services, is now part of the mix of heterogeneous data assets.


Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to lagosta@acm.org.
