Enterprise Content Management:
The Most Important BI Merger Isn't Making Headlines

  Column published in DM Review Magazine
January 2004 Issue
  By Dan Sullivan

Mergers in the business intelligence (BI) market have been big news lately. Hyperion acquired Brio. Informatica picked up Striva. Business Objects bought Crystal Decisions. Ascential bought Mercator. The market is maturing and consolidating. Vendors are improving features such as reporting tools and real-time extract, transform and load (ETL). Do any of these mergers really change the BI landscape? No. Query tools are still too complex for many users. Real-time data warehousing is still difficult. These mergers are not the fundamental shifts that change a discipline such as BI. The real news in the analytics market is about a merger that has nothing to do with what we see in the headlines.

The most interesting work in business intelligence today surrounds the merger of two technologies -- structured data analysis and unstructured data analysis. Megaputer was one of the earliest vendors to integrate text analysis with data mining, but others are joining in the trend. BI vendors, especially the statistical software companies such as SPSS and SAS, are integrating data and text mining functions to expand the scope of BI. Expect to see more.

The basic idea is straightforward: analyze unstructured text, identify important terms and concepts, and map that information into a more structured format that is suitable for data mining and statistical analysis. We are seeing this merging of technologies now for three reasons.
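
As a rough illustration of that mapping, here is a minimal Python sketch. Everything in it is hypothetical: the records, the field names, the term list and the helper functions are made up for illustration and do not come from any vendor's product. It checks each free-text note for a few analyst-chosen terms and turns the results into 0/1 columns that sit beside the ordinary structured attributes.

```python
# Hypothetical call-center records: structured fields plus a free-text note.
records = [
    {"customer_id": 1, "churned": 1,
     "note": "Customer upset about dropped calls and billing error"},
    {"customer_id": 2, "churned": 0,
     "note": "Asked about new handset, no complaints"},
    {"customer_id": 3, "churned": 1,
     "note": "Dropped calls again; billing dispute unresolved"},
]

# Hypothetical list of terms an analyst has judged important.
TERMS = ["dropped calls", "billing", "handset"]


def text_to_features(note, terms=TERMS):
    """Return one 0/1 column per term: 1 if the term occurs in the note."""
    lowered = note.lower()
    return {"has_" + term.replace(" ", "_"): int(term in lowered)
            for term in terms}


def build_rows(recs):
    """Replace each free-text note with term-indicator columns."""
    rows = []
    for rec in recs:
        row = {key: value for key, value in rec.items() if key != "note"}
        row.update(text_to_features(rec["note"]))
        rows.append(row)
    return rows


rows = build_rows(records)
for row in rows:
    print(row)
```

Each resulting row is plain rows-and-columns data, so in principle it could be loaded into the same query, OLAP or data mining tools used for the rest of the customer record. Real text analysis tools go well beyond substring checks; the point here is only the shape of the output.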

First, we are running into the law of diminishing returns. Conventional BI tools can squeeze only so much information out of a data set. Consider the staples of structured analysis: ad hoc query tools, online analytical processing (OLAP) technologies and data mining applications. They work well with the attributes that fit neatly into the rows and columns of databases. There are only so many ways to look at the same set of data before you stop finding new insights. The problem is not that we lack good analytic techniques, but that we need more data to work with.

The second driver is the millions of pieces of unstructured data that organizations have gathered and left untapped. These comments and notes hold information that does not fit into the coded attributes of transaction processing systems. Customer service representatives document detailed reasons why a customer is changing mobile phone services. Mechanics describe recurring problems with machinery covered under warranty. Claims adjusters note patterns indicative of fraud. All of this text contains valuable information, but until now it has been effectively off-limits to conventional BI.

Third, text analysis tools are mature enough to effectively analyze unstructured data in a BI context. In some cases, text analysis can be as simple as searching for the occurrence of particular words or phrases. For those problems we have fast, bit-parallel algorithms that match simple patterns and regular expressions. Some tolerate errors in matches, a feature that is essential when dealing with comments written in a hurry such as those in call center databases.
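
To make the error-tolerant case concrete, here is a small Python sketch in the style of the bitap (Shift-And) family of bit-parallel matchers, extended to allow a fixed number of errors. It is an illustration under assumptions, not the algorithm inside any particular product: the function name, its parameters and the sample note are all invented here. Each integer in `state` is used as a bit vector, so one round of shifts and ORs advances every partial match at once.

```python
def fuzzy_find(pattern, text, max_errors):
    """Return the end positions in text where pattern matches with at most
    max_errors insertions, deletions or substitutions."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # masks[ch] has bit i set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1     # keeps the bit vectors m bits wide
    accept = 1 << (m - 1)   # bit that signals a complete match

    # state[d]: bit i set means pattern[:i+1] matches some suffix of the
    # text read so far with at most d errors.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]

    ends = []
    for pos, ch in enumerate(text):
        cmask = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = ((state[0] << 1) | 1) & cmask
        for d in range(1, max_errors + 1):
            old = state[d]
            state[d] = (
                (((old << 1) | 1) & cmask)   # character matches
                | prev_old                   # extra character in the text
                | (prev_old << 1)            # substituted character
                | (state[d - 1] << 1)        # character missing from the text
                | 1                          # one error always covers pattern[0]
            ) & full
            prev_old = old
        if state[max_errors] & accept:
            ends.append(pos)
    return ends


# Hypothetical hurried call-center note with a misspelling of "billing".
note = "biling error"
print(fuzzy_find("billing", note, 1))   # end positions of approximate matches
print(fuzzy_find("billing", note, 0))   # exact matching finds nothing
```

Allowing one error lets the misspelled "biling" count as a hit for "billing", which an exact search would miss. Python integers are arbitrary precision, so the sketch works for long patterns, though production implementations normally keep the pattern within a machine word for speed.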

When simple pattern matching is not enough, we have tools such as part-of-speech taggers, noun-phrase extractors and lexical databases that help identify entities and, in simple cases, their relationships. Company names, locations, dates and currency amounts are commonly extracted entities. ClearForest's ClearTags and Inxight's ThingFinder fall into this tool category.
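
For a sense of what extracted entities look like, here is a deliberately simple Python sketch that pulls dates and dollar amounts out of a note. It uses regular expressions as a stand-in, not a part-of-speech tagger or a commercial extractor, and it handles only two narrow formats. The patterns, the function name and the sample text are assumptions made for illustration.

```python
import re

# Dollar amounts such as $1,250.00 or $75 (an assumed, narrow format).
CURRENCY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

# Dates written like "January 5, 2004" (an assumed, narrow format).
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def extract_entities(text):
    """Return the dates and currency amounts found in a piece of text."""
    return {
        "dates": DATE.findall(text),
        "amounts": CURRENCY.findall(text),
    }


# Hypothetical claims adjuster note.
note = "Claim filed January 5, 2004 for $1,250.00 after a second repair."
entities = extract_entities(note)
print(entities)
```

Company names and locations are much harder to capture with patterns alone, which is where the linguistic tools described above earn their keep.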

We are in the early stages of merging these technologies, and there are definite limits. Text analysis tools, especially those with strong linguistic analysis capabilities, do not scale well. Pattern-matching techniques can handle the millions of rows of data found in customer relationship management (CRM) systems, but more complex text analysis is best limited to smaller data sets.

Some tools require specialized knowledge for customization. Unless you have a linguist on staff, beware of what you attempt. Vendors are conscious of this problem and have already made strides to improve the situation.

Combining structured and unstructured analysis is paying off. Some companies are realizing 10 percent lifts over models based on structured analysis alone. Others are adapting call center scripts to address potential problem areas identified by analyzing CRM notes in real time. It is still too early to predict the overall impact of combined structured/unstructured analysis, but early indicators are favorable.

BI is fundamentally changing. Unstructured data is now targeted for analysis. We have the tools to extract patterns and entities from text and make them accessible to conventional BI techniques. Mergers will continue to make news in the BI market, but few will be as important as this one.



Dan Sullivan is president of the Ballston Group and author of Proven Portals: Best Practices in Enterprise Portals (Addison Wesley, 2003). Sullivan may be reached at dsullivan@ballstongroup.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.