DM Review | Information Is Your Business
A Clear View:
Comprehensive Insight: Structured and Unstructured Analysis

Column published in DMReview.com
February 8, 2007
  By Michael L. Gonzales

Editor's note: DM Review would like to welcome Michael Gonzales as our newest online columnist. He brings a wealth of knowledge and insight from his hands-on BI experience. Many of you may have read his book, BI Strategy: How to Create and Document, or taken one of his classes. Look for his column the second week of each month.

A clear view to customers, suppliers, regulatory requirements and patients requires access to all data - structured and unstructured. It is estimated that more than 85 percent of all business information exists as unstructured data, commonly appearing in emails, memos, reports, letters, presentations and Web pages.1

Organizations are buried in unstructured content. But unstructured does not mean irrelevant or lacking in business intelligence (BI) value. Quite the contrary: this data describes much of your business activity, providing important insights about customers' habits, tastes and product use, employee work habits, and business process efficiencies and failures.

Figure 1

Unfortunately, the structured and unstructured areas of data analysis have historically been separated by technology, technique and staff expertise. Typical analysis of unstructured data has been limited to search tools that locate documents stored in file-based servers (such as Web servers, document management servers, etc.). In comparison, structured data analysis uses BI tools for query and reporting or for slicing and dicing business activity stored in relational database management systems (RDBMSs). Moreover, staff trained in BI techniques and technologies are typically not skilled in the linguistic and other specialized techniques required for analyzing unstructured content. Consequently, unstructured data analysis is rarely attempted by the BI team.

What is needed is a means to converge the two areas of analysis. When unstructured and structured data are blended for analysis, decision-makers gain the comprehensive insight they need to choose the actions that improve business operations. Examples include:

  • Automatically identifying top issues in call center logs (unstructured) and proactively routing calls to the right person based on the issue can save millions through reduced call time, not to mention improved customer service.
  • Rapidly detecting emerging product trends in problem reports (unstructured) coming in from all over the globe can avoid recalls and lawsuits, potentially saving companies millions of dollars.
  • Analyzing patient comments (unstructured), doctor notes (unstructured) and symptom data can lead to better disease management and identification of new uses for drugs.
  • Capitalizing on customer feedback (unstructured) following a product launch can help adjust marketing campaigns months ahead of competitors.
  • Reducing hundreds of boxes of documents (unstructured) down to the two that are relevant as part of the legal discovery process reveals previously hidden information in less time than if people read every document, and frees critical resources for higher-value tasks.
  • Automatically mining thousands of SEC reports (unstructured) to predict poor corporate governance can help identify issues before they turn into major crises.

The Evolution of Unstructured Analytics

Text processing and unstructured data analysis have evolved over time. The underlying technologies continue to improve and have only recently achieved a level of maturity that supports the types of analysis previously discussed.

First Generation: Keyword Search

The first-generation text analysis technologies afforded "search" capability. A keyword search helps a user find documents containing the words and concepts described by the keywords. While great for retrieving and grouping keywords within documents, these tools have many well-known problems that make them impractical to use for unstructured analytics.

  • These tools are unable to track or quantify the evolution of ideas or the changes in activity levels of tracked people, processes or organizations that may be searched.
  • Search tools were designed to be easy to use, which restricts their analysis capability to simple Boolean (not/and/or) expressions.
  • Although great at rapidly returning documents, a user still must take the time to read through the returned documents to extract meaning from them.
  • A search tool relies on the user to identify the right combination of keywords to extract the desired information.

As a result of these limitations, search applications typically require a great deal of manual effort to sift through documents and connect bits and pieces of information to make decisions from unstructured data. For example, many law firms hire paralegals or junior lawyers to manually sift through documents using search interfaces during a discovery process to tag those that are relevant.
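To make the limits of keyword search concrete, here is a minimal sketch of a Boolean (and/or/not) search in Python. It is an illustration only, not the implementation of any particular search product; the function names and the sample memos are invented for this example.

```python
import re

def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents matching a simple AND / OR / NOT keyword query."""
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        if not all(k in words for k in all_of):       # AND terms
            continue
        if any_of and not any(k in words for k in any_of):  # OR terms
            continue
        if any(k in words for k in none_of):          # NOT terms
            continue
        hits.append(doc_id)
    return sorted(hits)

# Hypothetical sample documents, invented for illustration.
DOCS = {
    "memo-1": "The battery overheats when the charger is left plugged in.",
    "memo-2": "Customer praised the battery life of the new model.",
    "memo-3": "Charger cable frayed after two weeks; refund requested.",
}

# "battery AND NOT praised" returns document ids only.
hits = boolean_search(DOCS, all_of=["battery"], none_of=["praised"])
print(hits)
```

The query returns a list of document identifiers and nothing more. Whether a memo describes a safety problem, and whether such problems are becoming more frequent, is still left to the person who reads the results.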

Second Generation: Point Text Analysis

The limitations of the keyword search applications led to a second-generation technology, point text analysis. These tools solve a variety of problems related to understanding the meaning of a document. They can scan a text document, for example, pulling out names, or identifying events, locations, products, opinions about products, problems, methods, etc. Vendors refer to their products as "entity extraction," "concept extraction" or "name matching" products. And while they are valuable at helping users to resolve documents, the technologies all tend to drive stovepipe solutions in that they solve a specific problem or work in a specific functional area of the business.
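As a rough illustration of what entity extraction does, the following Python sketch pulls products, locations and problems out of a note by matching against a small lookup list (a gazetteer). This is a deliberately simple stand-in: commercial tools rely on far richer linguistic techniques, and the entity types, terms and sample note here are all hypothetical.

```python
import re

# Hypothetical gazetteer: entity type -> known surface forms.
GAZETTEER = {
    "PRODUCT": ["PowerCell 200", "QuickCharge"],
    "LOCATION": ["Denver", "Austin"],
    "PROBLEM": ["overheating", "battery drain"],
}

def extract_entities(text):
    """Return (start, type, matched text) tuples, ordered by position in the text."""
    found = []
    for etype, terms in GAZETTEER.items():
        for term in terms:
            # Whole-word, case-insensitive match of each known term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((m.start(), etype, m.group(0)))
    return sorted(found)

# Invented call note for illustration.
note = "Caller in Denver reports overheating on the PowerCell 200."
entities = extract_entities(note)
for start, etype, surface in entities:
    print(etype, surface)
```

The output is structured (a type and a value per entity), which is what makes it usable downstream. The stovepipe risk is also visible: the lookup list is tied to one product line and one kind of problem.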

The Next Generation: Content Mining Platform

As organizations adopt analytical approaches to unstructured data, they will need to address a number of challenges.

  • Data comes from multiple unstructured repositories (file servers, document management systems, intranet sites, Internet sites, database notes fields, etc.).
  • Data in unstructured documents is of widely varying quality, much more so than structured data.
  • The use of different types of unstructured data tools varies greatly from environment to environment and from problem to problem.
  • In many cases, maximum value in analyzing unstructured data comes from analyzing it in conjunction with structured data stored in data marts or data warehouses.

Fortunately, many of the challenges with unstructured data analytics can be overcome by applying lessons from the BI and data warehousing (DW) sectors. Over a couple of decades, departmental, single-point solutions have evolved into robust enterprise DW and BI implementations, establishing platforms for data integration, storage and reporting solutions.

To be successful in the unstructured world, organizations need a content mining and analysis platform that leverages existing BI investments. The platform's focus must be to source, transform, store and analyze unstructured data efficiently and effectively, on a consistent and continuous basis.
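The source, transform, store and analyze steps can be sketched in a few lines of Python using the standard sqlite3 module. This is a toy example under stated assumptions, not a description of any vendor's platform: the keyword rules, table names, customer segments and call notes are all hypothetical, and a real transform step would use much more capable text analysis than keyword matching.

```python
import re
import sqlite3

# Hypothetical keyword rules mapping an issue label to trigger words.
ISSUE_RULES = {
    "billing": ["invoice", "charge", "refund"],
    "outage": ["down", "offline", "outage"],
}

def classify(note):
    """Transform step: label a free-text note with the first matching issue."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    for issue, triggers in ISSUE_RULES.items():
        if any(t in words for t in triggers):
            return issue
    return "other"

def top_issues_by_segment(customers, call_notes):
    """Store classified notes beside structured customer data, then analyze with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, segment TEXT)")
    conn.execute("CREATE TABLE call_issue (customer_id INTEGER, issue TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    conn.executemany(
        "INSERT INTO call_issue VALUES (?, ?)",
        [(cid, classify(note)) for cid, note in call_notes],
    )
    rows = conn.execute(
        """
        SELECT c.segment, i.issue, COUNT(*) AS calls
        FROM call_issue AS i JOIN customer AS c ON c.id = i.customer_id
        GROUP BY c.segment, i.issue
        ORDER BY calls DESC, c.segment, i.issue
        """
    ).fetchall()
    conn.close()
    return rows

# Source step: structured customer rows and unstructured call notes (invented).
CUSTOMERS = [(1, "enterprise"), (2, "consumer"), (3, "enterprise")]
CALL_NOTES = [
    (1, "Service has been offline since this morning."),
    (2, "Wants a refund for a duplicate invoice."),
    (3, "Site is down again, second outage this week."),
    (2, "Asked how to change the account password."),
]

report = top_issues_by_segment(CUSTOMERS, CALL_NOTES)
for segment, issue, calls in report:
    print(segment, issue, calls)
```

Once the text has been reduced to a label stored next to the customer record, the question "which issues affect which customer segments?" becomes an ordinary BI query that existing tools and staff can handle.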

Implementing a formal content mining platform, using proven BI/DW methodologies and exploiting (where possible) familiar BI technologies, increases the chance of successfully converging the two areas of analysis: structured and unstructured.

This approach leverages the best from both, unlocking the true potential of unstructured data and providing decision-makers with the necessary and comprehensive insight to drive the modern organization.

Acknowledgement: Content for this article was taken from a white paper, "Converging Text and BI: The Case for a Content Mining Platform," published by Clarabridge, March 6, 2006.


1. Wikipedia


Michael L. Gonzales has been a chief architecture and solutions strategist for more than a decade, specializing in business intelligence technologies and techniques. Gonzales is currently a Principal at Claraview, Inc., where he leads the Education department, teaching a series of DW/BI courses internationally. He is a successful author; his latest book is BI Strategy: How to Create and Document. You can reach him at michael.gonzales@claraview.com.

SourceMedia (c) 2007 DM Review and SourceMedia, Inc. All rights reserved.