DM Review | Covering Business Intelligence, Integration & Analytics
Unified Business Intelligence:
Part I, Voices of BI

Column published in DMReview.com
February 24, 2005
By Ronen Feldman, Ph.D.

Editor's note: DM Review welcomes Ronen Feldman as its newest online columnist. Each month he will cover the history of unified data (structured and unstructured) and the growing recognition of its importance.

Are you hearing voices? Maybe you should be.

Your company has a mature, extensive, accurate and scalable business intelligence infrastructure in place.  You can analyze how many widgets your business makes in a day, how many it sells and what the profit margin is.  You know what your customers are buying and you know what your suppliers are selling.  But do you know what they're saying?  Most likely you do not. 

To date, business intelligence (BI) has matured almost entirely around numerical data. BI has evolved from a tactical tool into a key component in setting corporate direction, informing everything from credit card mailings to investment strategy. But until recently, BI has consisted mostly of tracking, storing and analyzing structured data. Unstructured data, or text, has been an unequal partner in BI, primarily because businesses have lacked the tools needed to extract relevant data points and structure them appropriately for analysis. Yet unstructured data often tells us the most about a constituency's voice; by not tracking, organizing and analyzing it, businesses fail to capture the voice of the customer, the supplier or the industry influencer.

Part of the challenge of unstructured data analysis is the sheer volume of information available. The Web, more powerful PCs, cheap storage and high bandwidth, along with enterprise applications such as e-mail, call center software, CRM and ERP, have driven knowledge workers to create, share and store unprecedented amounts of unstructured data. According to a study by the University of California, Berkeley, the amount of new information stored each year roughly doubled over the preceding three years, to about five exabytes. The Berkeley researchers estimate that five exabytes is equivalent to the contents of 37,000 new libraries the size of the Library of Congress book collections. Ninety-two percent of that new information was stored on magnetic media, primarily hard disks.

If BI tools access only structured data, a great deal of information is left out of the decision-making process. Every organization collects text that is relevant to its business processes. Hidden in this text is information that provides a fuller view of the customer, internal and external social networks, the supply chain, business development and innovation.

The current landscape of hypermergers, accelerated innovation and mounting regulatory compliance pressure has renewed the search for ways to tap into and leverage unstructured content. While new search technologies from Google, MSN and others are heating up discussion about access to unstructured data, they do not provide the broad data extraction and analysis capabilities needed to integrate unstructured data into the business intelligence infrastructure, or "stack."

Structured data analysis often answers "when," "who" and "what" is happening, but falls short of providing visibility into "cause, complaint, correction" information. Unstructured data provides situational context around an event or set of events that answers the questions "why" and "how," essentially completing the "cause, complaint, correction" knowledge cycle. Knowing the "why" and "how" helps organizations uncover hidden relationships, evaluate events, discover unforeseen patterns and identify problems for rapid resolution. Intelligence extracted from unstructured data also helps organizations protect profit margins against preventable write-offs, customer churn, legal settlements, warranty claims and inefficient product development cycles.

For example, structured data enables carmakers to determine when a warranty repair is made, what was broken, what model was affected and the age of the vehicle repaired.  But, by analyzing the unstructured data in the comments field of a warranty claim, the manufacturer can identify the set of circumstances that may have caused a problem.  Reading through thousands of warranty claims is not feasible, but using a text analytics tool, the manufacturer can analyze all of this unstructured data to pull out potential root causes for warranty repairs.  The combined result, in this case, is that the carmaker knows what repairs were made, when, how old the vehicle was and what the root cause was.
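To make the warranty example concrete, here is a minimal Python sketch, assuming each claim carries a few structured fields plus a free-text comments field. The WarrantyClaim record, the cause categories, the keyword patterns and the sample claims are all hypothetical; simple pattern matching stands in for the far richer linguistic extraction a commercial text analytics tool performs. The point is the join: each cause mined from the text is attached to the model, part and vehicle age already held in structured form.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class WarrantyClaim:
    # Structured fields: which claim, which model, what broke, how old the vehicle is.
    claim_id: int
    model: str
    part: str
    vehicle_age_months: int
    # Unstructured field: the free-text comments entered with the claim.
    comments: str


# Hypothetical cause categories and keyword patterns. A real text analytics tool
# would rely on linguistic extraction, not a few regular expressions.
ROOT_CAUSE_PATTERNS = {
    "water intrusion": re.compile(r"\b(water|moisture|wet|leak(ed|ing|s)?)\b", re.I),
    "overheating": re.compile(r"\b(overheat(ed|ing|s)?|too hot)\b", re.I),
    "corrosion": re.compile(r"\b(corroded|corrosion|rust(ed|y)?)\b", re.I),
}


def extract_root_causes(comments: str) -> list[str]:
    """Return every cause label whose pattern appears in the comment text."""
    return [cause for cause, pattern in ROOT_CAUSE_PATTERNS.items()
            if pattern.search(comments)]


def enrich(claim: WarrantyClaim) -> dict:
    """Combine the structured fields of a claim with causes mined from its text."""
    return {
        "model": claim.model,
        "part": claim.part,
        "vehicle_age_months": claim.vehicle_age_months,
        "root_causes": extract_root_causes(claim.comments) or ["unknown"],
    }


def root_cause_report(claims: list[WarrantyClaim]) -> Counter:
    """Count claims per (model, part, root cause) across the whole claim set."""
    report = Counter()
    for claim in claims:
        record = enrich(claim)
        for cause in record["root_causes"]:
            report[(record["model"], record["part"], cause)] += 1
    return report


# Hypothetical sample claims, for illustration only.
SAMPLE_CLAIMS = [
    WarrantyClaim(1, "Sedan X", "window motor", 14,
                  "Motor failed after car wash; water found inside door panel."),
    WarrantyClaim(2, "Sedan X", "window motor", 9,
                  "Door seal leaking, motor housing wet and corroded."),
    WarrantyClaim(3, "Coupe Y", "alternator", 30,
                  "Alternator overheated during long highway trip."),
]

if __name__ == "__main__":
    for (model, part, cause), count in root_cause_report(SAMPLE_CLAIMS).most_common():
        print(f"{model} / {part} / {cause}: {count}")
```

Run against the three sample claims, the report ranks water intrusion on the Sedan X window motor first, with two claims, followed by one corrosion claim and one overheating claim. Keyword rules like these are brittle (they miss synonyms and misread negations such as "no leak found"), which is why purpose-built extraction tools matter once claim volumes reach the thousands.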

To realize the benefit of unifying structured and unstructured data, BI must offer an integrated view of both. Advances in standards, hardware, software and storage are making this integrated view possible. Relational leaders such as IBM, Microsoft and Oracle are already raising the stakes by offering better support for unstructured content. With its acquisition of Venetica in August 2004, IBM Software Group's Data Management unit extended its access to unstructured data sources. For its part, Oracle provides enhanced support for unstructured content management in Oracle 10g. The growing prevalence of XML, along with better and faster text analysis tools, is making it practical to tag unstructured data and integrate it into structured data sets.
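The idea of tagging text with XML and folding it into structured data sets can be sketched in a few lines. This minimal Python example assumes a text analytics step has already marked up one warranty comment with XML entity tags; the element names, table names and columns are hypothetical, not a standard schema or any vendor's format. It parses the tags with the standard library's xml.etree.ElementTree, stores each entity as a row in an in-memory SQLite database, and joins those rows back to the structured claim record.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical output of a text analytics step for one claim comment. The tag and
# attribute names (<claim>, <text>, <entity type="...">) are illustrative only.
TAGGED_COMMENT = """
<claim id="2">
  <text>Door seal leaking, motor housing wet and corroded.</text>
  <entity type="component">door seal</entity>
  <entity type="symptom">leaking</entity>
  <entity type="symptom">corroded</entity>
</claim>
"""


def load_tagged_comment(conn: sqlite3.Connection, xml_text: str) -> int:
    """Parse the XML tags and store each entity as a row keyed by claim ID."""
    root = ET.fromstring(xml_text.strip())
    claim_id = int(root.get("id"))
    rows = [(claim_id, e.get("type"), e.text) for e in root.iter("entity")]
    conn.executemany("INSERT INTO claim_entity VALUES (?, ?, ?)", rows)
    return len(rows)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one structured claim and its tags."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE warranty_claim (claim_id INTEGER PRIMARY KEY, "
                 "model TEXT, part TEXT, vehicle_age_months INTEGER)")
    conn.execute("CREATE TABLE claim_entity (claim_id INTEGER, "
                 "entity_type TEXT, value TEXT)")
    conn.execute("INSERT INTO warranty_claim VALUES (2, 'Sedan X', 'window motor', 9)")
    load_tagged_comment(conn, TAGGED_COMMENT)
    return conn


def symptoms_with_context(conn: sqlite3.Connection) -> list[tuple]:
    """Join text-derived symptoms back to the structured claim fields."""
    return conn.execute(
        "SELECT w.model, w.part, w.vehicle_age_months, e.value "
        "FROM warranty_claim AS w JOIN claim_entity AS e ON e.claim_id = w.claim_id "
        "WHERE e.entity_type = 'symptom' ORDER BY e.value"
    ).fetchall()


if __name__ == "__main__":
    for row in symptoms_with_context(build_demo_db()):
        print(row)
```

Querying the demo database returns the two symptom entities, "corroded" and "leaking," each alongside the model, part and vehicle age from the structured record. Storing entities in a narrow (claim_id, entity_type, value) table is one design choice among several; it keeps the schema stable as new entity types appear, at the cost of more joins at query time.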

The stakes are high, but so are the potential rewards.  The successful unification of structured data and unstructured content means faster innovation, better CRM, improved financial insight, reduced costs and better performance.  Over the next six months this column will seek to start a dialogue in the business intelligence community about unstructured content and its role in the new BI "stack."



Ronen Feldman, Ph.D., is one of the leading minds in the field of text mining and draws on years of experience in developing knowledge discovery systems and text mining applications. Feldman is responsible for ClearForest's technical business development, rapid prototyping, and the research and development of new products. In particular, he is in charge of the wireless segment and the development of language models for new vertical domains. Feldman serves as a consultant to leading Israeli companies and serves on the program committees of AAAI, KDD, PKDD and SIGIR. He is often an invited speaker at academic and industrial conferences, and he is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel.

(c) 2006 DM Review and SourceMedia, Inc. All rights reserved.