Published in DM Review Online in August 2005.
Unified Business Intelligence: UIMA - Is IBM Hearing Voices?

by Ronen Feldman, Ph.D.

Clearly, we are on to something with this unified business intelligence (UBI) stuff that we've been talking about for the past two years. If we weren't, IBM probably wouldn't be interested in it.

Earlier this month, IBM announced the open Unstructured Information Management Architecture (UIMA) standard, essentially a platform for integrating structured data and unstructured information. Even if it doesn't directly claim to have done so, Big Blue has officially offered the world a blueprint for UBI. This is a very good thing; the successful unification of structured data and unstructured content means faster innovation, better CRM, improved financial insight, reduced costs and all-around better performance.

UIMA is a standard architecture developed by IBM Research that can be used to add text analytics to any application. It enables interoperability among different analytics software and enterprise applications, and provides tools for developers to speed the creation of new, reusable text analytics components. UIMA can also be used to support wider-reaching enterprise search applications by providing a common mechanism for delivering natural language processing, helping computers understand text and extract deeper levels of meaning, including the relationships that define specific facts. UIMA-compliant text analytic components can use WebSphere Information Integrator OmniFind Edition to clarify the meaning of terms, mine text to find hidden knowledge and extract useful business information.

The UIMA news was immediately endorsed by a host of business intelligence (BI) providers, including SAS Institute, Cognos and ClearForest. These companies announced UIMA-compliant solutions for a range of applications, including quality early warning, fraud detection and customer loyalty. The solutions enable companies to extract and analyze vital information from unstructured content (such as Word files, Web pages, RSS feeds, warranty claims or call center notes) for use in value-added business intelligence applications.

To date, BI has matured almost entirely around numerical data. BI has evolved from tactical use to a key component in forging corporate direction, part of everything from credit card mailings to investment strategy. But until recently, BI has consisted mostly of tracking, storing and analyzing structured data. Unstructured data, or text, has been an unequal partner in BI, primarily because businesses have not had access to the tools necessary to extract relevant data points and structure them appropriately for analysis. IBM's UIMA is significant because it offers a complete technology standard for integrating the two worlds of structured and unstructured information.

Unstructured data often tells us the most about a constituency's voice. By not tracking, organizing and analyzing this data, businesses miss out on capturing the voice of the customer, the supplier or the industry influencer. However, "voices" are inherently complicated; identifying how to tap a voice is the most critical element of a UBI implementation. A great number of categorization, unstructured data management and taxonomy creation projects have died because of their inherently broad scope and the lack of an overarching structure for integrating unstructured data into the structured BI stack. UIMA provides that superstructure, and in so doing has paved the way for more success.

What is the long-term impact of the UIMA announcement? While it is still too early to know for certain, we can assume it means greater collaboration among BI vendors, yielding a new generation of tools to provide greater competitive advantage, manage risk and improve customer loyalty.

Examples of these tools will include software that enables manufacturers to integrate unstructured information into their business intelligence stack to help identify the root causes of warranty claims and improve their products. Similarly, consumer goods manufacturers might use text analytics to analyze customer comments and better understand consumer needs, preferences and suggestions. Whether it's insurance fraud, retention of cell-phone customers, improved clinical trials, faster equity analysis or national defense, UIMA and the marriage of structured data and unstructured content will make it easier to accomplish.
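
To make the warranty-claim scenario concrete, here is a minimal sketch in Python of what "unifying" the two kinds of data can look like. It is only an illustration under stated assumptions: it does not use UIMA or any vendor's API, the keyword lexicon stands in for a real text analytics component, and every name and record in it (COMPONENT_TERMS, annotate_components, root_cause_counts, the sample claims) is hypothetical.

```python
from collections import Counter

# Hypothetical lexicon mapping surface terms to a normalized component name.
# A real text analytics component would be far richer than a keyword list.
COMPONENT_TERMS = {
    "battery": "battery",
    "batteries": "battery",
    "screen": "display",
    "display": "display",
    "hinge": "hinge",
}


def annotate_components(note):
    """Return the sorted, de-duplicated components mentioned in a free-text note."""
    words = (w.strip(".,;:!?").lower() for w in note.split())
    return sorted({COMPONENT_TERMS[w] for w in words if w in COMPONENT_TERMS})


def root_cause_counts(claims):
    """Join the structured field (model) with extracted text facts (component)."""
    counts = Counter()
    for claim in claims:
        for component in annotate_components(claim["note"]):
            counts[(claim["model"], component)] += 1
    return counts


# Made-up sample records: structured fields plus an unstructured note.
CLAIMS = [
    {"claim_id": 1, "model": "A100", "note": "Battery drains overnight."},
    {"claim_id": 2, "model": "A100", "note": "Screen flickers; battery swollen."},
    {"claim_id": 3, "model": "B200", "note": "Hinge cracked after a month."},
]

if __name__ == "__main__":
    # List the (model, component) pairs, most frequent first.
    for (model, component), n in root_cause_counts(CLAIMS).most_common():
        print(model, component, n)
```

The point of the sketch is the shape of the result: once the text has been reduced to (model, component) pairs, it can sit in the same tables and reports as any other structured BI data.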

Ronen Feldman, Ph.D., is one of the leading minds in the field of text mining and draws on years of experience in the development of knowledge discovery systems and text mining applications. Feldman is responsible for ClearForest's technical business development, rapid prototyping, and the research and development of new products. In particular, he is in charge of the wireless segment and development of language models for new vertical domains. Feldman serves as a consultant to leading Israeli companies and serves on the program committees of AAAI, KDD, PKDD and SIGIR. He is often an invited speaker at academic and industrial conferences, and he is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel.

