Text Mining Improves Business Intelligence and Predictive Modeling in Insurance

  Article published in DM Review Magazine
July 2003 Issue
  By Marty Ellingsworth and Dan Sullivan

Business intelligence and statistical analysis techniques are running out of steam. Or at least that appeared to be the case.

Fireman's Fund Insurance Company, for example, tried a wide range of analytic techniques to understand rising homeowner claims and suspicious auto claims, but could not find predictive patterns in the data. The insurance company's team of analysts, led by one of the authors (Ellingsworth), realized the problem was not with their techniques, but with their data. The analysts were dealing with new types of claims that were not fully described by the structured data collected by the company. Fortunately, the additional information was available in adjuster notes and other free-form texts.

To satisfy the accuracy needs of its modeling programs, the company used basic text mining techniques to isolate new attributes from the text and then combined those with previously available structured data to expand the total amount of relevant, usable information. The initial thinking was that if business intelligence techniques seemed inadequate, one should simply build a better mousetrap. Fireman's Fund subsequently discovered that success may instead mean paying closer attention to the supply chain of information where basic data features originate.

In this article, we will describe a basic text mining technique, term extraction, and discuss how it was successfully used at Fireman's Fund to gain insights into urgent business problems. We will also provide some tips that may be of value when introducing text mining to your own organization.

Term Extraction

Term extraction is the most basic form of text mining. Like all text mining techniques, this one maps information from unstructured data into a structured format. The simplest data structure in text mining is the feature vector, or weighted list of words. The most important words in a text are listed along with a measure of their relative importance. For example, consider the following hypothetical claims adjuster notes:

"The claimant is anxious to settle; mentioned his attorney is willing to negotiate. Also willing to work with us on loss adjustment expenses (LAE) and calculating actual cash value. Unusually familiar with insurance industry terms. Claimant provided unusual level of details about accident, road conditions, weather, etc. Need more detail to calculate the LAE."

This text reduces to a list of terms and weights as shown in Figure 1. This list of terms does not capture the full meaning of the text, but it does identify the key concepts mentioned. To identify key terms, text mining systems perform several operations. First, commonly used words (e.g., the, and, other) are removed. Second, words are stemmed, or replaced by their roots. For example, phoned and phoning are mapped to phone. This provides the means to measure how often a particular concept appears in a text without having to worry about minor variations such as plural versus singular versions of words.

Figure 1: Example List of Terms and Weights
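The preprocessing steps described above, stop-word removal and stemming, can be sketched in a few lines of Python. The stop list and suffix rules here are toy placeholders for the richer resources (for example, a Porter stemmer) that a production system would use:

```python
# Toy term extraction: stop-word removal plus a crude suffix-stripping
# stemmer. Both the stop list and the suffix rules are illustrative only.
STOP_WORDS = {"the", "is", "to", "and", "his", "with", "us", "on", "of", "a", "etc"}

def crude_stem(word):
    # Strip a few common suffixes; a stand-in for a real stemming algorithm.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_terms(text):
    words = [w.strip(".,;()").lower() for w in text.split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]

print(extract_terms("The claimant phoned and is phoning again"))
# -> ['claimant', 'phon', 'phon', 'again']
```

Note that phoned and phoning both reduce to the same root, which is what lets frequency counts treat them as a single concept.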

The final step calculates a weight for each remaining term in a document. There are many methods for calculating these weights, but the most common algorithms use the number of times a word appears in a document (the term frequency, or tf factor) and the number of documents in the collection that contain the word (the document frequency, which is inverted to give the idf factor).1 Large term frequency factors increase the weight of a term, while large document frequencies (that is, small idf factors) lower it. The general assumption behind this calculation is that terms appearing frequently in a document describe its distinguishing concepts unless those terms also appear frequently across all texts in the collection.
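A minimal implementation of the tf-idf weighting just described, using illustrative claim terms rather than real data:

```python
import math
from collections import Counter

def tfidf(documents):
    # documents: a list of token lists. Returns one {term: weight} dict
    # per document using the classic tf * log(N / df) weighting.
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # document frequency counts each doc once
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["claimant", "attorney", "settle"],
        ["claimant", "weather", "road"],
        ["claimant", "attorney", "negotiate"]]
weights = tfidf(docs)
# "claimant" appears in every document, so log(3/3) drives its weight to 0;
# "settle" appears in only one, so it keeps the full log(3/1) boost.
```

This is the behavior the text describes: common terms wash out, distinctive terms rise to the top of the feature vector.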

For another example, consider a workers' compensation claims system. As with other insurance applications, this would track demographics about claimants, the location of the accident, the type of accident, etc. It may also include Boolean indicators for common conditions involved in past claims, such as a slippery floor; but there are practical limits to the number of such indicators, so free-form text is used for additional details.

Narratives could be used to describe activity prior to the accident, unusual environmental conditions, distracting factors, etc. Term extraction could identify key terms in each narrative (e.g., turning, bending, twisting prior to the accident; leaks, ambient temperature, wind conditions in the environment conditions notes; and noise, foot traffic and other distracting factors in the final narrative). By mapping the free-form text to a feature vector, the text is modeled in the same attribute/value model used by structured data and thus lends itself to analysis using traditional business intelligence tools such as ad hoc reports, OLAP analysis, data mining and predictive modeling.
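As a sketch of that mapping, the following combines a claim's structured fields with a note's term-weight vector into a single attribute/value row. All field and term names here are illustrative, not a real claims schema:

```python
# Merge structured claim fields with text-derived term weights into one
# flat feature row suitable for reporting, OLAP or modeling tools.
def build_features(claim, term_weights, vocabulary):
    row = {
        "claim_amount": claim["amount"],
        "days_to_report": claim["days_to_report"],
    }
    # One column per vocabulary term; weight 0.0 if the note never mentions it.
    for term in vocabulary:
        row["term_" + term] = term_weights.get(term, 0.0)
    return row

claim = {"amount": 12500.0, "days_to_report": 45}
note_terms = {"twisting": 1.4, "leak": 0.9}
vocab = ["twisting", "leak", "noise"]
print(build_features(claim, note_terms, vocab))
# -> {'claim_amount': 12500.0, 'days_to_report': 45,
#     'term_twisting': 1.4, 'term_leak': 0.9, 'term_noise': 0.0}
```

Once the narrative lives in the same row as the structured attributes, any downstream tool sees one uniform data set.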

Applications of text mining are not limited to claims processing. Many business transaction applications, such as customer relationship management, e-mail responses, clinical records and enterprise resource planning (ERP), include both structured data (such as numeric measures and coded attributes) and free-form annotations. CRM systems may track detailed descriptions of customer complaints, doctors may note variations in symptoms or special instructions in a patient's chart and ERP systems might track notes on problems in production runs. Free-form notes are used frequently because we cannot always determine all the attributes relevant to a business process.

In some cases, relevancy changes with time. When suits were brought against Firestone for faulty SUV tires, Fireman's Fund turned to free-form text analysis to determine whether any of its claims related to the litigation. Unpredictable cases such as this are candidates for text mining-based analysis.

Fireman's Fund Matches Techniques to Problems

Mastering information is a critical competency for success in the insurance industry. As part of an internal consulting group, Ellingsworth is often faced with making new headway on old problems. These problems typically take the form of predicting expected claims and understanding why outcomes vary from those predictions. Only by understanding why outcomes diverge from predictions can the group craft a set of alternative management solutions.

Text mining helps Fireman's Fund in at least three ways: extracting entities and objects for frequency analysis, identifying files with particular attributes for further statistical analysis, and creating entirely new data features for predictive modeling. The first method was used in the Firestone case.

The second method was used when the insurer saw the cost of homeowners' claims soaring in a single state. When traditional reports failed to provide clarity, frontline staff were polled for suggestions. They indicated that a new type of claim involving mold was emerging. The effect trailed the occurrence, meaning that by the time mold became a serious issue, many cases were already on the books.

Once the company realized the potential liability, it began to examine past claims in an effort to identify claims that required central tracking. Unfortunately, no structured code existed for categorizing and tracking mold risk. The level of effort required to manually examine cases from the prior two years to tag them for this risk was unreasonable. However, by using a handful of known examples, analysts identified patterns in claims using text mining techniques and were able to search for additional files with those patterns. This first-pass filtering was not perfect, but it did yield a much smaller list of files that could be manually coded. While pattern matching based on unstructured data works in some cases, other business problems require more integration of structured and unstructured data.
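The first-pass filtering described above can be sketched as a similarity search: score each claim's term vector against a handful of known examples and flag the closest matches for manual review. The seed terms, claim vectors and threshold below are all illustrative:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two {term: weight} vectors.
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flag_candidates(seed_vectors, claim_vectors, threshold=0.3):
    # Return indices of claims whose best match against any known example
    # clears the threshold; only those files go to manual coding.
    flagged = []
    for i, claim in enumerate(claim_vectors):
        if max(cosine(claim, seed) for seed in seed_vectors) >= threshold:
            flagged.append(i)
    return flagged

seeds = [Counter({"mold": 3, "moisture": 2, "remediation": 1})]
claims = [Counter({"mold": 1, "moisture": 1, "paint": 2}),
          Counter({"collision": 2, "bumper": 1})]
print(flag_candidates(seeds, claims))
# -> [0]  (only the first claim resembles the known mold examples)
```

As the article notes, such filtering is imperfect, but it shrinks a two-year backlog to a reviewable short list.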

Some Text Mining Tools and Vendors

For more information on commercially available text mining tools, consult:

  • SAS Text Miner
  • IBM Intelligent Miner for Text
  • Insightful Miner for Text
  • Megaputer Text Analysis

Analysts with Fireman's Fund ran into a wall when trying to build a model to predict suspicious claims in third-party automobile accidents. After modeling with all the available structured data, the models were only marginally useful, and the team needed new approaches. During a test and validation iteration, the analysts observed an interesting phenomenon: investigators were reading the claim files to further categorize cases identified by the model, assessing the behaviors of the claimants and the facts of each claim scenario. Specific recurring themes in the story of a claim were the investigators' triggers for further research. This prompted the analysts to realize that those behavioral features had to be exposed and added to the modeling process. The result was a model that could identify useful referrals and that stayed up to date as new information was added to the files in unstructured form over the life of the claim.

Lessons Learned

Text mining has succeeded at Fireman's Fund because they focused on business fundamentals. If you are hitting the wall with structured data analysis, consider these tips.

First, focus on enhancing the gains of high economic value projects that are already in place. Marginal improvements through the intelligent use of unstructured data can improve ROI. With these near-term identifiable wins, you can fund further research.

Second, consider which projects failed for lack of detailed data. Can text mining, and term extraction in particular, create useful data features that let you discover heretofore unknown analytical insights?

Third, remember the keys to success in any information technology project: people, process, technology, philosophy and environment. This is a specialized area, and few organizations are equipped with the right talent to succeed without investing in the ongoing education of their business intelligence analysts (assuming they have them). The processes of information extraction and text categorization are supported by many software vendors. However, the creation of company-specific resources, such as a robust predictive taxonomy, requires at least several iterations with subject-matter experts and automated tools.

Fourth, look for approaches that embed ongoing feedback. Such feedback provides a chance for continued improvement and also permits monitoring for drift in vocabulary and for detecting new topics of interest.

Finally, watch for key indicators of projects to avoid. These include:

  • Lack of an executive sponsor.
  • Lack of a method to show the value to the sponsor.
  • Lack of in-house resources.
  • A determination to "do it all yourself."
  • Fear of finding a qualified consultant.

Text mining is a powerful technique for expanding the range of data we can analyze. Often, the information we need to understand a business process is available to us; we just are not looking in the right spot for it. As Fireman's Fund has shown, text mining complements existing techniques. Solutions to apparently impenetrable problems are found when both structured and unstructured data are used. Sometimes you need more than just a better mousetrap - you need better mice.



Marty Ellingsworth is the director of operations research, a division of the Customer Research and Strategies group in marketing at Fireman's Fund Insurance Company. Ellingsworth specializes in turning theoretical data-driven solutions into practical initiatives that yield profit. His group has active projects touching every facet of the organization (underwriting, sales and marketing, claims), spanning most lines of business in both commercial and personal property and casualty insurance. He can be reached at mellings@ffic.com.

Dan Sullivan is president of the Ballston Group and author of Proven Portals: Best Practices in Enterprise Portals (Addison Wesley, 2003). Sullivan may be reached at dsullivan@ballstongroup.com.


SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.