Enterprise Content Management:
Evaluating Enterprise Content Management Tools

Column published in DM Review Magazine, March 2002 issue
By Dan Sullivan

A growing niche in the content management market is a class of tools that can index and organize distributed content across a range of platforms and make it accessible through a variety of methods. These enterprise-class tools support three distinct models of access: search, navigation and collaboration. In this month's column, we will examine the benefits and drawbacks of the first two types of access, the various approaches to implementing these processes and, most importantly, how to evaluate tools in each category. Next month's column will focus on collaboration.

The first access method is the familiar search technique. With these tools, users discover relevant content by specifying keywords, phrases and Boolean operators. We all know how effective this can be. Vendors have developed proprietary methods for improving search relevancy by applying basic rules about word forms, looking for recurring patterns in text and using other statistical analysis methods. While these techniques help, we can't seem to shake one fundamental problem: regardless of how we search, when we improve the chances of finding all relevant content (increasing recall), we tend to increase the number of irrelevant hits (decreasing precision). Similarly, when we eliminate irrelevant hits, we tend to miss relevant content.
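The recall/precision tradeoff is easy to quantify. Here is a minimal sketch, using made-up document IDs and relevance judgments, of how the two measures are computed for a single query:

```python
# Hypothetical example: one query, hand-labeled relevance judgments.
# All document IDs here are illustrative.
relevant = {"doc1", "doc3", "doc7", "doc9"}    # documents judged relevant
retrieved = {"doc1", "doc2", "doc3", "doc5"}   # documents the engine returned

hits = relevant & retrieved
precision = len(hits) / len(retrieved)  # fraction of results that are relevant
recall = len(hits) / len(relevant)      # fraction of relevant content found

print(precision, recall)  # 0.5 0.5
```

Tuning a tool to retrieve more documents typically raises recall while dragging precision down, which is exactly the tension described above.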

Consequently, the first step in evaluating the effectiveness of a search tool is to understand its precision and recall rates and how improving one measure affects the other. The speed of indexing content and query response time are also important factors in choosing an enterprise search tool. If you do not have the time or resources for a detailed comparison of enterprise content tools, skip the search tool evaluation and concentrate on evaluating the navigation and organization components instead. There is more variation among vendors' offerings in this area than in the older and better-understood search arena.

In the navigation and organization group, we find categorizers, taxonomy builders and clustering tools. The benefit of these tools is that they allow users to search at higher levels of abstraction. Categorizers assign predefined labels to content that enable users to search with a small set of labels. It's the categorizer, not the user, that must keep track of all the different ways to describe objects in the organization. With a taxonomy, a customer can find a product by browsing a Yahoo!-like directory without having to guess at distinguishing key words. Clustering brings the added benefit of finding content similar to something the user has already found.
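The directory-browsing idea can be pictured as a simple tree walk. The sketch below (category names and documents are invented for illustration) shows how a Yahoo!-like taxonomy lets a user drill down through categories instead of guessing keywords:

```python
# Illustrative taxonomy: categories nest, and leaves hold content.
taxonomy = {
    "Products": {
        "Hardware": ["router spec sheet", "server datasheet"],
        "Software": ["CRM user guide"],
    }
}

def browse(tree, path):
    """Walk down the taxonomy along a list of category names."""
    node = tree
    for category in path:
        node = node[category]
    return node

print(browse(taxonomy, ["Products", "Software"]))  # ['CRM user guide']
```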

Categorizers and taxonomies either use manually crafted rules or, more commonly now, learn classification rules from examples. No single approach to learning from examples works best in all situations, so vendors are turning either to a combination of algorithms, as in Stratify's case, or to combining automatic rule induction with manually crafted business rules, the method adopted by Quiver and Verity.
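To make the hybrid idea concrete, here is a deliberately simplified sketch (not any vendor's actual method) in which hand-crafted keyword rules take precedence and a crude word-count model learned from labeled examples handles everything else:

```python
from collections import defaultdict

# Hand-crafted business rules: keyword -> category (illustrative).
manual_rules = {"invoice": "Finance", "lawsuit": "Legal"}

def train(examples):
    """Learn from examples: count how often each word appears per category."""
    counts = defaultdict(lambda: defaultdict(int))
    for text, label in examples:
        for word in text.lower().split():
            counts[word][label] += 1
    return counts

def categorize(text, counts):
    # Manual rules override the learned model.
    for keyword, label in manual_rules.items():
        if keyword in text.lower():
            return label
    scores = defaultdict(int)
    for word in text.lower().split():
        for label, n in counts.get(word, {}).items():
            scores[label] += n
    return max(scores, key=scores.get) if scores else None

examples = [("quarterly revenue report", "Finance"),
            ("employee benefits policy", "HR")]
model = train(examples)
print(categorize("revenue forecast", model))  # Finance
```

A real product would use far more sophisticated induction, but the division of labor is the same: rules encode what experts know for certain, and the learned model generalizes from examples.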

You should consider three criteria when evaluating categorization and taxonomy tools: the accuracy of classification, the number of examples required to train the categorizer and the speed of training.

The accuracy of categorization depends upon the quality of the training examples and the effectiveness of the underlying algorithm. Tool developers cannot control the quality of examples, but they do choose their algorithms. While many of us are more interested in integration, customization and cost, we have to pay attention to what occurs under the hood. Some vendors prominently discuss their algorithms, while others tuck the details away in technical white papers. In either case, make vendors provide comparative results between their algorithms and their competitors'. It is easy for vendors to say they have the best categorizer. Make them prove it.
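Demanding comparable numbers means scoring every candidate on the same held-out, hand-labeled test set. A minimal sketch of that evaluation harness (the vendor stand-ins and test documents are invented):

```python
# Score any categorizer (a function from text to label) on one shared test set.
def accuracy(categorize, test_set):
    correct = sum(1 for text, label in test_set if categorize(text) == label)
    return correct / len(test_set)

test_set = [("press release", "PR"), ("balance sheet", "Finance")]

# Stand-ins for two competing tools.
vendor_a = lambda text: "Finance"
vendor_b = lambda text: "PR" if "press" in text else "Finance"

print(accuracy(vendor_a, test_set), accuracy(vendor_b, test_set))  # 0.5 1.0
```

The key discipline is that the test set is yours, not the vendor's, and it is identical for every tool being compared.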

With enough training examples, categorizers will reach acceptable levels of performance. The question is how many examples are required: 1,000 or 10,000? Compiling training examples can be time-consuming, so consider the hidden cost of staff resources when implementing and maintaining categorizers. Also, compare tools with regard to the time it takes to execute the training cycle. This depends both on the number of examples and the underlying algorithm. Again, make vendors provide some concrete numbers.
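Both questions (how many examples, how long to train) can be answered with a learning-curve experiment: train on progressively larger slices of the labeled data and record accuracy and training time at each size. A sketch with a trivial stand-in trainer (real training would use the vendor's tool):

```python
import time

def train(examples):
    # Stand-in trainer: just memorizes texts (illustrative only).
    return {text: label for text, label in examples}

def evaluate(model, test_set):
    return sum(1 for t, l in test_set if model.get(t) == l) / len(test_set)

# Synthetic labeled corpus for the sketch.
examples = [(f"doc {i}", "A" if i % 2 else "B") for i in range(100)]
test_set = examples[:20]

for n in (10, 50, 100):
    start = time.perf_counter()
    model = train(examples[:n])
    elapsed = time.perf_counter() - start
    print(n, evaluate(model, test_set), round(elapsed, 4))
```

Plotting accuracy against training-set size shows where the curve flattens, which is the point past which additional labeling effort stops paying for itself.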

There is no silver bullet in enterprise content management. No single algorithm works best in all situations, but we are seeing a trend toward hybrid approaches that combine either multiple algorithms or automatic and manual methods. Fully automated methods may not reach necessary accuracy levels without large numbers of high-quality training examples. Manually crafted business rules come with obvious overhead. When evaluating these tools, it is essential to understand the tradeoffs that vendors make, such as higher accuracy at the expense of slower training cycles. Make sure your objectives align with the strengths of the tool you implement.



Dan Sullivan is president of the Ballston Group and author of Proven Portals: Best Practices in Enterprise Portals (Addison Wesley, 2003). Sullivan may be reached at dsullivan@ballstongroup.com.

