Enterprise Content Management:
Evaluating Enterprise Content Management Tools
A growing niche in the content management market is a class of tools that can index and organize distributed content across a range of platforms and make it accessible through a variety of methods. These enterprise-class tools support three distinct models of access: search, navigation and collaboration. In this month's column, we will examine the benefits and drawbacks of the first two types of access, the various approaches to implementing these processes and, most importantly, how to evaluate tools in each category. Next month's column will focus on collaboration.
The first access method is the common search technique. With these tools, users discover relevant content by specifying keywords, phrases and Boolean operators. We all know how effective this can be. Vendors have developed proprietary methods for improving search relevancy by applying some basic rules about word forms, looking at recurring patterns in text and using other statistical analysis methods. While these techniques help, we can't seem to shake one fundamental problem. Regardless of how we try to search, when we improve the chances of finding all relevant content (increasing recall), we tend to increase the number of irrelevant hits (decreasing precision). Similarly, when eliminating irrelevant hits, we tend to miss relevant content.
Consequently, the first step in evaluating the effectiveness of a search tool is to understand its rate of precision and recall and how improving one measure affects the other. The speed of indexing content and query response time are also important factors in choosing an enterprise search tool. If you do not have the time or resources for a detailed comparison of enterprise content tools, skip the search tool evaluation and concentrate on evaluating the navigation and organization components instead. There is more variation among vendors' offerings in this area than in the older and better-understood search arena.
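To make the precision and recall measures discussed above concrete, here is a minimal sketch of how they might be computed for a single query during a tool evaluation. The document IDs, the relevance judgments and the function name `precision_recall` are all hypothetical; a real evaluation would average over many queries judged by subject matter experts.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search tool returned
    relevant:  document ids a human reviewer judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments for a single query.
relevant = {"d1", "d2", "d3", "d4"}
narrow_query = ["d1", "d2"]  # strict query: few hits
broad_query = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]  # loose query

narrow = precision_recall(narrow_query, relevant)
broad = precision_recall(broad_query, relevant)
print(narrow)  # (1.0, 0.5)
print(broad)   # (0.5, 1.0)
```

In this toy case the strict query returns only relevant documents but misses half of them, while the loose query finds everything at the cost of half its hits being irrelevant, which is the tradeoff described above.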
In the navigation and organization group, we find categorizers, taxonomy builders and clustering tools. The benefit of these tools is that they allow users to search at higher levels of abstraction. Categorizers assign predefined labels to content, enabling users to search with a small set of labels. It's the categorizer, not the user, that must keep track of all the different ways to describe objects in the organization. With a taxonomy, a customer can find a product by browsing a Yahoo!-like directory without having to guess at distinguishing keywords. Clustering brings the added benefit of finding content similar to something the user has already found.
Categorizers and taxonomies either use manually crafted rules or, more commonly now, learn classification rules from examples. There is no single approach to learning from examples that works best in all situations, so vendors are turning either to a combination of algorithms, as in Stratify's case, or to a combination of automatic rule induction and manually crafted business rules, the method adopted by Quiver and Verity.
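The idea of combining learned rules with manually crafted business rules can be sketched as follows. This is an illustration only, not any vendor's method: the word-count scoring is a deliberately simple stand-in for a real learning algorithm, and the training examples, the rule and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Learn per-label word counts from (text, label) pairs."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def learned_label(counts, text):
    """Pick the label whose training vocabulary overlaps the text most."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


def categorize(counts, manual_rules, text):
    """Manual business rules win; otherwise fall back to the learned model."""
    lowered = text.lower()
    for phrase, label in manual_rules:
        if phrase in lowered:
            return label
    return learned_label(counts, text)


# Hypothetical training examples and business rule, for illustration only.
examples = [
    ("quarterly revenue and earnings report", "finance"),
    ("invoice payment and budget approval", "finance"),
    ("server outage and network latency", "it"),
    ("database backup and server patching", "it"),
]
manual_rules = [("confidential", "legal")]

model = train(examples)
a = categorize(model, manual_rules, "Server backup failed overnight")
b = categorize(model, manual_rules, "Confidential budget memo")
print(a)  # it
print(b)  # legal
```

The second document would have been labeled "finance" by the learned model alone; the manual rule overrides it, which shows both the appeal of hand-written rules and the maintenance overhead they bring.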
You should consider three criteria when evaluating categorization and taxonomy tools: the accuracy of classification, the number of examples required to train the categorizer and the speed of training.
The accuracy of categorization depends upon the quality of the training examples and the effectiveness of the underlying algorithm. Tool developers cannot control the quality of examples, but they do choose their algorithms. While many of us are more interested in integration, customization and cost, we have to pay attention to what occurs under the hood. Some vendors discuss their algorithms prominently, while others tuck the details away in technical white papers. In either case, make vendors provide comparative results between their algorithms and their competitors' algorithms. It is easy for vendors to say they have the best categorizer. Make them prove it.
With enough training examples, categorizers will reach acceptable levels of performance. The question is how many examples are required - 1,000 or 10,000? Compiling training examples can be time-consuming, so consider the hidden cost of staff resources when implementing and maintaining categorizers. Also, compare tools with regard to the time it takes to execute the training cycle. This depends both on the number of examples and the underlying algorithm. Again, make vendors provide some concrete numbers.
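One way to gather the concrete numbers mentioned above is a small harness that records accuracy and training time as the number of training examples grows. The sketch below is an assumption about how such a comparison might be set up, not a vendor benchmark: `learning_curve`, the tiny data sets and the majority-label stand-in categorizer are all hypothetical, and a real test would plug in each tool's own training and prediction calls.

```python
import time
from collections import Counter


def learning_curve(train_fn, predict_fn, train_set, test_set, sizes):
    """Record accuracy and training time for growing training-set sizes."""
    rows = []
    for n in sizes:
        start = time.perf_counter()
        model = train_fn(train_set[:n])
        elapsed = time.perf_counter() - start
        correct = sum(predict_fn(model, text) == label for text, label in test_set)
        rows.append({
            "examples": n,
            "accuracy": correct / len(test_set),
            "train_seconds": elapsed,
        })
    return rows


# Stand-in categorizer (an assumption): always predicts the most
# common label seen in training. Replace with the tool under evaluation.
def train_majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def predict_majority(model, text):
    return model


# Hypothetical labeled documents.
train_set = [("a", "it"), ("b", "finance"), ("c", "finance"), ("d", "finance")]
test_set = [("x", "finance"), ("y", "finance"), ("z", "it"), ("w", "finance")]

rows = learning_curve(train_majority, predict_majority, train_set, test_set, [1, 4])
for row in rows:
    print(row["examples"], row["accuracy"])
# 1 0.25
# 4 0.75
```

Running the same harness against each candidate tool, with the same examples and the same held-out test set, gives a like-for-like view of all three criteria: accuracy, examples required and training speed.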
There is no silver bullet in enterprise content management. No single algorithm works best in all situations, but we are seeing a trend toward hybrid approaches that combine either multiple algorithms or automatic and manual methods. Fully automated methods may not reach necessary accuracy levels without large numbers of high-quality training examples. Manually crafted business rules come with obvious overhead. When evaluating these tools, it is essential to understand the tradeoffs that vendors make, such as higher accuracy at the expense of slower training cycles. Make sure your objectives align with the strengths of the tool you implement.
Dan Sullivan is president of the Ballston Group and author of Proven Portals: Best Practices in Enterprise Portals (Addison Wesley, 2003). Sullivan may be reached at email@example.com.