DM Review | Covering Business Intelligence, Integration & Analytics
Enterprise Content Management:
Five Elements of Successful Enterprise Search

  Column published in DM Review Magazine
June 2002 Issue
  By Dan Sullivan

The availability of robust and scalable portal and search tools makes it possible to implement enterprise search services. The general idea behind enterprise search is that content anywhere in the organization is accessible through a single search tool, usually as part of an enterprise portal. While a far cry from the implementation quagmires of ERP and CRM systems, enterprise search applications can fail if you do not plan properly.

To ensure a successful implementation of enterprise search, be sure to address five elements: sizing, duplication, design, control and metrics.

Precise figures are not needed when sizing your existing content, but you need to know orders of magnitude to plan for storage, network utilization and processing times. It is also important to know where the content resides. Crawling, the process of gathering content for indexing, must be scheduled so that file, Web and database servers are not overwhelmed with I/O operations. Sizing early in the implementation phases of enterprise search will also provide a baseline for monitoring growth, as estimates on information growth vary widely.
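A first-pass sizing of file-based content can be sketched with a directory walk. This is a minimal illustration, not a complete sizing method: the `roots` list is a hypothetical set of content directories, and a real effort would also query web and database servers.

```python
import os
from collections import Counter

def size_content(roots):
    """Rough, order-of-magnitude sizing of candidate content.

    `roots` is a hypothetical list of directories holding enterprise
    content. Returns total bytes and a count of files by extension,
    which also shows where content resides and what types dominate.
    """
    total_bytes = 0
    by_type = Counter()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    total_bytes += os.path.getsize(path)
                except OSError:
                    continue  # unreadable files are ignored in the estimate
                ext = os.path.splitext(name)[1].lower() or "(none)"
                by_type[ext] += 1
    return total_bytes, by_type
```

Run periodically, the same numbers give the growth baseline mentioned above.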

With a handle on the extent of enterprise content, move to the second step: minimizing duplication. Multiple identical documents in an index waste space, and variations between versions raise questions of authenticity; for example, which 401(k) plan document is accurate, the one from finance or the one from human resources? Some duplication will always exist. E-mail attachments sent to multiple recipients are duplicated, and users make copies when access controls prevent them from retrieving documents from a shared directory even though they have a legitimate reason to read them. The goal is not to resort to draconian measures that prevent all duplication, but to recognize the problem and clean up duplicates when possible. Estimating how much duplicate content you can expect to eliminate is difficult. The "How Much Information?" study conducted at the University of California, Berkeley, indicates that original content constitutes roughly 20 percent of all digital content. That is probably the best general estimate we will find.
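Exact duplicates, such as an attachment sent to many recipients, can be found by hashing document content. This sketch assumes documents are already loaded into a dict keyed by a hypothetical document id; near-duplicate versions (the 401(k) problem above) need fuzzier comparison than a hash can give.

```python
import hashlib
from collections import defaultdict

def find_duplicates(docs):
    """Group documents whose bytes are identical.

    `docs` maps a hypothetical document id (e.g. a path) to its text.
    Returns one list of ids per group of exact duplicates, so a cleanup
    pass can keep one authoritative copy and drop the rest.
    """
    groups = defaultdict(list)
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```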

Third, design the enterprise search architecture to meet your infrastructure requirements. The design must account for additional storage requirements and increased network traffic, particularly during crawling, and for load balancing of indexing and query response. Additional considerations include replicating indexes to improve performance, backup and recovery procedures, and network security.

In many ways, compiling the index for enterprise search is similar to the extract, transform and load process in data warehousing: the processes often run during off hours within limited windows of opportunity, the initial build is much more time-consuming than incremental changes, and the processes collecting information need access to multiple systems.

Small installations can operate effectively with a single server for indexing and query processing. Midsized sites should consider separate indexing and query processing servers. Large enterprises will require a brokered or federated architecture. Brokered systems use multiple indexing and query processing servers with a single broker process distributing the work among them; when enterprise search is mission-critical, a failover broker should be in place. Federated architectures also distribute the workload; however, they use different search engines to search different repositories, such as Lotus Notes, Open Text Livelink and Web search engines, then combine the results and present them to the user. One advantage of federated search is that a centralized logical index is not required. On the other hand, some of the more advanced functions related to personalization and vendor-specific features depend upon information maintained in centralized indexes.
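The federated pattern described above can be sketched as a broker that fans a query out to repository-specific engines and merges the results into one ranked list. The engine names and the `(doc_id, score)` result shape are assumptions for illustration; a production broker would call the engines concurrently and normalize vendor-specific scores before merging.

```python
def federated_search(query, engines):
    """Fan a query out to several repository-specific engines and merge.

    `engines` is a hypothetical mapping of repository name to a search
    function returning (doc_id, score) pairs. The merged list is sorted
    by score so the user sees a single ranked result set.
    """
    merged = []
    for repo, search in engines.items():
        for doc_id, score in search(query):
            merged.append((repo, doc_id, score))
    # Present one ranked list regardless of which repository answered.
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged
```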

Controlling search operations is the fourth element of a successful implementation. Controls begin with policies. At the very least, define policies that identify the type of content to index, the frequency of updates and the access controls on that content. Some content, such as enterprise portal content, clearly should be included in enterprise search; other content, such as confidential legal and human resources documents, should not. The remaining content is not categorized as easily. Should user directories be included even when the user is the only one with access to the directory? Should indexing of such directories be restricted? The answers to these questions must balance indexing and query-processing resources against the ability to improve the way users work.
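Inclusion policies like these can be expressed as a simple filter applied before crawling. The field names (`classification`, `location`, `readers`) and the specific rules below are illustrative assumptions, not a standard schema; each organization would encode its own policies.

```python
def should_index(doc):
    """Apply hypothetical indexing policies to one document record.

    `doc` is a dict describing a candidate document. Returns False for
    content policy excludes from the enterprise index.
    """
    # Policy: confidential legal and HR content is never indexed.
    if doc.get("classification") in {"confidential-legal", "confidential-hr"}:
        return False
    # Policy: single-reader home directories are excluded to save
    # indexing and query-processing resources.
    if doc.get("location") == "user-home" and len(doc.get("readers", [])) <= 1:
        return False
    return True
```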

Finally, establish metrics to measure enterprise search performance. Key indicators available from Web log analysis are the number of queries issued, average number of hits per query, number of queries per session and query response time. Ideally, a user should have to issue few queries per session, receive relatively few hits and receive responses quickly. More advanced analysis on terms used in queries can help guide category and taxonomy development.
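Given parsed log records, the key indicators named above reduce to a few aggregates. The record fields (`session`, `hits`, `response_ms`) are assumed for illustration; real web logs would need parsing into this shape first.

```python
def search_metrics(log):
    """Compute key search indicators from parsed log records.

    `log` is a hypothetical list of dicts, one per query, with
    `session`, `hits`, and `response_ms` fields.
    """
    if not log:
        return {}
    sessions = {rec["session"] for rec in log}
    n = len(log)
    return {
        "queries": n,
        "avg_hits_per_query": sum(rec["hits"] for rec in log) / n,
        "queries_per_session": n / len(sessions),
        "avg_response_ms": sum(rec["response_ms"] for rec in log) / n,
    }
```

Tracked over time, rising queries per session or hits per query can signal that users are struggling to find what they need.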

Successful enterprise search will not just happen by installing some software and letting it run. However, attention to these five key elements will get you where you want to go.



Dan Sullivan is president of the Ballston Group and author of Proven Portals: Best Practices in Enterprise Portals (Addison Wesley, 2003). Sullivan may be reached at dsullivan@ballstongroup.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.