DM Review | Covering Business Intelligence, Integration & Analytics

Enterprise Search: A Competitive Edge

Article published in DM Direct Newsletter
November 18, 2005 Issue
By Chris Wildgoose

With Internet search engines, we've come to expect immediate retrieval of information, and Google has become the de facto standard for the quality of retrieval results. Most companies today have begun implementing search over their internal content, but the results have fallen short of those offered by Google. Searching documents internal to an organization is known as enterprise search, and it is a more complicated problem. Beyond purchasing a search engine, companies must also devote considerable effort to understanding the information needs of the organization. Simply installing a search engine within the organization will almost certainly lead to unsatisfactory results. A thorough understanding of the challenges involved helps ensure success.

The Challenges

Searching for content in the enterprise introduces a number of challenges that do not arise for content residing on the Internet. These complications, outlined below, range from technical to business concerns.

Multiple Information Needs

Relevance ranking on the Internet is tuned to the information needs of the general Internet user, but a business organization contains multiple user groups, each with different information needs. An Internet search engine will generally place the most popular documents at the top of the results list, and popularity on the Internet is determined largely by the number of hyperlinks pointing to a particular piece of content. This approach works well for a general sense of what is relevant, but not for an organization, where what counts as relevant shifts from one group of users to another. More specifically, users within an organization have different information needs that depend upon their job function, so for the same query, one user group will judge the relevance of the returned documents differently than another. For example, a user in the accounting department will rate accounting materials as more relevant than a user in the engineering department would.
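
To make the idea of group-dependent relevance concrete, here is a minimal sketch of re-ranking one result list differently for different departments. It assumes each indexed document carries a department tag and that the engine has already produced a base relevance score; the `Doc` class, the `rank_for_user` function and the boost factor of 2.0 are hypothetical illustrations, not features of any particular search product.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    department: str    # hypothetical metadata tag assigned at indexing time
    base_score: float  # query relevance score, assumed to come from the engine

def rank_for_user(docs, user_department, boost=2.0):
    """Re-rank results so documents from the user's own department score higher.

    The multiplicative boost is an arbitrary illustrative value.
    """
    def score(doc):
        return doc.base_score * (boost if doc.department == user_department else 1.0)
    return sorted(docs, key=score, reverse=True)

# Two hypothetical hits for the same query, e.g. "budget".
results = [
    Doc("Quarterly budget variance", "accounting", 0.6),
    Doc("Build server budget request", "engineering", 0.7),
]

print([d.title for d in rank_for_user(results, "accounting")])
print([d.title for d in rank_for_user(results, "engineering")])
```

The same two documents come back in opposite order for the two departments, which is the behavior a single popularity-based ranking cannot provide.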

Demands for Greater Control

The organization requires greater control over its content in order to support its business objectives. The general Internet user doesn't require this level of control and may be satisfied with a search that bypasses important information. For a business, however, glossing over data needed to maintain competitive advantage is not an option, so it requires a more granular search mechanism. Search engine vendors typically measure the quality of search results in terms of precision and recall. Precision is the fraction of retrieved documents that are relevant, while recall is the fraction of all relevant documents that are retrieved. The two usually trade off against each other: raising one tends to lower the other. Yet the business user needs to ask precise questions of their data and receive answers that are both precise and complete, with high recall. The search mechanism should also filter out extraneous results that do not pertain to an employee's job function.
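
The precision and recall definitions translate directly into code. The sketch below computes both for a single query from a set of retrieved document IDs and a set of documents judged relevant; the document IDs and relevance judgments are made-up values chosen only to show the arithmetic and the trade-off.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: document IDs returned by the search engine
    relevant:  document IDs judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: six relevant documents exist in the collection.
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]

# A narrow result list: 4 documents returned, 3 of them relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], relevant)
print(p, r)  # 0.75 0.5

# A broad result list: 10 documents returned, all 6 relevant ones included.
p2, r2 = precision_recall([f"d{i}" for i in range(1, 11)], relevant)
print(p2, r2)  # 0.6 1.0
```

Returning more documents raised recall from 0.5 to 1.0 but lowered precision from 0.75 to 0.6, which is the inverse relationship the business user would like to escape.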

Technical Limitations

For documents residing on the Internet, relevance is determined largely through link analysis: a document is considered more important when more Web pages link to it. Enterprise search engines, however, do not have the luxury of determining relevance this way. Much of the content in an organization resides in file systems and content management systems that are inherently less interconnected than information on the Web. This has major ramifications for ranking documents in searches conducted within the enterprise: without the ability to perform link analysis, enterprise search engines are forced to rely on other algorithms to determine relevance.
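
The simplest form of link analysis just counts inbound links; Google's PageRank refines this by also weighting each link by the importance of the page it comes from. The sketch below counts inbound links for a tiny hypothetical Web graph and for a hypothetical file share whose documents do not link to each other, which shows why the signal disappears inside the enterprise. The page and file names are invented for illustration.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count how many other pages link to each page (a basic link-analysis score).

    links: dict mapping each page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts

# Hypothetical Web pages that reference one another.
web = {
    "a.html": ["c.html"],
    "b.html": ["c.html", "a.html"],
    "c.html": [],
}
# Hypothetical documents on a file share: no links between them.
file_share = {"memo.doc": [], "spec.doc": [], "budget.xls": []}

print(inbound_link_counts(web).most_common())
# [('c.html', 2), ('a.html', 1), ('b.html', 0)]
print(inbound_link_counts(file_share).most_common())
# [('memo.doc', 0), ('spec.doc', 0), ('budget.xls', 0)]
```

On the Web graph the counts separate the pages clearly; on the file share every document scores zero, so the engine has to fall back on other relevance signals.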

Publicly accessible information on the Internet carries no security restrictions, but content within the organization must adhere to strict security requirements. To prevent users from viewing unauthorized material, the search application needs to integrate with the organization's existing security model. This forces the search engine to record each document's access rights when the document is indexed. That requirement introduces integration headaches, as it forces the organization to consider its single sign-on strategy, and checking access rights at query time may also degrade search speed somewhat.
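
A minimal sketch of security-aware indexing follows, assuming access rights can be expressed as a set of group names per document (in practice these would come from the organization's directory or single sign-on system). The `SecureIndex` class and its methods are hypothetical, and a plain substring test stands in for real full-text matching.

```python
from dataclasses import dataclass, field

@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # access rights captured at indexing time

@dataclass
class SecureIndex:
    docs: list = field(default_factory=list)

    def add(self, doc_id, text, allowed_groups):
        # Record the document's access rights alongside its content.
        self.docs.append(IndexedDoc(doc_id, text.lower(), frozenset(allowed_groups)))

    def search(self, term, user_groups):
        """Return IDs of matching documents the user is permitted to see."""
        user_groups = set(user_groups)
        return [
            d.doc_id
            for d in self.docs
            # Every match costs an extra access check, one source of slower queries.
            if term.lower() in d.text and d.allowed_groups & user_groups
        ]

index = SecureIndex()
index.add("salaries-2005", "Salary bands for 2005", {"hr"})
index.add("handbook", "Employee handbook: salary review process", {"all-staff"})

print(index.search("salary", {"all-staff"}))        # ['handbook']
print(index.search("salary", {"all-staff", "hr"}))  # ['salaries-2005', 'handbook']
```

Filtering inside the search call keeps unauthorized documents out of the result list entirely, rather than showing titles the user cannot open.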

Multiple Repositories

Internet search engines index content using bots that traverse from page to page. Content in the enterprise, by contrast, typically resides in multiple repositories, both structured and unstructured, such as internal Web sites, databases, e-mail and content management systems. Each repository type requires a different tool to feed its content into the search engine and allow for a unified search.
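
One way to picture "a different tool per repository" is a set of connectors that each know how to read one repository type and all emit documents in a common form for a single index. The sketch below assumes two hypothetical connectors, one for internal Web pages (passed in directly rather than crawled) and one for a relational table, with an in-memory SQLite database standing in for a business system; the class names and ID prefixes are illustrative only.

```python
import sqlite3

class WebPageConnector:
    """Hypothetical connector for internal Web pages (pages supplied directly here)."""
    def __init__(self, pages):
        self.pages = pages  # dict: URL -> page text
    def fetch(self):
        for url, text in self.pages.items():
            yield f"web:{url}", text

class DatabaseConnector:
    """Hypothetical connector that reads (id, text) rows from a relational table."""
    def __init__(self, conn, query):
        self.conn, self.query = conn, query
    def fetch(self):
        for row_id, text in self.conn.execute(self.query):
            yield f"db:{row_id}", text

def build_unified_index(connectors):
    """Merge documents from every repository into one inverted index: word -> doc IDs."""
    index = {}
    for connector in connectors:
        for doc_id, text in connector.fetch():
            for word in text.lower().split():
                index.setdefault(word, set()).add(doc_id)
    return index

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, body TEXT)")
conn.execute("INSERT INTO tickets VALUES (1, 'invoice system outage')")

index = build_unified_index([
    WebPageConnector({"intranet/finance": "invoice approval policy"}),
    DatabaseConnector(conn, "SELECT id, body FROM tickets"),
])
print(sorted(index["invoice"]))  # ['db:1', 'web:intranet/finance']
```

Adding a new repository type means writing one more connector with a `fetch` method; the indexing and search side stays unchanged.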

Difficult-to-Reach Content

Internet search engines often bypass difficult-to-reach content. This content, residing in the "deep Web," never gets indexed, because Internet bots can only traverse from link to link, while deep Web content is usually accessed through a form post or through programming logic embedded in JavaScript. Leaving content unindexed is usually unacceptable to the enterprise, so the integrator needs to come up with creative techniques for getting this content into the search engine and may even need to develop custom tools from scratch.
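
As one illustration of such a custom technique, a crawler can submit a form itself instead of waiting for a link to follow. The sketch below only builds one POST request per form value using Python's standard `urllib`; the intranet URL and the `department` field are hypothetical, and actually sending the requests and indexing the responses is left as a comment so the example runs without a network.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_form_requests(form_url, field, values):
    """Build one POST request per form value, so a custom crawler can reach
    pages that are only produced in response to a form submission."""
    requests = []
    for value in values:
        body = urlencode({field: value}).encode("ascii")
        requests.append(Request(form_url, data=body, method="POST"))
    return requests

# Hypothetical intranet form that lists documents by department.
reqs = build_form_requests("http://intranet.example.com/docs/search",
                           "department", ["accounting", "engineering"])
for r in reqs:
    print(r.get_method(), r.full_url, r.data)
# A real connector would pass each request to urllib.request.urlopen()
# and hand the returned HTML to the indexer.
```

Enumerating form values this way only works when the set of valid inputs is known; content hidden behind JavaScript logic usually needs a different, repository-specific approach.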

Dynamic Content

Content that changes quickly is problematic for search engines, whose indexes may never reflect the current state of the document. For an organization that requires timely data, this may be unacceptable. In this scenario, the organization may need to develop a real-time system that keeps the index synchronized with the data.
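
The article does not say how such a real-time system would work. One common approach is incremental indexing driven by change events, where the source system (for example through a database trigger or a content management hook, both assumptions here) reports each insert, update or delete so only the affected document is re-indexed. The `IncrementalIndex` class and its event format below are hypothetical.

```python
class IncrementalIndex:
    """Inverted index kept in step with the source by applying change events."""

    def __init__(self):
        self.postings = {}   # word -> set of doc IDs
        self.doc_words = {}  # doc ID -> words currently indexed for that doc

    def _remove(self, doc_id):
        # Drop every posting for the document's previous version.
        for word in self.doc_words.pop(doc_id, set()):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def apply_event(self, kind, doc_id, text=""):
        """Handle a hypothetical change event: kind is 'upsert' or 'delete'."""
        self._remove(doc_id)  # remove stale entries first
        if kind == "upsert":
            words = set(text.lower().split())
            self.doc_words[doc_id] = words
            for word in words:
                self.postings.setdefault(word, set()).add(doc_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))

index = IncrementalIndex()
index.apply_event("upsert", "price-list", "widget price 10")
index.apply_event("upsert", "price-list", "widget price 12")  # document edited
print(index.search("10"), index.search("12"))  # [] ['price-list']
index.apply_event("delete", "price-list")
print(index.search("widget"))  # []
```

Because each event touches only one document, the index can be updated as changes happen instead of waiting for a full re-crawl.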

Companies today are often leery of the sizable investment that successful search initiatives require, when the enterprise search software alone can cost six figures. Successful search also requires a considerable consulting commitment along with ongoing maintenance. While the investment may be sizable, the payoff in increased organizational efficiency is considerable.

Chris Wildgoose is managing partner at KnowledgeStream, an emerging consulting firm specializing in unstructured data management and business visualization. Prior to KnowledgeStream, he was president of Gooseworks, which later merged with Unitas. He can be reached at chrisw@knowledge-stream.com.

(c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.