Enterprise Search: A Competitive Edge
With Internet search engines, we've come to expect immediate retrieval of information, and Google has become the de facto standard for retrieval results. Most companies today have begun implementing search for their internal content, but they have fallen short of the results offered by Google. Searching documents internal to an organization is known as enterprise search, and it represents a more complicated problem. Beyond the purchase of a search engine, companies must also devote considerable effort to understanding the information needs of the organization. Simply installing the search engine within the organization will most likely lead to unsatisfactory results. A thorough understanding of the challenges involved helps to ensure success.
Searching for content in the enterprise introduces a number of challenges that do not arise when searching content on the Internet. These complications are outlined below, and the reasons range from technical to business.
Multiple Information Needs
Information relevancy on the Internet is tuned to the information needs of the general Internet user, but a business organization contains multiple user groups, each with different information needs. A search query entered into an Internet search engine will tend to place the most popular documents at the top of the results list. Popularity on the Internet is determined largely by the number of hyperlinks to a particular piece of content. This approach works well for a general sense of what is relevant, but not for an organization whose sense of what is relevant may shift. More specifically, users within an organization have different information needs that depend upon their job function. For the same query entered into the search engine, one user group will judge the relevancy of the documents returned differently than another. For example, a user in the accounting department considers accounting materials more relevant than a user in the engineering department does.
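One way to picture group-specific relevancy is a re-ranking step that boosts documents tagged with the searcher's own department. The sketch below is only an illustration: the function name rerank, the department tags, the base scores and the boost factor are all assumptions, not features of any particular search product.

```python
# Hypothetical sketch: re-rank results so that documents from the
# searcher's own department rise in the list. All names and numbers
# here are illustrative assumptions.

def rerank(results, user_department, boost=2.0):
    """results: list of (doc_id, base_score, department) tuples."""
    def score(item):
        _, base, dept = item
        # Multiply the base score when the document matches the user's group.
        return base * boost if dept == user_department else base

    ranked = sorted(results, key=score, reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


results = [
    ("eng-spec", 0.9, "engineering"),
    ("acct-policy", 0.6, "accounting"),
    ("hr-handbook", 0.5, "hr"),
]

print(rerank(results, "accounting"))
print(rerank(results, "engineering"))
```

The same query and the same base scores produce a different ordering for each group, which is the behavior a single popularity-based ranking cannot provide.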
Demands for Greater Control
The organization requires greater control over its content in order to support its business objectives. The general Internet user doesn't require this level of control and may be satisfied with a search that bypasses important information. To the business, glossing over data necessary for maintaining competitive advantage is not an option, so the business requires a more granular search mechanism. The terms precision and recall are typically used by search engine vendors to measure the quality of search results. Precision is defined as the fraction of retrieved documents that are relevant, while recall is defined as the fraction of all relevant documents that are retrieved. There is usually an inverse relationship between precision and recall, but business users need to be able to ask precise questions of their data and to receive precise answers with high recall. The search mechanism should also be able to filter out extraneous results that do not pertain to an employee's job function.
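The two definitions above translate directly into a small calculation. The following is a minimal sketch for a single query; the function name precision_recall and the sample document IDs are made up for illustration.

```python
# Precision = relevant retrieved / all retrieved
# Recall    = relevant retrieved / all relevant
# Document IDs below are invented for the example.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["doc1", "doc2", "doc3", "doc4"]   # what the engine returned
relevant = ["doc2", "doc4", "doc7"]            # what the user actually needed

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")     # precision=0.50 recall=0.67
```

Returning more documents would eventually pick up doc7 and raise recall, but every irrelevant document added along the way lowers precision, which is the inverse relationship described above.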
For documents residing on the Internet, relevancy is determined through link analysis techniques, i.e., a document is considered more important when it is referenced more frequently by other Web pages. Enterprise search engines, however, do not have the luxury of determining relevancy through this approach. Much of the content in an organization may reside in file systems and content management systems that are inherently less connected than information residing on the Web. This has huge ramifications when determining the relevance of documents for searches conducted within the enterprise. Without the ability to perform link analysis, enterprise search engines are forced to rely on other techniques to determine relevancy.
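The simplest form of link analysis is counting inbound links, and a short sketch shows why it yields nothing for poorly connected enterprise content. This is a deliberately simplified illustration, not the algorithm of any actual search engine; the function name inlink_counts and both sample collections are assumptions.

```python
from collections import Counter

# Simplified link analysis: a document's importance is the number of
# distinct other documents that link to it. Sample data is invented.

def inlink_counts(links):
    """links: dict mapping each document to the documents it links to."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each source at most once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
file_share = {"report.doc": [], "budget.xls": [], "notes.txt": []}

print(inlink_counts(web).most_common())   # "c" ranks first with 3 inbound links
print(inlink_counts(file_share))          # empty: no links, so no ranking signal
```

On the file share every document receives the same score of zero, so the engine has to fall back on signals such as the text itself or document metadata.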
No security restrictions apply to publicly available information on the Internet, but content within the organization demands adherence to strict security requirements. To prevent a user from viewing unauthorized material, the search application needs to integrate with the existing security model of the organization. These security requirements force the search engine to keep track of the access rights of each document during the indexing process. This requirement introduces integration headaches, as it forces the organization to consider its single sign-on strategy. It may also contribute to some degradation in query speed when searching for information.
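A common way to meet this requirement is to store each document's access rights alongside it in the index and to filter results against the user's groups at query time. The sketch below assumes a group-based permission model; the IndexedDoc class, the secure_search function and the group names are hypothetical and stand in for whatever the organization's real security model provides.

```python
from dataclasses import dataclass

# Hypothetical sketch: access rights are captured at indexing time and
# checked at query time. Class, function and group names are assumptions.

@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_groups: frozenset   # groups permitted to read this document


def secure_search(index, query, user_groups):
    """Return IDs of matching documents the user is allowed to see."""
    user_groups = set(user_groups)
    return [
        doc.doc_id
        for doc in index
        if query.lower() in doc.text.lower()
        and doc.allowed_groups & user_groups   # at least one shared group
    ]


index = [
    IndexedDoc("salary-bands", "2006 salary budget by grade", frozenset({"hr"})),
    IndexedDoc("dept-budget", "Engineering budget forecast", frozenset({"engineering", "finance"})),
    IndexedDoc("travel-policy", "Travel budget rules", frozenset({"all-staff"})),
]

print(secure_search(index, "budget", {"engineering", "all-staff"}))
print(secure_search(index, "budget", {"hr", "all-staff"}))
```

The extra permission check on every candidate result is one source of the query-speed degradation mentioned above.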
Content on the Internet is indexed by bots that traverse from page to page. Content in the enterprise, by contrast, typically resides in multiple repositories. This information may reside in both structured and unstructured repositories such as internal Web sites, databases, e-mails and content management systems. Each repository type requires a different tool to feed its content into the search engine's index and to allow for a unified search.
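The idea of one tool per repository feeding a single index can be sketched as a set of small connector functions that each normalize their source into a common document shape. Everything here is an illustrative assumption: the connector names, the field names of the sample records and the in-memory dictionary standing in for a real index.

```python
# Hypothetical sketch: one connector per repository type, all emitting
# the same document shape so a single index can serve a unified search.

def from_web(pages):
    for url, page_text in pages.items():
        yield {"id": url, "source": "web", "text": page_text}


def from_database(rows):
    for row in rows:
        yield {"id": f"db:{row['id']}", "source": "database", "text": row["description"]}


def from_email(messages):
    for msg in messages:
        text = msg["subject"] + " " + msg["body"]
        yield {"id": f"mail:{msg['message_id']}", "source": "email", "text": text}


def build_index(*connectors):
    """Merge the output of every connector into one index keyed by ID."""
    index = {}
    for connector in connectors:
        for doc in connector:
            index[doc["id"]] = doc
    return index


def unified_search(index, term):
    term = term.lower()
    return sorted(doc_id for doc_id, doc in index.items() if term in doc["text"].lower())


index = build_index(
    from_web({"http://intranet/pricing": "Current pricing guidelines"}),
    from_database([{"id": 17, "description": "Pricing exception for key account"}]),
    from_email([{"message_id": "42", "subject": "Lunch", "body": "Friday at noon?"}]),
)

print(unified_search(index, "pricing"))
```

A single query now reaches both the intranet page and the database record, even though each came from a repository with its own access method.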
Dynamic content that changes quickly is problematic for search engines, whose indexes may lag behind the content of the actual document. For an organization that requires timely data, this may be unacceptable. In this scenario, the organization may require the development of a real-time system to synchronize the index with the data.
Companies today are often leery of making the sizable investments required for successful search initiatives when the cost of enterprise search software alone runs into six figures. Successful search also requires a considerable consulting commitment along with ongoing maintenance. While the investment may be sizable, the payoff in the form of increased efficiency for the organization is considerable.
Chris Wildgoose is managing partner at KnowledgeStream, an emerging consulting firm specializing in unstructured data management and business visualization. Prior to KnowledgeStream, he was president of Gooseworks, which later merged with Unitas. He can be reached at email@example.com.