DM Review | Covering Business Intelligence, Integration & Analytics

Beyond "I'm Feeling Lucky": Methods for Searching Electronically Stored Business Information

  Article published in DM Direct Newsletter
August 12, 2005 Issue
  By Brian Schlosser

A few years ago nobody paid much attention to electronic searching. There wasn't all that much to search for then. Today is very different. So much information has shifted to electronic form that searching is a huge issue. The cost of storage has plummeted and done away with most of the incentive to delete, or even archive, older information. Information technology professionals still try to enforce good storage discipline on users, but users make a strong case that the old information has compelling business value, if only they could find it when they need it.

Developers have long understood that there are many different ways to go about a search. Database specialists used to brag about how few calls it would take them to get a specific record, and speed and storage efficiency drove search decisions. Today, search questions have come out of the realm of the developer and are often addressed by users. Tools for search abound, and users can select the ones that best fit their needs and tailor the tool by constructing search criteria. Twenty years ago no user would think of performing a Boolean search. Today, users perform them every day on publicly available search tools, even if they don't know the technical term for what they are doing. This has given users great freedom and power; however, many do not understand that there are different types of search that solve different types of problems. As a result, people often select the wrong tool to solve important business problems and get results that do not meet their needs.

Types of Search

It is only natural to start by thinking about the tools available to address users' search needs. It is also a mistake. If users start by looking at the search tools that are available, they risk missing a fundamental truth: not all search tasks are the same. While that seems pretty obvious, we often don't behave as though we understand it. We readily allow tools to define the nature of the information that we will search and the type of result we get. In the end, we have to use what is available, but understanding the nature of our information needs can help us make better choices and understand the limitations of the tools we choose.


Usually, the question that leads us to search information has a clearly defined answer, and we know what the answer will look like before we start the search. We want to know the capital of Uzbekistan, or we want a copy of the sales presentation that we did in May. We do not know the exact answer, but the satisfactory result will be easily recognizable, and we can structure the query based on keywords that are known to us at the outset. If we structure our query well, the answer will be a bull's-eye or close to it. Web search engines, e-mail search tools and desktop search tools often favor searches that address these kinds of questions. In fact, one popular Web search tool offers a button entitled "I'm Feeling Lucky" that bypasses the search list and takes you straight to its top-ranked result. If you have a nice, tight question to answer, this might be the way to go.
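A bull's-eye query of this kind can be sketched in a few lines of Python. This is only an illustration of keyword matching with an implicit Boolean AND, not how any particular search product works; the sample documents and the function names build_index and boolean_and are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'")].add(doc_id)
    return index

def boolean_and(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents, invented for illustration.
DOCS = {
    "a": "Sales presentation for the May customer meeting",
    "b": "May we reschedule the sales call?",
    "c": "Quarterly sales figures, April through June",
}

INDEX = build_index(DOCS)
HITS = boolean_and(INDEX, "sales may")
print(sorted(HITS))
```

The query "sales may" matches document "a", which is the presentation we wanted, but it also matches "b", where "May" is not a month at all. Keyword matching has no way to tell the two apart.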

Bull's-eye searches can be frustrating too. They don't work very well when one word has multiple meanings. Too many search hits can overwhelm the searcher, and twenty or so hits displayed per page can make it very difficult to sort through a large pool of responsive information. Also, the results of your search may cause you to modify your search criteria and then you have to plow through many of the same false hits that you evaluated on the first search all over again.

Bull's-eye searching is improving. Tool providers are continuously refining their ranking strategies and adding new ways to organize their results. Some questions, however, resist the bull's-eye approach and require a different approach to searching.

Help Me Find What I Wasn't Looking For

Sometimes it is hard to know what information will provide the solution to a problem. A significant number of business tasks require information review but do not lend themselves to a bull's-eye style search. Imagine the HR professional charged with examining several gigabytes of e-mail for evidence that employees inappropriately distributed confidential healthcare information. The possible number of keywords to search could be staggering. The results to evaluate could go on forever. The information might not have been labeled as confidential or could have been buried in the bodies of otherwise innocuous e-mails. A bull's-eye search will merely help to do the wrong kind of search faster.

Similar search needs occur regularly in business. Compliance regulations demand that people know what is going on in their business and can be so broad as to make it difficult to use a bull's-eye search. Fraud investigators can be virtually assured that the perpetrators will use code words and other means of hiding their intent. Concept analysis technology can help solve these problems by indexing and searching information based on important words: for example, by looking for words closely associated with what you are after rather than just the specific word you entered. Unfortunately, concept searches often don't get to a viable result because of the difficulty of reviewing page after page of results when there is a low probability that everything you are after will appear in the highly ranked search hits.
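The idea of searching on closely associated words can be illustrated with a rough sketch. Commercial concept analysis is considerably more sophisticated; the version below simply assumes that words appearing in the same documents as the search term are "associated" with it, and then widens the search to include them. The sample e-mails, the stopword list and the function names are all hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def associated_terms(docs, term, stopwords, top_n=3):
    """Rank words by how many documents they share with `term`."""
    counts = Counter()
    for text in docs:
        words = set(tokenize(text))
        if term in words:
            counts.update(words - {term} - stopwords)
    # Sort by count (descending), then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

def concept_search(docs, term, stopwords, top_n=3):
    """Return indexes of documents containing the term or an associated word."""
    expanded = {term, *associated_terms(docs, term, stopwords, top_n)}
    return [i for i, text in enumerate(docs) if expanded & set(tokenize(text))]

# Hypothetical e-mail subjects, invented for illustration.
DOCS = [
    "Patient diagnosis and claim attached",
    "Diagnosis codes for the claim are in the spreadsheet",
    "See the spreadsheet I sent about the claim",
    "Lunch on Friday?",
]
STOPWORDS = {"and", "for", "the", "are", "in", "i", "on", "about", "see"}

RELATED = associated_terms(DOCS, "diagnosis", STOPWORDS)
FOUND = concept_search(DOCS, "diagnosis", STOPWORDS)
print(RELATED, FOUND)
```

A plain keyword search for "diagnosis" would find only the first two messages. The expanded search also returns the third, which never uses the word but discusses the same claim. It also shows the cost described above: every added word widens the pool of hits that someone has to review.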

Clustering techniques that group search hits into folders are a help if the folders are dynamically created by the tool. They keep related documents together, but they leave you with a few problems:

  • It is hard to get a global view of the pattern of your hits. (It's like exploring a maze by walking through it versus viewing it from above.)
  • You end up looking at lists of lists which isn't very productive.
  • You are still limited to a small number of hits displayed on the screen at one time.
  • It is hard to reorient the folder structure based on information you uncover in your research. In fact, it is impossible with most tools.
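To make "dynamically created folders" concrete, here is a minimal sketch of one simple way a tool might group hits: a greedy single-pass clustering on word overlap (Jaccard similarity). This is an assumed stand-in for illustration, not the method of any specific product; the threshold value, the sample hits and the function names are hypothetical.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_hits(hits, threshold=0.25):
    """Put each hit in the first folder whose seed it resembles,
    or open a new folder when none is similar enough."""
    folders = []  # list of (seed_words, [hit, ...]) pairs
    for hit in hits:
        words = set(hit.lower().split())
        for seed, members in folders:
            if jaccard(words, seed) >= threshold:
                members.append(hit)
                break
        else:
            # No existing folder matched, so this hit seeds a new one.
            folders.append((words, [hit]))
    return [members for _, members in folders]

# Hypothetical search hits, invented for illustration.
HITS = [
    "benefits enrollment form",
    "benefits enrollment deadline",
    "server outage report",
    "server outage update",
    "holiday party",
]

FOLDERS = cluster_hits(HITS)
for folder in FOLDERS:
    print(folder)
```

The folders come from the hits themselves rather than from a fixed taxonomy, which is the helpful part. The output is still a list of lists, though, and the grouping is fixed once it is built, which is exactly the limitation the points above describe.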

I Can See for Files and Files ...

In order to accomplish the kind of search that finds what you were not looking for, you need to have perspective. You need to be able to orient yourself to a large portion of the search hits returned at a glance. I'm not talking about the 10 to 30 hits per page returned from the list-based tools that I previously described. I mean 1,000 or 2,000 hits laid out on your screen in a way that allows you to see the major themes contained within the hits and drill down into the specific information contained there without your hand leaving your mouse and without having to click through dozens of screens.

Many people find it hard to believe that you can accomplish what I just described on a single screen. You can, using data visualization technology. The technology allows you to view and act on huge amounts of data on a desktop computer screen or a laptop with a reasonably large screen. The interface can be surprisingly easy to use, and you can exceed your previous levels of productivity within a few minutes of using a well-constructed visualization tool. The look and feel of a quality visualization tool can vary widely by use and by vendor, but there are a few criteria to consider to ensure that you find the right tool for your particular use:

  • Can you act directly on the visualization? A good visualization tool will let you make decisions about the items you are evaluating, such as including or excluding them from the result or marking them for further action, by pointing and clicking on the visualization.
  • Does the tool allow you to work with all the different types of information you need? For example, a fraud investigator might want to review e-mails, attachments and loose files from a server in one set of data.
  • Does the tool allow you to easily coordinate the results of many people reviewing information and bring the results together at the end? Law firms need to do this for large electronic discovery projects.
  • Will the tool give you the same result twice given the same set of data? Compliance officers, litigators and government regulators are very concerned about this.
  • If you find an important document, concept or term, can you rearrange the visualization to show how all the other information relates to what you found? This ability to dynamically change your perspective of the information allows you to move much more quickly and accurately through your task. Imagine an HR professional who comes across an inappropriately distributed document and wants to find out if any more went out regarding the same topic.
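The last two criteria, repeatable results and re-orienting around something you found, can be sketched without any graphics at all. The example below is an assumption-laden illustration, not a description of any vendor's tool: it ranks every other document by cosine similarity to a chosen "pivot" document and breaks ties by document id so that the same data always produces the same order. The document ids, texts and function names are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reorient(docs, pivot_id):
    """Rank every other document by similarity to the pivot document."""
    pivot = vectorize(docs[pivot_id])
    scored = [
        (cosine(pivot, vectorize(text)), doc_id)
        for doc_id, text in docs.items()
        if doc_id != pivot_id
    ]
    # Sort by score (descending), then by id, so the order is repeatable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

# Hypothetical messages, invented for illustration.
DOCS = {
    "m1": "employee medical records attached",
    "m2": "forwarding the medical records again",
    "m3": "medical leave policy",
    "m4": "parking garage closed",
}

ORDER = reorient(DOCS, "m1")
print(ORDER)
```

Starting from the inappropriately distributed message "m1", the closest match is the message that forwards the same records, followed by a loosely related policy note and then an unrelated one. Choosing a different pivot reshuffles the whole set around the new starting point, and running it again on the same data gives the same order.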

Some data visualization tools will allow you to do all of these things and some will not. If the tool is well constructed and provides the features you need for the task at hand, you will be much more successful in performing complex searches in which you don't start out knowing exactly what a "bull's-eye" will look like. You will avoid the frustration of applying the wrong tool to the project and you will likely save your company or firm an awful lot of time and money.

Not all searches are created equal. Some business problems lend themselves to creating a very specific set of search criteria and recognizing a bull's-eye result. Unfortunately, many business problems are less well defined. Using a bull's-eye search to solve a problem that involves a lot of data and large numbers of search hits to review can lead to excess work and costly mistakes. Several technologies address parts of the challenge, and they are now being brought together into tools that combine search with concept analysis, clustering and data visualization to simplify complex search and review tasks. The best of these tools can save organizations millions of dollars in labor, legal exposure and lost opportunities.



Brian Schlosser is chief executive officer of Attenex Corp. with more than 20 years of sales and business leadership experience. Founded in 2001 and headquartered in Seattle, Attenex Corporation develops and markets software to help corporations, law firms, legal service providers and government agencies improve the efficiency of processing and reviewing electronic documents. Attenex software combines patented document analysis and visualization technology with a powerful processing engine to accelerate the identification of relevant information in its native format. Leading legal service providers and consulting organizations offer Attenex software to dramatically reduce the total cost of discovery for their clients. For more information about the company and its products, call (206) 373-6565 or visit www.attenex.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.