DM Review | Covering Business Intelligence, Integration & Analytics
Dynamic Classification - What it is and Why Every Information-Intensive Organization Needs it

  Article published in DM Direct Newsletter
May 16, 2003 Issue
 
  By Claude Vogel

Today's powerful search technology enables us to rapidly sift through vast quantities of data stored in all kinds of formats. This is an astoundingly powerful process when we know what it is we are looking for. But all too often, we don't. And sometimes it is the unplanned, in-the-moment pursuit of hunches that results in the greatest discoveries.

Next-generation search technology will directly address this particular component of the search process, the "abductive" process. A particularly human ability, abduction is the process of using informed intuition to make decisions based on real-time analysis of all the information at hand. Next-generation search is all about empowering this process and providing tools that support the problem-solving capabilities of the end user.

This is now possible through a groundbreaking new technology called dynamic classification. Dynamic classification specifically empowers the creative, intuitive thought processes of knowledge workers. It enables individuals to save and share information discoveries - or even the deductive paths that led to these discoveries. It actively supports collaboration and the natural creation of like-minded communities.

Dynamic classification works by exposing and classifying the real-time results of search steps as they occur. It shortcuts the journey between "search" and "find" by exposing the entire underlying information matrix all at once. It enables more rapid, fluid movement through an information structure that may be manipulated, corrected, blended and customized at will to suit individual thought processes and conclusions.

Dynamic Classification Defined

Dynamic classification lets users view all possible categories of information at once and gives them the tools to cross-correlate, mix and match those categories at will. Users gain the freedom and creativity to decide for themselves how they would like to organize their knowledge space, classify information in that space, and apply the logic they have uniquely created to classify new, incoming information as it arrives.

In order to understand how and why dynamic classification works, a closer look at search is required. There is still a dramatic difference between the way we mentally perform search and the way that software supports and presents this process on a computer screen.

Mentally, search is a bumpy, iterative process. It proceeds through a series of refinement loops filled with unplanned jumps and odd mental leaps. The journey is also data-intensive, involving a highly dynamic process of interconnecting data dimensions moving about on a grid of evolving ideas and patterns - kind of like a perpetual motion Rubik's cube. The information on this grid is extremely important to you, yet much of it remains hidden from view. Today's search technology returns the results of your process but not necessarily the road map that got you there. Dynamic classification takes the revolutionary step of exposing the underlying matrix as well as the footprints of your intuitive journey.

Consider this simple example. Let's say that you are looking for an apartment in Carlsbad, CA. You might type Carlsbad_Apartment, a seemingly straightforward query, into the search box. An hour later, you decide instead upon a studio in nearby Oceanside and are happy with this result. When you began your search, Carlsbad_Apartment didn't appear to have anything to do with Studio_Oceanside. Yet during the find process, you made thousands of tiny, sequential decisions and trade-offs based upon the unsatisfactory results of your initial query (Carlsbad_Apartment), eventually leading you to decide upon a studio in Oceanside. While this journey would have been impossible to anticipate and design from a technical viewpoint, it is perfectly logical to you.

What dynamic classification would do in this case is shortcut the process between the initial inquiry and the satisfying result. It does this by showing the individual the "big picture" - albeit in an extremely organized view. It returns all the relevant data points, logically and consistently organized into categories. It provides information at a glance that encourages the individual to investigate and drill down into areas where he or she hadn't initially realized there was an association; in other words, it supports a more highly informed, real-time, analytic problem-solving course of action.

Finding is a Dynamic Process

Consider the apartment example again. When you began your search, the system returned top-level results based on the two data dimensions you requested: geography and a specific dimension of housing. What you didn't see were all the possible categories that the system considered relevant to your search, for example:

Locations
            Carlsbad
            San Diego
            Encinitas
            Oceanside

Specific locations
            Ocean-facing
            Beach Access
            Near Highway
            Downtown

Rental Housing
            Apartments:
                        Studio
                                     Furnished/Unfurnished
                        One-bedroom
                                     Furnished/Unfurnished
                        Two-bedroom
                                     Furnished/Unfurnished
            Houses:
                        Cottage
                                     Furnished/Unfurnished
                        Bungalow
                                     Furnished/Unfurnished

Leasing Arrangement
            Month to month
            Yearly
            Sublease

The list goes on. But imagine how much faster and more satisfying the search process would have been if you could have seen all of your lodging options at a glance. You would have swiftly moved among the folders, browsing the entire hierarchy, viewing the results of different combinations and then moving on or refining the groupings of data as your interests evolved.
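
To make the idea concrete, here is a minimal Python sketch of browsing lodging options by category counts instead of by a flat result list. The listings, the field names and the helper functions `facet_counts` and `refine` are illustrative assumptions, not part of any product described in this article.

```python
from collections import Counter

# Hypothetical listings; each one carries the category values
# it was tagged with when it entered the repository.
LISTINGS = [
    {"id": 1, "location": "Carlsbad", "housing": "One-bedroom", "lease": "Yearly"},
    {"id": 2, "location": "Carlsbad", "housing": "Studio", "lease": "Month to month"},
    {"id": 3, "location": "Oceanside", "housing": "Studio", "lease": "Yearly"},
    {"id": 4, "location": "Encinitas", "housing": "Cottage", "lease": "Sublease"},
    {"id": 5, "location": "Oceanside", "housing": "Studio", "lease": "Month to month"},
]


def facet_counts(listings, dimensions):
    """Count how many listings fall under each value of each dimension."""
    return {dim: Counter(item[dim] for item in listings) for dim in dimensions}


def refine(listings, **selected):
    """Keep only the listings that match every selected category value."""
    return [
        item for item in listings
        if all(item[dim] == value for dim, value in selected.items())
    ]


# The "big picture": every category and how many listings sit in it.
overview = facet_counts(LISTINGS, ["location", "housing", "lease"])

# One click on the Studio folder, then a fresh look at the locations.
studios = refine(LISTINGS, housing="Studio")
studio_locations = facet_counts(studios, ["location"])["location"]
```

In this toy data set, a user who started with Carlsbad would see at a glance that most studios are in Oceanside, which is the kind of association the flat result list hides.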

You would have effectively moved from a linear, fixed and iterative process into a shifting, multidimensional space. Consider again the contrast between the mental and technical search processes. Our brains have learned to keep up and maintain order even as our own perspective twists and combines assumptions one after another. We take this mental ontology (each individual's organization of the basic components of his or her world) for granted. But replicating this mental search ability in a technical search environment is extremely complex. Organizations must isolate ontologies and encapsulate them in a semantic foundation. Specifically, organizations must:

  • Tag, categorize and organize data as it arrives, building a meticulously indexed foundation of information.
  • Leverage this indexed, stable foundation of knowledge through dynamic classification at query time.
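
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VOCABULARY` table, the `Repository` class and its `ingest` and `classify` methods are hypothetical names, and the whitespace tokenizer stands in for whatever tagging a real system performs.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: word -> (taxonomy, category).
VOCABULARY = {
    "truck": ("method", "Vehicle bomb"),
    "incendiary": ("method", "Incendiary bomb"),
    "africa": ("region", "Africa"),
    "europe": ("region", "Europe"),
}


class Repository:
    def __init__(self):
        self.tags = {}                 # doc_id -> set of (taxonomy, category)
        self.words = defaultdict(set)  # word -> ids of documents containing it

    def ingest(self, doc_id, text):
        """Step 1: tag and index a document as it arrives, in one pass."""
        tokens = text.lower().split()
        self.tags[doc_id] = {VOCABULARY[w] for w in tokens if w in VOCABULARY}
        for word in tokens:
            self.words[word].add(doc_id)

    def classify(self, query):
        """Step 2: at query time, group the matching documents into folders."""
        terms = query.lower().split()
        if not terms:
            return {}
        hits = set.intersection(*(self.words.get(t, set()) for t in terms))
        folders = defaultdict(set)
        for doc_id in hits:
            for folder in self.tags[doc_id]:
                folders[folder].add(doc_id)
        return dict(folders)


repo = Repository()
repo.ingest("d1", "truck bomb attack in africa")
repo.ingest("d2", "incendiary bomb found in europe")
repo.ingest("d3", "truck bomb in europe")
folders = repo.classify("bomb truck")
```

The point of the split is that the expensive work happens once, at ingest; the folders themselves are never stored and are rebuilt for each query from the tags already on file.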

Getting Started

Incoming data should be tagged, categorized and made available for search processes in a single, comprehensive pass. In this way, all of the complex administrative, logistical, compliance and security processes can be handled at once. The possibility of security breaches is reduced because there is one entry point rather than several. The one-pass method also ensures greater consistency at all levels. It is simply more efficient, and the end result is a single, uniformly and richly indexed repository of data.

Taxonomies are also an extremely important part of the foundation-building process. Taxonomies establish a single, consistent and uniformly understood structure for communication. They represent a stable way of defining the world because they reflect a genus-to-species relationship. Taxonomies reflect a hierarchical view of the world or, in a business context, how organizations choose to define themselves.
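
As a rough illustration of the genus-to-species idea, the sketch below stores a small taxonomy as child-to-parent links and walks them upward. The category names are borrowed from the rental housing example; the `PARENT` table and the functions `lineage`, `is_a` and `narrower` are assumptions made for this sketch.

```python
# Hypothetical taxonomy stored as child -> parent (species -> genus) links.
PARENT = {
    "Apartments": "Rental Housing",
    "Houses": "Rental Housing",
    "Studio": "Apartments",
    "One-bedroom": "Apartments",
    "Two-bedroom": "Apartments",
    "Cottage": "Houses",
    "Bungalow": "Houses",
}


def lineage(category):
    """Return the category followed by each broader category above it."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_a(category, ancestor):
    """True if `ancestor` is a broader category somewhere above `category`."""
    return ancestor in lineage(category)[1:]


def narrower(category):
    """Return the categories directly beneath `category`, in sorted order."""
    return sorted(child for child, parent in PARENT.items() if parent == category)
```

Because every category has exactly one parent, a document tagged Studio is automatically findable under Apartments and Rental Housing as well, which is what makes the structure stable enough to index against.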

Yet, while taxonomies are a critical component of any organization's information asset management operations, they can be difficult and time-consuming to create. They require the services of highly skilled professionals to design and maintain. Different companies have different ways of addressing this challenge. One way is to adopt standardized, commercial taxonomies from third-party vendors that can be "plugged in" to the categorization process. Another is to hire or outsource professional assistance in taxonomy development.

Once an organization's data assets are comprehensively and meticulously categorized using taxonomies as the organizational basis, dynamic classification can begin.

Extending Search Through Dynamic Classification

The best way to understand how dynamic classification works is to see it in action. The following example describes what the end user would see as he or she engages in the process of dynamic classification.

Let's say that an intelligence officer is looking for documents on typical terrorist actions based in different geographical areas. She would begin her investigation in the usual way, typing a keyword into a search box, say: bomb truck. She would immediately see a hierarchical grouping of folders in the middle of her screen. This grouping represents a classification that has been built dynamically, based on all information at hand at that moment, in specific response to her query. The screen would display all the documents relevant to the search request, organized along the most representative intersections of categories, rather than displaying every single instance of the terms occurring in documents regardless of context or, worse, someone else's predetermined idea of how and in what instance a document should be displayed.

Now the officer can browse through any of the folders and subfolders populated in the classification as desired, viewing everything connected - however broadly or specifically - to that query, and investigate any perspective that interests her. Let's say that she decides to look a bit further into current occurrences of the subject "bomb truck." She will immediately see that there have been several occurrences in Africa along with other kinds of bombings, e.g., those using incendiary bombs. From a broader perspective, it will also become clear that other types of terrorist actions are discussed in the same context but differ according to location: Europe is linked to counterterrorism issues, the Middle East and Asia to chemical weapons. More details are just one click away. The user can easily drill into these subject areas to explore Iraq's involvement with chemical bombs and Philippine guerrilla links to chemical ordnance. Between the ability to zero in on a particular bomb truck incident and the ability to slide to related topics in step with deductive thinking, the user gains an understanding of all the materials available in just seconds. And, because she is provided with a global view of everything related to her initial search query, she can immediately see and more efficiently pursue the areas most likely to help solve specific problems.

If a particular classification or grouping of data is found useful, it could be saved for reuse or sharing with colleagues. Users can therefore turn the results of their investigative search processes into persistent objects, assets that are available when needed.
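
One simple way to picture such a persistent object is a small record that holds the query, the drill-down steps and the resulting documents, serialized so it can be stored or sent to a colleague. The sketch below uses JSON for that purpose; the `SavedClassification` class and its fields are hypothetical and are not drawn from any specific product.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SavedClassification:
    """A user's saved search journey: what was asked, the path taken, what was found."""
    title: str
    query: str
    path: list = field(default_factory=list)     # drill-down steps as [taxonomy, category]
    doc_ids: list = field(default_factory=list)  # documents in the final folder

    def to_json(self):
        """Serialize the record so it can be stored or shared."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload):
        """Rebuild a record from its serialized form."""
        return cls(**json.loads(payload))


saved = SavedClassification(
    title="Chemical weapons by region",
    query="bomb truck",
    path=[["subject", "Biological warfare"], ["region", "Middle East"]],
    doc_ids=["d7", "d12"],
)
payload = saved.to_json()                            # what gets stored or sent
restored = SavedClassification.from_json(payload)    # what a colleague loads
```

Saving the path as well as the document list is the design choice that matters here: a colleague can replay the same steps later against a repository that has since grown.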

The end user's search box thus becomes a dynamic, changing search bar populated with the dynamic classifications the user found most useful or interesting. The user's perspective on the world, or peculiarities of research interest or style, are dignified and saved as persistent objects in what becomes a highly personal, organic, desktop portal. This ability contrasts sharply with a typical corporate portal, in which the end user views a predigested compilation of folders populated by documents created through someone else's logic.

Why Dynamic Classification is Important

There can be little doubt that individuals prefer to view, organize and access the information most useful to them in their own particular way. They want to create, in effect, organic portals residing on their desktop that access centrally stored and controlled repositories of data. Providing individuals with this power also benefits the organization.

First, classification fosters the creative, intuitive thought processes of your knowledge workers, leveraging the investments you have already made in these employees. Second, classification provides a means of valuing and supporting the individual efforts and small iterative discoveries that might otherwise have been overlooked. And finally, enabling individuals to save and share information discoveries, or even the deductive path that led them to these discoveries, fosters collaboration and the natural creation of like-minded communities. Such collaboration reduces the possibility of redundant efforts and enables employees to easily and dynamically share best practices and evolving theories. But perhaps most importantly, dynamic classification dramatically improves the speed, success and satisfaction of search.

Consider the following specific benefits:

Capture Accidental Knowledge in Addition to Existing Knowledge

Categorization is an extremely effective tool for capturing "existing" knowledge. Classification, on the other hand, is extremely effective at capturing "accidental" knowledge, amplifying the possibility of those "Eureka" discoveries that we all seek.

Individuals want very specific information organized in a way that makes unique sense to them within the context of the problem they are trying to solve. For example, an economist, an intelligence officer and a munitions manufacturer would each view the word "bombs" from a very different perspective. Even if the keyword is commonly understood and the documents containing that term are consistently cited, the kind of information these three people will be looking for, and the way they might want to organize the information they find, is likely to be very different.

No end user is constrained to a view created by someone else's logic. Each can create his or her own view - or classification - based on a unique perspective.

Let's say that the same intelligence officer is investigating new bomb technologies. But when she views the dynamic classification resulting from her initial query on "bomb truck," she sees a folder for biological warfare. Interested, she clicks on that folder, discovering a subcategory on chemical bombs. Wondering who might be doing what with such weapons, she cross-checks this classification against the geography taxonomy and discovers that the Middle East folder has a large number of documents containing references to chemical bombs. Using dynamic classification tools, she could save the multilevel drill-down process as well as the results into a new, uniquely titled category. Optionally, she could cross-check those results against another taxonomy and then view, save or drill down further from the results of that cross-check. This find process - as enlightening as it clearly is with its underlying matrix of categories and information - has previously been hidden from view.
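
A cross-check of one taxonomy against another amounts to counting documents at each intersection of categories. The Python sketch below shows one possible way to do that; the sample documents, their tags and the `cross_check` function are invented for illustration and carry no real data.

```python
from collections import Counter

# Hypothetical documents, each tagged under two taxonomies at indexing time.
DOCS = [
    {"id": "d1", "subject": {"Chemical bombs"}, "region": {"Middle East"}},
    {"id": "d2", "subject": {"Chemical bombs"}, "region": {"Middle East", "Asia"}},
    {"id": "d3", "subject": {"Vehicle bombs"}, "region": {"Africa"}},
    {"id": "d4", "subject": {"Chemical bombs"}, "region": {"Middle East"}},
]


def cross_check(docs, rows, columns):
    """Count documents at each intersection of a `rows` and a `columns` category."""
    table = Counter()
    for doc in docs:
        for row_category in doc[rows]:
            for column_category in doc[columns]:
                table[(row_category, column_category)] += 1
    return table


table = cross_check(DOCS, "subject", "region")

# The fullest folder is the intersection with the most documents.
busiest, count = table.most_common(1)[0]
```

A document tagged with several regions is counted once under each, so the cells can add up to more than the number of documents; that is intentional, since the same document legitimately belongs in more than one folder.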

It's Future-Proof

You can't predict how or why any particular bit of information will be useful. Even if you could, it wouldn't be predictive of how it may be used in the future. This is why information should be indexed at the corporate level - for consistency and management of assets - but classified at the individual or community level.

Classification allows categories to be built dynamically and at will, so there's no need to worry about how a document should be categorized or, worse, to categorize it based only on previous or existing knowledge.

This strategy makes a lot more sense, as it is impossible to predict how and where information will be used. For example, smallpox has been tagged within documents for years but it has not typically shown up within folders because it has only recently become a concern. Dynamic classification allows end users to search for something such as smallpox and then cross-check it against country, bioterrorism threats, specific terrorist organizations, etc.

Lightning Fast and Dynamic

When properly managed, underlying search categorization processes establish a foundation for dynamic processing. Tagged, categorized and taxonomically organized information has already accounted for the contents of each document. Each document's semantic signature has been stored and weights have been assigned to the concepts in each document. In this way, at query time, the technology is already working with an entity that is extremely structured. Classification happens in seconds or minutes versus days or weeks.
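
The article does not say how semantic signatures are stored or combined, so the following is only a guess at the general shape: each document keeps a map of concepts to weights computed at indexing time, and query-time classification reduces to adding up weights over the matching documents. The `SIGNATURES` data and the `rank_folders` function are assumptions for this sketch.

```python
from collections import defaultdict

# Hypothetical semantic signatures computed once, when each document was indexed:
# doc_id -> {concept: weight}.
SIGNATURES = {
    "doc-1": {"Vehicle bombs": 0.75, "Africa": 0.5},
    "doc-2": {"Vehicle bombs": 0.5, "Europe": 0.25},
    "doc-3": {"Chemical weapons": 0.75, "Asia": 0.5},
}


def rank_folders(doc_ids, signatures):
    """Sum the stored concept weights over the hits and order folders by total weight."""
    totals = defaultdict(float)
    for doc_id in doc_ids:
        for concept, weight in signatures[doc_id].items():
            totals[concept] += weight
    # Heaviest folder first; ties are broken alphabetically for a stable display.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


# At query time nothing is re-read or re-analyzed; only stored weights are added up.
ranked = rank_folders(["doc-1", "doc-2"], SIGNATURES)
```

Because no document text is touched at query time, the cost of building the folders grows with the number of hits and their stored concepts, not with the size of the documents.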

Summary

Dynamic classification provides a solution to the paradoxical quest facing corporations: how to manage burgeoning information assets while, at the same time, providing more satisfying, efficient problem-solving tools for their knowledge workers. It enables the generation of organic and truly personal portals, reflective of an automatic, interactive and dynamic feedback loop between the individual and all possible data. Building on the foundation established by search and content categorization technology, dynamic classification enables a new level of decision making.


Dr. Claude Vogel is chief scientist for Convera, a provider of enterprise search, retrieval and categorization solutions. At Convera, he is broadening the core indexing, categorization and dynamic classification capabilities of Convera's RetrievalWare search, retrieval and categorization technology and other vertical market solutions. Dr. Vogel engages in ongoing research and has published more than 70 pieces, including nine books on the subjects of software engineering, cognitive design, social organizations and semiotics. You can reach him at cvogel@convera.com.

SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.