Knowledge Integrity:
Characterization Taxonomies

  Column published in DMReview.com
September 1, 2005
  By David Loshin

In last month's column, we looked at a problem associated with the standardized use of reference data for information exchange. In that column, we explored how a business application's use of what was expected to be standardized data varied slightly from its original meaning, leading to a lack of synchronization across the exchange enterprise. In this column, we explore a different aspect of the same problem - the definition and use of data values grouped within semantic hierarchies known as taxonomies.

A natural use of a business intelligence (BI) application is to understand individual behavior based on an enveloping classification scheme. A classic example from the media industry is characterizing radio or television ratings by sex and age group, such as "most popular with males aged 18 to 35." Here, there is a multidimensional classification scheme - first the population is divided by gender, then each gender group is broken into age ranges. In turn, performance is assessed based on the ratings for specific "products" within what is considered to be a simple taxonomy.

A taxonomy is a hierarchical means of classification organized according to a predefined system. The system should provide a natural dissection of the data elements into hierarchical groupings, with each subgroup unambiguously distinct from the others, yet with the subgroups together covering all the possibilities. The implication is that a reasonable taxonomy will provide clarity when slicing and dicing, but only when care is taken to abide by the basic rules, which we will discuss.

Much of what people encounter in everyday life is related to a hierarchical taxonomy, and many BI applications rely on this for analysis. Different kinds of analyses use different kinds of taxonomies, such as geographical, order-based and product-based.

A geographical taxonomy provides a hierarchy defined by encompassing location. For example, a street address is located on a street, which is in a neighborhood, which is in a town, which is in a county, which is in a state, which is in a country. Individual behavior is categorized based on location, and aggregation is performed along the encompassing boundaries. An example of an order-based taxonomy may be the internal structure of a company - lines of business incorporate divisions, which contain groups made up of individuals. Aggregation may take the form of productivity and performance metrics grouped within lines of business, or it may measure an individual's productivity. Product-based hierarchies might be aligned by product class (e.g., automotive supplies), then product category (e.g., air fresheners), then product name (e.g., "Pine-Fresh"). A BI application would measure revenues, margins and profitability across the product hierarchy.
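As a minimal sketch of aggregation along a product hierarchy, the Python below rolls revenue up from product name to product category to product class. The names `PRODUCT_PARENTS`, `SALES` and `roll_up` are hypothetical, and every product other than "Pine-Fresh", along with all revenue figures, is made up for illustration.

```python
from collections import defaultdict

# Hypothetical reference data: product name -> (product class, product category).
PRODUCT_PARENTS = {
    "Pine-Fresh": ("automotive supplies", "air fresheners"),
    "New-Car Scent": ("automotive supplies", "air fresheners"),
    "Wiper Blade 20in": ("automotive supplies", "wiper blades"),
}

# Made-up transactions: (product name, revenue).
SALES = [
    ("Pine-Fresh", 120.0),
    ("New-Car Scent", 80.0),
    ("Wiper Blade 20in", 300.0),
    ("Pine-Fresh", 30.0),
]


def roll_up(sales, parents):
    """Sum revenue at each level of the hierarchy: name, category, class."""
    by_product = defaultdict(float)
    by_category = defaultdict(float)
    by_class = defaultdict(float)
    for product, revenue in sales:
        product_class, category = parents[product]
        by_product[product] += revenue
        by_category[(product_class, category)] += revenue
        by_class[product_class] += revenue
    return dict(by_product), dict(by_category), dict(by_class)


by_product, by_category, by_class = roll_up(SALES, PRODUCT_PARENTS)
```

Because each product has exactly one parent at each level, the totals at every level sum to the same grand total, which is what makes the slices comparable.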

Taxonomies are great for defining categorization, especially in OLAP-style analysis. Viewed this way, taxonomies themselves represent business knowledge as reference data, and all data related to that reference data is affected by the quality criteria assigned to the enumeration of codes, the mapped values, the number of levels within the hierarchy and the methods for insertion into the hierarchy.

Despite the apparent simplicity of a value hierarchy, problems can emerge without proper attention to these two basic concepts:

1. At each level in the hierarchy, there should be an unambiguous distinction between the values. This means that there should not be any overlaps in definition (or in the values collected at lower levels of the taxonomy), nor should there be any gaps (i.e., missing values within the level).

2. There must be a coordinated approach to modifying the taxonomy. In other words, when it is clear that there may be gaps in the value set or that there are values that imply the introduction of new levels in the hierarchy, there must be a "political" framework in which the new elements are proposed, debated, modified and approved to maintain synchrony.
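The first rule can be checked mechanically for one level of a taxonomy. The sketch below assumes that a level is a set of inclusive integer age bands (as in the ratings example earlier) that should cover a known range; `check_level` and the band names are hypothetical, not part of any particular tool.

```python
def check_level(bands, lowest, highest):
    """Return (gaps, overlaps) for inclusive integer bands.

    bands is a list of (name, low, high) tuples, assumed to lie within
    lowest..highest. Gaps and overlaps are reported as (low, high) ranges.
    """
    gaps, overlaps = [], []
    expected = lowest  # next value that should be covered
    for _name, low, high in sorted(bands, key=lambda band: band[1]):
        if low > expected:
            gaps.append((expected, low - 1))
        elif low < expected:
            overlaps.append((low, min(high, expected - 1)))
        expected = max(expected, high + 1)
    if expected <= highest:
        gaps.append((expected, highest))
    return gaps, overlaps


# A flawed level: age 35 is claimed twice and ages 55-59 belong to no band.
flawed = [
    ("child", 0, 17),
    ("young adult", 18, 35),
    ("adult", 35, 54),
    ("senior", 60, 120),
]
gaps, overlaps = check_level(flawed, 0, 120)

# A clean level produces no findings.
clean = [("child", 0, 17), ("young adult", 18, 35), ("adult", 36, 120)]
clean_gaps, clean_overlaps = check_level(clean, 0, 120)
```

A check of this kind could be run whenever a change to the taxonomy is proposed, so that the review described in the second rule starts from a value set that is known to be complete and non-overlapping.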

Not abiding by these rules has some obvious negative consequences, chiefly that the value of the hierarchy degrades over time. Gaps or overlaps in the value sets make it difficult to present results. For example, in a pivot table, how are elements aggregated when the same values appear under multiple subgroupings? Similarly, a lack of coordination regarding the injection of new information into the hierarchy allows for semantic dissonance, as individual participants begin to overload values with meanings that are not agreed to by the rest of the constituency.
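To make the pivot table question concrete, here is a small sketch in which one product is listed under two categories. All names and figures are hypothetical; the point is only that subgroup totals no longer add up to the true total once a value appears under more than one subgrouping.

```python
from collections import defaultdict

# Hypothetical category membership with an overlap: "Pine-Fresh" appears twice.
CATEGORY_MEMBERS = {
    "air fresheners": ["Pine-Fresh", "New-Car Scent"],
    "cleaning supplies": ["Pine-Fresh", "Glass Cleaner"],
}

# Made-up revenue per product.
REVENUE = {"Pine-Fresh": 150.0, "New-Car Scent": 80.0, "Glass Cleaner": 70.0}


def category_totals(members, revenue):
    """Sum revenue per category, as a pivot table subtotal would."""
    return {
        category: sum(revenue[product] for product in products)
        for category, products in members.items()
    }


def multiply_assigned(members):
    """Return products that sit under more than one category."""
    seen = defaultdict(list)
    for category, products in members.items():
        for product in products:
            seen[product].append(category)
    return {p: cats for p, cats in seen.items() if len(cats) > 1}


totals = category_totals(CATEGORY_MEMBERS, REVENUE)
sum_of_subtotals = sum(totals.values())  # counts "Pine-Fresh" twice
true_total = sum(REVENUE.values())       # counts each product once
duplicates = multiply_assigned(CATEGORY_MEMBERS)
```

Here the subtotals sum to more than the true total, and the difference is exactly the revenue of the doubly assigned product.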

At first glance, a coded mapping of values within a two- or three-level hierarchy appears relatively simple, but I am sure that every reader has a story that describes the complexity of taxonomy management. If you'd like to share your story, e-mail me (loshin@knowledge-integrity.com), and I will relate your experiences in future columns!



David Loshin is the president of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Enterprise Knowledge Management - The Data Quality Approach (Morgan Kaufmann, 2001) and Business Intelligence - The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. Loshin may be reached at loshin@knowledge-integrity.com.

SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.