Ontology and Taxonomy

Design Challenge

Like many concepts in our field, ontology and taxonomy have more than one definition. In addition, these definitions can sometimes contain vague or ambiguous phrases.

The Challenge

In simple terms and using an example if possible, how would you define ontology and taxonomy, and how do they differ?

The Response

Gordon Everest, professor emeritus, provides this succinct explanation: “The synonym for ontology would be model (of something in data), and the synonym for taxonomy would be tree.” Robert Ruffin, data architect, offers this example: “The taxonomy of a tiger is that it is a subtype of cat (classification), but an ontological description may be that the tiger has a relationship to Asia, the continent on which it lives.”
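Ruffin's tiger example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `taxonomy`, `ontology`, `ancestors` and `related`, the relationship labels `is_a` and `lives_in`, and the extra levels above "cat" are invented for the sketch and do not come from the respondents.

```python
# Taxonomy: every term points to exactly one parent, so the structure is a tree.
taxonomy = {
    "tiger": "cat",      # from the example: a tiger is a subtype of cat
    "cat": "mammal",     # assumed extra level, added only for illustration
    "mammal": "animal",  # assumed extra level, added only for illustration
}

# Ontology: any named relationship is allowed, stored here as
# (subject, relation, object) triples.
ontology = [
    ("tiger", "is_a", "cat"),       # the taxonomic link is just one kind of relationship
    ("tiger", "lives_in", "Asia"),  # a non-hierarchical relationship
    ("Asia", "is_a", "continent"),
]


def ancestors(term, tree):
    """Walk up the tree and return every supertype of term, nearest first."""
    result = []
    while term in tree:
        term = tree[term]
        result.append(term)
    return result


def related(subject, relation, triples):
    """Return every object linked to subject by the given relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(ancestors("tiger", taxonomy))           # supertypes of tiger
print(related("tiger", "lives_in", ontology)) # where the tiger lives
```

The tree can only answer "what kind of thing is this?", while the list of triples can also answer questions such as "where does it live?".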

An ontology is a formal way of organizing information. It includes putting things into categories and relating these categories with each other. The most quoted definition of ontology is from Tom Gruber: “Explicit specification of a conceptualization.” In other words, an ontology is a model - a model being a simplification of something complex in our environment using a standard set of symbols. Kinds of ontologies include but are not limited to glossaries, data dictionaries and, yes, even data models.

Steve Turnock, database engineer, says an ontology is a representation of a body of knowledge. “Ontology is closely related to semantics, the primary distinction being that ontology concerns itself with the organization of knowledge once you know what it means.” The body of knowledge can include both class and instance. We often find that one model’s class is another model’s instance. For example, wine is an instance of the class liquid, and zinfandel is an instance of the class wine.
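The observation that one model's class is another model's instance can be made concrete with a small sketch. The mapping `instance_of` and the helper `roles` are hypothetical names used only for this illustration.

```python
# Each term maps to the class it is an instance of (from the wine example).
instance_of = {
    "wine": "liquid",
    "zinfandel": "wine",
}


def roles(term, mapping):
    """Report whether a term acts as an instance, a class, or both."""
    found = set()
    if term in mapping:
        found.add("instance")  # the term is an instance of something
    if term in mapping.values():
        found.add("class")     # something else is an instance of the term
    return found


for term in ("liquid", "wine", "zinfandel"):
    print(term, sorted(roles(term, instance_of)))
```

Here "wine" plays both roles at once, which is the situation described above.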

Dave Hay, industry expert, adds, “In the modern world, the word is used to describe a list of the things that exist in an organization or an industry. Or, more specifically, it refers to the list of terms identifying those things. This includes a defined syntax and approach to specifying the relationship among those things. Ontology was originally the Greek word for the philosophical study of ‘that which exists.’ It turns out that identifying exactly what exists in our world is trickier than you might think.”

A taxonomy is an ontology in the form of a hierarchy. Steve Turnock provides this example: “The most commonly known of these is the biological classification of the structure of life itself. This is described in terms of phylum, family, genus, species and so on.” Nandi Iyer, solutions architect, adds a data twist to the definition: “Taxonomies are things of interest arranged in a hierarchical structure, typically in a supertype/subtype relationship.”

Whereas ontologies can have any type of relationship between categories, a taxonomy can contain only hierarchies. In a hierarchy, each child has a single parent and a parent can contain one or more children. If a child can have more than one parent (the term is poly-hierarchy), then the child is typically repeated for each parent. Examples of taxonomies include product categorizations, supertype/subtype relationships on a relational data model and dimensional hierarchies on a dimensional data model.
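The single-parent rule, and the practice of repeating a child under each parent, can be sketched as follows. The product categories and the function names `parents_by_child`, `is_strict_hierarchy` and `flatten_to_tree` are assumptions made up for this example, and the sketch assumes the relationships contain no cycles.

```python
from collections import defaultdict

# Hypothetical product categorization as (parent, child) pairs.
# "tablet" has two parents, so this is a poly-hierarchy.
edges = [
    ("products", "computers"),
    ("products", "phones"),
    ("computers", "tablet"),
    ("phones", "tablet"),
]


def parents_by_child(edges):
    """Group the parents of each child."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return dict(parents)


def is_strict_hierarchy(edges):
    """True when every child has exactly one parent."""
    return all(len(p) == 1 for p in parents_by_child(edges).values())


def flatten_to_tree(edges, root):
    """List root-to-node paths; a multi-parent child appears once per parent."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    paths = []

    def walk(node, path):
        path = path + [node]
        paths.append("/".join(path))
        for child in children[node]:
            walk(child, path)

    walk(root, [])
    return paths


print(is_strict_hierarchy(edges))
for path in flatten_to_tree(edges, "products"):
    print(path)
```

After flattening, "tablet" shows up under both "computers" and "phones", so each copy again has a single parent.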

Gordon Everest suggests taxonomy best practices: “Given a population of some things, we build a taxonomy to help us classify the members of the population into groups and subgroups within subgroups, etc. In a good taxonomy, every sibling set under a parent node (class) enables us to divide the parent population into mutually exclusive and collectively exhaustive subsets.”
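Everest's test for a good sibling set, mutually exclusive and collectively exhaustive, is easy to check mechanically. The function name `is_mece` and the sample population below are hypothetical and serve only to illustrate the rule.

```python
def is_mece(parent_members, subsets):
    """Check that the subsets split the parent population with no overlap and no gaps."""
    parent = set(parent_members)
    seen = set()
    for subset in subsets:
        subset = set(subset)
        if seen & subset:
            return False  # a member sits in two siblings: not mutually exclusive
        seen |= subset
    # Every parent member must be covered, and nothing outside the parent may appear.
    return seen == parent


# Illustrative population and sibling sets (assumed for this sketch).
animals = {"tiger", "lion", "wolf", "fox"}
cats = {"tiger", "lion"}
dogs = {"wolf", "fox"}

print(is_mece(animals, [cats, dogs]))              # clean split
print(is_mece(animals, [cats, dogs | {"tiger"}]))  # "tiger" appears twice
print(is_mece(animals, [cats, {"wolf"}]))          # "fox" is left out
```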

Cheryl Rimes, senior business analyst, offers a health care example: the International Statistical Classification of Diseases and Related Health Problems (ICD), which provides a taxonomy for classifying diseases and a wide variety of signs, symptoms and causes. Dave Hay provides another example and also raises a challenge: “The most famous of these is the Dewey decimal system for cataloging library books. It starts out with 10 major categories, and subcategories are defined by tacking digits to the end of the number. This was very useful for locating books that could physically be stored in only one place. It is less useful as a way to catalog a body of knowledge. Where do you put a book about the history of mathematics in the Islamic world? History? Mathematics? Religion? This points out the problem with most taxonomies. Most of our knowledge is not hierarchical. To cram a body of knowledge into a hierarchical structure leads to all kinds of problems.”
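Hay's point about a digit-prefix scheme can be illustrated with a deliberately simplified sketch. The codes, labels and function names below are assumptions for the example and are not an authoritative rendering of the Dewey decimal system.

```python
# Simplified, illustrative category codes: a longer code is a subcategory
# of every shorter code that is a prefix of it.
categories = {
    "2": "religion",
    "5": "science",
    "51": "mathematics",
    "9": "history",
}

# A book can be shelved in only one place, so it gets exactly one code.
# Filing this title under mathematics is an assumption made for the sketch.
catalog = {
    "History of mathematics in the Islamic world": "51",
}


def ancestor_codes(code):
    """Return the broader categories implied by a code, broadest first."""
    return [code[:i] for i in range(1, len(code))]


def find_under(prefix, catalog):
    """Return every title filed under the given category or its subcategories."""
    return [title for title, code in catalog.items() if code.startswith(prefix)]


print(ancestor_codes("51"))      # broader categories of mathematics
print(find_under("5", catalog))  # found when browsing science
print(find_under("9", catalog))  # missed when browsing history
print(find_under("2", catalog))  # missed when browsing religion
```

Because the book carries a single code, a reader browsing history or religion never finds it, which is the limitation Hay describes.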

If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at http://www.stevehoberman.com/. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com.

Steve's publishing company, Technics Publications, recently published the first edition of the DAMA Dictionary of Data Management, a CD-ROM containing over 800 terms spanning 40 topics, including finance and accounting, knowledge management, architecture, data modeling, XML and analytics. You can order a copy from the DMReview.com Bookstore at www.dmreview.com/books.


Steve Hoberman has worked as a business intelligence and data management practitioner and trainer since 1990. He is a Certified Business Intelligence Professional (CBIP), having achieved mastery level certification in data analysis and design. He is a popular and frequent presenter at industry conferences, both nationally and internationally. Hoberman is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler's Workbench and Data Modeling Made Simple (available for purchase through the DM Review bookstore). He is the founder of the Design Challenges group, inventor of the Data Model Scorecard and a recognized innovator and thought leader in the field of data modeling. He can be reached at me@stevehoberman.com.
