Ontology and Taxonomy
Design Challenge
Like many concepts in our field, ontology and taxonomy have more than one definition. In addition, these definitions can sometimes contain vague or ambiguous phrases.
The Challenge
In simple terms and using an example if possible, how would you define ontology and taxonomy, and how do they differ?
The Response
Gordon Everest, professor emeritus, provides this succinct explanation: The synonym for ontology would be model (of something in data), and the synonym for taxonomy would be tree. Robert Ruffin, data architect, offers this example: The taxonomy of a tiger is that it is a subtype of cat (classification), but an ontological description may be that the tiger has a relationship to Asia, the continent on which it lives.
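The tiger example can be sketched in a few lines of Python. This is only an illustration, not a standard representation: the taxonomy is stored as a child-to-parent lookup (a tree), and the ontology as a list of subject/relationship/object triples. The names taxonomy_parent, ontology_triples, is_a and lives_in are hypothetical, and the "animal" level is an assumption added to give the tree a second step.

```python
# Taxonomy: every term points to exactly one parent, which forms a tree.
# "animal" is an assumed extra level; the article only states tiger -> cat.
taxonomy_parent = {
    "tiger": "cat",
    "cat": "animal",
}

# Ontology: any kind of relationship is allowed, stored here as
# (subject, relationship, object) triples. Relationship names are hypothetical.
ontology_triples = [
    ("tiger", "is_a", "cat"),
    ("cat", "is_a", "animal"),
    ("tiger", "lives_in", "Asia"),
    ("Asia", "is_a", "continent"),
]


def ancestors(term, parent_of):
    """Walk up the tree and return the chain of parents for a term."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def related(subject, triples):
    """Return every (relationship, object) pair recorded for a subject."""
    return [(rel, obj) for subj, rel, obj in triples if subj == subject]


print(ancestors("tiger", taxonomy_parent))
print(related("tiger", ontology_triples))
```

The tree can only answer "what is a tiger a kind of?", while the triples can also record that the tiger lives in Asia, which is the distinction Ruffin draws.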
An ontology is a formal way of organizing information. It includes putting things into categories and relating these categories with each other. The most quoted definition of ontology is from Tom Gruber: Explicit specification of a conceptualization. In other words, an ontology is a model - a model being a simplification of something complex in our environment using a standard set of symbols. Kinds of ontologies include but are not limited to glossaries, data dictionaries and, yes, even data models.
Steve Turnock, database engineer, says an ontology is a representation of a body of knowledge. Ontology is closely related to semantics, the primary distinction being that ontology concerns itself with the organization of knowledge once you know what it means. The body of knowledge can include both class and instance. We often find that one model's class is another model's instance. For example, wine is an instance of the class liquid, and zinfandel is an instance of the class wine.
Dave Hay, industry expert, adds: In the modern world, the word is used to describe a list of the things that exist in an organization or an industry. Or, more specifically, it refers to the list of terms identifying those things. This includes a defined syntax and approach to specifying the relationship among those things. Ontology was originally the Greek word for the philosophical study of that which exists. It turns out that identifying exactly what exists in our world is trickier than you might think.
A taxonomy is an ontology in the form of a hierarchy. Steve Turnock provides this example: The most commonly known of these is the biological classification of the structure of life itself. This is described in terms of phylum, family, genus, species and so on. Nandi Iyer, solutions architect, adds a data twist to the definition: Taxonomies are things of interest arranged in a hierarchical structure, typically in a supertype/subtype relationship.
Whereas ontologies can have any type of relationship between categories, in a taxonomy there can only be hierarchies. In a hierarchy, each child has a single parent and a parent can contain one or more children. If a child can have more than one parent (the term is poly-hierarchy), then the child is typically repeated for each parent. Examples of kinds of taxonomies are product categorizations, supertype/subtype relationships on a relational data model and dimensional hierarchies on a dimensional data model.
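As a rough sketch of the single-parent rule and of repeating a child under each parent, the Python below checks whether a structure is a strict hierarchy and expands a poly-hierarchy into one path per parent. The product categories and the function names are hypothetical, and the code assumes the parent links contain no cycles.

```python
# Each term maps to the list of its parents. The product names are
# made up for illustration; "camera phone" has two parents.
parents_of = {
    "phones": ["electronics"],
    "cameras": ["electronics"],
    "camera phone": ["phones", "cameras"],
}


def is_strict_hierarchy(parents):
    """True when no child has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())


def expand_paths(parents, term):
    """Return every root-to-term path.

    A term with several parents yields one path per parent, which is
    the 'repeat the child for each parent' approach. Assumes no cycles.
    """
    term_parents = parents.get(term, [])
    if not term_parents:
        return [(term,)]
    paths = []
    for parent in term_parents:
        for path in expand_paths(parents, parent):
            paths.append(path + (term,))
    return paths


print(is_strict_hierarchy(parents_of))
for path in expand_paths(parents_of, "camera phone"):
    print(" > ".join(path))
```

Identifying each occurrence by its full path means every node in the expanded structure again has exactly one parent, at the cost of storing the shared child twice.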
Gordon Everest suggests taxonomy best practices: Given a population of some things, we build a taxonomy to help us classify the members of the population into groups and subgroups within subgroups, etc. In a good taxonomy, every sibling set under a parent node (class) enables us to divide the parent population into mutually exclusive and collectively exhaustive subsets.
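The "mutually exclusive and collectively exhaustive" test can be expressed directly with sets. The sketch below is one possible check, not an established tool; the function name is_mece and the animal groupings are hypothetical.

```python
def is_mece(parent_members, subsets):
    """Check that the subsets split the parent population cleanly.

    parent_members: set of all members under the parent node.
    subsets: dict mapping each child node name to its set of members.
    """
    seen = set()
    for members in subsets.values():
        if seen & members:
            return False  # a member appears in two siblings: not exclusive
        seen |= members
    # Exhaustive only if the siblings cover exactly the parent population.
    return seen == parent_members


population = {"tiger", "lion", "wolf", "fox"}

good = {"cats": {"tiger", "lion"}, "dogs": {"wolf", "fox"}}
overlapping = {"cats": {"tiger", "lion"}, "hunters": {"lion", "wolf", "fox"}}
incomplete = {"cats": {"tiger", "lion"}, "dogs": {"wolf"}}

print(is_mece(population, good))
print(is_mece(population, overlapping))
print(is_mece(population, incomplete))
```

Only the first grouping passes: the second puts the lion under two siblings, and the third leaves the fox unclassified.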
Cheryl Rimes, senior business analyst, offers a health care example: the International Statistical Classification of Diseases and Related Health Problems (ICD), a taxonomy that classifies diseases and a wide variety of signs, symptoms and causes. Dave Hay provides this example and also raises a challenge: The most famous of these is the Dewey decimal system for cataloging library books. It starts out with 10 major categories, and subcategories are defined by tacking digits to the end of the number. This was very useful for locating books that could physically be stored in only one place. It is less useful as a way to catalog a body of knowledge. Where do you put a book about the history of mathematics in the Islamic world? History? Mathematics? Religion? This points out the problem with most taxonomies. Most of our knowledge is not hierarchical. To cram a body of knowledge into a hierarchical structure leads to all kinds of problems.
If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at http://www.stevehoberman.com/. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com.
Steve's publishing company, Technics Publications, recently published the first edition of the DAMA Dictionary of Data Management, a CD-ROM containing over 800 terms spanning 40 topics, including finance and accounting, knowledge management, architecture, data modeling, XML and analytics. You can order a copy from the DMReview.com Bookstore at www.dmreview.com/books.
Steve Hoberman has worked as a business intelligence and data management practitioner and trainer since 1990. He is a Certified Business Intelligence Professional (CBIP), having achieved mastery level certification in data analysis and design. He is a popular and frequent presenter at industry conferences, both nationally and internationally. Hoberman is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler's Workbench and Data Modeling Made Simple (available for purchase through the DM Review bookstore). He is the founder of the Design Challenges group, inventor of the Data Model Scorecard and a recognized innovator and thought leader in the field of data modeling. He can be reached at me@stevehoberman.com.