Published in DM Review Online in August 2005.
Knowledge: The Essence of Meta Data: Organization and Classification Begins at Home
by R. Todd Stephens, Ph.D.
One weekend the house manager decided that the basement at the Stephens home was out of control. For well over ten years the basement had existed for a single purpose: junk collection. No one really knew what was down there, nor did we believe that we could possibly own all of this stuff. In any case, the senior house manager (Mrs. Stephens) decided that we were no longer going to live in this kind of environment. The project was laid out with a specific timeline, a budget of $500 and a list of resources. The storage shelves would be simple, with 2x4s for the frames and 1x4s for the shelves. They would span three walls and cover 38 feet of space. Our first stop was Home Depot to purchase the wood, nails and a new saw blade, since the old one was quite dull. Construction took only about eight hours with the help of our six-year-old. (It probably would have taken only six without the precious help.)
The following day, after many wasted pleas to make my tee time, we headed to Wal-Mart to purchase the storage crates. They had green ones, blue ones and clear ones; ones with wheels and ones without. All shapes and sizes were available, but we settled on a couple of styles that seemed to fit the types of materials we needed to store. On the way home, we stopped at Staples to pick up a labeling machine that would let us place specific classifications on each box, crate, folder and cabinet.
Back home, the cataloging process began when we noticed that we had three initial high-level classifications: Christmas, travel and computer parts. As we organized the Christmas objects, three additional sub-classes emerged: tree, china and decorative. Three crates soon became four when the ornaments wouldn't fit into a single crate. Interestingly, the tree ornaments were eventually sub-classed further into glass, traditional and kids. Since I wasn't sure where the yarn angels belonged, I slid one to the dog and created a new class called "throw away." This process continued with the travel objects (sorted by geography), and one would think that the computer equipment could easily be divided into various sub-classes as well. Unfortunately, the house manager had to attend to the two-year-old, and I was left in charge of the home taxonomy. No problem here: cram everything into a single box, and whatever doesn't fit is thrown away. That approach is the reason I was "impeached" as house manager many years ago. When she returned, we continued with the rest of the objects, creating, grouping, dividing and collapsing the different classes. Now everything is in its place and easy to find, and most of the real junk has been tossed out.
What does all of this mean to us in the meta data world? Basically, these events provide some insight into how to build taxonomies, at least on a small scale. The following paragraphs describe some of the lessons learned.
Without a doubt, the most important decision was to bring in an expert who really cared about the results. The house manager was the person for this job; she is a true Type A personality who organizes everything. She ensured that we stayed focused and that the end result was usable by any member of the team. She didn't get lost building something enormous with custom-made cabinets. The shelves were simple and easy to build, and we focused on delivering the value the project demanded. The skills required included the foresight to see what assets we had and the best way to organize them within the environment we were working in.
Figure 1: Poor Drawing of Home Storage Taxonomy
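For readers who think in data structures, the classes described in the basement story can be sketched as a nested dictionary in Python, where each key is a class and each value holds its sub-classes. This is a minimal illustration built only from the classes named in the text, not a reproduction of Figure 1; the names home_taxonomy and leaf_paths are hypothetical, and the travel class is left empty because its geographic sub-classes are not listed.

```python
# Hypothetical encoding of the home taxonomy: each key is a class and each
# value is a dict of its sub-classes. An empty dict marks a leaf class.
home_taxonomy = {
    "Christmas": {
        "tree": {"glass": {}, "traditional": {}, "kids": {}},
        "china": {},
        "decorative": {},
    },
    "travel": {},          # sorted by geography; regions are not named in the text
    "computer parts": {},  # the single-box approach
    "throw away": {},
}


def leaf_paths(tree, prefix=()):
    """Yield the full path from the root to every leaf class."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


for path in leaf_paths(home_taxonomy):
    print(" > ".join(path))
```

Each printed path, such as Christmas > tree > glass, corresponds to one labeled crate, which is why adding a sub-class is also a decision to buy and label another container.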
Realize that organization and taxonomies take time to build. Two weekends were set aside to complete the application build (shelves), test (check weight capacity and size) and data load. Could the shelves have been built more quickly? Of course, but the end product would not have been as usable and organized. The true value was the evolution of the taxonomy described above. Could we have gone to "Shelves-Are-Us" and purchased preprinted labels and containers? Yes, but most of those labels would have been tossed out, wasting time and money. More importantly, would that taxonomy have been as useful to the end user as the one we developed? Our taxonomy was not producer based but consumer based. Consumer-based taxonomies are emerging in the technical world and are better known as Folksonomies. A Folksonomy is a flat namespace with no hierarchy and no specific parent-child relationships [Mathes, 2004]. Users classify their content with simple keywords such as customer, order, schema, service, building, etc. This kind of classification works well when a user has a limited set of assets of interest. Depending on your job description, organizing technical assets in a Folksonomy can work well, since the number of assets you deal with is usually fairly small. For example, the organization may create a taxonomy structure for a customer name service as follows:
A developer in the enterprise services organization who works within the customer profile segment may simply file the service under "Name." Within that developer's context or job, this classification makes perfect sense. Users of the repository already maintain a simple classification system outside the repository: the Favorites utility in their browser. Unfortunately, the repository group cannot access that information in order to build additional classification methods. Technorati is an example of an online service that uses folksonomy technology. It provides a search engine and classification taxonomy across the collection of blogs. A Weblog, or blog, is a personal journal. Weblogs express as many different subjects and opinions as there are people writing them. Some blogs are highly influential and have enormous readership, while others are primarily intended for a close circle of family and friends. Technorati provides a meta data search engine based on RSS feeds, the publication method used by most blogs. I bring this up in the context of this article because the service has altered the classification, or taxonomy, system for my own blog. Recently, I went back and reclassified all 100 entries based on the system used within Technorati. For example, I had been classifying my thoughts on the future of technology under "The Future of Technology," which sounds reasonable. However, everyone else was using globalization, future, technology or innovation.
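The difference between a personal label and a shared tag vocabulary can be illustrated with a small, flat tag index in Python. This is a minimal sketch under stated assumptions: the post names, the community_map that translates a personal label into community tags, and the helpers build_index and retag are all hypothetical, and nothing here uses Technorati's actual interface. Each entry carries a flat set of tags with no parent-child links, which is what makes this a folksonomy rather than a hierarchy.

```python
from collections import defaultdict


def build_index(entries):
    """Invert {entry: tags} into {tag: entries}: one flat namespace, no hierarchy."""
    index = defaultdict(set)
    for entry, tags in entries.items():
        for tag in tags:
            index[tag.lower()].add(entry)  # treat tags case-insensitively
    return dict(index)


def retag(entries, mapping):
    """Swap each personal tag for the community tags it maps to; keep unmapped tags."""
    return {
        entry: {new for tag in tags for new in mapping.get(tag, [tag])}
        for entry, tags in entries.items()
    }


# Hypothetical blog entries tagged with personal, free-form labels.
my_entries = {
    "post-1": {"The Future of Technology"},
    "post-2": {"The Future of Technology", "meta data"},
    "post-3": {"meta data"},
}

# Hypothetical mapping to the tags other bloggers were using.
community_map = {
    "The Future of Technology": ["globalization", "future", "technology", "innovation"],
}

before = build_index(my_entries)
after = build_index(retag(my_entries, community_map))

print(sorted(before.get("technology", set())))  # []
print(sorted(after.get("technology", set())))   # ['post-1', 'post-2']
```

Before retagging, only a search for the exact phrase "the future of technology" finds the entries; after retagging, a search on any of the widely used tags does, while unmapped tags such as "meta data" pass through unchanged.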
The final lesson is knowing when to expand or contract the taxonomy. Taxonomies require foresight, hindsight and validation of the construction. Was our expansion of the taxonomy the result of some large-scale education effort? No; we simply weighed the time spent searching against the number of crates. If we had labeled all three crates with the same "Christmas" tag, the time required to search them would have been far greater than the time needed to organize the Christmas goodies into three sub-classes, where we would only need to search a single crate. The time to label and organize versus the time to search was our definitive taxonomy strategy.
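The label-versus-search trade-off can be made concrete with a small cost model in Python. This is a hypothetical sketch rather than anything the author measured: it assumes an item is equally likely to be in any crate that shares its label, that crates are opened one at a time, and that labeling is a one-time cost. The function names and every number are invented for illustration.

```python
def expected_crates_opened(crates_with_label):
    """Average crates opened to find one item, assuming it is equally likely
    to be in any crate with the matching label and crates are checked one at
    a time: (1 + 2 + ... + n) / n = (n + 1) / 2."""
    return (crates_with_label + 1) / 2


def total_minutes(crates_with_label, searches, minutes_per_crate, labeling_minutes):
    """One-time labeling effort plus the cumulative cost of all searches."""
    search_cost = expected_crates_opened(crates_with_label) * minutes_per_crate * searches
    return labeling_minutes + search_cost


def should_subclass(searches, minutes_per_crate=5, crates=3, labeling_minutes=30):
    """True when giving each crate its own sub-class label beats one shared label.
    All default values are illustrative assumptions, not measured figures."""
    shared = total_minutes(crates, searches, minutes_per_crate, 0)
    split = total_minutes(1, searches, minutes_per_crate, labeling_minutes)
    return split < shared


print(total_minutes(3, 50, 5, 0))    # 500.0: one "Christmas" label, 50 searches
print(total_minutes(1, 50, 5, 30))   # 280.0: three sub-class labels, 50 searches
print(should_subclass(searches=2))   # False: too few searches to repay the labeling
```

With these assumed figures, sub-classing pays for itself once the crates are searched more than six times; below that, a single broad label is cheaper. The same comparison, run in reverse, tells you when to collapse classes and contract a taxonomy.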
The world of meta data continues to evolve, as technologies like taxonomies, Folksonomies and many others change the way the world looks at meta data. From database technology to enterprise architecture to Internet innovation, meta data continues to provide value everywhere.
R. Todd Stephens, Ph.D. is the director of the Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arenas. You can reach him via e-mail at Todd@rtodd.com or learn more at http://www.rtodd.com/.
Copyright 2007, SourceMedia and DM Review.