DM Review | Covering Business Intelligence, Integration & Analytics

Knowledge: The Essence of Meta Data:
Six Degrees of Separation of Our Assets

By R. Todd Stephens, Ph.D., online columnist
Column published in DMReview.com, September 16, 2004

Six degrees of separation is the theory that all of us can be connected to any other person on the planet through a chain of acquaintances that has no more than five intermediaries. The theory was first proposed in 1929 by the Hungarian writer Frigyes Karinthy in a short story called "Chains." In 1967, American sociologist Stanley Milgram devised a new way to test the theory, which he called "the small-world problem." He randomly selected people in the Midwest to send packages to a stranger located in Massachusetts. The senders knew the recipient's name, occupation and general location. They were instructed to send the package to a person they knew on a first-name basis who they thought was most likely, out of all their friends, to know the target personally. That person would do the same and so on, until the package was personally delivered to its target recipient. Although the participants expected the chain to include at least a hundred intermediaries, it only took (on average) between five and seven intermediaries to get each package delivered. Milgram's findings were published in Psychology Today and inspired the phrase "six degrees of separation."

In 2001, Duncan Watts, a professor at Columbia University, continued his own earlier research into the phenomenon and recreated Milgram's experiment on the Internet. Watts used an e-mail message as the "package" that needed to be delivered. Surprisingly, after reviewing the data collected from 48,000 senders and 19 targets (in 157 countries), Watts found that the average number of intermediaries was indeed six. Watts' research, together with the advent of the computer age, has opened up new areas of inquiry related to six degrees of separation in diverse areas of network theory such as power grid analysis, disease transmission, graph theory, corporate communication and computer circuitry. (Special thanks to WhatIs.com for filling in the gaps.)

Let's bring it home to the corporation. We make the claim that there are millions of technical assets within the corporation. Obviously, the way in which we define "asset" can increase or decrease that number. Suppose we have 2,000 systems or applications within the corporation, with an average of 15 tables per system, 10 fields per table and 10 elements of meta data per field. That is 2,000 x 15 x 10 x 10, or 3 million data assets alone, not to mention the relationships between assets, schemas, components, programs, interfaces, Web pages, metrics, business rules, etc. Is it any wonder we have so much trouble determining what we have, where it is, who uses it, when it is accessed and how to access it? Do you see where we are going with this logic? Yes, I could make the statement that any asset, yes, any asset that you select, is only six degrees from another asset.
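The arithmetic behind the 3 million figure can be sketched in a few lines of Python. This is a minimal illustration that assumes the averages nest (tables per system, fields per table, meta data elements per field), which is the reading under which the numbers multiply out to 3 million. The variable names are invented for the sketch.

```python
# Averages quoted in the paragraph above. The nesting (per system, per table,
# per field) is an assumption; it is the reading that yields 3 million.
systems = 2_000
tables_per_system = 15
fields_per_table = 10
metadata_elements_per_field = 10

tables = systems * tables_per_system
fields = tables * fields_per_table
metadata_elements = fields * metadata_elements_per_field

print(f"tables:             {tables:>9,}")
print(f"fields:             {fields:>9,}")
print(f"meta data elements: {metadata_elements:>9,}")
```

Changing any one of the four averages shows how quickly the count moves, which is the point about the definition of "asset" driving the total.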

Let's test the theory on the movie industry. Can we find a relationship between Vivien Leigh from Gone with the Wind and Tobey Maguire of Spider-Man fame?

Vivien Leigh was in Deep Blue Sea, The (1955) with Arthur (I) Hill
Arthur (I) Hill was in Amateur, The (1981) with Ed Lauter
Ed Lauter was in Seabiscuit (2003) with Tobey Maguire
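A chain like the one above is a shortest path through a graph whose nodes are actors and whose edges are shared films. The following is a minimal sketch using breadth-first search. It is not how the Oracle of Bacon is implemented, and the `CREDITS` table holds only the three films listed above, with titles and names normalized from the database's sort-order form. The function names are invented for the sketch.

```python
from collections import deque

# Only the three credits from the chain above; a real database holds far more.
CREDITS = {
    "The Deep Blue Sea (1955)": ["Vivien Leigh", "Arthur Hill"],
    "The Amateur (1981)": ["Arthur Hill", "Ed Lauter"],
    "Seabiscuit (2003)": ["Ed Lauter", "Tobey Maguire"],
}


def build_costar_graph(credits):
    """Map each actor to a list of (co-star, film) pairs."""
    graph = {}
    for film, cast in credits.items():
        for actor in cast:
            for other in cast:
                if actor != other:
                    graph.setdefault(actor, []).append((other, film))
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search; returns a list of (actor, film, co-star) hops,
    an empty list if start equals goal, or None if no chain exists."""
    if start == goal:
        return []
    previous = {start: None}  # actor -> (previous actor, connecting film)
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        for costar, film in graph.get(actor, []):
            if costar in previous:
                continue
            previous[costar] = (actor, film)
            if costar == goal:
                # Walk back from the goal to rebuild the chain.
                hops = []
                node = goal
                while previous[node] is not None:
                    prev_actor, via = previous[node]
                    hops.append((prev_actor, via, node))
                    node = prev_actor
                hops.reverse()
                return hops
            queue.append(costar)
    return None


graph = build_costar_graph(CREDITS)
chain = shortest_chain(graph, "Vivien Leigh", "Tobey Maguire")
for actor, film, costar in chain:
    print(f"{actor} was in {film} with {costar}")
```

Because breadth-first search explores all one-hop neighbors before any two-hop neighbors, the first chain it finds is also a shortest one.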

Here is the link; try it yourself: http://oracleofbacon.org/oracle/star_links.html. The longest chain I could find was four links, but I am sure there are longer ones. Kevin Bacon is an average of 2.946 links away from the 645,957 actors in the database. Thirteen of them actually require eight jumps, but I challenge you to find one. Other than having fun, what does this say about our organization and the web of assets we have created?

The data warehouse provides an excellent application for impact analysis and our six-degree test. Suppose we have a data warehouse that collects information from three or four sources and feeds a couple of data marts. See Figure 1.

Figure 1: Simple Example of Degree of Separation

In this example we can relate Customer_Name to CustName by a series of relationships (transformations).

Customer_Name from the CRM application is transformed (Transformation A) into CustomerName in the data warehouse.
CustomerName in the data warehouse is transformed (Transformation B) into CustName in the data mart.
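The two mappings above are enough for a small impact-analysis sketch: given a source field, list every field derived from it and how many transformations away it sits. This is an illustration under stated assumptions, not a description of any particular ETL or repository tool. The `system.field` naming and the `downstream` function are invented for the example.

```python
# Field-to-field mappings from the example: (source, transformation, target).
# The "system.field" naming convention is an assumption made for this sketch.
MAPPINGS = [
    ("CRM.Customer_Name", "Transformation A", "Warehouse.CustomerName"),
    ("Warehouse.CustomerName", "Transformation B", "DataMart.CustName"),
]


def downstream(field, mappings):
    """Return {derived field: degree of separation} for everything fed by `field`."""
    degrees = {}
    frontier = [field]
    degree = 0
    while frontier:
        degree += 1
        next_frontier = []
        for source, _transformation, target in mappings:
            # Skip fields already reached so cyclic mappings cannot loop forever.
            if source in frontier and target not in degrees and target != field:
                degrees[target] = degree
                next_frontier.append(target)
        frontier = next_frontier
    return degrees


impact = downstream("CRM.Customer_Name", MAPPINGS)
for target, degree in impact.items():
    print(f"{target} is {degree} transformation(s) from CRM.Customer_Name")
```

Here CustName in the data mart sits two degrees from Customer_Name in the CRM application, so a change to the CRM field touches both downstream fields.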

In fact, ETL or field-to-field mappings are at the heart of impact analysis for a data warehouse. The problem, as with the movie database, is that it contains only a single type of relationship (i.e., starring in a movie). What about actors who are related, such as Kirk and Michael Douglas? How about marriage relationships, such as Michael Douglas and Catherine Zeta-Jones? How about people who live on the same street in Hollywood or attend the same church? The magic of the degrees-of-separation package experiment described at the beginning of this article was that all types of relationships were taken into account, not just family members or neighbors. The power wasn't in the detailed meta data but in the diversity of relationships.

Data management provides not just one type of relationship but many, including domain, transformation, taxonomy, function and location. The real question isn't whether we could hire a consultant or assign an employee to the task of identifying these relationships, but how they are utilized. Does your meta data solution provide the functionality that is required to document these relationships? What value would come from having a system that can relate and document these relationships? The reality is that we haven't been very good at collecting and utilizing relationships. I have enjoyed watching the growth of the Internet over the past few years. The growth from a usage and content perspective has not surprised me. The ease with which organizations have jumped on the Web demonstrates that anyone who can understand HTML can publish a site. What has surprised me is that we have done a crappy job of defining relationships between these artifacts of information. The number of Web pages on the Internet may only slightly outnumber the quantity of assets in a major corporation. Thus, we have a similar problem.
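To make the point about relationship diversity concrete, here is a rough sketch of a repository that stores typed relationships and measures the degrees of separation between two assets for a chosen set of relationship types. Only the five type names come from the text. The assets, the edges between them and the `degrees_of_separation` function are hypothetical.

```python
from collections import deque

# Hypothetical assets and edges; only the relationship type names
# (domain, transformation, taxonomy, function, location) come from the article.
RELATIONSHIPS = [
    ("CRM.Customer_Name", "transformation", "Warehouse.CustomerName"),
    ("Warehouse.CustomerName", "transformation", "DataMart.CustName"),
    ("CRM.Customer_Name", "domain", "Billing.Cust_Nm"),
    ("Billing.Cust_Nm", "location", "Billing.Invoice_Total"),
    ("DataMart.CustName", "taxonomy", "Report.TopCustomers"),
    ("Report.TopCustomers", "function", "Billing.Invoice_Total"),
]


def degrees_of_separation(start, goal, relationships, allowed_types):
    """Shortest number of hops between two assets, following only the
    allowed relationship types in either direction; None if unreachable."""
    neighbors = {}
    for left, kind, right in relationships:
        if kind in allowed_types:
            neighbors.setdefault(left, set()).add(right)
            neighbors.setdefault(right, set()).add(left)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        if asset == goal:
            return distance[asset]
        for other in neighbors.get(asset, ()):
            if other not in distance:
                distance[other] = distance[asset] + 1
                queue.append(other)
    return None


ALL_TYPES = {"domain", "transformation", "taxonomy", "function", "location"}
only_etl = degrees_of_separation(
    "CRM.Customer_Name", "Billing.Invoice_Total", RELATIONSHIPS, {"transformation"}
)
every_type = degrees_of_separation(
    "CRM.Customer_Name", "Billing.Invoice_Total", RELATIONSHIPS, ALL_TYPES
)
print("transformation only:", only_etl)
print("all relationship types:", every_type)
```

In this toy data the two assets are unreachable through transformation mappings alone, yet only two degrees apart once domain and location relationships are included, which mirrors the argument that the value lies in the diversity of relationships.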

How much longer are we going to determine who is using technology by simply turning it off and seeing who screams? Don't laugh; we all know that is exactly how it is done when you have a half-hearted effort to understand the meta data environment. What happens when a production application goes down and the CEO asks what the impact of the outage is? I hope your answer won't be "Well, only three people have called to complain." The repository isn't just about capturing information and loading it into a meta-model. The relationships between assets are as important as the core descriptive information. When you consider the number of assets and the different types of relationships, you can see how complex this job can be. If we could solve the relationship problem, then I would be going for an IPO with the world's best relationship engine. Sorry, Google, that mathematical, keyword and linkage relationship business model will be destroyed by someone in the next five years.



R. Todd Stephens, Ph.D. is the director of Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arena. You can reach him via e-mail at Todd@rtodd.com or to learn more visit http://www.rtodd.com/.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.