
Knowledge: The Essence of Meta Data:
When Meta Data Galaxies Collide

By online columnist R. Todd Stephens, Ph.D.
Column published in DMReview.com, March 17, 2005

The Internet is spectacular: 90 billion pages and growing, 352 million authors, 171 million domain names and 945 million users. The Internet is the result of rapidly deployable technology that allows anyone with a computer to place their thoughts, concerns and expertise online. The majority of the information placed on the Web is not there to sell something but simply to be heard. The Internet allows us all to be heard, perceived and branded by our own imagination. The Internet is dynamic, untethered, built with passion, unedited, governed only by technology and available to all. The Internet works 24x7, 365 days per year and works equally well in Sharpsburg, Georgia, as it does in Shanghai, China. The Internet is not always accurate, is sometimes confusing, operates without management or guidance and many times fails to deliver accurate information to the end user.

The Internet is a thing of beauty that works precisely because it is uncontrolled and because of the sheer volume of information that is available. The Internet works for four specific reasons: diversity, independence, decentralization and aggregation of information. The first three are obvious and will continue to expand beyond our imagination. Aggregation is the real problem with the Internet, and the question remains how you aggregate 90 billion sources of information. The Internet is meta data's greatest failure. We revel in the success of search engines such as Yahoo! and Google, which index millions of pages of distributed content. Yet, type in "meta data" and you get 10,900,000 hits, up from 2 million just three years ago. How many of those sites actually provide a solid foundation of meta data? Type in "enterprise meta data" and you drop just below a million hits. Again, how many of these links are you going to follow? Better yet, what about the other 70 billion pages not indexed by the search engines? (It is estimated that only 10 percent of pages are indexed.) What about the semantic Web?

"The semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming."

"The semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." - Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

The semantic Web would really be nice, except for a few small problems. First, how do you get the cow back in the barn? Are any of the 90 billion pages currently in existence going to convert to XML, RDF and other semantic technologies? And are the people who wouldn't take the time to properly tag their content in simple metatag format now going to try to figure out how ontology frameworks work? The reality is that meta data on the Internet is more of a social issue than a technical one. Web authors simply don't document their pages with well-formed meta data, not because they don't understand how to "tag" it but because they are lazy. Yes, there are a few organizations that try to cheat the technology by falsifying their meta data, but this is a very small percentage. The vast majority of us fail to "tag" our content because we simply choose not to. Unfortunately, today there is no real benefit to documenting content with good meta data, since most search engines fail to process the tagged information. Unless the semantic Web figures out a way for classification and documentation to be done seamlessly and easily, it is decades away from any form of success.
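
As a rough illustration of the "simple metatag format" mentioned above, the following Python sketch reads the name/content pairs from a page's meta tags using the standard library's html.parser module. The sample pages, the MetaTagParser class and the extract_meta helper are hypothetical names invented for this example; it is a minimal sketch under those assumptions, not a description of how any particular search engine processes tags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta name="..." content="..."> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of the meta tag names and contents found in html."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Invented sample pages: one tagged by its author, one left untagged.
TAGGED_PAGE = """<html><head>
<title>Enterprise Meta Data Overview</title>
<meta name="description" content="An introduction to enterprise meta data.">
<meta name="keywords" content="meta data, repository, data architecture">
<meta name="author" content="Example Author">
</head><body><p>Page body.</p></body></html>"""

UNTAGGED_PAGE = "<html><head><title>My thoughts</title></head><body></body></html>"

if __name__ == "__main__":
    print(extract_meta(TAGGED_PAGE))
    print(extract_meta(UNTAGGED_PAGE))
```

The untagged page yields an empty dictionary, which is the situation the column describes for most Web content: the format is trivial to read, but there is nothing to read unless the author chooses to supply it.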

Now, let's take a look at the intranet, which attempts to bring many of the benefits of Web-based technology to the corporation. Oklahoma State University provides a great overview:

The "intranet" was born out of the need for groups to facilitate internal communication across geographical and time boundaries. By using common Internet protocols, or core technologies, the division can easily communicate, distribute information and facilitate project collaboration across the entire enterprise while keeping unauthorized users out. Intranets are appealing because of the savings due to decreased printing, duplicating and distribution costs. A second driving force behind the rapid growth of intranets is that the technology is extremely intuitive and easy to use without much training. Employees use hypertext links to search for and access text, graphics, audio or video, all organized into home pages. At the basic level, the division can do away with many costly corporate documents that are now produced on paper, such as human resources guides, newsletters, annual reports, maps, company facilities, price lists, product information literature - any document that is of value within the corporation. Intranets can also be used to deliver software upgrades, provide marketing and sales support, disseminate training materials and schedules and provide employees with access to internal help desk knowledge bases. Teams can use them to exchange information and share data when working on collaborative projects.

This information is generally accessible only to employees, consultants and contractors who are employed by the corporation. Information may extend to other business partners and relationships such as outsourced development, extended supply chains or legal relationships. While the intranet has many similarities with its kissing cousin the Internet, it also has many differences. The information placed on the intranet is generally controlled and governed by organizational or political factors that are not present on the Internet. In many cases, intranets are limited to business and process-oriented information content, which is generally built top down. This type of information is sometimes defined as "brand" information. Brand information can describe the purpose of a business unit or technology group and provide contact information, FAQs and detailed business processes. Online intranet information is critically important and saves an enormous number of phone calls, e-mails and conversations otherwise needed to discover who, what, where, when and how internal business needs to get done. Many of these business functions can be moved to the online environment, and each organization should strive to push forward in this area. Still, intranets are not built by passion, they reward conformity over innovation, and the majority of the content is edited and governed to the point that no original thought can take seed. Hence the difference between information applications built on passion (Internet) and ones built by salary (intranet). Accurate? You would think the accuracy rate would increase in direct proportion to the level of governance. Unfortunately, this is far from the truth. The organization is an evolving entity where employees, processes and structures are constantly changing. While the information may have been accurate at the time of publishing, the accuracy begins to deteriorate very quickly. This leads us to ask the question that haunts the Internet itself: is inaccurate information worse than no information?

Meta data within the corporate environment offers a superb opportunity to develop an environment where the full potential of meta data can be realized. Even a large corporate intranet would not expand beyond a couple hundred thousand pages of content. If you think of meta data as a crop, then the intranet is like the rich soil of the Midwest, while the Internet is like the Arizona desert. We pause for a brief moment to reflect on our meta data success within the corporation ... Pause ... Now, look out! The corporation is opening its doors to collaborative computing with wikis, blogs, Groove environments and many other technologies that support collaboration efforts. The traditional intranet allowed the corporation to stay focused on the business and the management of the business processes. Intranets allowed us to streamline our internal processes and once again lower the cost of doing business. Intranets allowed us to manage our organizations and gain efficiencies unlike any we have seen before. Now comes a new era, where the lines between intranet and Internet are blurred. Questions that must be asked include:

  • Will meta data succeed in these new environments as it has with intranets and repository technology?
  • Does the corporation want employees spending time on wikis, blogs or discussion threads?
  • How will the internal search engine differentiate between production business process information and general discussion?
  • Will the trust with intranet information be broken by mixing collaborative knowledge?
  • How will management handle the loss of control and governance?
  • How do we transform the technology professional into the high performance knowledge worker?
  • How will the company value (ROI) collaborative efforts?

There are many success stories in the knowledge management world where collaborative applications have served the organization well. The Eureka story from Xerox is an excellent example of success in this area. While Eureka was collaborative in nature, the real value came from the focus on one specific "vertical" area of repair. Perhaps the answer lies within the Eureka story: focus collaborative efforts on a specific and narrow problem. The Internet and intranet methodologies are like two galaxies colliding. We don't have a Hubble telescope to see what happens when these worlds crash into each other. Nor do we know how to define success, because we have two different methods of measurement. Unlike galaxies that follow physical laws, colliding environments follow no rules and may very well have different results based on the physical collision of the stars (business units). The rules are evolving at such a rate that some experts say, "If it works, then it is obsolete." Perhaps success is only a temporary state of existence that bursts out of the galactic collision but evaporates as the value flame dies out as quickly as it was created. David Weinberger describes this confusion: "Common sense doesn't hold there, and uncommon sense hasn't yet emerged. No wonder we're having trouble figuring out how to build businesses in this new land. We don't yet even know how to talk about a place that has no soil, no boundaries, no near or far."

In the end, the best and worst of each environment will emerge. Many people believe that the collision will result in the "death" of the smaller galaxy (intranet) and the growth of the larger one (Internet). However, the truth is that both galaxies will "die" and a new one will emerge as a "reincarnation" of value. Over the past few years, we have seen a "reincarnation" in the basic definition of meta data and its integration into information intelligence. Perhaps the collision is upon us and we are simply seeing the cosmic result. The final question is: what do you do? Do you sit on the sidelines while others succeed, fail and eventually pave the path? The answer is obvious; the experience will be messy and beautiful at the same time. We must play a role when meta data galaxies collide.



R. Todd Stephens, Ph.D. is the director of Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arena. You can reach him via e-mail at Todd@rtodd.com or to learn more visit http://www.rtodd.com/.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.