DM Review | Covering Business Intelligence, Integration & Analytics
Knowledge: The Essence of Meta Data:
Meta Data and XML: Will History Repeat Itself?

By R. Todd Stephens, Ph.D.
Column published in DMReview.com, October 16, 2003

Last month, we touched on the HTML meta data component called the metatag, so it seems logical to look at the XML meta data world this month. However, there is a huge difference between these two technologies. In the HTML world, metatags are optional, while in the XML world they are the required foundational components of the technology itself. Unfortunately, we don't have to look too far back to get a clear picture of where we are going.

The relational database model got its start in a series of IBM technical reports and then in a landmark paper, "A Relational Model of Data for Large Shared Data Banks," in which Edgar Codd laid out a new way to organize and access data. What Codd called the "relational model" rested on two key points: It provides a means of describing data with its natural structure only - that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high-level data language which will yield maximal independence between programs on the one hand and machine representation on the other (Codd, 1970).

We as database programmers jumped right in. We didn't need any stinking models, data dictionaries or any tool to help define data. Inline coding was all the rage back then. I did it; you did it; we all coded applications with very limited reuse or data resource management. Eventually, the databases and applications got too big, and we needed to export the data definitions into copybooks, data definition structures or some other hardware-specific utility. Later still, the database environment got too complex, so we started using models to present the meaning of data, and the ER diagram was integrated into the SDLC. After that, data stewardship, meta data management, data quality and data resource management programs emerged and integrated themselves into the core data architecture. Ahhhhh, nirvana in the RDBMS had been reached (by someone, I'm sure).

Then comes XML down the technology track. Surely, we are not going to just jump right into this technology and return to those early days of database history. This time, we are going to create meta data tag management applications built on the principles of reuse and provide the active utility of schema and DTD validation on the front end. This time, we are going to integrate data quality before the data is loaded into the XML database. This time, we are going to define standards and data definitions that can be categorized into understandable business views and business functionality. This time, we are going to implement XML not because it's the latest and greatest, but because it's the right business decision.

Yeah, right on! Who's with me? Hello? Hello? Staff? Staff? Darn, where did everyone go? No, I'll bet your organization is like mine, where programmers, designers, integrators and project managers are jumping all over the XML technology with little or no forethought about XML data resource management. Don't be surprised when you take a look at one of your XML files and see the problem shown in Figure 1. I have removed all but four statements from the file. Can you spot the errors?

Figure 1: Sample XML File

Within this one XML document, we found six different spellings for service, four different spellings for order, six different field naming standards, four different data formats as well as 22 fields for which we could not determine the definition or use (17 percent). When we only had a few key databases, the management of the information wasn't a very big deal. We could mentally handle the semantic understanding and ambiguities because the scope was relatively narrow. The same is true of XML today. What is going to happen when we have 1,000 of these XML documents sitting around with their umpteen undefined standards? Oh, it won't happen this time: we're too smart; we're too advanced. XML is an open standard; we have DTDs and schemas this time, and we will have tools this time. Really, you think so? Where are the tools, where are the standards, where are all those advanced thinkers everyone keeps talking about? They don't seem to be the ones actually writing the XML code.
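The kind of spelling drift described above can be surfaced mechanically. What follows is only a minimal sketch in Python, using the standard library's xml.etree.ElementTree. The sample document, the tag names and the helper functions `normalize` and `find_tag_variants` are hypothetical illustrations; they are not taken from Figure 1 or from any production file.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; it is NOT the column's Figure 1.
SAMPLE = """<Order>
  <ServiceType>DSL</ServiceType>
  <service_type>DSL</service_type>
  <SvcType>DSL</SvcType>
  <OrderDate>2003-10-16</OrderDate>
  <order_date>10/16/2003</order_date>
</Order>"""


def normalize(tag: str) -> str:
    """Collapse case and separators so 'ServiceType' and 'service_type' match."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def find_tag_variants(xml_text: str) -> dict[str, set[str]]:
    """Return normalized names that appear under more than one spelling."""
    groups = defaultdict(set)
    for elem in ET.fromstring(xml_text).iter():
        groups[normalize(elem.tag)].add(elem.tag)
    return {key: names for key, names in groups.items() if len(names) > 1}


variants = find_tag_variants(SAMPLE)
for key, names in sorted(variants.items()):
    print(key, sorted(names))
```

A check this simple only catches differences in case and separators. An abbreviation such as the hypothetical SvcType slips through, which is one reason a managed list of tag definitions is still needed.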

Of course, I can't blame the programming community, which is under siege from outsourcing, overseas sourcing, budget cuts and time constraints that hardly allow time for looking at XML beyond the next three steps. Who among us would stand up in the face of management and say what we need in the XML environment is:

  • Consistent style of tag names,
  • Consistent naming conventions,
  • Consistent tag definition,
  • Managing the XML artifacts for reuse,
  • DTD and schema dynamic validation utilities,
  • Documented code sets,
  • Well-defined business model namespaces,
  • RDF-defined meta data,
  • Front-end topic maps and ontologies.
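Two of the items above, consistent naming conventions and consistent tag definitions, lend themselves to a simple automated audit. The sketch below assumes a hypothetical registry of approved tags (`TAG_REGISTRY`) and an assumed house style of UpperCamelCase names; neither comes from the column, and a real registry would live in a meta data repository rather than in code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical registry of approved tags and their business definitions.
TAG_REGISTRY = {
    "Order": "A customer request for one or more services.",
    "ServiceType": "Code identifying the service ordered.",
    "OrderDate": "Date the order was placed, written as YYYY-MM-DD.",
}

# Assumed house style: UpperCamelCase, letters and digits only.
NAME_STYLE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def audit_tags(xml_text: str) -> list[str]:
    """List naming-style violations and undefined tags, once per distinct tag."""
    problems = []
    seen = set()
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag
        if tag in seen:
            continue
        seen.add(tag)
        if not NAME_STYLE.match(tag):
            problems.append(f"{tag}: violates naming convention")
        if tag not in TAG_REGISTRY:
            problems.append(f"{tag}: no definition in registry")
    return problems


# Hypothetical input with one nonconforming, undefined tag.
doc = "<Order><ServiceType>DSL</ServiceType><order_dt>10/16/2003</order_dt></Order>"
report = audit_tags(doc)
print("\n".join(report))
```

Run as a front-end check, an audit like this would flag undefined or nonconforming tags before a document is accepted. It is a complement to DTD or schema validation, not a replacement for it.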

If you are one of these bold souls, the XML fellowship should have a place for you. Be forewarned that you will have an uphill battle with leadership that thrives on the short-term view and demands short-term results. Eventually, the path of meta data will be paved and acknowledged as a critical component of the enterprise as well as your XML strategy.

A couple of weeks ago, a friend and I were discussing the lack of new and innovative products to hit the market over the past decade. Where are the products that changed our lives, such as television, microwave ovens, compact disks or the personal computer? Of course, things have gotten faster and smaller, but I am talking about truly innovative products that make everyone head to the store to buy, buy, buy. Perhaps XML will be the infrastructure needed to bring the next wave of innovative products to our door. Maybe the products will be digital in nature, and XML and the meta data held within will enable this next wave of innovation. All I do know is that meta data was barely on the radar of most major corporations five years ago. Today, meta data is at the forefront of XML technology, and I for one can't think of a better place to be.



R. Todd Stephens, Ph.D., is the director of the Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arenas. You can reach him via e-mail at Todd@rtodd.com; to learn more, visit http://www.rtodd.com/.
