DM Review | Covering Business Intelligence, Integration & Analytics
Knowledge: The Essence of Meta Data:
XML Standards Are an Integral Part of Meta Data

By R. Todd Stephens, Ph.D.
Column published in DMReview.com, August 19, 2004

One of the more interesting questions asked over the past few years is: what role does meta data play in the development of XML? In the past, data architects could ignore the meta data aspects of defining data or, put another way, they could let the application tool take care of mundane activities such as meta data. With the advent of XML technologies, however, meta data must be placed on the front end and integrated into every step of the process. In fact, data architects must find ways to exploit meta data and adapt their training to a new world where, perhaps, the definition of data is as important as the data itself. Take a look at Figure 1 and review the five tiers of XML asset creation.


Figure 1: XML Artifacts

Architecture Naming Conventions

Naming standards are sometimes referred to as a "taxonomy" in the literature. In an effort to keep things as simple as possible, naming standards here will reflect how we define the rules of construction. For example, let's review a couple of simple rules.

Language. Wherever possible, tag names will be constructed using American English as the basis for language and spelling.

Term Capitalization. Each term used within the XML tagging structure will begin with a capital letter, and all subsequent letters of that term will be in lower case.

Valid: DayOfMonth, HoursWorked, PrincipalBalance
Invalid: DayofMonth, hoursWorked, PrincipalBALANCE

Term Separation. Term separation characters will not be allowed; the capital letter of Term Capitalization will act as a term divider.

Valid: DayOfMonth, HoursWorked, PrincipalBalance
Invalid: Day_Of_Month, Hours Worked, Principal-Balance
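As a sketch, the two rules above can be checked mechanically at schema-development time. The regular expression and function names below are illustrative assumptions, not part of any published standard:

```python
import re

# A tag is valid if it is a run of capitalized terms with no separator
# characters (Term Capitalization + Term Separation rules).
TERM_PATTERN = re.compile(r"^([A-Z][a-z]+)+$")

def is_valid_tag(tag: str) -> bool:
    """Check a candidate tag name against the two naming rules."""
    return bool(TERM_PATTERN.match(tag))

def split_terms(tag: str) -> list[str]:
    """Split an UpperCamelCase tag back into its component terms."""
    return re.findall(r"[A-Z][a-z]+", tag)

print(is_valid_tag("DayOfMonth"))     # capital letters divide the terms
print(is_valid_tag("Day_Of_Month"))   # separator character -> invalid
print(is_valid_tag("hoursWorked"))    # first term not capitalized -> invalid
```

A checker like this could be wired into the schema development tool or run as a review step against the architecture document repository.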

Most naming convention documents contain many more rules, but you get the idea: we are stating the basic principles, or standards, of how XML tags can be created. Naming conventions are simply the foundation of this effort. Knowledge of these principles should extend up and down every part of the organization concerned with the development, reuse or integration of XML-based applications, including designers, developers, analysts and testers. These rules should be encoded into the XML schema development application (if there is one) or housed in the architecture document repository.

Terms

Terms sound like the simplest element of the five-tier program. We want a collection of terms that can be used in conjunction with the rules to build our element inventory. Great, that sounds simple; where is that Webster's dictionary? Hold on, it is not that simple, since each term may take multiple forms. For example, what if we had an architecture standard as follows:

Consistent Use of Terms, Abbreviations and Stock Symbols. All constructs used within the XML document will be consistent in their use of terms, abbreviations and the special-use stock symbol. The stock symbol is the shortest construct for the term. Example: Term=Balance, Abbreviation=Bal, Stock Symbol=B.

Valid: PrincipalBalance, PrinBal, PB
Invalid: PrincipalBal, PBalance, PBal

A solid glossary will hold the collection of valid spellings, definitions and other construction information and rules.
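A minimal sketch of such a glossary follows. Each term carries its full form, approved abbreviation and stock symbol, and tags are always built from one consistent form; the entries and function names are invented for illustration:

```python
# Hypothetical glossary: term -> approved short forms.
GLOSSARY = {
    "Balance":   {"abbreviation": "Bal",  "stock": "B"},
    "Principal": {"abbreviation": "Prin", "stock": "P"},
}

def render(terms: list[str], form: str) -> str:
    """Build a tag from glossary terms using one consistent form:
    'term', 'abbreviation' or 'stock'. Mixing forms is what the
    consistency rule forbids."""
    if form == "term":
        return "".join(terms)
    return "".join(GLOSSARY[t][form] for t in terms)

tag = ["Principal", "Balance"]
print(render(tag, "term"))          # PrincipalBalance
print(render(tag, "abbreviation"))  # PrinBal
print(render(tag, "stock"))         # PB
```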

Elements and Attributes

Elements and attributes are the result of one or more terms being combined to define a specific context for data. Examples include PrincipalBalance and DayOfWeek. Elements will be housed in the data dictionary along with definitions, ownership, models and other types of meta data. Do you really need a data dictionary, since elements and attributes can be defined inside the schema? Perhaps not: applications are available that can scan across multiple schemas looking for specific elements. However, many researchers are focusing on the ability to model x-dimensional data for XML. Perhaps this logical, conceptual and physical modeling concept will provide the link between these two efforts.
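The kind of scanning mentioned above can be sketched with the standard library alone; the sample schema here is invented, and a real tool would walk a whole directory of schemas rather than one string:

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def declared_elements(schema_source: str) -> set[str]:
    """Collect every named xs:element declared anywhere in one schema."""
    root = ET.fromstring(schema_source)
    return {el.get("name") for el in root.iter(f"{XSD}element") if el.get("name")}

schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PrincipalBalance" type="xs:decimal"/>
  <xs:element name="DayOfWeek" type="xs:string"/>
</xs:schema>"""

print(sorted(declared_elements(schema)))  # ['DayOfWeek', 'PrincipalBalance']
```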

Schema

The XML schema has replaced the document type definition (DTD) as the technology for describing XML structure. The DTD is a long-standing part of the SGML standard and is excellent at describing document structure; however, it falls short of providing the same utility in a data-centric environment. The schema provides the ability to describe rules and constraints for base and derived data, extended data types, facets, value limits, enumerations, occurrences and patterns.
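As one small illustration of that expressiveness, the fragment below (invented for this sketch) defines a DayOfMonth simple type with value-limit facets, constraints a DTD has no way to state, and then reads the facets back out:

```python
import xml.etree.ElementTree as ET

# A simpleType restricting xs:integer with value-limit facets.
schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="DayOfMonth">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="31"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

root = ET.fromstring(schema)
# Strip the namespace from each tag and keep the facet constraints.
facets = {f.tag.split("}")[1]: f.get("value")
          for f in root.iter()
          if f.tag.split("}")[1] in ("minInclusive", "maxInclusive")}
print(facets)  # {'minInclusive': '1', 'maxInclusive': '31'}
```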

Where are you going to house those schemas? OK, there are only about 20 in use within your organization, so it's no big deal. What happens when there are 10,000? What about externally defined schemas? Have you taken a look at the number of publicly available vocabularies out there? There are already hundreds, spanning multiple businesses and processes. If your organization is going to use external standards or vocabularies, then you will need to capture the significant meta data that describes each vocabulary: how it is being used, which business processes can benefit from the standard, and so on. One additional element of required meta data is a subject matter expert. Let's say you decide to use the OASIS ebXML Business Process standard in an application. It is important not only to capture the specifics of the standard but also to identify someone who understands the standard, its elements and the utility it provides. Otherwise, you will force the development community to start from ground zero every time they choose this standard.
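A registry record for an externally defined vocabulary might capture exactly those fields. This is a hypothetical shape, with field names and sample values invented here, not any product's schema:

```python
from dataclasses import dataclass, field

@dataclass
class SchemaRecord:
    """Meta data captured for one adopted vocabulary or standard."""
    name: str
    source: str                                # standards body or internal team
    business_processes: list[str] = field(default_factory=list)
    subject_matter_expert: str = ""            # someone who understands the standard

record = SchemaRecord(
    name="ebXML Business Process",
    source="OASIS",
    business_processes=["order management"],
    subject_matter_expert="jane.doe@example.com",  # illustrative contact
)
print(record.name, "-", record.subject_matter_expert)
```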

Usage

Usage provides the basic value of identifying which organizations, applications, programs and message structures are using a schema definition. Here is where your entire repository collection comes into play. Assuming you have been listening over the past few years, you should be able to tie the schema repository to your system, application and interface repository; to your reusable component repository; and to your Web service registry. You may even begin to see the need to supplement your reuse program with another reusable asset, namely the XML schema. You do have a formal reuse program based on the reuse maturity model in your organization, right?
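Tying those repositories together amounts to a cross-reference from each schema to its consumers. The index below is a minimal sketch with invented keys and names, standing in for what a real registry integration would supply:

```python
# Hypothetical usage index: schema -> repositories of things that consume it.
USAGE = {
    "CustomerProfile.xsd": {
        "applications": ["Billing", "CRM"],
        "interfaces": ["CustomerFeed-v2"],
    },
}

def consumers(schema: str) -> list[str]:
    """Everything on record that would be affected if this schema changed."""
    entry = USAGE.get(schema, {})
    return sorted(sum(entry.values(), []))

print(consumers("CustomerProfile.xsd"))  # ['Billing', 'CRM', 'CustomerFeed-v2']
```

An impact-analysis query like this is the payoff of keeping the usage tier populated.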

The goal is to have a repository with the critical information online and readily accessible at every stage of the process. There are several products on the market that handle some, but not all, of the required levels. Which level is the most important? They are all important, depending on your role and stage of development. Recently, I spent some time at an academic conference in Las Vegas. Yes, somewhere between crying over pocket aces in seven-card stud and the elation of the dealer getting a pair of fours to go with my straight, I was able to discuss meta data with both faculty and students. For those of us with an ACM or IEEE subscription, the lack of meta data research is disappointing at best.

The response was a welcome sight: the 20 or so people who attended seemed genuinely interested in where meta data is going and why it should be an important research topic. First, we must address the basic question of why meta data should be an academic research topic and not merely a professional one. For the most part, meta data research is still stuck in the 1980s, in the sense that meta data is just "data about data." I proved this, albeit not scientifically, by taking a look at my freshman computer science text. Oh yes, we all remember those days of card decks, mainframes, COBOL, Assembler, Commodore 64s and meta data, or data about data. Open a computer science textbook today and the world has changed: C#, UML, object orientation, grid computing, Java, and meta data, or data about data. Are we saying that every technology under the computer science umbrella has changed except meta data? Unfortunately, meta data has lost much of its luster and fascination, which has been replaced by XMI, topic maps, ontologies, etc. The basic value and utility of meta data has not changed, and therein lies the problem. We need to expand the body of knowledge around this subject and stop expecting the solutions to come from the vendor community. Perhaps the XML construction process and repository relationship is a starting point.

...............................................................................

For more information on related topics, visit the following related portals: Meta Data and XML.

R. Todd Stephens, Ph.D. is the director of Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arena. You can reach him via e-mail at Todd@rtodd.com or to learn more visit http://www.rtodd.com/.


