Knowledge: The Essence of Meta Data:
Metrics and the Source of All Knowledge

By R. Todd Stephens, Ph.D.
Column published in DMReview.com, September 15, 2005

Metrics have always been an important part of information technology. Unfortunately, for the most part, metrics are an afterthought of the project itself. The natural progression of a system that moves from innovation to incubation to migration (or the SDLC of choice) is to eventually measure the impact and value added to the business. What is it about metrics that we love and love to hate? Metrics tend to create absolutes, whereby performance is defined by numerical standards that are not easily manipulated. Of course, success is defined by more than strict numerical analysis, but organizations had better be sure the implementation includes a solid collection of metrics.

Many metrics are simply irrelevant to the meta data work being done or have no direct impact on the long-term success of the program. Information gets gathered, but no action is taken as a result. Take a look at the performance metrics in the repository and ask yourself, "When was the last time we took an action based on this number?"

Many times metrics are used as a weapon against the staff member. Dr. W. Edwards Deming often said, "We need to drive fear out of the workplace," but most performance measurement systems do exactly the opposite. When management does act on a metric, they don't always look at the business process. Instead, they focus on someone, some other department or some outside factor to "blame," causing people to game the system and point their fingers elsewhere when problems arise. Often the metrics selected are too high level to provide information anyone can act on before problems develop. In other cases, an important result gets looked at, but it is impacted by so many variables that it is difficult to determine the degree of correlation. For example, is a three percent decrease in the content growth rate due to improvements in the process or to variability in the measurement system? The metrics show a result, but by the time a problem is discovered, it may already be too late to take corrective action. In the world of meta data, there are an infinite number of possible metrics. This article defines the foundation metrics that each and every meta data implementation should review in detail: content and usage.

Content Metrics

Content metrics describe what information we have inside the repositories. Without considering how the data is used, content focuses on the what. Perhaps the most obvious example of a content metric is the object count. An object count sounds like a simple concept, except for the fact that there are multiple methods for defining what qualifies as an object. In the database world, there are plenty of options to consider when counting objects: entities, attributes, tables, databases, fields and specific components of meta data descriptors. Should a table count as one asset, or do we break the object down by the number of fields within the table? When counting logical assets, do we count the logical model, add the entities and attributes, or is that considered double counting? There is no real answer here, other than: it depends on the organizational requirements. The essential element for content metrics is consistency of delivery and definition.
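The counting ambiguity above can be made concrete. The sketch below is a minimal illustration, assuming a hypothetical `Table` structure (nothing here is from a real repository API); it shows two defensible object counts over the same schema, which is why one definition must be fixed organization-wide:

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """A hypothetical catalog entry: one table and its fields."""
    name: str
    fields: list = field(default_factory=list)

def count_tables(schema):
    # Definition A: each table is one object.
    return len(schema)

def count_fields(schema):
    # Definition B: each field within each table is one object.
    return sum(len(t.fields) for t in schema)

schema = [
    Table("customer", ["id", "name", "region"]),
    Table("order", ["id", "customer_id", "total"]),
]

print(count_tables(schema))   # 2 objects under definition A
print(count_fields(schema))   # 6 objects under definition B
```

Either definition is workable; reporting both interchangeably is what destroys the trend line.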

We can measure the breadth and scope of these meta data elements for each object type, as well as the percentage of completeness for the model itself. Some objects may have an extended meta-model with 20 meta data elements, while others may only contain a few. The number of attachments is another measurement we can take on a specific asset; the thinking here is that objects with extended unstructured documentation are better understood than those with only a few attachments. Examples of attachments include logical models, UML models, user guides and installation instructions.
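The completeness percentage mentioned above is simple to compute. A minimal sketch, assuming a flat dict-based asset record and an illustrative five-element meta-model (the element names are assumptions, not from any standard):

```python
# Assumed meta-model: the elements every asset record should populate.
META_MODEL = ["name", "definition", "steward", "source_system", "update_frequency"]

def completeness(asset: dict) -> float:
    """Percentage of meta-model elements populated for one asset."""
    filled = sum(1 for element in META_MODEL if asset.get(element))
    return 100.0 * filled / len(META_MODEL)

asset = {
    "name": "customer",
    "definition": "A party that purchases goods or services",
    "steward": "J. Smith",
}
print(f"{completeness(asset):.0f}% complete")  # 3 of 5 elements -> 60% complete
```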

The relationship is another content metric that most implementations fail to measure. Relationships between assets fall into a wide variety of classifications: assimilation, classification, semantic and activity based. The most obvious is the assimilation relationship, which basically states that one asset is directly, systematically and purposefully related to another. The classification relationship is a basic domain-based relationship; for example, all ".xls" files on my personal computer are related by the classification that they are Excel files. The semantic relationship could be considered a classification relationship on steroids: where the classification relationship focuses on an exact match within a meta data domain, the semantic relationship moves beyond that and allows for a variety of semantic techniques. Activity-based relationships are created by reviewing the usage metrics, which in turn create classifications such as "most popular" or "top ten downloads."
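The classification relationship in the ".xls" example above can be sketched in a few lines. This is an illustrative grouping by file extension, not a feature of any particular repository product:

```python
from collections import defaultdict
from pathlib import PurePath

def classify_by_extension(paths):
    """Group assets by file extension -- a simple classification relationship."""
    groups = defaultdict(list)
    for p in paths:
        groups[PurePath(p).suffix.lower()].append(p)
    return dict(groups)

files = ["budget.xls", "forecast.XLS", "model.uml", "readme.txt"]
print(classify_by_extension(files))
# {'.xls': ['budget.xls', 'forecast.XLS'], '.uml': ['model.uml'], '.txt': ['readme.txt']}
```

A semantic relationship would replace the exact-suffix match with fuzzier techniques (synonyms, ontologies), which is the "on steroids" step the text describes.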

Content metrics should be captured on a monthly basis and evaluated with trend analysis software, which examines the information over an extended period of time. Ideally, the collection process should be automated and able to capture a snapshot at any point in time. What growth percentage should be applied to the content metrics? Again, long-term success is not defined by the explosion of growth in the first year but by the subsequent three to five years. The first few years may very well show triple-digit growth, but sustaining this over the long term is nearly impossible.
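The trend analysis described above reduces, at its simplest, to month-over-month growth rates on the captured counts. A sketch with made-up numbers (no real repository data):

```python
def growth_rates(monthly_counts):
    """Month-over-month growth percentages for a content metric."""
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(monthly_counts, monthly_counts[1:])
    ]

# Hypothetical object counts captured at the end of each month.
counts = [1000, 1200, 1260, 1310]
print([f"{g:.1f}%" for g in growth_rates(counts)])  # ['20.0%', '5.0%', '4.0%']
```

The tapering rates in the example mirror the article's point: early triple-digit or double-digit growth settles down, and it is the later, smaller numbers that test the program.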

Usage Metrics

The other key metric is usage. Remember, you can have all of the content in the world, but without usage you haven't done much more than build a nice inventory. Usage is the key to delivering long-term value-add to the organization.

The first usage metric class is focused on the user. Many Web-based applications utilize three high-level classifications for user traffic. A "hit" is each individual file sent to a browser by the Web server. A "page view" is each time a visitor views a Web page on your site, irrespective of how many hits are generated. Web pages are composed of files, and every image in a page is a separate file. When a visitor looks at a page (i.e., a page view), they may see numerous images, graphics and pictures, and generate multiple hits. For example, if a page contains 10 pictures, then a request to the server to view that page generates 11 hits (10 for the pictures and one for the HTML file). A page view can contain hundreds of hits, which is why we measure page views and not hits.

Additionally, there is a high potential for confusion, because there are two types of "hits." The hits we are discussing in this article are the hits recorded by log files and interpreted by log analysis. A second type of "hit" is counted and displayed by a simple hit counter, which records one hit every time a Web page is viewed; this is also problematic because it does not distinguish unique visitors. The third classification is the visitor: a human being whose actions are "human" events, because only humans navigate the Internet (Opentracker, 2005).

The best form of usage is probably page views, and since most repository applications are Web based, this shouldn't be hard to capture. The question that needs to be asked is: what does the page view traffic look like over time? Do you set a standard for growth and work on tasks that can help build the traffic?
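The hit versus page-view arithmetic above can be checked by counting log records. A minimal sketch, assuming a plain list of requested file paths (not the output of any real log-analysis tool), where only HTML files count as pages:

```python
PAGE_EXTENSIONS = {".html", ".htm"}  # assumption: only HTML files count as page views

def hits_and_page_views(requests):
    """Every requested file is a hit; only HTML requests count as page views."""
    hits = len(requests)
    page_views = sum(
        1 for r in requests
        if any(r.lower().endswith(ext) for ext in PAGE_EXTENSIONS)
    )
    return hits, page_views

# One page with 10 pictures: 11 hits but only 1 page view, as in the text.
requests = ["/gallery.html"] + [f"/img/pic{i}.gif" for i in range(10)]
print(hits_and_page_views(requests))  # (11, 1)
```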
Our organization uses the 10% bar to measure our progress over a 12-month time period. If our traffic grows at or above 10%, we have exceeded our objective. Those repositories that fail to deliver 10% must be reviewed in order to determine the long-term viability of the application itself.
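The 10% bar described above is a one-line comparison once the 12-month traffic numbers are in hand. A sketch with illustrative figures (the threshold comes from the text; the traffic numbers are invented):

```python
GROWTH_BAR = 10.0  # the 10% annual objective described in the text

def meets_objective(views_start, views_end):
    """Did 12-month page-view traffic grow at or above the bar?"""
    growth = 100.0 * (views_end - views_start) / views_start
    return growth >= GROWTH_BAR, growth

ok, growth = meets_objective(50_000, 56_500)
print(ok, f"{growth:.1f}%")  # True 13.0%
```

A repository that comes back `False` here is, per the article, a candidate for a viability review rather than automatic shutdown.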

We can also track the length of time a person stays on the repository, what time of day is most popular and which day gets the heaviest traffic. These time-based metrics help ensure the repository is up and operational 100% of the time, especially during high-traffic periods. Now, if we move away from the user and focus attention on the actual page or artifact, other metrics provide insight. We can tell which asset pages are viewed the most and which artifacts have the highest download rates. These simple metrics may alter the way you present artifacts and even generate new classifications. Having links on the repository that represent "most popular," "most downloaded" or "latest additions" adds value to the meta data environment. These classifications are defined as usage-based classifications; in other words, the use of the repository actually defines the classification of the assets. Assuming your repository has some advanced features, you can measure how many subscriptions each asset has, how many transactions a component processes or what the reuse level is within the application. Remember, you can generate any number of metrics, but we should only focus on the ones that generate action, support the expansion of the brand and help managers understand the environment.
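A usage-based classification such as "most downloaded" falls directly out of the raw event stream. A minimal sketch, assuming a flat list of download events keyed by artifact name (the file names are invented):

```python
from collections import Counter

def top_assets(download_log, n=10):
    """Derive a usage-based classification ('top downloads') from raw events."""
    return Counter(download_log).most_common(n)

# Hypothetical download events, one entry per download.
log = ["erd.pdf", "glossary.xls", "erd.pdf", "uml.zip", "erd.pdf", "glossary.xls"]
print(top_assets(log, n=2))  # [('erd.pdf', 3), ('glossary.xls', 2)]
```

The same pattern, applied to page-view events instead of downloads, yields the "most popular" link.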

The discussion of content and usage leads one to ask a simple question: which comes first? Does adding more content to the repository lead to increased usage, or does more usage lead to more content? The obvious answer is that both statements are correct. Content and usage go hand in hand, and once one of the two metrics begins to fall, the other won't be far behind. Keep things simple: focus on three to four key metrics and ensure consistent reporting. The one saving grace with metrics is that they lack emotion and simply exist to be reported and acted upon.



R. Todd Stephens, Ph.D., is the director of the Meta Data Services Group for the BellSouth Corporation, located in Atlanta, Georgia. He has more than 20 years of experience in information technology and speaks around the world on meta data, data architecture and information technology. Stephens recently earned his Ph.D. in information systems and has more than 70 publications in the academic, professional and patent arena. You can reach him via e-mail at Todd@rtodd.com, or visit http://www.rtodd.com/ to learn more.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.