Ontology Management for Federal Agencies

Article published in DM Direct Newsletter
June 17, 2005 Issue

By Joram Borenstein and Rex Brooks

The continued accumulation and creation of data, together with ongoing attempts to turn this data into information, can both liberate and paralyze large organizations and agencies. In this regard, the private and public (local, state and federal) sectors are no different from one another. Both face external realities and internal needs jockeying for the attention of technical and business executives alike.

What emerges, therefore, is a classic knowledge management quandary. How are these bodies to make progress, fulfill new business requirements and achieve new objectives while ensuring that the classification, understanding and application of data continue uninterrupted? Accumulating and creating data is not an inherently complicated science. Yet the sheer volume involved leaves many enterprises struggling to find ways to manage and understand this data properly.

Typical Pain Points: Bottlenecks, Chokepoints and Vocabulary Mismatches That Mean Data Model Mismatches

Organizations with data and information management headaches tend to share a fairly similar set of characteristics. Many have numerous applications, data-entry points, formats and ways of managing data. Such organizations also typically possess few, if any, overarching views into the data. Moreover, they often cannot compare and contrast data, sometimes even when it is stored in the same format or location. Finally, they usually lack a centralized mechanism for cataloging data.

These characteristics lead to predictable problems. First, without an ability to compare and contrast data, redundancies cannot be identified. Second, without a way to search for resemblances among data, semantic inconsistency inevitably arises in some form, whether through confusion, overlap, improper use of language or terminological ambiguity. Third, data and underlying data models that cannot be properly understood and documented usually remain unclear for years to come, leaving users little opportunity to implement badly needed improvements.

Specific Government (Federal and State) Information Management Problems

As indicated above, both the private and public sectors suffer from some or all of these problems in one way or another, and the Federal Government is, unfortunately, no stranger to them. If only because of its enormous workforce, disparate data-entry points, and competing data formats and data models, the Federal Government faces enormous hurdles in its overall data and information management and architecture (although, it is important to point out, some of these have been resolved in recent years).

An Environmental Example

The information management problems that have recently arisen within, or in relation to, the Federal Government vary widely in scope. One recent example involved constructing an online system for U.S. residents interested in learning about environmental toxins and other hazards, regardless of their source or of who tracks and identifies them. The project is all the more complicated because it involves data from the Environmental Protection Agency (EPA) as well as from other federal, state and local bodies. Some localities are covered by several of these agencies; others are tracked by only one of them.

By relating disparate data assets in various formats (including Microsoft Excel spreadsheets, structured files and Microsoft Access files) to one another through an ontology model (described in more detail below), a Web portal interface was constructed. Using this interface, users will one day be able to query various data sources and gain immediate knowledge of environmental health information.

A prime example would be an individual seeking to understand regional or local environmental problems before purchasing a home. Another would be a parent intent on understanding a child's possible level of environmental exposure. Finally, local officials or industry lobbying groups might apply or lobby for additional cleanup funding if they discover that certain areas are more polluted than others.
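The core idea in the environmental example, relating differently structured sources through a shared ontology model, can be sketched in a few lines. This is a minimal, hypothetical Python illustration and not the system described above: the source names, field names, concept names and sample rows are all invented for the sketch, and a real deployment would express the ontology in a language such as OWL rather than in Python dictionaries.

```python
from typing import Dict, List

# Hypothetical ontology concepts shared by every source (illustrative only).
CONCEPTS = {"Location", "Contaminant", "MeasuredLevel"}

# Hypothetical mappings from each source's local field names to ontology concepts.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "state_spreadsheet": {"Locality": "Location", "Chemical": "Contaminant", "PPM": "MeasuredLevel"},
    "local_access_db": {"site_city": "Location", "agent_name": "Contaminant", "conc_ppm": "MeasuredLevel"},
}

# Every mapping must point at a concept the ontology actually defines.
for source_name, mapping in SOURCE_MAPPINGS.items():
    assert set(mapping.values()) <= CONCEPTS, f"unknown concept in {source_name}"


def to_concepts(source: str, row: Dict[str, object]) -> Dict[str, object]:
    """Rename a source row's fields to ontology concepts; unmapped fields are dropped."""
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[field]: value for field, value in row.items() if field in mapping}


def records_for(records: List[Dict[str, object]], location: str) -> List[Dict[str, object]]:
    """Query all sources at once using only ontology concept names."""
    return [r for r in records if r.get("Location") == location]


# Hypothetical sample rows standing in for spreadsheet and database extracts.
raw_rows = [
    ("state_spreadsheet", {"Locality": "Springfield", "Chemical": "Lead", "PPM": 0.02}),
    ("local_access_db", {"site_city": "Springfield", "agent_name": "Benzene",
                         "conc_ppm": 0.005, "inspector": "J. Doe"}),
    ("local_access_db", {"site_city": "Shelbyville", "agent_name": "Lead", "conc_ppm": 0.01}),
]
unified = [to_concepts(src, row) for src, row in raw_rows]

for record in records_for(unified, "Springfield"):
    print(record["Contaminant"], record["MeasuredLevel"])
```

The design choice worth noting is that the query function never sees source-specific field names: adding another agency's data means adding a mapping, not rewriting queries.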

An Aviation Security Example

An equally compelling information management problem, encountered daily by local and federal government officials, revolves around aviation security. In a post-September 11th world (and, for that matter, before September 11th as well), assessing security risks among passengers departing from and arriving in the United States remains an ongoing challenge. As in the environmental example above, the data for assessing passenger threat levels comes from a multitude of sources and in a multitude of formats. Only by organizing and comparing the useful pieces of data can Federal Aviation Administration (FAA) and Transportation Security Administration (TSA) officials reduce passenger check-in times, eliminate false positives and ensure a working aviation system nationwide.

In a recent implementation aimed at this problem, the FAA used the diverse information sources available to it to compare data, uncover relationships among disconnected pieces of information and make a higher percentage of correct assumptions about passenger threats. Analyzing this data, and merging and using the relevant parts of the FAA's broad information capabilities, makes an increasingly complex task more manageable and more successful.

From Information Superhighway to Citizen-Centric Government: Historical Perspective, Legislation, Recent Initiatives and Mandates

The Clinger-Cohen Act of 1996 (originally known as the Information Technology Management Reform Act, or ITMRA) repealed the Brooks Automatic Data Processing Act of 1965, an amendment to the Federal Property and Administrative Services Act of 1949 that had authorized and directed the Administrator of the Federal Government's General Services Administration (GSA) to coordinate and provide for the economic and efficient purchase, lease and maintenance of automatic data processing equipment by Federal agencies. Clinger-Cohen was adopted to let the various departments and agencies of the Federal Government focus more effectively on the particular scopes and purposes of their more narrowly defined domains of activity and knowledge, rather than on the large-scale viewpoint that GSA's perspective allows.

This reform represented a sea change in the way the Federal Government conducts IT business, prompted by the rapid development of the World Wide Web (hereafter the "Web") and the emergence of this electronic medium as a major global channel for the flow of information. It should be noted that, at the time, use of the Web had not yet grown to the extent it has since. Yet, even if it was not thought of in this way then, the Clinger-Cohen Act was broad enough to accommodate the Web's rapidly expanding adoption. Its provisions thus allow citizens and the agencies that serve them to adjust to the new and changing ways in which relations with the larger national community can now be conducted. This is not to say that all hurdles have been overcome, but a good start has been made.

Specifically, Clinger-Cohen assigned authority for managing IT within the Federal Government to the Director of the Office of Management and Budget (OMB). Further, it set forth a procedure requiring these semi-independent IT investments to have explicit goals and objectives, against which the investments must in turn be assessed on a regular (typically annual) basis.

The practical net result is that agencies must at least attempt to maintain a coherent IT strategy. Several successive pieces of legislation have contributed to the current need for these agencies and departments to develop an Ontology Management Policy, including:

  • The Government Performance and Results Act of 1993 required all departments and agencies to prepare a strategic plan, with annual reviews, that sets explicit goals and objectives and specifies outcomes against which results can be measured and tracked;
  • The Paperwork Reduction Act of 1995 reinforced the role of the Office of Information and Regulatory Affairs (OIRA) in OMB, originally created by the Paperwork Reduction Act of 1980. This office reviews all government agency information collections and requires that departments and agencies collect only information that is necessary and has practical utility;
  • The Government Paperwork Elimination Act of 1998 allowed "electronic information submission and transmission" in interactions with government agencies and allowed departments and agencies to maintain electronic records;
  • The E-Government Act of 2002 enhanced the management and promotion of electronic government services and processes by establishing a Federal Chief Information Officer (CIO) within OMB and by establishing a broad framework of measures that require using Internet-based information technology to enhance citizen access to government information and services.

Below, we survey the circulars issued by OMB that set the guidelines federal departments and agencies must use to establish their goals and objectives, describe how these will be measured and justify the IT investments that fulfill the obligations of the foregoing legislation. First, however, let's look at the keystone around which this overall effort pivots: the Federal Enterprise Architecture Framework (FEAF), which was developed in response to Clinger-Cohen and established in 1999 with the publication of Version 1.1 by the Federal Chief Information Officers Council.

The primary tools the FEAF provides are five reference models that define terminology and associated data types for the broad areas of IT common across the government. Together they comprise the major components of the Federal Enterprise Architecture Management System (FEAMS):

  • FEA Business Reference Model (BRM)
  • FEA Service Component Reference Model (SRM)
  • FEA Performance Reference Model (PRM)
  • FEA Technical Reference Model (TRM)
  • FEA Data Reference Model (DRM)

Through the BRM, FEAMS provides a set of lines of business (LOB) common to many departments and agencies. Analyses based on these specific lines of business are required under some of the following OMB Circulars, which implement the legislation discussed above and provide for the Capital Planning and Investment Control (CPIC) required by Clinger-Cohen:

  • OMB Circular A-11 - Preparation, Submission and Execution of the Budget
  • OMB Circular A-19 - Legislative Coordination and Clearance
  • OMB Circular A-94 - Discount Rates to be Used in Evaluating Time-Distributed Costs and Benefits
  • OMB Circular A-97 - Specialized or Technical Services for State and Local Governments
  • OMB Circular A-131 - Value Engineering
  • OMB Circular A-16 - Coordination of Geographic Information and Related Spatial Data Activities
  • OMB Circular A-89 - Federal Domestic Assistance Program Information
  • OMB Circular A-130 - Management of Federal Information Resources
  • OMB Circular A-119 - Adoption of Voluntary Consensus Standards

The lines of business and the agencies specifically tasked with developing these business cases for fiscal year 2006 are:

  • Financial Management (FM): Department of Energy, Department of Labor
  • Human Resources Management (HR): Office of Personnel Management
  • Grants Management (GM): Department of Education, National Science Foundation
  • Federal Health Architecture (FHA): Health and Human Services
  • Case Management (CM): Department of Justice

As of the first quarter of fiscal year 2005, plans for creating a centralized or federated set of repositories or registries to provide easier access to the IT organizing resources and data assets of the Federal Government have not yet been drafted, notwithstanding a Department of the Interior pilot project, ET.gov, to create an emerging technologies Web site for reference throughout the government. There is, however, positive impetus toward such a collection of resources in the Communities of Practice organized under the auspices of the Federal Chief Information Officers Council (CIOC) and the Architecture and Infrastructure Committee (AIC), in active collaboration with cabinet departments such as the Department of Justice, Department of Homeland Security, Department of Defense, Department of the Interior and Department of Transportation. These are the XML Community of Practice (XMLCoP), http://xml.gov; the Semantic Interoperability Community of Practice (SICoP), http://web-services.gov; the Knowledge Management Community of Practice (KMCoP), http://km.gov; and the Collaborative Expedition Workshop Series (CEW), http://ua-exp.gov, which has since moved its ongoing collaborative efforts to http://colab.cim3.net (reachable under the Wiki link or the COLAB link on the http://ua-exp.gov site).

Additionally, several efforts are under way to consolidate meta data about IT resources and data assets, such as the Department of Homeland Security Meta Data Center of Excellence and the Department of Defense Meta Data Registry:

  • DoD Architecture Framework (DoDAF), previously the C4ISR Architecture Framework, including the DoD Meta Data Repository

Another element that distinguishes the Federal Government from the commercial sector is the need to comply with legislation and with the budgetary realities that follow from it. The goal of this effort is to bring compliance to bear on agency budgets so that reuse of existing and similar reference models increases.

As described previously, over the past decade the Federal Government has introduced a number of new mechanisms for first proposing and then enforcing the need for a cross-government federal enterprise architecture, the FEAF. This was accomplished primarily through legislation passed by Congress in 1996 that outlined the benefits of this task: the Information Technology Management Reform Act (ITMRA), now known as the Clinger-Cohen Act.

It is also important to acknowledge the contributions of parallel efforts under way in other departments of the federal government. For instance, the TEAF (Treasury Enterprise Architecture Framework) and DoDAF (DoD Architecture Framework; formerly C4ISR) are each related to the FEAF yet focused primarily on resolving architectural difficulties within the Departments of the Treasury and Defense, respectively. While both TEAF and DoDAF relate to FEAF (albeit in different ways), DoDAF is the more relevant here because it is newer and also includes an underlying repository of architectures, the DoD Architecture Repository System (DARS). DARS enables reuse of existing technical resources, in this case specifically the various enterprise architectures used across and within the Department of Defense.

FEAF, FEAPMO Return to AIC and CIOC

More recently, in 2004, OMB decided to return the Federal Enterprise Architecture Framework and its Program Management Office (FEAPMO) to the jurisdiction and oversight of the Architecture and Infrastructure Committee and the Chief Information Officers Council, as the more appropriate arena in which to conduct this effort. Accordingly, its Web site has been relocated to www.egov.gov. You may also access FEAMS from www.egov.gov or directly from www.feams.gov.

The importance of these developments is that the way is now paved for using the emerging toolsets of the Semantic Web, a term coined by Sir Tim Berners-Lee to describe a coherent, structured approach to resources available through the Web. In this case, those toolsets offer a path to a more economical IT future for the U.S. Federal Government. To understand this more thoroughly, agencies and departments need to become more familiar with the following concepts and standards.

Ontologies

Ontologies are descriptive models that enable users to describe a specific knowledge domain accurately and unambiguously. Ontologies are typically defined in one of two main ways: either as "a branch of metaphysics concerned with the nature and relations of being" (Merriam-Webster) or, more generally, as "an explicit specification of a conceptualization" (Tom Gruber). Interestingly, The Free On-Line Dictionary of Computing defines an ontology as "an explicit formal specification of the objects, concepts and other entities that exist in some area of interest and the relationships that hold among them." For our purposes, the distinction between these definitions is not crucial. What matters most is the relationships that hold among the entities, because it is through those relationships that resources are quickly found and used within specific domains, and because those relationships allow inference and rule engines to supply abstract knowledge resources with operational data and perform useful tasks.

The concept of ontology originally emerged from philosophy, where it has been studied for thousands of years, notably by the ancient Greeks. In the 5th and 4th centuries BC, Socrates, Plato and Aristotle (among others) pioneered philosophy, logic and metaphysics. Ontologies began seeing hands-on use only when computer science departments and academics started applying them to problems across academia, often in concert with artificial intelligence projects. In the past 15 years, academic groups have increasingly attempted to turn ontologies into commercial products, a number of which are on the market today.

Ontologies are commonly used to provide a shared vocabulary for the entities in a domain and for the relations between those entities. They are also excellent for capturing information about these entities, their characteristics and their interrelations. Lastly, ontologies do a good job of rationalizing disparate data sources.

In terms of current usage, ontologies are being applied by numerous private and public sector bodies. A number of these groups have even posted their ontology models online for perusal, distribution and use (see, for instance, the DAML Ontology Library at http://www.daml.org/ontologies/). Such organizations include Verizon, CyCorp, Lockheed Martin, Booz-Allen Hamilton, Yale, Stanford and the Massachusetts Institute of Technology (MIT).

Upper Ontology

Upper ontologies (sometimes referred to as foundation ontologies) have been defined as "a hierarchy of entities and associated rules ... that attempts to describe those general entities that [do and] do not belong to a specific problem domain" (source: Wikipedia). Upper ontologies sit one level of abstraction above domain-specific ontologies. Because they are general, they are more generic than those ontologies and do not define specific domain knowledge concepts.

Domain Ontologies

A domain ontology is any ontology that defines the specialized knowledge relevant to a specific domain. For example, one domain ontology might capture knowledge about banking and serve the financial industry; others might cover legislation, automobiles or health care.
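The division of labor between an upper ontology and domain ontologies can be illustrated with a small sketch in which domain classes are attached beneath generic upper-level classes. The class names are hypothetical, chosen to echo the banking and health care domains mentioned above; they are not taken from any published upper ontology, and plain Python dictionaries stand in for a real ontology language.

```python
from typing import Dict, Optional, Set

# Hypothetical upper ontology: generic concepts with no domain-specific knowledge.
# Each entry maps a class to its parent class (None marks the root).
UPPER: Dict[str, Optional[str]] = {
    "Entity": None,
    "Agent": "Entity",
    "Artifact": "Entity",
    "Event": "Entity",
}

# Hypothetical domain ontologies that specialize upper-level concepts.
BANKING: Dict[str, Optional[str]] = {"Bank": "Agent", "Loan": "Artifact", "Deposit": "Event"}
HEALTH_CARE: Dict[str, Optional[str]] = {"Hospital": "Agent", "Vaccination": "Event"}

# The domains can be merged because every domain class hangs off a shared upper class.
ONTOLOGY: Dict[str, Optional[str]] = {**UPPER, **BANKING, **HEALTH_CARE}


def ancestors(cls: str) -> Set[str]:
    """Return every class above `cls`, following parent links up to the root."""
    found: Set[str] = set()
    parent = ONTOLOGY[cls]
    while parent is not None:
        found.add(parent)
        parent = ONTOLOGY[parent]
    return found


def specializations_of(upper_cls: str) -> Set[str]:
    """Find classes from any domain that specialize a given upper-level class."""
    return {c for c in ONTOLOGY if upper_cls in ancestors(c)}


print(sorted(specializations_of("Agent")))  # ['Bank', 'Hospital']
print(sorted(specializations_of("Event")))  # ['Deposit', 'Vaccination']
```

A single question phrased in upper-level terms ("which agents do we know about?") reaches across both domains, which is the practical payoff of a shared upper layer.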

Taxonomies

Like ontologies, taxonomies classify items in a structured manner. Taxonomies are traditionally hierarchical, often taking the form of what is referred to as a tree structure. Unlike ontologies, however, taxonomies lack mechanisms such as business rules, indirect properties and the like. This and other differences make taxonomies less likely than ontologies to be used for enterprise information technology (although it is important to point out that some enterprises have used tree-like structures to describe knowledge domains).
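To make the taxonomy/ontology distinction concrete, here is a minimal sketch contrasting the two. The taxonomy can only answer "what is this a kind of?", while the ontology version adds a typed, non-hierarchical relationship (a hypothetical `locatedIn` property) and one rule (`locatedIn` is transitive) that lets new facts be inferred. All names are illustrative assumptions, and the rule engine is a naive fixed-point loop rather than a production reasoner.

```python
from typing import Dict, List, Set, Tuple

# A taxonomy: a tree of "is a kind of" links and nothing else.
TAXONOMY: Dict[str, str] = {
    "Superfund Site": "Hazard Site",
    "Landfill": "Hazard Site",
    "Hazard Site": "Place",
}


def kind_of(term: str) -> List[str]:
    """Walk up the tree; this is essentially the only question a taxonomy answers."""
    chain: List[str] = []
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain


# An ontology adds typed relationships between individuals (hypothetical facts)...
Fact = Tuple[str, str, str]
facts: Set[Fact] = {
    ("Site_42", "locatedIn", "Springfield"),
    ("Springfield", "locatedIn", "Example County"),
    ("Example County", "locatedIn", "Example State"),
}


# ...and rules over them. Rule: the given property is transitive.
def apply_transitivity(facts: Set[Fact], prop: str) -> Set[Fact]:
    """Repeatedly add (a, prop, c) whenever (a, prop, b) and (b, prop, c) hold."""
    closed = set(facts)
    while True:
        new = {
            (a, prop, c)
            for (a, p1, b) in closed if p1 == prop
            for (b2, p2, c) in closed if p2 == prop and b2 == b
        } - closed
        if not new:
            return closed
        closed |= new


print(kind_of("Superfund Site"))  # ['Hazard Site', 'Place']
inferred = apply_transitivity(facts, "locatedIn")
print(("Site_42", "locatedIn", "Example State") in inferred)  # True
```

The inferred fact that the site lies in the state was never stated; it follows from the rule, which is exactly the kind of capability a plain tree cannot express.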

Resource Description Framework (RDF) and RDF Schema

The Resource Description Framework (RDF) is part of the World Wide Web Consortium's (W3C) Semantic Web Activity. Whereas XML is used to specify vocabularies or languages, RDF is a language commonly used to describe Web-based information about resources in a common manner. RDF statements are composed of resources, properties and values, more commonly referred to as a subject, predicate and object; this three-part tuple is known as a "triple." One article describes RDF as "an effort to identify these common threads and provide a way for Web architects to use them to provide useful Web meta data without divine intervention." (Source: http://www.xml.com/pub/a/2001/01/24/rdf.html?page=2) RDF Schema, on the other hand, is used to define RDF vocabularies, that is, the classes and properties that RDF statements use and how they relate to one another.
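RDF's subject-predicate-object model is simple enough to sketch without any library. The following Python example is an illustration, not a real RDF toolkit: it stores triples as tuples, answers a pattern query in which `None` acts as a wildcard, and prints matching statements in N-Triples form (valid here only because every object is a URI; literal values would need quoting). The `example.org` namespace and the property names are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/env#"  # hypothetical namespace for this sketch

graph: Set[Triple] = {
    (EX + "Site_42", EX + "hasContaminant", EX + "Lead"),
    (EX + "Site_42", EX + "locatedIn", EX + "Springfield"),
    (EX + "Site_7", EX + "hasContaminant", EX + "Benzene"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None matches any value in that position."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Which sites have which contaminants? (subject and object left open)
for s, p, o in sorted(match(p=EX + "hasContaminant")):
    print(f"<{s}> <{p}> <{o}> .")
```

Because every fact has the same three-part shape, one generic `match` function serves any question; this uniformity is what lets RDF data from different sources be merged simply by taking the union of their triples.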

OWL

OWL (Web Ontology Language) is a language for specifying semantics that allows ontologies to be shared over the World Wide Web. It became a W3C Recommendation only in February 2004, as part of that organization's Semantic Web Activity (http://www.w3.org/2001/sw/). It is also worth noting that OWL has three distinct sublanguages: OWL Lite, OWL DL and OWL Full. Finally, from a historical perspective, OWL is expressed through RDF and represents an outgrowth of both RDF and an earlier ontology language, DAML+OIL.
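Because OWL is expressed through RDF, an OWL class hierarchy is itself just a set of triples that use terms from the W3C RDF, RDF Schema and OWL namespaces. The sketch below writes a tiny, hypothetical environmental ontology that way and then applies two standard RDFS entailment rules: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of its superclasses. This is deliberately limited; a real OWL reasoner handles far more constructs, and the `example.org` names are assumptions for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Standard W3C namespaces used by RDF, RDF Schema and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/env#"  # hypothetical domain namespace

TYPE, SUBCLASS, CLASS = RDF + "type", RDFS + "subClassOf", OWL + "Class"

ontology: Set[Triple] = {
    (EX + "Hazard", TYPE, CLASS),
    (EX + "ChemicalHazard", TYPE, CLASS),
    (EX + "HeavyMetal", TYPE, CLASS),
    (EX + "ChemicalHazard", SUBCLASS, EX + "Hazard"),
    (EX + "HeavyMetal", SUBCLASS, EX + "ChemicalHazard"),
    (EX + "Lead", TYPE, EX + "HeavyMetal"),  # an individual, not a class
}


def entail(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules until no new triples appear:
    (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    (x type A) and (A subClassOf B)       => (x type B)
    """
    closed = set(triples)
    while True:
        subs = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in subs for b2, c in subs if b == b2}
        new |= {(x, TYPE, b) for x, p, a in closed if p == TYPE
                for a2, b in subs if a == a2}
        new -= closed
        if not new:
            return closed
        closed |= new


inferred = entail(ontology)
print((EX + "Lead", TYPE, EX + "Hazard") in inferred)  # True
```

The ontology never states that lead is a hazard; that conclusion is derived from the class hierarchy, which is why expressing ontologies in a shared, machine-readable form matters for information sharing across agencies.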

The Existing Web and New Ontology-Related Standards

Whereas HTML was designed primarily for presenting information, XML, OWL and RDF have been brought together (primarily by the W3C) to support the proper description of information content. This distinction is crucial to successful information sharing.

Our exploration of ontologies to this point has largely been confined to a historical and general perspective. We wanted to explain, in broad terms, what ontologies are, where they come from and how they can be useful in the context of the Federal Government's IT requirements. We also wanted to explain why adopting ontology-related technologies, underpinned by open and largely royalty-free standards, makes sense for most if not all federal departments and agencies. These standards also happen to fall within the loose bounds of the Voluntary Consensus Standards that OMB mandates for adoption where feasible.

We have provided a wealth of sound reasons why ontological management of IT resources will substantially reduce the burden of simply keeping track of those resources and therefore make them more useful to federal agencies and departments. However, as the saying goes, "the proof of the pudding is in the eating." The second part of our report will therefore examine how well-grounded management of IT resources through domain-specific ontologies, well integrated into the overall governmental IT structure envisioned by the Federal Enterprise Architecture Framework, will benefit agencies by lowering costs, reducing duplicated data and duplicated management efforts, increasing efficiency and delivering good quality service to the citizens who are every agency's major constituency and stakeholder group.


Joram Borenstein is director of marketing at Unicorn Solutions, working on enterprise semantic technologies. His previous experience includes managing the rollout of content management software platforms. He has written and lectured extensively on the Semantic Web, e-commerce, ontology modeling, Web services and grid technologies.

Rex Brooks, president of Starbourne Communications Design, has pursued an extensive and wide-ranging career in advertising art direction, corporate identity and graphic design. His ongoing interests have included the applications of computer technology in his field and applying concepts from the fields of psychology, sociology and advertising in the area of semantics and semiotics for the purposes of improving communications in digital information systems. This led to his involvement with the OASIS HumanMarkup Technical Committee, helping to create the Human Markup Language. He is the cofounder of the Content Development Working Group of the Web 3D Consortium and Humanmarkup.org, Inc. and serves as vice chair of the OASIS HumanMarkup Technical Committee. He is also actively serving on the OASIS Web Services for Remote Portlets and Emergency Management Technical Committees. You can reach him at rexb@starbourne.com.
© 2007 DM Review and SourceMedia, Inc. All rights reserved.