Integration Theory, Part 2

  Article published in DM Direct Special Report
September 6, 2005 Issue
 
  By Joram Borenstein and Russell Ruggiero

Part 1 of this Integration Theory series appeared in the August 12, 2005 issue of DM Direct and is available at http://www.dmreview.com/article_sub.cfm?articleId=1034584

Back in late 1999, XML was considered an immature technology with a very low adoption rate among both vendor and developer communities. Today, things have changed a great deal regarding mainstream adoption of XML. A case in point is the recent announcement that the next version of Microsoft Office will save files in XML by default. And while 1999 may be looked at as the pivotal year for XML adoption, we now believe a similar scenario regarding the Semantic Web is developing in 2005. As with any revolutionary or groundbreaking effort, adoption is predicated on a number of factors that include guidance, education, and the ability to deal with setbacks. In any event, the Semantic Web is a compelling concept that is building critical mass at a very impressive rate.

The Semantic Web

Much has been written about the Semantic Web, but individual interpretations often skew the original message. It is therefore best to go back to the original sources for an accurate depiction of the Semantic Web.

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, James Hendler, Ora Lassila, "The Semantic Web," Scientific American, May 2001).1

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming." 2

Stripped of multiple layers of media varnish, the message is clear: the Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Key enabling technologies include the previously mentioned RDF and the Web Ontology Language (OWL), a markup language for creating ontologies, which are a prime area of focus for the Semantic Web.
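
To make the "URIs for naming" idea concrete, here is a minimal sketch of RDF-style statements as subject-predicate-object triples, written out in the N-Triples line format. The example.org URIs, the Literal helper and the sample policy data are assumptions made for illustration; only the rdf:type URI is a real W3C identifier.

```python
# A minimal sketch: RDF-style statements as (subject, predicate, object) triples.
# All example.org URIs and the sample data are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real W3C URI
EX = "http://example.org/"


class Literal(str):
    """Marks an object as a plain text value rather than a URI (illustrative helper)."""


triples = [
    (EX + "policy/123", RDF_TYPE, EX + "schema/InsurancePolicy"),
    (EX + "policy/123", EX + "schema/heldBy", EX + "customer/42"),
    (EX + "customer/42", EX + "schema/name", Literal("Jane Doe")),
]


def to_ntriples(statements):
    """Serialize triples as N-Triples lines: URIs in <>, literals in quotes."""
    lines = []
    for s, p, o in statements:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def objects(statements, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(to_ntriples(triples))
```

Because every subject and predicate is a globally unique URI, two applications that have never seen each other's schemas can still agree on what a statement refers to, which is the sense in which data can be "shared and reused" across boundaries.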

How Ontologies Relate to the Semantic Web

To understand the Semantic Web comprehensively and accurately requires a brief overview of ontologies and their role vis-à-vis the ongoing and gradual creation of a Semantic Web. Without a consistent manner in which individuals and machines can process and understand information, the notion of a semantically reliable web of pages, documents, files, images and the like becomes nearly impossible. The early creators of the Semantic Web idea have understood this need for an underlying infrastructure since their work on the initiative in 2000-2001. The key enabling technology selected to serve as the underpinning of the Semantic Web is the ontology.

First, some definitions. The most frequently quoted definition of an ontology is the one originally authored by Tom Gruber, who describes an ontology as "a specification of a conceptualization." 3 While this is perhaps theoretical in nature, it nonetheless conveys the extent to which objects in a business or elements of a process workflow can be described accurately and granularly. When introducing this subject to non-technical audiences, it is advisable to explain an ontology as akin to a common business language, or an agreed-upon set of terms, relationships, and business rules reflecting the inner workings of an organization, department or line of business.

Commercially, ontologies are thought of in terms of their practical applications and benefits. In real-life applications, ontologies are typically relied upon for a number of uses, including but not limited to: information processing, sharing, comparison/reconciliation, searching and, perhaps most important, interoperability. Some industries have even begun thinking about and experimenting with standardizing industry models and messaging protocols on ontologies. Chief among such efforts are the insurance industry's Association for Cooperative Operations Research and Development (ACORD) standard, the U.S. Department of Defense's DoD Architecture Framework (DoDAF) and the utility industry's Common Information Model (CIM). The broad Semantic Web effort, however, does not draw distinctions among commercial, academic or consumer-oriented aspirations. Rather, it aims to bring together in a single vision the numerous benefits that ontologies offer, and to ensure the survival of that vision through a process of standardization.

On the first front, the benefits are multifold and touted in numerous articles and presentations. The most mainstream and frequently quoted one is the May 2001 article co-authored by Tim Berners-Lee, James Hendler, and Ora Lassila in Scientific American entitled simply "The Semantic Web." In this piece, the authors (all three of whom are Semantic Web pioneers known worldwide) explain that "Ontologies ... can be used in a simple fashion to improve the accuracy of Web searches ... [or for] more advanced applications ... to relate the information on a page to the associated knowledge structures and inference rules." They define ontologies as a "vocabulary for discussion" (also a good way to describe this technology to non-technical audiences). In a similar vein, a 2002 article stated, "Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing." 4

As for standardization, the most aggressive and mature effort to bring coherence to the ontology world is the one spearheaded by the W3C, specifically its work on OWL. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content by providing additional vocabulary along with a formal semantics ("OWL Web Ontology Language: Overview," http://www.w3.org/TR/2004/REC-owl-features-20040210/). As of February 10, 2004, OWL is an official W3C Recommendation.
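
One way to picture what "formal semantics" buys an application is class-hierarchy inference: if an ontology states that one class is a subclass of another, software can derive facts that were never written down explicitly. The sketch below hand-rolls that single rule (subclass transitivity, which OWL inherits from RDF Schema's rdfs:subClassOf) in plain Python. It is not an OWL reasoner, and the class names and the individual are hypothetical.

```python
# A minimal sketch of subclass inference, assuming a toy insurance ontology.
# Class names and the individual "policy123" are hypothetical.
SUBCLASS_OF = {
    "AutoPolicy": {"InsurancePolicy"},
    "InsurancePolicy": {"Contract"},
}

# Only the most specific type is asserted for the individual.
TYPE_OF = {"policy123": {"AutoPolicy"}}


def superclasses(cls, subclass_of):
    """Collect every direct and indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual, type_of, subclass_of):
    """Return asserted types plus every type implied by the class hierarchy."""
    asserted = type_of.get(individual, set())
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, subclass_of)
    return result


print(sorted(inferred_types("policy123", TYPE_OF, SUBCLASS_OF)))
```

A query for all contracts would now find policy123 even though no one ever recorded it as a Contract, which is the kind of machine interpretability that goes beyond presenting information to humans.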

Search Technologies and the Semantic Web

Some confusion currently exists regarding the capabilities and methodologies of popular search technologies and the common framework that supports the concept of the Semantic Web. Listed below are some important differences.

Popular Search Technologies

Offerings such as Ask Jeeves and Google are search engines that employ text-matching techniques to find pages. There is no set of defined standards, so queries are done in an ad hoc manner. However, the continued refinement and increased functionality of these offerings are making them more useful and popular. Two prime examples of increased functionality are: 1) the introduction of Zoom by Ask Jeeves, a categorization technology that enables users to expand or narrow searches based on conceptually related topics; and 2) the new Google Desktop Search program (http://desktop.google.com/), which allows users to hunt for e-mails, documents and other files on a hard drive.

The Semantic Web

The Semantic Web creates an underlying fabric that enables end users to perform a federated query that presents them with information on either a specific topic or an ontology. Rather than rely on an ad hoc methodology, the Semantic Web promotes a common framework that allows data to be shared and reused by leveraging open standards such as RDF and OWL. The ability to leverage open standards enables this common framework to access data across unlike applications or platforms, which in essence promotes the goal of interoperability.

This concept is all well and good, but how does building a common framework actually create a coherent data ecosystem? The key is meta data, which may be viewed as data about data, data about processes and data about platforms. Meta data enables computers to better understand the meaning of the information, which maps back to Berners-Lee et al.'s original message of "well-defined meaning, better enabling computers and people to work in cooperation." Theoretically, a common framework based on open standards can access and retrieve both structured data, which may be housed in a relational database, and unstructured data residing in an enterprise content management system (ECMS). As a result, end users can access and retrieve data that might otherwise have been unavailable to them through popular search technology such as Ask Jeeves or Google.
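
The federated-query idea can be sketched in a few lines: one request, expressed in a shared vocabulary, is fanned out to a structured source and an unstructured one, and meta data is what lets each source answer it. Everything here is an assumption made for illustration: the in-memory SQLite table stands in for a relational database, the tagged document list stands in for an ECMS, and the table, column and function names are hypothetical.

```python
import sqlite3

# Structured source: an in-memory SQLite table standing in for an RDBMS.
# The table and column names are hypothetical.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE cust (cust_nm TEXT, city TEXT)")
rdbms.execute("INSERT INTO cust VALUES ('Acme Corp', 'Boston')")

# Unstructured source: documents standing in for an ECMS, each carrying
# meta data that describes content the raw text does not expose on its own.
documents = [
    {"meta": {"customer": "Acme Corp", "kind": "contract"},
     "text": "Master services agreement, signed by both parties."},
    {"meta": {"customer": "Globex", "kind": "memo"},
     "text": "Internal memo on renewal pricing."},
]


def query_relational(customer):
    """Map the shared concept 'customer' onto the source-specific column."""
    rows = rdbms.execute(
        "SELECT cust_nm, city FROM cust WHERE cust_nm = ?", (customer,)
    )
    return [{"source": "rdbms", "customer": name, "city": city}
            for name, city in rows]


def query_content_store(customer):
    """Match on document meta data rather than on the document text."""
    return [{"source": "ecms", "customer": d["meta"]["customer"],
             "kind": d["meta"]["kind"]}
            for d in documents if d["meta"]["customer"] == customer]


SOURCES = [query_relational, query_content_store]


def federated_query(customer):
    """Send one request to every registered source and merge the answers."""
    results = []
    for source in SOURCES:
        results.extend(source(customer))
    return results


for hit in federated_query("Acme Corp"):
    print(hit)
```

Note that the contract document is found through its meta data even though the customer's name never appears in its text, which is exactly the case a pure text-matching search would miss.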

In addition, a common framework is far more coherent than a popular search technology when accessing sensitive data. For example, various data sources have different levels of security access. A common framework can apply a consistent set of business rules that grant the end user full access, limited access, or no access to one or more data sources.
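
A consistent rule set of this kind might look like the following sketch, in which a single table of rules decides whether a role gets full access, limited access, or no access to a given source. The roles, source names and field lists are hypothetical, and denying by default when no rule matches is a design assumption, not something the framework prescribes.

```python
from enum import Enum


class Access(Enum):
    FULL = "full"
    LIMITED = "limited"
    DENIED = "denied"


# One shared rule table for every data source (hypothetical roles and sources).
RULES = {
    ("analyst", "claims_db"): Access.FULL,
    ("analyst", "hr_records"): Access.LIMITED,
}

# Fields visible under limited access, per source (hypothetical).
LIMITED_FIELDS = {"hr_records": {"name", "department"}}


def access_level(role, source):
    """Look up the rule; anything not explicitly granted is denied (assumption)."""
    return RULES.get((role, source), Access.DENIED)


def fetch(role, source, record):
    """Return the record as the given role is allowed to see it."""
    level = access_level(role, source)
    if level is Access.DENIED:
        raise PermissionError(f"{role} may not read {source}")
    if level is Access.LIMITED:
        allowed = LIMITED_FIELDS.get(source, set())
        return {k: v for k, v in record.items() if k in allowed}
    return dict(record)


employee = {"name": "Jane Doe", "department": "Claims", "salary": 90000}
print(fetch("analyst", "hr_records", employee))
```

Because the rules live in one place rather than in each application, the same request is treated the same way regardless of which tool issued it.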

While different in many ways, both of these efforts provide value regarding information access and retrieval. Listed below are private and public scenarios that are meant to help differentiate between these two efforts.

New Meta Data-Focused Solutions

The Semantic Web will most likely be built in stages by various communities of practice, which will act as the building blocks for this effort. Rather than just presenting a concept, we must leverage open standards, software, and hardware solutions, combined with accepted best practices to help solve business problems. Listed below are three product areas that will be instrumental in building the Semantic Web.

  1. End User (New Breed of Search Technologies). Conventional search technologies are limited in their ability to properly query structured/non-structured and secure/non-secure data sources. Enter the new breed of search technologies that are focused on supporting meta data and a wide range of file formats. Because of its greater support for meta data and related open standards, this new genre of search technology is better suited to presenting ontological information to the end user.
  2. Coherent Layer or Framework Layer (New Breed of Integration Products). Enterprise information management (EIM) solutions create a coherent layer or framework layer above one or more data sources. In basic terms, these types of solutions are focused on enterprise architecture (EA) and meta data management to align meta data (describing current IT systems) with enterprise architecture (describing the business). This is accomplished by supporting a consistent set of business rules, combined with support for common functions and processes.
  3. The Data Sources (New Breed of Database Products). The ability to properly query "data" is a core component of creating the Semantic Web. Hence, support for the following three technologies by the major database vendors will be of paramount importance:
    • SQL/RDBMS: Transaction meta data is embedded or implicit in the application or database schema.
    • XQuery: XML wraps the meta data about the transaction around the data.
    • SPARQL/RDF: Enables semantics as well as syntax to be embedded in documents.
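
The difference among the three approaches is easier to see when the same fact, an order total, is stored and retrieved three ways. The sketch below is illustrative only and uses the Python standard library: sqlite3 for the SQL case, ElementTree path lookups standing in for XQuery (the standard library has no XQuery engine), and a hand-written triple-pattern matcher standing in for SPARQL. The order data, element names and example.org URIs are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# 1. SQL/RDBMS: the meaning of "total" is implicit in the schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
db.execute("INSERT INTO orders VALUES (1, 'Acme Corp', 250.0)")
sql_total = db.execute("SELECT total FROM orders WHERE id = 1").fetchone()[0]

# 2. XML: tags and attributes wrap meta data around the data itself.
#    ElementTree path lookups stand in for an XQuery expression.
doc = ET.fromstring(
    '<order id="1">'
    "<customer>Acme Corp</customer>"
    '<total currency="USD">250.0</total>'
    "</order>"
)
xml_total = float(doc.find("total").text)
xml_currency = doc.find("total").get("currency")

# 3. RDF: every statement names its own meaning with a URI.
#    match() is a toy stand-in for a SPARQL triple pattern; None is a wildcard.
EX = "http://example.org/"
triples = {
    (EX + "order/1", EX + "placedBy", EX + "customer/acme"),
    (EX + "order/1", EX + "total", "250.0"),
    (EX + "customer/acme", EX + "name", "Acme Corp"),
}


def match(pattern):
    """Return every triple that agrees with the non-None parts of pattern."""
    s, p, o = pattern
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


rdf_total = float(match((EX + "order/1", EX + "total", None))[0][2])

print(sql_total, xml_total, xml_currency, rdf_total)
```

In the SQL case a consumer must already know the schema; in the XML case the document carries its own labels; in the RDF case the labels are globally unique names that an ontology can further describe.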

The product areas mentioned above are all focused on creating a bidirectional meta data-based environment that leverages both semantic (RDF and OWL) and core Web services/SOA (e.g., WSDL, ebXML/UDDI, SOAP, etc.) open standards.

Bottom Line

In some form or another, the Semantic Web will be upon us sooner rather than later. The faster people realize this fact, the greater the chances of its widespread adoption. It is also fair to say that while the Semantic Web vision may progress in the manner outlined by Tim Berners-Lee, variations to this original vision are most likely to occur concurrently. Expect early adopters (e.g. academia, government, life sciences, etc.) to help build the core foundation for the Semantic Web in an organic manner, perhaps internally at first. Moreover, a Semantic Web (or Webs) will most likely be built in stages, rather than through a "big bang" approach. Accordingly, open standards via OASIS, the W3C and others will play a vital role in helping to shape and guide the overall effort.

Pundits may argue that the Semantic Web is not a feasible or practical undertaking. Many reputable analysts and reporters said the same of XML in the late 1990s. Prognostications aside, these people did not fully understand or appreciate the value of that groundbreaking new technology. Statements such as "vendors and customers will never adopt XML because it is far too complex" or "XML presents too much overhead" were quite common. As fate would have it, XML has now become the de facto standard. A similar scenario appears to be forming with regard to the Semantic Web. The Semantic Web promotes a common framework that allows data to be shared and reused by leveraging open standards such as RDF and OWL. The ability to leverage open standards enables this common framework to access data across unlike applications and platforms, which ultimately helps to promote the goal of true interoperability, both within organizations and across industries and partnerships.

References:

1. Source: http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2.
2. Source: http://www.w3.org/RDF/.
3. Source: http://www-ksl.stanford.edu/kst/what-is-an-ontology.html.
4. Doan, AnHai; Madhavan, Jayant; Domingos, Pedro; Halevy, Alon. "Learning to Map between Ontologies on the Semantic Web." 2002. Source: http://www.cs.washington.edu/homes/alon/site/files/glue.pdf.


Joram Borenstein is director of marketing at Unicorn Solutions, working on enterprise semantic technologies. His previous experience includes managing the rollout of content management software platforms. He has written and lectured extensively on the Semantic Web, e-commerce, ontology modeling, Web services and grid technologies.

Russell Ruggiero is a senior IT analyst. He is the acting chairman of HumanMarkup.org. Ruggiero has authored more than 150 articles and reports for well-respected firms that include Gartner, Inc. and Source Media. He may be reached at rrugg55041@aol.com.


