Data Management Association (DAMA):
The Next Data Management Frontier: Unstructured Data

By Data Management Association International - DAMAI
Column published in DMReview.com, June 22, 2006

This month's column is contributed by Patricia Cupoli.

At the April 2006 Wilshire Meta Data Conference/DAMA International Symposium, a number of presentations dealt with metadata, ontologies (organization of knowledge and terms), semantics, controlled vocabularies and taxonomy/classification. You may ask why these topics, typically associated with library and information science, document management, content management and knowledge management, were presented, since they do not seem typical for the data management professional. However, presentations of this type have been appearing more and more often over the last several years.

Data management professionals are becoming increasingly involved with an area called unstructured data. The term covers objects in both hard and soft media, such as emails, all types of text documents, graphic images, videos and Internet Web pages. These items do not fit naturally into the rows and columns of a database table or spreadsheet, but they can be stored in a relational DBMS as a BLOB (binary large object) or in XML files. Yet most unstructured data has some type of structure (and is then also known as semistructured data), which could provide metadata in adherence to a standard such as the Dublin Core (15 metadata elements in total, including title, creator (author) and description). This metadata could be stored in a relational database even if the object content itself is not in electronic format.
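As a rough illustration of that last point, the sketch below keeps Dublin Core metadata in a relational table for two objects, one of which exists only on paper. It uses Python's built-in sqlite3 module. The table name, the helper functions and the sample records are assumptions made up for this example; only the 15 element names come from the Dublin Core standard.

```python
import sqlite3

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def create_catalog(conn):
    """Create a hypothetical table with one text column per element."""
    columns = ", ".join(f'"{name}" TEXT' for name in DUBLIN_CORE_ELEMENTS)
    conn.execute(
        f"CREATE TABLE object_metadata (object_id INTEGER PRIMARY KEY, {columns})"
    )


def add_object(conn, **elements):
    """Insert one metadata record; reject names that are not Dublin Core."""
    if not elements:
        raise ValueError("at least one metadata element is required")
    unknown = set(elements) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    names = ", ".join(f'"{name}"' for name in elements)
    marks = ", ".join("?" for _ in elements)
    cursor = conn.execute(
        f"INSERT INTO object_metadata ({names}) VALUES ({marks})",
        tuple(elements.values()),
    )
    return cursor.lastrowid


# Made-up sample records: the first describes an object that is not electronic.
conn = sqlite3.connect(":memory:")
create_catalog(conn)
paper_id = add_object(
    conn,
    title="Vendor contract (paper original)",
    creator="Legal department",
    format="paper",
    description="Signed contract kept in a filing cabinet",
)
email_id = add_object(
    conn,
    title="Customer complaint",
    creator="Customer service",
    format="message/rfc822",
)

# The paper document is findable through its metadata alone.
rows = conn.execute(
    'SELECT title FROM object_metadata WHERE "format" = ?', ("paper",)
).fetchall()
print(rows)
```

One wide table is the simplest layout to read. A real catalog would more likely use a separate element/value table, since Dublin Core elements are optional and repeatable.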

Why is unstructured data important to a company? It has been estimated that at least 80 percent of a company's data is unstructured and not easily accessible or found. In this age of Sarbanes-Oxley and other regulations, the overwhelming amount of unmanaged, unstructured data could increase a company's exposure. Business users want to browse and search across all types of data for such opportunities as understanding customer issues. Management often cannot make decisions based on analysis of both structured and unstructured data unless the unstructured data is integrated into a data warehouse/business intelligence environment.

This growing area of data needs to be managed as a corporate asset to provide value. It has to be identified, captured, organized, and made accessible and sharable. These management processes should sound familiar to data management. The data management organization already deals with the structured data world: it develops and maintains data model structures and the metadata associated with data models that give meaning and vocabulary, and it has best practices in the form of data standards and a governance process with data stewards. In structured data, one concept (e.g., the employee entity) can have many expressions or types (e.g., management or staff, active or retired, etc.) that describe it.

Unstructured data, by contrast, deals with content semantics, where one expression (e.g., foot) can have many different concepts associated with it (e.g., a unit of measurement, the part of a human or animal leg below the ankle joint, or the lower part of anything). A controlled vocabulary organizes content through a selected list of words and phrases used to tag units of information (either automatically or manually) so that they may be more easily retrieved by a search. There is usually a governance structure to keep the various types of controlled vocabularies current. The different types include the following:

  • list of equivalence relationships or synonyms (e.g., cat and feline, baby and infant, student and pupil);
  • taxonomy that shows hierarchical relationships of subject and topic metadata;
  • thesaurus that shows equivalence (synonym list), hierarchical (taxonomy), and associative (related terms) relationships; and
  • ontology that represents a collection of taxonomies and thesauri for knowledge representation.
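
The first two vocabulary types, and the tagging they support, can be sketched in a few lines of Python. This is a minimal illustration, not a real vocabulary tool. The synonym pairs come from the list above, while the broader terms, the sample documents and all function names are assumptions invented for the example. Associative relationships and ontologies are left out.

```python
import re

# Equivalence relationships: variant term -> preferred term.
SYNONYMS = {"feline": "cat", "infant": "baby", "pupil": "student"}

# Taxonomy: preferred term -> broader term (hypothetical hierarchy).
BROADER = {"cat": "animal", "baby": "person", "student": "person"}


def preferred(term):
    """Map a term to its preferred form in the controlled vocabulary."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def tag(text):
    """Return the preferred vocabulary terms found in a piece of text."""
    vocabulary = set(SYNONYMS) | set(SYNONYMS.values())
    words = re.findall(r"[a-z]+", text.lower())
    return {preferred(word) for word in words if word in vocabulary}


def broader_terms(term):
    """Walk up the taxonomy from a term to its most general ancestor."""
    chain = []
    term = preferred(term)
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


# Made-up documents, tagged automatically.
documents = {
    "memo-1": "The pupil brought a feline to class",
    "memo-2": "Each infant needs a car seat",
}
index = {doc_id: tag(text) for doc_id, text in documents.items()}


def search(query):
    """Find documents tagged with the preferred form of the query term."""
    term = preferred(query)
    return sorted(doc_id for doc_id, tags in index.items() if term in tags)


print(search("student"))        # memo-1, although it only says "pupil"
print(broader_terms("feline"))  # feline -> cat -> animal
```

Because both the documents and the query are normalized to the same preferred term, a search for "student" retrieves a memo that never uses that word. Retrieval of this kind is what the tagging is for.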

Where should data management start with unstructured data? Most likely, your company has other organizational groups, such as content or knowledge management, libraries, records management, or document management, with which a data management organization could collaborate to raise awareness of how critical it is to manage and integrate unstructured data for accessibility. There can be synergy between data management and these other organizations with regard to values for reference data and data architectures, metadata creation and definition, metadata topics for taxonomies, use of newer technologies that can handle all types of data, and governance (which may involve the same subject matter experts) at both the enterprise and project (requirements gathering) levels. The challenge lies in integrating structured and unstructured data, especially if the unstructured data is on paper or other media. Eventually, the techniques of structured data management and data integration will converge with the techniques of the unstructured data world to help businesses overcome this challenge.

Patricia Cupoli, CCP, CDMP, CBIP, is the DAMA International ICCP Liaison, the DAMAi Project Manager for the Data Exam Development, ICCP Board President, and a past president of DAMA International, DAMA Chicago, and DAMA Philadelphia / Delaware Valley. She is the recipient of the 2006 DAMA International Professional Award. She may be reached at ICCP_Liaison@DAMA.org.



The Data Management Association International (DAMA International) is a global not-for-profit, vendor-independent association of data and information resource management professionals with chapters and members around the world. DAMA International is dedicated to advancing the concepts and practices of data and information resource management. Its primary purpose is to promote the understanding, development and practice of managing data and information as key enterprise resources. DAMA International produces premier Symposiums for data and information management professionals in the U.S., the UK and Australia. For more information visit www.dama.org.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.