Published in DM Review in March 2004.
Printed from DMReview.com
Meta Data & Knowledge Management: Iterative and Narrative Data: Common Ground?
by David Marco and William J. Lewis
David wishes to thank Bill Lewis, a principal at EWSolutions, for his invaluable contribution to this month's column.
This month's column introduces the term iterative data for what is commonly labeled "structured data" and the term narrative data for what is commonly referred to as "unstructured data." The reasoning behind this terminological distinction is that whereas iterative data forms a record of iterative events, narrative data tells a story. New techniques and technologies for integrating narrative and iterative data are beginning to present significant opportunities for many types of enterprise applications.
In this column, we'll examine the iterative/narrative divide from two different perspectives. At a high level, we'll look at how these types of data have historically been managed by divergent software applications. At a more detailed "data model" level, we'll begin to examine the fundamental differences and similarities between iterative and narrative data.
Iterative and Narrative Data Applications
For better or worse, the form of an enterprise's data, and of its corresponding meta data, is usually tightly coupled with the type of software in which it is implemented. If data requirements are addressed by technology specific to iterative data, the data is probably relational and the meta data is likely in the RDBMS (relational database management system) catalog, and perhaps in an ERwin model and/or meta data repository as well. On the other hand, narrative data and its corresponding meta data are likely to be bound in document management, content management, text management or knowledge management software applications.
As a consequence, the data architecture of almost every enterprise features an iterative/narrative data divide, rather than supporting integrated narrative/iterative data. The most common architecture, with some example software applications, resembles the stovepipe configuration shown in Figure 1.
Most of today's business applications (enterprise resource planning, human resources, financial, customer relationship management, etc.) are operational applications for iterative data. Most companies, large and small, also make use of analytic applications for iterative data: everything from Excel to enterprise tools from vendors such as Business Objects and MicroStrategy.
Many operational applications for narrative data began in a more specialized market, mostly in industries such as engineering, manufacturing and publishing that require rigorous management of narrative documentation. Operational applications for narrative data typically support functions for document origination, editing, approval, versioning, distribution and access control. More common operational applications for narrative data include collaboration applications such as e-mail.
In contrast, analytic applications for narrative data are relative newcomers. Examples of these include visualization tools that can render compelling graphical representations of the results of text mining.
That's the high-level, application portfolio management perspective. If we dig down to the "content" itself, three differences, and one crucially significant similarity, are eventually revealed.
Iterative/Narrative Data: Differences and Similarity
The most significant and obvious dissimilarity between these types of data is that the "class-instance" pattern, dutifully adhered to by iterative data, breaks down with narrative data. An instance in an iterative data set (e.g., a row) is very specifically "about" a single thing; that is, it is a representation of a single thing in the real world. On the other hand, a narrative instance (e.g., a document) is usually about, or represents, multiple things. Even worse, what it is about is often ambiguous, varying by the observer.
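To make the class-instance contrast concrete, here is a minimal Python sketch. The `Customer` class, the sample rows, the memo text and the two helper functions are all hypothetical, invented only for illustration; matching names by substring is a deliberately naive stand-in for deciding what a document is "about."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Iterative data: every instance (row) represents exactly one customer."""
    customer_id: int
    name: str
    city: str


# Two rows of the hypothetical Customer class; each is "about" a single thing.
rows = [
    Customer(1, "Acme Corp", "Chicago"),
    Customer(2, "Globex", "Denver"),
]

# Narrative data: one instance (a document) that is about several things at once.
memo = (
    "Acme Corp moved its headquarters to Chicago last year. "
    "Since then, Globex has become its largest supplier."
)


def subjects_of_row(row):
    """A row has exactly one subject: the thing it represents."""
    return {row.name}


def subjects_of_document(text, known_names):
    """Naive assumption: a document is 'about' every known name it mentions."""
    return {name for name in known_names if name in text}
```

Calling `subjects_of_row(rows[0])` always yields a single subject, while `subjects_of_document(memo, ["Acme Corp", "Globex", "Initech"])` yields two. A different observer, with a different list of known names, would get a different answer for the same memo, which is the ambiguity described above.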
The second characteristic distinguishing narrative data from iterative is the preponderance of "connecting content." This content makes a narrative readable by connecting the "real" content that we'll discuss momentarily.
The final distinction between iterative and narrative data is that of order. An iterative data set such as a relational table is ideally (and, some would assert, by definition) an unordered set. The order of the columns in a row is insignificant as well. On the other hand, a document has a narrative order: a beginning, middle and end. If you remove the order from a narrative instance (that is, "shred" or "normalize" it to extract and group its contained assertions by class), it is essentially destroyed. (Think of trying to read a book backwards.) As a result, a narrative instance, taken as-is, yields easily to only one type of retrieval: sequential. Its order provides a crucial difference between the totality of a narrative and the sum of its parts.
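The role of order can be sketched in a few lines of Python. The table contents and the three-sentence story are hypothetical; a Python `set` stands in for a relation, a `dict` for a row, and a `list` for a narrative.

```python
# Iterative: a relation modeled as a set of rows. Row order carries no meaning.
table_a = {("Acme Corp", "Chicago"), ("Globex", "Denver")}
table_b = {("Globex", "Denver"), ("Acme Corp", "Chicago")}
same_relation = table_a == table_b

# Column order within a row is insignificant too when columns are named.
row_a = {"name": "Acme Corp", "city": "Chicago"}
row_b = {"city": "Chicago", "name": "Acme Corp"}
same_row = row_a == row_b

# Narrative: an ordered sequence of sentences with a beginning, middle and end.
story = [
    "The order was placed on Monday.",
    "It shipped on Wednesday.",
    "It arrived on Friday.",
]
backwards = list(reversed(story))

# The parts are identical, but the narrative is not.
same_parts = sorted(story) == sorted(backwards)
same_narrative = story == backwards
```

Here `same_relation`, `same_row` and `same_parts` are all true, while `same_narrative` is false: the reversed story contains every sentence of the original, yet it is no longer the same narrative.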
What then, if anything, do these data types actually have in common? The good news is that both iterative and narrative data sets contain assertions: declarations of facts. The differences are "only" in how the assertions are treated.
In an iterative data collection (e.g., a relation) all assertions:
In contrast, assertions contained in a narrative data collection:
The underlying assertion (pun intended) of this column is that a common syntax can indeed be derived for assertions, whether arranged iteratively or narratively, and that meta data is the crucial enabler for this derivation. However, before this can happen, common ground must first be found for disparate meta data.
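As one possible reading of what a common syntax might look like (not a syntax defined in this column), the sketch below expresses assertions as subject/predicate/object triples. The `Assertion` type, the `customer_meta` mapping and both extraction functions are assumptions made for illustration, and the text extraction is deliberately naive.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """A hypothetical common form for one declared fact."""
    subject: str
    predicate: str
    obj: str


# Hypothetical meta data for the iterative side: which column identifies the
# subject, and which predicate each remaining column expresses.
customer_meta = {"key": "name", "predicates": {"city": "is located in"}}


def assertions_from_row(row, meta):
    """Use meta data to turn one row into explicit assertions."""
    subject = row[meta["key"]]
    return [
        Assertion(subject, predicate, row[column])
        for column, predicate in meta["predicates"].items()
    ]


def assertions_from_text(text, predicate):
    """Naive extraction: treat each 'X <predicate> Y.' sentence as an assertion."""
    found = []
    for sentence in text.split("."):
        sentence = sentence.strip()
        if f" {predicate} " in sentence:
            subject, obj = sentence.split(f" {predicate} ", 1)
            found.append(Assertion(subject.strip(), predicate, obj.strip()))
    return found


row = {"name": "Acme Corp", "city": "Chicago"}
memo = "Acme Corp is located in Chicago. Sales rose sharply last quarter."

from_row = assertions_from_row(row, customer_meta)
from_text = assertions_from_text(memo, "is located in")
```

In this toy case the row and the memo yield the same assertion, and the second sentence of the memo is skipped because it does not match the predicate. Note that the row could only be converted because `customer_meta` said what its columns mean, which is the sense in which meta data enables the derivation.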
David Marco is an internationally recognized expert in the fields of enterprise architecture, data warehousing and business intelligence and is the world's foremost authority on meta data. He is the author of Universal Meta Data Models (Wiley, 2004) and Building and Managing the Meta Data Repository: A Full Life-Cycle Guide (Wiley, 2000). Marco has taught at the University of Chicago and DePaul University, and in 2004 he was selected to the prestigious Crain's Chicago Business "Top 40 Under 40." He is the founder and president of Enterprise Warehousing Solutions, Inc., a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using data warehousing and meta data repository technologies. He may be reached at (866) EWS-1100 or via e-mail at DMarco@EWSolutions.com.
Bill Lewis is a principal consultant with Enterprise Warehousing Solutions. Lewis' 20-plus years of information technology experience span the financial services, energy, healthcare, software and consulting industries. In addition to his current specializations in data management, meta data management and business intelligence, he has been a leading-edge practitioner and thought-leader on topics ranging from software development tools to IT architecture. His book, Data Warehousing and E-Commerce, is available at online and brick-and-mortar booksellers. Lewis can be reached at email@example.com.
Copyright 2004, Thomson Media and DM Review.