What Exactly Is a Data Model? Part 2

Article published in DM Review Magazine, March 2003 Issue
By David Hay

This is part two of a series of three articles that will address conceptual, logical and physical models, object models and data model views. The goal of this series is to provide a clear vision of how all these elements relate to each other.

Previously: The Three-Schema Architecture and Framework Views

The ANSI three-schema architecture (which we have expanded here to four schemata) and John Zachman's information architecture provide a good basis for understanding the nature of data models.1,2 The ANSI architecture describes the external schema, which represents the way a business owner views a business. Data models of this view tend to consist of terms for concrete things actually seen and manipulated by the business people. (This is the "business owner's view" in John Zachman's Architecture Framework.) The most important thing to capture in representing this view is the vocabulary. The conceptual schema represents the fundamental, underlying structure of the organization. The conceptual data model is inferred from multiple external business owners' views and is much more rigorous in its representation (the "architect's view").3 The internal schema is really two: the logical schema, which represents data in terms appropriate to a particular data manipulation approach, as usually expressed in a database management system or DBMS (the "designer's view"), and the physical schema, which is concerned with the physical characteristics of storage on a storage device (the "builder's view").

Last month's article described data models used to represent the business owners' views. This article concerns the conceptual data model created and used by the architect. Next month's article will deal with the designer's logical data model.

Row Three: Abstractions Seen by Architects (The Entity/Relationship Diagram)

The architect's job may be seen in terms of the Indian folk tale about the blind men and the elephant. You may recall how the man next to the side of the elephant thought it was much like a wall. The man who touched his tail, on the other hand, said it was really like a rope. The one who touched an ear said, "No, it is like a fan." The one who touched the elephant's leg claimed it was most like a tree, and the one touching the trunk felt a serpent. Each was adamant that his was the true and correct understanding of the nature of an elephant.

It is the architect's job to take such a set of disparate views of the business, held by various people, and produce a single, elephant-like, unified view. This involves both applying discipline to the ways things are represented and identifying the fundamental structures that are behind the tangible things people see. By convention, this kind of model is called an entity/relationship model. It is produced by the architect and is limited to binary, one-to-many relationships. In this model, a box represents an entity type - the definition of a fundamental thing of significance to the business. (The object-oriented world would name this a "class" and call the model an "object model." There is nothing wrong with this, provided it is understood that the model is, in fact, a conceptual one of the business and that a class represents a category of "business objects" - fundamental things of significance to the business - and not artifacts specific to object-oriented programming.)

It might seem that adding modeling constraints would limit what can be said by a model. In fact, such constraints actually reveal a great deal more about the underlying nature of things than can be understood without them. To render a three-way relationship as a series of binary relationships, for example, reveals at a more atomic level just what each component of the relationship is all about. Similarly, to render a many-to-many relationship as two relationships with an intersect entity type reveals much about the nature of each occurrence of that relationship. To insist that each identifier of an entity type completely identify all occurrences of that entity type clarifies just what the entity type means.
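As a minimal sketch of the many-to-many case, assume a hypothetical relationship in which a CONTRACT may cover many PRODUCT TYPES and a PRODUCT TYPE may appear on many CONTRACTS. The intersect entity type is called LINE ITEM here, and its attributes (quantity, unit price) are illustrative assumptions, not part of this article's model. The point is only that each occurrence of the relationship now has a place to carry its own facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    contract_number: str


@dataclass(frozen=True)
class ProductType:
    name: str


@dataclass(frozen=True)
class LineItem:
    """Hypothetical intersect entity type between CONTRACT and PRODUCT TYPE."""
    contract: Contract          # each LINE ITEM is part of one and only one CONTRACT
    product_type: ProductType   # each LINE ITEM is for one and only one PRODUCT TYPE
    quantity: int               # facts about this occurrence of the relationship
    unit_price: float


c1 = Contract("C-1")
c2 = Contract("C-2")
widget = ProductType("Widget")
gadget = ProductType("Gadget")

# Three occurrences of what used to be a single many-to-many line on the diagram.
line_items = [
    LineItem(c1, widget, quantity=10, unit_price=2.50),
    LineItem(c1, gadget, quantity=1, unit_price=40.00),
    LineItem(c2, widget, quantity=3, unit_price=2.75),
]


def product_types_on(contract):
    """Navigate CONTRACT -> LINE ITEM -> PRODUCT TYPE."""
    return {li.product_type.name for li in line_items if li.contract == contract}


def contracts_for(product_type):
    """Navigate PRODUCT TYPE -> LINE ITEM -> CONTRACT."""
    return {li.contract.contract_number for li in line_items
            if li.product_type == product_type}
```

The many-to-many navigation is still available in both directions, but questions such as "how many, and at what price?" can now be answered per occurrence.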

Business views are typically in terms of the here and now, where the here and now is often subject to significant change. By identifying underlying structures, an architect, on the other hand, is in a position to create a model that will survive changes in the particular way a business is run. That is, the architect looks at the world a little more abstractly than the business owner and sees more general categories. The architect's "things of significance" represent generalizations of the concrete things seen in the business owner's view. These conceptual entity types describe the fundamental structure of the organization. To be sure, this generalization should be done carefully so that the business owners can still recognize the concepts being presented, and it should not be taken to an extreme. Even so, it is important to do it to some degree.

Suppose, for example, that VENDOR, CUSTOMER and EMPLOYEE are presented and then the audience is asked, "Can an employee also be a customer? A vendor?" If the answer is "yes," then the participants will be ready for the idea that the underlying entity types are PERSON and ORGANIZATION, and the other concepts are merely roles played by PEOPLE and ORGANIZATIONS in their relationships to other entity types.
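The question put to the audience can be sketched in a few lines. This is an illustration only: the names and the idea of recording roles as simple labels are assumptions made for brevity, and a fuller model would normally let each role follow from a relationship to another entity type rather than from a label.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """Supertype: a PERSON or ORGANIZATION of interest to the business."""
    name: str
    roles: set = field(default_factory=set)  # simplification: roles as plain labels


@dataclass
class Person(Party):
    pass


@dataclass
class Organization(Party):
    pass


# One occurrence of PERSON plays two roles; it is not stored twice.
alice = Person("Alice")
alice.roles.update({"EMPLOYEE", "CUSTOMER"})

# One occurrence of ORGANIZATION is both a vendor and a customer.
acme = Organization("Acme Supply")
acme.roles.update({"VENDOR", "CUSTOMER"})

parties = [alice, acme]


def parties_playing(role):
    """Return the names of all parties that play the given role."""
    return [p.name for p in parties if role in p.roles]
```

Had VENDOR, CUSTOMER and EMPLOYEE been modeled as separate entity types, Alice and Acme Supply would each have to appear twice.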

In other words, this process of trying to see the whole elephant requires considerable inductive thinking - to see the patterns of shapes that are common to all the views, even if they are not apparent to any of them.

It is not necessary (usually) to be so abstract as to have "THING" and "THING TYPE" (as some suppose), but the following are pretty standard:

  • PARTY, subtyped as PERSON and ORGANIZATION
  • SITE (or ADDRESS) and GEOGRAPHIC LOCATION
  • PRODUCT TYPE and PRODUCT
  • ACTIVITY TYPE and ACTIVITY (with optional WORK ORDER)
  • CONTRACT (or AGREEMENT or ORDER, etc.)4

For a particular organization, some of these will be renamed and subtype structures will be added; however, if you start out with these concepts, you are well on the way to a solid, robust model.

In dealing with the difference between the business owners' views and the architect's view, it would be very nice if CASE tools provided the ability to define a data model "view," analogous to a SQL view. The business owner's perspective really is a view (in this sense) of the conceptual model. Such a tool would allow us, for example, to define a "virtual entity type," such as VENDOR, and draw a box for it, but behind the scenes to define it in terms of the conceptual model as "a PARTY that has at least one 'seller in' relationship to a CONTRACT with us."5
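No particular CASE tool is implied here, but the idea can be sketched in code. In the sketch below, a CONTRACT is assumed to have exactly one seller and one buyer, and "with us" is taken to mean that our own organization is the buyer. Both are assumptions for illustration, as are all the names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str


@dataclass(frozen=True)
class Contract:
    number: str
    seller: Party   # assumed: each CONTRACT has one and only one seller
    buyer: Party    # assumed: each CONTRACT has one and only one buyer


us = Party("Our Company")
acme = Party("Acme Supply")
bolt = Party("Bolt Retail")

contracts = [
    Contract("PO-1", seller=acme, buyer=us),
    Contract("PO-2", seller=acme, buyer=us),
    Contract("SO-1", seller=us, buyer=bolt),
]


def vendor_view(all_contracts, ourselves):
    """Virtual entity type VENDOR: a PARTY that is seller in at least one
    CONTRACT in which we are the buyer."""
    return {c.seller for c in all_contracts if c.buyer == ourselves}


def customer_view(all_contracts, ourselves):
    """Virtual entity type CUSTOMER, defined the same way from the other end."""
    return {c.buyer for c in all_contracts if c.seller == ourselves}
```

Nothing called VENDOR is stored anywhere. Like a SQL view, it is computed from PARTY and CONTRACT whenever it is asked for.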

Much of the information collected for a conceptual model cannot actually be represented on a diagram. This must be captured in the data model repository that supports the drawings. Indeed, this repository will become the kernel of the meta data repository that will support the applications built from this model.

It should be possible to include attributes in a conceptual model, although it is not necessary to do so when presenting the model initially. Indeed, attributes tend to clutter the model's visual appearance. The most important things to get across in a model feedback session are the entity types and relationships. With some audiences, however, including attributes can be helpful; and all attributes must be defined, with their data types (formats), at least by the end of the requirements analysis project. In the architect's conceptual data model, each attribute appears only once, associated with the entity type it describes.

Derived attributes are not only permitted in a conceptual data model, but they can be essential for describing the true nature of a set of entity types. They can be represented with a typographical flag (such as parentheses). The derivation is usually documented in the data model repository, although ORM and UML provide techniques for displaying formulae on a drawing.
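A derived attribute and its derivation might be recorded as in the following sketch. The entity types and attributes (LINE ITEM, quantity, unit price) are hypothetical examples rather than part of this article's model. The parenthesized names in the comments follow the typographical flag mentioned above.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    quantity: int
    unit_price: float

    @property
    def extended_value(self):
        """Derived attribute (extended value) = quantity * unit price."""
        return self.quantity * self.unit_price


@dataclass
class Contract:
    line_items: list

    @property
    def contract_value(self):
        """Derived attribute (contract value) = sum of (extended value)
        across the LINE ITEMS that are part of this CONTRACT."""
        return sum(item.extended_value for item in self.line_items)


contract = Contract([LineItem(10, 2.50), LineItem(1, 40.00)])
```

Neither derived value is stored. Each is computed from the base attributes, which is why the derivation itself has to be documented somewhere.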

One important component of a conceptual model is specification of the combinations of attributes and relationships that uniquely identify an occurrence of an entity type. Not all notations do this equally well. It is a second priority, but it is desirable. By the end of the analysis phase, all unique identifiers should be specified.

The final conceptual data model, then, should show:

Entity Types

  • Name
  • Definition

Attributes

  • Initially a few, but eventually all
  • Name
  • Definition
  • Optionality

Relationships

  • Names in both directions
  • Cardinality
  • Optionality

Unique Identifiers
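One way to picture what the repository behind the drawing has to hold is the sketch below. It is not the structure of any actual repository product. The class names, fields and sample entity types are all assumptions, and the pluralization is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    definition: str
    optional: bool = True


@dataclass
class EntityType:
    name: str
    definition: str
    attributes: list = field(default_factory=list)
    unique_identifier: tuple = ()  # attribute and relationship names that identify an occurrence


@dataclass
class RelationshipEnd:
    """One direction of a binary relationship; every relationship has two."""
    subject: str     # entity type the sentence starts with
    name: str        # relationship name read in this direction
    target: str      # entity type at the other end
    optional: bool   # optionality: "may be" versus "must be"
    many: bool       # cardinality: "one or more" versus "one and only one"

    def sentence(self):
        verb = "may be" if self.optional else "must be"
        count = "one or more" if self.many else "one and only one"
        plural = "S" if self.many else ""   # naive plural, good enough for a sketch
        return f"Each {self.subject} {verb} {self.name} {count} {self.target}{plural}."


line_item = EntityType(
    name="LINE ITEM",
    definition="The fact that a particular CONTRACT covers a particular PRODUCT TYPE.",
    attributes=[
        Attribute("line number", "Sequence of the item within its CONTRACT.", optional=False),
        Attribute("quantity", "Number of units covered.", optional=False),
    ],
    unique_identifier=("line number", "part of CONTRACT"),
)

ends = [
    RelationshipEnd("LINE ITEM", "part of", "CONTRACT", optional=False, many=False),
    RelationshipEnd("CONTRACT", "composed of", "LINE ITEM", optional=True, many=True),
]
sentences = [end.sentence() for end in ends]
```

Reading each relationship end back as a sentence is a convenient check that the names, cardinality and optionality recorded in both directions actually say what the business means.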

Business Rules

According to The Business Rules Group, business rules come in four flavors: terms, facts, derivations and constraints (what they call "action assertions").6

We have seen that an entity/relationship data model can handily describe terms, facts and derived facts. What it cannot adequately convey are constraints. These are more properly described as a deliverable for Column Six (Motivation) of the Architecture Framework. To be sure, most data model notations do allow you to specify constraints for cardinality (an occurrence of this entity type may be related to no more than one occurrence of that entity type) and optionality (an occurrence of this entity type must be related to at least one occurrence of that entity type). Some notations (such as ORM and, to a lesser extent, UML) do allow some other constraints to be portrayed. For the most part, however, constraints must be described separately from the entity/relationship diagram.
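The two constraints that most notations can express amount to simple counts, as the following sketch suggests. The function name, the sample data and the rule being checked ("each LINE ITEM must be part of one and only one CONTRACT") are assumptions for illustration.

```python
from collections import Counter


def violations(occurrences, links, mandatory=True, max_related=1):
    """Check one direction of a relationship.

    occurrences: identifiers of the entity type being checked
    links:       (occurrence, related occurrence) pairs
    mandatory:   optionality, True if at least one related occurrence is required
    max_related: cardinality, the upper limit, or None for "one or more"
    """
    counts = Counter(source for source, _target in links)
    problems = []
    for occurrence in occurrences:
        related = counts[occurrence]
        if mandatory and related == 0:
            problems.append((occurrence, "optionality"))
        if max_related is not None and related > max_related:
            problems.append((occurrence, "cardinality"))
    return problems


line_items = ["LI-1", "LI-2", "LI-3"]
part_of = [("LI-1", "C-1"), ("LI-2", "C-1"), ("LI-2", "C-2")]

result = violations(line_items, part_of)
```

Here LI-2 breaks the cardinality rule because it belongs to two contracts, and LI-3 breaks the optionality rule because it belongs to none. A constraint such as "a contract may not exceed the customer's credit limit" has no counterpart in these two counts, which is why it has to be described elsewhere.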

Some would try to manipulate the entity/relationship diagram to represent business constraints; but this is a bad idea because invariably this manipulation reduces the robustness of the model. To the extent that constraints cannot be represented directly on the drawing, they must be captured in the data model repository by the time a conceptual data model is completed. These constraints must be described in detail (in terms of the entity types, attributes and relationships involved) in the repository.

Other Information Captured

In addition to constraints, behind the scenes the following must be captured in the data model repository:

  • Definitions of entities and attributes
  • Data type/format of each attribute
  • Derivation logic for each derived attribute
  • A domain for each attribute
  • Referential integrity rules

Because the entity/relationship diagram will be presented to business owners who initially will be unfamiliar with data modeling or many of the concepts presented, aesthetics is very important. The notation chosen should be as clean, concise and uncluttered as possible.

Notations

For the conceptual model, your author is partial to Richard Barker's entity/relationship notation, which is part of the Structured Systems Analysis and Design Method (SSADM) used in Europe and is promoted by Oracle Corporation. It is the most concise notation of all, with intuitive graphics. Also suitable is Clive Finkelstein's and James Martin's information engineering notation. A subset of UML can also be used. Object Role Modeling can be used, but it takes a somewhat different approach.7

Figure 1 shows the different views of data modeling seen by the architect and the characteristics of each.


Figure 1: Comparing the Different Views

Next month: The logical data models for relational and object-oriented environments.

References:
1. Tsichritzis, D. and A. C. Klug. "The ANSI/X3/SPARC DBMS Framework Report of the Study Group on Database Management Systems." Information Systems. 3(3). 1978. pp. 176-191.
2. Zachman, John. "A Framework for Information Architecture." IBM Systems Journal, Vol. 26, No. 3. (IBM Publication G321-5298). See also http://www.essentialstrategies.com/publications/methodology/zachman.htm.
3. Zachman and Hay mean basically the same thing but use different terms to identify some of the rows. For a discussion of these differences, see: Hay, David C. Requirements Analysis. Upper Saddle River: Prentice Hall, 2003. pp. 5-6.
4. For more information on what patterns are available, see: Hay, David C. Data Model Patterns: Conventions of Thought. New York: Dorset House, 1995.
5. For more on data model views, see: Hay, David C. "Visualizing Database Structures," Database Programming and Design. Vol. 7, No. 6. June 1994.
6. The Business Rules Group. "Business Rules: What Are They Really?" 1995. Available at http://www.businessrulesgroup.org.
7. ORM is a completely different topic. For more information, see: Halpin, Terry. Information Modeling and Relational Databases. San Francisco: Morgan Kaufmann Publishers, 2001.


David Hay has been producing data models to support strategic and requirements planning since the mid-1980s. He is the founder and president of Essential Strategies, Inc., a consulting firm dedicated to helping clients define corporate information architecture, identify system requirements and plan strategies for the implementation of new systems. Hay is the author of Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. He may be reached at dmr@essentialstrategies.com or by phone at (713) 464-8316.
