What Exactly Is a Data Model? Part 2
This is part two of a series of three articles that will address conceptual, logical and physical models, object models and data model views. The goal of this series is to provide a clear vision of how all these elements relate to each other.
Previously: The Three-Schema Architecture and Framework Views
The ANSI three-schema architecture (which we have expanded here to four schemata) and John Zachman's information architecture provide a good basis for understanding the nature of data models.1,2 The ANSI architecture describes the external schema, which represents the way a business owner views a business. Data models of this view tend to consist of terms for concrete things actually seen and manipulated by the business people. (This is the "business owner's view" in John Zachman's Architecture Framework.) The most important thing to capture in representing this view is the vocabulary. The conceptual schema represents the fundamental, underlying structure of the organization. The conceptual data model is inferred from multiple external business owners' views and is much more rigorous in its representation (the "architect's view").3 The internal schema is really two: the logical schema, which represents data in terms appropriate to a particular data manipulation approach, usually as expressed in a database management system or DBMS (the "designer's view"), and the physical schema, which is concerned with the physical characteristics of storage on a storage device (the "builder's view").
Last month's article described data models used to represent the business owners' views. This article concerns the conceptual data model created and used by the architect. Next month's article will deal with the designer's logical data model.
Row Three: Abstractions Seen by Architects (The Entity/Relationship Diagram)
The architect's job may be seen in terms of the Indian folk tale about the blind men and the elephant. You may recall how the man next to the side of the elephant thought it was much like a wall. The man who touched its tail, on the other hand, said it was really like a rope. The one who touched an ear said, "No, it is like a fan." The one who touched the elephant's leg claimed it was most like a tree, and the one touching the trunk felt a serpent. Each was adamant that his was the true and correct understanding of the nature of an elephant.
It is the architect's job to take such a set of disparate views of the business, held by various people, and produce a single, elephant-like, unified view. This involves both applying discipline to the ways things are represented and identifying the fundamental structures that are behind the tangible things people see. By convention, this kind of model is called an entity/relationship model. It is produced by the architect and is limited to binary, one-to-many relationships. In this model, a box represents an entity type - the definition of a fundamental thing of significance to the business. (The object-oriented world would name this a "class" and call the model an "object model." There is nothing wrong with this, provided it is understood that the model is, in fact, a conceptual one of the business and that a class represents a category of "business objects" - fundamental things of significance to the business - and not artifacts specific to object-oriented programming.)
It might seem that adding modeling constraints would limit what can be said by a model. In fact, such constraints actually reveal a great deal more about the underlying nature of things than can be understood without them. To render a three-way relationship as a series of binary relationships, for example, reveals at a more atomic level just what each component of the relationship is all about. Similarly, to render a many-to-many relationship as two relationships with an intersect entity type reveals much about the nature of each occurrence of that relationship. To insist that each identifier of an entity type completely identify all occurrences of that entity type clarifies just what the entity type means.
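As a rough illustration of the intersect entity type, here is a minimal Python sketch. The entity types (ORDER, PRODUCT, LINE ITEM) and their attributes are hypothetical examples chosen for this sketch, not taken from the article. The point it tries to show is that once the many-to-many relationship is split into two one-to-many relationships, there is a natural home for facts, such as a quantity, that belong to each occurrence of the relationship rather than to either side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:  # hypothetical entity type
    product_id: int
    name: str


@dataclass(frozen=True)
class Order:  # hypothetical entity type
    order_id: int


@dataclass(frozen=True)
class LineItem:
    """Intersect entity type: one occurrence per pairing of order and product."""
    order: Order      # each LineItem is part of exactly one Order
    product: Product  # each LineItem is for exactly one Product
    quantity: int     # a fact about the pairing itself, not about either side


widget = Product(1, "widget")
gadget = Product(2, "gadget")
first, second = Order(100), Order(101)

line_items = [
    LineItem(first, widget, 3),
    LineItem(first, gadget, 1),
    LineItem(second, widget, 5),
]

# "Which orders include widgets?" is answered by walking the two
# one-to-many relationships through the intersect entity type.
orders_with_widgets = sorted(
    item.order.order_id for item in line_items if item.product == widget
)
print(orders_with_widgets)
```

Without LINE ITEM, the quantity would have nowhere to live, which is one sense in which the binary, one-to-many constraint reveals more than it hides.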
Business views are typically in terms of the here and now, where the here and now is often subject to significant change. By identifying underlying structures, an architect, on the other hand, is in a position to create a model that will survive changes in the particular way a business is run. That is, the architect looks at the world a little more abstractly than the business owner and sees more general categories. The architect's "things of significance" represent generalizations of the concrete things seen in the business owner's view. These conceptual entity types describe the fundamental structure of the organization. To be sure, this generalization should be done carefully so that the business owners can still recognize the concepts being presented, and it should not be taken to an extreme. Even so, it is important to do it to some degree.
Suppose, for example, that VENDOR, CUSTOMER and EMPLOYEE are presented and then the audience is asked, "Can an employee also be a customer? A vendor?" If the answer is "yes," then the participants will be ready for the idea that the underlying entity types are PERSON and ORGANIZATION, and the other concepts are merely roles played by PEOPLE and ORGANIZATIONS in their relationships to other entity types.
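The idea that VENDOR, CUSTOMER and EMPLOYEE are roles rather than entity types can be sketched in a few lines of Python. This is only an assumption-laden illustration: the relationship types (Employment, Sale), the sample occurrences and the helper functions are invented for the sketch. Each role is computed from the relationships a PERSON or ORGANIZATION takes part in, so one person can hold several roles at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Organization:
    name: str


@dataclass(frozen=True)
class Employment:  # hypothetical relationship: a person works for an organization
    employee: Person
    employer: Organization


@dataclass(frozen=True)
class Sale:  # hypothetical relationship: either side may be a person or an organization
    seller: object
    buyer: object


us = Organization("Our Company")
pat = Person("Pat")
acme = Organization("Acme Supply")

employments = [Employment(pat, us)]
sales = [Sale(seller=us, buyer=pat), Sale(seller=acme, buyer=us)]


def is_employee(party) -> bool:
    return any(e.employer == us and e.employee == party for e in employments)


def is_customer(party) -> bool:
    return any(s.seller == us and s.buyer == party for s in sales)


def is_vendor(party) -> bool:
    return any(s.buyer == us and s.seller == party for s in sales)


# Pat is both an employee and a customer; no duplicate record is needed.
print(is_employee(pat), is_customer(pat), is_vendor(acme))
```

Had EMPLOYEE and CUSTOMER been separate entity types, Pat would have to be recorded twice, and nothing in the model would say the two records describe the same person.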
In other words, this process of trying to see the whole elephant requires considerable inductive thinking - to see the patterns of shapes that are common to all the views, even if those patterns are not apparent in any single one of them.
It is not necessary (usually) to be so abstract as to have "THING" and "THING TYPE" (as some suppose), but the following are pretty standard:
For a particular organization, some of these will be renamed and subtype structures will be added; however, if you start out with these concepts, you are well on the way to a solid, robust model.
In dealing with the difference between the business owners' views and the architect's view, it would be very nice if CASE tools provided the ability to define a data model "view," analogous to a SQL view. The business owner's perspective really is a view (in this sense) of the conceptual model. Such a tool would allow us, for example, to define a "virtual entity type," such as VENDOR, and draw a box for it, but behind the scenes to define it in terms of the conceptual model as "a PARTY that is the seller in at least one CONTRACT with us."5
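The SQL analogy can be made concrete with a small sketch using Python's standard sqlite3 module. The table and column names are assumptions made for the example, and the VENDOR definition is read here as "a party that is the seller in at least one contract in which we are the buyer." The view plays the part of the virtual entity type: it can be queried like a table, yet it stores nothing of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual entity types (hypothetical column names).
    CREATE TABLE party (
        party_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE contract (
        contract_id INTEGER PRIMARY KEY,
        seller_id   INTEGER NOT NULL REFERENCES party (party_id),
        buyer_id    INTEGER NOT NULL REFERENCES party (party_id)
    );

    -- Party 1 stands for "us" in this sketch.
    INSERT INTO party VALUES (1, 'Our Company'), (2, 'Acme Supply'), (3, 'Pat');
    INSERT INTO contract VALUES (10, 2, 1), (11, 1, 3);

    -- The "virtual entity type": defined entirely in terms of the
    -- conceptual model, with no data of its own.
    CREATE VIEW vendor AS
        SELECT DISTINCT p.party_id, p.name
        FROM party AS p
        JOIN contract AS c ON c.seller_id = p.party_id
        WHERE c.buyer_id = 1;
""")

vendors = conn.execute("SELECT name FROM vendor ORDER BY name").fetchall()
print(vendors)
```

A business owner querying vendor sees the familiar term, while the architect's PARTY and CONTRACT remain the only things actually defined.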
Much of the information collected for a conceptual model cannot actually be represented on a diagram. This must be captured in the data model repository that supports the drawings. Indeed, this repository will become the kernel of the meta data repository that will support the applications built from this model.
It should be possible to include attributes in a conceptual model, although it is not necessary to do so when presenting the model initially. Indeed, attributes tend to clutter the model's visual appearance. The most important things to get across in a model feedback session are the entity types and relationships. With some audiences, however, including attributes can be helpful; and all attributes must be defined, with their data types (formats), at least by the end of the requirements analysis project. In the architect's conceptual data model, each attribute appears only once, associated with the entity type it describes.
Derived attributes are not only permitted in a conceptual data model, but they can be essential for describing the true nature of a set of entity types. They can be represented with a typographical flag (such as parentheses). The derivation is usually documented in the data model repository, although ORM and UML provide techniques for displaying formulae on a drawing.
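In code terms, a derived attribute behaves much like a computed, read-only property. The following Python sketch uses a hypothetical ORDER LINE entity type with invented attribute names. The derivation formula lives in one place, and the docstring marks the attribute with parentheses in the spirit of the typographical flag mentioned above.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:  # hypothetical entity type
    quantity: int
    unit_price_cents: int

    @property
    def extended_price_cents(self) -> int:
        """(extended price) = quantity * unit price; computed, never stored."""
        return self.quantity * self.unit_price_cents


line = OrderLine(quantity=3, unit_price_cents=250)
print(line.extended_price_cents)

# Changing a base attribute changes the derived value automatically.
line.quantity = 4
print(line.extended_price_cents)
```

Whether the value is later stored or computed on demand is a design decision for the logical and physical models; the conceptual model only records that it is derived, and how.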
One important component of a conceptual model is specification of the combinations of attributes and relationships that uniquely identify an occurrence of an entity type. Not all notations do this equally well. It is a second priority, but it is desirable. By the end of the analysis phase, all unique identifiers should be specified.
The final conceptual data model, then, should show:
According to The Business Rules Group, business rules come in four flavors: terms, facts, derivations and constraints (what they call "action assertions").6
We have seen that an entity/relationship data model can handily describe terms, facts and derived facts. What it cannot adequately convey are constraints. These are more properly described as a deliverable for Column Six (Motivation) of the Architecture Framework. To be sure, most data model notations do allow you to specify constraints for cardinality (an occurrence of this entity type may be related to no more than one occurrence of that entity type) and optionality (an occurrence of this entity type must be related to at least one occurrence of that entity type). Some notations (such as ORM and, to a lesser extent, UML) do allow some other constraints to be portrayed. For the most part, however, constraints must be described separately from the entity/relationship diagram.
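The two constraints that most notations do support can be expressed as a simple check over relationship occurrences. The function below is a hypothetical sketch; its name, parameters and sample data are invented for illustration. It reports which occurrences of "this" entity type break the optionality rule (related to at least one) or the cardinality rule (related to no more than one).

```python
from collections import Counter


def violations(occurrences, links, mandatory, at_most_one):
    """Return (occurrence, rule) pairs for one end of a relationship.

    occurrences: identifiers of every occurrence of "this" entity type
    links:       (this_id, that_id) pairs, one per relationship occurrence
    mandatory:   optionality, each occurrence must be related to at least one
    at_most_one: cardinality, each occurrence may be related to no more than one
    """
    counts = Counter(this_id for this_id, _ in links)
    found = []
    for occurrence in occurrences:
        n = counts[occurrence]  # Counter returns 0 for unlinked occurrences
        if mandatory and n < 1:
            found.append((occurrence, "must be related to at least one"))
        if at_most_one and n > 1:
            found.append((occurrence, "may be related to no more than one"))
    return found


# Hypothetical rule: each LINE ITEM must be part of one and only one ORDER.
line_items = ["L1", "L2", "L3"]
part_of = [("L1", "O1"), ("L2", "O1"), ("L2", "O2")]

for occurrence, rule in violations(line_items, part_of, True, True):
    print(occurrence, rule)
```

Anything beyond these two rules, such as a constraint that spans several relationships or depends on attribute values, has no such generic form, which is why it has to be written out separately.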
Some would try to manipulate the entity/relationship diagram to represent business constraints; but this is a bad idea because invariably this manipulation reduces the robustness of the model. To the extent that constraints cannot be represented directly on the drawing, they must be captured in the data model repository by the time a conceptual data model is completed. These constraints must be described in detail (in terms of the entity types, attributes and relationships involved) in the repository.
Other Information Captured
In addition to constraints, behind the scenes the following must be captured in the data model repository:
Because the entity/relationship diagram will be presented to business owners who initially will be unfamiliar with data modeling or many of the concepts presented, aesthetics is very important. The notation chosen should be as clean, concise and uncluttered as possible.
For the conceptual model, your author is partial to Richard Barker's entity/relationship notation, which is part of the Structured Systems Analysis and Design Method (SSADM) used in Europe and is promoted by Oracle Corporation. It is the most concise notation of all, with intuitive graphics. Also suitable is the information engineering notation of Clive Finkelstein and James Martin. A subset of UML can also be used. Object Role Modeling can be used, but it takes a somewhat different approach.7
Figure 1 shows the different views of data modeling seen by the architect and the characteristics of each.
Next month: The logical data models for relational and object-oriented environments.
David Hay has been producing data models to support strategic and requirements planning since the mid-1980s. He is the founder and president of Essential Strategies, Inc., a consulting firm dedicated to helping clients define corporate information architecture, identify system requirements and plan strategies for the implementation of new systems. Hay is the author of Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. He may be reached at email@example.com or by phone at (713) 464-8316.