FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

Time and Time Again: Managing Time in Relational Databases, Part 10 - Foreign Keys in Version Tables

In trying to explain versioning to other data modelers and DBAs, we have found that many of them are worried about the cascade update implications for versioned tables. In general, the concern is that if a new version of an object is inserted into a version table, then all foreign key references to the version it replaced must be updated to point to the new version. Furthermore, if those updates take place in tables which are themselves versioned tables, then those updates create additional new versions, and all foreign keys that point to them must also be updated. And so a single versioned update can cascade out to affect many other versioned tables, ending only when a cascade chain reaches either a) a non-versioned table or b) a versioned table which does not itself contain any foreign keys.

These concerned modelers and DBAs correctly point out that this cascade effect can make versioned updates very expensive. Furthermore, if there are any circular references, the cascade will become an infinite loop. For example, consider this pair of tables: a Client Version table and our Policy Version table. The Policy Version table contains a foreign key to the Client Version table, and each row in the Client Version table contains a count of the total number of versioned updates across all policies for that client. The infinite loop happens like this:

  1. A new version of a policy is inserted. Call the old version P12 and the new version P13.
  2. This triggers code that updates the policy version count in the related client, resulting in a new version of that client. Call the old version C25 and the new version C26.
  3. At this point, we must cascade update the policy version. P13's foreign key still points to client version C25, and it must be updated to point to C26. This creates a new version of that policy, P14.
  4. But now the trigger executes once again, updating the policy version count in the related client. This loops us back to step 2, where we update that count, creating client version C27.
  5. But once again, we must cascade update the policy version. P14's client foreign key still points to C26. This update creates a new version of that policy, whose client foreign key points to C27. This new version of the policy is P15.
  6. And so on, and so on, to either the last syllable of recorded time, or to a frustrated machine operator who cancels the process.

Also, notice that so far, we have talked only about cascade updates. There are also, of course, cascade deletes. As we will see later on, the semantic constraints surrounding cascade deletes in versioned tables are not identical to those for cascade deletes of non-versioned tables.

But for now, we will concentrate on the cascade update problem.

Chart of Version Patterns

Before continuing, we insert here the current chart of version patterns. References back to earlier patterns are happening pretty frequently now, and we hope this will help the reader recall what those prior version patterns are.

Figure 1: Chart of Version Patterns

Chart of This Series

Before continuing, we insert here a chart listing the installments to date in this series and a brief description of what each one is about. As this series grows longer, the back references to earlier installments will only increase, and we think that having a chart like this at hand will be helpful.

Figure 2: Chart of Installments to Date in this Series

What to Do About the Cascade Update Problem for Versioned Tables

There are only three possible responses to the versioned table cascade update problem (or to any problem). They are: a) ignore the problem, b) avoid it or c) fix it.

Ignore it. The first response is to ignore the problem. If we take this approach, it means that in many if not most cases, a single versioned update will trigger any number of additional versioned updates. Perhaps the attitude about multiple updates is: "DASD is cheap; CPU cycles are cheap. So just let it happen." But then what about the possibility of an infinite loop? Perhaps the attitude about infinite loops is: "Design your databases so there aren't any loops."

In some specific cases, this may be an acceptable response. But if our objective is to develop an enterprise solution (which, indeed, is our objective), then this response just won't do. We need a response that works in all possible cases, not just in special individual cases. We can't just ignore the issue of cascade updates in versioned tables.

Avoid it. The second response is to avoid the problem. This means to turn off referential integrity checking in the DBMS (which would stop the loop in the client to policy direction). If we can accept cascade updating, and are only concerned to avoid the infinite loop, an alternative is to turn off the developer-written code which updates counts (which would stop the loop in the policy to client direction).

Avoiding the problem is not as cavalier as it sounds. Standards committees are at loggerheads over how to introduce temporal management functions into relational DBMSs, and as a result, DBMS vendors are stymied. Consequently, there is no adequate support for temporal database management in current relational DBMSs. So we shouldn't be surprised that there is also no DBMS-implemented referential integrity that works for temporal (versioned) tables. Until there is such support, avoiding the problem by turning off referential integrity may be a very reasonable response.

But isn't the proper conclusion, given DBMS shortcomings, to implement referential integrity with our own code? Isn't the proper conclusion to fix it, not to avoid it?

That depends, of course, on a comparison of the costs and benefits of the two responses. Specifically:

  • What costs do we incur if we fix the problem? We may assume that these are all IT costs, not business costs. The answer is that we incur the development and maintenance costs that are part of any IT software artifact.
  • What benefits do we garner if we fix the problem? We may assume that these are all business benefits, not IT benefits. The answer is that a) we can use DBMS-supplied referential integrity, and thereby gain the assurance that RI errors will not occur; and b) by updating foreign keys whenever referenced rows are replaced by newer versions, the relationships involving versioned tables will remain current.
  • What costs do we incur if we avoid the problem? We may assume that these are all business costs, not IT costs. The answer is that a) since we cannot use DBMS-supplied referential integrity, we must therefore write our own code to guarantee that integrity; and b) since relationships among versioned tables are not updated if we avoid the problem, such relationships gradually become more and more "out of date" as additional versions of a foreign-key referenced object are created.
  • We may assume that there are no benefits to avoiding the problem (other than cost avoidance).

Fix it. The third response is to fix the problem. To fix any problem, we need to start with a correct diagnosis. We need to identify the root cause, the thing which, if fixed, will eliminate the problem. We suggest that the root cause of the problem is the instability of foreign-key references when those foreign keys point to versioned tables.

With "normal" (non-versioned) tables, cascade updates are rare because key changes are rare. If we were dealing with a Client and a Policy table - instead of with a Client Version and Policy Version table - then nothing we have considered so far would require a cascade update. But because we are dealing with foreign key-related version tables, every new version in the referenced table (Client Version) requires a cascade update in the referencing table (Policy Version). Because we have chosen to create a new version in the referenced table every time a new version is added in the referencing table (by incrementing a count), we loop back and create an infinite cascade update problem.

We can fix the infinite loop problem quite easily. In the case of the example we are working with, the infinite loop will be broken if a versioned update in the referencing Policy Version table does not cause a new version to be created in the referenced Client Version table. That means that we either don't keep a count of policy versions in the client, or else that updating that count is an overwrite type of update, not one that requires a new version to be created.

Additional discussion of the infinite loop problem, and the fix for it, might provide additional insight into the root cause of the overall cascade update problem. But we prefer to go after that additional insight by turning to the other part of the problem, the cascade update from "parent" to "child," from the referenced table to the referencing table.

A cascade update happens when a primary key changes, and there exist one or more foreign keys which reference it. So to fix a cascade update problem (not ignore it and not avoid it), we must either a) eliminate primary keys, b) eliminate foreign keys or c) stabilize the link between them.

We can rule out the first two options immediately. They amount to conceding that the relational model cannot handle versions and, more generally, cannot handle time. While it might be interesting to pursue this concession, we have neither the time nor the inclination to do so here.

But what does the third fix mean? What does it mean to "stabilize" the link between a primary and a foreign key? Let's start by looking at an example of these keys.

Figure 3: The Client Version and Policy Version Tables

In both tables, the primary key consists of two columns. The first column is an identifier for the individual object, either a client or a policy. The second column, a date, distinguishes and sequences the versions of each object. No two versions of the same object can have the same begin date.

Note: in all examples up to this point, the foreign key in the Policy Version table consists of only a single column - client number. Now that we are considering an issue involving foreign keys to versioned tables, we have to drop that simplification and show the full foreign key.

What Does It Mean?

We said earlier that to understand how avoiding the problem might work, we need to answer to the question, "What do foreign keys relate when they relate versioned tables?" The same thing is true for the "fix it" option. To understand how to fix the problem, we need to answer the same question.

Next time, in Part 11, we will see that the "avoid it" and "fix it" options are based on two different answers to the question, "What do foreign keys relate when they relate versioned tables?" Since different answers are being given, the different responses based on those answers will very likely entail different semantics supported by the answer. It is a cost/benefit analysis of these different semantics which will determine whether we should continue using an "avoid it" or a "fix it" response.

Editor's note: Watch for Part 11 in the October 5, 2007 issue of DM Direct.


Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.

Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.

For more information on related topics, visit the following channels:



Industry Vendors