FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

Publisher reserves the right to serve qualified requesters only.

Time and Time Again: Managing Time in Relational Databases, Part 11 - Foreign Keys in Version Tables Continued

In Part 10, we presented a concern that many modelers and database administrators (DBAs) have expressed about what happens to foreign keys when a new version of an object is created. We said that "in general, the concern is that if a new version of an object is inserted into a version table, then all foreign key references to the version it replaced must be updated to point to the new version. Furthermore, if those updates take place in tables which are themselves versioned tables, then those updates create additional new versions and all foreign keys that point to them must also be updated. And so a single versioned update can cascade out to affect many other versioned tables, ending only when a cascade chain reaches either a) a non-versioned table, or b) a versioned table which does not itself contain any foreign keys."

 

We also noted that there is a potential cascade problem when an object is deleted. Note two key phrases here: a) "when a version is replaced," and b) "when an object is deleted." Versions are not updated; versions are replaced. It is objects which are updated in the process of one version replacing a previous one. Also, versions are not deleted; objects are deleted by placing a deletion indicator on the current version of the object, and also on any future versions on the database at the time of the deletion.

 

Chart of Version Patterns

 

Before continuing, we insert here the current chart of version patterns. References back to earlier patterns are happening pretty frequently now, and we hope this will help the reader recall what those prior version patterns are.

 


Figure 1: Chart of Version Patterns

 

Chart of This Series

 

Before continuing, we insert here a chart listing the installments to date in this series, and a brief description of what each one is about. As this series grows longer, the back references to earlier installments will only increase, and we think that having a chart like this at hand will be helpful.

 


Figure 2: Chart of Installments to Date in this Series

 

What Do Foreign Keys Relate? The Issue

 

In non-versioned tables, foreign keys relate objects. Each row in a Client table represents a client. Each row in a Policy table represents a policy. What a foreign key from a Policy table back to a Client table does can be expressed in two different ways. First, we can say that it relates the row representing that policy, which contains the foreign key, to the row representing the client for that policy. Secondly, we can equally well say that it relates that policy to that client.

 

The first way of saying what the foreign key does refers to the data; the second way refers to the things which that data represents. The first way, from the point of view of data management, is physical; the second way is logical or, more precisely, semantic.

 

In non-versioned tables, there is a one-to-one pairing of things we are interested in (clients, policies) with data that represents them (rows in tables). Because the mapping between what is represented and its representation is one-to-one, we don't need to pay much attention to the distinction. We can talk about what foreign keys relate in terms of either objects or rows.

 

Seldom dealing with versioned tables, modelers and DBAs are usually not very concerned with the difference between data and semantics, between tables, rows and columns, and the types, instances and properties that they represent. But when we are dealing with versioned tables, the one-to-one mapping breaks down. It is only when there is exactly one row, representing the initial version of an object, that the mapping appears to be one to one. After that, it is one object to many versions of that object.

 

However, a more careful reading of the situation would make it clear that, even in this case of a single version of an object, there is not a one-to-one mapping between object and row. For a row which represents a version of an object, even when there is only one version of that object being represented, does not represent the object. It represents a delimited section of the timeline of that object. Each version explicitly says, "I represent this object as it was, beginning on this specific date." Then (in the "ultimate" version pattern we are leading up to), the version also says either "And I no longer represent this object after this specific end date", or else it says "I will continue to represent this object 'until further notice'" (which, as we shall explain later, is not the same thing as saying "I represent this object until the end of time").

 

Because of this breakdown of a one-to-one mapping between an object in the world and a row in a table, talk about relating objects and talk about relating rows are not, as they are in the case of non-versioned tables, effectively equivalent. Let's consider a more typical situation, say, one client and five versions for that object in the Client Version table, and one policy related to that client, and three versions for that object, in the Policy Version table. Here, it is clear that the one-to-one mapping has broken down, both for clients and for policies. And here we will have to make a choice between the two ways of speaking. Do foreign keys relate policies to clients, or do they relate policy versions to client versions?

 

What Do Foreign Keys Relate? One Answer

 

Those modelers and DBAs who are concerned about the foreign key problem in versioned tables assume that foreign keys relate rows which represent versions. A foreign key from a Policy Version table back to a Client Version table relates one row to another, a policy version to its associated client version.

 

If we want a justification for the assumption that foreign keys in versioned tables relate versions to versions, it might go like this: we understand that the business is interested in clients and policies, and which go with which. But we modelers and DBAs serve the business by managing data. If the business analyst has gotten the requirements right, and if we have expressed them correctly, then foreign keys relating rows to rows will accurately reflect the real-world relationships between objects and objects.

 

An even shorter justification goes like this: the DBMS doesn't know anything about what's "out there," about what the data it manages represents. So there is no way we can ask the DBMS to relate objects; all it knows how to do is to relate rows to rows, and foreign keys are the method it uses to do that. Versioned tables or non-versioned tables, it's all the same to the DBMS. Foreign keys relate rows to rows.

 

But if we proceed on the assumption that foreign keys in versioned tables relate rows to rows, and thus versions to versions, we can easily find ourselves dealing with the cascade update problem described in Part 10. Specifically, whenever a new version is created for a client, then all the policies whose foreign keys pointed back to the original client version must be updated to point to the new client version. If this update is a versionable update, it will create a new version of the policy. Then, any versions anywhere else across the database that pointed to the policy which just received a new version will have to be updated to point to that new version. And so on, and so on.

 

In Part 10, we concluded that we could not ignore this problem. The reason we can't ignore the problem is that we are looking for an enterprise solution, one that will apply to all sets of version tables. It may be that in databases that contain only a few versioned tables, and/or databases whose versioned tables change very infrequently, the performance penalty of cascading updates might be manageable. But with an enterprise solution, we cannot count on that. With an enterprise solution, some databases may be crippled by the performance demands of frequently updated version tables with lengthy cascade chains.

 

So, on the assumption that we are relating versions to versions, we must somehow either fix the problem or avoid it. How can we do that? As we said in Part 10, we could avoid the problem by turning off RI. But DBAs just don't like to turn off RI. That opens the door to foreign keys that don't point to anything. So, is there a way to fix the problem rather than avoid it?

 

There are two ways to fix the problem. One way is to make foreign key updates non-versionable updates, i.e., updates in place, updates which overwrite the foreign key but which do not produce new versions in the process. The second way is to not update foreign keys among versions. In either case, RI among versions can remain on, but cascade updates are prevented.

 

Let's consider each solution in turn.

 

Solution 1. Make Foreign Key Updates Non-Versionable.

 

Figure 1 below shows the example we introduced in Part 10. Both the Client and Policy tables are versioned tables. However, the inclusion of policy-type as a column of the Client Version table, back in Part 10, was a mistake. But it turns out to be one we will need as this analysis continues. So to distinguish it from the policy-type column on the Policy Version table, let's assume that policy-type on client versions means something like "the type of policy which the client's sales representative most recently tried to sell to the client." So let's rename it "upsell-policy-type."

 


Figure 1: The Client Version and Policy Version Tables

 

Let's assume that a new version of client C882 has just been created. This is shown in Figure 2 below. We must now update all foreign keys that pointed to [C882, 5/23/04], and make them point to [C882, 10/01/04]. And if we make this a non-versionable change, then there is no cascade problem. Instead, the result is shown in Figure 3.

 


Figure 2: Client C882 is Versioned

 

There is no cascade problem because, as Figure 3 shows, the foreign key change in the Policy Version table was made as a physical update to C882's current policy version. No new version of that policy was created.

 


Figure 3: Client C882's Policies are Updated

 

This is one way that we can keep foreign keys pointing to the current versions of the objects they reference, and at the same time avoid the cascade update problem. But notice the drawback to this solution: it loses information. The specific information that is lost is that, from 6/01/04 to 10/01/04, policy version [P138, 6/01/04] pointed to client version [C882, 5/23/04]. During that period of time, a join from that policy version to its related client version would have shown an upsell policy type of PPO. This is the information that is lost.

 

Why is it lost? Because we physically updated the foreign key in the policy version. We did that in order to avoid the cascade update problem. But why does the foreign key need to be updated? The answer we get to this question, from the modelers and DBAs we have spoken to, is that joins should always join to the most current version of the object referenced. After all, with non-versioned tables, foreign keys point to the most current version – which also happens to be the only version.

 

In Part 12, we will continue this line of questioning, which will lead us to consider the second solution to the cascade update problem, which is: don't update foreign keys that point to versions. We will then conclude our discussion of foreign keys and versioned tables by considering the other answer to the question, "What do foreign keys relate?” that answer being, "Objects, not versions."

 


Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.

Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.

For more information on related topics, visit the following channels:



Industry Vendors