-
Marketplace
-
Channel Resources
Articles from this Site
Information Builders to Extend WebFOCUS to Predictive Analysis
A Statistical Stocking Stuffer for the Holidays
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
New Interfaces Integrated into MEGA International Modeling Software
Which would be a better choice of classes for career growth in data warehousing - ETL architecture or dimensional modeling?
White Papers
Best Practices: Eight Tips for Improving Your Professional Services Business
Metadata Management for Enterprise Applications
UML for C#
PHP Code Design
Domain-Specific Modeling: 10x Faster than UML
Web Seminars
Modeling Unstructured Data
Creative Strategies for Achieving 24/7 Uptime
Closing the Loop: Real-Time Event Detection and Response
Learning from Others: Best Practices for Data Governance
Books
Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management
Data Modeler's Workbench: Tools and Techniques for Analysis and Design
The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models
Data Mining Using SAS Applications
Data Mining: Concepts, Models, Methods and Algorithms
Performance-Tuning Data Models
If you've made it alive through the technology downturn over the past few years, chances are you've had to do more for less. One of the things I've found myself doing more often than before is performance tuning my physical data models instead of handing them off to a DBA team for physical design. All of the "fox in the hen house" analogies apply, but coming from the logical modeling side of the house, I tend to be more conservative in making tuning changes due to the inevitable loss of data integrity. Nevertheless, most shops today are still willing to trade off data integrity for performance gains. As a result, I've compiled a list of some data model performance-tuning candidates for consideration when you start your physical data model tuning process. If you still hand off your logical model to another team for physical design, hopefully this article will help you (and your DBA teammate) make intelligent decisions.
It makes sense that this article is primarily applicable for online transaction processing (OLTP) data models and not online analytical processing (OLAP) data models. This is primarily due to two facts: all warehouses are not responsible for upholding the enterprise business rules, and dimensional models are designed for easy-read access, not normalized data.
The other ground rule in this article is that the logical data model always reflects the business rules. Therefore, when we discuss tuning we are really discussing denormalization or derivations of the physical data model.
In this article, I'll discuss super- and subtype performance considerations.
Where to start? Well the data model, of course. More specifically, the normalized logical data model. This should be the starting point for all your model performance enhancements. Why? Because the normalized logical data model leaves the data in a consistent state. If all performance goals are met with the normalized logical model, why make performance enhancements?Ease of query you say? Certainly some queries on normalized models can be complex. However, making reads easier usually makes inserts, updates and deletes harder. Furthermore, if you design your code for reuse, you only need to write the SQL once.
For the sake of brevity of this article, the rules of normalization will not be covered. If you'd like to see some good articles on normalization, I suggest reading Tom Johnston's many articles dealing with the topic.
A Few Prerequisites to Performance Tuning
Before we start tuning the physical data model, there are a few things you should consider.
First, understand that the data model may not be the source of your performance problems. Performance problems can be difficult to locate and identify. Some potential performance bottlenecks are:
- The application - The source code may be written inefficiently, causing bad performance.
- The network - Many of today's applications utilize overworked local area networks (LANs), wide area networks (WANs) or, even worse, the Internet with its many inherent problems.
- The database - The complexity of some of today's leading relational database products is astounding. While brushing up on some Oracle tuning procedures, I referenced Oracles 9i Performance Tuning manual, weighing in at a whopping 810 pages.
Before you start deriving and denormalizing the data model, make sure the other parts of the puzzle are put together properly.
Noted modeling expert Graeme Simsion recently stated in a column,"It's much easier to build performance into the design than to try to build it on."
This brings me to my second point: understand your performance requirements. Performance requirements are just as important and valid as other business requirements. The earlier you identify the performance requirements, the earlier you can make design decisions to improve your performance. If you don't have a documented performance requirement, you can't measure your success at the end of the tuning process.
Third, make sure you have a defined testing methodology. Your testing methodology should be repeatable, defined, managed and finally optimized. Sound familiar? It should; it's straight from the capability maturity model.
Don't Make Your Problem Someone Else's
If I remove a business rule from my database and put the responsibility for that business rule in the application, have I improved my performance?
If you have narrow-minded scope, you may be inclined to answer yes to this question. However, if your scope includes the application, as it should, then the answer may be no. Simply moving business rule responsibility from one application group to another may seem like a good performance tune, but you may be making your problem someone else's.
Even worse, you may be removing business rules that belong in the database and thereby inversely impacting your data integrity.
As the data architect, always keep in mind that you must have enterprise scope when designing your databases. It is not reasonable to expect the data requirements for an application won't be needed by many other applications throughout the enterprise. For this important reason, I strive to keep my data structures application independent.
Performance Tune #1 - Subtypes
Subtypes are the perfect way to show attribute optionality or relationship rules that vary by entity occurrence. That is, if I have an attribute or relationship that is mandatory for a subset of entity occurrences, I can reflect those business rules using a super/subtype data structure. Take the Employee model fragment in Figure 1, for example. For all Employees, I store the Employee name and address. I also store the Employee Type Code, known as a classifying attribute or discriminator, which tells me the Employee type (Full-Time or Temporary). For Full-Time Employees, I store the Hire Date, Job Grade, Salary Amount and Social Security number. For Temporary Employees, I store the Contract Identifier, Hourly Pay Rate and Tax Identifier.
Subtypes allow me to show the attribute (and relationship) optionality by entity occurrence. I can clearly show that Contract Identifier, Hourly Pay Rate and Tax Identifier are only applicable for Temporary Employees and, furthermore, that these attributes are mandatory (when displayed in the physical model). If I lump all of these attributes into one entity, I have to make all of the subtype attributes optional.

Figure 1: Employee Super/Subtype
There are three physical implementations for this logical model:
1. Create a table for the supertype and each subtype entity,
2. Roll up the subtype attributes into the supertype entity and create one table, or
3. Roll down the supertype attributes into each subtype and create a table for each subtype entity.
Not surprisingly, each solution has advantages and disadvantages.
Option 1 - Create a Table for the Supertype and all Subtype Entities
This option reflects the logical data structure. In the Figure 1 example, you would create an Employee supertype table as well as a Full Time and Temporary subtype table. As with all super/subtype relationships, the cardinality is one-to-one.
Here are the advantages of this approach:

And here are the disadvantages:

Option 2 - Roll Up the Subtype Columns into the Supertype
This is a popular option for obvious reasons; there are less database objects to maintain, and the queries are simple. In this option, you take all the columns from each of your subtype tables and "roll up" or move them to the supertype table. Figure 2 shows the result of Option 2 from our previous example.

Figure 2: Rolled-Up Subtypes
As I've previously noted, I am now forced to make all of my subtype columns optional (or nullable). Full-Time Employee columns that were not null are now null because I combined Full-Time and Temporary Employees in the same table. I have to use another means (e.g., check constraint, application code) to enforce the column optionality. Certainly a check constraint can enforce the business rules, but these cannot be shown in a data model. I've lost the ability to reflect some business rules in the physical model.
Here are the advantages of this approach:

And here are the disadvantages:

Option 3 - Roll Down the Supertype Columns into Each Subtype
The last option is to "roll down" the supertype entity into each of the subtype entities. By "roll down," I mean to take all of the supertype attributes and copy them to each of the subtype entities. Using our Employee example, the resulting data model would look like Figure 3.

Figure 3: Rolled-Down Subtypes
The benefit of this approach is that I get to keep my column optionality. I don't need to use database triggers, check constraints or application code to enforce the NOT NULL column constraints. The downside of this approach is that I've redundantly stored my supertype columns in each of my subtype tables. Instead of storing one fact one place, I now store the same fact for each subtype table. Add super type relationships that I have to redundantly create for each subtype to the mix and my model has become unnecessarily complex. For this simple example, the redundancy is easy to manage. On more complex examples, this could become a maintenance nightmare even with assistance from today's modeling tools. That said, there are performance advantages to this model approach.
Advantages:

And disadvantages:

The correct approach for your situation requires thought. Make sure you thoroughly understand the application access paths and how often they are performed to gain insight to the best approach for your application.
Performance tuning is always about tradeoffs. Where one option can improve, update and delete performance, it likewise can degrade read performance.
The author would like to thank Lee Leclair and Jean Anne Brown for contributions to this article.
References:1. Simsion, Graeme. "Better Data Models - Today: Understanding Data Model Quality." The Data Administration Newsletter, October 2005.
2. Niemiec, Richard J. Oracle 9i Performance Tuning Tips and Techniques. Chicago: McGraw-Hill, 2003.
Walter Howard is president of WallStreet Consulting Services, Inc., a company specializing in managing enterprise data assets. Walter has over twenty years of information technology experience. The majority of this time has been dedicated to helping clients manage their data assets in operational and data warehousing environments.
For more information on related topics, visit the following channels:


