-
Marketplace
-
Channel Resources
Articles from this Site
A Scoring Model and Choice Model for Multistage Cross Selling in the Insurance Industry, Part 2
Lawrence Technological University Uses Metastorm ProVision
Information Builders to Extend WebFOCUS to Predictive Analysis
A Statistical Stocking Stuffer for the Holidays
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
White Papers
Best Practices: Eight Tips for Improving Your Professional Services Business
Metadata Management for Enterprise Applications
UML for C#
PHP Code Design
Domain-Specific Modeling: 10x Faster than UML
Web Seminars
Modeling Unstructured Data
Creative Strategies for Achieving 24/7 Uptime
Closing the Loop: Real-Time Event Detection and Response
Learning from Others: Best Practices for Data Governance
Data Quality Getting Started the Right Way
Books
Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management
Data Modeler's Workbench: Tools and Techniques for Analysis and Design
The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models
Data Mining Using SAS Applications
Data Mining: Concepts, Models, Methods and Algorithms
Spam Model
Design Challenge
Imagine you are in this situation: as the senior data modeler in your organization, you review data models built by project teams to ensure they follow modeling best practices. There is a new reporting system currently being modeled called Spam. Spam will initially produce weekly spreadsheets for business users capturing spam email quantity by keyword on a given day of the week. Figure 1 is a subset of what might get produced, where the columns represent common spam words and the rows represent the days of the week.
![]() |
The project team could not agree on one single model for this project, so they present you with three different physical data models (see Figure 2). Note that in option 3, a Spam Reporting Factor Type Name could be Day of Week and Spam Reporting Factor Value could be Monday. Instead of selecting the best model from these three, you decide to play it safe and list the situations where each model would be ideal.
![]() |
The Challenge
In what situations would you use each of these models?
The Response
Modifying a logical data model into an efficient physical design always involves tradeoffs. For example, Do I want it fast, or do I want it flexible? The main situations for when to choose each of these options follow.
Option 1
- Names can change. For example, if the Spam Keyword of Rolex needs to change to Watch, this can be accomplished by updating a single Spam Keyword Name value.
- Integration is a strong possibility. Chris Bielinski, architect, suggests option 1 is a good choice when integration is a requirement, such as fitting this model within a data warehouse design. The Day of Week table could be used with other data values for expanded reporting, Chris says.
- Reporting requirements are stable. This model meets the requirements as long as spam quantity is always viewed by Day of Week and Spam Keyword. Both Peter Deotto, data architect, and Javier Mazzurco, BI architect, raise this factor. Javier also says that the reporting tool must support cross-tabulation of data.
Option 2
- Data retrieval performance and user-friendliness are priorities. Marcin Kulakowski, data analyst, mentions that data in this structure is formatted for quick response time. Alefiya Sabuwala, data modeler, says, This denormalized representation of the data makes it easy for users to run queries against. This structure can also be user-friendly for the development team. Bob Mosscrop, enterprise data architect, suggests this structure might be a good choice if there is limited database SQL knowledge among team members.
- Reporting requirements including report format are stable. Option 2 would require the most rework if requirements change. Georgia Prothero, data modeler, says this model can be used when Spam Keyword Name is unique and not expected to change and keywords will only ever be required to be counted by days of the week. Marcin recommends asking the question, Do we assume that we will only hold one weeks data?
- Reporting tool has limited functionality. Javier says that this structure is beneficial when the reporting tool does not support cross-tabulation of data or the cross-tabulation consumes lots of resources.
Option 3
- Option 3 should be chosen when flexibility is the priority. Alefiya explains: If the requirements for the measurement of the spam events are very loosely defined and the likelihood of changing the measurement from days to any other factor over time is possible, I would select this model for the flexibility it provides. Wade Baskin, senior database architect, and Patrick McMullen, data architect, both suggest carefully weighing the flexibility gained with this design against the challenge of explaining such an abstract, complex design to others.
If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at http://www.stevehoberman.com/. There is also an overview on how to read a data model at my Web site.
Steve Hoberman has worked as a business intelligence and data management practitioner and trainer since 1990. He is a Certified Business Intelligence Professional (CBIP), having achieved mastery level certification in data analysis and design. He is a popular and frequent presenter at industry conferences, both nationally and internationally. Hoberman is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler's Workbench and Data Modeling Made Simple (available for purchase through the DM Review bookstore). He is the founder of the Design Challenges group, inventor of the Data Model Scorecard and a recognized innovator and thought leader in the field of data modeling. He can be reached at me@stevehoberman.com.
Graeme Simsion's latest book is out! Data Modeling Theory and Practice. Here's a link where you can read more about the book and purchase it at a discounted price.
For more information on related topics, visit the following channels:




