DM Review | Covering Business Intelligence, Integration & Analytics
Data Mining and Modeling:
Galileo and Marketing Analytics

By David S. Coppock, online columnist
Column published in DMReview.com, June 26, 2003

Galileo, to my knowledge, never used his scientific genius to apply data mining and modeling techniques to marketing. Perhaps he felt that being the first to study the cosmos through a telescope had stronger implications for science and society. However, those of us who practice the humbler pursuit of database marketing still follow the empirical method (use of direct observation to draw conclusions) that he and others pioneered.

Stephen Jay Gould, in his essay entitled "The Sharp-Eyed Lynx, Outfoxed by Nature," 1 describes Galileo's belief in observation and the trouble in which it landed him with the Catholic Church - convicted of heresy for advocating the Copernican view that the earth orbits the sun. Fortunately, today's data miners and modelers are free to observe and draw conclusions without fear of such drastic reprisal. However, we are also free, and all too able, to make some of the same scientific mistakes as Galileo. In fact, the point of Gould's essay is not to celebrate the power of the empirical method, but to point out the dangers of relying on observation alone in drawing conclusions.

Gould illustrates this point with the story of how Galileo interpreted his observations of Saturn. Galileo's telescope was powerful enough to enable him to see the rings of Saturn, but not powerful enough to see them clearly for what they are. So after gazing many times at the mysterious planet, he concluded that Saturn was actually three planets - a large middle planet with two smaller planets touching it on either side. We know now that this was wrong. But Galileo considered it to be a fact, since he had "observed" it to be true.

What went wrong? Surely we can't complain that his observation technique was flawed. He was using the best technology available in his day and, therefore, collecting all the data he could. The problem was not with the data, but with his interpretation of the data. The idea of rings around a planet was too far outside his understanding of nature to allow him to reach the correct conclusion. Instead, as Gould notes, he drew a conclusion that was within his realm of comprehension and still consistent with the very imprecise image of the rings he was able to observe.

The possibility of misinterpreting data analysis is still a very real problem. As managers and analysts, we often have expectations and hopes about what the data may tell us. It is very easy to take results that are merely consistent with our expectations and interpret them as proof of those expectations. Even if we avoid bias toward specific conclusions, we can easily overlook alternative explanations of the data.

An obvious example is confusing correlation with causation. Suppose that a retailer with multiple channels observes that the average purchase amount is smaller on its Web site than through its catalog. Does this mean that the Web site depresses the amount purchased (perhaps by being hard to navigate, so that shoppers can't find everything they want)? While the observation is consistent with this interpretation, there are less obvious explanations that could also be true. Perhaps internet shoppers tend to be younger. If younger shoppers tend to buy smaller amounts, then this would also explain the observation.

The implications of these two hypotheses are very different. In one case, there is a problem with the Web site that is holding back sales. In the other case, the Web site is successfully reaching a specific market segment. Obviously, the reaction of marketing decision-makers would be very different in each case.
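
One simple way to probe the two hypotheses is to compare average purchase amounts by channel within each age group, rather than only overall. The following is a minimal sketch in Python, not a method described in Gould's essay or used by any particular retailer. The order records, the field layout and the helper name `average_by` are all hypothetical, invented purely to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical orders, invented purely for illustration.
# Each record is (channel, age_group, purchase_amount).
ORDERS = [
    ("web", "younger", 40.0),
    ("web", "younger", 50.0),
    ("web", "younger", 45.0),
    ("web", "older", 90.0),
    ("catalog", "younger", 45.0),
    ("catalog", "older", 85.0),
    ("catalog", "older", 95.0),
    ("catalog", "older", 90.0),
]


def average_by(orders, key):
    """Return the average purchase amount for each group defined by key(order)."""
    groups = defaultdict(list)
    for order in orders:
        groups[key(order)].append(order[2])
    return {group: mean(amounts) for group, amounts in groups.items()}


# Raw comparison: the Web site appears to produce smaller purchases.
overall = average_by(ORDERS, key=lambda o: o[0])

# Same comparison within each age group: the channel gap disappears
# in this made-up data, because the Web site simply has more younger shoppers.
within_age = average_by(ORDERS, key=lambda o: (o[0], o[1]))

print(overall)
print(within_age)
```

In this invented data the overall averages differ by channel (56.25 for the Web site versus 78.75 for the catalog), yet within each age group the two channels are identical. A breakdown like this does not prove the age explanation. It only shows that the raw channel gap, taken alone, cannot distinguish between the two hypotheses.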

The basic point of Gould's essay is that the process of learning through empirical observation and analysis involves two parts. The first is what everybody automatically associates with the empirical method: collecting the data and doing the analysis. But equally important is the mental process of interpreting the results and drawing conclusions. We run the danger of doing a poor job of interpretation if we fail to recognize the importance of this second step. Resist the temptation to jump to easy conclusions or create convenient explanations for the data. The full story is probably more complicated.


1 Gould, Stephen Jay. The Lying Stones of Marrakech. Harmony Books. 2000. pp. 27-52.



David Coppock has more than 20 years of experience in technical and strategic marketing positions. As senior vice president of Data Mining and Modeling at ANALYTICi, he has implemented leading-edge methodologies for targeting, segmentation and marketing strategy applications. He also led the Analytic Database Marketing Division at AT&T. He holds a Ph.D. in economics from Yale University and can be contacted at dcoppock@patmedia.net.

SourceMedia (c) 2005 DM Review and SourceMedia, Inc. All rights reserved.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.