DM Review | Covering Business Intelligence, Integration & Analytics
Business Intelligence:
What is Data Mining?

Column published in DMReview.com
August 11, 2000
 
  By Jonathan Wu

During the last several years, companies have used data mining techniques to understand the demographics of their customers and to provide them with personalized interactions. Various data mining techniques have been deployed to identify hidden trends and new opportunities within the data. These techniques have been embedded into software applications that run complex algorithms in order to provide meaningful information. While end-user data mining applications are available, they have not been extensively deployed throughout organizations because they are often not understood. One way to understand the capabilities of data mining is to compare it to other business intelligence (BI) technologies.

Ad Hoc Query

With an ad hoc query application, users have the ability to access information on demand. What they ask for is what they will get. For example, a user creates and executes an ad hoc query that answers the question, "How much revenue was generated by each customer during this year?" The results from the query would contain customer name and revenue for the year selected. Figure 1 is a representation of the result set produced by the ad hoc query.

Figure 1

The revenue by customer could be totaled to answer another question: "How much revenue was generated this year?" In addition, other questions such as "What customer generated the most revenue for the company?" and "What customer generated the least amount of revenue for the company?" could also be answered. While the query result is useful and addresses several questions, this BI technology will not identify unusual patterns or reveal unusual relationships. What the user requested was revenue by customer for the current year, and that is the information that was provided - no more, no less.
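As a minimal sketch of the ad hoc query described above, the following Python uses the standard-library sqlite3 module against an in-memory table. The table name `sales`, its columns and the sample rows are hypothetical and invented for illustration; they are not the data behind Figure 1.

```python
import sqlite3

# Hypothetical sample data: (customer, year, revenue). Not from the article.
ROWS = [
    ("Acme", 2000, 1200.0),
    ("Acme", 2000, 800.0),
    ("Bolt", 2000, 500.0),
    ("Cree", 2000, 3000.0),
    ("Cree", 1999, 9999.0),  # a prior-year row, excluded by the year filter
]


def revenue_by_customer(conn, year):
    """Ad hoc query: revenue generated by each customer in one year."""
    cur = conn.execute(
        "SELECT customer, SUM(revenue) FROM sales "
        "WHERE year = ? GROUP BY customer ORDER BY customer",
        (year,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, year INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

result = revenue_by_customer(conn, 2000)

# Follow-up questions answered from the same result set.
total = sum(revenue for _, revenue in result)
top = max(result, key=lambda row: row[1])
bottom = min(result, key=lambda row: row[1])

print(result)
print(total, top, bottom)
```

Every answer here is derived from what was explicitly requested; nothing in the query looks for patterns the user did not ask about.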

Online Analytical Processing (OLAP)

OLAP applications provide users with the ability to manually explore and analyze summary and detailed information. For example, a user creates and performs an OLAP analysis that answers the question, "What was the revenue for each quarter of this year by geographic region and customer?" The results from this analysis would contain geographic region, customer name, revenue and quarters selected. Figure 2 is a representation of the result set produced by the OLAP analysis.

Figure 2

Additional questions could be posed of the data that could highlight seasonal revenue patterns by geographic region. However, a user who understands how to navigate the data must direct this process. OLAP can only highlight the patterns within the data that was requested. It is left to the user to identify the trends and patterns highlighted by the OLAP analysis. This BI technology will not identify unusual patterns or reveal hidden relationships.
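The user-directed navigation described above can be sketched in plain Python as a roll-up over the chosen dimensions. This is only an illustration of the idea, not how any particular OLAP product works; the `FACTS` rows, the `DIMS` names and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, customer, quarter, revenue).
FACTS = [
    ("East", "Acme", "Q1", 100.0),
    ("East", "Acme", "Q2", 150.0),
    ("East", "Bolt", "Q1", 50.0),
    ("West", "Cree", "Q1", 300.0),
    ("West", "Cree", "Q4", 700.0),
]
DIMS = ("region", "customer", "quarter")


def roll_up(facts, dims):
    """Sum revenue grouped by the dimensions the user chooses."""
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # revenue is the last field of each row
    return dict(totals)


# The user decides which view to look at; the tool only aggregates.
detail = roll_up(FACTS, ("region", "customer", "quarter"))
by_region_quarter = roll_up(FACTS, ("region", "quarter"))
by_region = roll_up(FACTS, ("region",))

print(by_region_quarter)
print(by_region)
```

Each call answers one question the user chose to ask. Spotting a seasonal pattern in `by_region_quarter` is still left to the person reading the output.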

Data Mining

Data mining can best be described as a BI technology that has various techniques to extract comprehensible, hidden and useful information from a population of data. Data mining makes it possible to discover hidden trends and patterns in large amounts of data. The output of a data mining exercise can take the form of patterns, trends or rules that are implicit in the data.

There are various data mining techniques that can be deployed; each serves a specific purpose and requires a different amount of user involvement. Figure 3 displays the progression of data mining techniques in the order of user involvement.

Figure 3

Neural networks are highly evolved systems that provide predictive modeling. These systems are very complex, and it takes time to train the system to perform human-like thinking. This data mining technique has been used to detect potentially fraudulent credit card transactions.

Induction is a data mining technique that induces rules inherent within the data. The rules are used to understand the relationships that exist. A classic example is: When people buy diapers, they also buy beer 50 percent of the time.
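A rule such as the diapers-and-beer example is usually stated with a confidence: of the transactions that contain the first item, the fraction that also contain the second. The sketch below computes that figure for a tiny, hypothetical set of shopping baskets chosen so that the rule holds half the time. It only scores a rule that is handed to it, whereas an induction tool would search for such rules on its own. The `BASKETS` data and the helper names are assumptions made for illustration.

```python
# Hypothetical shopping baskets, invented for illustration.
BASKETS = [
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"diapers", "beer", "chips"},
    {"diapers"},
    {"beer", "chips"},
    {"milk"},
]


def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets with `antecedent` that also hold `consequent`."""
    base = count_containing(baskets, antecedent)
    if base == 0:
        return 0.0
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / base


rule_confidence = confidence(BASKETS, {"diapers"}, {"beer"})
print(f"diapers -> beer: {rule_confidence:.0%}")
```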

Statistics is the basis of all data mining techniques and requires individuals highly skilled in mathematics to build and interpret the results.

Visualization displays the data in a graphical or three-dimensional map, thereby allowing the user to identify trends, patterns and relationships. Because the resulting image provides another perspective on data relationships, visualization is often incorporated into data mining applications.

While OLAP and query languages are listed by the GartnerGroup as data mining techniques, identifying hidden trends and relationships with them requires extensive and extremely time-consuming user involvement. Therefore, using such techniques for that purpose is not cost-effective.

Utilizing a data mining application, a user can ask, "What are the distinguishing characteristics of our credit customers who pay on time?" The results from the data mining exercise would then be used to create the condition statement of an ad hoc query that identifies customer names and contact information within the database for the purposes of cross-selling additional services.
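The hand-off from data mining to ad hoc query can be sketched as follows, assuming the mining exercise has already produced a set of attribute conditions. The rule in `MINED_RULE`, the `customers` table and its columns are hypothetical; the sketch shows only how such conditions could become the WHERE clause of a query, using Python's standard-library sqlite3 module.

```python
import sqlite3

# Hypothetical output of a data mining exercise: conditions that describe
# credit customers who pay on time. Invented for illustration.
MINED_RULE = [("years_as_customer", ">=", 3), ("credit_limit", "<=", 5000)]

# Only known columns and operators may be placed into the SQL text.
ALLOWED_COLUMNS = {"years_as_customer", "credit_limit"}
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def build_where(rule):
    """Turn (column, operator, value) conditions into a WHERE clause."""
    clauses, params = [], []
    for column, op, value in rule:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    return " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(name TEXT, phone TEXT, years_as_customer INTEGER, credit_limit REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Acme", "555-0100", 5, 4000.0),
        ("Bolt", "555-0101", 1, 2000.0),
        ("Cree", "555-0102", 7, 9000.0),
        ("Dyna", "555-0103", 3, 5000.0),
    ],
)

where, params = build_where(MINED_RULE)
targets = conn.execute(
    f"SELECT name, phone FROM customers WHERE {where} ORDER BY name", params
).fetchall()
print(where)
print(targets)
```

The values are passed as query parameters, and the column names and operators are checked against a fixed list before being placed in the SQL text.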

Ad hoc query applications scratch the surface of the value that exists within a database while OLAP provides users with greater depth and understanding. However, data mining digs deeper and provides users with knowledge through the discovery of hidden trends and relationships. The combination of data mining with an ad hoc query or OLAP application is extremely powerful and provides users with knowledge about the data that is analyzed and the ability to act upon the knowledge. Figure 4 depicts the value and purpose of the BI technologies addressed herein.

Figure 4


Jonathan Wu is a senior principal with Knightsbridge Solutions. He has extensive experience designing, developing and implementing information solutions for reporting, analysis and decision-making purposes. Serving Fortune 500 organizations, Knightsbridge delivers actionable and measurable business results that inform decision making, optimize IT efficiency and improve business performance.  Focusing exclusively on the information management disciplines of data warehousing, data integration, information quality and business intelligence, Knightsbridge delivers practical solutions that reduce time, reduce cost and reduce risk. Wu may be reached at jwu@knightsbridge.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.