During the last several years, companies have used data mining techniques to understand the demographics of their customers and to provide them with personalized interactions. Various techniques have been deployed to identify hidden trends and new opportunities within the data, and they have been embedded into software applications that run complex algorithms to provide meaningful information. While end-user data mining applications are available, they have not been extensively deployed throughout organizations because they are often not understood. One way to understand the capabilities of data mining is to compare it to other business intelligence (BI) technologies.
With an ad hoc query application, users have the ability to access information on demand. What they ask for is what they will get. For example, a user creates and executes an ad hoc query that answers the question, "How much revenue was generated by each customer during this year?" The results from the query would contain customer name and revenue for the year selected. Figure 1 is a representation of the result set produced by the ad hoc query.
The revenue by customer could be totaled to answer another question: "How much revenue was generated this year?" In addition, other questions such as "Which customer generated the most revenue for the company?" and "Which customer generated the least revenue for the company?" could also be answered. While the query result was useful and addressed several questions, this BI technology will not identify unusual patterns or reveal unusual relationships. What the user requested was revenue by customer for the current year, and that is the information that was provided - no more, no less.
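As a minimal sketch of the ad hoc query described above, the following Python example uses the standard `sqlite3` module. The `sales` table, its column names, the customer names, the years and the revenue figures are all hypothetical and chosen only for illustration; they do not come from Figure 1.

```python
import sqlite3

# Hypothetical sales data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, sale_year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Acme", 2001, 1200.0),
        ("Acme", 2001, 800.0),
        ("Bolt", 2001, 500.0),
        ("Crest", 2001, 3000.0),
        ("Crest", 2000, 9999.0),  # prior year, excluded by the query
    ],
)

# The ad hoc query: revenue by customer for the selected year.
revenue_by_customer = conn.execute(
    "SELECT customer, SUM(revenue) FROM sales "
    "WHERE sale_year = ? GROUP BY customer ORDER BY customer",
    (2001,),
).fetchall()

# Follow-up questions answered from the same result set.
total_revenue = sum(revenue for _, revenue in revenue_by_customer)
top_customer = max(revenue_by_customer, key=lambda row: row[1])[0]
bottom_customer = min(revenue_by_customer, key=lambda row: row[1])[0]

print(revenue_by_customer)
print(total_revenue, top_customer, bottom_customer)
```

Every value printed here is derived directly from what was requested. Nothing in the query looks for patterns the user did not ask about.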
OLAP applications provide users with the ability to manually explore and analyze summary and detailed information. For example, a user creates and performs an OLAP analysis that answers the question, "What was the revenue for each quarter of this year by geographic region and customer?" The results from this analysis would contain geographic region, customer name, revenue and quarters selected. Figure 2 is a representation of the result set produced by the OLAP analysis.
Additional questions could be posed of the data that could highlight seasonal revenue patterns by geographic region. However, a user who understands how to navigate the data must direct this process. OLAP can only highlight the patterns within the data that was requested. It is left to the user to identify the trends and patterns highlighted by the OLAP analysis. This BI technology will not identify unusual patterns or reveal hidden relationships.
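The kind of user-directed navigation described above can be sketched in plain Python. This is an illustration under stated assumptions, not how any particular OLAP product works: the fact rows, the region and customer names, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, customer, quarter, revenue).
facts = [
    ("East", "Acme", "Q1", 100.0),
    ("East", "Acme", "Q2", 150.0),
    ("East", "Bolt", "Q1", 200.0),
    ("West", "Crest", "Q1", 300.0),
    ("West", "Crest", "Q4", 700.0),
]


def roll_up(rows, key_indexes):
    """Sum revenue grouped by the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]  # revenue is the fourth field
    return dict(totals)


# Detailed view: region x customer x quarter.
detail = roll_up(facts, (0, 1, 2))
# The user chooses to roll up to region x quarter to look for seasonality.
by_region_quarter = roll_up(facts, (0, 2))
# The user rolls up further to region only.
by_region = roll_up(facts, (0,))

print(by_region_quarter)
print(by_region)
```

Each view exists only because the user chose which dimensions to group by. Spotting a seasonal pattern in `by_region_quarter` is still left to the person reading the output.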
Data mining can best be described as a BI technology that has various techniques to extract comprehensible, hidden and useful information from a population of data. Data mining makes it possible to discover hidden trends and patterns in large amounts of data. The output of a data mining exercise can take the form of patterns, trends or rules that are implicit in the data.
There are various data mining techniques that can be deployed. Each serves a specific purpose and requires a different amount of user involvement. Figure 3 displays the progression of data mining techniques in the order of user involvement.
Neural networks are highly evolved systems that provide predictive modeling. These systems are very complex, and it takes time to train the system to perform human-like thinking. This data mining technique has been used to detect potential fraudulent credit card transactions.
Induction is a data mining technique that induces rules inherent within the data. The rules are used to understand the relationships that exist. A classic example is: When people buy diapers, they also buy beer 50 percent of the time.
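A rule such as the diapers and beer example is usually stated with a confidence figure: of the transactions that contain the first item, the share that also contain the second. The sketch below computes that figure and then searches every ordered pair of items for rules above a threshold. It is a deliberately simplified illustration rather than a production rule induction algorithm, and the baskets, function names and threshold are hypothetical.

```python
from itertools import permutations


def rule_confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    with_both = [t for t in with_antecedent if consequent in t]
    return len(with_both) / len(with_antecedent)


def induce_rules(transactions, min_confidence):
    """Return every single-item rule (a -> b) meeting the confidence threshold."""
    items = sorted(set().union(*transactions))
    rules = {}
    for a, b in permutations(items, 2):
        confidence = rule_confidence(transactions, a, b)
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
]

rules = induce_rules(baskets, min_confidence=0.5)
for (a, b), confidence in sorted(rules.items()):
    print(f"When people buy {a}, they also buy {b} {confidence:.0%} of the time")
```

Unlike the query and OLAP examples, the user never names diapers or beer. The rules surface from the data itself, although this brute-force pair search would not scale to a real product catalog.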
Statistics is the basis of all data mining techniques and requires individuals highly skilled in mathematics to build the models and interpret the results.
Visualization displays the data in a graphical or three-dimensional map, thereby allowing the user to identify trends, patterns and relationships. Because the image produced provides another perspective on data relationships, visualization is often incorporated in data mining applications.
While the GartnerGroup lists OLAP and query languages as data mining techniques, identifying hidden trends and relationships with them requires extensive and extremely time-consuming user involvement. Therefore, using such techniques for that purpose is not cost-effective.
Utilizing a data mining application, a user can ask, "What are the distinguishing characteristics of our credit customers who pay on time?" The results from the data mining exercise would then be used to create the condition statement of an ad hoc query that identifies customer names and contact information within the database for the purposes of cross-selling additional services.
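The handoff from data mining to an ad hoc query might look like the sketch below. The mined characteristics (`tenure_years >= 3` and `autopay = 1`) are invented placeholders standing in for whatever a real data mining exercise would return, and the `customers` table, its columns and the `build_where` helper are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(name TEXT, email TEXT, tenure_years INTEGER, autopay INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Ana", "ana@example.com", 5, 1),
        ("Ben", "ben@example.com", 1, 1),
        ("Cy", "cy@example.com", 4, 0),
        ("Dee", "dee@example.com", 3, 1),
    ],
)

# Hypothetical output of the data mining exercise: (column, operator, value).
mined_conditions = [("tenure_years", ">=", 3), ("autopay", "=", 1)]

ALLOWED_COLUMNS = {"tenure_years", "autopay"}
ALLOWED_OPERATORS = {"=", ">=", "<=", ">", "<"}


def build_where(conditions):
    """Turn mined conditions into a parameterized WHERE clause."""
    clauses, params = [], []
    for column, operator, value in conditions:
        # Only whitelisted names reach the SQL text; values are bound as parameters.
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {column} {operator}")
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    return " AND ".join(clauses), params


where_sql, params = build_where(mined_conditions)
targets = conn.execute(
    f"SELECT name, email FROM customers WHERE {where_sql} ORDER BY name",
    params,
).fetchall()

print(where_sql)
print(targets)
```

The data mining step supplies the condition, and the ad hoc query supplies the names and contact details needed to act on it.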
Ad hoc query applications scratch the surface of the value that exists within a database while OLAP provides users with greater depth and understanding. However, data mining digs deeper and provides users with knowledge through the discovery of hidden trends and relationships. The combination of data mining with an ad hoc query or OLAP application is extremely powerful and provides users with knowledge about the data that is analyzed and the ability to act upon the knowledge. Figure 4 depicts the value and purpose of the BI technologies addressed herein.
Parsaye, K. "A Characterization of Data Mining Technologies and Processes." The Journal of Data Warehousing. Fall 1997.
Brand, E. and Gerritsen, R. "Data Mining and Knowledge Discovery." DBMS Online. February 1998.
Jonathan Wu is a senior principal with Knightsbridge Solutions. He has extensive experience designing, developing and implementing information solutions for reporting, analysis and decision-making purposes. Serving Fortune 500 organizations, Knightsbridge delivers actionable and measurable business results that inform decision making, optimize IT efficiency and improve business performance. Focusing exclusively on the information management disciplines of data warehousing, data integration, information quality and business intelligence, Knightsbridge delivers practical solutions that reduce time, reduce cost and reduce risk. Wu may be reached at firstname.lastname@example.org.