Unstructured Data
Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales

This book shows warehouse developers and managers how to build a document warehouse, how to organize free-form text for easy access, and, most importantly, how to exploit text mining techniques to provide timely and accurate information for decision-makers. The author covers the complete process of building and managing a document warehouse, including examples of actual implementations, a review of security issues, tools such as XML and Wide Area Information Servers along with their selection criteria, and how text mining techniques differ from data mining techniques.
Effective Databases for Text & Document Management

Focused on the latest research on text and document management, this guide addresses the information management needs of organizations. It demonstrates how the need for effective databases to house information is affecting organizations worldwide, and how some organizations that possess vast amounts of data cannot use that data economically and efficiently. It provides a taxonomy for object-oriented databases, metrics for controlling database complexity, and a guide to accommodating hierarchies in relational databases. It also covers how to apply Java-triggers for X-Link management and how to build signatures.
Mining the Web: Analysis of Hypertext and Semi Structured Data

Mining the Web: Discovering Knowledge from Hypertext Data is the first book devoted entirely to techniques for extracting and producing knowledge from the vast body of unstructured Web data. Building on an initial survey of infrastructural issues, including Web crawling and indexing, Chakrabarti examines machine learning techniques as they relate specifically to the challenges of Web mining and provides applications of machine learning to systematically acquire, store, and analyze data. Here the focus is on results: the strengths and weaknesses of these applications, along with their potential as foundations for further progress toward a Web that is more aware of content semantics. This thorough and forward-looking book gives the theoretical and practical foundations you need to build innovative applications for mining the Web.
Data on the Web: From Relations to Semistructured Data and XML

Offers detailed solutions to a wide range of practical problems while equipping you with a keen understanding of the fundamental issues (including data models, query languages, and schemas) involved in their design, implementation, and optimization.
Survey of Text Mining: Clustering, Classification, and Retrieval

Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
Text Mining: Theoretical Aspects and Applications

Text Mining: Theoretical Aspects and Applications presents contributions from researchers in different disciplines. Each studies the problem of mining text from the perspective of their own scientific background: artificial intelligence, computational linguistics, document analysis, machine learning, information retrieval, or pattern recognition. Their common goal is to analyse huge text collections in real-world applications in order to support knowledge-intensive processes.
Mining the Web: Transforming Customer Data

Web sites gather a lot of detailed information about customers. Unfortunately, most companies lack the means to use that information to improve their marketing and customer support functions. Considered by most experts to be the new frontier in the database and data warehousing fields, Web mining solves that problem. Coauthored by two bestselling data mining authors, Mining the Web explains, for corporate decision makers, IT managers, and database marketers, how data mining principles and techniques can be applied to various types of Web sites. More importantly, they describe techniques for using the resulting goldmine of business data to develop more effective advertising campaigns and better customer service.
Mining the Talk

Leverage Unstructured Data to Become More Competitive, Responsive, and Innovative
In Mining the Talk, two leading-edge IBM researchers introduce a revolutionary new approach to unlocking the business value hidden in virtually any form of unstructured data, from word processing documents to websites, emails to instant messages.
The authors review the business drivers that have made unstructured data so important, and explain why conventional methods for working with it are inadequate. Then, writing for business professionals (not just data mining specialists), they walk step-by-step through exploring your unstructured data, understanding it, and analyzing it effectively.



