Features
- Cover Type: Hard Cover with 244 pages
- Published by: Springer
- Edition: 1st Edition September 9, 2003
- Written in: English
- ISBN 10 Number: 0387955631
- ISBN 13 Number: 978-0387955636
- Book Dimensions: 9.3 x 6.1 x 0.7 inches
- Weighs: 10.4 ounces
Product Description
Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory.
As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing requirements and environments.
This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
Book Info
Text provides a survey of text mining, organized into three parts: clustering and classification; information extraction and retrieval; and trend detection. For researchers and practitioners. DLC: Data mining--Congresses.
Reader Reviews
The book is relatively brief, given the technical nature of its chapters, each written by different authors. Many clustering methods are described. Most can be seen to have some degree of subjectivity in defining what ends up in a given cluster, or whether a cluster even exists at all. The analysis of Web documents forms a major portion of the book. This data set is vast, continually changing and expanding. It is also noisy, unlike the many clean data sets that might be extracted from a corpus of books, for example. Attention should be paid to methods of automatically extracting information from the Web. The book does not go much into the higher-level problems of defining ontologies, which are very hard tasks. The closest it seems to get is along the lines of finding similar words in documents, which is still very useful.