Features
- Cover Type: Paperback with 432 pages
- Published by: Charles River Media
- Edition: 1st Edition, May 4, 2006
- Written in: English
- ISBN 10 Number: 1584504609
- ISBN 13 Number: 978-1584504603
- Book Dimensions: 9 x 7.3 x 1.1 inches
- Weighs: 1.8 pounds
Product Description
Text Mining Application Programming teaches software developers how to mine the vast amounts of information available on the Web, internal networks, and desktop files and turn it into usable data. The book helps developers understand the problems associated with managing unstructured text, and explains how to build your own mining tools using standard statistical methods from information theory, artificial intelligence, and operations research. Each topic is thoroughly explained and then followed by a practical implementation.

The book begins with a brief overview of text data, where it can be found, and the typical search engines and tools used to search and gather this text. It details how to build tools for extracting and using the text, and covers the mathematics behind many of the algorithms used in building these tools. From there you'll learn how to build tokens from text, construct indexes, and detect patterns in text. You'll also find methods to extract the names of people, places, and organizations from an email, a news article, or a Web page. The next portion of the book teaches you how to find information on the Web, how the Web is structured, and how to build spiders to crawl it. Text categorization is also described in the context of managing email. The final part of the book covers information monitoring, summarization, and a simple Question & Answer (Q&A) system.

The code used in the book is written in Perl, but knowledge of Perl is not necessary to run the software. Developers with an intermediate level of experience with Perl can customize the software. Although the book is about programming, methods are explained with English-like pseudocode and the source code is provided on the CD-ROM. After reading this book, you'll be ready to tap into the bevy of information available online in ways you never thought possible.
About The Author
Manu Konchady (Oakton, VA) is a consultant working on open source text mining software. Previously, he worked at Mitre Corp., where he designed and developed software to mine the Internet. He received his Ph.D. in Information Technology from George Mason University, and his articles have appeared in Dr. Dobb's Journal and Linux Journal.
Reader Reviews
I am a Java web/search programmer who wanted to "get into" text mining, and I found this book an excellent resource for that. Text mining is a field in which active research is still going on, and the other text mining books I have looked at reflect this: the authors expect you to have a certain degree of mathematical background to understand what they are saying. This book briefly explains the math behind each of the approaches, but it focuses more on the algorithms that result from the math, so it is easier to read. A side effect, of course, is that the approaches described are not necessarily the state of the art for solving any given problem. But once you have the basic approach to solving a problem, it is relatively easy to find and understand the documentation on the web for the more advanced approaches, since you now know what you are looking for and how it differs from your basic solution.

The book does have a (fairly long) chapter that covers the math background necessary to get started with text mining. If you understand the material in there, you will actually be able to think up solutions to text mining problems that are unique to your own situation. The algorithms in the book are in pseudocode, but the book comes with a CD (also available as a download from the author's SourceForge project, textmine.sf.net) where you can see working Perl code.

Overall, I think this is one of the most useful books that I have purchased in a while. It should appeal most to programmer types who have programmed in their language(s) of choice for a while in areas other than text mining, want to get into text mining, and don't want to spend a lot of time relearning high school and college math before starting off.