Features
- Cover Type: Hard Cover with 296 pages
- Published by: Chapman & Hall/CRC
- Edition: 1st Edition April 29, 2005
- Written in: English
- ISBN 10 Number: 1584885343
- ISBN 13 Number: 978-1584885344
- Book Dimensions: 9.2 x 6.4 x 0.9 inches
- Weighs: 1.2 pounds
Product Review
The particular decomposition studied in this book is the decomposition of the total sum of squares matrix into between and within cluster components, and the book develops this decomposition, and its associated diagnostics, further than I have seen them developed for cluster analysis before. Overall, the book presents an unusual, perhaps even rather idiosyncratic approach to cluster analysis, from the perspective of someone who is clearly an enthusiast for the insights these tools can bring to understanding data.
-D.J. Hand, Short Book Reviews of the ISI
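To illustrate the decomposition the reviewer refers to, here is a minimal sketch in plain Python. It uses the scalar form (sums of squared Euclidean distances) rather than the full matrix form, and the function names and toy data are assumptions made for illustration, not taken from the book. It checks that the total scatter around the grand mean equals the within-cluster scatter plus the between-cluster scatter.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter_decomposition(points, labels):
    """Return (total, within, between) sums of squares for a given partition."""
    grand = mean(points)
    total = sum(sq_dist(p, grand) for p in points)

    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = mean(members)
        # Scatter of the cluster's points around its own centroid.
        within += sum(sq_dist(p, centroid) for p in members)
        # Scatter of the centroid around the grand mean, weighted by cluster size.
        between += len(members) * sq_dist(centroid, grand)
    return total, within, between


# Hypothetical toy data: two well-separated clusters of two points each.
points = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
labels = [0, 0, 1, 1]
total, within, between = scatter_decomposition(points, labels)
print(total, within, between)
```

For a fixed data set the total is constant, so a partition with a smaller within-cluster part necessarily has a larger between-cluster part. This is the quantity that K-Means tries to minimize, and that Ward's method increases as little as possible at each merge.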
Product Description
Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost through trial-and-error. Even the most popular clustering methods--K-Means for partitioning the data set and Ward's method for hierarchical clustering--have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids. Rather than the traditional set of ad hoc techniques, Clustering for Data Mining: A Data Recovery Approach presents a theory that not only closes gaps in K-Means and Ward methods, but also extends them into areas of current interest, such as clustering mixed scale data and incomplete clustering. The author suggests original methods for both cluster finding and cluster description, addresses related topics such as principal component analysis, contingency measures, and data visualization, and includes nearly sixty computational examples covering all stages of clustering, from data pre-processing to cluster validation and results interpretation. This author's unique attention to data recovery methods, theory-based advice, pre- and post-processing issues that are beyond the scope of most texts, and clear, practical instructions for real-world data mining make this book ideally suited for virtually all purposes: for teaching, for self-study, and for professional reference.
Reader Reviews
First, understand that the type of clustering discussed in this book is the statistical technique of finding clusters of data in a collection, where the collection is typically a database. It is not about clustered microcomputers working on big computational tasks as though they were a supercomputer.
Clustering customers is a key area in data mining and knowledge discovery. You are usually trying to find groups of people with similar, but not necessarily identical, buying patterns. For instance, if you have a group of people who have purchased a book on PHP, you might want to try to sell them a book on MySQL, Apache, or Linux. These programs fit together but are not identical. Still, the customer who purchased the PHP book is more likely to want a MySQL book than an audio CD of a murder mystery.
In this book, two of the most popular clustering techniques, K-Means and Ward's method, are presented for a reader interested in the technical aspects of data mining, whether as a theoretician or a practitioner. The author says the material is intended to be useful to a reader with no mathematical background beyond high school. But the author also says it might help if the reader is acquainted with basic notions of calculus, statistics, matrix algebra, graph theory, and logic. (The author went to a different high school than I did.)
The book describes clustering as useful in a wide variety of applications, most of them oriented toward discovering social patterns, biological taxonomies, machine learning, and so on. It discusses the various techniques that have been developed and gives examples of where they have been used.