Text Mining with Machine Learning: Principles and Techniques
by: Jan Žižka, František Dařena, et al.
Pages: 366
Publisher: CRC Press; 1 edition (November 11, 2019)
Language: English
ISBN-10: 1138601829
ISBN-13: 9781138601826
Book Description
This book provides a perspective on the application of machine learning-based methods in knowledge discovery from natural language texts. By analysing various data sets, conclusions that are not normally evident emerge and can be used for various purposes and applications. The book explains the principles of time-proven machine learning algorithms applied in text mining, together with step-by-step demonstrations of how to reveal the semantic content of real-world datasets using the popular R language and its implemented machine learning algorithms. The book is aimed not only at IT specialists but also at a wider audience that needs to process large sets of text documents and has basic knowledge of the subject, e.g. e-mail service providers, online shoppers, librarians, etc.
The book starts with an introduction to text-based natural language data processing and its goals and problems. It focuses on machine learning, presenting various algorithms with their uses and possibilities, and reviews their strengths and weaknesses. Beginning with the initial data pre-processing, a reader can follow the steps provided in the R language, including the incorporation of various available plug-ins into the resulting software tool. A big advantage is that R also contains many libraries implementing machine learning algorithms, so a reader can concentrate on the principal target without needing to implement the details of the algorithms herself or himself. To make sense of the results, the book also provides explanations of the algorithms, which support the final evaluation and interpretation of the results. The examples are demonstrated using real-world data from commonly accessible Internet sources.
Contents
Authors’ Biographies
1. Introduction to Text Mining with Machine Learning
2. Introduction to R
3. Structured Text Representations
4. Classification
5. Bayes Classifier
6. Nearest Neighbors
7. Decision Trees
8. Random Forest
9. Adaboost
10. Support Vector Machines
11. Deep Learning
12. Clustering
13. Word Embeddings
14. Feature Selection
References
Index