The Handbook of NLP with Gensim: Leverage topic modeling to uncover hidden patterns, themes, and valuable insights within textual data
Author: Chris Kuo
Publisher: Packt Publishing
Publication Date: October 27, 2023
Language: English
Print length: 310 pages
ISBN-10: 1803244941
ISBN-13: 9781803244945
Elevate your natural language processing skills with Gensim and become proficient in handling a wide range of NLP tasks and projects
Key Features
Advance your NLP skills with this comprehensive guide covering detailed explanations and code practices
Build real-world topic modeling pipelines and fine-tune hyperparameters to deliver optimal results
Explore real-world industrial applications of topic modeling in medical, legal, and other fields
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Navigating the terrain of NLP research and applying it in practice can be a formidable task, one that The Handbook of NLP with Gensim makes easier. This book demystifies NLP and equips you with hands-on strategies spanning healthcare, e-commerce, finance, and more to enable you to leverage Gensim in real-world scenarios.
You’ll begin by exploring motives and techniques for extracting text information like bag-of-words, TF-IDF, and word embeddings. This book will then guide you on topic modeling using methods such as Latent Semantic Analysis (LSA) for dimensionality reduction and discovering latent semantic relationships in text data, Latent Dirichlet Allocation (LDA) for probabilistic topic modeling, and Ensemble LDA to enhance topic modeling stability and accuracy.
Next, you’ll learn text summarization techniques with Word2Vec and Doc2Vec to build the modeling pipeline and optimize models using hyperparameters. As you get acquainted with practical applications in various industries, this book will inspire you to design innovative projects. Alongside topic modeling, you’ll also explore named entity handling and NER tools, modeling procedures, and tools for effective topic modeling applications.
By the end of this book, you’ll have mastered the techniques essential to create applications with Gensim and integrate NLP into your business processes.
What you will learn
Convert text into numerical representations such as bag-of-words, TF-IDF, and word embeddings
Use various NLP techniques with Gensim, including Word2Vec, Doc2Vec, LSA, FastText, LDA, and Ensemble LDA
Build topic modeling pipelines and visualize the results of topic models
Implement text summarization for legal, clinical, or other documents
Apply core NLP techniques in healthcare, finance, and e-commerce
Create efficient chatbots by harnessing Gensim’s NLP capabilities
Who this book is for
This book is for data scientists and professionals who want to become proficient in topic modeling with Gensim. NLP practitioners can use this book as a code reference, while students or those considering a career transition will find it a valuable resource for advancing in the field of NLP. The book contains real-world applications from the biomedical, healthcare, legal, and operations domains, making it a helpful guide for project managers designing their own topic modeling applications.
Table of Contents
1. Introduction to NLP
2. Word Embedding
3. Text Wrangling and Preprocessing
4. Latent Semantic Analysis with scikit-learn
5. Cosine Similarity
6. Latent Semantic Indexing with Gensim
7. Using Word2Vec
8. Doc2Vec with Gensim
9. Understanding Discrete Distributions
10. Latent Dirichlet Allocation
11. LDA Modeling
12. LDA Visualization
13. The Ensemble LDA for Model Stability
14. LDA and BERTopic
15. Real-world Use Cases
About the Author
Chris Kuo is a data scientist with over 23 years of experience. He has led various data science solutions, including customer analytics, health analytics, fraud detection, and litigation. He is also the inventor of a U.S. patent. He has worked at several Fortune 500 companies in the insurance and retail industries. Chris teaches at Columbia University and has taught at Boston University and other universities. He has published articles in economic and management journals and served as a journal reviewer. He is the author of eXplainable A.I., Modern Time Series Anomaly Detection, Transfer Learning for Image Classification, and The Handbook of Anomaly Detection. He received his undergraduate degree in Nuclear Engineering and his Ph.D. in Economics.