Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle, 2nd Edition

Authors: Ramcharan Kakarla, Sundar Krishnan, Balaji Dhamodharan, Venkata Gunnu

ASIN: B0DBBXKL4X

Publisher: Apress

Edition: Second edition

Publication Date: 2024-12-2

Language: English

Print Length: 467 pages

ISBN-13: 9798868808197

Book Description

This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we will begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 will guide you through machine learning pipelines and various methods to operationalize and serve models using Docker/API. Chapter 10 will demonstrate how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data.

In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark.

You will:

  • Gain an overview of end-to-end predictive model building
  • Understand multiple variable selection techniques and their implementations
  • Learn how to operationalize models
  • Perform data science experiments and learn useful tips

From the Back Cover

This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade.

In Chapters 1, 2 & 3, we will get started with setting up the environment and the basics of PySpark, focusing on data manipulation. In Chapter 4, we will dive into the art of variable selection, where we demonstrate various selection techniques available in PySpark. In Chapters 5, 6 & 7, we take you on a journey through machine learning algorithms, implementations and fine-tuning techniques. Chapters 8 and 9 will walk you through machine learning pipelines and the various methods available to operationalize a model and serve it through Docker/API. Chapter 10 will demonstrate how you can unlock the power of predictive models, used in combination, to create a meaningful impact on your business. Chapter 11 will introduce you to some of the most widely used and powerful modelling frameworks to unlock real value from data.

In this new edition, you will learn predictive modelling frameworks that can quantify customer lifetime values and estimate the return on your predictive modelling investments. This edition also contains methods to measure engagement and identify actionable populations for effective churn treatments. In addition, a dedicated chapter on experimentation design has been added, including steps to efficiently design, conduct, test and measure the results of your models. All code has been refreshed as needed to reflect the latest stable version of Spark.

You will:

  • Get an overview of end-to-end predictive model building
  • Understand multiple variable selection techniques & implementations
  • Work with operationalizing models
  • Perform data science experiments & pick up useful tips

About the Author

Ramcharan Kakarla is currently Principal ML at Altice USA. He is a passionate data science and artificial intelligence advocate with 10 years of experience. He holds a master’s degree from Oklahoma State University with a specialization in data mining. He is currently pursuing a master’s in management at the University of California, Los Angeles. Prior to UCLA and OSU, he received his bachelor’s in electrical and electronics engineering from Sastra University in India. He was born and raised in the coastal town of Kakinada, India. He started his career working as a performance engineer with several Fortune 500 clients including State Farm, British Airways, Comcast and JP Morgan Chase. In his current role, he is focused on building data science solutions and frameworks leveraging big data. He has published several papers and posters in the field of predictive analytics. He served as SAS Global Ambassador for the year 2015.

Sundar Krishnan is a Senior Data Science Manager at CVS Health. He has 12+ years of extensive experience leading cross-functional data science teams and is an AI, ML, and cloud platform expert. He has a proven track record of building high-performing teams and implementing innovative AI strategies to optimize operational costs and generate substantial revenue. An expert in 0-to-1 product development, he has successfully led teams from conception to market-ready products in Gen AI & data science. Sundar was born and raised in Tamil Nadu, India, and has a bachelor’s degree from the Government College of Technology, Coimbatore. He completed his master’s at Oklahoma State University, Stillwater. He blogs about his data science work on Medium in his spare time.

Balaji Dhamodharan is an award-winning global data science leader, guiding teams to develop and implement innovative, scalable ML solutions. He currently leads the AI/ML and MLOps strategy initiatives with NXP Semiconductors. He has over a decade of experience delivering large-scale technology solutions across diverse industries. His expertise spans software engineering, enterprise AI platforms, AutoML, MLOps, and generative AI technologies. Balaji holds master’s degrees in Management Information Systems and Data Science from Oklahoma State University and Indiana University. Originally from Chennai, India, Balaji currently resides in Austin, TX, USA.

Venkata Gunnu is a Senior Executive Director of Knowledge Management and Innovation at JPM Chase. He is an executive with a successful background crafting enterprise-wide data and data science solutions, GenAI, process improvements, and data and data science-centric products. He is a concept-to-implementation strategist with demonstrated success controlling multiple projects that elevate organizational efficiency while optimizing resources. He is data-focused and analytical, with a track record of automating functions, standardizing data management protocols, and introducing new business intelligence solutions.
