Machine Learning with Spark, 2nd Edition


by Rajdeep Dua and Manpreet Singh Ghotra
Print Length: 532 pages
Publisher: Packt Publishing; 2nd Revised edition (28 April 2017)
Language: English
ISBN-10: 1785889931
ISBN-13: 9781785889936
ASIN: B01DPR2ELW
Key Features
Get to grips with the latest version of Apache Spark
Utilize Spark’s machine learning library to implement predictive analytics
Leverage Spark’s powerful tools to load, analyze, clean, and transform your data

Book Description


This book will teach you about popular machine learning algorithms and their implementation. You will learn how various machine learning concepts are implemented in the context of Spark ML. You will start by installing Spark on single-node and multinode clusters. Next, you’ll see how to execute Scala- and Python-based programs for Spark ML. Then we will take a few datasets and go deeper into clustering, classification, and regression. Toward the end, we will also cover text processing using Spark ML.
Once you have learned the concepts, you can apply them to implement algorithms in green-field projects or to migrate existing systems to this new platform. You can migrate from Mahout or Scikit to Spark ML.
By the end of this book, you will have acquired the skills to leverage Spark’s features to create your own scalable machine learning applications and power a modern data-driven business.
What you will learn
Get hands-on with the latest version of Spark ML
Create your first Spark program with Scala and Python
Set up and configure a development environment for Spark on your own computer, as well as on Amazon EC2
Access public machine learning datasets and use Spark to load, process, clean, and transform data
Use Spark’s machine learning library to implement programs by utilizing well-known machine learning models
Deal with large-scale text data, including feature extraction and using text data as input to your machine learning models
Write Spark functions to evaluate the performance of your machine learning models
Contents
1. Getting Up and Running with Spark
1. Installing and setting up Spark locally
2. Spark clusters
3. The Spark programming model
4. SchemaRDD
5. Spark data frame
6. The first step to a Spark program in Scala
7. The first step to a Spark program in Java
8. The first step to a Spark program in Python
9. The first step to a Spark program in R
10. Getting Spark running on Amazon EC2
11. Configuring and running Spark on Amazon Elastic Map Reduce
12. UI in Spark
13. Supported machine learning algorithms by Spark
14. Benefits of using Spark ML as compared to existing libraries
15. Spark Cluster on Google Compute Engine – DataProc
16. Summary
2. Math for Machine Learning
1. Linear algebra
2. Gradient descent
3. Prior, likelihood, and posterior
4. Calculus
5. Plotting
6. Summary
3. Designing a Machine Learning System
1. What is Machine Learning?
2. Introducing MovieStream
3. Business use cases for a machine learning system
4. Types of machine learning models
5. The components of a data-driven machine learning system
6. An architecture for a machine learning system
7. Spark MLlib
8. Performance improvements in Spark ML over Spark MLlib
9. Comparing algorithms supported by MLlib
10. MLlib supported methods and developer APIs
11. MLlib vision
12. MLlib versions compared
13. Summary
4. Obtaining, Processing, and Preparing Data with Spark
1. Accessing publicly available datasets
2. Exploring and visualizing your data
3. Processing and transforming your data
4. Extracting useful features from your data
5. Summary
5. Building a Recommendation Engine with Spark
1. Types of recommendation models
2. Extracting the right features from your data
3. Training the recommendation model
4. Using the recommendation model
5. Evaluating the performance of recommendation models
6. FP-Growth algorithm
7. Summary
6. Building a Classification Model with Spark
1. Types of classification models
2. Extracting the right features from your data
3. Training classification models
4. Using classification models
5. Improving model performance and tuning parameters
6. Additional features
7. Summary
7. Building a Regression Model with Spark
1. Types of regression models
2. Evaluating the performance of regression models
3. Extracting the right features from your data
4. Training and using regression models
5. Improving model performance and tuning parameters
6. Summary
8. Building a Clustering Model with Spark
1. Types of clustering models
2. Extracting the right features from your data
3. K-means – training a clustering model
4. K-means – evaluating the performance of clustering models
5. Effect of iterations on WSSSE
6. Bisecting KMeans
7. Bisecting K-means – training a clustering model
8. Gaussian Mixture Model
9. Summary
9. Dimensionality Reduction with Spark
1. Types of dimensionality reduction
2. Extracting the right features from your data
3. Training a dimensionality reduction model
4. Using a dimensionality reduction model
5. Evaluating dimensionality reduction models
6. Summary
10. Advanced Text Processing with Spark
1. What’s so special about text data?
2. Extracting the right features from your data
3. Using a tf-idf model
4. Evaluating the impact of text processing
5. Text classification with Spark 2.0
6. Word2Vec models
7. Word2Vec with Spark ML on the 20 Newsgroups dataset
8. Summary
11. Real-Time Machine Learning with Spark Streaming
1. Online learning
2. Stream processing
3. Online learning with Spark Streaming
4. Online model evaluation
5. Structured Streaming
6. Summary
12. Pipeline APIs for Spark ML
1. Introduction to pipelines
2. How pipelines work
3. Machine learning pipeline with an example
4. Summary