Apache Spark 2 Cookbook, 2nd Edition

Apache Spark 2 Cookbook Second Edition
by: Rishi Yadav
ISBN-10: 1787127265
ISBN-13: 9781787127265
Edition: 2nd Revised edition
Released: 2017-07-06
Pages: 322
Publisher: Packt

Key Features
Contains recipes on solving real-time data-processing problems with Apache Spark
Utilize core Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX
A practical guide to help you master Apache Spark as your single big data computing platform

Book Description


While Apache Spark 1.x gained a lot of traction and adoption in its early years, Spark 2.0 delivers notable improvements in its APIs, performance, and Structured Streaming, and simplifies the building blocks for better, faster, smarter, and more accessible big data applications. This book covers all of these features in the form of structured recipes for analyzing large and complex data sets.
Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. You will then be introduced to working with RDDs, with DataFrames for operating on data with schemas, and with real-time streaming from sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, recommendation engines, deep learning algorithms, and GPU implementations on Spark.
Last but not least, the final few chapters will help you delve more deeply into graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.
What you will learn
Install and configure Apache Spark with various cluster managers
Set up a development environment for Apache Spark
Learn to operate on data in Spark with schemas
Get to grips with real-time streaming analytics using Spark Streaming
Master supervised learning and unsupervised learning using MLlib
Build a recommendation engine using MLlib
Use Tensorframes to manipulate Spark’s DataFrames with TensorFlow programs for deep learning
Develop a set of common applications or project types, and solutions that solve complex big data problems
Contents
Chapter 1. Getting Started with Apache Spark
Chapter 2. Developing Applications with Spark
Chapter 3. Spark SQL
Chapter 4. Working with External Data Sources
Chapter 5. Spark Streaming
Chapter 6. Getting Started with Machine Learning
Chapter 7. Supervised Learning with MLlib — Regression
Chapter 8. Supervised Learning with MLlib — Classification
Chapter 9. Unsupervised Learning
Chapter 10. Recommendations Using Collaborative Filtering
Chapter 11. Graph Processing Using GraphX and GraphFrames
Chapter 12. Optimizations and Performance Tuning
