PySpark Algorithms


Author: Mahmoud Parsian

  • Publication Date: August 16, 2019
  • Edition: 2nd
  • Language: English
  • Print length: 784 pages
  • ASIN: B07WQHTVCJ

Book Description

GitHub Source Code for PySpark Algorithms book:

https://github.com/mahmoudparsian/pyspark-algorithms

Sample chapters: https://github.com/mahmoudparsian/pyspark-algorithms/tree/master/sample_chapters

This is an introductory book on PySpark, the Python API for Spark. Apache Spark is an analytics engine for large-scale data processing: an open-source cluster-computing system that makes data analytics fast to write and fast to run. This book provides a large set of recipes for implementing big-data processing and analytics using Spark and Python. The goal of the book is to show working examples in PySpark so that you can build your ETL and analytics pipelines more easily; you can cut and paste the examples into your own PySpark applications.

With PySpark you can tackle big datasets quickly through simple APIs in Python. You will learn how to express parallel tasks and computations with just a few lines of code, covering applications from ETL and simple batch jobs to stream processing and machine learning.

With this book, you can dive into Spark capabilities such as RDDs (resilient distributed datasets), DataFrames (data as a table of rows and columns), in-memory caching, and the interactive PySpark shell, where you can leverage Spark's powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib.

In this book, you will learn Spark's transformations and actions through a set of well-defined, working examples. All examples are tested and working, so you can copy and paste them into your own PySpark applications. Writing PySpark is much easier than writing Spark applications in Java, and PySpark applications are far less verbose than their Java counterparts.

In this book you will learn:

* A short introduction to Spark and PySpark

* RDDs, DataFrames, and SQL, with worked examples

* How to use important Spark transformations on RDDs (low-level APIs)

* How to use SQL and the DataFrame API

* How to read data from many different data sources and represent it as RDDs and DataFrames

* The power of data design patterns

* The basics of monoids and how to use them in MapReduce

* The basics of GraphFrames for solving graph-related data problems

* How to implement Logistic Regression algorithms using PySpark

* The basics of data partitioning and reduction transformations
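The monoid idea listed above can be illustrated without Spark at all. A monoid is a set with an associative operation and an identity element, which is exactly what reduction transformations such as `reduceByKey` need so that partial results from different partitions can be merged in any grouping order. A hypothetical sketch (not the book's code), using `max` as the combine operation:

```python
from functools import reduce

def combine_max(a, b):
    # max is associative: max(max(a, b), c) == max(a, max(b, c))
    return a if a >= b else b

IDENTITY = float("-inf")  # identity element for max

# Simulate a per-partition partial reduction followed by a final merge,
# mirroring how a distributed reduction combines values across partitions.
partitions = [[3, 1, 4], [1, 5], [9, 2, 6]]
partials = [reduce(combine_max, p, IDENTITY) for p in partitions]  # [4, 5, 9]
total = reduce(combine_max, partials, IDENTITY)                    # 9

# Same answer as reducing the flattened data in one pass.
assert total == max(x for p in partitions for x in p)
```

A non-associative operation (for example, subtraction) would give different answers depending on how the data happened to be partitioned, which is why the monoid property matters for MapReduce-style reductions.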

Table of Contents

chap01: Introduction to PySpark

chap02: Hello World

chap03: Data Abstractions

chap04: Getting Started

chap05: Transformations in Spark

chap06: Reductions in Spark

chap07: DataFrames and SQL

chap08: Spark DataSources

chap09: Logistic Regression

chap10: Movie Recommendations

chap11: Graph Algorithms

chap12: Design Patterns and Monoids

Appendix A: How To Install Spark

Appendix B: How to Use Lambda Expressions

Appendix C: Questions And Answers (50+ QA)

Future chapters:

chap13: FP-Growth

chap14: LDA

chap15: Linear Regression

Amazon Page

Download

EPUB, PDF (converted) | 21 MB | 2019-10-10