Spark in Action, Second Edition: Covers Apache Spark 3 with Examples in Java, Python, and Scala
Author: Jean-Georges Perrin
Publisher: Manning
Edition: 2nd
Publication Date: 2020-06-02
Language: English
Print Length: 576 pages
ISBN-10: 1617295523
ISBN-13: 9781617295522
Book Description
Summary
The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
Foreword by Rob Thomas.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds up to 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the book
What’s inside
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL
About the reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
About the author
Table of Contents
PART 1 – THE THEORY CRIPPLED BY AWESOME EXAMPLES
1 So, what is Spark, anyway?
2 Architecture and flow
3 The majestic role of the dataframe
4 Fundamentally lazy
5 Building a simple app for deployment
6 Deploying your simple app
PART 2 – INGESTION
7 Ingestion from files
8 Ingestion from databases
9 Advanced ingestion: finding data sources and building your own
10 Ingestion through structured streaming
PART 3 – TRANSFORMING YOUR DATA
11 Working with SQL
12 Transforming your data
13 Transforming entire documents
14 Extending transformations with user-defined functions
15 Aggregating your data
PART 4 – GOING FURTHER
16 Cache and checkpoint: Enhancing Spark’s performances
17 Exporting data and building full data pipelines
18 Exploring deployment