Apache Spark in 24 Hours, Sams Teach Yourself
by Jeffrey Aven
Print Length: 592 pages
Publisher: Sams; 1st edition (17 Aug. 2016)
Language: English
ISBN-10: 0672338513
ISBN-13: 9780672338519
B01LBA79II
Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility.
This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, now and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.
Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data.
Learn how to
Discover what Apache Spark does and how it fits into the Big Data landscape
Deploy and run Spark locally or in the cloud
Interact with Spark from the shell
Make the most of the Spark Cluster Architecture
Develop Spark applications with Scala and functional Python
Program with the Spark API, including transformations and actions
Apply practical data engineering/analysis approaches designed for Spark
Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
Optimize Spark solution performance
Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
Leverage cutting-edge functional programming techniques
Extend Spark with streaming, R, and Sparkling Water
Start building Spark-based machine learning and graph-processing applications
Explore advanced messaging technologies, including Kafka
Preview and prepare for Spark’s next generation of innovations
Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; “Did You Know?” tips offer insider advice and shortcuts; and “Watch Out!” alerts help you avoid pitfalls. By the time you’re finished, you’ll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.
Contents
Part I: Getting Started with Apache Spark
HOUR 1 Introducing Apache Spark
HOUR 2 Understanding Hadoop
HOUR 3 Installing Spark
HOUR 4 Understanding the Spark Application Architecture
HOUR 5 Deploying Spark in the Cloud
Part II: Programming with Apache Spark
HOUR 6 Learning the Basics of Spark Programming with RDDs
HOUR 7 Understanding MapReduce Concepts
HOUR 8 Getting Started with Scala
HOUR 9 Functional Programming with Python
HOUR 10 Working with the Spark API (Transformations and Actions)
HOUR 11 Using RDDs: Caching, Persistence, and Output
HOUR 12 Advanced Spark Programming
Part III: Extensions to Spark
HOUR 13 Using SQL with Spark
HOUR 14 Stream Processing with Spark
HOUR 15 Getting Started with Spark and R
HOUR 16 Machine Learning with Spark
HOUR 17 Introducing Sparkling Water (H2O and Spark)
HOUR 18 Graph Processing with Spark
HOUR 19 Using Spark with NoSQL Systems
HOUR 20 Using Spark with Messaging Systems
Part IV: Managing Spark
HOUR 21 Administering Spark
HOUR 22 Monitoring Spark
HOUR 23 Extending and Securing Spark
HOUR 24 Improving Spark Performance