Scala: Guide for Data Science Professionals
by Pascal Bugnion, Arun Manivannan, Patrick R. Nicolas
Publisher: Packt Publishing; 1st edition (24 Feb. 2017)
Language: English
ISBN-13: 9781787282858
Paperback: 1100 pages
Book Description
Scala is especially well suited to analyzing large data sets, as the scale of the task has little impact on performance. Scala’s powerful functional libraries can interact with databases and build scalable frameworks, resulting in robust data pipelines.
The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real-world examples, you will learn how to design scalable architectures to process and model data, starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks.
Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You’ll see how Scala can be used to make sense of data through easy-to-follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning algorithms with the Spark ML library. You’ll gain a sufficient understanding of Spark Streaming, machine learning for streaming data, and Spark GraphX.
Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science: machine learning. The final module teaches you the A to Z of machine learning with Scala. You’ll explore Scala features such as dependency injection and implicits, which are used to write machine learning algorithms. You’ll also explore machine learning topics such as clustering, dimensionality reduction, Naïve Bayes, regression models, SVMs, neural networks, and more.
This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products:
Scala for Data Science, Pascal Bugnion
Scala Data Analysis Cookbook, Arun Manivannan
Scala for Machine Learning, Patrick R. Nicolas
What You Will Learn
Transform and filter tabular data to extract features for machine learning
Read, clean, transform, and write data to both SQL and NoSQL databases
Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
Load data from HDFS and Hive with ease
Run streaming and graph analytics in Spark for exploratory analysis
Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
Build dynamic workflows for scientific computing
Leverage open source libraries to extract patterns from time series
Master probabilistic models for sequential data
Authors
Pascal Bugnion
Pascal Bugnion is a data engineer at the ASI, a consultancy offering bespoke data science services. Previously, he was the head of data engineering at SCL Elections. He holds a PhD in computational physics from Cambridge University.
Besides Scala, Pascal is a keen Python developer. He has contributed to NumPy, matplotlib, and IPython. He also maintains scikit-monaco, an open source library for Monte Carlo integration. He currently lives in London, UK.
Arun Manivannan
Arun Manivannan has been an engineer in various multinational companies, tier-1 financial institutions, and start-ups, primarily focusing on developing distributed applications that manage and mine data. His languages of choice are Scala and Java, but he also dabbles in various others for kicks. He blogs at http://rerun.me.
Arun holds a master’s degree in software engineering from the National University of Singapore.
He also holds degrees in commerce, computer applications, and HR management. His interests and education could probably be a good dataset for clustering.
Contents
1: SCALA AND DATA SCIENCE
2: MANIPULATING DATA WITH BREEZE
3: PLOTTING WITH BREEZE-VIZ
4: PARALLEL COLLECTIONS AND FUTURES
5: SCALA AND SQL THROUGH JDBC
6: SLICK – A FUNCTIONAL INTERFACE FOR SQL
7: WEB APIS
8: SCALA AND MONGODB
9: CONCURRENCY WITH AKKA
10: DISTRIBUTED BATCH PROCESSING WITH SPARK
11: SPARK SQL AND DATAFRAMES
12: DISTRIBUTED MACHINE LEARNING WITH MLLIB
13: WEB APIS WITH PLAY
14: VISUALIZATION WITH D3 AND THE PLAY FRAMEWORK
15: GETTING STARTED WITH BREEZE
16: GETTING STARTED WITH APACHE SPARK DATAFRAMES
17: LOADING AND PREPARING DATA – DATAFRAME
18: DATA VISUALIZATION
19: LEARNING FROM DATA
20: SCALING UP
21: GOING FURTHER
22: GETTING STARTED
23: HELLO WORLD!
24: DATA PREPROCESSING
25: UNSUPERVISED LEARNING
26: NAÏVE BAYES CLASSIFIERS
27: REGRESSION AND REGULARIZATION
28: SEQUENTIAL DATA MODELS
29: KERNEL MODELS AND SUPPORT VECTOR MACHINES
30: ARTIFICIAL NEURAL NETWORKS
31: GENETIC ALGORITHMS
32: REINFORCEMENT LEARNING
33: SCALABLE FRAMEWORKS