Advanced Analytics with Spark: Patterns for Learning from Data at Scale


by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills
Print Length: 276 pages
Publisher: O’Reilly Media; 1st edition (20 April 2015)
Language: English
ISBN-10: 1491912766
ISBN-13: 9781491912768
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.
You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
Patterns include:
Recommending music and the Audioscrobbler data set
Predicting forest cover with decision trees
Anomaly detection in network traffic with K-means clustering
Understanding Wikipedia with Latent Semantic Analysis
Analyzing co-occurrence networks with GraphX
Geospatial and temporal data analysis on the New York City Taxi Trips data
Estimating financial risk through Monte Carlo simulation
Analyzing genomics data and the BDG project
Analyzing neuroimaging data with PySpark and Thunder
