Advanced Analytics with Spark: Patterns for Learning from Data at Scale
by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills
Print Length: 276 pages
Publisher: O’Reilly Media; 1st edition (20 April 2015)
Language: English
ISBN-10: 1491912766
ISBN-13: 9781491912768
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.
You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques, including classification, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
Patterns include:
Recommending music and the Audioscrobbler data set
Predicting forest cover with decision trees
Anomaly detection in network traffic with K-means clustering
Understanding Wikipedia with Latent Semantic Analysis
Analyzing co-occurrence networks with GraphX
Geospatial and temporal data analysis on the New York City Taxi Trips data
Estimating financial risk through Monte Carlo simulation
Analyzing genomics data and the BDG project
Analyzing neuroimaging data with PySpark and Thunder