Data Algorithms: Recipes for Scaling Up with Hadoop and Spark


Authors: Mahmoud Parsian
ISBN-10: 1491906189
ISBN-13: 9781491906187
Edition: 1
Released: 2015-08-01
Print Length: 778 pages

Book Description


If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects.
Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark.
Topics include:
Market basket analysis for a large set of transactions
Data mining algorithms (K-means, KNN, and Naive Bayes)
Using huge genomic data to sequence DNA and RNA
Naive Bayes theorem and Markov chains for data and market prediction
Recommendation algorithms and pairwise document similarity
Linear regression, Cox regression, and Pearson correlation
Allelic frequency and mining DNA
Social network analysis (recommendation systems, counting triangles, sentiment analysis)
Foreword
Preface
1. Secondary Sort: Introduction
2. Secondary Sort: A Detailed Example
3. Top 10 List
4. Left Outer Join
5. Order Inversion
6. Moving Average
7. Market Basket Analysis
8. Common Friends
9. Recommendation Engines Using MapReduce
10. Content-Based Recommendation: Movies
11. Smarter Email Marketing with the Markov Model
12. K-Means Clustering
13. k-Nearest Neighbors
14. Naive Bayes
15. Sentiment Analysis
16. Finding, Counting, and Listing All Triangles in Large Graphs
17. K-mer Counting
18. DNA Sequencing
19. Cox Regression
20. Cochran-Armitage Test for Trend
21. Allelic Frequency
22. The T-Test
23. Pearson Correlation
24. DNA Base Count
25. RNA Sequencing
26. Gene Aggregation
27. Linear Regression
28. MapReduce and Monoids
29. The Small Files Problem
30. Huge Cache for MapReduce
31. The Bloom Filter
A. Bioset
B. Spark RDDs
Bibliography
Index
