Mastering Parallel Programming with R


by Simon R. Chapple and Eilidh Troup
Pages: 244
Publisher: Packt Publishing (31 May 2016)
Language: English
ISBN-10: 1784394009
ISBN-13: 9781784394004
B017XSFKFG


Book Description
Master the robust features of R parallel programming to accelerate your data science computations
About This Book
Create R programs that exploit the computational capability of your cloud platforms and computers to the fullest
Become an expert in writing the most efficient and highest performance parallel algorithms in R
Get to grips with the concept of parallelism to accelerate your existing R programs

Who this book is for
This book is for R programmers who want to step beyond R's inherent single-threaded execution and restricted memory, and learn how to implement the highly accelerated, scalable algorithms needed to process Big Data efficiently. No previous knowledge of parallelism is required. The book also caters to the more advanced technical programmer seeking to go beyond high-level parallel frameworks.

What you will learn
Create and structure efficient load-balanced parallel computation in R, using R's built-in parallel package
Deploy and utilize cloud-based parallel infrastructure from R, including launching a distributed computation on Hadoop running on Amazon Web Services (AWS)
Get accustomed to parallel efficiency, and apply simple techniques to benchmark, measure speed and target improvement in your own code
Develop complex parallel processing algorithms with the standard Message Passing Interface (MPI) using RMPI, pbdMPI, and SPRINT packages
Build and extend a parallel R package (SPRINT) with your own MPI-based routines
Implement accelerated numerical functions in R utilizing the vector processing capability of your Graphics Processing Unit (GPU) with OpenCL
Understand parallel programming pitfalls, such as deadlock and numerical instability, and the approaches to handle and avoid them
Build task farm master-worker, spatial grid, and hybrid parallel R programs
In Detail
R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires the harnessing of scalable compute resources.
Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It will teach you a variety of parallelization techniques, from simple use of the lapply() variants in R's built-in parallel package to high-level AWS cloud-based Hadoop and Apache Spark frameworks. It will also teach you low-level scalable parallel programming using RMPI and pbdMPI for message passing, applicable to clusters and supercomputers, and how to exploit GPUs, with their thousands of simple processors, through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability issues; how to structure your code and data for the most appropriate type of parallelism for your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
Style and approach
This book leads you chapter by chapter from the easy to the more complex forms of parallelism. The authors' insights are presented through clear practical examples applied to a range of different problems, with comprehensive reference information for each of the R packages employed. The book can be read from start to finish or dipped into chapter by chapter: each chapter describes a specific parallel approach and technology, and so can be read on its own.
Contents
Chapter 1. Simple Parallelism with R
Chapter 2. Introduction to Message Passing
Chapter 3. Advanced Message Passing
Chapter 4. Developing SPRINT, an MPI-Based R Package for Supercomputers
Chapter 5. The Supercomputer in Your Laptop
Chapter 6. The Art of Parallel Programming
