Big Data Analytics with R


by Simon Walkowiak
Print Length: 506 pages
Publisher: Packt Publishing (29 July 2016)
Language: English
ISBN-10: 1786466457
ISBN-13: 9781786466457
Key Features
Perform computational analyses on Big Data to generate meaningful results
Gain practical knowledge of the R programming language while working on Big Data platforms like Hadoop, Spark, H2O, and SQL/NoSQL databases
Explore fast, streaming, and scalable data analysis with the most cutting-edge technologies in the market

Book Description


Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language for data science, with powerful functions for tackling problems related to Big Data processing.
The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language covering its development, structure, real-world applications, and shortcomings. It then reviews the major R functions for data management and transformation. Readers are introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and given guidance on connecting R to relational and non-relational databases such as MongoDB and HBase. The book goes on to cover Big Data tools such as the Apache Hadoop ecosystem, HDFS, and the MapReduce framework, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.
What you will learn
Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities
Deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner
Apply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usage
Explore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and the H2O platform
About the Author
Simon Walkowiak is a cognitive neuroscientist and managing director of Mind Project Ltd, a Big Data and Predictive Analytics consultancy based in London, United Kingdom. As a former data curator at the UK Data Service (UKDS, University of Essex), Europe's largest socio-economic data repository, Simon has extensive experience in processing and managing large-scale datasets such as censuses, sensor and smart meter data, telecommunication data, and well-known governmental and social surveys such as the British Social Attitudes survey, Labour Force surveys, Understanding Society, National Travel survey, and many other socio-economic datasets collected and deposited by Eurostat, World Bank, Office for National Statistics, Department of Transport, NatCen, and International Energy Agency, to mention just a few. Simon has delivered numerous data science and R training courses at public institutions and international companies. He has also taught a course in Big Data Methods in R at major UK universities and at the prestigious Big Data and Analytics Summer School organized by the Institute of Analytics and Data Science (IADS).
Contents
The Era of Big Data
Introduction to R Programming Language and Statistical Environment
Unleashing the Power of R from Within
Hadoop and MapReduce Framework for R
R with Relational Database Management Systems (RDBMSs)
R with Non-Relational (NoSQL) Databases
Faster than Hadoop – Spark with R
Machine Learning Methods for Big Data in R
The Future of R – Big, Fast, and Smart Data
