Mastering Spark for Data Science

by Andrew Morgan, Antoine Amend, Matthew Hallett
Pages: 560
Publisher: Packt Publishing (30 Mar. 2017)
Language: English
ISBN-10: 1785882147
ISBN-13: 9781785882142


Book Description
Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level, you need to build data science solutions of substance: solutions that solve real problems. Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs.
This book dives deep into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more.
You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly.
Contents
1: The Big Data Science Ecosystem
2: Data Acquisition
3: Input Formats and Schema
4: Exploratory Data Analysis
5: Spark for Geographic Analysis
6: Scraping Link-Based External Data
7: Building Communities
8: Building a Recommendation System
9: News Dictionary and Real-Time Tagging System
10: Story De-duplication and Mutation
11: Anomaly Detection on Sentiment Analysis
12: TrendCalculus
13: Secure Data
14: Scalable Algorithms
What You Will Learn
Learn the design patterns that integrate Spark into industrialized data science pipelines
See how commercial data scientists design scalable and reusable code for data science services
Explore cutting-edge data science methods so that you can study trends and causality
Discover advanced programming techniques using RDD and the DataFrame and Dataset APIs
Find out how Spark can be used as a universal ingestion engine and as a web scraper
Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
Study advanced Spark concepts, solution design patterns, and integration architectures
Demonstrate powerful data science pipelines
Authors
Andrew Morgan
Andrew Morgan is a specialist in data strategy and its execution, and has deep experience in the supporting technologies, system architecture, and data science that bring it to life. With over 20 years of experience in the data industry, he has designed systems for some of its most prestigious players and their global clients, often on large, complex, international projects. In 2013, he founded ByteSumo Ltd, a data science and big data engineering consultancy, and he now works with clients in Europe and the USA. Andrew is an active data scientist and the inventor of the TrendCalculus algorithm, which was developed as part of his ongoing research into long-range predictions based on machine learning of the patterns found in drifting cultural, geopolitical, and economic trends. He also sits on the Hadoop Summit EU data science selection committee and has spoken at many conferences on a variety of data topics. He enjoys participating in the data science and big data communities in London, where he lives.
Antoine Amend
Antoine Amend is a data scientist passionate about big data engineering and scalable computing. The book's theme of torturing astronomical amounts of unstructured data to gain new insights mainly comes from his background in theoretical physics. Graduating in 2008 with an MSc in Astrophysics, he worked for a large consultancy business in Switzerland before discovering the concept of big data in the early days of Hadoop. He has embraced big data technologies ever since, and is now working as the Head of Data Science for cyber security at Barclays Bank. By combining a scientific approach with core IT skills, Antoine qualified two years running for the Big Data World Championships finals held in Austin, TX. He placed in the top 12 in both the 2014 and 2015 editions (out of more than 2,000 competitors), where he additionally won the Innovation Award using the methodologies and technologies explained in this book.