Mastering Java for Data Science
by Grigorev, Alexey
Print Length: 364 pages
Publisher: Packt Publishing (28 April 2017)
Language: English
ISBN-10: 1782174273
ISBN-13: 9781782174271
ASIN: B01JLBMHMM
Key Features
An overview of modern Data Science and Machine Learning libraries available in Java
Coverage of a broad set of topics, from the basics of Machine Learning to Deep Learning and Big Data frameworks
Easy-to-follow illustrations and a running example of building a search engine
Book Description
Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises.
Not surprisingly, it is also a common choice for creating Data Science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for Data Science allows you to easily integrate your solutions with existing software and bring Data Science into production with less effort.
This book will teach you how to create Data Science applications with Java. First, we will review the most important things to consider when starting a Data Science application, and then brush up on the basics of Java and Machine Learning before diving into more advanced topics. We start by going over the existing libraries for data processing and the libraries that provide machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, deep learning, and big data.
Finally, we finish the book by talking about ways to deploy a model and evaluate it in production settings.
What You Will Learn
Get a solid understanding of the data processing toolbox available in Java
Explore the Data Science ecosystem available in Java
Find out how to approach different Machine Learning problems with Java
Process unstructured information such as natural language texts or images
Create your own search engine
Get state-of-the-art performance with XGBoost
Learn to build deep neural networks with DeepLearning4j
Build applications that scale and process large amounts of data
Deploy the Data Science models to production and evaluate their performance
About the Author
Alexey Grigorev is a skilled data scientist, Machine Learning engineer, and software developer with more than 7 years of professional experience.
He started his career as a Java developer working at a number of large and small companies, but after a while he switched to Data Science. Right now Alexey works as a data scientist at Searchmetrics, where, in his day-to-day job, he actively uses Java and Python for data cleaning, data analysis, and modeling.
His areas of expertise are Machine Learning and Text Mining, but he also enjoys working on a broad set of problems, which is why he often participates in Data Science competitions on platforms such as kaggle.com.
You can connect with Alexey on LinkedIn at https://de.linkedin.com/in/agrigorev.
Contents
Chapter 1. Questions
Chapter 2. Data Science Using Java
Chapter 3. Data science in Java
Chapter 4. Summary
Chapter 5. Summary
Chapter 6. Unsupervised Learning – Clustering and Dimensionality Reduction
Chapter 7. Working with Text – Natural Language Processing and Information Retrieval
Chapter 8. Extreme Gradient Boosting
Chapter 9. Deep Learning with DeepLearning4J
Chapter 10. Scaling Data Science
Chapter 11. Deploying Data Science Models