Exploring Data with RapidMiner
Author: Andrew Chisholm
ISBN-10: 1782169334
ISBN-13: 9781782169338
Release date: 2013-11-25
Pages: 162
Book Description
Data is everywhere, and the amount is increasing so fast that the gap between what people can understand and what is available is widening relentlessly. There is huge value in data, but much of this value lies untapped. 80% of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.
Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters within this book are arranged within an overall framework and can additionally be consulted on an ad-hoc basis. It provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner.
Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. This book starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.
This book will give you a solid understanding of the possibilities that RapidMiner gives for exploring data and you will be inspired to use it for your own work.
1: SETTING THE SCENE
2: LOADING DATA
3: VISUALIZING DATA
4: PARSING AND CONVERTING ATTRIBUTES
6: MISSING VALUES
7: TRANSFORMING DATA
8: REDUCING DATA SIZE
9: RESOURCE CONSTRAINTS
11: TAKING STOCK
What You Will Learn
Import real data from files in multiple formats and from databases
Extract features from structured and unstructured data
Restructure, reduce, and summarize data to help you understand it more easily and process it more quickly
Visualize data in new ways to help you understand it
Detect outliers and learn methods to handle them
Detect missing data and implement ways to handle it
Understand resource constraints and what to do about them
About the Author
Andrew Chisholm completed his degree in Physics at Oxford University nearly thirty years ago. This coincided with the growth in software engineering and led him to a career in the IT industry. For the last decade he has been closely involved in mobile telecommunications, where he is currently a product manager for a market-leading test and monitoring solution used by many mobile operators worldwide. Throughout his career, he has maintained an active interest in all aspects of data. In particular, he has always enjoyed finding ways to extract value from data and presenting it in compelling ways to help others meet their objectives. Recently, he completed a Master’s in Data Mining and Business Intelligence with first-class honors. He is a certified RapidMiner expert and has been using the product to solve real problems for several years. He maintains a blog where he shares miscellaneous advice on how to get the best out of RapidMiner. He approaches problems from a practical perspective and has a great deal of relevant hands-on experience with real data. This book draws this experience together in the context of exploring data, the first and most important step in a data mining process. He has published conference papers relating to unsupervised clustering and cluster validity measures, and contributed a chapter called "Visualizing cluster validity measures" to an upcoming book entitled RapidMiner: Use Cases and Business Analytics Applications (Chapman & Hall/CRC).