Getting Started with DuckDB: A practical guide for accelerating your data science, data analytics, and data engineering workflows

Author: Simon Aubury, Ned Letcher

Publisher: Packt Publishing

Edition: N/A

Publication Date: 2024-06-24

Language: English

Print length: 382 pages

ISBN-10: 1803241004

ISBN-13: 9781803241005

Book Description

Analyze and transform data efficiently with DuckDB, a versatile, modern, in-process SQL database

Key Features

Use DuckDB to rapidly load, transform, and query data across a range of sources and formats
Gain practical experience using SQL, Python, and R to effectively analyze data
Learn how open source tools and cloud services in the broader data ecosystem complement DuckDB’s versatile capabilities
Purchase of the print or Kindle book includes a free PDF eBook

Book Description

DuckDB is a fast in-process analytical database. Getting Started with DuckDB offers a practical overview of how to use it. You’ll learn to load, transform, and query data in formats including CSV, JSON, and Parquet. The book covers DuckDB’s optimizations, SQL enhancements, and extensions for specialized applications. Working through examples in SQL, Python, and R, you’ll analyze public datasets and discover tools that enhance DuckDB workflows. This guide suits both experienced and new data practitioners, quickly equipping you to apply DuckDB’s capabilities in analytical projects. By the end, you’ll be able to use DuckDB for a range of tasks and integrate it into your data workflows.

What you will learn

Understand the properties and applications of a columnar in-process database
Use SQL to load, transform, and query a range of data formats
Discover DuckDB’s rich extensions and learn how to apply them
Use nested data types to model semi-structured data and extract and model JSON data
Integrate DuckDB into your Python and R analytical workflows
Effectively leverage DuckDB’s convenient SQL enhancements
Explore the wider ecosystem and pathways for building DuckDB-powered data applications

Who this book is for

If you’re interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts wanting to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, along with data scientists needing a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.

Table of Contents

An Introduction to DuckDB
Loading Data into DuckDB
Data Manipulation with DuckDB
DuckDB Operations and Performance
DuckDB Extensions
Semi-Structured Data Manipulation
Setting up the DuckDB Python Client
Exploring DuckDB’s Python API
Exploring DuckDB’s R API
Using DuckDB Effectively
Hands-On Exploratory Data Analysis with DuckDB
DuckDB – The Wider Pond

Review

“In this excellent book, Simon and Ned have combined the practicalities of what you need to know now with a wealth of hints and tips for getting the most out of DuckDB. Tips for doing more, much more easily.

The chapter on DuckDB’s extensions is particularly fruitful if you’re looking to perform minor data miracles. You will learn how to pull raw data off S3, chew through it in seconds, and export it into an Excel spreadsheet, instantly becoming the favourite data guru of an entire marketing department.”

Kris Jenkins

Host of Developer Voices and Co-Founder of BullionVault

“Getting Started with DuckDB is a great resource, even for someone like me who’s already familiar with the tool. I was impressed by how well it organized DuckDB’s core features, providing fresh insights and practical examples that still managed to deepen my understanding. It’s not just a beginner’s guide—it offers valuable tips and optimizations for more advanced users too. Whether you’re revisiting the basics or finetuning your existing knowledge, this book covers it all in a clear and engaging way. It’s a great reference for any level of expertise.”

Stephanie Wang, Founding Engineer at MotherDuck

About the Author

Simon Aubury has been working in the IT industry since 2000 as a data engineering specialist. He has an extensive background in building large, flexible, highly available distributed data systems. Simon has delivered critical data systems for finance, transport, healthcare, insurance, and telecommunications clients in Australia, Europe, and Asia Pacific. In 2019, Simon joined Thoughtworks as a principal data engineer and today is associate director of data platforms at Simple Machines in Sydney, Australia. Simon is active in the data community, a regular conference speaker, and the organizer of local and international meetups and data engineering conferences.

Ned Letcher has worked as a data science and software engineering consultant since completing his PhD in computational linguistics in 2018 and currently works at Thoughtworks. He has designed and developed data-powered products and services across a range of industries and helped organizations and teams improve the effectiveness of their data processes and workflows. Ned has also worked as a Python trainer, supporting both tertiary students and data professionals across various organizations. He is active in the data community, speaking at and helping organize meetups and conferences, as well as contributing to a range of open source projects.

Download

PDF, EPUB | 21 MB | 2024-07-09
