Getting Started with DuckDB: A practical guide for accelerating your data science, data analytics, and data engineering workflows

Authors: Simon Aubury, Ned Letcher

Publisher: Packt Publishing

Edition: N/A

Publication Date: 2024-06-24

Language: English

Print Length: 382 pages

ISBN-10: 1803241004

ISBN-13: 9781803241005

Book Description

Analyze and transform data efficiently with DuckDB, a versatile, modern, in-process SQL database

Key Features

  • Use DuckDB to rapidly load, transform, and query data across a range of sources and formats
  • Gain practical experience using SQL, Python, and R to effectively analyze data
  • Learn how open source tools and cloud services in the broader data ecosystem complement DuckDB’s versatile capabilities
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

DuckDB is a fast, in-process analytical database. Getting Started with DuckDB offers a practical overview of how to use it. You’ll learn to load, transform, and query various data formats, including CSV, JSON, and Parquet. The book covers DuckDB’s optimizations, SQL enhancements, and extensions for specialized applications. Working through examples in SQL, Python, and R, you’ll analyze public datasets and discover tools that enhance DuckDB workflows. This guide suits both experienced and new data practitioners, quickly equipping you to apply DuckDB’s capabilities in analytical projects. You’ll gain proficiency in using DuckDB for a wide range of tasks and learn to integrate it effectively into your data workflows.

What you will learn

  • Understand the properties and applications of a columnar in-process database
  • Use SQL to load, transform, and query a range of data formats
  • Discover DuckDB’s rich extensions and learn how to apply them
  • Use nested data types to model semi-structured data, and learn to extract and model JSON data
  • Integrate DuckDB into your Python and R analytical workflows
  • Effectively leverage DuckDB’s convenient SQL enhancements
  • Explore the wider ecosystem and pathways for building DuckDB-powered data applications

Who this book is for

If you’re interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts who want to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, and data scientists who need a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.

Table of Contents

  1. An Introduction to DuckDB
  2. Loading Data into DuckDB
  3. Data Manipulation with DuckDB
  4. DuckDB Operations and Performance
  5. DuckDB Extensions
  6. Semi-Structured Data Manipulation
  7. Setting up the DuckDB Python Client
  8. Exploring DuckDB’s Python API
  9. Exploring DuckDB’s R API
  10. Using DuckDB Effectively
  11. Hands-On Exploratory Data Analysis with DuckDB
  12. DuckDB – The Wider Pond

Review

“In this excellent book, Simon and Ned have combined the practicalities of what you need to know now with a wealth of hints and tips for getting the most out of DuckDB. Tips for doing more, much more easily.

The chapter on DuckDB’s extensions is particularly fruitful if you’re looking to perform minor data miracles. You will learn how to pull raw data off S3, chew through it in seconds, and export it into an Excel spreadsheet, instantly becoming the favourite data guru of an entire marketing department.”

Kris Jenkins

Host of Developer Voices and Co-Founder of BullionVault

“Getting Started with DuckDB is a great resource, even for someone like me who’s already familiar with the tool. I was impressed by how well it organized DuckDB’s core features, providing fresh insights and practical examples that still managed to deepen my understanding. It’s not just a beginner’s guide—it offers valuable tips and optimizations for more advanced users too. Whether you’re revisiting the basics or fine-tuning your existing knowledge, this book covers it all in a clear and engaging way. It’s a great reference for any level of expertise.”

Stephanie Wang, Founding Engineer at MotherDuck

About the Author

Simon Aubury has been working in the IT industry since 2000 as a data engineering specialist. He has an extensive background in building large, flexible, highly available distributed data systems. Simon has delivered critical data systems for finance, transport, healthcare, insurance, and telecommunications clients in Australia, Europe, and Asia Pacific. In 2019, Simon joined Thoughtworks as a principal data engineer and today is associate director of data platforms at Simple Machines in Sydney, Australia. Simon is active in the data community, a regular conference speaker, and the organizer of local and international meetups and data engineering conferences.

Ned Letcher has worked as a data science and software engineering consultant since completing his PhD in computational linguistics in 2018 and currently works at Thoughtworks. He has designed and developed data-powered products and services across a range of industries and helped organizations and teams improve the effectiveness of their data processes and workflows. Ned has also worked as a Python trainer, supporting both tertiary students and data professionals across various organizations. He is active in the data community, speaking at and helping organize meetups and conferences, as well as contributing to a range of open source projects.
