
Apache Hudi: The Definitive Guide: Building Robust, Open, and High-Performing Data Lakehouses
Author(s): Shiyan Xu (Author), Prashant Wason (Author), Bhavani Sudha Saktheeswaran (Author), Rebecca Bilbro (Author)
- Publisher: O'Reilly Media
- Publication Date: December 2, 2025
- Edition: 1st
- Language: English
- Print length: 287 pages
- ISBN-10: 109817383X
- ISBN-13: 9781098173838
Book Description
Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using your query engine of choice.
Authors Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications.
- Understand the need for transactional data lakehouses and the challenges associated with building them
- Explore data ecosystem support provided by Apache Hudi for popular data sources and query engines
- Perform different write and read operations on Apache Hudi tables and effectively use them for various use cases, including batch and stream applications
- Apply different storage techniques and considerations such as indexing and clustering to maximize your lakehouse performance
- Build end-to-end incremental data pipelines using Apache Hudi for faster ingestion and fresher analytics
Editorial Reviews
About the Authors
Prashant Wason is a Staff Software Engineer at Uber Technologies and a PMC member of the Apache Hudi project. He has been an active contributor to the Hudi project since 2019 with features like Metadata Table and Record Index. Prashant has been working in the Storage and Data Infrastructure space for over 15 years.
Sudha Saktheeswaran is a Software Engineer at Onehouse and a PMC member of the Apache Hudi project. She comes with vast experience in real-time and distributed data systems through her work at Moveworks, Uber, and LinkedIn’s data infra teams. Sudha is also a key contributor to the early Presto integrations of Hudi. She is passionate about engaging with and driving the Hudi community.
Dr. Rebecca Bilbro is a data scientist, Python programmer, and author in Washington, DC. She specializes in data visualization for machine learning, from feature analysis to model selection and hyperparameter tuning. Rebecca is an active contributor to the open source community and has conducted research on natural language processing, semantic network extraction, entity resolution, and high dimensional information visualization. She earned her doctorate from the University of Illinois, Urbana-Champaign, where her research centered on communication and visualization practices in engineering. Rebecca is co-founder and CTO of Rotational Labs.