
Architecting an Apache Iceberg Lakehouse: A scalable, open-source data platform
Author(s): Alex Merced
- Publisher: Manning Publications
- Publication Date: May 19, 2026
- Language: English
- Print length: 408 pages
- ISBN-10: 1633435105
- ISBN-13: 9781633435100
Book Description
Design an Apache Iceberg lakehouse from scratch!
The “lakehouse” data architecture is a powerful way to combine the flexibility of data lakes with the management features of data warehouses. The open source Apache Iceberg framework delivers the scalability, reliability, and performance you want from a lakehouse without the expense and vendor lock-in of platforms like Snowflake, BigQuery, and Redshift.
In Architecting an Apache Iceberg Lakehouse, data guru Alex Merced shows you:
• How to create a modular, scalable Iceberg lakehouse architecture
• Where Spark, Flink, Dremio, and Polaris fit into your design
• Reliable batch and streaming ingestion pipelines
• Strategies for governance, security, and performance at scale
Apache Iceberg is an open source table format perfect for massive analytic datasets. Iceberg enables ACID transactions, schema evolution, and high-performance queries on data lakes using multiple compute engines like Spark, Trino, Flink, Presto, and Hive. An Iceberg data lakehouse enables fast, reliable analytics at scale while retaining the observability you need for compliance audits, governance, and provable data security.
Foreword by Tim Berglund. Afterword by Adi Polak.
About the technology
Apache Iceberg is an open data format that lets data lake files work like database tables. It helps turn a data lake into a more reliable and capable lakehouse.
About the book
Architecting an Apache Iceberg Lakehouse shows you how to design an open, scalable, and cost-effective lakehouse platform with Apache Iceberg. More than a set of blueprints, the book explains the reasoning behind the architecture. You’ll build a mini lakehouse by ingesting sales and marketing data from PostgreSQL into Iceberg tables with Apache Spark and then create interactive dashboards in Apache Superset. You’ll appreciate expert Alex Merced’s real-world insights about operating an Iceberg lakehouse.
What’s inside
• Create a modular, scalable Iceberg lakehouse architecture
• Fit Spark, Flink, Dremio, Polaris, and more into your design
• Batch and streaming ingestion pipelines
• Governance, security, and performance at scale
About the reader
For data architects familiar with the basics of a data lakehouse.
About the author
Alex Merced is Head of Developer Relations at Dremio. He shares his expertise through videos, podcasts, and articles, and leads the DataLakehouseHub.com community.
Table of Contents
Part 1
1 The world of the data lakehouse
2 Apache Iceberg and the lakehouse
3 Hands-on with Apache Iceberg
Part 2
4 Preparing for your move to Apache Iceberg
5 Selecting the storage layer
6 Architecting the ingestion layer
7 Implementing the catalog layer
8 Designing the federation layer
9 Understanding the consumption layer
Part 3 Operating your Apache Iceberg lakehouse
10 Maintaining an Iceberg lakehouse
11 Operationalizing Apache Iceberg
A The metadata tables
B Python for Apache Iceberg
C The Apache Iceberg specification
