Learning and Operating Presto: Fast, Reliable SQL for Data Analytics and Lakehouses
by: Angelica Lo Duca (Author), Tim Meehan (Author), Vivek Bharathan (Author), Ying Su (Author)
Language: English
ISBN-10: 1098141857
Product Dimensions: 7 x 0.5 x 9.1 inches; 11.2 Ounces
Publication date: October 31, 2023
Publisher: O’Reilly Media; 1st edition
ISBN-13: 9781098141851
Release date: October 31, 2023
Book Description
The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to use Presto at their organizations to derive insights on datasets wherever they reside.
Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You’ll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production.
With this book, you will:
Learn how to install and configure Presto
Use Presto with business intelligence tools
Understand how to connect Presto to a variety of data sources
Extend Presto for real-time business insight
Learn how to apply best practices and tuning
Get troubleshooting tips for logs, error messages, and more
Explore Presto’s architectural concepts and usage patterns
Understand Presto security and administration
From the Preface
Why We Wrote This Book
Deploying Presto to meet your team’s warehouse and lakehouse infrastructure needs is not a minor undertaking. For the deployment to be successful, you need to understand the principles of Presto and the tools it provides. We wrote this book to help you get up to speed with Presto’s basic principles so you can successfully deploy Presto at your company, taking advantage of one of the most powerful distributed query engines in the data analytics space today. The book also includes chapters on the ecosystem around Presto and how you can integrate other popular open source projects like Apache Pinot, Apache Hudi, and more to open up even more use cases with Presto. After reading this book, you should feel confident and empowered to deploy Presto in your team and to maintain it going forward.
Who This Book Is For
This book is for individuals who are building data platforms for their teams. Job titles may include data engineers and architects, platform engineers, cloud engineers, and/or software engineers. They are the ones building and providing the platform that supports a variety of interconnected products. Their responsibilities include making sure all the components can work together as a single, integrated whole; resolving data processing and analytics issues; performing data cleaning, management, transformation, and deduplication; and developing tools and technologies to improve the analytics platform.
About the Authors
Angelica Lo Duca is a researcher with a PhD in Computer Science. She currently works in Research and Technology at the Institute of Informatics and Telematics of the Italian National Research Council. Her research areas include Data Science, Machine Learning, Text Analytics, Data Visualization, Data Journalism, and Web Applications. She has also worked with Network Security, Semantic Web, Linked Data, and Blockchain. Additionally, she serves as a professor at the University of Pisa, where she teaches Data Journalism.
Tim has been fascinated by data problems for much of his career. He has been working on the Presto project since 2018. He currently works at IBM and heads the Presto Technical Steering Committee. Before IBM, he worked at Meta, Bloomberg, and Goldman Sachs, among others.
Vivek is the Cofounder and Principal Software Engineer at Ahana. Previously, Vivek was a Software Engineer at Uber, where he managed Presto clusters with more than 2,500 nodes, processing 35PB of data per day, and worked on extending Presto to support Uber’s interactive analytics needs. Prior to Uber, Vivek was an early member of the query-optimizer team at Vertica Systems and made several contributions to the core database engine and the Vertica ecosystem. Earlier in his career at the Laboratory for Artificial Intelligence Research, he developed emerging technologies in decision-support systems and reasoning systems. His Presto contributions include the pushdown of partial aggregations. Vivek holds an M.S. in Computer Science and Engineering from The Ohio State University.
Ying is the performance architect at Ahana, where she works on building more efficient data lake services with better price-performance on Presto and Velox. She has worked on Microsoft SQL Server and on Presto at Meta in the past and is a Presto committer and TSC board member.