Data Engineering with AWS Cookbook: A recipe-based approach to help you tackle data engineering problems with AWS services

Authors: Trâm Ngọc Phạm, Gonzalo Herreros González, Viquar Khan, Huda Nofal

Publisher: Packt Publishing

Edition: N/A

Publication Date: 2024-11-29

Language: English

Print Length: 528 pages

ISBN-10: 1805127284

ISBN-13: 9781805127284

Book Description

Master AWS data engineering services and techniques for orchestrating pipelines, building layers, and managing migrations

Key Features

  • Get up to speed with the different AWS technologies for data engineering
  • Learn the different aspects and considerations of building data lakes, such as security, storage, and operations
  • Get hands-on with key AWS services such as Glue, EMR, Redshift, QuickSight, and Athena for practical learning
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Performing data engineering with Amazon Web Services (AWS) combines AWS’s scalable infrastructure with robust data processing tools, enabling efficient data pipelines and analytics workflows. This comprehensive guide to AWS data engineering will teach you all you need to know about data lake management, pipeline orchestration, and serving layer construction.

Through clear explanations and hands-on exercises, you’ll master essential AWS services such as Glue, EMR, Redshift, QuickSight, and Athena. Additionally, you’ll explore various data platform topics such as data governance, data quality, DevOps, CI/CD, planning and performing data migration, and creating Infrastructure as Code. As you progress, you will gain insights into how to enrich your platform and use various AWS cloud services such as Amazon EventBridge, Amazon DataZone, and AWS SCT and DMS to solve data platform challenges.

Each recipe in this book is tailored to a daily challenge that a data engineering team faces while building a cloud platform. By the end of this book, you will be well-versed in AWS data engineering and have gained proficiency in key AWS services and data processing techniques. You will develop the necessary skills to tackle large-scale data challenges with confidence.

What you will learn

  • Define your centralized data lake solution, and secure and operate it at scale
  • Identify the most suitable AWS solution for your specific needs
  • Build data pipelines using multiple ETL technologies
  • Discover how to handle data orchestration and governance
  • Explore how to build a high-performing data serving layer
  • Delve into DevOps and data quality best practices
  • Migrate your data from on-premises to AWS

Who this book is for

If you’re involved in designing, building, or overseeing data solutions on AWS, this book provides proven strategies for addressing challenges in large-scale data environments. Data engineers and big data professionals who want to deepen their understanding of AWS features for optimizing their workflows will find value here, even if they’re new to the platform. Basic familiarity with AWS security (users and roles) and the command shell is recommended.

Table of Contents

  1. Managing Data Lake Storage
  2. Sharing Your Data Across Environments and Accounts
  3. Ingesting and Transforming Your Data with AWS Glue
  4. A Deep Dive into AWS Orchestration Frameworks
  5. Running Big Data Workloads with Amazon EMR
  6. Governing Your Platform
  7. Data Quality Management
  8. DevOps – Defining IaC and Building CI/CD Pipelines
  9. Monitoring Data Lake Cloud Infrastructure
  10. Building a Serving Layer with AWS Analytics Services
  11. Migrating to AWS – Steps, Strategies, and Best Practices for Modernizing Your Analytics and Big Data Workloads
  12. Harnessing the Power of AWS for Seamless Data Warehouse Migration
  13. Strategizing Hadoop Migrations – Cost, Data, and Workflow Modernization with AWS

Review

“This book is highly recommended for both novices and seasoned professionals in the AWS analytics landscape. Its comprehensive coverage of services like AWS Glue, Amazon Redshift, and Amazon QuickSight, coupled with well-documented, replicable recipes, makes it an invaluable resource. The book’s dedication to migration strategies is particularly noteworthy, offering a well-designed guideline to organizations transitioning from legacy systems. While dense with information, its practical approach ensures readers can immediately apply their learnings. A must-have for anyone serious about leveraging AWS for data engineering.”

Noritaka Sekiyama, Principal Big Data Architect, AWS Glue; Author of Serverless ETL and Analytics with AWS Glue

About the Author

Trâm Ngọc Phạm is a senior data architect with over a decade of hands-on experience in the big data and AI field, including playing a lead role in tailoring cloud data platforms to BI and analytics use cases for enterprises in Vietnam. While working as a Senior Data and Analytics consultant for the AWS Professional Services team, she specialized in guiding finance and telco companies across Southeast Asian countries to build enterprise-scale data platforms and drive analytics use cases that utilized AWS services and big data tools.

Gonzalo Herreros González is a principal data architect. He holds a bachelor’s degree in computer science and a master’s degree in data analytics. He has over a decade of experience in big data and two decades in software development, both on AWS and on-premises.

Previously, he worked at MasterCard where he achieved the first PCI-DSS Hadoop cluster in the world. More recently, he worked at AWS for over 6 years, building data pipelines for the internal network data, and later, as an architect in the AWS Glue service team, building transforms for AWS Glue Studio and helping large customers succeed with AWS data services.

Viquar Khan is a senior data architect at AWS Professional Services and brings over 20 years of expertise in finance and data analytics, empowering global financial institutions to harness the full potential of AWS technologies. He designs cutting-edge, customized data solutions tailored to complex industry needs. A polyglot developer skilled in Java, Scala, Python, and other languages, Viquar has excelled in various technical roles. As an expert group member of JSR 368 (Java™ Message Service 2.1), he has shaped industry standards and actively contributes to open source projects such as Apache Spark and Terraform. His technical insights have reached and benefited over 6.7 million users on Stack Overflow.

Huda Nofal is a seasoned data engineer with over 7 years of experience at Amazon, where she has played a key role in helping internal business teams achieve their data goals. With deep expertise in AWS services, she has successfully designed and implemented data pipelines that power critical decision-making processes across various organizations. Huda’s work primarily focuses on leveraging Redshift, Glue, data lakes, and Lambda to create scalable, efficient data solutions.
