Data Engineering patterns on the cloud: How to solve common data engineering problems with cloud services?
303 pages | 60-day guarantee | English | PDF, EPUB
Book Description
Doing data engineering without the cloud is hard to imagine nowadays. Does that mean each cloud provider is different? Well, yes, each has its own specificities, but that doesn’t mean they are completely different. They all support several common data engineering patterns, which this book classifies into 7 categories: data processing, data storage, data security, data warehouse, data management, data orchestration, and data transfer. This organization, together with the explanation of each pattern, will help you understand a new cloud provider more easily, whether you come from an AWS, Azure, or GCP environment.
Table of Contents
Introduction
Who is this book for?
Book organization
Conventions used in the book
Contact
Data processing
Batching
Consumer groups
Cooldown period
Data locality
Dead-Letter sink
Ephemeral cluster
Event-driven serverless worker
Exactly-once delivery
Micro-batch processing
No Code data processing
NoSQL Change Data Capture
Ordered delivery
Parallel data upload
Pull processing
Push processing
Reprocessing data from a streaming data source
Reusable job template
Serverless SQL worker
Shuffle service
Single-tenant client
Small data – batch services
Small data – single-node cluster
Streaming serverless SQL processing
Targeted data retrieval from files
Data storage
Auto-scaled throughput capacity
Data denormalization with nested and repeated structures
Data layout – partitions
Data layout – secondary partitions
Optimized data access – indexes
Optimized data access – row keys
Immutable storage
Interleaved storage
Multi-region storage
Serverless database
Streaming data retention
Throughput capacity reservation
Data security
Access without cloud identity
Cross-account data access with service accounts
Custom permissions
Data encryption at rest
Data encryption in transit
Data rollback
Denied public access
Identity-based access
Network access control
Network separation
Policy conditions
Private communication with cloud services from a cloud network
Resource-based permissions
Safe secrets storage
Shared permissions
Soft-deletes
Data warehouse
Fault-tolerance – backups
Batched DML operations
Big dataset loading
Column-level security
Extract Load
Data distribution – key-based
Data distribution – replicated
Data distribution – round-robin
Identity management service access
Materialized views
Nested data storage
Result set cache
Row-level security
Temporary table
Querying – external data sources
Querying – User Defined Functions
Data management
Audit logs
Automatic data expiration
Automatic lifecycle management
Data annotation
Data cataloging – automatic
Data cataloging – manual
Intelligent lifecycle management
Data orchestration
Data processing service orchestrator
No Code orchestrator
Open Source orchestrator as a service
Serverless orchestrator
Time-based trigger
Data transfer
Continuous data migration
Heterogeneous database migration
Homogeneous database migration
One-time data migration
Transfer without data movement