Understanding Distributed Systems: What every developer should know about large distributed applications
by: Roberto Vitillo
Publisher: Roberto Vitillo (28 Feb. 2021)
Language: English
Print Length: 253 pages
ISBN-10: 1838430202
ISBN-13: 9781838430207
Book Description
According to Stack Overflow’s 2020 developer survey, the best-paid engineering roles require distributed systems expertise. That comes as no surprise as modern applications are distributed systems. Learning to build distributed systems is hard, especially if they are large scale. It’s not that there is a lack of information out there. You can find academic papers, engineering blogs, and even books on the subject. The problem is that the available information is spread out all over the place, and if you were to put it on a spectrum from theory to practice, you would find a lot of material at the two ends, but not much in the middle.
That is why I decided to write a book to teach the fundamentals of distributed systems so that you don’t have to spend countless hours scratching your head to understand how everything fits together. This is the guide I wished existed when I first started out, and it’s based on my experience building large distributed systems that scale to millions of requests per second and billions of devices.
If you develop the back-end of web or mobile applications (or would like to!), this book is for you. When building distributed systems, you need to be familiar with the network stack, data consistency models, scalability and reliability patterns, and much more. Although you can build applications without knowing any of that, you will end up spending hours debugging and re-designing their architecture, learning lessons that you could have acquired in a much faster and less painful way.
The book also makes for a great study companion for a system design interview if you want to land a job at a company that runs large-scale distributed systems, like Amazon, Google, Facebook, or Microsoft. If you are interviewing for a senior role, you are expected to be able to design complex networked services and dive deep into any vertical. You can be a world champion at balancing trees, but if you fail the design round, you are out. And if you just meet the bar, don’t be surprised when your offer is well below what you expected, even if you aced everything else.
Table of contents
1 Introduction
1.1 Communication
1.2 Coordination
1.3 Scalability
1.4 Resiliency
1.5 Operations
1.6 Anatomy of a distributed system
Communication
2 Reliable links
2.1 Reliability
2.2 Connection lifecycle
2.3 Flow control
2.4 Congestion control
2.5 Custom protocols
3 Secure links
3.1 Encryption
3.2 Authentication
3.3 Integrity
3.4 Handshake
4 Discovery
5 APIs
5.1 HTTP
5.2 Resources
5.3 Request methods
5.4 Response status codes
5.5 OpenAPI
5.6 Evolution
Coordination
6 System models
7 Failure detection
8 Time
8.1 Physical clocks
8.2 Logical clocks
8.3 Vector clocks
9 Leader election
9.1 Raft leader election
9.2 Practical considerations
10 Replication
10.1 State machine replication
10.2 Consensus
10.3 Consistency models
10.4 Chain replication
10.5 Solving the CAP theorem
10.6 Coordination avoidance
11 Transactions
11.1 ACID
11.2 Isolation
11.3 Atomicity
11.4 Asynchronous transactions
Scalability
12 Functional decomposition
12.1 Microservices
12.2 API gateway
12.3 CQRS
12.4 Messaging
13 Partitioning
13.1 Sharding strategies
13.2 Rebalancing
14 Duplication
14.1 Network load balancing
14.2 Replication
14.3 Caching
Resiliency
15 Common failure causes
15.1 Single point of failure
15.2 Unreliable network
15.3 Slow processes
15.4 Unexpected load
15.5 Cascading failures
15.6 Risk management
16 Downstream resiliency
16.1 Timeout
16.2 Retry
16.3 Circuit breaker
17 Upstream resiliency
17.1 Load shedding
17.2 Load leveling
17.3 Rate-limiting
17.4 Bulkhead
17.5 Health endpoint
17.6 Watchdog
Testing and operations
18 Testing
18.1 Scope
18.2 Size
18.3 Practical considerations
19 Continuous delivery and deployment
19.1 Review and build
19.2 Pre-production
19.3 Production
19.4 Rollbacks
20 Monitoring
20.1 Metrics
20.2 Service-level indicators
20.3 Service-level objectives
20.4 Alerts
20.5 Dashboards
20.6 On-call
21 Observability
21.1 Logs
21.2 Traces
21.3 Putting it all together
22 Final words