Building for a million users is easy. Finding them is the hard part. Most of the time, a Raspberry Pi could handle the load just fine.
The CAP theorem: a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance; when a network partition occurs, it must give up either consistency or availability.
Models of distributed systems include synchronous (bounded message delays), asynchronous (unbounded delays), and partially synchronous (bounds exist but may be unknown or hold only eventually, which best reflects real networks).
The FLP impossibility result shows that no deterministic consensus algorithm can guarantee termination in a purely asynchronous system if even one process may crash.
Fault detection in distributed systems is inherently imprecise: because network delays are unpredictable, a crashed node cannot be reliably distinguished from a slow one.
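This imprecision is why practical failure detectors only ever *suspect* a peer. A minimal timeout-based sketch (class and method names hypothetical), with explicit timestamps to keep it deterministic:

```python
class HeartbeatFailureDetector:
    """Timeout-based failure detector (a sketch, not a library API).

    A peer is only ever *suspected* of having failed: a slow network is
    indistinguishable from a crashed peer, so suspicions may be wrong.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen = {}  # peer -> timestamp of last heartbeat

    def heartbeat(self, peer: str, now: float) -> None:
        # Record that we heard from this peer at time `now`.
        self.last_seen[peer] = now

    def suspected(self, peer: str, now: float) -> bool:
        # Suspect a peer we have never heard from, or one that has been
        # silent for longer than the timeout.
        last = self.last_seen.get(peer)
        return last is None or now - last > self.timeout
```

Tuning `timeout` is the whole trade-off: too short and slow-but-healthy nodes are suspected; too long and real crashes go unnoticed.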
Fault-tolerant design doesn't just handle failures; it expects them. In large systems, continuous component failure is the default, not the exception. Erlang is a beautiful demonstration of this philosophy.
Consensus in distributed systems, or reaching agreement among nodes, is crucial but complex due to failures and potential Byzantine actors.
Message passing in distributed systems involves varying delivery semantics (to one, any, or all recipients), arrival behavior (a message may arrive once, multiple times, or never), and ordering constraints (total, partial, or causal).
Idempotence in message passing ensures that repeated processing of a message has the same effect as processing it once.
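A common way to get idempotence under at-least-once delivery is to deduplicate by message ID before applying side effects. A minimal sketch (names hypothetical):

```python
class IdempotentHandler:
    """Wraps a side-effecting handler so redelivered messages apply once.

    A sketch: a production version would persist `seen` and bound its
    growth (e.g. with expiry), which is omitted here.
    """

    def __init__(self, apply_fn) -> None:
        self.apply_fn = apply_fn
        self.seen = set()  # message IDs already processed

    def handle(self, msg_id: str, payload) -> bool:
        if msg_id in self.seen:
            return False  # duplicate delivery: skip the side effect
        self.seen.add(msg_id)
        self.apply_fn(payload)
        return True
```

With this wrapper, a sender can safely retry until it gets an acknowledgment: the effect happens exactly once even if the message arrives multiple times.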
State machine replication ensures consistent state by applying the same operations in the same order across replicas.
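The key requirement is that each replica is a deterministic state machine, so an agreed-upon log replayed in order yields identical state everywhere. A toy sketch (names hypothetical, and the hard part, agreeing on the log order, is assumed solved by a consensus layer):

```python
class CounterReplica:
    """A deterministic state machine: same ordered log -> same state."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, op) -> None:
        # Operations are (kind, amount) pairs; applying them must be
        # deterministic, with no randomness, clocks, or local I/O.
        kind, amount = op
        if kind == "add":
            self.value += amount
        elif kind == "mul":
            self.value *= amount

# Every replica applies the agreed log in the same total order.
log = [("add", 3), ("mul", 4), ("add", 1)]
replica_a, replica_b = CounterReplica(), CounterReplica()
for op in log:
    replica_a.apply(op)
    replica_b.apply(op)
```

Note that `mul` does not commute with `add`, which is exactly why the order must be agreed on, not just the set of operations.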
State-based replication ships the full state to other replicas rather than the operations that produced it; this is simple but can be inefficient when the state is large.
Consistency models in distributed systems define what guarantees clients get about the visibility and ordering of writes; stronger models are easier to program against but cost performance and availability.
Database transaction semantics, including isolation levels, differ across database systems; the same isolation-level name can mean different things in different engines.
Logical clocks in distributed systems, such as Lamport timestamps, vector clocks, and version vectors, establish order and track causality.
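The simplest of these is the Lamport clock: increment on every local event, and on receive take the maximum of local and incoming time plus one. A minimal sketch (names hypothetical):

```python
class LamportClock:
    """Lamport timestamps: if event a happened-before b, then ts(a) < ts(b).

    The converse does not hold; detecting concurrency requires vector
    clocks, which this sketch does not implement.
    """

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; attach the new timestamp to the message.
        return self.tick()

    def receive(self, ts: int) -> int:
        # Merge the sender's timestamp so causality is preserved.
        self.time = max(self.time, ts) + 1
        return self.time
</imports>```

Two processes exchanging one message illustrate the invariant: the receive event is always stamped later than the send event it depends on.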
CRDTs (conflict-free replicated data types) are data structures that can be updated concurrently on independent replicas and merged deterministically, with all replicas converging to the same state.
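The classic introductory CRDT is the grow-only counter (G-Counter): each node increments its own slot, and merge takes the element-wise maximum, which is commutative, associative, and idempotent. A minimal sketch (names hypothetical):

```python
class GCounter:
    """Grow-only counter CRDT: per-node counts, merged by element-wise max."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.counts = {}  # node id -> that node's increment count

    def increment(self, n: int = 1) -> None:
        # Each node only ever bumps its own slot.
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: applying a merge twice changes nothing,
        # and merge order does not matter.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)
```

Because merge is idempotent and order-insensitive, replicas can gossip their states in any order, any number of times, and still converge.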
The end-to-end argument states that reliability must be implemented at the application endpoints: lower-layer guarantees (such as TCP delivery) are insufficient on their own, so true reliability requires application-level acknowledgments.
Fallacies of distributed computing include assuming the network is reliable, latency is zero, bandwidth is infinite, and the network is secure.
Common practical failure modes in distributed systems include network partitions, clock drift, and inconsistencies between backups.
For additional resources and further reading on distributed systems, see the notes.