Building for a million users is easy. Finding them is the hard part. Most of the time, a Raspberry Pi could handle the load just fine.
The CAP theorem: a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance; when a network partition occurs, it must give up either consistency or availability.
Models of distributed systems include synchronous (bounded message delays), asynchronous (unbounded delays), and partially synchronous (bounds exist but may be unknown or hold only eventually, which best reflects real networks).
The FLP impossibility result shows that no deterministic consensus algorithm can guarantee termination in a purely asynchronous system if even one process may crash.
Fault detection in distributed systems is inherently imprecise: because network delays are unpredictable, a crashed node cannot be reliably distinguished from a slow one.
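This imprecision is why practical failure detectors only ever *suspect* a peer. A minimal timeout-based sketch (class and method names hypothetical), with explicit timestamps to keep it deterministic:

```python
class HeartbeatFailureDetector:
    """Timeout-based failure detector (a sketch, not a library API).

    A peer is only ever *suspected* of having failed: a slow network is
    indistinguishable from a crashed peer, so suspicions may be wrong.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen = {}  # peer -> timestamp of last heartbeat

    def heartbeat(self, peer: str, now: float) -> None:
        # Record that we heard from this peer at time `now`.
        self.last_seen[peer] = now

    def suspected(self, peer: str, now: float) -> bool:
        # Suspect a peer we have never heard from, or one that has been
        # silent for longer than the timeout.
        last = self.last_seen.get(peer)
        return last is None or now - last > self.timeout
```

Tuning `timeout` is the whole trade-off: too short and slow-but-healthy nodes are suspected; too long and real crashes go unnoticed.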
Fault-tolerant design doesn't just handle failures; it expects them. In large systems, continuous component failure is the default, not the exception. Erlang is a beautiful demonstration of this philosophy.
Consensus in distributed systems, or reaching agreement among nodes, is crucial but complex due to failures and potential Byzantine actors.
Message passing in distributed systems involves varying delivery semantics (to one, any, or all recipients), arrival behavior (a message may arrive once, multiple times, or never), and ordering constraints (total, partial, or causal).
Idempotence in message passing ensures that repeated processing of a message has the same effect as processing it once.
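A common way to get idempotence under at-least-once delivery is to deduplicate by message ID before applying side effects. A minimal sketch (names hypothetical):

```python
class IdempotentHandler:
    """Wraps a side-effecting handler so redelivered messages apply once.

    A sketch: a production version would persist `seen` and bound its
    growth (e.g. with expiry), which is omitted here.
    """

    def __init__(self, apply_fn) -> None:
        self.apply_fn = apply_fn
        self.seen = set()  # message IDs already processed

    def handle(self, msg_id: str, payload) -> bool:
        if msg_id in self.seen:
            return False  # duplicate delivery: skip the side effect
        self.seen.add(msg_id)
        self.apply_fn(payload)
        return True
```

With this wrapper, a sender can safely retry until it gets an acknowledgment: the effect happens exactly once even if the message arrives multiple times.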
State machine replication ensures consistent state by applying the same operations in the same order across replicas.
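The key requirement is that each replica is a deterministic state machine, so an agreed-upon log replayed in order yields identical state everywhere. A toy sketch (names hypothetical, and the hard part, agreeing on the log order, is assumed solved by a consensus layer):

```python
class CounterReplica:
    """A deterministic state machine: same ordered log -> same state."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, op) -> None:
        # Operations are (kind, amount) pairs; applying them must be
        # deterministic, with no randomness, clocks, or local I/O.
        kind, amount = op
        if kind == "add":
            self.value += amount
        elif kind == "mul":
            self.value *= amount

# Every replica applies the agreed log in the same total order.
log = [("add", 3), ("mul", 4), ("add", 1)]
replica_a, replica_b = CounterReplica(), CounterReplica()
for op in log:
    replica_a.apply(op)
    replica_b.apply(op)
```

Note that `mul` does not commute with `add`, which is exactly why the order must be agreed on, not just the set of operations.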
State-based replication ships the full state to other replicas rather than the operations that produced it; this is simple but can be inefficient when the state is large.
Consistency models in distributed systems define what guarantees clients get about the visibility and ordering of writes; stronger models are easier to program against but cost performance and availability.
Database transaction semantics, including isolation levels, differ across database systems; the same isolation-level name can mean different things in different engines.
Logical clocks in distributed systems, such as Lamport timestamps, vector clocks, and version vectors, establish order and track causality.
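The simplest of these is the Lamport clock: increment on every local event, and on receive take the maximum of local and incoming time plus one. A minimal sketch (names hypothetical):

```python
class LamportClock:
    """Lamport timestamps: if event a happened-before b, then ts(a) < ts(b).

    The converse does not hold; detecting concurrency requires vector
    clocks, which this sketch does not implement.
    """

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; attach the new timestamp to the message.
        return self.tick()

    def receive(self, ts: int) -> int:
        # Merge the sender's timestamp so causality is preserved.
        self.time = max(self.time, ts) + 1
        return self.time
</imports>```

Two processes exchanging one message illustrate the invariant: the receive event is always stamped later than the send event it depends on.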
CRDTs (conflict-free replicated data types) are data structures that can be updated concurrently on independent replicas and merged deterministically, with all replicas converging to the same state.
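The classic introductory CRDT is the grow-only counter (G-Counter): each node increments its own slot, and merge takes the element-wise maximum, which is commutative, associative, and idempotent. A minimal sketch (names hypothetical):

```python
class GCounter:
    """Grow-only counter CRDT: per-node counts, merged by element-wise max."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.counts = {}  # node id -> that node's increment count

    def increment(self, n: int = 1) -> None:
        # Each node only ever bumps its own slot.
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: applying a merge twice changes nothing,
        # and merge order does not matter.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)
```

Because merge is idempotent and order-insensitive, replicas can gossip their states in any order, any number of times, and still converge.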
The end-to-end argument states that reliability must be implemented at the application endpoints: lower-layer guarantees (such as TCP delivery) are insufficient on their own, so true reliability requires application-level acknowledgments.
Fallacies of distributed computing include assuming the network is reliable, latency is zero, bandwidth is infinite, and the network is secure.
Common practical failure modes in distributed systems include network partitions, clock drift, and inconsistencies between backups.
For additional resources and further reading on distributed systems, see the notes.