Systemic Failure: Aphorisms
- Distributed systems are everywhere. Almost every system is distributed; ignoring this leads to failure.
- Distribution is hard. Designing and building distributed systems is harder than it looks.
- CAP is the law. Don't defy it. During a partition you must choose consistency or availability; define your system within those constraints (CP or AP).
- Centralization is a risk. Avoid reliance on single points of failure.
- Clients are participants. Integrate them into your distributed systems design.
- Failure is inevitable. Software, networks, and hardware will fail. Prepare for it.
- Errors take many forms. Handle network interruptions, hardware malfunctions, and human mistakes.
- Networks are dynamic. Topologies shift, partitions multiply, and nodes vanish. Adapt.
- Account for latency. Distinguish it from partitions and outages.
- Time is relative. Clocks drift; events clash. Manage concurrency carefully.
- Synchronization is delicate. Beware inconsistencies; deleted data can return, for example when a replica that missed the delete resynchronizes.
- Actions have consequences. Mitigate irreversible side effects.
- Algorithms are fragile. Safeguard critical execution paths from failures.
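- Read-only is insufficient. Marking a component read-only does not guarantee that it cannot write.
- Quorums are flexible. Adjust cluster size and voting thresholds as needed, but keep read and write quorums large enough to overlap (R + W > N for N replicas).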
- Storage is vulnerable. Mitigate corruption; data can disappear or reappear.
- Storage is finite. Plan for capacity limits before you reach them.
- Bandwidth is precious. Minimize data transfer during resynchronization.
- Acknowledgement isn't confirmation. Ensure messages are received and processed.
- Persistence requires durable storage. Write messages to disk before acknowledging them to prevent loss.
- Waiting must be finite. Use timeouts; don't wait indefinitely for lost messages.
- Brief outages matter. Even short disruptions can have significant impacts.
- Theory isn't practice. Don't rely solely on unproven research.
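The "Time is relative" aphorism can be illustrated with a Lamport logical clock, which orders events by causality instead of trusting drifting wall clocks. This is a minimal Python sketch, not a production design; the `LamportClock` class and its method names are hypothetical.

```python
class LamportClock:
    """Lamport logical clock: a counter that respects causal order."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Any local event advances the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; the returned value is attached to the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                 # a.time == 1
    stamp = a.send()         # a.time == 2; the message carries 2
    b.tick()                 # b.time == 1
    print(b.receive(stamp))  # prints 3: max(1, 2) + 1
```

If one event causally precedes another, it gets a smaller Lamport timestamp, but the converse does not hold: two events with different timestamps may still be concurrent. Detecting concurrent, clashing events needs something richer, such as vector clocks.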
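The warning that deleted data can return is easiest to see with replicas that merge by last-writer-wins. The sketch below uses a hypothetical in-memory `Replica` with integer timestamps supplied by the caller (an assumption made to keep the example small) and contrasts a tombstone delete with a naive delete that simply forgets the key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# key -> (timestamp, value); a value of None is a tombstone.
Entry = Tuple[int, Optional[str]]


@dataclass
class Replica:
    data: Dict[str, Entry] = field(default_factory=dict)

    def put(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def delete(self, key: str, ts: int) -> None:
        # Record a tombstone so the delete itself replicates and beats older writes.
        self.data[key] = (ts, None)

    def purge(self, key: str) -> None:
        # Naive delete for comparison: forget the key, leaving no trace.
        self.data.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry is not None else None

    def merge(self, other: "Replica") -> None:
        # Last-writer-wins: keep the entry with the higher timestamp.
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)


if __name__ == "__main__":
    # With a tombstone, the stale copy on b cannot undo the delete.
    a, b = Replica(), Replica()
    a.put("user:1", "alice", ts=1)
    b.merge(a)                  # b now holds "alice" too
    a.delete("user:1", ts=2)    # b misses the delete (say, during a partition)
    a.merge(b)                  # resync: the ts=2 tombstone beats the ts=1 write
    print(a.get("user:1"))      # None

    # With a naive delete, the same resync brings the data back.
    c, d = Replica(), Replica()
    c.put("user:1", "alice", ts=1)
    d.merge(c)
    c.purge("user:1")
    c.merge(d)                  # c has no record of the delete, so it copies "alice"
    print(c.get("user:1"))      # alice
```

Tombstones cannot be kept forever, so systems typically purge them after a grace period; a replica that stays disconnected longer than that period can still resurrect deleted data when it rejoins.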
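For "Quorums are flexible", the usual constraint on adjusting thresholds reduces to simple arithmetic. A minimal sketch, assuming N replicas, a read quorum R, and a write quorum W; the helper name `quorums_overlap` is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Check the classic overlap rules for n replicas, read quorum r, write quorum w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and n")
    # r + w > n: every read quorum shares a node with every write quorum,
    # so a read always reaches at least one replica holding the latest write.
    reads_see_writes = r + w > n
    # 2w > n: any two write quorums intersect, so conflicting writes meet.
    writes_intersect = 2 * w > n
    return reads_see_writes and writes_intersect


if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (5, 3, 3), (5, 1, 5), (5, 2, 2)]:
        print(f"N={n} R={r} W={w}: {quorums_overlap(n, r, w)}")
```

Under these rules a five-node cluster can use R = W = 3 for balance, or R = 1 and W = 5 for cheap reads at the cost of writes that fail if any node is down; R = W = 2 fails the check because a read may contact only nodes that missed the latest write.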
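For "Bandwidth is precious", one common way to limit resynchronization traffic is to exchange block digests first and transfer only the blocks that differ. A minimal sketch using SHA-256 from Python's `hashlib`; the fixed block size and the function names `block_digests` and `blocks_to_send` are assumptions for illustration.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[str]:
    # Hash fixed-size blocks; peers exchange these short digests before any data.
    return [
        hashlib.sha256(data[i : i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(local: bytes, remote_digests: List[str], block_size: int) -> List[int]:
    """Indices of local blocks the remote peer is missing or holds differently."""
    return [
        i
        for i, digest in enumerate(block_digests(local, block_size))
        if i >= len(remote_digests) or remote_digests[i] != digest
    ]


if __name__ == "__main__":
    local = b"aaaabbbbccccdddd"
    stale = b"aaaaXbbbcccc"  # one block changed, one block missing
    print(blocks_to_send(local, block_digests(stale, 4), 4))  # [1, 3]
```

Fixed-size blocks are the simplest choice but misalign after an insertion near the start of the data, making every later block differ; tools such as rsync use rolling checksums to avoid this, and Merkle trees can shrink the digest exchange itself when little has changed.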
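The aphorisms on acknowledgement, persistence, and timeouts combine into a common consumer pattern: wait for messages with a bounded timeout, skip message ids that were already processed, and acknowledge only after processing. A minimal Python sketch; the `(message_id, payload)` format, `receive`, and `IdempotentConsumer` are hypothetical, and the in-memory `processed` set stands in for what would need to be durable storage.

```python
import queue
from typing import Callable, Optional, Set, Tuple

Message = Tuple[str, str]  # (message_id, payload); hypothetical format


def receive(inbox: "queue.Queue[Message]", timeout_s: float) -> Optional[Message]:
    """Wait at most timeout_s seconds for a message instead of blocking forever."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None  # caller decides whether to retry, resend, or give up


class IdempotentConsumer:
    """Skips message ids it has already processed, so redeliveries are harmless."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self.handler = handler
        self.processed: Set[str] = set()  # must be durable in a real system

    def handle(self, msg: Message) -> str:
        msg_id, payload = msg
        if msg_id not in self.processed:
            self.handler(payload)       # 1. do the work
            self.processed.add(msg_id)  # 2. record that it was done
        return msg_id                   # 3. only now return the id to acknowledge


if __name__ == "__main__":
    inbox: "queue.Queue[Message]" = queue.Queue()
    applied: list = []
    consumer = IdempotentConsumer(applied.append)
    inbox.put(("m1", "debit 10"))
    inbox.put(("m1", "debit 10"))  # duplicate redelivery of the same message
    while (msg := receive(inbox, timeout_s=0.1)) is not None:
        print("ack", consumer.handle(msg))
    print(applied)  # ['debit 10']: applied once despite two deliveries
```

Because the handler runs before the id is recorded, a crash between the two steps causes the message to be processed again on redelivery. That is at-least-once processing, which is safe only if the handler is itself idempotent or the work and the id record are committed atomically.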