← Week 1: Consistency Models

Day 1: Introduction to Distributed Systems

Phase 1 · May 20, 2026

← Week 1: Consistency Models

Agenda (2–3 hours)

  • Read (45 min): Tanenbaum & Van Steen §1.1–1.3; Lamport's "Time, Clocks, and the Ordering of Events" abstract
  • Study (45 min): The 8 fallacies of distributed computing — why each matters
  • Practice (45 min): Map each fallacy to a real incident or system you've worked on
  • Challenge (30 min): Write a one-page design doc for any distributed feature, calling out which fallacies apply
← Week 1: Consistency Models

What Is a Distributed System?

A set of independent computers that appears to users as a single coherent system.

Key properties (Tanenbaum):

  • No shared memory — communication only via message passing
  • No global clock — each node has its own clock
  • Independent failures — nodes can fail independently

Why build them?

  • Scale beyond what a single machine can handle
  • Fault tolerance through redundancy
  • Geographic distribution / low latency
← Week 1: Consistency Models

The 8 Fallacies of Distributed Computing

Peter Deutsch (Sun Microsystems, 1994):

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Every distributed system design must account for all eight.

← Week 1: Consistency Models

Failure Is the Norm

In a cluster of 1,000 nodes, each with 99.9% annual uptime:

P(all nodes up) = 0.999^1000 ≈ 0.37

Expected: at least one node is down 63% of the time.

Types of failures:

  • Crash-stop: node halts and stays halted
  • Crash-recovery: node halts then restarts with durable state
  • Byzantine: node behaves arbitrarily (lies, corrupts messages)
  • Network partition: nodes are reachable but cannot talk to each other
← Week 1: Consistency Models

Communication Models

Synchronous model: known upper bounds on message delay and processing time

  • Easier to reason about; rarely holds in practice

Asynchronous model: no timing bounds whatsoever

  • Realistic; makes many problems provably unsolvable (FLP impossibility)

Partially synchronous model (most practical systems):

  • Usually synchronous, occasionally asynchronous
  • Timeout-based failure detectors are imprecise but sufficient
← Week 1: Consistency Models

Key Takeaways

  • A distributed system's defining challenge is that partial failure is normal
  • The 8 fallacies are a checklist — violating any one silently creates bugs
  • Failure models determine what algorithms are applicable
  • Today established the vocabulary for everything that follows

Tomorrow: consistency models — what guarantees can a distributed system provide about the data it stores?