← Week 2: Consensus Algorithms

Day 12: ZooKeeper and etcd

Phase 1 · May 31, 2026


Agenda (2–3 hours)

  • Read (45 min): Hunt et al. "ZooKeeper: Wait-free Coordination for Internet-Scale Systems" (USENIX ATC 2010) §1–4; etcd documentation on the Raft implementation
  • Study (45 min): ZAB vs Raft — similarities and key differences
  • Practice (45 min): Run a local etcd cluster (3 nodes via Docker); observe leader election by killing the leader
  • Challenge (30 min): Implement a distributed lock using etcd's lease + if-not-exists CAS operation

ZooKeeper and ZAB

ZooKeeper (Apache, 2010): a distributed coordination service used for leader election, distributed locks, service discovery, and configuration management.

ZAB (ZooKeeper Atomic Broadcast): consensus protocol used by ZooKeeper.

  • Similar to Multi-Paxos but designed specifically for primary-backup replication
  • Epoch-based: each epoch has one primary; primaries are ordered by epoch number
  • Three phases: discovery (establish a new epoch), synchronization (bring followers up to date with the new primary's history), and broadcast (normal operation)
  • Ordered delivery: all updates are delivered in FIFO order from the primary
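
The epoch ordering above can be made concrete with ZooKeeper's transaction ids (zxids), which are 64-bit numbers with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits. The sketch below is only an illustration of that layout; the helper names `make_zxid` and `split_zxid` are hypothetical and not part of any ZooKeeper API.

```python
EPOCH_SHIFT = 32
COUNTER_MASK = (1 << 32) - 1


def make_zxid(epoch, counter):
    """Pack an epoch and a per-epoch counter into one 64-bit id (hypothetical helper)."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Recover (epoch, counter) from a packed id."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


# A proposal from an older primary always sorts before one from a newer
# primary, no matter how many updates the older primary issued.
old = make_zxid(epoch=3, counter=250)
new = make_zxid(epoch=4, counter=1)
print(old < new)        # True
print(split_zxid(new))  # (4, 1)
```

Because the epoch occupies the high bits, a plain integer comparison orders updates first by primary and then by position within that primary's stream, which is what FIFO delivery per primary needs.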

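ZooKeeper on ZAB provides linearizable writes but only sequentially consistent reads: a read served by any replica may be stale. Calling sync() before the read makes the replica catch up with the leader first, which is how clients approximate a linearizable read.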
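
A toy model can show why a local read may be stale and what a sync-style catch-up changes. This is a minimal sketch, not ZooKeeper code: the `Replica` class and its `catch_up` method are hypothetical, and `catch_up` merely stands in for waiting until pending updates from the primary have been applied.

```python
class Replica:
    """Toy replica: applies the primary's updates in log order, possibly late."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log):
        # Apply every entry this replica has not seen yet, in order.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


log = []              # the primary's totally ordered update log
follower = Replica()

log.append(("config", "v1"))
follower.catch_up(log)
log.append(("config", "v2"))     # committed, but not yet applied on this follower

stale = follower.data["config"]  # plain read served from local state
follower.catch_up(log)           # stand-in for sync(): wait for pending updates
fresh = follower.data["config"]
print(stale, fresh)              # v1 v2
```

The follower never sees updates out of order, so its view is sequentially consistent; it is simply allowed to lag behind the primary until it catches up.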


etcd

Kubernetes' backing store. Built on Raft.

Key properties:

  • Linearizable reads by default (served via Raft ReadIndex); serializable reads, which may be stale, are opt-in
  • Watch API: stream changes to keys with guaranteed ordering
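  • Leases: TTL-based grants attached to keys; if the client stops sending keep-alives, the lease expires and its keys are deleted
  • Transactions: atomic if/then/else blocks whose comparisons (value, version, create or mod revision) run against the MVCC store, which gives compare-and-swap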
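
To illustrate how per-key versions make compare-and-swap possible, here is a small in-memory stand-in for a revisioned key-value store. It is a sketch under simplifying assumptions, not etcd's implementation: `MiniKV` and its `cas` method are hypothetical names, and only the idea of a store-wide revision plus per-key `version`, `create_revision`, and `mod_revision` is borrowed from etcd.

```python
class MiniKV:
    """In-memory stand-in for a revisioned key-value store (not etcd itself)."""

    def __init__(self):
        self.revision = 0  # store-wide counter, bumped on every write
        self.items = {}    # key -> dict of value and revision metadata

    def version(self, key):
        # A key that does not exist reports version 0.
        return self.items[key]["version"] if key in self.items else 0

    def put(self, key, value):
        self.revision += 1
        old = self.items.get(key)
        self.items[key] = {
            "value": value,
            "create_revision": old["create_revision"] if old else self.revision,
            "mod_revision": self.revision,
            "version": old["version"] + 1 if old else 1,
        }

    def cas(self, key, expected_version, value):
        """If version(key) == expected_version, write value; otherwise do nothing."""
        if self.version(key) != expected_version:
            return False
        self.put(key, value)
        return True


kv = MiniKV()
print(kv.cas("job/owner", 0, "worker-a"))   # True: key did not exist
print(kv.cas("job/owner", 0, "worker-b"))   # False: version is now 1
print(kv.cas("job/owner", 1, "worker-a2"))  # True: caller saw the latest version
```

Comparing against version 0 is the "create only if absent" case that locks and elections build on; comparing against a later version is an ordinary optimistic update.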

Common uses:

  • Kubernetes: all cluster state (pods, services, configmaps) lives in etcd
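  • Distributed locks: in a single transaction, create a key with a lease attached only if the key does not already exist; exactly one client succeeds
  • Leader election: clients race to create a lease-backed key (etcd's counterpart to a ZooKeeper ephemeral node); the winner is leader until its lease expires or is revoked
  • Service registry: each service registers a key under a lease and keeps the lease alive; when the lease expires, the entry is removed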
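
The lock and election patterns above both reduce to "create if absent, bound to a lease". The following sketch models that with an in-memory store and a manually advanced clock so it runs without a cluster. Everything here (`LeaseKV`, `try_lock`, the `lock/` key prefix, the 10-second TTL) is hypothetical and chosen for illustration; a real client would use etcd's lease and transaction APIs instead.

```python
class LeaseKV:
    """Toy store with a manual clock; keys vanish when their lease expires."""

    def __init__(self):
        self.now = 0.0
        self.keys = {}    # key -> (value, lease_id)
        self.leases = {}  # lease_id -> expiry time
        self.next_lease = 1

    def grant(self, ttl):
        lease_id = self.next_lease
        self.next_lease += 1
        self.leases[lease_id] = self.now + ttl
        return lease_id

    def keep_alive(self, lease_id, ttl):
        # Refreshing only works while the lease still exists.
        if lease_id in self.leases:
            self.leases[lease_id] = self.now + ttl

    def advance(self, seconds):
        # Move the clock forward and drop expired leases with their keys.
        self.now += seconds
        expired = {l for l, t in self.leases.items() if t <= self.now}
        for lease_id in expired:
            del self.leases[lease_id]
        self.keys = {k: v for k, v in self.keys.items() if v[1] not in expired}

    def create_if_absent(self, key, value, lease_id):
        # Models one atomic transaction: compare "key absent", then put.
        if key in self.keys or lease_id not in self.leases:
            return False
        self.keys[key] = (value, lease_id)
        return True


def try_lock(kv, name, owner, ttl=10):
    """Return the lease id if the lock was acquired, else None."""
    lease = kv.grant(ttl)
    return lease if kv.create_if_absent("lock/" + name, owner, lease) else None


kv = LeaseKV()
a = try_lock(kv, "myresource", "client-a")
b = try_lock(kv, "myresource", "client-b")
print(a is not None, b is None)  # True True
kv.advance(11)                   # client-a crashed and never refreshed its lease
c = try_lock(kv, "myresource", "client-b")
print(c is not None)             # True
```

The holder keeps the lock only while it refreshes the lease, so a crashed client cannot block others forever. A paused client may still believe it holds an expired lock, which is why lease-based locks are usually paired with a fencing token checked by the protected resource.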

ZAB vs Raft

Property          | ZAB                                   | Raft
Consistency       | Sequential (sync for linearizable)    | Linearizable by default (in etcd)
Leader discovery  | ZAB epoch + sync phase                | Raft term + no-op commit
Log commitment    | Primary confirms quorum               | Leader confirms quorum
Read semantics    | May be stale (without sync)           | ReadIndex or lease
Used by           | ZooKeeper (and, through it, Kafka     | etcd, CockroachDB, TiKV
                  | metadata before KRaft)                |

Raft is generally considered easier to implement correctly; ZAB has a more complex synchronization phase.
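
Both protocols commit an update once a majority of the cluster has stored it. The sketch below shows that quorum arithmetic in the Raft style, where the leader tracks how far each member's log matches its own. The function names are hypothetical, and the sketch deliberately leaves out Raft's extra rule that a leader only commits by counting replicas for entries from its current term.

```python
def quorum_size(cluster_size):
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def commit_index(match_index):
    """Highest log index stored on a majority of members.

    match_index holds one entry per member, the leader's own log included.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum_size(len(match_index)) - 1]


print(quorum_size(3), quorum_size(5))  # 2 3
print(commit_index([7, 7, 4]))         # 7
print(commit_index([9, 6, 5, 5, 2]))   # 5
```

In the five-member case only two members hold index 6 or higher, so the commit point stays at 5 until one more follower catches up.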


Practical: etcd Operations

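# Put and get
etcdctl put mykey myvalue
etcdctl get mykey

# Conditional put (distributed lock)
# Grant a lease first; abc123 below stands for the hex lease ID it prints.
etcdctl lease grant 30

# Non-interactive txn input is three groups, each ended by a blank line:
# compares, then success requests, then failure requests.
# The "compares:" style labels are prompts printed by "etcdctl txn -i",
# not part of the input.
etcdctl txn <<EOF
version("lock/myresource") = "0"

put lock/myresource myid --lease=abc123

get lock/myresource

EOF

# Watch every key under a prefix
etcdctl watch /services/ --prefix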

Key Takeaways

  • ZooKeeper/ZAB: proven workhorse for coordination; sequential consistency by default
  • etcd/Raft: linearizable by default; tighter integration with modern cloud-native systems
  • Leases are the fundamental primitive for distributed locks and leader election
  • Kubernetes keeps all cluster state in etcd, so understanding etcd is essential for K8s reliability

Tomorrow: consensus in production systems — how Raft is used in real databases.