← Week 2: Consensus Algorithms

Day 9: Raft — Leader Election and Log Replication

Phase 1 · May 28, 2026

← Week 2: Consensus Algorithms

Agenda (2–3 hours)

  • Read (45 min): Ongaro & Ousterhout "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014) §1–5
  • Study (45 min): Trace leader election and log replication through the Raft visualization at raft.github.io
  • Practice (45 min): Work through the safety argument: why can a node only vote for a candidate whose log is at least as up-to-date as its own?
  • Challenge (30 min): What happens if network partitions create two leaders simultaneously? How does Raft resolve this?
← Week 2: Consensus Algorithms

Why Raft?

Paxos is notoriously difficult to implement correctly:

  • Multi-Paxos underspecifies leader election and log management
  • Many production Paxos implementations diverge from the original paper

Raft was designed with three explicit decompositions:

  1. Leader election: one node is authoritative at a time
  2. Log replication: leader accepts entries, followers replicate
  3. Safety: only nodes with up-to-date logs can become leaders
← Week 2: Consensus Algorithms

Leader Election

Raft uses terms — monotonically increasing epoch numbers.

Follower → Candidate:

  • Election timeout fires (150–300ms randomized)
  • Increment term, vote for self, send RequestVote to all peers

Candidate → Leader (wins election):

  • Receives votes from a majority
  • Immediately sends heartbeat AppendEntries to establish authority

Candidate → Follower (loses):

  • Sees a RequestVote or AppendEntries with higher term
  • Reverts to follower

Vote restriction: a node only votes for a candidate whose log is at least as up-to-date (by last log term, then index) as its own.

← Week 2: Consensus Algorithms

Log Replication

  1. Client sends command to leader
  2. Leader appends entry to its log (term + index + command)
  3. Leader sends AppendEntries (or heartbeat) to all followers
  4. Followers append and acknowledge
  5. Once a majority acknowledges, leader commits the entry
  6. Leader applies to state machine, responds to client
  7. Future heartbeats carry the new commit index; followers apply

AppendEntries consistency check: leader includes the index and term of the entry before the new entries. Follower rejects if it doesn't match → leader backs up and retries from an earlier point.

← Week 2: Consensus Algorithms

Safety: Leader Completeness

The key guarantee: a leader always has all committed entries.

Proof sketch: suppose entry (term T, index I) is committed by leader L. It was replicated to a majority M. Any future leader must win a majority election — at least one node in M must grant its vote. That node will only vote for a candidate with a log at least as current as its own, which includes entry (T, I). By induction, future leaders always have all committed entries.

This means no committed entry is ever lost.

← Week 2: Consensus Algorithms

Key Takeaways

  • Raft decomposes consensus into leader election + log replication + safety invariants
  • Terms serve as logical clocks: stale leaders are detected by term number
  • Log replication is a two-phase commit with the leader as coordinator
  • Safety: only nodes with up-to-date logs can win elections, preserving all committed entries
  • etcd (Kubernetes backing store) is the most prominent Raft implementation

Tomorrow: Raft deep dive — log compaction, snapshots, and membership changes.