← Week 2: Consensus Algorithms

Day 11: Byzantine Fault Tolerance

Phase 1 · May 30, 2026

← Week 2: Consensus Algorithms

Agenda (2–3 hours)

  • Read (45 min): Lamport, Shostak, Pease "The Byzantine Generals Problem" (1982) §1–3; Castro & Liskov "Practical BFT" (OSDI 1999) abstract + §1
  • Study (45 min): Work through the BFT impossibility result (need 3f+1 for f Byzantine nodes); contrast with CFT's 2f+1
  • Practice (45 min): Map BFT to real systems — where is Byzantine behavior an actual threat?
  • Challenge (30 min): Why does blockchain use BFT instead of CFT? What does Nakamoto consensus sacrifice compared to PBFT?
← Week 2: Consensus Algorithms

Crash Fault Tolerance vs Byzantine Fault Tolerance

CFT (Crash Fault Tolerant):

  • Faulty nodes stop responding (crash-stop model)
  • Nodes don't lie or send conflicting messages
  • Requires 2f+1 nodes to tolerate f failures

BFT (Byzantine Fault Tolerant):

  • Faulty nodes can behave arbitrarily — send conflicting messages, lie, collude
  • A compromised or malicious node
  • Requires 3f+1 nodes to tolerate f Byzantine failures

The Byzantine Generals: f traitors among n generals must not prevent the n-f loyal generals from reaching agreement.

← Week 2: Consensus Algorithms

Why 3f+1?

Suppose we have f Byzantine nodes. We need:

  • A quorum large enough that even if all f Byzantine nodes participate, the honest nodes form a majority of the quorum
  • Two quorums must overlap in at least f+1 nodes, so at least one honest node is in both
  • Minimum: 3f+1 total nodes; quorum size 2f+1

With 3f or fewer nodes, two quorums could overlap only in Byzantine nodes, and they could equivocate (tell each quorum different things).

← Week 2: Consensus Algorithms

PBFT (Practical Byzantine Fault Tolerance, Castro & Liskov 1999)

Three-phase protocol: Pre-prepare → Prepare → Commit

  1. Pre-prepare: primary assigns sequence number, multicasts to replicas
  2. Prepare: each replica broadcasts prepare message; waits for 2f matching prepares
  3. Commit: each replica broadcasts commit; waits for 2f+1 commits before executing

Safety: even if the primary is Byzantine, a replica won't commit unless 2f+1 nodes (including itself) have agreed on the same value for the same sequence number.

Performance: O(n²) message complexity per operation — PBFT is expensive at scale.

← Week 2: Consensus Algorithms

Modern BFT and Blockchain

HotStuff (Facebook's LibraBFT/Diem): O(n) message complexity with pipelined phases; chained BFT

Nakamoto consensus (Bitcoin): probabilistic BFT via proof-of-work

  • No fixed quorum; longest chain wins
  • Safety is probabilistic — 6 confirmations ≈ irreversibility in practice
  • Byzantine threshold: 50% of hash power (vs 33% for PBFT)
  • Trades performance (10-minute blocks) for permissionless participation

Tendermint / PBFT variants (Cosmos, Ethereum PoS): round-based BFT with rotating proposers; instant finality, permissioned validator set

← Week 2: Consensus Algorithms

Key Takeaways

  • CFT handles crash failures (honest but unavailable); BFT handles malicious failures
  • BFT requires 3f+1 nodes vs CFT's 2f+1 — significantly more expensive
  • PBFT provides deterministic BFT with O(n²) messages
  • Blockchain Nakamoto consensus is probabilistic BFT for open, permissionless networks
  • For Leo Secure Comms: BFT is relevant only if you trust no single node (e.g., multi-party key management)

Tomorrow: ZooKeeper and etcd — production consensus in coordination services.