← Week 2: Distributed Transactions

Day 8: Two-Phase Commit

Phase 4 · Jul 29, 2026

Agenda (2–3 hours)

  • Read (45 min): Gray & Lamport "Consensus on Transaction Commit" (2004); PostgreSQL two-phase commit documentation
  • Study (45 min): Walk through all four failure scenarios: coordinator crash in phase 1, coordinator crash in phase 2, cohort crash in phase 1, cohort crash in phase 2
  • Practice (45 min): Use PostgreSQL's PREPARE TRANSACTION + COMMIT PREPARED to simulate 2PC between two schemas
  • Challenge (30 min): 2PC is called a "blocking" protocol. In what scenario is a cohort permanently blocked? How does 3PC claim to fix this?

The Distributed Transaction Problem

Business requirement: "when a user places an order, deduct inventory AND charge their card — both or neither."

With microservices:

  • Inventory service: separate DB
  • Payment service: separate DB
  • No shared transaction coordinator

A local transaction in each service is insufficient — one can succeed while the other fails.
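The partial-failure case can be reproduced in a few lines. The sketch below is only an illustration: it uses two in-memory SQLite databases as stand-ins for the two services, and the table names, the `place_order_naively` helper, and the CHECK constraint that forces the payment to fail are all assumptions made for the demo.

```python
import sqlite3

# Two independent databases stand in for the two services (illustrative assumption).
inventory_db = sqlite3.connect(":memory:")
payment_db = sqlite3.connect(":memory:")

inventory_db.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, qty INTEGER)")
inventory_db.execute("INSERT INTO inventory VALUES (42, 5)")
inventory_db.commit()

# The CHECK constraint is a hypothetical way to make the payment step fail.
payment_db.execute(
    "CREATE TABLE charges (order_id INTEGER PRIMARY KEY, amount INTEGER CHECK (amount > 0))"
)
payment_db.commit()


def place_order_naively(order_id, item_id, amount):
    """Run two unrelated local transactions, one per service."""
    # Local transaction 1 commits on its own.
    inventory_db.execute("UPDATE inventory SET qty = qty - 1 WHERE item_id = ?", (item_id,))
    inventory_db.commit()
    # Local transaction 2 can still fail after transaction 1 is durable.
    try:
        payment_db.execute("INSERT INTO charges VALUES (?, ?)", (order_id, amount))
        payment_db.commit()
        return True
    except sqlite3.IntegrityError:
        payment_db.rollback()
        return False


ok = place_order_naively(order_id=1234, item_id=42, amount=0)  # amount 0 violates the CHECK
qty = inventory_db.execute("SELECT qty FROM inventory WHERE item_id = 42").fetchone()[0]
charges = payment_db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
print(ok, qty, charges)  # False 4 0
```

The inventory was deducted (5 to 4) even though no charge was recorded, which is exactly the "one succeeds, the other fails" outcome the business requirement forbids.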

Two-Phase Commit Protocol

Phase 1 — Prepare:

  1. Coordinator sends PREPARE to all cohorts
  2. Each cohort acquires the locks it needs, force-writes a prepare record to its log, and replies VOTE-COMMIT or VOTE-ABORT
  3. Coordinator collects all votes

Phase 2 — Commit/Abort:

  1. If all votes are VOTE-COMMIT: coordinator logs COMMIT, sends COMMIT to all cohorts
  2. If any vote is VOTE-ABORT: coordinator logs ABORT, sends ABORT to all cohorts
  3. Cohorts commit or roll back, release locks, and acknowledge
  4. Coordinator marks transaction complete after all acks
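The two phases above can be sketched as a small in-memory simulation. This is a minimal sketch, not a production design: the `Cohort` and `Coordinator` classes, the `can_commit` switch, and the log record names are hypothetical, messages are plain method calls, and "logs" are Python lists rather than durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    name: str
    can_commit: bool = True  # hypothetical switch to force a VOTE-ABORT
    state: str = "INIT"
    log: list = field(default_factory=list)

    def prepare(self, txid):
        """Phase 1: record the vote before replying."""
        if self.can_commit:
            self.log.append(("PREPARED", txid))
            self.state = "PREPARED"
            return "VOTE-COMMIT"
        self.log.append(("ABORT", txid))
        self.state = "ABORTED"
        return "VOTE-ABORT"

    def finish(self, txid, decision):
        """Phase 2: apply the coordinator's decision and acknowledge."""
        if self.state != "ABORTED":  # a cohort that voted abort has nothing left to do
            self.log.append((decision, txid))
            self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        return "ACK"


@dataclass
class Coordinator:
    cohorts: list
    log: list = field(default_factory=list)

    def run(self, txid):
        # Phase 1: collect a vote from every cohort.
        votes = [c.prepare(txid) for c in self.cohorts]
        decision = "COMMIT" if all(v == "VOTE-COMMIT" for v in votes) else "ABORT"
        # The decision is logged before any cohort hears about it.
        self.log.append((decision, txid))
        # Phase 2: broadcast the decision and wait for acknowledgements.
        acks = [c.finish(txid, decision) for c in self.cohorts]
        if all(a == "ACK" for a in acks):
            self.log.append(("END", txid))
        return decision


happy = Coordinator([Cohort("inventory"), Cohort("payment")])
print(happy.run("order-1234"))  # COMMIT

sad = Coordinator([Cohort("inventory"), Cohort("payment", can_commit=False)])
print(sad.run("order-1235"))  # ABORT
```

In the second run the inventory cohort had already voted VOTE-COMMIT, yet it ends in the ABORTED state, because a single VOTE-ABORT decides the outcome for everyone.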

Failure Scenarios

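Failure point | Cohort behavior | Recovery
Coordinator crashes after sending PREPARE, before logging a decision | Cohorts that voted VOTE-COMMIT block, holding locks, until the coordinator returns | Coordinator finds no decision in its log on restart and aborts the transaction
Coordinator crashes after logging COMMIT | Cohorts wait; coordinator re-sends COMMIT on recovery | Coordinator log drives recovery
Cohort crashes before voting | Coordinator times out waiting for the vote and aborts | Cohort finds no prepare record on recovery and aborts locally
Cohort crashes after voting VOTE-COMMIT | Cohort must apply the coordinator's decision (commit or abort) once it recovers | Cohort's prepared transaction persists across the crash until resolved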

Blocking problem: if the coordinator crashes after phase 1 but before sending phase 2, cohorts that voted VOTE-COMMIT are blocked. They keep holding locks and cannot safely commit or abort on their own until the coordinator recovers.
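Coordinator recovery can be sketched as a replay of its log. The function below is a simplified illustration under stated assumptions: the record names (COMMIT, ABORT, END), the `recover_coordinator` name, and the `cohorts_by_tx` mapping are hypothetical, and a real coordinator would read the participant list from its own durable log rather than receive it as an argument. A transaction with no logged decision is aborted, which is safe because no cohort can have been told to commit.

```python
def recover_coordinator(log, cohorts_by_tx):
    """Return the (cohort, decision, txid) messages a restarted coordinator re-sends.

    log: list of (record, txid) tuples in write order, e.g. ("COMMIT", "t1").
    cohorts_by_tx: dict mapping txid -> list of cohort names (hypothetical input).
    """
    decisions = {}
    finished = set()
    for record, txid in log:
        if record in ("COMMIT", "ABORT"):
            decisions[txid] = record
        elif record == "END":
            finished.add(txid)  # every cohort acknowledged; nothing to redo

    messages = []
    for txid, cohorts in cohorts_by_tx.items():
        if txid in finished:
            continue
        # No decision record means the crash happened before phase 2: abort.
        decision = decisions.get(txid, "ABORT")
        messages.extend((cohort, decision, txid) for cohort in cohorts)
    return messages


log = [("COMMIT", "t1"), ("END", "t1"), ("COMMIT", "t2")]
cohorts_by_tx = {
    "t1": ["inventory", "payment"],
    "t2": ["inventory", "payment"],
    "t3": ["inventory"],
}
messages = recover_coordinator(log, cohorts_by_tx)
for message in messages:
    print(message)
```

Here t1 is skipped because it completed, t2 has its COMMIT re-sent, and t3, which crashed between the phases, is aborted. Until those messages arrive, the prepared cohorts of t2 and t3 remain blocked.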

2PC in Practice

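PostgreSQL supports 2PC explicitly through PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED. Prepared transactions are disabled by default; set max_prepared_transactions above 0 to use them.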

-- Phase 1, on each cohort: do the local work, then prepare
BEGIN;
UPDATE inventory SET qty = qty - 1 WHERE item_id = 42;
PREPARE TRANSACTION 'order-1234-inventory';

-- Phase 2, once every cohort has prepared successfully:
COMMIT PREPARED 'order-1234-inventory';

-- Phase 2, if any cohort failed to prepare:
ROLLBACK PREPARED 'order-1234-inventory';

Used in: distributed databases (CockroachDB, Spanner), XA transactions (Java EE), and any system that needs cross-resource atomicity.
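From application code, the same sequence is usually driven through a database driver rather than raw SQL. The sketch below assumes connections that implement the optional two-phase commit extension of the Python DB-API (PEP 249): `xid`, `tpc_begin`, `tpc_prepare`, `tpc_commit`, and `tpc_rollback`. psycopg2 is one driver that documents these methods. The `commit_across` function, its parameters, and the use of the branch name as the branch qualifier are hypothetical choices for illustration.

```python
def commit_across(connections, gtrid, do_work):
    """Drive 2PC over DB-API connections that support the TPC extension.

    connections: dict of branch name -> open connection (hypothetical naming).
    gtrid: global transaction id string, e.g. "order-1234".
    do_work: callable(name, conn) that runs this branch's SQL statements.
    """
    started = []
    try:
        # Phase 1a: open a TPC transaction on each branch and do the local work.
        for name, conn in connections.items():
            conn.tpc_begin(conn.xid(0, gtrid, name))
            started.append(conn)
            do_work(name, conn)
        # Phase 1b: ask every branch to prepare.
        for conn in started:
            conn.tpc_prepare()
    except Exception:
        # Any failure before the decision: roll back every branch that was started.
        for conn in started:
            conn.tpc_rollback()
        raise
    # Phase 2: every branch prepared, so commit them all.
    # A real coordinator would durably log the COMMIT decision before this loop
    # and retry failed commits instead of rolling back; this sketch omits that.
    for conn in started:
        conn.tpc_commit()
    return "COMMIT"
```

Because the function only relies on the DB-API method names, it can be exercised with stub connections in a unit test before being pointed at real databases.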

Key Takeaways

  • 2PC guarantees atomicity across distributed resources
  • Blocking: if the coordinator crashes between phases, prepared cohorts hold their locks until it recovers
  • Recovery requires the coordinator to re-send its logged decision after restart
  • 2PC needs a stable, low-latency coordinator; for production, back the coordinator with Raft (as CockroachDB does)

Tomorrow: 3PC and Sagas — alternatives to 2PC.