← Week 2: Distributed Transactions

Day 12: Distributed Locks

Phase 4 · Aug 2, 2026

Agenda (2–3 hours)

  • Read (45 min): Martin Kleppmann's "How to do Distributed Locking" blog post (his critique of Redlock); Redis Redlock documentation
  • Study (45 min): What happens when a process holds a distributed lock and pauses (GC pause, network hiccup)? How does fencing prevent damage?
  • Practice (45 min): Implement a distributed lock using DynamoDB conditional writes; test concurrent acquisition with two Rust processes
  • Challenge (30 min): Design a lock manager using etcd leases; what happens when the process holding the lock crashes?

Why Distributed Locks Are Hard

A distributed lock protects a resource from concurrent modification across multiple processes.

Problem: the lock holder can be paused (GC pause, memory swapping, VM migration) after acquiring the lock but before the protected operation completes. If the lock has a TTL, it can expire during the pause without the holder noticing.

Process A: acquires lock
Process A: [pauses for 30 seconds — GC pause]
Process B: lock expires; acquires lock; modifies resource
Process A: [resumes]; modifies resource — DATA CORRUPTION

Solution: fencing tokens — a monotonically increasing number issued with each lock grant. The protected resource rejects operations with stale tokens.
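
A minimal sketch of the check on the resource side, assuming the resource can remember the highest token it has accepted. The names FencedStore and StaleToken are hypothetical, and the token values are illustrative.

```rust
/// Error returned when a write carries a token older than one already accepted.
#[derive(Debug, PartialEq)]
pub struct StaleToken {
    pub presented: u64,
    pub highest_seen: u64,
}

/// Hypothetical protected resource that tracks the highest fencing token seen.
#[derive(Debug, Default)]
pub struct FencedStore {
    highest_seen: u64,
    value: String,
}

impl FencedStore {
    /// Applies the write only if `token` is not older than the newest token
    /// accepted so far. An equal token is allowed so that the current holder
    /// can write more than once.
    pub fn write(&mut self, token: u64, value: &str) -> Result<(), StaleToken> {
        if token < self.highest_seen {
            return Err(StaleToken {
                presented: token,
                highest_seen: self.highest_seen,
            });
        }
        self.highest_seen = token;
        self.value = value.to_string();
        Ok(())
    }

    /// Returns the last accepted value.
    pub fn value(&self) -> &str {
        &self.value
    }
}
```

In the timeline above, suppose Process A was granted token 33 and Process B token 34. B's write arrives first, so when A resumes and calls write(33, ...) the store returns a StaleToken error instead of overwriting B's data.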

Redis Redlock

Redlock acquires a lock on N independent Redis instances:

  1. Get current time
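  2. Acquire lock on each instance with TTL, using the same key and a unique random value
  3. Lock is acquired if a majority (N/2+1) succeed AND the elapsed time is less than the TTL
  4. Lock validity = TTL - elapsed time - clock drift allowance
  5. If acquisition fails, release the lock on all instances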
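
The acceptance check at the end of these steps can be sketched as a pure function. This sketch follows the rule as stated in the Redis Redlock documentation: a majority of instances must grant the lock, and the time left after subtracting the elapsed time and a clock drift allowance must be positive. The function name and the drift_allowance parameter are assumptions for illustration; the networking part of the algorithm is left out.

```rust
use std::time::Duration;

/// Returns the remaining validity if the lock counts as acquired, else None.
///
/// `instances` is N, `acquired_on` is how many instances granted the lock,
/// `elapsed` is the time spent acquiring, and `drift_allowance` is an assumed
/// safety margin for clock drift.
pub fn redlock_validity(
    instances: usize,
    acquired_on: usize,
    ttl: Duration,
    elapsed: Duration,
    drift_allowance: Duration,
) -> Option<Duration> {
    let quorum = instances / 2 + 1;
    if acquired_on < quorum {
        return None; // no majority
    }
    // checked_sub yields None when the result would be negative.
    let validity = ttl.checked_sub(elapsed)?.checked_sub(drift_allowance)?;
    if validity.is_zero() {
        None // nothing left to work with
    } else {
        Some(validity)
    }
}
```

With N = 5, a 10 s TTL, 2 s spent acquiring and a 100 ms allowance, three grants leave 7.9 s of validity. Two grants, or an acquisition that takes the whole TTL, yield None, and the client should then release whatever it did acquire.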

Controversy (Kleppmann):

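  • Redis persistence is async by default (not guaranteed durable), so an instance that crashes and restarts can lose lock state
  • Safety depends on timing assumptions: a clock jump on a Redis instance, a long network delay, or a client pause can leave two clients believing they hold the lock
  • No fencing token support: Redlock does not produce a monotonically increasing number the resource could check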

Recommendation: Use etcd leases or DynamoDB conditional writes for critical distributed locking.

DynamoDB Distributed Lock

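// Assumes `dynamodb` is an aws_sdk_dynamodb::Client and that the enclosing
// async fn returns a Result whose error type converts from the SDK error.
use aws_sdk_dynamodb::types::AttributeValue;

let result = dynamodb.put_item()
    .table_name("locks")
    .item("lock_id", AttributeValue::S("my-resource".to_string()))
    .item("owner", AttributeValue::S(owner_id.to_string()))
    .item("ttl", AttributeValue::N(expiry_timestamp.to_string()))
    .item("fence_token", AttributeValue::N(token.to_string()))
    // `ttl` is a DynamoDB reserved word, so the condition refers to it via an alias.
    .condition_expression("attribute_not_exists(lock_id) OR #ttl < :now")
    .expression_attribute_names("#ttl", "ttl")
    .expression_attribute_values(":now", AttributeValue::N(now.to_string()))
    .send()
    .await;

match result {
    Ok(_) => { /* lock acquired */ }
    // The condition failed: an unexpired lock item already exists.
    Err(e) if e.as_service_error()
        .is_some_and(|se| se.is_conditional_check_failed_exception()) => { /* lock held by another */ }
    // Any other failure (throttling, network, validation) is propagated.
    Err(e) => return Err(e.into()),
}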
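
For the practice exercise it can help to check the expected behavior before involving a real table. The sketch below is an in-memory stand-in, not DynamoDB: a Mutex plays the role of DynamoDB's atomic per-item conditional write, and threads stand in for the two processes. LockTable, try_acquire and race are hypothetical names, and deriving the fence token from the previous item is an assumption of this model.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// One row of the modelled "locks" table.
#[derive(Debug, Clone, PartialEq)]
pub struct LockItem {
    pub owner: String,
    pub expires_at: u64, // epoch seconds, the `ttl` attribute
    pub fence_token: u64,
}

/// In-memory stand-in for the table; the Mutex makes each call atomic.
#[derive(Default)]
pub struct LockTable {
    items: Mutex<HashMap<String, LockItem>>,
}

impl LockTable {
    /// Models a put guarded by `attribute_not_exists(lock_id) OR ttl < :now`.
    /// Returns the new fence token, or None when the condition fails.
    pub fn try_acquire(&self, lock_id: &str, owner: &str, now: u64, lease_secs: u64) -> Option<u64> {
        let mut items = self.items.lock().unwrap();
        let next_token = match items.get(lock_id) {
            Some(held) if held.expires_at >= now => return None, // still live
            Some(expired) => expired.fence_token + 1,            // take over
            None => 1,                                           // first grant
        };
        items.insert(
            lock_id.to_string(),
            LockItem {
                owner: owner.to_string(),
                expires_at: now + lease_secs,
                fence_token: next_token,
            },
        );
        Some(next_token)
    }
}

/// Races `contenders` threads for the same lock and collects each outcome.
pub fn race(table: Arc<LockTable>, contenders: usize, now: u64) -> Vec<Option<u64>> {
    let handles: Vec<_> = (0..contenders)
        .map(|i| {
            let table = Arc::clone(&table);
            thread::spawn(move || table.try_acquire("my-resource", &format!("worker-{i}"), now, 30))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling race with two contenders should produce exactly one Some and one None, whichever thread runs first. A later attempt succeeds only once `now` has passed the stored expiry, and it receives a larger token than the previous holder.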

etcd Lease-Based Lock

# Create a lease (30s TTL)
etcdctl lease grant 30
# lease c03060f5e0dce027 granted with TTL(30s)

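# Acquire lock (the put runs only if the key does not exist, i.e. its version is 0)
# Non-interactive txn input is three blank-line-separated sections:
# compares, then success requests, then failure requests
etcdctl txn <<EOF
version("lock/myresource") = "0"

put "lock/myresource" "owner-123" --lease=c03060f5e0dce027

get "lock/myresource"

EOF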

# Keep alive (runs until interrupted, renewing the lease while the lock is held)
etcdctl lease keep-alive c03060f5e0dce027

# Release
etcdctl lease revoke c03060f5e0dce027

Lease expiry = automatic lock release when the holder crashes: keep-alives stop, the lease expires after at most its TTL, and etcd deletes the attached key. No orphaned locks.
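
The crash scenario from the challenge can be walked through with a toy model. This is a single-node, in-memory sketch with time passed in explicitly as seconds. It is not the etcd client API, and MiniEtcd and its methods are hypothetical. It mimics two behaviors: keys attached to an expired lease are deleted, and every change bumps a store-wide revision that can serve as a fencing token.

```rust
use std::collections::HashMap;

/// Toy model of etcd leases and keys (not the real client API).
#[derive(Default)]
pub struct MiniEtcd {
    revision: u64,
    next_lease: u64,
    leases: HashMap<u64, (u64, u64)>,          // lease id -> (ttl, deadline)
    keys: HashMap<String, (String, u64, u64)>, // key -> (value, lease id, create revision)
}

impl MiniEtcd {
    /// Grants a lease that expires `ttl` seconds after `now` unless renewed.
    pub fn lease_grant(&mut self, now: u64, ttl: u64) -> u64 {
        self.next_lease += 1;
        self.leases.insert(self.next_lease, (ttl, now + ttl));
        self.next_lease
    }

    /// Renews the lease; returns false if it has already expired or been revoked.
    pub fn keep_alive(&mut self, now: u64, lease: u64) -> bool {
        self.expire(now);
        match self.leases.get_mut(&lease) {
            Some((ttl, deadline)) => {
                *deadline = now + *ttl;
                true
            }
            None => false,
        }
    }

    /// Creates the key only if it is absent and the lease is live.
    /// Returns the create revision, usable as a fencing token.
    pub fn try_lock(&mut self, now: u64, key: &str, owner: &str, lease: u64) -> Option<u64> {
        self.expire(now);
        if self.keys.contains_key(key) || !self.leases.contains_key(&lease) {
            return None;
        }
        self.revision += 1;
        self.keys.insert(key.to_string(), (owner.to_string(), lease, self.revision));
        Some(self.revision)
    }

    /// Removes the lease and every key attached to it.
    pub fn revoke(&mut self, lease: u64) {
        if self.leases.remove(&lease).is_some() {
            let before = self.keys.len();
            self.keys.retain(|_, (_, attached, _)| *attached != lease);
            if self.keys.len() != before {
                self.revision += 1; // the delete is a change too
            }
        }
    }

    /// Revokes every lease whose deadline has passed.
    fn expire(&mut self, now: u64) {
        let dead: Vec<u64> = self
            .leases
            .iter()
            .filter(|(_, (_, deadline))| *deadline <= now)
            .map(|(id, _)| *id)
            .collect();
        for id in dead {
            self.revoke(id);
        }
    }
}
```

If owner A locks at t=0 with a 30 s lease, renews at t=20, and then crashes, a second owner's try_lock fails at t=40 but succeeds at t=50, once A's lease deadline has passed. The second owner's revision is higher than A's, so a resource that checks revisions would reject any late write from A.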

Key Takeaways

  • Distributed locks are vulnerable to holder pause — use fencing tokens to prevent damage
  • Redlock is controversial for critical operations; prefer etcd or DynamoDB
  • etcd leases are the production-grade choice: Raft-backed, auto-expire on crash, fencing token via revision
  • DynamoDB conditional writes provide compare-and-swap semantics without a separate lock service

Tomorrow: idempotency keys — safe at-least-once delivery.