← Week 3: Distributed Data Structures

Day 16: Consistent Hashing

Phase 1 · Jun 4, 2026

← Week 3: Distributed Data Structures

Agenda (2–3 hours)

  • Read (45 min): Karger et al. "Consistent Hashing and Random Trees" (STOC 1997) §1–2; Amazon Dynamo paper §4.2 (virtual nodes)
  • Study (45 min): Trace what happens when a node joins/leaves — which keys move? Why is this O(K/n)?
  • Practice (45 min): Implement a consistent hash ring in Rust with a sorted Vec of virtual nodes and binary search
  • Challenge (30 min): How does Jump Consistent Hash avoid the overhead of a ring data structure? What does it sacrifice?
← Week 3: Distributed Data Structures

The Problem with Naive Hashing

For n nodes, assign key k to node hash(k) mod n.

Add or remove one node → n changes → almost all keys relocate.
In a distributed cache, this means a massive cache invalidation storm.

← Week 3: Distributed Data Structures

Consistent Hashing

Map both nodes and keys to the same circular hash space (e.g., 0 to 2³²-1).

Each key is assigned to the first node clockwise from it on the ring.

When a node is added: only the keys between the new node and its predecessor need to move.
When a node is removed: only the keys owned by that node need to move.

Average keys moved: K/n where K is total keys and n is node count.

← Week 3: Distributed Data Structures

Virtual Nodes

Problem: with few physical nodes, even consistent hashing produces uneven load.

Virtual nodes (vnodes): each physical node is represented by V points on the ring (e.g., V=150 in Dynamo).

Benefits:

  • Load balances naturally across the ring
  • When a node is added/removed, its virtual nodes are distributed across many predecessors — work is spread evenly

Used by: Amazon Dynamo, Cassandra, Riak, Redis Cluster

← Week 3: Distributed Data Structures

Jump Consistent Hash (Lamping & Veach, 2014)

Given a key and n buckets, produces an output in O(ln n) time with no ring data structure:

int64_t jump_consistent_hash(uint64_t key, int32_t num_buckets) {
    int64_t b = -1, j = 0;
    while (j < num_buckets) {
        b = j;
        key = key * 2862933555777941757ULL + 1;
        j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
    }
    return b;
}

Produces minimal remapping when n grows (only when increasing, not decreasing).
Used in Google's Guice and Abseil; simpler and faster than ring hashing.

← Week 3: Distributed Data Structures

Key Takeaways

  • Consistent hashing remaps only K/n keys when a node joins or leaves
  • Virtual nodes improve load distribution and simplify node failure handling
  • Jump Consistent Hash is simpler and faster when you only need to add nodes (not arbitrary removal)
  • Cassandra's partition design is built directly on consistent hashing with vnodes

Tomorrow: Merkle trees — efficient verification of large datasets.