← Week 1: Circuit Breakers & Bulkheads

Day 4: Bulkhead Pattern

Phase 4 · Jul 25, 2026

Agenda (2–3 hours)

  • Read (45 min): Nygard "Release It!" Chapter 5 (bulkheads); AWS "Bulkhead pattern" Well-Architected documentation
  • Study (45 min): Design a bulkhead scheme for a service that has three tenant tiers (premium, standard, free); what resources should each bulkhead own?
  • Practice (45 min): Implement two bulkheads in Rust by wrapping two services in separate tower::limit::ConcurrencyLimit layers (one per downstream pool); verify that saturating one doesn't affect the other
  • Challenge (30 min): What is the tradeoff between bulkhead isolation and resource efficiency? How does AWS Lambda's account-level concurrency limit act as a bulkhead?

The Bulkhead Metaphor

A ship's hull is divided into watertight compartments (bulkheads). If one compartment floods, the others remain dry.

Applied to software:

  • Divide resource pools so that failure in one pool can't exhaust resources needed by another
  • A slow/failing feature doesn't starve critical features

What Gets Bulkheaded

Thread pools (Java, .NET):

  • Pool A: critical payment processing (fixed 20 threads)
  • Pool B: background report generation (fixed 5 threads)
  • Report generation can't starve payment processing
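
The two-pool layout above can be sketched in Rust with only the standard library. This is a minimal illustration, not a production executor: `spawn_pool` and `Job` are hypothetical names, and each pool is just a fixed set of worker threads draining its own queue, so a backlog in one queue cannot occupy the other pool's workers.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work submitted to a pool (hypothetical alias for this sketch).
pub type Job = Box<dyn FnOnce() + Send + 'static>;

/// Spawns `size` worker threads that serve only the jobs sent to this pool.
/// Returns the sending half of the pool's private queue.
pub fn spawn_pool(name: &str, size: usize) -> Sender<Job> {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for i in 0..size {
        let rx = Arc::clone(&rx);
        thread::Builder::new()
            .name(format!("{name}-{i}"))
            .spawn(move || loop {
                // The lock guard is a temporary, so it is released before the job runs.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(job) => job(),
                    Err(_) => break, // every sender was dropped: shut the worker down
                }
            })
            .expect("failed to spawn worker thread");
    }
    tx
}
```

With `let payments = spawn_pool("payments", 20);` and `let reports = spawn_pool("reports", 5);`, a report job that blocks indefinitely ties up at most the five report workers; payment jobs keep their own twenty.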

Connection pools (DB, HTTP):

  • Critical path: dedicated pool, max 50 connections
  • Analytics queries: dedicated pool, max 10 connections

Async semaphores (Rust/Tokio):

use std::sync::Arc;
use tokio::sync::Semaphore;
let payment_sem = Arc::new(Semaphore::new(100)); // at most 100 concurrent payment requests
let report_sem = Arc::new(Semaphore::new(20));   // at most 20 concurrent report requests
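
To see the isolation property without an async runtime, here is a std-only stand-in for the fail-fast behavior of a semaphore's try-acquire. `Bulkhead` and `Permit` are hypothetical names for this sketch; real code on Tokio would normally use `tokio::sync::Semaphore` directly, as above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A fail-fast bulkhead: at most `limit` permits may be held at once.
pub struct Bulkhead {
    limit: usize,
    in_use: AtomicUsize,
}

/// RAII permit: its slot is returned to the bulkhead when it is dropped.
pub struct Permit {
    bulkhead: Arc<Bulkhead>,
}

impl Bulkhead {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, in_use: AtomicUsize::new(0) })
    }

    /// Returns `None` immediately when the compartment is full.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_use.load(Ordering::Acquire);
        loop {
            if current >= self.limit {
                return None;
            }
            // Claim a slot only if nobody changed the count in the meantime.
            match self.in_use.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { bulkhead: Arc::clone(self) }),
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of permits currently held.
    pub fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.bulkhead.in_use.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Because each bulkhead owns its own counter, exhausting the report bulkhead leaves the payment bulkhead's permits untouched. The cost is that idle report slots cannot be lent to payments.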

Semaphore-Based Bulkhead in Tower

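use std::{future::Future, pin::Pin, sync::Arc, task::{Context, Poll}};
use tokio::sync::{OwnedSemaphorePermit, Semaphore};
use tower::{Layer, Service};

enum BulkheadError<E> { Full, Inner(E) } // Full = rejected by the bulkhead
struct BulkheadLayer { semaphore: Arc<Semaphore> }
struct BulkheadService<S> {
    inner: S,
    semaphore: Arc<Semaphore>,
    permit: Option<OwnedSemaphorePermit>, // reserved in poll_ready, consumed in call
}

impl<S> Layer<S> for BulkheadLayer {
    type Service = BulkheadService<S>;
    fn layer(&self, inner: S) -> Self::Service {
        BulkheadService { inner, semaphore: self.semaphore.clone(), permit: None }
    }
}

impl<S, Req> Service<Req> for BulkheadService<S> where S: Service<Req>, S::Future: Send + 'static {
    type Response = S::Response;
    type Error = BulkheadError<S::Error>;
    type Future = Pin<Box<dyn Future<Output = Result<S::Response, Self::Error>> + Send>>;
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // Reserve a permit; fail fast with Full (rather than Pending) if the pool is saturated
        if self.permit.is_none() {
            match self.semaphore.clone().try_acquire_owned() {
                Ok(permit) => self.permit = Some(permit),
                Err(_) => return Poll::Ready(Err(BulkheadError::Full)),
            }
        }
        self.inner.poll_ready(cx).map_err(BulkheadError::Inner)
    }
    fn call(&mut self, req: Req) -> Self::Future {
        let permit = self.permit.take().expect("poll_ready must succeed before call");
        let fut = self.inner.call(req);
        // The permit is released when the response future completes or is dropped
        Box::pin(async move { let _permit = permit; fut.await.map_err(BulkheadError::Inner) })
    }
}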

Tenant-Based Bulkheads

Multi-tenant services can give each tenant, or (as here) each tenant tier, its own bulkhead to prevent noisy-neighbor problems:

#[derive(PartialEq, Eq, Hash)] // required to use the tier as a map key
enum TenantTier { Premium, Standard, Free }

let bulkheads: DashMap<TenantTier, Arc<Semaphore>> = DashMap::new();
bulkheads.insert(TenantTier::Premium, Arc::new(Semaphore::new(500)));
bulkheads.insert(TenantTier::Standard, Arc::new(Semaphore::new(100)));
bulkheads.insert(TenantTier::Free, Arc::new(Semaphore::new(20)));

Free-tier saturation doesn't affect premium-tier capacity.
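
A std-only sketch of how admission might look with these tier limits. `TierBulkheads`, `try_enter`, and `leave` are hypothetical names, and the sketch assumes the tier set is fixed at startup, in which case a read-only `HashMap` with atomic counters is enough and no concurrent map is needed.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TenantTier { Premium, Standard, Free }

/// One compartment: a fixed limit and the number of requests currently inside.
struct Compartment {
    limit: usize,
    in_flight: AtomicUsize,
}

pub struct TierBulkheads {
    tiers: HashMap<TenantTier, Compartment>,
}

impl TierBulkheads {
    pub fn new(limits: &[(TenantTier, usize)]) -> Self {
        let tiers = limits
            .iter()
            .map(|&(tier, limit)| (tier, Compartment { limit, in_flight: AtomicUsize::new(0) }))
            .collect();
        Self { tiers }
    }

    /// Admits one request for `tier`. Returns false when that tier is full
    /// (or unknown); the caller would typically reject with a "too many requests" error.
    pub fn try_enter(&self, tier: TenantTier) -> bool {
        let Some(c) = self.tiers.get(&tier) else { return false };
        c.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < c.limit).then_some(n + 1)
            })
            .is_ok()
    }

    /// Must be called exactly once for every request that `try_enter` admitted.
    pub fn leave(&self, tier: TenantTier) {
        if let Some(c) = self.tiers.get(&tier) {
            c.in_flight.fetch_sub(1, Ordering::AcqRel);
        }
    }
}
```

Only the counter for the request's own tier is ever touched, which is why a flood of free-tier traffic is rejected at its own limit of 20 while premium admission proceeds unaffected.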

Key Takeaways

  • Bulkheads partition resource pools to prevent failure propagation between features/tenants
  • Implement with: separate thread pools (Java), separate connection pools, or semaphores (Rust)
  • Size each bulkhead to its expected peak load plus a safety margin
  • Bulkheads and circuit breakers are complementary: bulkhead limits blast radius; circuit breaker stops the cascade
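
One way to turn "expected peak load plus a safety margin" into a number is Little's Law: mean requests in flight equals arrival rate times mean latency. The helper below is a rough sizing sketch under that assumption; `bulkhead_size` and its parameters are hypothetical, and the inputs in the usage note are illustrative rather than measured.

```rust
/// Estimates a bulkhead size from Little's Law (in-flight = rate x latency),
/// then pads it by `safety_margin` (0.25 means 25% headroom) and rounds up.
pub fn bulkhead_size(peak_rps: f64, mean_latency_secs: f64, safety_margin: f64) -> usize {
    let expected_in_flight = peak_rps * mean_latency_secs;
    (expected_in_flight * (1.0 + safety_margin)).ceil() as usize
}
```

For example, `bulkhead_size(400.0, 0.25, 0.25)` gives 125: 400 requests per second at 250 ms each keeps about 100 in flight, plus 25% headroom. Latency tends to rise when a dependency degrades, so a limit sized from healthy-state latency will start rejecting early in an incident, which is usually the intended behavior for a bulkhead.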

Tomorrow: timeout hierarchy — multi-level timeouts for defense in depth.