← Week 2: Tower Middleware Stack

Day 10: Retry Logic

Phase 2 · Jun 19, 2026

Agenda (2–3 hours)

  • Read (45 min): tower::retry source and the Policy trait; AWS retry guidance (exponential backoff with jitter)
  • Study (45 min): Why is retry + timeout ordering important? Which goes on the outside?
  • Practice (45 min): Implement a retry policy that retries up to 3 times on transient errors with exponential backoff + jitter
  • Challenge (30 min): What makes an operation safe to retry? Design an idempotency key scheme for an order submission endpoint

When to Retry

Safe to retry (idempotent or read-only):

  • GET requests
  • DNS lookups
  • Connection failures before a request was sent

NOT safe to retry (non-idempotent without guards):

  • Creating an order (may double-charge)
  • Sending an email
  • Deducting from a balance

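Solution: idempotency keys. The client generates a UUID once per logical operation and sends the same key on every retry; the server stores each key with its result and, on a repeat, returns the stored result instead of executing again.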
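A minimal sketch of the server side of this scheme, assuming an in-memory map from key to result. A real service would use a durable store with a TTL and make the check-and-insert atomic, so two concurrent requests carrying the same key cannot both execute. `OrderService`, `submit_order`, and the `charges` counter are hypothetical names; plain strings stand in for the client's UUIDs.

```rust
use std::collections::HashMap;

// Hypothetical order-submission service that deduplicates by idempotency key.
// In-memory only: production code would persist keys (with a TTL) and make
// the lookup + insert atomic across concurrent requests.
struct OrderService {
    processed: HashMap<String, u64>, // idempotency key -> order id already created
    next_order_id: u64,
    charges: u32, // counts the non-idempotent side effect (charging the customer)
}

impl OrderService {
    fn new() -> Self {
        OrderService { processed: HashMap::new(), next_order_id: 1, charges: 0 }
    }

    // Returns the order id. A repeated key replays the stored result and does
    // not charge again, so a client retry after a lost response is harmless.
    fn submit_order(&mut self, idempotency_key: &str) -> u64 {
        if let Some(&order_id) = self.processed.get(idempotency_key) {
            return order_id; // duplicate request: replay the earlier result
        }
        self.charges += 1; // the side effect happens once per key
        let order_id = self.next_order_id;
        self.next_order_id += 1;
        self.processed.insert(idempotency_key.to_string(), order_id);
        order_id
    }
}
```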

Tower Retry

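use std::future::{ready, Ready};
use tower::retry::Policy;

// Hypothetical trait: your error type decides what is transient (timeouts,
// connection resets, 503s) versus permanent (400s, auth failures).
trait IsTransient {
    fn is_transient(&self) -> bool;
}

#[derive(Clone)]
struct RetryPolicy {
    remaining: u32, // retries left, e.g. 3 for "retry up to 3 times"
}

// tower 0.4 Policy signature (tower 0.5 changed it to &mut self, &mut Req, &mut Result).
impl<Req: Clone, Res, E: IsTransient> Policy<Req, Res, E> for RetryPolicy {
    type Future = Ready<Self>; // resolves immediately: no backoff delay yet

    fn retry(&self, _req: &Req, result: Result<&Res, &E>) -> Option<Self::Future> {
        match result {
            Ok(_) => None, // success, don't retry
            Err(e) if e.is_transient() && self.remaining > 0 => {
                Some(ready(RetryPolicy { remaining: self.remaining - 1 }))
            }
            _ => None, // permanent error or retries exhausted
        }
    }

    // Retry needs its own copy of the request for each attempt.
    fn clone_request(&self, req: &Req) -> Option<Req> {
        Some(req.clone())
    }
}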
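The policy only answers "retry or not?"; tower's Retry service runs the loop around it. This std-only sketch mimics that control flow (it is not tower's implementation) so the counting is easy to check: a budget of 3 retries means up to 4 attempts in total, and a permanent error stops after the first. `CallError` and `call_with_retries` are hypothetical names; `max_retries` plays the role of the policy's decrementing counter, and retries happen back to back, like the Ready-based policy.

```rust
// Hypothetical error classification for the sketch.
#[derive(Debug, PartialEq)]
enum CallError {
    Transient, // e.g. timeout, connection reset
    Permanent, // e.g. malformed request
}

// Calls `call` once, then retries on transient errors while budget remains.
// Returns the final result and how many attempts were made.
fn call_with_retries<T, F>(max_retries: u32, mut call: F) -> (Result<T, CallError>, u32)
where
    F: FnMut() -> Result<T, CallError>,
{
    let mut remaining = max_retries;
    let mut attempts = 0;
    loop {
        attempts += 1;
        match call() {
            Ok(value) => return (Ok(value), attempts),
            // Transient and budget left: spend one retry and call again.
            Err(CallError::Transient) if remaining > 0 => remaining -= 1,
            // Permanent, or transient with the budget exhausted: give up.
            Err(e) => return (Err(e), attempts),
        }
    }
}
```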

Exponential Backoff with Jitter

use std::time::Duration;

fn backoff_duration(attempt: u32) -> Duration {
    let base = Duration::from_millis(100);
    let cap = Duration::from_secs(30);
    // Exponential: 100ms, 200ms, 400ms, 800ms, ... (saturating, so large attempts can't overflow)
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exp.min(cap);
    // Full jitter: uniform in [0, capped]; uses the `rand` crate
    let capped_ms = capped.as_millis() as u64;
    let jitter_ms = rand::random::<u64>() % (capped_ms + 1);
    Duration::from_millis(jitter_ms)
}

Why jitter? Without it, clients that failed at the same moment compute identical delays (100ms, 200ms, 400ms, ...) and retry in synchronized waves, hammering the server just as it tries to recover. Jitter spreads those retries across the whole backoff window.

AWS's backoff guidance favors "full jitter": sleep a random duration in [0, min(cap, base × 2^attempt)], which is what backoff_duration computes.
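To make the "why jitter" argument concrete, here is a std-only sketch: 100 hypothetical clients fail at the same instant and each computes its next delay, with and without full jitter. It swaps the `rand` crate for a tiny xorshift64 generator (an assumption so the code has no dependencies) and reuses the 100 ms base and 30 s cap from backoff_duration; `backoff_ceiling`, `full_jitter`, and `peak_simultaneous_retries` are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Minimal xorshift64 PRNG so the sketch needs only std (seed must be nonzero).
// Not for anything security-related; real code would use the `rand` crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Exponential ceiling before jitter: min(cap, base * 2^attempt), saturating on overflow.
fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(cap)
}

// Full jitter: uniform in [0, ceiling] milliseconds (modulo bias is negligible here).
fn full_jitter(attempt: u32, base: Duration, cap: Duration, rng: &mut XorShift64) -> Duration {
    let ceiling_ms = backoff_ceiling(attempt, base, cap).as_millis() as u64;
    Duration::from_millis(rng.next_u64() % (ceiling_ms + 1))
}

// `clients` callers all fail at t = 0 and compute their delay for `attempt`.
// Returns the largest number of them that retry in the same millisecond.
fn peak_simultaneous_retries(clients: usize, attempt: u32, jitter: bool) -> usize {
    let (base, cap) = (Duration::from_millis(100), Duration::from_secs(30));
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let mut per_ms: HashMap<u128, usize> = HashMap::new();
    for _ in 0..clients {
        let delay = if jitter {
            full_jitter(attempt, base, cap, &mut rng)
        } else {
            backoff_ceiling(attempt, base, cap)
        };
        *per_ms.entry(delay.as_millis()).or_insert(0) += 1;
    }
    per_ms.values().copied().max().unwrap_or(0)
}
```

Without jitter, every client lands on the same millisecond (the 800 ms ceiling for attempt 3); with full jitter they spread across the 0 to 800 ms window.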

Retry + Timeout Ordering

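use std::time::Duration;
use tower::{retry::RetryLayer, timeout::TimeoutLayer, ServiceBuilder};

// Often WRONG: timeout wraps retries. One 10s budget covers all attempts, and a
// single hung attempt can use it up before the retry layer ever sees an error.
let service = ServiceBuilder::new()
    .layer(TimeoutLayer::new(Duration::from_secs(10))) // outer: total deadline
    .layer(RetryLayer::new(policy))
    .service(inner);

// RIGHT (usually): retry wraps timeout. Each attempt gets its own 2s budget, and a
// timed-out attempt is retried if the policy classifies the timeout error as transient.
let service = ServiceBuilder::new()
    .layer(RetryLayer::new(policy))                    // outer
    .layer(TimeoutLayer::new(Duration::from_secs(2)))  // inner: per-attempt timeout
    .service(inner);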

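Context matters: ServiceBuilder applies layers outside-in, so the first .layer() call is the outermost. If the caller needs a hard total deadline, put a timeout on the outside; the two can also be stacked (outer total deadline, inner per-attempt timeout).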
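A small std-only simulation makes the trade-off concrete. It assumes the retry policy treats a per-attempt timeout as transient and retries immediately (like the Ready-based policy), and it can model an outer total deadline, an inner per-attempt timeout, or both. `Outcome`, `Report`, and `simulate` are hypothetical names, not tower APIs.

```rust
use std::time::Duration;

// What one attempt against the inner service would do if nothing cut it off.
#[derive(Clone, Copy)]
enum Outcome {
    Succeeds(Duration),         // responds OK after this long
    FailsTransiently(Duration), // returns a retryable error after this long
    Hangs,                      // never responds
}

// What the caller observes: total time waited and whether the call succeeded.
#[derive(Debug, PartialEq)]
struct Report {
    elapsed: Duration,
    succeeded: bool,
}

// Retry runs up to `retries` extra attempts back to back. `per_attempt` models a
// timeout inside Retry; `total` models a timeout outside Retry. None = no such layer.
fn simulate(
    outcomes: &[Outcome],
    retries: u32,
    per_attempt: Option<Duration>,
    total: Option<Duration>,
) -> Report {
    let limit = per_attempt.unwrap_or(Duration::MAX);
    let deadline = total.unwrap_or(Duration::MAX);
    let mut elapsed = Duration::ZERO;
    for &outcome in outcomes.iter().take(retries as usize + 1) {
        let natural = match outcome {
            Outcome::Succeeds(d) | Outcome::FailsTransiently(d) => Some(d),
            Outcome::Hangs => None,
        };
        let remaining = deadline.saturating_sub(elapsed);
        match natural {
            // The attempt finishes on its own before either timeout fires.
            Some(d) if d <= limit && d <= remaining => {
                elapsed += d;
                if let Outcome::Succeeds(_) = outcome {
                    return Report { elapsed, succeeded: true };
                }
            }
            // The per-attempt timeout fires first; the policy retries it as transient.
            _ if limit < remaining => elapsed += limit,
            // The outer deadline fires: the whole call fails, no more retries.
            _ => return Report { elapsed: deadline, succeeded: false },
        }
    }
    Report { elapsed, succeeded: false } // retries exhausted
}
```

With one hung attempt followed by a fast success, the outer-only layout waits the full 10 s and fails, while the per-attempt layout recovers after 2.05 s. The flip side appears when every attempt hangs: 4 attempts × 2 s = 8 s of waiting, unless an outer 5 s deadline caps it.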

Key Takeaways

  • Only retry idempotent operations; use idempotency keys for non-idempotent ones
  • Exponential backoff + full jitter prevents retry storms
  • Tower's retry Policy trait lets you classify errors as transient or permanent and cap the number of retries
  • Layer ordering sets scope: a timeout inside Retry is per attempt, a timeout outside Retry is a total deadline

Tomorrow: rate limiting — preventing any single caller from overwhelming the service.