← Week 2: Tower Middleware Stack

Day 11: Rate Limiting

Phase 2 · Jun 20, 2026

Agenda (2–3 hours)

  • Read (45 min): tower::limit::rate source; the governor crate README; Cloudflare "How we rate limit" blog post
  • Study (45 min): Compare token bucket vs leaky bucket vs sliding window; when does each algorithm overshoot?
  • Practice (45 min): Add per-IP rate limiting to the echo server from Day 7 using a DashMap<IpAddr, Arc<RateLimiter>>
  • Challenge (30 min): Implement a distributed rate limiter backed by Redis: use INCRBY + EXPIRE for sliding window counting

Why Rate Limit?

Without rate limiting:

  • One buggy client can exhaust server resources
  • DDoS / abuse patterns consume capacity intended for legitimate users
  • Database connection pools and external API quotas can be exhausted

Rate limiting is a protection boundary — applied at the edge, per client, per API key, or globally.

Token Bucket

The most common algorithm:

  • Bucket starts with capacity tokens
  • Tokens refill at rate r tokens/second (up to capacity)
  • Each request consumes 1 token
  • No tokens → request is rejected or queued

Properties:

  • Allows short bursts up to capacity
  • Sustained rate capped at r
  • Fair only when keyed per client: give each client its own bucket so one client cannot drain another's tokens
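
The rules above can be sketched with the standard library alone. This is a minimal illustration, not how governor or tower implement it: the TokenBucket type, its field names, and the choice to pass the current time in as a parameter (which keeps the example deterministic) are all assumptions made for this sketch.

```rust
use std::time::Instant;

/// Hypothetical minimal token bucket, for illustration only.
struct TokenBucket {
    capacity: f64,        // maximum number of stored tokens (burst size)
    refill_per_sec: f64,  // refill rate r, in tokens per second
    tokens: f64,          // tokens currently available
    last_refill: Instant, // when `tokens` was last brought up to date
}

impl TokenBucket {
    /// The bucket starts full.
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        TokenBucket {
            capacity: f64::from(capacity),
            refill_per_sec,
            tokens: f64::from(capacity),
            last_refill: now,
        }
    }

    /// Refill for the time elapsed since the last call, then try to take one token.
    /// Returns `true` if the request is allowed, `false` if it should be rejected.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        // Refill, but never beyond capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        // Never move the refill timestamp backwards.
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 3 and a refill rate of 1 token/second, five back-to-back calls to try_acquire accept the first three and reject the rest; two seconds later, two more requests are accepted.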
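
use governor::{Quota, RateLimiter};
use std::num::NonZeroU32;
// 100 requests per second, which is also the maximum burst size.
let limiter = RateLimiter::direct(Quota::per_second(NonZeroU32::new(100).unwrap()));
let allowed = limiter.check().is_ok(); // check() returns Err(NotUntil) when over the limit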

Tower Rate Limit Layer

use std::time::Duration;
use tower::{limit::RateLimitLayer, ServiceBuilder};

let service = ServiceBuilder::new()
    .layer(RateLimitLayer::new(100, Duration::from_secs(1))) // 100 requests per second
    .service(my_service);

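tower::limit::RateLimitLayer allows up to the configured number of requests per interval and resets its counter when the interval elapses, which is effectively a fixed window. When the limit is reached it does not return an error: the service reports not-ready from poll_ready until the interval ends, so callers are delayed rather than rejected. The limit is also global to the wrapped service, not per client.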

For production use, tower-governor wraps the governor crate, which implements GCRA (smoother than a fixed-window counter, with no boundary burst) and keys limits per client, by peer IP address by default:

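use tower::ServiceBuilder;
use tower_governor::GovernorLayer;

// governor_config is built with tower_governor::governor::GovernorConfigBuilder.
// The layer's constructor has changed across tower-governor releases; check the docs for your version.
let service = ServiceBuilder::new()
    .layer(GovernorLayer::new(governor_config))
    .service(my_service);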

Sliding Window vs Fixed Window

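Fixed window: count requests in fixed time buckets (e.g., per minute) and reset the count at each boundary. A client can spend a full limit at the end of one window and another at the start of the next, so up to twice the limit can pass within a short span around the boundary.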
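
The boundary problem is easy to reproduce with a small counter. The sketch below is an illustration under assumptions: FixedWindow and the accepted helper are hypothetical names, and each new window is started at the first request that arrives after the previous one expired.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window counter: at most `limit` requests per `window`.
struct FixedWindow {
    limit: u32,
    window: Duration,
    window_start: Instant, // start of the current window
    count: u32,            // requests accepted in the current window
}

impl FixedWindow {
    fn new(limit: u32, window: Duration, now: Instant) -> Self {
        FixedWindow { limit, window, window_start: now, count: 0 }
    }

    fn allow(&mut self, now: Instant) -> bool {
        // Once the current window has expired, start a new one at `now`.
        if now.saturating_duration_since(self.window_start) >= self.window {
            self.window_start = now;
            self.count = 0;
        }
        if self.count < self.limit {
            self.count += 1;
            true
        } else {
            false
        }
    }
}

/// Count how many of `n` back-to-back requests at time `at` are accepted.
fn accepted(limiter: &mut FixedWindow, at: Instant, n: u32) -> u32 {
    (0..n).filter(|_| limiter.allow(at)).count() as u32
}
```

With a limit of 5 per 60 seconds, ten requests at second 59 and ten more at second 60 are accepted five at a time: ten requests pass within about one second, twice the nominal limit.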

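Sliding window (log): keep the timestamps of recent requests in a circular buffer and count those within the last N seconds. More accurate than a fixed window, but memory grows with the limit: one timestamp per allowed request, per client.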
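
A timestamp log of this kind might look like the following. It is a sketch only: SlidingWindowLog is a hypothetical type, and a VecDeque stands in for the circular buffer.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window log: at most `limit` requests in any `window`.
struct SlidingWindowLog {
    limit: usize,
    window: Duration,
    // Timestamps of accepted requests, oldest first; never more than `limit` entries.
    accepted_at: VecDeque<Instant>,
}

impl SlidingWindowLog {
    fn new(limit: usize, window: Duration) -> Self {
        SlidingWindowLog { limit, window, accepted_at: VecDeque::with_capacity(limit) }
    }

    fn allow(&mut self, now: Instant) -> bool {
        // Drop timestamps that have slid out of the window ending at `now`.
        while let Some(&oldest) = self.accepted_at.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.accepted_at.pop_front();
            } else {
                break;
            }
        }
        // Accept only if fewer than `limit` requests remain inside the window.
        if self.accepted_at.len() < self.limit {
            self.accepted_at.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With a limit of 5 per 60 seconds, five requests at second 59 are accepted and further requests at second 60 are rejected, because the earlier five are still inside the window. The cost is the stored log: up to `limit` timestamps per client.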

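GCRA (Generic Cell Rate Algorithm): a leaky-bucket formulation that stores a single timestamp per key, the theoretical arrival time of the next request, instead of a counter or a timestamp log. It enforces a steady rate with a configurable burst and has no window boundary to exploit. This is the algorithm the governor crate implements.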
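
The single-timestamp idea can be sketched as follows, using the virtual-scheduling form of GCRA. The Gcra type and its field names are hypothetical, and this is not governor's implementation, which among other things uses atomics instead of a mutable reference.

```rust
use std::time::{Duration, Instant};

/// Hypothetical GCRA limiter (virtual-scheduling form), for illustration only.
struct Gcra {
    emission_interval: Duration, // T: ideal spacing between requests (1 / rate)
    burst_tolerance: Duration,   // tau: how far ahead of schedule a client may run
    tat: Instant,                // theoretical arrival time of the next request
}

impl Gcra {
    /// `burst` is the number of back-to-back requests allowed; 0 is treated as 1.
    fn new(emission_interval: Duration, burst: u32, now: Instant) -> Self {
        Gcra {
            emission_interval,
            burst_tolerance: emission_interval * burst.saturating_sub(1),
            tat: now,
        }
    }

    fn allow(&mut self, now: Instant) -> bool {
        // Reject if the request arrives more than `tau` ahead of its scheduled time.
        if self.tat.saturating_duration_since(now) > self.burst_tolerance {
            return false;
        }
        // Accept and push the schedule forward by one emission interval.
        self.tat = self.tat.max(now) + self.emission_interval;
        true
    }
}
```

With a 1-second emission interval and a burst of 3, five simultaneous requests accept the first three; one second later exactly one more is accepted. The only state per client is `tat`, regardless of how large the limit is.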

Key Takeaways

  • Token bucket allows bursts up to capacity; GCRA prevents window-boundary abuse
  • Per-client rate limits require keying the limiter by client identity (IP, API key)
  • governor + tower-governor is the production-quality choice for Rust services
  • For distributed rate limiting, a Redis-backed counter (with Lua script for atomicity) is common

Tomorrow: load balancing with Tower — distributing requests across service replicas.