← Week 3: Testing & Deployment

Day 15: Load Testing

Phase 7 · Oct 7, 2026

Agenda (2–3 hours)

  • Implement (60 min): Configure ghz or write a custom Tokio load generator that submits tasks at a target rate; ramp from 50 to 500 tasks/second
  • Test (90 min): Run the load test against the staging environment; measure P50/P95/P99 latency, error rate, and DynamoDB WCU consumption
  • Review (30 min): Compare observed latency to the P99 < 100ms SLO; identify and fix the bottleneck

Load Generator

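use std::time::Duration;

use metrics::{counter, histogram};
use tokio::{task::JoinSet, time::Instant};

// `TaskQueueClient` and `SubmitTaskRequest` come from the project's generated gRPC code.

// Submit tasks at a fixed `rate` (req/s), paced by a Tokio interval.
// Open-loop: each request is spawned on its tick, so a slow response never delays later sends.
async fn load_test(client: TaskQueueClient, rate: u32, duration: Duration) {
    assert!(rate > 0, "rate must be at least 1 req/s"); // avoids a divide-by-zero below
    let mut ticker = tokio::time::interval(Duration::from_secs(1) / rate);
    let deadline = Instant::now() + duration;
    let mut in_flight = JoinSet::new();
    let mut i: u64 = 0;

    while Instant::now() < deadline {
        ticker.tick().await;
        // tonic-generated clients take `&mut self`; a clone shares the underlying channel.
        let mut client = client.clone();
        in_flight.spawn(async move {
            let start = Instant::now();
            let result = client.submit_task(SubmitTaskRequest {
                idempotency_key: format!("load-{i}"),
                task_type: "noop".to_string(),
                ..Default::default()
            }).await;
            // Record fractional milliseconds; `as_millis()` would truncate sub-millisecond detail.
            histogram!("load_test.latency_ms").record(start.elapsed().as_secs_f64() * 1000.0);
            if result.is_err() { counter!("load_test.errors").increment(1); }
        });
        i += 1;
    }

    // Wait for in-flight requests so the slowest responses are still recorded.
    while in_flight.join_next().await.is_some() {}
}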
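
The agenda's ramp from 50 to 500 tasks/second can be driven by running the generator in stages. The following is a minimal sketch that assumes linear steps; `ramp_schedule` and `tick_interval` are hypothetical helpers, not part of the course code.

```rust
use std::time::Duration;

/// Linearly spaced target rates from `start` to `end` (inclusive) over `steps` stages.
/// Hypothetical helper for illustration.
fn ramp_schedule(start: u32, end: u32, steps: u32) -> Vec<u32> {
    assert!(steps >= 2, "need at least a first and a last stage");
    assert!(end >= start, "ramp must not decrease");
    let span = (end - start) as u64;
    (0..steps)
        .map(|s| start + (span * s as u64 / (steps as u64 - 1)) as u32)
        .collect()
}

/// Time between sends for a given rate, the same arithmetic the ticker uses.
fn tick_interval(rate: u32) -> Duration {
    assert!(rate > 0, "rate must be at least 1 req/s");
    Duration::from_secs(1) / rate
}
```

With ten stages, `ramp_schedule(50, 500, 10)` yields 50, 100, ..., 500 req/s, and each stage could call `load_test(client.clone(), rate, stage_duration).await`. Note that `load_test` restarts its counter at 0 on every call, so repeated stages would reuse the same `load-{i}` idempotency keys; if the service deduplicates on that key, a per-stage prefix would be needed to keep later stages from measuring deduplicated requests.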

Expected Results at 500 req/s

Metric             Target      Actual (measure)
P50 latency        < 20 ms     TBD
P95 latency        < 50 ms     TBD
P99 latency        < 100 ms    TBD
Error rate         < 0.1%      TBD
DynamoDB WCU       < 2000/s    TBD
SQS queue depth    < 500       TBD

Bottleneck candidates: DynamoDB throttling, tonic connection limits, ECS CPU saturation.
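
Once the run finishes, the "Actual" column can be filled in from raw samples if they were collected alongside the metrics. The following is a sketch only: it assumes latencies were gathered into a `Vec<f64>` of milliseconds with one sample per request (failed requests included), and it uses the nearest-rank percentile definition. `Summary`, `summarize`, and `meets_targets` are hypothetical names.

```rust
/// Nearest-rank percentile over samples sorted ascending; `p` is a whole percent in 1..=100.
fn percentile(sorted_ms: &[f64], p: u32) -> Option<f64> {
    if sorted_ms.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let rank = (p as usize * sorted_ms.len() + 99) / 100; // ceil(p * n / 100)
    Some(sorted_ms[rank - 1])
}

/// Latency percentiles and error rate for one load-test run.
struct Summary {
    p50_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
    error_rate: f64,
}

/// Assumes one latency sample per request, so the sample count is the request count.
fn summarize(mut latencies_ms: Vec<f64>, errors: u64) -> Option<Summary> {
    latencies_ms.sort_by(|a, b| a.total_cmp(b));
    let total = latencies_ms.len() as f64;
    Some(Summary {
        p50_ms: percentile(&latencies_ms, 50)?,
        p95_ms: percentile(&latencies_ms, 95)?,
        p99_ms: percentile(&latencies_ms, 99)?,
        error_rate: errors as f64 / total,
    })
}

/// Compares a run against the latency and error-rate targets in the table above.
fn meets_targets(s: &Summary) -> bool {
    s.p50_ms < 20.0 && s.p95_ms < 50.0 && s.p99_ms < 100.0 && s.error_rate < 0.001
}
```

A metrics backend that already reports percentiles makes this unnecessary; the sketch mainly shows what each target is measured against.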

Profiling Bottlenecks

If P99 > 100ms:

  1. Check request_duration_seconds histogram by span — which span is slow?
  2. Check db.PutItem span duration — DynamoDB throttling?
  3. Check ECS task CPU utilization in CloudWatch — API service saturated?
  4. Check sqs.SendMessage duration — SQS slow?
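
The checks above amount to finding which span dominates a slow request. As a rough sketch, assuming the span durations of one slow trace have been exported as `(name, milliseconds)` pairs and that the spans run sequentially, the dominant span and its share of the total can be computed as follows. `slowest_span` is a hypothetical helper.

```rust
/// Returns the longest span in a trace and its fraction of the summed span time.
/// Returns None when there are no spans or the total duration is not positive.
fn slowest_span<'a>(spans: &[(&'a str, f64)]) -> Option<(&'a str, f64)> {
    let total: f64 = spans.iter().map(|(_, ms)| ms).sum();
    if total <= 0.0 {
        return None;
    }
    spans
        .iter()
        .copied()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, ms)| (name, ms / total))
}
```

For example, a trace with `db.PutItem` at 90 ms, `sqs.SendMessage` at 20 ms, and 10 ms elsewhere (illustrative numbers, not measurements) attributes 75% of the request to `db.PutItem`, which would point at step 2 first.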

If DynamoDB throttles:

  • Switch to on-demand capacity mode (auto-scales without provisioning)
  • Or increase provisioned WCU and enable auto-scaling
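
When sizing provisioned WCU, a quick estimate helps. One WCU covers one standard (non-transactional) write per second for an item up to 1 KB, and larger items round up to the next 1 KB. The sketch below applies that rule; the item sizes are assumptions, since the task record size is not given here, and writes to secondary indexes would add to the total.

```rust
/// WCU consumed by one standard write of an item of `item_bytes` (rounded up to 1 KB units).
fn wcu_per_write(item_bytes: u64) -> u64 {
    (item_bytes.max(1) + 1023) / 1024
}

/// WCU needed to sustain `writes_per_sec` standard writes of `item_bytes` each.
fn required_wcu(writes_per_sec: u64, item_bytes: u64) -> u64 {
    writes_per_sec * wcu_per_write(item_bytes)
}
```

At 500 writes/second, items up to 1 KB need 500 WCU, while 4 KB items need 2000 WCU, which already reaches the ceiling in the results table.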

Key Takeaways

  • Load test before declaring the SLO met — paper estimates are not measurements
  • The interval + spawn pattern in Tokio is a fixed-rate pacer rather than a true token bucket; because sends never wait on responses, it produces accurate open-loop load
  • P99 latency bottlenecks are rarely where you expect — trace data from the load test reveals the real culprit
  • DynamoDB on-demand capacity absorbs most load spikes without capacity planning, though very abrupt spikes can still be throttled; provisioned mode is usually cheaper at steady state

Tomorrow: failure injection — chaos engineering for the task queue.