← Week 3: Testing & Deployment

Day 21: Final Challenge — End-to-End System Review

Phase 7 · Oct 13, 2026


Final Challenge

Answer each question from memory. Then verify by re-reading the relevant day file.

Correctness

  1. A client submits a task and immediately submits it again with the same idempotency key. Trace the exact execution path for the second call.
  2. A worker claims a task but crashes 10 seconds later. What happens to the task? What is the state in DynamoDB after 30 seconds? After 35 seconds?
  3. Two workers race to claim the same task. Only one succeeds. Which DynamoDB operation makes this safe, and what error does the loser receive?
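To check an answer to question 3, it can help to model the claim as a single check-and-write step. The sketch below is an in-memory stand-in, not the course's implementation: `TaskTable`, `Status`, and `ClaimError` are hypothetical names, and the sketch assumes the claim is done with a DynamoDB conditional write, where a failed condition surfaces as `ConditionalCheckFailedException`.

```rust
use std::collections::HashMap;

/// Hypothetical task states for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Claimed { worker: String },
}

/// Named after DynamoDB's ConditionalCheckFailedException.
#[derive(Debug, PartialEq)]
enum ClaimError {
    ConditionalCheckFailed,
}

/// In-memory stand-in for the task table (assumption: one status per task id).
struct TaskTable {
    items: HashMap<String, Status>,
}

impl TaskTable {
    fn new() -> Self {
        Self { items: HashMap::new() }
    }

    fn put_pending(&mut self, task_id: &str) {
        self.items.insert(task_id.to_string(), Status::Pending);
    }

    /// The write applies only if the status is still Pending.
    /// The check and the write happen as one step, so two callers
    /// can never both observe Pending and both succeed.
    fn claim(&mut self, task_id: &str, worker: &str) -> Result<(), ClaimError> {
        if let Some(status) = self.items.get_mut(task_id) {
            if *status == Status::Pending {
                *status = Status::Claimed { worker: worker.to_string() };
                return Ok(());
            }
        }
        // A missing item or an already claimed item both fail the condition.
        Err(ClaimError::ConditionalCheckFailed)
    }
}

fn main() {
    let mut table = TaskTable::new();
    table.put_pending("task-1");

    // The first worker wins the race; the second fails the condition.
    println!("{:?}", table.claim("task-1", "worker-a"));
    println!("{:?}", table.claim("task-1", "worker-b"));
}
```

The loser should treat the failed condition as "someone else owns this task" and move on to the next message rather than retrying the same claim.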

Performance & Reliability

  1. The system is processing 500 tasks/second. DynamoDB starts throttling. List the exact sequence of events from throttle to recovery.
  2. The error budget is 43 minutes of downtime per month. The API service crashes for 2 minutes. What percentage of the monthly budget is consumed?
  3. A FastBudgetBurn alert fires at 2am. Walk through the investigation using only traces, metrics, and logs — no SSH access.
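The arithmetic behind question 2 can be verified with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the 99.9% availability target over a 30-day month is an assumption, chosen because it yields roughly the 43-minute budget quoted in the question.

```rust
/// Percentage of a downtime budget consumed by one outage.
fn budget_consumed_percent(budget_minutes: f64, outage_minutes: f64) -> f64 {
    outage_minutes / budget_minutes * 100.0
}

/// Downtime budget in minutes for an availability target over `days` days.
/// Assumption: the 43-minute figure comes from a 99.9% target over 30 days.
fn monthly_budget_minutes(slo: f64, days: f64) -> f64 {
    (1.0 - slo) * days * 24.0 * 60.0
}

fn main() {
    // A 2-minute outage against a 43-minute budget.
    let consumed = budget_consumed_percent(43.0, 2.0);
    println!("{consumed:.2}% of the monthly budget consumed");

    // Sanity check on where a budget of about 43 minutes comes from.
    let budget = monthly_budget_minutes(0.999, 30.0);
    println!("{budget:.1} minutes of budget per 30-day month at 99.9%");
}
```

A 2-minute outage therefore uses about 4.65% of the budget, which means roughly 21 such incidents would exhaust it within the month.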

Observability & Operations

  1. A customer reports a task submitted at 10:03:45 never completed. Using Jaeger, find the trace. Using Loki, find the log lines. What is the fastest path to the root cause?
  2. The CI pipeline fails at the cargo clippy step. What is the exact command to reproduce the failure locally?
  3. Describe the exact steps to roll back the API service to the previous image tag.

Course Complete

You have completed the 22-week distributed systems course.

What you built:

  • Consistent, idempotent distributed task queue
  • gRPC API with mTLS and streaming
  • Event-sourced state in DynamoDB (single-table design)
  • SQS FIFO worker pool with heartbeats, retries, and DLQ
  • ECS Fargate deployment with ALB, VPC endpoints, and IAM least privilege
  • Full OTel observability: traces → Tempo, metrics → Prometheus/Grafana, logs → Loki

What comes next: apply this at Amazon Leo Secure Comms.