Key Takeaways
- Every failure mode from the architecture review should map to at least one alert
- Runbooks written at design time (not post-incident) lead to faster incident resolution
- SLOs frame the observability plan: what to alert on is derived from what the SLO protects
- Trace context carried in SQS message attributes is what links worker-side spans to the trace of the originating request
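
As a rough illustration of the last takeaway, the sketch below shows one way trace context could travel through SQS message attributes. It assumes the W3C `traceparent` header format and the `MessageAttributes` shape that boto3 uses for `send_message` and `receive_message`. The helper names (`new_traceparent`, `inject`, `extract`) are hypothetical and not part of any library, and a real service would more likely rely on its tracing SDK's propagator. The producer would pass the result of `inject(...)` as `MessageAttributes` when sending, and the worker would call `extract(...)` on each received message before starting its span.

```python
import re
import secrets

# W3C traceparent, version 00: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(trace_id=None, sampled=True):
    """Build a traceparent value for the producer's current span (hypothetical helper)."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def inject(traceparent, attributes=None):
    """Return a copy of the SQS message attributes with the trace context added."""
    attrs = dict(attributes or {})
    attrs["traceparent"] = {"DataType": "String", "StringValue": traceparent}
    return attrs


def extract(message):
    """Read the trace context from a received message, or return None if absent or malformed."""
    attr = message.get("MessageAttributes", {}).get("traceparent")
    if not attr:
        return None
    match = TRACEPARENT_RE.match(attr.get("StringValue", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,  # the worker span uses this as its parent
        "sampled": (int(flags, 16) & 1) == 1,
    }
```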

Tomorrow: deployment architecture, covering ECS services, VPC, ALB, and the CI/CD pipeline.