← Week 3: Log Aggregation & Analysis

Day 21: Challenge — Unified Observability Stack

Phase 6 · Sep 22, 2026

Challenge Overview

Instrument the full task management system with all three observability signals:

  • Traces: OTel SDK → OTel Collector → Tempo; correlate across API Gateway + Task Service + Worker
  • Metrics: metrics crate → Prometheus → Grafana; four golden signals + SLO dashboard
  • Logs: tracing-subscriber JSON → FluentBit → CloudWatch Logs + Loki

Every signal must include trace_id and service fields for cross-signal correlation.

Instrumentation Checklist

Traces:

  • [ ] tracing-opentelemetry + OTLP exporter on all three services
  • [ ] W3C traceparent propagated over HTTP and SQS message attributes
  • [ ] Semantic convention attributes on all spans (http, db, messaging)
  • [ ] Tail-based sampling: 100% errors + 5% success
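
To make the propagation item concrete, here is a minimal std-only sketch of parsing and re-serializing a W3C traceparent value, for example when it is read from an SQS message attribute in the Worker. In the real services a propagator from the OpenTelemetry SDK would normally do this; the TraceParent type and its methods are hypothetical names used only for illustration, and only version 00 is handled.

```rust
/// Parsed W3C `traceparent` value (hypothetical helper type; version 00 only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String, // 32 lowercase hex characters
    pub span_id: String,  // 16 lowercase hex characters (the "parent-id" field)
    pub sampled: bool,    // bit 0 of the trace-flags field
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True if every character is '0' (all-zero IDs are invalid per the spec).
fn is_all_zero(s: &str) -> bool {
    s.bytes().all(|b| b == b'0')
}

impl TraceParent {
    /// Parse a header or message-attribute value; returns None if malformed.
    pub fn parse(value: &str) -> Option<TraceParent> {
        let parts: Vec<&str> = value.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        if is_all_zero(trace_id) || is_all_zero(span_id) {
            return None;
        }
        let flag_bits = u8::from_str_radix(flags, 16).ok()?;
        Some(TraceParent {
            trace_id: trace_id.to_string(),
            span_id: span_id.to_string(),
            sampled: flag_bits & 0x01 == 0x01,
        })
    }

    /// Serialize back to the wire format. Only the sampled bit is kept;
    /// any other flag bits from the original value are dropped in this sketch.
    pub fn to_header(&self) -> String {
        format!("00-{}-{}-{:02x}", self.trace_id, self.span_id, u8::from(self.sampled))
    }
}
```

A consumer would call TraceParent::parse on the attribute value and, if it returns Some, attach the trace_id to the span it opens for the message; a None result means the message should start a new trace rather than fail.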

Metrics:

  • [ ] metrics crate with Prometheus exporter on /metrics
  • [ ] Four golden signals: latency (P95), traffic, errors, saturation (queue depth)
  • [ ] SLO recording rules (99.9% availability target)
  • [ ] Fast burn + slow burn alert rules
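
The fast burn and slow burn rules can be reasoned about with a small sketch of the underlying condition. It assumes the multi-window thresholds popularized by the Google SRE Workbook (burn rate above 14.4 over 1 hour and 5 minutes for fast burn, above 6 over 6 hours and 30 minutes for slow burn); the course's actual rules may use different windows or thresholds. The type and function names are hypothetical, and in the real stack this logic lives in Prometheus alert rules rather than in Rust.

```rust
/// Burn rate = observed error ratio / error ratio allowed by the SLO.
/// A burn rate of 1.0 spends exactly the whole budget over the SLO window.
pub fn burn_rate(error_ratio: f64, slo_target: f64) -> f64 {
    error_ratio / (1.0 - slo_target)
}

#[derive(Debug, PartialEq)]
pub enum BurnAlert {
    Fast,
    Slow,
    Quiet,
}

/// Error ratios (failed / total requests) measured over four lookback windows.
pub struct WindowRatios {
    pub m5: f64,
    pub m30: f64,
    pub h1: f64,
    pub h6: f64,
}

/// Evaluate the assumed multi-window burn-rate conditions.
/// Each alert needs BOTH its long and short window above the threshold,
/// so a brief blip or an already-recovered incident does not page.
pub fn evaluate(w: &WindowRatios, slo_target: f64) -> BurnAlert {
    const FAST_THRESHOLD: f64 = 14.4; // assumption: 2% of a 30-day budget in 1 hour
    const SLOW_THRESHOLD: f64 = 6.0; // assumption: 5% of a 30-day budget in 6 hours

    let br = |ratio: f64| burn_rate(ratio, slo_target);

    if br(w.h1) > FAST_THRESHOLD && br(w.m5) > FAST_THRESHOLD {
        BurnAlert::Fast
    } else if br(w.h6) > SLOW_THRESHOLD && br(w.m30) > SLOW_THRESHOLD {
        BurnAlert::Slow
    } else {
        BurnAlert::Quiet
    }
}
```

With the 99.9% target, a sustained 2% error ratio gives a burn rate of about 20 and trips the fast condition, while a sustained 0.7% error ratio (burn rate about 7) trips only the slow one.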

Logs:

  • [ ] tracing-subscriber with JSON format and with_current_span(true)
  • [ ] trace_id present in every log event
  • [ ] FluentBit sidecar routing to CloudWatch Logs and Loki
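
As a rough illustration of the target log shape, the sketch below builds one JSON log line carrying the two correlation fields. In the services themselves tracing-subscriber's JSON formatter produces the output; the flat field layout and the log_line helper here are assumptions made only to show what a downstream query on trace_id and service would match.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a single-line JSON log event with the correlation fields
/// (hypothetical helper; field names and order are an assumed layout).
pub fn log_line(level: &str, service: &str, trace_id: &str, message: &str) -> String {
    format!(
        "{{\"level\":\"{}\",\"service\":\"{}\",\"trace_id\":\"{}\",\"message\":\"{}\"}}",
        json_escape(level),
        json_escape(service),
        json_escape(trace_id),
        json_escape(message)
    )
}
```

Keeping each event on one line matters for the sidecar: a line-oriented tail input treats every newline as a record boundary, which is why the message text is escaped rather than written raw.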

Verification Scenarios

  1. Slow trace investigation: trigger a slow DynamoDB query; find it via metric spike → exemplar → Tempo trace → log lines
  2. Error investigation: return HTTP 500 for 6 minutes; verify that the HighErrorRate alert fires; find the trace containing the error span; read the correlated error log
  3. SLO burn rate: consume 15% of the 30-day error budget in 1 hour; verify FastBudgetBurn alert fires
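
For scenario 3 it helps to know how many errors to inject. The arithmetic below is a sketch that assumes the 99.9% availability target from the checklist and a 30-day (720-hour) SLO window; the function names are hypothetical.

```rust
/// Burn rate needed to consume `budget_fraction` of the error budget
/// in `hours`, for an SLO window of `window_days` days.
pub fn required_burn_rate(budget_fraction: f64, hours: f64, window_days: f64) -> f64 {
    budget_fraction * (window_days * 24.0) / hours
}

/// Error ratio (failed / total requests) that sustains a given burn rate.
pub fn required_error_ratio(burn_rate: f64, slo_target: f64) -> f64 {
    burn_rate * (1.0 - slo_target)
}

/// Inverse check: fraction of the budget consumed by holding `error_ratio`
/// for `hours` within a `window_days`-day SLO window.
pub fn budget_consumed(error_ratio: f64, slo_target: f64, hours: f64, window_days: f64) -> f64 {
    (error_ratio / (1.0 - slo_target)) * hours / (window_days * 24.0)
}
```

Calling required_burn_rate(0.15, 1.0, 30.0) gives a burn rate of about 108, and required_error_ratio(108.0, 0.999) gives about 0.108, so roughly 10.8% of requests must fail for the full hour to consume 15% of the budget.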

Phase 6 Recap

Week   | Topic               | Key Insight
Week 1 | Distributed Tracing | OTel SDK → Collector → Tempo; W3C propagation
Week 2 | Metrics & Alerting  | Prometheus + PromQL + SLO burn rate alerts
Week 3 | Log Aggregation     | Structured JSON + FluentBit + cross-signal correlation

Next phase: Integration Project — design, implement, and deploy a production-grade distributed system using all skills from Phases 1–6.