← Week 1: Distributed Tracing

Day 7: Challenge — Distributed Tracing Across 3 Services

Phase 6 · Sep 8, 2026

Challenge Overview

Add end-to-end distributed tracing to a system with:

  • API Gateway (Axum) — receives HTTP requests, calls Task Service via gRPC
  • Task Service (tonic) — handles task CRUD, writes to DynamoDB, publishes to SQS
  • Worker (Tokio) — consumes SQS messages, updates task status

Trace a CreateTask request from API Gateway through to the SQS worker.

Instrumentation Checklist

API Gateway:

  • [ ] OtelAxumLayer on all routes
  • [ ] gRPC client propagates traceparent in metadata headers
  • [ ] Spans: http.request, grpc.client.CreateTask
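
The propagation item above comes down to one header value. Below is a std-only sketch of what ends up on the wire. It assumes a hypothetical `SpanCtx` type and uses a `HashMap` as a stand-in for tonic's request metadata. In a real gateway, the opentelemetry crate's `TraceContextPropagator` (in `opentelemetry_sdk::propagation`) writes this header for you, through a small wrapper you provide that implements `opentelemetry::propagation::Injector` for tonic's `MetadataMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an OpenTelemetry span context.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SpanCtx {
    trace_id: u128, // 16 bytes -> 32 lowercase hex digits
    span_id: u64,   // 8 bytes -> 16 lowercase hex digits
    sampled: bool,  // bit 0 of trace-flags
}

/// Render a W3C `traceparent` value: version-traceid-parentid-flags.
fn to_traceparent(ctx: &SpanCtx) -> String {
    let flags: u8 = if ctx.sampled { 0x01 } else { 0x00 };
    format!("00-{:032x}-{:016x}-{:02x}", ctx.trace_id, ctx.span_id, flags)
}

/// Write the context into outgoing request metadata (a HashMap stands in
/// for tonic's MetadataMap; gRPC metadata keys are lowercase).
fn inject(ctx: &SpanCtx, metadata: &mut HashMap<String, String>) {
    metadata.insert("traceparent".to_string(), to_traceparent(ctx));
}

fn main() {
    // The gateway's grpc.client.CreateTask span: same trace ID as the
    // incoming POST /tasks span, but its own span ID.
    let client_span = SpanCtx {
        trace_id: 0x4bf9_2f35_77b3_4da6_a3ce_929d_0e0e_4736,
        span_id: 0x00f0_67aa_0ba9_02b7,
        sampled: true,
    };
    let mut metadata = HashMap::new();
    inject(&client_span, &mut metadata);
    // prints 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    println!("{}", metadata["traceparent"]);
}
```

The value has four parts: version `00`, a 32-hex-digit trace ID, the 16-hex-digit ID of the calling span, and flags, where `01` means sampled. Because the ID sent is the client span's, the Task Service's server span becomes a child of grpc.client.CreateTask.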

Task Service:

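  • [ ] Server-side tonic interceptor (or tower layer) extracts traceparent from gRPC metadata and sets it as the parent of the request span (tracing-opentelemetry's set_parent)
  • [ ] DynamoDB spans with db.system = "dynamodb", db.operation (e.g. PutItem)
  • [ ] SQS publish span (SpanKind::Producer) with messaging.system = "aws_sqs", messaging.destination.name
  • [ ] Inject the publish span's trace context into the SQS message attributes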
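
On the Task Service side, the header is read back, a child span continues the trace, and the SQS producer span's context goes into the message. The std-only sketch below reuses the hypothetical `SpanCtx` stand-in and passes span IDs in explicitly where an SDK would generate random ones. It models both gRPC metadata and SQS message attributes as plain `HashMap<String, String>`; real SQS attributes also carry a data type such as `String`. It uses `messaging.destination.name`, the current semantic-convention attribute for the queue name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an OpenTelemetry span context.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SpanCtx {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

/// W3C Trace Context requires lowercase hex digits.
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse a version-00 `traceparent`; returns None for anything malformed.
fn parse_traceparent(value: &str) -> Option<SpanCtx> {
    let parts: Vec<&str> = value.split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace, span, flags) = (parts[1], parts[2], parts[3]);
    if trace.len() != 32 || span.len() != 16 || flags.len() != 2 {
        return None;
    }
    if !(is_lower_hex(trace) && is_lower_hex(span) && is_lower_hex(flags)) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace, 16).ok()?;
    let span_id = u64::from_str_radix(span, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    // All-zero trace or span IDs are invalid.
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    Some(SpanCtx { trace_id, span_id, sampled: flags & 0x01 == 0x01 })
}

fn to_traceparent(ctx: &SpanCtx) -> String {
    let flags: u8 = if ctx.sampled { 0x01 } else { 0x00 };
    format!("00-{:032x}-{:016x}-{:02x}", ctx.trace_id, ctx.span_id, flags)
}

/// A child keeps the trace ID and sampling flag; only the span ID changes.
/// (An SDK would generate the new span ID randomly; here it is passed in.)
fn child_of(parent: &SpanCtx, new_span_id: u64) -> SpanCtx {
    SpanCtx { span_id: new_span_id, ..*parent }
}

fn main() {
    // 1. Extract the gateway's context from incoming gRPC metadata.
    let mut grpc_metadata = HashMap::new();
    grpc_metadata.insert(
        "traceparent".to_string(),
        "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".to_string(),
    );
    let remote = grpc_metadata
        .get("traceparent")
        .and_then(|v| parse_traceparent(v))
        .expect("gateway sent a valid traceparent");

    // 2. Server span for CreateTask, then the SQS SendMessage producer span under it.
    let server_span = child_of(&remote, 0x1111_1111_1111_1111);
    let publish_span = child_of(&server_span, 0x2222_2222_2222_2222);
    let publish_attrs = [
        ("messaging.system", "aws_sqs"),
        ("messaging.destination.name", "task-events"),
    ];

    // 3. Inject the producer span's context into the SQS message attributes.
    let mut message_attributes: HashMap<String, String> = HashMap::new();
    message_attributes.insert("traceparent".to_string(), to_traceparent(&publish_span));

    println!("publish span attributes: {:?}", publish_attrs);
    println!("traceparent sent to SQS: {}", message_attributes["traceparent"]);
}
```

The parser rejects malformed input instead of guessing. A missing or invalid header then simply means the server span starts a new trace, rather than attaching itself to a wrong one.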

Worker:

  • [ ] Extract trace context from SQS message attributes
  • [ ] Start the processing span with SpanKind::Consumer, parented to the extracted context (or linked to it, if each message starts its own trace)
  • [ ] DynamoDB update span
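
For the worker, the key choice is whether the consumer span is a child of the extracted context or only links to it. This std-only sketch uses hypothetical `Span` and `SpanKind` types, not the opentelemetry crate's, and builds both shapes from the same `traceparent` attribute. A child keeps the producer's trace ID, so it nests under SendMessage in Jaeger. A linked span starts a new trace and records the producer context as a link. With tracing-opentelemetry, these roughly correspond to `set_parent` and `add_link` on the span, and the kind is typically set through the `otel.kind = "consumer"` span field.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct SpanCtx {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

/// Hypothetical span kinds (only the two messaging ones are modelled).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum SpanKind {
    Producer,
    Consumer,
}

/// Hypothetical, simplified span record.
#[derive(Debug)]
struct Span {
    name: &'static str,
    kind: SpanKind,
    ctx: SpanCtx,
    parent_span_id: Option<u64>,
    links: Vec<SpanCtx>,
}

/// Simplified version-00 parser (length and zero-ID checks only).
fn parse_traceparent(value: &str) -> Option<SpanCtx> {
    let p: Vec<&str> = value.split('-').collect();
    if p.len() != 4 || p[0] != "00" || p[1].len() != 32 || p[2].len() != 16 || p[3].len() != 2 {
        return None;
    }
    let ctx = SpanCtx {
        trace_id: u128::from_str_radix(p[1], 16).ok()?,
        span_id: u64::from_str_radix(p[2], 16).ok()?,
        sampled: (u8::from_str_radix(p[3], 16).ok()? & 0x01) == 0x01,
    };
    (ctx.trace_id != 0 && ctx.span_id != 0).then_some(ctx)
}

/// Option A: continue the producer's trace, so the span nests under SendMessage.
fn consumer_child(name: &'static str, producer: &SpanCtx, new_span_id: u64) -> Span {
    Span {
        name,
        kind: SpanKind::Consumer,
        ctx: SpanCtx { span_id: new_span_id, ..*producer },
        parent_span_id: Some(producer.span_id),
        links: Vec::new(),
    }
}

/// Option B: start a new trace and keep the producer context as a link.
/// (The sampling flag is simply inherited here to keep the sketch short.)
fn consumer_linked(name: &'static str, producer: &SpanCtx, new_trace_id: u128, new_span_id: u64) -> Span {
    Span {
        name,
        kind: SpanKind::Consumer,
        ctx: SpanCtx { trace_id: new_trace_id, span_id: new_span_id, sampled: producer.sampled },
        parent_span_id: None,
        links: vec![*producer],
    }
}

fn main() {
    // Message attributes as received from SQS (simplified to strings).
    let mut attributes = HashMap::new();
    attributes.insert(
        "traceparent".to_string(),
        "00-4bf92f3577b34da6a3ce929d0e0e4736-2222222222222222-01".to_string(),
    );
    let Some(producer) = attributes.get("traceparent").and_then(|v| parse_traceparent(v)) else {
        println!("no usable trace context; the worker span becomes a new root");
        return;
    };
    let child = consumer_child("process task-created", &producer, 0x3333_3333_3333_3333);
    // The new trace ID would be random in practice; a fixed value keeps the demo readable.
    let linked = consumer_linked("process task-created", &producer, 0x9c1d_7e42, 0x4444_4444_4444_4444);
    println!("{child:#?}\n{linked:#?}");
}
```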

Verification

In Jaeger, search for the CreateTask trace:

Service: api-gateway
Operation: POST /tasks
Duration: > 0ms

Expected trace tree:

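api-gateway: POST /tasks (root)
  api-gateway: grpc.client.CreateTask
    task-service: /tasks.TaskService/CreateTask
      db: PutItem (tasks table)
      sqs: SendMessage (task-events)
        worker: process task-created          ← consumer span, child of the extracted SQS context
          db: UpdateItem (tasks table)

If the worker instead starts a new trace and only links to the producer span, its spans land in a separate trace (connected to this one by the span link) instead of nesting under SendMessage.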
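
Jaeger assembles this tree purely from parent span IDs within one trace ID. The std-only sketch below uses a hypothetical `SpanRecord` type and also includes the gateway's grpc.client.CreateTask span from the checklist. It groups spans by parent and prints them depth-first. It also shows why propagation bugs are easy to spot here: a span whose parent is not in the set is skipped by this renderer. A service that never receives traceparent at all starts a separate trace and never appears under POST /tasks.

```rust
use std::collections::HashMap;

/// A finished span as a trace backend might store it (hypothetical, simplified).
struct SpanRecord {
    span_id: u64,
    parent_span_id: Option<u64>,
    label: &'static str,
}

/// Group spans by parent ID, then print depth-first from the root (parent None).
fn render_tree(spans: &[SpanRecord]) -> String {
    let mut children: HashMap<Option<u64>, Vec<&SpanRecord>> = HashMap::new();
    for span in spans {
        children.entry(span.parent_span_id).or_default().push(span);
    }

    fn walk(
        parent: Option<u64>,
        depth: usize,
        children: &HashMap<Option<u64>, Vec<&SpanRecord>>,
        out: &mut String,
    ) {
        // Siblings are visited in the order they were recorded.
        for span in children.get(&parent).into_iter().flatten() {
            out.push_str(&"  ".repeat(depth));
            out.push_str(span.label);
            out.push('\n');
            walk(Some(span.span_id), depth + 1, children, out);
        }
    }

    let mut out = String::new();
    walk(None, 0, &children, &mut out);
    out
}

fn main() {
    // Spans from the CreateTask flow. Real span IDs are random 64-bit values;
    // small numbers keep the example readable.
    let rows: [(u64, Option<u64>, &'static str); 7] = [
        (1, None, "api-gateway: POST /tasks"),
        (2, Some(1), "api-gateway: grpc.client.CreateTask"),
        (3, Some(2), "task-service: /tasks.TaskService/CreateTask"),
        (4, Some(3), "db: PutItem (tasks table)"),
        (5, Some(3), "sqs: SendMessage (task-events)"),
        (6, Some(5), "worker: process task-created"),
        (7, Some(6), "db: UpdateItem (tasks table)"),
    ];
    let spans: Vec<SpanRecord> = rows
        .iter()
        .map(|&(span_id, parent_span_id, label)| SpanRecord { span_id, parent_span_id, label })
        .collect();
    print!("{}", render_tree(&spans));
}
```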

Week 1 Recap

Topic                 Key Insight
Trace anatomy         Trace ID + span tree; context travels in the traceparent header
OTel SDK              Batch span processor → OTLP exporter → Collector → backend
Jaeger / Tempo        Jaeger indexes spans for search; Tempo keeps traces in object storage with minimal indexing
Sampling              Tail-based to capture errors; head-based to control cost
Context propagation   W3C Trace Context headers; manual inject/extract for queues such as SQS
Instrumentation       Semantic conventions; keep PII out of spans with #[instrument(skip(...))]
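
The sampling trade-off can be made concrete with a small std-only sketch. `head_sample` derives the decision from the trace ID alone, so every service seeing the same ID agrees. This is similar in spirit to `Sampler::TraceIdRatioBased` in opentelemetry_sdk, though the SDK's exact bit selection may differ. `tail_sample` runs after a whole trace is buffered and keeps any trace with an error or a slow span. The `FinishedSpan` type, the 500 ms threshold, and the splitmix64 ID generator are assumptions for illustration. In production, tail-based sampling is usually done by the Collector's `tail_sampling` processor rather than in application code.

```rust
/// Head-based: decide from the trace ID alone, so every service that sees
/// the same trace ID reaches the same decision.
fn head_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true;
    }
    if ratio <= 0.0 {
        return false;
    }
    // Treat the low 64 bits of the (random) trace ID as a uniform number.
    let low = trace_id as u64;
    let threshold = (ratio * u64::MAX as f64) as u64;
    low < threshold
}

/// Hypothetical summary of a finished span, as a tail sampler would see it.
struct FinishedSpan {
    is_error: bool,
    duration_ms: u64,
}

/// Tail-based: decide after the whole trace is buffered. Traces with an
/// error or a slow span are always kept; the rest fall back to the ratio.
fn tail_sample(trace_id: u128, spans: &[FinishedSpan], slow_ms: u64, ratio: f64) -> bool {
    spans.iter().any(|s| s.is_error || s.duration_ms > slow_ms) || head_sample(trace_id, ratio)
}

/// splitmix64, used here only to produce well-spread example trace IDs.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn main() {
    let mut state = 42;
    let ids: Vec<u128> = (0..10_000)
        .map(|_| ((splitmix64(&mut state) as u128) << 64) | splitmix64(&mut state) as u128)
        .collect();
    let head_kept = ids.iter().filter(|&&id| head_sample(id, 0.1)).count();
    println!("head-based at 10%: kept {head_kept} of {}", ids.len());

    // A trace with one failed DynamoDB call is kept even when the ratio is 0.
    let failed = [
        FinishedSpan { is_error: false, duration_ms: 40 },
        FinishedSpan { is_error: true, duration_ms: 12 },
    ];
    println!("tail-based keeps failed trace: {}", tail_sample(ids[0], &failed, 500, 0.0));
}
```

Head-based sampling is cheap because unsampled spans are never exported. Only tail-based sampling can guarantee that the rare failing CreateTask trace survives, at the cost of buffering every trace until it completes.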

Next week: Metrics & Alerting — Prometheus, PromQL, Grafana, and SLOs.