← Week 1: Distributed Tracing

Day 1: Observability Foundations & Distributed Tracing Concepts

Phase 6 · Sep 2, 2026

Agenda (2–3 hours)

  • Read (45 min): OpenTelemetry specification — traces, metrics, logs, baggage; Dapper (Google) paper §1–3
  • Study (45 min): What problem does distributed tracing solve that logs alone cannot? Draw a trace for a 3-service request chain
  • Practice (45 min): Instrument a simple Axum service with the tracing crate; emit spans for each handler and a database call
  • Challenge (30 min): A request spans 5 services; end-to-end P99 latency is 500 ms, but the root service measures only 50 ms of its own local work. Where is the time going, and how does tracing help?

The Three Pillars

Signal   | Answers                                        | Storage
Traces   | Which services were called? Where is latency?  | Jaeger, Tempo, X-Ray
Metrics  | How is the system performing right now?        | Prometheus, CloudWatch
Logs     | What happened in detail?                       | CloudWatch Logs, OpenSearch

Observability = being able to answer "why is this slow/broken?" from external outputs alone — without SSH access or code changes.

Trace Anatomy

Trace ID: abc123
├── Span: api-gateway (0ms → 200ms)      ← root span
│   ├── Span: auth-service (2ms → 15ms)
│   └── Span: task-service (20ms → 190ms)
│       ├── Span: db:query (25ms → 85ms)  ← hot spot
│       └── Span: cache:get (90ms → 92ms)
  • Trace: entire request journey across services
  • Span: single unit of work with start time, duration, attributes, events
  • Trace ID: shared by every span in the trace; propagated between services in HTTP headers (the W3C Trace Context traceparent header)
  • Parent span ID: links child spans to their parent
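To make the propagation step concrete, here is a minimal sketch of parsing a version 00 traceparent header value, whose layout is version-traceid-parentid-flags in lowercase hex. The TraceParent struct and the parse_traceparent function are hypothetical names for illustration, not part of any library; a real service would normally rely on an OpenTelemetry propagator rather than hand-written parsing.

```rust
/// Fields carried by a version 00 W3C `traceparent` header value.
/// (Hypothetical struct for illustration.)
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,  // 32 lowercase hex characters, shared by the whole trace
    parent_id: String, // 16 lowercase hex characters, the caller's span ID
    sampled: bool,     // lowest bit of the trace-flags byte
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses `version-traceid-parentid-flags`; this sketch accepts only version 00.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // Version 00 has exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }

    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}

fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    match parse_traceparent(header) {
        Some(tp) => println!("{:?}", tp),
        None => println!("invalid traceparent"),
    }
}
```

A downstream service would reuse trace_id unchanged and record parent_id as the parent of the span it creates, which is what links its work into the caller's trace.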

tracing Crate Basics

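use tracing::{info, instrument, span, Instrument, Level};

// Db, Task, and Error are application-defined types.
// Skip both arguments, then record user_id explicitly using Display (%).
#[instrument(skip(db, user_id), fields(user_id = %user_id))]
async fn get_tasks(user_id: &str, db: &Db) -> Result<Vec<Task>, Error> {
    info!("fetching tasks");
    let span = span!(Level::DEBUG, "db.query", table = "tasks");
    // Attach the span to the future with .instrument(); holding a
    // span.enter() guard across an .await produces incorrect traces.
    db.query_tasks(user_id).instrument(span).await
}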

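#[instrument] creates a span named after the function, covering each call, and records the function's arguments as fields by default.
skip(...) leaves the listed arguments out of the span's fields.
fields(...) attaches structured key-value data to the span.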

Key Takeaways

  • Distributed tracing correlates work across service boundaries via a shared trace ID
  • Spans form a tree; parent-child links reconstruct the full call graph
  • The tracing crate is the de facto standard in Rust for emitting structured, level-filtered spans and events
  • Observability requires all three signals: traces for path, metrics for volume, logs for detail
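As a rough illustration of the second takeaway, the sketch below rebuilds the tree from the Trace Anatomy example out of a flat, unordered list of spans, using only each span's parent ID. The Span struct, the numeric span IDs, and the render_tree and sample_spans functions are all hypothetical; real tracing backends store more fields and use 16-hex-character span IDs.

```rust
use std::collections::HashMap;

/// A finished span as a backend might receive it (simplified, hypothetical).
struct Span {
    id: u32,
    parent: Option<u32>, // None marks the root span
    name: &'static str,
    start_ms: u64,
    end_ms: u64,
}

/// Spans from the Trace Anatomy example, deliberately out of order,
/// as they might arrive from different services.
fn sample_spans() -> Vec<Span> {
    vec![
        Span { id: 5, parent: Some(3), name: "cache:get", start_ms: 90, end_ms: 92 },
        Span { id: 3, parent: Some(1), name: "task-service", start_ms: 20, end_ms: 190 },
        Span { id: 1, parent: None, name: "api-gateway", start_ms: 0, end_ms: 200 },
        Span { id: 4, parent: Some(3), name: "db:query", start_ms: 25, end_ms: 85 },
        Span { id: 2, parent: Some(1), name: "auth-service", start_ms: 2, end_ms: 15 },
    ]
}

/// Appends `span` and its descendants to `out`, indented by depth.
fn walk(
    span: &Span,
    depth: usize,
    children: &HashMap<Option<u32>, Vec<&Span>>,
    out: &mut Vec<String>,
) {
    out.push(format!(
        "{}{} ({}ms -> {}ms)",
        "  ".repeat(depth),
        span.name,
        span.start_ms,
        span.end_ms
    ));
    if let Some(kids) = children.get(&Some(span.id)) {
        for kid in kids {
            walk(kid, depth + 1, children, out);
        }
    }
}

/// Groups spans by parent ID, then walks the tree from the root(s).
fn render_tree(spans: &[Span]) -> Vec<String> {
    let mut children: HashMap<Option<u32>, Vec<&Span>> = HashMap::new();
    for span in spans {
        children.entry(span.parent).or_default().push(span);
    }
    // Order siblings by start time so the output reads chronologically.
    for siblings in children.values_mut() {
        siblings.sort_by_key(|s| s.start_ms);
    }
    let mut out = Vec::new();
    if let Some(roots) = children.get(&None) {
        for root in roots {
            walk(root, 0, &children, &mut out);
        }
    }
    out
}

fn main() {
    for line in render_tree(&sample_spans()) {
        println!("{}", line);
    }
}
```

Grouping by parent ID first keeps the reconstruction independent of arrival order, which matters because each service reports its spans separately.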

Tomorrow: the OpenTelemetry Rust SDK — connecting tracing spans to OTLP exporters.