← Week 2: Metrics & Alerting

Day 8: Prometheus Data Model

Phase 6 · Sep 9, 2026

Agenda (2–3 hours)

  • Read (45 min): Prometheus data model documentation; metric types — Counter, Gauge, Histogram, Summary; metrics crate documentation
  • Study (45 min): Why does Prometheus use a pull model? What are the trade-offs vs push (StatsD, CloudWatch)?
  • Practice (45 min): Add Prometheus metrics to an Axum service using the metrics crate; expose /metrics endpoint; scrape with a local Prometheus instance
  • Challenge (30 min): A histogram bucket boundary is wrong — le="0.1" is too coarse for a service with P99=50ms. Re-design the bucket boundaries for an HTTP service with P50=5ms, P95=50ms, P99=200ms

Prometheus Metric Types

Type       | Use case                              | Example
Counter    | Monotonically increasing count        | http_requests_total
Gauge      | Point-in-time value                   | queue_depth, memory_bytes
Histogram  | Distribution of values                | request_duration_seconds
Summary    | Pre-computed quantiles (client-side)  | Rarely used with Prometheus

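Prefer Histogram over Summary for multi-instance services: PromQL computes quantiles from histogram buckets server-side (histogram_quantile), and bucket counts can be summed across instances before the quantile is taken. A summary's pre-computed quantiles cannot be meaningfully aggregated across instances.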
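
To make the aggregation argument concrete, here is a minimal sketch of summing cumulative bucket counts from several instances and then estimating a quantile by linear interpolation inside the bucket that contains the target rank. It approximates the idea behind histogram_quantile and is not Prometheus's actual implementation; the names Bucket, merge, and estimate_quantile are hypothetical.

```rust
/// One cumulative histogram bucket: the number of observations <= `le` seconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bucket {
    pub le: f64,    // upper bound in seconds; f64::INFINITY for the +Inf bucket
    pub count: u64, // cumulative count
}

/// Sum the cumulative counts of two instances.
/// Returns None unless both use identical bucket boundaries.
pub fn merge(a: &[Bucket], b: &[Bucket]) -> Option<Vec<Bucket>> {
    if a.len() != b.len() {
        return None;
    }
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            if x.le == y.le {
                Some(Bucket { le: x.le, count: x.count + y.count })
            } else {
                None
            }
        })
        .collect()
}

/// Estimate quantile `q` (0.0..=1.0) from cumulative buckets sorted by `le`.
/// Observations are assumed to be spread evenly inside each bucket.
pub fn estimate_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    let total = buckets.last()?.count;
    if total == 0 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let rank = q * total as f64;
    let mut lower_bound = 0.0;
    let mut lower_count = 0u64;
    for b in buckets {
        if b.count as f64 >= rank {
            if b.le.is_infinite() {
                // Cannot interpolate into +Inf: fall back to the highest finite bound.
                return Some(lower_bound);
            }
            let in_bucket = (b.count - lower_count) as f64;
            if in_bucket == 0.0 {
                return Some(b.le);
            }
            let fraction = (rank - lower_count as f64) / in_bucket;
            return Some(lower_bound + (b.le - lower_bound) * fraction);
        }
        lower_bound = b.le;
        lower_count = b.count;
    }
    None
}
```

With cumulative counts of 100, 890, 999, and 1000 for bounds 0.005, 0.05, 0.2, and +Inf, the P95 estimate lands inside the 0.05 to 0.2 bucket at roughly 0.13 s. Merging two instances with the same counts gives the same estimate, which is the property summaries lack.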

metrics Crate

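use metrics::{counter, gauge, histogram};
use metrics_exporter_prometheus::PrometheusBuilder;

// In main(): install the global recorder and start the exporter's HTTP listener.
// Port 9090 is also the Prometheus server's default port; pick a different one
// if the service and Prometheus run on the same host.
PrometheusBuilder::new()
    .with_http_listener(([0, 0, 0, 0], 9090))
    .install()
    .expect("failed to install Prometheus recorder");

// In handlers: record observations.
// `elapsed` (a std::time::Duration) and `depth` (an integer) come from the handler.
counter!("http_requests_total", "method" => "GET", "status" => "200").increment(1);
// Seconds are the Prometheus base unit for durations. Check the exporter docs on
// bucket configuration: without configured buckets it may render this metric
// as a summary rather than a histogram.
histogram!("request_duration_seconds").record(elapsed.as_secs_f64());
gauge!("queue_depth", "queue" => "task-events").set(depth as f64);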

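Labels ("key" => "value") create a separate time series for each distinct combination of label values, so keep label values to a small, bounded set (HTTP method, status code) and avoid unbounded ones such as user IDs.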
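
The effect of labels on series count can be illustrated with a toy in-memory counter family, where each distinct label set is its own map entry. This is a sketch for intuition only, not how the metrics crate or Prometheus stores data; CounterFamily and its methods are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A label set, kept sorted by key so that label order does not matter.
type Labels = BTreeMap<String, String>;

/// All time series that share one metric name, e.g. http_requests_total.
#[derive(Default)]
pub struct CounterFamily {
    series: BTreeMap<Labels, u64>,
}

impl CounterFamily {
    fn key(labels: &[(&str, &str)]) -> Labels {
        labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    /// Increment the series identified by this exact label combination,
    /// creating it on first use.
    pub fn increment(&mut self, labels: &[(&str, &str)], by: u64) {
        *self.series.entry(Self::key(labels)).or_insert(0) += by;
    }

    /// Current value of one series, if it exists.
    pub fn get(&self, labels: &[(&str, &str)]) -> Option<u64> {
        self.series.get(&Self::key(labels)).copied()
    }

    /// Number of distinct time series created so far.
    pub fn series_count(&self) -> usize {
        self.series.len()
    }
}
```

Incrementing with (method=GET, status=200), (method=GET, status=500), and (method=POST, status=200) yields three series. Adding a label with N possible values multiplies the worst-case series count by N, which is why unbounded label values are costly.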

Prometheus Pull Model

Prometheus server
  → HTTP GET /metrics (every 15s)
  → Parses exposition format:
      http_requests_total{method="GET",status="200"} 1234
      request_duration_seconds_bucket{le="0.005"} 100
      request_duration_seconds_bucket{le="0.05"} 890
      request_duration_seconds_bucket{le="0.2"} 999
      request_duration_seconds_bucket{le="+Inf"} 1000
      request_duration_seconds_sum 45.2
      request_duration_seconds_count 1000

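Pull model: Prometheus controls the scrape interval; the client needs no send buffering or retry logic; a failed scrape doubles as a health signal. The main cost is reachability: Prometheus must be able to connect to every target, so targets behind NAT or a firewall need extra setup, which is where push-based systems have the advantage.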
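
The sample lines in the scrape above have a simple shape: a metric name, an optional label set in braces, and a value. A minimal parsing sketch follows, assuming no timestamps, no escaped characters, and no commas inside label values; it is nowhere near a complete exposition-format parser, and Sample and parse_sample are hypothetical names.

```rust
/// One parsed sample line from a /metrics response.
#[derive(Debug, PartialEq)]
pub struct Sample {
    pub name: String,
    pub labels: Vec<(String, String)>,
    pub value: f64,
}

/// Parse a line such as `http_requests_total{method="GET",status="200"} 1234`.
/// Returns None for blank lines, comment lines (# HELP / # TYPE), and
/// anything this simplified sketch does not understand.
pub fn parse_sample(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The value is the last space-separated token (timestamps are not handled).
    let (series, value) = line.rsplit_once(' ')?;
    let value: f64 = value.parse().ok()?;
    let series = series.trim();

    let (name, labels) = match series.split_once('{') {
        None => (series, Vec::new()),
        Some((name, rest)) => {
            let body = rest.strip_suffix('}')?;
            let mut labels = Vec::new();
            for pair in body.split(',').filter(|p| !p.is_empty()) {
                let (key, quoted) = pair.split_once('=')?;
                let val = quoted.trim().strip_prefix('"')?.strip_suffix('"')?;
                labels.push((key.trim().to_string(), val.to_string()));
            }
            (name, labels)
        }
    };
    if name.is_empty() {
        return None;
    }
    Some(Sample { name: name.to_string(), labels, value })
}
```

Parsing the _sum and _count lines this way also gives the mean latency for free: 45.2 / 1000 is about 45 ms per request in the sample above.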

Key Takeaways

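  • Counters: only ever increase (and reset to zero on restart); Gauges: current value that can go up or down; Histograms: distribution recorded as cumulative bucket counts
  • Histogram buckets can be summed across instances; summary quantiles cannot, so prefer histograms for multi-instance services
  • The metrics crate is backend-agnostic; swapping the Prometheus exporter for another backend (e.g. CloudWatch) changes the recorder setup, not the instrumentation calls
  • Histogram buckets should span the expected P50–P99 range, with ~5 boundaries across the latencies you care about; quantile estimates are interpolated within a bucket, so resolution matters most near those thresholds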
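
As one way to approach the bucket guideline, the sketch below generates exponentially spaced boundaries and counts how many fall inside a latency range of interest. The starting value, factor, and function names are illustrative assumptions, not a recommended configuration; how the resulting list is passed to the exporter depends on the exporter version.

```rust
/// Generate `count` upper bounds: start, start*factor, start*factor^2, ...
/// Panics if `start` is not positive or `factor` is not greater than 1.
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    assert!(start > 0.0, "start must be positive");
    assert!(factor > 1.0, "factor must be greater than 1");
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Count how many boundaries lie inside the closed range [lo, hi].
/// More boundaries in the range means finer quantile resolution there.
pub fn boundaries_within(bounds: &[f64], lo: f64, hi: f64) -> usize {
    bounds.iter().filter(|b| **b >= lo && **b <= hi).count()
}
```

For example, exponential_buckets(0.001, 2.0, 10) runs from 1 ms to 512 ms, and boundaries_within(&bounds, 0.005, 0.2) reports five boundaries between the P50 of 5 ms and the P99 of 200 ms from the challenge. A hand-picked list that places boundaries exactly on 0.005, 0.05, and 0.2 is an equally valid design when those are the thresholds you alert on.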

Tomorrow: PromQL — querying rates, quantiles, and aggregations.