← Week 2: Metrics & Alerting

Day 13: CloudWatch Metrics and Alarms

Phase 6 · Sep 14, 2026

Agenda (2–3 hours)

  • Read (45 min): CloudWatch custom metrics documentation; CloudWatch Alarms; CloudWatch Contributor Insights
  • Study (45 min): Compare Prometheus pull-scrape vs CloudWatch PutMetricData push. Which is better for Lambda functions? For ECS services?
  • Practice (45 min): Publish custom metrics from a Lambda function using the aws-sdk-cloudwatch crate; create a CloudWatch alarm for error count > 10 per minute
  • Challenge (30 min): CloudWatch bills every unique combination of metric name and dimension values as a separate metric. A service with 10 routes × 5 status codes therefore produces 50 metric series. Design a dimension strategy that stays under 10 series while preserving useful granularity
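
One possible starting point for the challenge, as a minimal sketch rather than the intended answer: collapse each dimension into a few coarse buckets before publishing, then count the distinct combinations that remain. The route prefixes, group names, and status classes below are hypothetical and chosen only for illustration.

```rust
use std::collections::HashSet;

/// Collapse an HTTP status code into a coarse class (hypothetical bucketing).
pub fn status_class(code: u16) -> &'static str {
    match code {
        200..=399 => "ok",
        400..=499 => "client_error",
        _ => "server_error",
    }
}

/// Collapse a route into a coarse group (hypothetical route prefixes).
pub fn route_group(route: &str) -> &'static str {
    if route.starts_with("/tasks") {
        "tasks"
    } else if route.starts_with("/users") {
        "users"
    } else {
        "other"
    }
}

/// Count the distinct (route, status) dimension combinations for one metric
/// name. Each distinct combination is a separately billed metric series.
pub fn series_count(requests: &[(&str, u16)], bucketed: bool) -> usize {
    let combos: HashSet<(String, String)> = requests
        .iter()
        .map(|(route, code)| {
            if bucketed {
                (route_group(route).to_string(), status_class(*code).to_string())
            } else {
                (route.to_string(), code.to_string())
            }
        })
        .collect();
    combos.len()
}
```

With 10 routes and 5 status codes, the raw dimensions yield 50 series, while 3 route groups × 3 status classes yield at most 9. The trade-off is that per-route detail is lost in metrics; it can still be kept in logs for drill-down.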

CloudWatch Custom Metrics

use aws_sdk_cloudwatch::types::{Dimension, MetricDatum, StandardUnit};

// `cloudwatch` is assumed to be an aws_sdk_cloudwatch::Client created elsewhere.
cloudwatch.put_metric_data()
    .namespace("TaskService")
    .metric_data(
        MetricDatum::builder()
            .metric_name("TasksCreated")
            .value(1.0)
            .unit(StandardUnit::Count)
            // Each distinct set of dimension values is stored as a separate metric.
            .dimensions(
                Dimension::builder()
                    .name("Environment")
                    .value("production")
                    .build()
            )
            .build()
    )
    .send().await?;

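High-resolution metrics: call storage_resolution(1) on the MetricDatum builder to store datapoints at 1-second granularity instead of the default 60 seconds. This costs more in practice: publishing more often means more PutMetricData calls, and alarms with periods shorter than 60 seconds are billed at a higher rate.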

CloudWatch Alarms

{
  "AlarmName": "TaskService-HighErrorRate",
  "MetricName": "ErrorCount",
  "Namespace": "TaskService",
  "Statistic": "Sum",
  "Period": 60,
  "EvaluationPeriods": 5,
  "DatapointsToAlarm": 3,
  "Threshold": 10,
  "ComparisonOperator": "GreaterThanThreshold",
  "TreatMissingData": "notBreaching",
  "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:pagerduty-topic"]
}

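DatapointsToAlarm: 3 with EvaluationPeriods: 5 and Period: 60 means the alarm fires when any 3 of the last 5 one-minute datapoints exceed the threshold; the breaching datapoints do not have to be consecutive. With TreatMissingData set to notBreaching, a minute with no data counts as healthy.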
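
The M-of-N rule can be sketched as a small function. This is a simplified model for building intuition, not CloudWatch's actual evaluator: it covers only the GreaterThanThreshold operator and two of the TreatMissingData modes, and the names `MissingData` and `is_in_alarm` are made up for this example.

```rust
/// How a period with no data is treated (only two of CloudWatch's modes are modeled).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MissingData {
    NotBreaching,
    Breaching,
}

/// Returns true when at least `datapoints_to_alarm` of the most recent
/// `evaluation_periods` datapoints exceed `threshold`.
/// `datapoints` holds one aggregated value per period, oldest first;
/// `None` marks a period in which no data was reported.
pub fn is_in_alarm(
    datapoints: &[Option<f64>],
    threshold: f64,
    evaluation_periods: usize,
    datapoints_to_alarm: usize,
    missing: MissingData,
) -> bool {
    // Only the most recent `evaluation_periods` datapoints are considered.
    let start = datapoints.len().saturating_sub(evaluation_periods);
    let breaching = datapoints[start..]
        .iter()
        .filter(|dp| match **dp {
            Some(v) => v > threshold, // GreaterThanThreshold
            None => missing == MissingData::Breaching,
        })
        .count();
    breaching >= datapoints_to_alarm
}
```

For per-minute error sums of 12, 3, 15, (missing), 11 with a threshold of 10, three datapoints breach, so a 3-of-5 alarm fires even though the breaches are not adjacent. Setting DatapointsToAlarm equal to EvaluationPeriods turns the rule into "all N periods must breach".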

EMF — Embedded Metric Format

Emit structured logs that CloudWatch auto-extracts as metrics (no SDK call needed):

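// Lambda function: write EMF JSON to stdout
use std::time::{SystemTime, UNIX_EPOCH};

// EMF expects the timestamp in milliseconds since the Unix epoch.
let unix_ms = SystemTime::now().duration_since(UNIX_EPOCH)?.as_millis() as u64;

let emf = serde_json::json!({
    "_aws": {
        "Timestamp": unix_ms,
        "CloudWatchMetrics": [{
            "Namespace": "TaskService",
            // Each inner array is one dimension set; its entries name top-level keys below.
            "Dimensions": [["FunctionName"]],
            "Metrics": [{"Name": "TasksProcessed", "Unit": "Count"}]
        }]
    },
    // Assumes `context` is a lambda_runtime::Context, which exposes the name via env_config.
    "FunctionName": context.env_config.function_name,
    // `batch_size` is the number of tasks handled in this invocation.
    "TasksProcessed": batch_size,
});
// Lambda forwards stdout to CloudWatch Logs, where the metric is extracted.
println!("{}", emf);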

Key Takeaways

  • CloudWatch PutMetricData push is natural for Lambda (no scrape target); ECS works with Prometheus or EMF
  • EMF (Embedded Metric Format) generates CloudWatch metrics from structured log lines — no extra SDK calls
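  • DatapointsToAlarm provides a tolerance window similar to the Prometheus for: duration, except that for: requires the condition to hold continuously while M-of-N evaluation tolerates gaps
  • Limit metric dimensions to avoid cardinality explosion and cost (about $0.30 per custom metric per month at the entry pricing tier; rates vary by region)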

Tomorrow: Phase 6 Week 2 Challenge — full Prometheus + Grafana + Alertmanager stack.