← Week 1: Design & Architecture

Day 6: Deployment Architecture

Phase 7 · Sep 28, 2026

Agenda (2–3 hours)

  • Design (60 min): Full deployment architecture — VPC layout, ECS services, ALB, IAM roles, ECR repositories
  • Review (60 min): Verify the deployment can survive an AZ failure; walk through the rolling deployment process
  • Implement (60 min): Write the Terraform (or CDK) module structure; create ECR repositories; test docker build && docker push

VPC Layout

VPC (10.0.0.0/16)
├── Availability Zone A
│   ├── Public subnet (10.0.1.0/24) — ALB
│   └── Private subnet (10.0.2.0/24) — API Service, Worker, VPC Endpoints
├── Availability Zone B
│   ├── Public subnet (10.0.3.0/24) — ALB
│   └── Private subnet (10.0.4.0/24) — API Service, Worker, VPC Endpoints
└── VPC Endpoints
    ├── ECR API, ECR DKR, S3 (image pull, no NAT gateway)
    ├── DynamoDB (Gateway endpoint, free)
    └── SQS (Interface endpoint)
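
Before the layout goes into Terraform or CDK, the CIDR plan can be sanity-checked offline. The sketch below uses only Python's standard ipaddress module and the CIDRs from the tree above. The subnet labels and the validate_layout helper are illustrative names, not part of the course material.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# CIDRs copied from the layout above; the keys are illustrative labels.
SUBNETS = {
    "az-a-public": ipaddress.ip_network("10.0.1.0/24"),
    "az-a-private": ipaddress.ip_network("10.0.2.0/24"),
    "az-b-public": ipaddress.ip_network("10.0.3.0/24"),
    "az-b-private": ipaddress.ip_network("10.0.4.0/24"),
}


def validate_layout(vpc, subnets):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    # Every subnet must sit inside the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    # No two subnets may share addresses.
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems


if __name__ == "__main__":
    print(validate_layout(VPC_CIDR, SUBNETS) or "layout OK")
```

A check like this only covers address arithmetic. It says nothing about route tables or endpoint placement, which still need review in the infrastructure code itself.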

ECS Services

Service         Desired count (min → max)   CPU (units)   Memory (MiB)   Task role permissions
API Service     2 → 10                      512           1024           DynamoDB: PutItem, UpdateItem, GetItem; SQS: SendMessage
Worker          4 → 20                      1024          2048           DynamoDB: PutItem, UpdateItem; SQS: ReceiveMessage, DeleteMessage
DLQ Processor   1 → 2                       256           512            DynamoDB: UpdateItem; SQS: ReceiveMessage, DeleteMessage
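
To make the task role column concrete, here is a minimal sketch that turns the permissions listed for each service into an IAM policy document. The ARNs (account ID, region, table name, queue names) are placeholders invented for illustration, and build_policy is a hypothetical helper. The real module would presumably express the same thing in Terraform or CDK.

```python
import json

# Placeholder ARNs: account ID, region, table and queue names are assumptions.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/tasks"
MAIN_QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:task-queue"
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:task-queue-dlq"

# Actions per service, taken from the ECS Services listing.
TASK_ROLES = {
    "api": {
        "dynamodb": (TABLE_ARN, ["PutItem", "UpdateItem", "GetItem"]),
        "sqs": (MAIN_QUEUE_ARN, ["SendMessage"]),
    },
    "worker": {
        "dynamodb": (TABLE_ARN, ["PutItem", "UpdateItem"]),
        "sqs": (MAIN_QUEUE_ARN, ["ReceiveMessage", "DeleteMessage"]),
    },
    "dlq-processor": {
        "dynamodb": (TABLE_ARN, ["UpdateItem"]),
        # Assumption: the DLQ processor reads the dead-letter queue, not the main queue.
        "sqs": (DLQ_ARN, ["ReceiveMessage", "DeleteMessage"]),
    },
}


def build_policy(service):
    """Build a least-privilege policy document: named actions on named resources only."""
    statements = [
        {
            "Effect": "Allow",
            "Action": [f"{prefix}:{name}" for name in names],
            "Resource": arn,
        }
        for prefix, (arn, names) in TASK_ROLES[service].items()
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_policy("worker"), indent=2))
```

Keeping one statement per resource makes it easy to spot a wildcard creeping into either the Action or the Resource field during review.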

Auto-scaling triggers:

  • API: scale out when CPU utilization stays above 70% for 2 consecutive minutes
  • Worker: scale out when the SQS queue depth (visible messages) exceeds 500
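
The two triggers can be expressed as small predicates plus a clamp to each service's min and max task counts. This is only a sketch of the decision logic, not how ECS Application Auto Scaling is configured. It assumes one CPU datapoint per minute, a scale-out step of one task, and ignores scale-in entirely. All function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingBounds:
    min_tasks: int
    max_tasks: int


# Bounds taken from the ECS Services listing.
API_BOUNDS = ScalingBounds(min_tasks=2, max_tasks=10)
WORKER_BOUNDS = ScalingBounds(min_tasks=4, max_tasks=20)


def api_breached(cpu_per_minute, threshold=70.0, minutes=2):
    """True when the latest `minutes` one-minute CPU readings all exceed the threshold."""
    recent = cpu_per_minute[-minutes:]
    return len(recent) == minutes and all(sample > threshold for sample in recent)


def worker_breached(queue_depth, threshold=500):
    """True when the queue holds more than `threshold` visible messages."""
    return queue_depth > threshold


def next_count(current, breached, bounds, step=1):
    """Add `step` tasks on a breach (assumed step size), clamped to the service bounds."""
    target = current + step if breached else current
    return max(bounds.min_tasks, min(bounds.max_tasks, target))
```

For example, next_count(4, worker_breached(650), WORKER_BOUNDS) moves the worker fleet from 4 to 5 tasks, while a fleet already at 20 stays at 20 regardless of queue depth.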

Rolling Deployment

ECS rolling update (minimumHealthyPercent: 50, maximumPercent: 200)

Step 1: Launch 2 new tasks with new image (total: 4 tasks)
Step 2: Wait for new tasks to pass ALB health check
Step 3: Drain and stop 2 old tasks (deregistration delay: 30s)
Step 4: Repeat until all tasks updated
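
The two percentages translate into hard task-count limits that ECS keeps to during the update: the minimum-healthy value is rounded up and the maximum value is rounded down. A small sketch of that arithmetic, with rolling_bounds as an illustrative helper name:

```python
def rolling_bounds(desired, minimum_healthy_percent=50, maximum_percent=200):
    """Return (min_running, max_running) task counts for an ECS rolling update.

    Integer arithmetic avoids floating-point surprises:
    the lower bound is a ceiling division, the upper bound a floor division.
    """
    min_running = -(-desired * minimum_healthy_percent // 100)  # round up
    max_running = desired * maximum_percent // 100              # round down
    return min_running, max_running


if __name__ == "__main__":
    # Desired counts are the minimums from the ECS Services listing.
    for service, desired in [("API Service", 2), ("Worker", 4), ("DLQ Processor", 1)]:
        print(service, rolling_bounds(desired))
```

With the API Service at a desired count of 2, the upper bound of 4 matches Step 1 (2 old plus 2 new tasks). The lower bound of 1 means ECS is allowed to run with a single healthy task partway through the update.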

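Zero-downtime requirement: a new task must pass its health check before an old task is stopped. Note that minimumHealthyPercent: 50 can allow ECS to stop up to half of the old tasks before their replacements are healthy; setting it to 100 enforces this requirement strictly.
Worker tasks: handle SIGTERM so that the task in progress completes before the process exits.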
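
A minimal sketch of the worker shutdown pattern, assuming a Python worker with a simple poll-and-process loop. fetch_task and process_task are hypothetical callables standing in for the SQS receive call and the business logic. The handler only sets a flag, and the loop checks it between tasks, so a task that has started always runs to completion.

```python
import signal
import threading

# Set by the signal handler, read by the worker loop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Do no real work here; just record that shutdown was requested.
    shutdown_requested.set()


def run_worker(fetch_task, process_task):
    """Process tasks until SIGTERM arrives; returns the number of tasks completed.

    `fetch_task` returns the next task or None (hypothetical stand-in for an
    SQS long poll). `process_task` handles one task (hypothetical).
    Must be called from the main thread, where signal handlers can be installed.
    """
    signal.signal(signal.SIGTERM, handle_sigterm)
    completed = 0
    while not shutdown_requested.is_set():
        task = fetch_task()
        if task is None:
            continue  # nothing received; a real long poll would block here
        process_task(task)  # never interrupted: the flag is checked only between tasks
        completed += 1
    return completed
```

ECS sends SIGTERM and follows up with SIGKILL once the container's stopTimeout expires (30 seconds by default), so the longest expected task should fit inside that window or the timeout should be raised accordingly.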

Key Takeaways

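  • Private subnets + VPC endpoints remove the need for a NAT gateway (about $0.045/GB in data processing, plus an hourly charge) for ECR and AWS service calls; gateway endpoints (S3, DynamoDB) are free, while interface endpoints carry their own hourly and per-GB charges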
  • ALB deregistration delay (30s) allows in-flight gRPC calls to complete before a task is removed
  • Worker graceful shutdown: catch SIGTERM, finish current task, then exit — prevents orphaned tasks
  • IAM task roles follow least privilege — no dynamodb:*, only specific operations on specific tables

Tomorrow: architecture review — full design critique before implementation begins.