← Week 3: Integration + failure modes + fit analysis

Day 20: Complete acm-pca-design.md §5–6

Phase 5 · September 14, 2026


Agenda (3 hours)

  • No new reading — synthesize all of Phase 5
  • Write (150 min): Complete §5 (Failure Mode Analysis) and §6 (Integration with Provisioning Service)
  • Review (30 min): Read the full document as your tech lead

Phase 5 Full Concepts Inventory

Rate each: ✓ solid / ~ partial / ✗ need review

Week 1 — HSM Fundamentals

  • [ ] Why CA private keys need hardware protection (vs. disk/memory)
  • [ ] FIPS 140-2 Level 3 — what it means and what it prevents
  • [ ] PKCS#11 object model (slot, token, session, handle, mechanism)
  • [ ] CKA_EXTRACTABLE = false and its implications
  • [ ] C_GenerateKeyPair → C_Sign call sequence
  • [ ] CloudHSM cluster model (multi-AZ, CU/CO user roles)
  • [ ] Key ceremony: offline root, m-of-n quorum, audit trail

Phase 5 Full Concepts Inventory (continued)

Week 2 — ACM Private CA

  • [ ] Three-tier CA hierarchy (root, subordinate, issuance)
  • [ ] CA lifecycle: PENDING_CERTIFICATE → ACTIVE (CreateCertificateAuthority → GetCertificateAuthorityCsr → sign CSR with parent CA → ImportCertificateAuthorityCertificate)
  • [ ] Certificate templates: built-in ARNs + APIPassthrough for URI SAN
  • [ ] IssueCertificate → GetCertificate polling + idempotency token
  • [ ] CRL configuration: expiration, S3 bucket, nextUpdate
  • [ ] OCSP: real-time revocation; AWS-managed endpoint
  • [ ] Audit: CloudTrail events + ACM PCA audit report; DynamoDB inventory

Week 3 — Integration

  • [ ] Failure mode analysis: 6 failure scenarios
  • [ ] Cross-account sharing: RAM vs. resource policy
  • [ ] Multi-region architecture: single CA vs. regional CAs
  • [ ] SPIRE + ACM PCA: UpstreamAuthority aws_pca plugin; ca_ttl cadence
  • [ ] PQC dependency chain: CloudHSM → ACM PCA → SPIRE → leaf SVIDs
  • [ ] Make-vs.-buy: ACM PCA vs. CloudHSM + self-managed CA

§5 Complete: Failure Mode Analysis

§5 outline (5.1–5.4 drafted on Days 15, 16, and 19; 5.5 is new today):

## §5. Failure Mode Analysis

### 5.1 Failure Mode Table ← Day 15
### 5.2 Monitoring and Alerting Design ← Day 15
### 5.3 Multi-Account Architecture ← Day 16
### 5.4 Cost Model and Make-vs.-Buy ← Day 19

### 5.5 Operational Runbooks (new today)

§5.5 Operational Runbooks: for each failure mode, write a 3-step recovery procedure:

  1. Detect (what alarm fires?)
  2. Diagnose (what do you check?)
  3. Recover (what do you do?)
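A minimal sketch of one way to keep the six runbooks uniform: hold each as a structured record and flag any entry missing a step before the tech-lead review. The `Runbook` class and `incomplete` helper are hypothetical illustrations, not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Runbook:
    """One §5.5 entry: a failure mode plus its three recovery steps (hypothetical)."""
    failure_mode: str
    detect: str    # which alarm fires
    diagnose: str  # what the on-call engineer checks
    recover: str   # what the on-call engineer does

    def missing_steps(self) -> list[str]:
        """Names of fields left blank; whitespace-only counts as blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Plain-text block in the Detect / Diagnose / Recover format."""
        return "\n".join([
            f"Failure mode: {self.failure_mode}",
            f"  1. Detect: {self.detect}",
            f"  2. Diagnose: {self.diagnose}",
            f"  3. Recover: {self.recover}",
        ])


def incomplete(runbooks: list[Runbook]) -> dict[str, list[str]]:
    """Map each failure mode to its missing steps; an empty dict means all are complete."""
    return {rb.failure_mode: rb.missing_steps() for rb in runbooks if rb.missing_steps()}
```

Getting an empty dict from `incomplete()` over all six entries is a cheap proxy for the on-call quality bar: every failure mode has an alarm, a check, and an action.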

§6: Integration with Provisioning Service

## §6. Integration with Leo Provisioning Service

### 6.1 End-to-End Certificate Issuance Flow
<Sequence diagram: client → Lambda → ACM PCA → DynamoDB → response>

### 6.2 SPIRE + ACM PCA Integration
<UpstreamAuthority config; certificate chain; rotation cadence>

### 6.3 IAM Policy Design
<Which role gets which ACM PCA permissions? Conditions on template ARN?>

### 6.4 DynamoDB Schema
<Table design for certificate inventory>

### 6.5 Async Issuance Pattern (optional)
<SQS-based decoupling for ACM PCA latency>

### 6.6 Fit Analysis and Recommendation
<ACM PCA vs. self-managed; concrete first step to adopt>
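For §6.3, a sketch of the provisioning Lambda's execution-role policy, built as a Python dict so it can be rendered to JSON or unit-tested. It assumes the `acm-pca:TemplateArn` condition key on `IssueCertificate`; the account ID, CA ARN, and table ARN are placeholders, and using the APIPassthrough end-entity template is an assumption carried over from the Week 2 notes. Only requests naming that template satisfy the condition, so a request for a CA template such as `SubordinateCACertificate_PathLen0/V1` (or one omitting `TemplateArn`, which leaves the key absent) is not allowed.

```python
import json

# Placeholders (hypothetical account and resource names); substitute real ARNs.
ISSUING_CA_ARN = "arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/EXAMPLE"
INVENTORY_TABLE_ARN = "arn:aws:dynamodb:us-east-1:111122223333:table/leo-certificate-inventory"
END_ENTITY_TEMPLATE = "arn:aws:acm-pca:::template/EndEntityCertificate_APIPassthrough/V1"


def provisioning_lambda_policy(ca_arn: str, table_arn: str, template_arn: str) -> dict:
    """Least-privilege identity policy for the provisioning Lambda (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Issue only with the end-entity template, so the Lambda
                # cannot mint subordinate CA certificates.
                "Sid": "IssueEndEntityOnly",
                "Effect": "Allow",
                "Action": "acm-pca:IssueCertificate",
                "Resource": ca_arn,
                "Condition": {"StringEquals": {"acm-pca:TemplateArn": template_arn}},
            },
            {
                # Needed to poll for the issued certificate.
                "Sid": "FetchIssuedCert",
                "Effect": "Allow",
                "Action": "acm-pca:GetCertificate",
                "Resource": ca_arn,
            },
            {
                # Write the inventory record; no read, update, or delete.
                "Sid": "WriteInventory",
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = provisioning_lambda_policy(ISSUING_CA_ARN, INVENTORY_TABLE_ARN, END_ENTITY_TEMPLATE)
    print(json.dumps(policy, indent=2))
```

No other `acm-pca:*` actions are granted, which is what lets a reviewer answer "Can the Lambda issue CA certs?" from the policy text alone.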

§6.1: End-to-End Flow (Diagram)

Device / Service                Lambda               ACM PCA        DynamoDB
     │                            │                    │               │
     │── POST /provision {csr} ──►│                    │               │
     │                            │ [validate CSR]     │               │
     │                            │─ IssueCertificate ►│               │
     │                            │◄── {cert_arn} ─────┤               │
     │                            │── GetCertificate ─►│               │
     │                            │ (retry if pending) │               │
     │                            │◄── {cert PEM} ─────┤               │
     │                            │── PutItem ────────────────────────►│
     │                            │  (device_id, serial,               │
     │                            │   issued_at, expires_at)           │
     │◄── 200 {cert PEM} ─────────│                    │               │

This diagram is the core of §6. Everything else annotates it.
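The Lambda's ACM PCA leg of the diagram (IssueCertificate, then GetCertificate until the certificate is ready) could look like the sketch below. It assumes a boto3 `acm-pca` client is passed in, an EC issuing CA (hence the `SHA256WITHECDSA` default), and a 90-day validity matching the §6.4 example; the function name and retry parameters are illustrative. Injecting the client keeps it testable without AWS.

```python
import time
from typing import Any


def issue_and_fetch(
    client: Any,
    ca_arn: str,
    csr_pem: bytes,
    idempotency_token: str,
    template_arn: str,
    signing_algorithm: str = "SHA256WITHECDSA",
    validity_days: int = 90,
    max_attempts: int = 8,
    base_delay_s: float = 0.5,
) -> tuple[str, str]:
    """IssueCertificate, then poll GetCertificate until the cert is ready.

    `client` is a boto3 "acm-pca" client or any test double with the same
    methods. Returns (certificate_arn, certificate_pem).
    """
    resp = client.issue_certificate(
        CertificateAuthorityArn=ca_arn,
        Csr=csr_pem,
        SigningAlgorithm=signing_algorithm,  # must match the CA's key type
        TemplateArn=template_arn,            # must satisfy the IAM template condition
        Validity={"Value": validity_days, "Type": "DAYS"},
        IdempotencyToken=idempotency_token,  # reuse the same token on retries
    )
    cert_arn = resp["CertificateArn"]
    for attempt in range(max_attempts):
        try:
            out = client.get_certificate(CertificateAuthorityArn=ca_arn, CertificateArn=cert_arn)
            return cert_arn, out["Certificate"]
        except client.exceptions.RequestInProgressException:
            # Issuance is asynchronous; back off exponentially, capped at 4 s.
            time.sleep(min(base_delay_s * 2 ** attempt, 4.0))
    raise TimeoutError(f"{cert_arn} not issued after {max_attempts} GetCertificate calls")
```

boto3 also provides an `acm-pca` waiter named `certificate_issued` that wraps the same polling; the explicit loop is shown so the backoff and the Lambda time budget stay visible.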


§6.4: DynamoDB Schema

Table: leo-certificate-inventory

Primary key:
  PK: device_id (String)  e.g., "SN-12345"
  SK: cert_serial (String) e.g., "1a:2b:3c:4d"

Attributes:
  ca_arn: "arn:aws:acm-pca:us-east-1:..."
  issued_at: "2026-09-06T10:00:00Z"
  expires_at: "2026-12-05T10:00:00Z"
  revoked_at: null or "2026-10-01T00:00:00Z"
  revocation_reason: null or "KEY_COMPROMISE"
  cert_pem: (optional; or store in S3 for large certs)
  idempotency_token: "device-SN-12345-20260906-1000"

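GSI: expires_at-index  (for "find certs expiring in < 30 days")
  PK: expiry_month (String)  e.g., "2026-12"  (bucket attribute, set at write time)
  SK: expires_at (String)
  DynamoDB key conditions allow only equality on the partition key, so
  expires_at must be the sort key; query each month bucket in the window
  with expires_at BETWEEN now AND now + 30 days.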
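A sketch of building a §6.4 inventory item and of the month buckets an expiry query would scan. It assumes the idempotency token follows exactly the example's `device-<device_id>-<YYYYMMDD>-<HHMM>` pattern and uses a hypothetical `expiry_month` attribute as the GSI partition key (a partition key only supports equality, so `expires_at` alone cannot serve a range query). The 36-character token limit is taken from the IssueCertificate API reference and is worth re-checking. Timestamps are expected to be timezone-aware UTC.

```python
from datetime import datetime, timedelta

ISO = "%Y-%m-%dT%H:%M:%SZ"  # fixed-width, so string order == time order
MAX_TOKEN_LEN = 36          # IssueCertificate IdempotencyToken limit (per API reference)


def idempotency_token(device_id: str, requested_at: datetime) -> str:
    """Assumed scheme from the example: device-<device_id>-<YYYYMMDD>-<HHMM>."""
    token = f"device-{device_id}-{requested_at:%Y%m%d-%H%M}"
    if len(token) > MAX_TOKEN_LEN:
        raise ValueError(f"token too long ({len(token)} > {MAX_TOKEN_LEN}): {token}")
    return token


def inventory_item(device_id: str, cert_serial: str, ca_arn: str,
                   issued_at: datetime, token: str, validity_days: int = 90) -> dict:
    """One leo-certificate-inventory item; None values become DynamoDB NULLs."""
    expires = issued_at + timedelta(days=validity_days)
    return {
        "device_id": device_id,                     # PK
        "cert_serial": cert_serial,                 # SK
        "ca_arn": ca_arn,
        "issued_at": issued_at.strftime(ISO),
        "expires_at": expires.strftime(ISO),        # GSI sort key
        "expiry_month": expires.strftime("%Y-%m"),  # hypothetical GSI partition key
        "revoked_at": None,
        "revocation_reason": None,
        "idempotency_token": token,
    }


def expiry_buckets(now: datetime, days: int = 30) -> list[str]:
    """Month buckets ("YYYY-MM") that can hold certs expiring in [now, now + days]."""
    end = now + timedelta(days=days)
    buckets, y, m = [], now.year, now.month
    while (y, m) <= (end.year, end.month):
        buckets.append(f"{y:04d}-{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return buckets
```

Each bucket would then be queried with `Key("expiry_month").eq(bucket) & Key("expires_at").between(now_iso, cutoff_iso)` from `boto3.dynamodb.conditions`. Compute the token once per provisioning request and reuse it on retries; recomputing it after the minute rolls over would look like a new request and could issue a second certificate.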

Challenge Assignment

Complete §5–6 and read the whole document aloud.

Quality bar for §6:

  1. A new team member should be able to implement the provisioning Lambda
    from §6.1 + §6.3 alone — no other documentation needed.

  2. A security reviewer should be able to approve §6.3 (IAM) without asking:
    "Can the Lambda issue CA certs?" (It cannot: an acm-pca:TemplateArn condition limits IssueCertificate to the end-entity template.)

  3. An on-call engineer should be able to follow §5.5 during an incident
    without reading anything else.


Preview: Phase 5 Final Day

Tomorrow:

  • Complete hsm-demo (all 5 subcommands tested)
  • Complete acm-pca-design.md (all 6 sections)
  • Phase 5 reflection
  • Preview of Phase 6: Integration Project

Resources

  • Your Phase 5 notes (Days 1–19)
  • Phase 4 spiffe-analysis.md — same structure and quality bar for §6.6
  • Phase 3 pqc-migration-roadmap.md §8 — open questions format
  • ACM PCA Architecture Center: aws.amazon.com/architecture/security-identity-compliance