Cloud Architecture Guide
DAY_03 / SECTION_04 // LIVE DEMO
DEMO READY

Observability & scale walkthrough

The same console as Day 1, viewed through different layers: metrics, alarms, audit, and autoscaling. The dramatic moment comes when an instance goes unhealthy and the ASG replaces it live.

~/aws/observability-walkthrough.sh ~30 MIN
  1. CloudWatch Metrics

    Pick the EC2 instance from Day 1. Find its CPU, network, and disk metrics. If the deployment spans multiple AZs, show the per-AZ split.

  2. Custom metric

    Push one from the CLI or from a Lambda function. Talk through what custom metrics cost and when they're worth it.

  3. Set up an alarm

    CPU > 80% for 5 min → SNS email. Discuss alerting hygiene: what should page someone and what shouldn't.

  4. CloudTrail audit

    Find a single API call from Day 1's demo. Show retention, immutability, and why this is the source of truth in incidents.

  5. Auto Scaling Group

    Create a launch template. Scale on CPU. Walk through min/max/desired and what each one does.

  6. Application Load Balancer

    Attach a target group to the ASG so its instances register with the load balancer automatically. Show health checks, listener rules, and how the LB hides instance churn from clients.

  7. Take an instance unhealthy

    Make one instance fail its health check and watch the ASG replace it. The drama is the lesson: engineered self-healing as a teaching moment.

  8. CloudWatch Logs Insights

    Run a query against the log group. Show how observability makes a system investigable, not just visible.
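
Steps 2, 3, 5, and 8 can be sketched as the keyword arguments one might hand to boto3 (`put_metric_data`, `put_metric_alarm`, `put_scaling_policy`, `start_query`). This is a minimal sketch and not the contents of observability-walkthrough.sh. The namespace `Demo/App`, the metric name `QueueDepth`, the instance ID, the SNS topic ARN, the ASG name, the 50% CPU target, and the ERROR filter are all hypothetical placeholders. The functions only build dictionaries, so they run without AWS credentials.

```python
"""Request payloads for steps 2, 3, 5 and 8 (all values are illustrative)."""
import time
from datetime import datetime, timezone
from pprint import pprint


def custom_metric_kwargs(value, namespace="Demo/App", name="QueueDepth"):
    """kwargs for cloudwatch.put_metric_data (step 2); names are hypothetical."""
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": name,
            "Timestamp": datetime.now(timezone.utc),
            "Value": float(value),
            "Unit": "Count",
        }],
    }


def cpu_alarm_kwargs(instance_id, sns_topic_arn, threshold=80.0):
    """kwargs for cloudwatch.put_metric_alarm (step 3): CPU > 80% for 5 min."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # one 5-minute datapoint (EC2 basic monitoring)
        "EvaluationPeriods": 1,   # a single breaching datapoint triggers the alarm
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic with an email subscription
    }


def cpu_scaling_policy_kwargs(asg_name, target_cpu=50.0):
    """kwargs for autoscaling.put_scaling_policy (step 5), target tracking on CPU."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,  # assumed target, not from the demo
        },
    }


def logs_query_kwargs(log_group, now=None, minutes=60):
    """kwargs for logs.start_query (step 8): recent lines containing ERROR."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": (
            "fields @timestamp, @message"
            " | filter @message like /ERROR/"
            " | sort @timestamp desc"
            " | limit 20"
        ),
    }


if __name__ == "__main__":
    # With credentials configured, a call would look like:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_kwargs(...))
    pprint(cpu_alarm_kwargs(
        "i-0123456789abcdef0",
        "arn:aws:sns:us-east-1:123456789012:demo-alerts",
    ))
```

Keeping the payloads as plain dictionaries makes it easy to put them on screen during the demo and discuss each field (period, threshold, target) before anything is sent to AWS.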

// the four golden signals

From Google's SRE book: start with these four when instrumenting any service.

latency: how slow
traffic: how busy
errors: how broken
saturation: how full
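
As a rough illustration of what each signal measures, the sketch below derives all four from one window of request records. The `Request` record, the nearest-rank p95, treating 5xx statuses as errors, and using CPU utilization as the saturation proxy are assumptions made for the example, not part of the demo.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (0.0 if empty)."""
    if not sorted_values:
        return 0.0
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def golden_signals(requests, window_s, cpu_utilization):
    """Summarize one time window into the four golden signals."""
    latencies = sorted(r.latency_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile(latencies, 95),                 # how slow
        "traffic_rps": len(requests) / window_s,                     # how busy
        "error_rate": errors / len(requests) if requests else 0.0,   # how broken
        "saturation": cpu_utilization,                               # how full
    }


if __name__ == "__main__":
    window = [Request(latency_ms=10.0 * i, status=200) for i in range(1, 9)]
    window += [Request(latency_ms=90.0, status=500),
               Request(latency_ms=100.0, status=503)]
    print(golden_signals(window, window_s=5, cpu_utilization=0.7))
```

Comparing these four keys with the metrics the demo actually surfaces is one way to start the discussion question that follows.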

// discussion: which of the four are we not covering with this demo, and why?

// what to take away
  • Find any AWS metric, log, or audit event in under a minute
  • Set an alarm and explain why it should fire
  • Read an Auto Scaling Group config
  • Justify what's missing from "good-enough" observability
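
One way to practise reading an Auto Scaling Group config is a toy model of how min, max, and desired interact with health checks. The sketch below is not the AWS algorithm. It only mimics the behaviour seen in steps 5 and 7 under simplifying assumptions: a scaling request is clamped to [min, max], unhealthy instances are terminated, and replacements launch until the instance count matches desired. `ToyASG` and its instance IDs are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyASG:
    min_size: int
    max_size: int
    desired: int
    instances: dict = field(default_factory=dict)  # id -> "Healthy" | "Unhealthy"
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def scale_to(self, n):
        """A scaling policy can only move desired within [min_size, max_size]."""
        self.desired = max(self.min_size, min(self.max_size, n))

    def reconcile(self):
        """One pass: drop unhealthy instances, then launch or terminate to match desired."""
        events = []
        unhealthy = [i for i, h in self.instances.items() if h == "Unhealthy"]
        for iid in unhealthy:
            del self.instances[iid]
            events.append(f"terminate {iid}")
        while len(self.instances) < self.desired:
            iid = f"i-{next(self._ids):04d}"
            self.instances[iid] = "Healthy"
            events.append(f"launch {iid}")
        while len(self.instances) > self.desired:
            iid = next(reversed(self.instances))  # newest instance first (toy choice)
            del self.instances[iid]
            events.append(f"terminate {iid}")
        return events


if __name__ == "__main__":
    asg = ToyASG(min_size=2, max_size=4, desired=2)
    print(asg.reconcile())                 # initial launches
    asg.instances["i-0001"] = "Unhealthy"  # step 7: take an instance unhealthy
    print(asg.reconcile())                 # the replacement happens here
    asg.scale_to(10)                       # request beyond max is clamped to 4
    print(asg.reconcile())
```

In this model min is the floor that survives scale-in, max is the ceiling that caps scale-out, and desired is the number the group keeps converging toward after any failure.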