Observability & scale walkthrough

The same console as Day 1, viewed through different layers: metrics, alarms, audit, and autoscaling. The drama comes when an instance goes unhealthy and the Auto Scaling Group (ASG) replaces it live.

~/aws/observability-walkthrough.sh ~30 MIN

01

CloudWatch Metrics

Pick the EC2 instance from Day 1 and find its CPU, network, and disk metrics. If the deployment spans multiple Availability Zones, show the per-AZ split.
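A minimal sketch of pulling those metrics programmatically, assuming boto3's CloudWatch `get_metric_statistics` call. The client is passed in so the logic can be exercised without an AWS account. The metric list, the 3-hour window, and the helper names `metric_request` and `fetch_instance_metrics` are illustrative choices, not part of the walkthrough.

```python
from datetime import datetime, timedelta, timezone

# Per-instance EC2 metrics in the AWS/EC2 namespace.
# DiskReadBytes/DiskWriteBytes cover instance store volumes; EBS volumes
# also publish their own metrics under the AWS/EBS namespace.
EC2_METRICS = ["CPUUtilization", "NetworkIn", "NetworkOut", "DiskReadBytes", "DiskWriteBytes"]


def metric_request(instance_id, metric_name, hours=3, period=300, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint; 300 matches basic monitoring
        "Statistics": ["Average", "Maximum"],
    }


def fetch_instance_metrics(cloudwatch, instance_id, now=None):
    """Return {metric name: datapoints in time order} using an injected client."""
    results = {}
    for name in EC2_METRICS:
        response = cloudwatch.get_metric_statistics(**metric_request(instance_id, name, now=now))
        results[name] = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return results
```

Against a real account, pass `boto3.client("cloudwatch")` and the Day 1 instance ID. CloudWatch returns datapoints unordered, which is why the helper sorts them by timestamp.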
02

Custom metric

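Push one from the CLI or from a Lambda function. Talk through what custom metrics cost (every unique combination of namespace, metric name, and dimensions is billed as a separate metric) and when they're worth it.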
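The cost discussion is mostly about cardinality, so here is a sketch that builds `PutMetricData` entries and counts how many distinct metrics a batch would create. It assumes boto3's `put_metric_data` with an injected client. The namespace `Workshop/Demo`, the metric `OrdersProcessed`, and the `Service`/`Env` dimensions in the usage note are hypothetical. From the CLI, the equivalent is `aws cloudwatch put-metric-data --namespace ... --metric-name ... --value ...`.

```python
def custom_metric_datum(name, value, unit="Count", **dimensions):
    """One PutMetricData entry; keyword dimensions become Name/Value pairs."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": str(v)} for k, v in sorted(dimensions.items())],
        "Value": float(value),
        "Unit": unit,
    }


def billable_metric_count(namespace, data):
    """Count distinct namespace + name + dimension-set combinations.

    Each unique combination is its own custom metric, so a high-cardinality
    dimension (user ID, request ID) multiplies the bill.
    """
    return len({
        (namespace, d["MetricName"], tuple((x["Name"], x["Value"]) for x in d["Dimensions"]))
        for d in data
    })


def push_custom_metrics(cloudwatch, namespace, data):
    """Send a batch of datapoints with an injected CloudWatch client."""
    if namespace.startswith("AWS/"):
        raise ValueError("the AWS/ namespace prefix is reserved for AWS services")
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=data)
```

Two datapoints for `OrdersProcessed` with `Service=checkout, Env=demo` count as one metric. Adding a third with `Service=search` makes two. Swap `Service` for a per-user ID and the count, and the cost, tracks your user base.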
03

Set up an alarm

CPU > 80% for 5 min → SNS email. Discuss alerting hygiene: what should page someone, and what shouldn't.
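"For 5 min" has to be expressed as a period times a number of evaluation periods. The sketch below assumes boto3's `put_metric_alarm` parameters and one-minute periods. One-minute EC2 datapoints require detailed monitoring. With basic monitoring, use `period=300`, which gives one evaluation period. `would_alarm` is a simplified local model of the M-out-of-N rule that ignores missing-data handling. The SNS topic ARN in the test is a placeholder.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=80.0, period=60, minutes=5):
    """PutMetricAlarm kwargs for 'average CPU above threshold for N minutes'."""
    if (minutes * 60) % period:
        raise ValueError("minutes must be a whole number of periods")
    periods = (minutes * 60) // period
    return {
        "AlarmName": f"cpu-high-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": periods,   # look at the last N periods...
        "DatapointsToAlarm": periods,   # ...and require every one of them to breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],    # SNS topic with a confirmed email subscription
        "TreatMissingData": "missing",
    }


def would_alarm(datapoints, threshold=80.0, evaluation_periods=5, datapoints_to_alarm=5):
    """Simplified model of M-out-of-N evaluation over the most recent periods."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm
```

Lowering `DatapointsToAlarm` below `EvaluationPeriods` (say 4 of 5) trades a little noise tolerance for faster detection. That is a concrete handle for the alerting-hygiene discussion. Email subscribers must confirm the SNS subscription before they receive anything.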
04

CloudTrail audit

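Find a single API call from Day 1's demo. Show retention (Event history keeps 90 days of management events; a trail delivering to S3 keeps them for as long as the bucket retains them), tamper evidence (log file integrity validation), and why this is the source of truth during incidents.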
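A sketch of finding that call through CloudTrail's `lookup_events` API, assuming boto3 and an injected client. It filters on one event name, follows `NextToken` pages, and parses the `CloudTrailEvent` JSON string to recover who made the call and from where. `RunInstances` in the test is only an example event name, and the helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

EVENT_HISTORY_DAYS = 90  # Event history covers the last 90 days of management events


def lookup_kwargs(event_name, days=1, now=None):
    """Build LookupEvents kwargs filtered on a single event name."""
    end = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": end - timedelta(days=min(days, EVENT_HISTORY_DAYS)),
        "EndTime": end,
    }


def find_events(cloudtrail, event_name, days=1, now=None):
    """Follow NextToken pages and pull who/when/where out of each event."""
    kwargs = lookup_kwargs(event_name, days, now)
    found = []
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            detail = json.loads(event["CloudTrailEvent"])  # full record, as a JSON string
            found.append({
                "name": event["EventName"],
                "time": event["EventTime"],
                "who": detail.get("userIdentity", {}).get("arn"),
                "from_ip": detail.get("sourceIPAddress"),
            })
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

Anything older than the Event history window has to come from a trail's S3 delivery, which is where retention and integrity validation enter the story.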
05

Auto Scaling Group

Create a launch template and an ASG that scales on CPU. Walk through min, max, and desired capacity and what each one controls.
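The three capacity numbers are easiest to read side by side. This sketch builds kwargs for boto3's `create_auto_scaling_group` and a target-tracking `put_scaling_policy` on average CPU. It also includes a small clamp showing that scaling never leaves the [min, max] range. The group name, launch template ID, subnet IDs, the 2/2/4 sizing, and the 50% target are hypothetical demo values. The `ValueError` mirrors the rule that desired capacity must sit between min and max.

```python
def asg_params(name, launch_template_id, subnet_ids, min_size=2, max_size=4, desired=2):
    """CreateAutoScalingGroup kwargs for a small multi-AZ demo group."""
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must be between min and max")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": min_size,          # floor: never scale in below this
        "MaxSize": max_size,          # ceiling: never scale out above this
        "DesiredCapacity": desired,   # what the group converges to right now
        "VPCZoneIdentifier": ",".join(subnet_ids),  # subnets in different AZs
    }


def cpu_target_tracking(asg_name, target_cpu=50.0):
    """PutScalingPolicy kwargs: keep the group's average CPU near target_cpu."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": target_cpu,
        },
    }


def clamp_capacity(requested, min_size, max_size):
    """Whatever a scaling policy asks for, the group stays within [min, max]."""
    return max(min_size, min(max_size, requested))
```

The scaling policy only ever moves desired capacity. Min and max are the guardrails it cannot cross, which is why `clamp_capacity(10, 2, 4)` is 4.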
06

Application Load Balancer

Attach the ASG to a target group so its instances register automatically. Show health checks, listener rules, and how the load balancer hides instance churn from clients.
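A sketch of the target group and its attachment, assuming boto3's `elbv2.create_target_group` parameters and `autoscaling.attach_load_balancer_target_groups`. The `/health` path, the port, and the interval and threshold values are hypothetical. `approx_detection_seconds` is a rough estimate (consecutive failed checks times the interval), not an AWS-published formula.

```python
def target_group_params(name, vpc_id, port=80, path="/health", interval=15, timeout=5,
                        healthy_threshold=3, unhealthy_threshold=2):
    """CreateTargetGroup kwargs (elbv2) with an explicit HTTP health check."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": path,
        "HealthCheckIntervalSeconds": interval,
        "HealthCheckTimeoutSeconds": timeout,
        "HealthyThresholdCount": healthy_threshold,
        "UnhealthyThresholdCount": unhealthy_threshold,
        "Matcher": {"HttpCode": "200"},
    }


def approx_detection_seconds(params):
    """Rough time to mark a dead target unhealthy: N failed checks, one per interval."""
    return params["HealthCheckIntervalSeconds"] * params["UnhealthyThresholdCount"]


def attach_to_asg(autoscaling, asg_name, target_group_arn):
    """After this, the ASG registers and deregisters its own instances."""
    autoscaling.attach_load_balancer_target_groups(
        AutoScalingGroupName=asg_name, TargetGroupARNs=[target_group_arn]
    )
```

By default an ASG judges health from EC2 status checks alone. Setting its `HealthCheckType` to `ELB` makes it also replace instances the target group reports as unhealthy, and the next step depends on that.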
07

Take an instance unhealthy

Mark an instance unhealthy and let the ASG replace it. The drama is the lesson: self-healing is engineered, and watching it happen live is the teaching moment.
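One scripted way to stage the failure is to flag an instance with boto3's `set_instance_health`, then poll `describe_auto_scaling_groups` until a replacement is in service. Stopping the app's health endpoint is a more realistic alternative. The sketch assumes injected clients. The polling interval and timeout are hypothetical demo values, and `sleep` is a parameter so the loop can be tested without waiting.

```python
import time


def mark_unhealthy(autoscaling, instance_id):
    """Tell the ASG an instance is unhealthy so it terminates and replaces it."""
    autoscaling.set_instance_health(
        InstanceId=instance_id, HealthStatus="Unhealthy", ShouldRespectGracePeriod=False
    )


def healthy_in_service(autoscaling, asg_name):
    """IDs of instances that are both InService and Healthy right now."""
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"][0]
    return {
        i["InstanceId"]
        for i in group["Instances"]
        if i["LifecycleState"] == "InService" and i["HealthStatus"] == "Healthy"
    }


def wait_for_replacement(autoscaling, asg_name, victim_id, desired,
                         poll_seconds=15, timeout_seconds=900, sleep=time.sleep):
    """Poll until the victim is gone and the group is back at desired capacity."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        ids = healthy_in_service(autoscaling, asg_name)
        if victim_id not in ids and len(ids) >= desired:
            return ids
        sleep(poll_seconds)
    raise TimeoutError(f"{asg_name} did not replace {victim_id} within {timeout_seconds}s")
```

While the loop runs, keep a browser or `curl` loop pointed at the load balancer. Requests keep succeeding as the healthy instances absorb the traffic, which is the point of the previous step.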
08

CloudWatch Logs Insights

Run a query against the application's log group. Show how observability becomes something you can investigate, not just something you can see.
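Logs Insights queries are asynchronous: you start one, then poll for results. This sketch assumes boto3's CloudWatch Logs `start_query` and `get_query_results` with an injected client. The query counts lines containing `ERROR` in 5-minute bins. The log group name used in the test (`/demo/app`) and the assumption that the app logs the word `ERROR` are hypothetical.

```python
import time

ERRORS_PER_5_MIN = """
fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as errors by bin(5m)
| sort errors desc
| limit 20
"""


def rows_to_dict(row):
    """Each result row is a list of {'field': ..., 'value': ...} cells."""
    return {cell["field"]: cell["value"] for cell in row}


def run_insights_query(logs, log_group, query, start_epoch, end_epoch,
                       poll_seconds=1, sleep=time.sleep):
    """Start a query, poll until it leaves Scheduled/Running, return rows as dicts."""
    query_id = logs.start_query(
        logGroupName=log_group, startTime=start_epoch, endTime=end_epoch, queryString=query
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    return [rows_to_dict(row) for row in response["results"]]
```

`startTime` and `endTime` are Unix epoch seconds. All result values come back as strings, so convert counts before doing arithmetic on them.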

// the four golden signals

From Google's Site Reliability Engineering book: start with these four when instrumenting any service:

latency: how slow
traffic: how busy
errors: how broken
saturation: how full

// discussion: which of the four are we not covering with this demo, and why?
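To make the four signals concrete, here is a self-contained model that summarises one time window of requests. The `Request` record, the choice of p99 for latency, counting only 5xx statuses as errors, and measuring saturation as in-flight requests over capacity are all assumptions for illustration. Real services often count timeouts or policy violations as errors too.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Summarise one window of requests as latency, traffic, errors, saturation."""
    n = len(requests)
    return {
        "latency_p99_ms": percentile([r.latency_ms for r in requests], 99) if n else None,  # how slow
        "traffic_rps": n / window_seconds,                                                    # how busy
        "error_rate": sum(r.status >= 500 for r in requests) / n if n else 0.0,              # how broken
        "saturation": in_flight / capacity,                                                   # how full
    }
```

With 100 requests over 50 seconds, latencies of 1 to 100 ms, five 5xx responses, and 30 of 40 worker slots busy, this reports a p99 of 99 ms, 2 requests per second, a 5% error rate, and 75% saturation.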

// what to take away

— Find any AWS metric, log, or audit event in under a minute
— Set an alarm and explain why it should fire
— Read an Auto Scaling Group config
— Justify what's missing from "good-enough" observability

