TDD, performance + non-functional tests, the andon cord

The pipeline is built. The pyramid is in shape. Now the operating discipline: write the test first, treat performance + non-functional concerns as first-class pipeline citizens, and stop the line when the build goes red.

// test-driven development · kent beck

Test-driven development (TDD) was developed by Kent Beck in the late 1990s as part of Extreme Programming. It has three steps, repeated in a loop:

01 · red

Write a failing test

Express what the code should do before it exists. The test fails because the behavior isn't built yet.

02 · green

Write the smallest code that passes

Minimum implementation that makes the test pass. Don't anticipate; just get to green.

03 · refactor

Clean up while tests stay green

Remove duplication, improve names, tighten boundaries — with the safety net of tests still passing.
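
A minimal sketch of one pass through the cycle, in Python. The leap-year example and every name in it (`spec`, `passes`, `is_leap_year_red`, `is_leap_year_green`, `is_leap_year`) are illustrative assumptions, not taken from Beck's material. Keeping three versions of the function side by side is only a way to show each step in a single file.

```python
def spec(is_leap_year):
    """The test, written before any implementation exists."""
    assert is_leap_year(2024) is True
    assert is_leap_year(2023) is False
    assert is_leap_year(1900) is False  # century year not divisible by 400
    assert is_leap_year(2000) is True   # century year divisible by 400


def passes(impl):
    """Run the spec against one implementation and report red/green."""
    try:
        spec(impl)
        return True
    except (AssertionError, NotImplementedError):
        return False


# 01 red: the behavior is not built yet, so the spec fails.
def is_leap_year_red(year):
    raise NotImplementedError


# 02 green: the smallest code that makes the spec pass, with no polish.
def is_leap_year_green(year):
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    if year % 4 == 0:
        return True
    return False


# 03 refactor: same behavior, simplified while the spec stays green.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    for impl in (is_leap_year_red, is_leap_year_green, is_leap_year):
        print(impl.__name__, "green" if passes(impl) else "red")
```

In real work there is one function, rewritten in place; the test is the part that stays fixed across all three steps.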

// automate as many manual tests as possible

"Although testing can be automated, creating quality cannot. To have humans executing tests that should be automated is a waste of human potential."

// Elisabeth Hendrickson, FlowCon 2013.

But automating unreliable tests is worse than not automating them. False positives (tests that fail even though the code is correct) waste time, stress out devs, and eventually get ignored. A small number of reliable tests beats a large number of flaky ones. Grow the suite over time, never at the expense of trust.

// performance tests in the pipeline

Performance problems are often invisible until production: missing database indexes, a code change that 10x's network calls, query plans that go non-linear under load. Catch these in the pipeline by running automated performance tests against the full stack (code, DB, storage, network, virtualization).
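
As a small illustration of catching a call-rate problem before production, the sketch below counts the SELECT statements one operation issues and compares the count to a budget. It assumes an in-memory SQLite database and uses the standard library's `Connection.set_trace_callback`; the schema, the two report functions, and the idea of a per-operation query budget are hypothetical.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with five customers and five orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer-{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i) for i in range(1, 6)])
    conn.commit()
    return conn


def count_selects(conn, action):
    """Run action(conn) and return how many SELECT statements it executed."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)
    return sum(1 for sql in seen if sql.lstrip().upper().startswith("SELECT"))


def report_one_query_per_order(conn):
    # One query for the orders, then one more per order for its customer.
    orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
    return [
        (oid, conn.execute("SELECT name FROM customers WHERE id = ?",
                           (cid,)).fetchone()[0])
        for oid, cid in orders
    ]


def report_joined(conn):
    # The same data fetched with a single joined query.
    return conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    budget = 2  # hypothetical budget of SELECTs allowed for this report
    for report in (report_one_query_per_order, report_joined):
        used = count_selects(conn, report)
        print(report.__name__, used, "ok" if used <= budget else "over budget")
```

A check like this runs in seconds and fails the build as soon as a change multiplies the calls made per request.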

catch · query plans

DB query time grows non-linearly (e.g., a missing index makes page load jump from 100 ms to 30 s).

catch · call-rate spikes

Code change 10x's DB calls, storage use, or network traffic per request.

catch · regressions

Fail the perf job when latency deviates >2% from the previous run.

enables · capacity planning

Knowing how the system behaves under realistic load = real capacity planning, not guessing.
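
The regression rule above (fail when latency deviates more than 2% from the previous run) could be sketched as a small gate. This is an illustration under assumptions: latencies are collected as millisecond samples, runs are compared by median, and the names `latency_change` and `perf_gate` are made up here. Some teams may prefer to fail only on slowdowns or to compare a high percentile instead.

```python
from statistics import median


def latency_change(previous_ms, current_ms):
    """Relative change of the median latency versus the previous run."""
    baseline = median(previous_ms)
    return (median(current_ms) - baseline) / baseline


def perf_gate(previous_ms, current_ms, tolerance=0.02):
    """Return (passed, change); passed is False when the deviation exceeds tolerance."""
    change = latency_change(previous_ms, current_ms)
    return abs(change) <= tolerance, change


if __name__ == "__main__":
    previous = [100, 102, 98, 101, 99]   # samples from the last green run
    current = [103, 105, 102, 104, 103]  # samples from this run
    passed, change = perf_gate(previous, current)
    print(f"passed={passed} change={change:+.1%}")
    raise SystemExit(0 if passed else 1)  # non-zero exit fails the perf job
```

A 2% threshold only works when run-to-run noise is smaller than that, so the test environment and load profile need to be stable.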

// non-functional / infrastructure tests

Many non-functional requirements (security, performance, availability) depend on the environment being configured correctly. If environments are defined as code with Terraform / Puppet / Chef / Ansible / Salt, the testing frameworks already used for application code can validate them too: encode environment assertions as Cucumber scenarios written in Gherkin so failures are readable. Typical assertions:

— Required OS packages and versions installed.
— Required services running.
— Network reachable as expected; no unintended open ports.
— TLS certs valid and not near expiry.
— No secrets in plain text on disk.
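
Three of the checks above, sketched in plain Python rather than Cucumber so the logic is visible. Everything here is an assumption for illustration: the function names, the 0.5 s timeout, and the very rough secret pattern, which a real scanner would replace. Only standard-library calls are used (`ssl.cert_time_to_seconds`, `socket.connect_ex`, `pathlib`).

```python
import re
import socket
import ssl
from pathlib import Path

# Rough heuristic only: a key that looks like a credential, followed by a value.
SECRET_PATTERN = re.compile(r"(?i)(password|secret|api[_-]?key)\s*[=:]\s*\S+")


def days_until_expiry(not_after, now_epoch):
    """Days left on a certificate, given its notAfter string and a reference time."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / 86400


def unexpected_open_ports(host, candidates, allowed, timeout=0.5):
    """Return candidate ports that accept TCP connections but are not allowed."""
    found = []
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0 and port not in allowed:
                found.append(port)
    return found


def files_with_plaintext_secrets(root):
    """Return names of files under root that match the secret heuristic."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and SECRET_PATTERN.search(path.read_text(errors="ignore")):
            hits.append(path.name)
    return hits
```

Each function maps to one readable assertion, for example "the certificate has more than 30 days left" or "no ports other than 22 and 443 are open", which is the level a Gherkin step would describe.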

// pull the andon cord when the pipeline breaks

In a Toyota plant, any worker can pull the andon cord to stop the line when they spot a defect. The whole team swarms to fix it before the problem moves downstream. Same rule for the deployment pipeline.

When the pipeline goes red:

— Stop adding new work on top of the broken build.
— The author of the breaking commit owns the fix; the team helps.
— No code goes downstream from a red build.
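
The three rules can also be written down as a merge and deploy gate. This is a hypothetical sketch, not the API of any CI system: `Change`, `fixes_build`, `may_merge`, and `may_deploy` are invented names, and how a change gets marked as a build fix is left to the team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author: str
    fixes_build: bool = False  # set when the change exists to repair the red build


def may_merge(build_is_green, change):
    """On a red build, only changes that repair the build may be merged."""
    if build_is_green:
        return True
    return change.fixes_build


def may_deploy(build_is_green):
    """Nothing moves downstream from a red build."""
    return build_is_green


if __name__ == "__main__":
    feature = Change(author="dana")
    fix = Change(author="sam", fixes_build=True)
    print(may_merge(False, feature), may_merge(False, fix), may_deploy(False))
```

The gate only enforces the rule; the swarming and shared ownership described above are still a team habit, not something code can provide.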

Randy Shoup (former Engineering Director, Google App Engine): the green build is the prerequisite for everything else. Letting it stay red turns every downstream deploy into a roll of the dice — by the time you stabilize, you're back in waterfall.

Knowledge Check

Question 1/2

A team has 800 flaky tests that fail at random. Devs are turning off CI checks to merge. What's the right move?

Question 2/2

Pipeline goes red Tuesday morning. The team agrees to "fix it Friday" and keeps merging features on top. What does the andon-cord principle predict?
