TDD, performance + non-functional tests, the andon cord
The pipeline is built. The pyramid is in shape. Now the operating discipline: write the test first, treat performance + non-functional concerns as first-class pipeline citizens, and stop the line when the build goes red.
Test-driven development (TDD) was developed by Kent Beck in the late 1990s as part of Extreme Programming. It has three steps, repeated forever:
Red: write a failing test. Express what the code should do before it exists. The test fails because the behavior isn't built yet.
Green: write the minimum implementation that makes the test pass. Don't anticipate; just get to green.
Refactor: remove duplication, improve names, tighten boundaries, all with the safety net of tests still passing.
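One pass through the three steps above might look like the sketch below. The `slugify` function and its expected behavior are hypothetical, chosen only to illustrate the cycle. The tests are plain pytest-style functions that use bare `assert`, so they also run without pytest installed.

```python
import re


# Step 2 (green): the simplest implementation that satisfies the tests below.
# Before this function existed, calling the tests raised NameError (red).
def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Step 1 (red): these tests were written first and describe the behavior.
def test_spaces_become_hyphens():
    assert slugify("Stop the Line") == "stop-the-line"


def test_punctuation_is_dropped():
    assert slugify("Red, Green, Refactor!") == "red-green-refactor"
```

Step 3 (refactor) would then change the body of `slugify` freely, rerunning both tests after each change to confirm they still pass.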
"Although testing can be automated, creating quality cannot. To have humans executing tests that should be automated is a waste of human potential."
// Elisabeth Hendrickson — Flowcon 2013.
But automating unreliable tests is worse than not automating them. False positives (tests that fail when the code is fine) waste time, stress out devs, and eventually get ignored. Better a small number of reliable tests than a large number of flaky ones. Grow the suite over time, never at the expense of trust.
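One possible way to find the unreliable tests is to rerun the suite several times against the same unchanged commit and flag any test whose outcome varies. The sketch below assumes results are available as `(test_name, passed)` pairs; that format and the function name `find_flaky_tests` are illustrative, not taken from any particular CI tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_flaky_tests(runs: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return names of tests that both passed and failed on the same code.

    `runs` holds (test_name, passed) results collected by rerunning the
    suite several times against one unchanged commit.
    """
    outcomes: Dict[str, Set[bool]] = defaultdict(set)
    for name, passed in runs:
        outcomes[name].add(passed)
    # A test that produced both True and False is nondeterministic.
    return sorted(name for name, seen in outcomes.items() if len(seen) == 2)


history = [
    ("test_login", True), ("test_login", False),    # mixed outcomes: flaky
    ("test_cart", True), ("test_cart", True),       # always passes
    ("test_search", False), ("test_search", False), # always fails: a real bug
]
flaky = find_flaky_tests(history)  # ["test_login"]
```

Tests flagged this way can be quarantined and repaired, while consistently failing tests such as `test_search` still block the build.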
Performance problems are often invisible until production: a missing database index, a code change that multiplies network calls tenfold, a query plan that goes non-linear under load. Catch these in the pipeline by running automated performance tests against the full stack (code, DB, storage, network, virtualization).
DB query time grows non-linearly (e.g., missing index — page load jumps from 100ms to 30s).
Code change 10x's DB calls, storage use, or network traffic per request.
Fail the perf job when latency deviates >2% from the previous run.
Knowing how the system behaves under realistic load = real capacity planning, not guessing.
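The 2% rule above can be sketched as a small gate function. This is a minimal illustration under stated assumptions: the function names are hypothetical, latency is summarized by the median of the samples, and a deviation in either direction counts as a failure (one reading of "deviates"; a team may prefer to fail only on slowdowns).

```python
from statistics import median
from typing import Sequence


def latency_regressed(previous_ms: float, current_ms: float,
                      tolerance: float = 0.02) -> bool:
    """True when current latency deviates from the previous run by more
    than `tolerance` (0.02 means 2%), in either direction."""
    if previous_ms <= 0:
        raise ValueError("previous_ms must be positive")
    deviation = abs(current_ms - previous_ms) / previous_ms
    return deviation > tolerance


def perf_job_exit_code(previous_samples: Sequence[float],
                       current_samples: Sequence[float]) -> int:
    """Return 1 (fail the job) if the median latency moved past the
    tolerance, else 0. The median dampens the effect of single outliers."""
    regressed = latency_regressed(median(previous_samples),
                                  median(current_samples))
    return 1 if regressed else 0
```

A pipeline step could pass the return value of `perf_job_exit_code` to `sys.exit` so that a regression turns the build red.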
Many non-functional requirements (security, performance, availability) depend on the environment being configured right. If env-as-code lives in Terraform / Puppet / Chef / Ansible / Salt, the same testing frameworks can validate it — encode environment assertions as Cucumber / Gherkin scenarios so failures are readable.
- Required OS packages and versions installed.
- Required services running.
- Network reachable as expected; no unintended open ports.
- TLS certs valid and not near expiry.
- No secrets in plain text on disk.
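Two of the checks above (open ports and certificate expiry) can be sketched as plain functions that return readable failure messages. This uses only the Python standard library rather than Cucumber. The function names, the allow-list input, and the 30-day expiry threshold are assumptions made for illustration. How the observed ports and the certificate's `notAfter` value are collected is left out.

```python
import ssl
from typing import Iterable, List, Set

SECONDS_PER_DAY = 86_400


def unintended_open_ports(observed: Iterable[int],
                          allowed: Iterable[int]) -> Set[int]:
    """Ports found listening that are not on the allow-list."""
    return set(observed) - set(allowed)


def cert_days_remaining(not_after: str, now_epoch: float) -> float:
    """Days until a certificate expires.

    `not_after` uses the format of the "notAfter" field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2030 GMT".
    """
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / SECONDS_PER_DAY


def environment_failures(observed_ports: Iterable[int],
                         allowed_ports: Iterable[int],
                         not_after: str,
                         now_epoch: float,
                         min_days: float = 30) -> List[str]:
    """Return one readable message per failed check; empty means pass."""
    failures = []
    extra = unintended_open_ports(observed_ports, allowed_ports)
    if extra:
        failures.append(f"unintended open ports: {sorted(extra)}")
    remaining = cert_days_remaining(not_after, now_epoch)
    if remaining < min_days:
        failures.append(f"TLS cert expires in {remaining:.0f} days")
    return failures
```

A pipeline stage would fail when `environment_failures` returns a non-empty list, printing each message so the cause is visible without digging through logs.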
In a Toyota plant, any worker can pull the andon cord to stop the line when they spot a defect. The whole team swarms to fix it before the problem moves downstream. Same rule for the deployment pipeline.
When the pipeline goes red:
- Stop adding new work on top of the broken build.
- The author of the breaking commit owns the fix; the team helps.
- No code goes downstream from a red build.
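The rules above could be enforced by a merge gate along the following lines. This is a sketch only: `MergeRequest`, `fixes_build`, and the two functions are hypothetical names, and a real pipeline would read the build status from its CI server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeRequest:
    author: str
    fixes_build: bool  # True when the change exists to repair the red build


def may_merge(request: MergeRequest, build_is_green: bool) -> bool:
    """On green, anything may merge. On red, only the fix goes in."""
    return build_is_green or request.fixes_build


def may_promote(build_is_green: bool) -> bool:
    """Nothing moves downstream from a red build."""
    return build_is_green


feature = MergeRequest(author="dev-a", fixes_build=False)
fix = MergeRequest(author="dev-b", fixes_build=True)
```

With the build red, `may_merge(feature, False)` is `False` while `may_merge(fix, False)` is `True`, so new work waits until the line is moving again.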
Randy Shoup (former Engineering Director, Google App Engine): the green build is the prerequisite for everything else. Letting it stay red turns every downstream deploy into a roll of the dice — by the time you stabilize, you're back in waterfall.
A team has 800 flaky tests that fail at random. Devs are turning off CI checks to merge. What's the right move?
Pipeline goes red Tuesday morning. The team agrees to "fix it Friday" and keeps merging features on top. What does the andon-cord principle predict?