Skip to content

Test Suite Maintenance

Tests are part of the codebase — reviewed, maintained, pruned. A flaky test is a test that has stopped serving the chain. Either fix it or delete it; never just retry it.

Owners: QA, Developer, Tech Lead Phase it lives in: How We Build (Volume IV) The corpus principle this enacts: Technical debt is unresolved decisions in the chain — including in the test suite.

Where it lives in the chain

How to do this

The test suite needs the same disciplines as the codebase:

  1. Flaky tests get a 3-strike rule. A test that fails intermittently is logged. Three fails in a month → either fixed or deleted with a follow-up scenario in the next amigos. A retried flaky test is a chain that has trained itself to ignore failure.
  2. Tests are pruned with the code. When a feature is removed or refactored, its tests are removed in the same PR. Orphan tests for dead code are noise the next reader has to parse.
  3. Test names match scenarios. When the brief's vocabulary changes, the test name changes. gradeExam_clamps_negative_balance_at_zero is meaningful three years later; test_grade_3 is not.
  4. The full suite stays under a budget. Unit < 60s. Integration < 5min. Regression < 15min. A suite that takes 40 minutes is a suite developers stop running locally.

What good practice looks like

The team's test suite has a known shape"230 unit tests, 45 integration scenarios, 12 critical paths, 8 visual baselines per major component." Growth is named and tracked. Adding 20 unit tests in a single PR is reviewed for whether all 20 add value, not waved through because "more tests is better."

A flaky test is treated as a chain-aware signal. A test that passes 95% of the time is testing something the system actually does 95% of the time — the test caught a real bug, but the team has trained itself to retry past it. The chain-aware fix is to understand why the variance exists, then either stabilise the system or stabilise the test against the variance.

The discipline: the test suite serves the chain, not the other way around. A team that adds tests for every commit ends up with a suite that takes 30 minutes and produces dozens of flaky failures every week. Curating beats accumulating.

200apps · How We Work · NWIRE