Quality & Testing · master area
Test Suite Maintenance
Tests are part of the codebase — reviewed, maintained, pruned. A flaky test is a test that has stopped serving the chain. Either fix it or delete it; never just retry it.
Owners: QA, Developer, Tech Lead
Phase it lives in: How We Build (Volume IV)
The corpus principle this enacts: Technical debt is unresolved decisions in the chain, including in the test suite.
Where it lives in the chain
- How We Build · Testing as Chain Verification — the canon
How to do this
The test suite needs the same disciplines as the codebase:
- Flaky tests get a 3-strike rule. Every intermittent failure is logged. Three failures in a month → the test is either fixed, or deleted with a follow-up scenario in the next amigos. A retried flaky test is a chain that has trained itself to ignore failure.
- Tests are pruned with the code. When a feature is removed or refactored, its tests are removed in the same PR. Orphan tests for dead code are noise the next reader has to parse.
- Test names match scenarios. When the brief's vocabulary changes, the test name changes. `gradeExam_clamps_negative_balance_at_zero` is meaningful three years later; `test_grade_3` is not.
- The full suite stays under a budget. Unit < 60s. Integration < 5min. Regression < 15min. A suite that takes 40 minutes is a suite developers stop running locally.
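The 3-strike rule can be kept honest with a small log rather than memory. The following is a minimal sketch, not a prescribed tool: the `FlakyLog` class and its method names are hypothetical, and "a month" is assumed here to mean a rolling 30-day window.

```python
from datetime import date, timedelta

STRIKE_LIMIT = 3                # three failures trigger a decision
WINDOW = timedelta(days=30)     # assumption: "a month" read as a rolling 30 days


class FlakyLog:
    """Hypothetical log of intermittent failures, keyed by test name."""

    def __init__(self):
        self._failures = {}     # test name -> list of failure dates

    def record_failure(self, test_name, when):
        """Log one intermittent failure of test_name on the given date."""
        self._failures.setdefault(test_name, []).append(when)

    def strikes(self, test_name, today):
        """Count failures of test_name inside the window ending today."""
        cutoff = today - WINDOW
        return sum(
            1 for d in self._failures.get(test_name, [])
            if cutoff < d <= today
        )

    def needs_decision(self, today):
        """Names of tests that must now be fixed or deleted, not retried."""
        return sorted(
            name for name in self._failures
            if self.strikes(name, today) >= STRIKE_LIMIT
        )
```

A CI job could call `record_failure` whenever a test fails and then passes on an unchanged commit, and the output of `needs_decision` could feed the agenda of the next amigos.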
What good practice looks like
The team's test suite has a known shape — "230 unit tests, 45 integration scenarios, 12 critical paths, 8 visual baselines per major component." Growth is named and tracked. Adding 20 unit tests in a single PR is reviewed for whether all 20 add value, not waved through because "more tests is better."
A flaky test is treated as a chain-aware signal. A test that passes 95% of the time is often testing something the system actually does 95% of the time: the test may have caught a real bug, but the team has trained itself to retry past it. The chain-aware fix is to understand why the variance exists, then either stabilise the system or stabilise the test against the variance.
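Understanding the variance usually starts with measuring it. The sketch below is one possible way to do that: it reruns a test function many times and reports the observed pass rate. Both `pass_rate` and `make_flaky_test` are hypothetical helpers, and the simulated test fails on every 20th call only so that the example is deterministic. Real flakiness would come from timing, ordering, or shared state.

```python
def pass_rate(test_fn, runs=200):
    """Run test_fn repeatedly; return the fraction of runs that passed."""
    passed = 0
    for _ in range(runs):
        try:
            test_fn()
            passed += 1
        except AssertionError:
            pass  # counted as a failure; deliberately not retried
    return passed / runs


def make_flaky_test(fail_every=20):
    """Build a stand-in test that fails on every fail_every-th call."""
    calls = {"n": 0}

    def test():
        calls["n"] += 1
        assert calls["n"] % fail_every != 0, "simulated intermittent failure"

    return test


rate = pass_rate(make_flaky_test(fail_every=20), runs=200)
print(f"observed pass rate: {rate:.0%}")  # prints "observed pass rate: 95%"
```

A measured rate does not explain the variance, but it turns "sometimes fails" into a number that can be compared before and after a fix.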
The discipline: the test suite serves the chain, not the other way around. A team that adds tests for every commit ends up with a suite that takes 30 minutes and produces dozens of flaky failures every week. Curating beats accumulating.
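The time budgets named under "How to do this" (unit under 60s, integration under 5min, regression under 15min) can be checked mechanically, so a slow suite is noticed before developers stop running it. A minimal sketch, assuming the per-layer durations are already collected by whatever runner the team uses; the `over_budget` function and the layer names as dictionary keys are illustrative choices.

```python
# Budgets from this page, in seconds. Each layer must finish strictly under its limit.
BUDGETS_SECONDS = {
    "unit": 60,
    "integration": 5 * 60,
    "regression": 15 * 60,
}


def over_budget(durations):
    """Return the layers whose measured duration meets or exceeds the budget.

    durations maps a layer name to its wall-clock time in seconds.
    Layers without a measurement are skipped.
    """
    return sorted(
        layer
        for layer, limit in BUDGETS_SECONDS.items()
        if layer in durations and durations[layer] >= limit
    )


measured = {"unit": 42.0, "integration": 410.0, "regression": 640.0}
print(over_budget(measured))  # prints ['integration']
```

A non-empty result is a prompt to prune or speed up that layer, not to raise the limit.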
Related crafts
- Unit Testing — the layer most at risk of bloat
- Regression Testing — the layer most at risk of staleness