
Testing Layers

Unit, contract, integration, visual regression — each layer for a different gap.

Testing is not a uniform activity. Different layers catch different mistakes. The corpus pattern is to use each layer for what it is good at and not lean on one layer to do the work of another.

The layers

| Layer | What it catches | Authored by | Frequency |
| --- | --- | --- | --- |
| Unit | Logic errors inside a function | Developer | Every story |
| Contract | Boundary errors between caller and callee | Developer | Every API/service boundary |
| Integration | Wiring errors across modules | Developer + QA | Every Epic |
| End-to-end (Gherkin) | The whole-flow scenario | QA writes, Developer implements | Every story's amigos output |
| Visual regression | Unintended UI changes | Designer + QA | Every UI change |
| Accessibility | Keyboard, screen reader, contrast | Designer + QA | Every UI change |
| Performance / load | SLO breaches under expected load | Tech Lead + QA | Per Epic when ility-relevant |
| Exploratory | The unknown unknowns | QA | Pre-merge |

Unit tests

Smallest unit of test. Single function or module. Mocks out the world. Runs in milliseconds. The corpus rule: a story with no unit tests is a story whose logic was never written down twice. Twice is the discipline — once in the implementation, once in the test that proves it does what was named.

Unit tests are not the place for Gherkin scenarios that span multiple modules. Those go in integration or end-to-end.
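A minimal sketch of the discipline: the logic written once in the implementation, once in the test that proves it does what was named. The `truncate_name` helper and its behavior are hypothetical, not from the corpus.

```python
# Hypothetical unit under test: a display-name truncation helper.
def truncate_name(name: str, limit: int = 80) -> str:
    """Truncate a display name to `limit` characters, appending an ellipsis."""
    if limit < 1:
        raise ValueError("limit must be positive")
    if len(name) <= limit:
        return name
    return name[: limit - 1] + "…"

# Unit tests: one behavior per test, no I/O, milliseconds to run.
def test_short_name_unchanged():
    assert truncate_name("Mira") == "Mira"

def test_long_name_truncated_to_limit():
    out = truncate_name("x" * 84, limit=80)
    assert len(out) == 80 and out.endswith("…")

def test_nonpositive_limit_rejected():
    try:
        truncate_name("Mira", limit=0)
        assert False, "expected ValueError"
    except ValueError:
        pass
```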

Contract tests

Boundary tests. Given this caller sends this shape, the service returns this shape; given this caller sends a malformed shape, the service returns this error.

The corpus rule: every API the project exposes has at least one contract test per endpoint per documented response. The contract tests are derived from the API contract written in Volume III Part 7.
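The shape-checking idea can be sketched without any framework. The endpoint, field names, and schemas below are hypothetical; a real project would derive them from the API contract in Volume III Part 7.

```python
# A contract-test sketch: assert the response *shape*, not the exact values.
def check_shape(payload: dict, schema: dict) -> list:
    """Return a list of contract violations (empty means the shape holds)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(payload[field]).__name__}")
    return errors

# Documented 200 response for a hypothetical GET /users/{id}
USER_SCHEMA = {"id": str, "display_name": str, "created_at": str}
# Documented 404 error shape
ERROR_SCHEMA = {"error": str, "code": int}

ok_response = {"id": "u1", "display_name": "Mira", "created_at": "2026-05-22"}
err_response = {"error": "not found", "code": 404}

assert check_shape(ok_response, USER_SCHEMA) == []    # happy path holds
assert check_shape(err_response, ERROR_SCHEMA) == []  # error path holds
```

One such check per endpoint per documented response is the rule; the malformed-input case is the same check run against the documented error shape.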

Integration tests

Wire several modules together. Often spin up real dependencies — a real database, a real Redis, a stubbed third-party. Catch the wiring mistakes that no unit test sees.

Integration tests are slower. The corpus pattern: write enough of them to cover the Epic's main flows; do not write so many that the pipeline becomes painful.
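A sketch of the wiring idea, with an in-memory SQLite database standing in for "a real database." The functions and table are illustrative; the point is that the data crosses a real persistence boundary that no unit test with mocks would exercise.

```python
import sqlite3

def save_name(conn, user_id, name):
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))

def load_name(conn, user_id):
    row = conn.execute("SELECT name FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return row[0] if row else None

def test_name_round_trips_through_real_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
    save_name(conn, "u1", "מירה")           # Hebrew name, per the story
    assert load_name(conn, "u1") == "מירה"  # survives SQL encode/decode
    conn.close()
```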

End-to-end / Gherkin

The Gherkin scenarios from amigos (Volume III Part 5) become e2e tests. Run via a browser driver against the real frontend, real backend, real database. They are the slowest, most fragile, and most valuable.

The corpus pattern: every story has at least the required-for-prediction Gherkin scenarios as e2e. Negative cases as e2e where they cross system boundaries. The Gherkin lives next to the story in source control; the test code is generated from or aligned with the Gherkin.
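A scenario in that shape might look like the following. The feature and wording are hypothetical, modeled on the GRD-142 story that appears in the QA report below.

```gherkin
Feature: Hebrew name support
  Scenario: Hebrew name renders correctly on first load
    Given a user whose display name is "מירה"
    When the queue page loads for the first time
    Then the name is rendered right-to-left without substitution glyphs
```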

Visual regression

For UI work, the rendered output is compared against an approved baseline. Failures surface as image diffs. The Designer is the approver — "yes, that is the intended change" or "no, that is a regression."

The baselines come from Figma frames. Every named state has a baseline image. The Designer can update baselines when they intended the change.
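The comparison itself can be sketched as a diff ratio against a threshold. Real tools diff rendered PNGs with perceptual tolerances; this byte-level version, with a made-up threshold, only shows the shape of the check.

```python
def diff_ratio(baseline: bytes, rendered: bytes) -> float:
    """Fraction of differing bytes between two equal-sized framebuffers."""
    if len(baseline) != len(rendered):
        return 1.0  # a size change is always a diff
    if not baseline:
        return 0.0
    changed = sum(a != b for a, b in zip(baseline, rendered))
    return changed / len(baseline)

THRESHOLD = 0.001  # hypothetical: fail if more than 0.1% of pixels changed

baseline = bytes([200] * 1000)
unchanged = bytes([200] * 1000)
regressed = bytes([200] * 990 + [10] * 10)

assert diff_ratio(baseline, unchanged) == 0.0
assert diff_ratio(baseline, regressed) > THRESHOLD  # surfaces as an image diff
```

A failure above the threshold is what lands on the Designer's desk as an image diff to approve or reject.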

Accessibility tests

Automated checks for contrast, ARIA, keyboard reachability. Manual checks for screen reader output and keyboard flow. Both run pre-merge for any UI change.
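The contrast check is fully automatable. This sketch implements the WCAG 2.x relative-luminance formula; the AA threshold for normal text is a contrast ratio of at least 4.5:1.

```python
def _channel(c: int) -> float:
    """Linearize one sRGB channel (0-255) per the WCAG 2.x formula."""
    s = c / 255.0
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

def luminance(rgb) -> float:
    r, g, b = (_channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum, 21:1.
assert round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1) == 21.0
# #767676 on white just clears the 4.5:1 AA bar for normal text.
assert contrast_ratio((118, 118, 118), (255, 255, 255)) >= 4.5
```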

The corpus rule: accessibility failures block merge unless explicitly accepted as a known issue with a remediation date. "We'll fix it later" without a date is the chain failing the discipline.

Performance / load

For Epics where ility selection (Volume III Part 8) names performance as material, a load test runs against staging. The test is shaped against the expected load profile.

The corpus pattern: load tests are prediction-checked, like everything else. "We expect the new endpoint to hold p95 under 200ms at 200 RPS" — checked.
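The prediction check itself reduces to a percentile over collected samples. The latency numbers below are fabricated purely to show the check; a real run would collect them from staging under the expected load profile.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100)
    return ordered[int(rank) - 1]

# Hypothetical per-request latencies (ms) from a staging run at 200 RPS.
latencies_ms = ([120] * 5 + [140] * 5 + [160] * 5 + [180] * 3 + [190, 250])

p95 = percentile(latencies_ms, 95)
assert p95 <= 200, f"SLO breach: p95={p95}ms exceeds predicted 200ms"
```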

Exploratory testing

The QA, with the brief and the journey map open, uses the feature like the named person would. Outside the scenarios. Looking for the moments that nobody named.

The corpus pattern: every pre-merge QA includes 30+ minutes of exploratory testing on the major stories. The output is the QA report (next section).

Pre-merge QA verification

Before a PR can merge, the QA verifies on the branch:

  • Gherkin scenarios from amigos pass.
  • Both flag-on and flag-off paths work.
  • Edge cases the QA imagined during exploration.
  • Accessibility baseline holds.

The verification is a checked artifact, not a Slack message. The QA writes a short report — what was tested, what was explored, what surprised. The report lives next to the PR.

QA report

The artifact at the end of pre-merge QA.

QA Report — PR #482 (GRD-142 Hebrew name support)
QA: Mira
Date: 2026-05-22

Tested (Gherkin):
  ✅ Hebrew name renders correctly on first load
  ✅ Mixed-form Hebrew name renders correctly
  ✅ Rare unicode form falls back gracefully
  ✅ Edit attempt is now disabled (story explicitly removes the workaround)

Explored:
  - Tried 12 names with various unicode forms; all rendered
  - Tried with extreme name length (84 chars); rendered with truncation
  - Tried Hebrew name in queue search (passes; bonus discovery)
  - Tried with screen reader; name read correctly

Surprises:
  - Unicode-fallback log line is duplicated when the same name is
    rendered twice on the same page. Filed GRD-148 (P3, cosmetic).

Not tested:
  - Bulk export view (out of scope per brief)

Accessibility:
  ✅ Contrast unchanged
  ✅ Keyboard nav unchanged
  ✅ Screen reader reads names correctly (NVDA, VoiceOver)

Visual regression:
  ✅ One intended change accepted (queue row height +2px for RTL names)

Pre-merge: APPROVED

The report is what the chain reads later — at signal reading, at postmortem, at retrospective. It is the QA's structured record of what they witnessed.

Test maintenance

Tests that are flaky, slow, or wrong are themselves chain debt. The corpus pattern: when a test is repeatedly failing for the wrong reasons, the test is fixed or deleted, not retried.
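"Repeatedly failing for the wrong reasons" is detectable from CI history: a test that both passes and fails across recent runs of the same code is flaky. The data shape here is hypothetical.

```python
def flaky_tests(history: dict) -> list:
    """Return tests whose recent runs contain both passes and failures."""
    return sorted(
        name for name, runs in history.items()
        if True in runs and False in runs
    )

# Hypothetical pass/fail history for the last four CI runs of one commit.
history = {
    "test_hebrew_name_renders": [True, True, True, True],      # solid
    "test_queue_search":        [True, False, True, False],    # flaky: fix or delete
    "test_bulk_export":         [False, False, False, False],  # honestly failing
}
assert flaky_tests(history) == ["test_queue_search"]
```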

A test suite is part of the codebase. It is reviewed, maintained, and pruned. A 4,000-test suite that nobody trusts is worse than a 400-test suite that is solid.

Part 6 — Release Gate →

200apps · How We Work · NWIRE