Release & Communication · master area
Gradual Rollout
Pilot → 1% → 5% → 25% → 100%. Named exit criteria at each step. The dashboard tells the team whether to advance, hold, or revert.
Owners: PO, Tech Lead Phase it lives in: How We Build (Volume IV) The corpus principle this enacts: The check is observation — and observation at 1% is cheaper than observation at 100%.
Where it lives in the chain
- How We Build · Gradual rollout and flag setup — the canon
How to do this
For every release behind a flag, the rollout plan names:
- Stages and audiences. "Pilot 5 schools → 1% all schools → 5% → 25% → 100%."
- Exit criteria per stage. "Advance to next when: error rate within SLO for 24h, adoption above 60% of available users, no P0/P1 bugs filed."
- Soak time per stage. "24 hours minimum at each percentage. Pilot stage 1 week."
- Watch dashboards. Named, owned, reviewed at each step. Same dashboards the first-48-hours discipline uses.
- The revert condition. "If any of: SLO breach >5 min, data integrity concern, P0 incident — flag off immediately, investigate in staging."
What good practice looks like
A team rolling out the grading shortcut:
| Stage | Audience | Soak | Exit criteria |
|---|---|---|---|
| Pilot | 5 schools, opted in | 1 week | Five PO reads of the dashboard; one round of pilot feedback; zero P0/P1 bugs. |
| 1% | random sample | 24h | Adoption ≥40%, error rate within SLO, no helpdesk pattern. |
| 5% | random sample | 24h | Same metrics, broader sample. |
| 25% | random sample | 24h | Same metrics; load profile matches expected. |
| 100% | all | — | Soak for 2 weeks before flag removal; signal reading at +30 days. |
The PO does not advance based on absence of bad news. Advance is an active decision, based on the dashboard. If nobody watched the dashboard, the answer is "we don't know yet" — and the stage holds.
The wallet bug surfaced at staging exploratory, before any rollout. The JWT outage bypassed gradual rollout entirely — deployed to all environments simultaneously. That bypass is the lesson: gradual rollout exists for exactly the changes the team is most sure of, because those are the changes that surprise.
Related crafts
- Feature Flag Platform — the mechanism
- Release Gate Checklist — what must be in place before stage 1
- The First 48 Hours — the discipline applied at 100%