Release & Communication · master area

Gradual Rollout

Pilot → 1% → 5% → 25% → 100%. Named exit criteria at each step. The dashboard tells the team whether to advance, hold, or revert.

Owners: PO, Tech Lead Phase it lives in: How We Build (Volume IV) The corpus principle this enacts: The check is observation — and observation at 1% is cheaper than observation at 100%.

Where it lives in the chain

How We Build · Gradual rollout and flag setup — the canon

How to do this

For every release behind a flag, the rollout plan names:

Stages and audiences. "Pilot 5 schools → 1% all schools → 5% → 25% → 100%."
Exit criteria per stage. "Advance to next when: error rate within SLO for 24h, adoption above 60% of available users, no P0/P1 bugs filed."
Soak time per stage. "24 hours minimum at each percentage. Pilot stage 1 week."
Watch dashboards. Named, owned, reviewed at each step. Same dashboards the first-48-hours discipline uses.
The revert condition. "If any of: SLO breach >5 min, data integrity concern, P0 incident — flag off immediately, investigate in staging."

What good practice looks like

A team rolling out the grading shortcut:

Stage	Audience	Soak	Exit criteria
Pilot	5 schools, opted in	1 week	Five PO reads of the dashboard; one round of pilot feedback; zero P0/P1 bugs.
1%	random sample	24h	Adoption ≥40%, error rate within SLO, no helpdesk pattern.
5%	random sample	24h	Same metrics, broader sample.
25%	random sample	24h	Same metrics; load profile matches expected.
100%	all	—	Soak for 2 weeks before flag removal; signal reading at +30 days.

The PO does not advance based on absence of bad news. Advance is an active decision, based on the dashboard. If nobody watched the dashboard, the answer is "we don't know yet" — and the stage holds.

The wallet bug surfaced at staging exploratory, before any rollout. The JWT outage bypassed gradual rollout entirely — deployed to all environments simultaneously. That bypass is the lesson: gradual rollout exists for exactly the changes the team is most sure of, because those are the changes that surprise.

Feature Flag Platform — the mechanism
Release Gate Checklist — what must be in place before stage 1
The First 48 Hours — the discipline applied at 100%

Gradual Rollout ​

Where it lives in the chain ​

How to do this ​

What good practice looks like ​

Related crafts ​

Gradual Rollout

Where it lives in the chain

How to do this

What good practice looks like

Related crafts