Rollback Discipline
Four levels — flag, deploy, migration, data. Plan for the level you might need before you ship. "Rollback possible" is not a plan. A confirmed time is.
- Owners: Tech Lead, On-call
- Phase it lives in: How We Build → After We Build
- The corpus principle this enacts: "Rollback possible" is not. A confirmed time is.
Where it lives in the chain
- How We Build · The Release · four-level rollback — the canon
The four levels
| Level | When | Cost | Time |
|---|---|---|---|
| 1. Flag off | The change was flag-wrapped. | Lowest. | Seconds. |
| 2. Deploy rollback | The change is in deployed code. | Low. Last known good. | Minutes. |
| 3. Migration rollback | The change altered schema or data shape. | Medium. Only if migration was authored to be reversible. | Minutes-to-hours. |
| 4. Data correction | The change corrupted or lost data. | Highest. Always after bleeding has stopped, never during. | Hours-to-days. |
How to do this
For every change, the rollback level is named before the change ships. The release brief includes:
- Rollback level required — "This is flag-wrapped; rollback at level 1."
- Rollback rehearsed in staging — "Yes, last week; took 2 minutes."
- Confirmed time — "From decision-to-rollback to recovered state: 5 minutes."
If the rollback level is 3 or 4, the change has elevated risk and warrants additional review at the release gate.
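The release-gate rules above, encoded as a check a CI step could run over a release brief. This is an added example, not tooling from the page or its team; the brief's dict shape (keys such as `rollback_level`, `flag`, `alters_schema`, `migration_reversible`, `rehearsed_in_staging`, `confirmed_time_minutes`) and the sample briefs are constructed for this example.

```python
"""Release-gate check for a rollback plan (example, not the page's tooling).

Assumed brief shape (constructed; the page specifies no file format):
  rollback_level (1-4), flag, alters_schema, migration_reversible,
  rehearsed_in_staging, confirmed_time_minutes.
"""
from dataclasses import dataclass, field

ELEVATED_RISK_LEVELS = {3, 4}  # migration rollback, data correction


@dataclass
class GateResult:
    blocked: bool            # brief fails the gate; change must not ship
    elevated_review: bool    # level 3 or 4: extra review at the release gate
    findings: list = field(default_factory=list)


def check_rollback_plan(brief: dict) -> GateResult:
    findings = []
    level = brief.get("rollback_level")
    if level not in (1, 2, 3, 4):
        findings.append("rollback level not named (must be 1-4)")
    # Level 1 only exists if the flag is actually wired.
    if level == 1 and not brief.get("flag"):
        findings.append("level 1 claimed but no feature flag named")
    # A schema change cannot be undone by a level 2 deploy rollback alone.
    if brief.get("alters_schema") and level in (1, 2):
        findings.append("change alters schema; plan at level 3 or above")
    # Level 3 only exists if the migration was authored to be reversible.
    if level == 3 and not brief.get("migration_reversible"):
        findings.append("level 3 claimed but migration is not reversible")
    if not brief.get("rehearsed_in_staging"):
        findings.append("rollback not rehearsed in staging")
    # "Rollback possible" is not a plan; a confirmed number of minutes is.
    t = brief.get("confirmed_time_minutes")
    if isinstance(t, bool) or not isinstance(t, (int, float)) or t <= 0:
        findings.append("no confirmed decision-to-recovered time")
    return GateResult(bool(findings), level in ELEVATED_RISK_LEVELS, findings)


# Constructed sample briefs (not real release records).
FLAG_WRAPPED_BRIEF = {
    "rollback_level": 1, "flag": "new_token_validation",
    "rehearsed_in_staging": True, "confirmed_time_minutes": 5,
}
WE_CAN_ROLL_BACK_BRIEF = {  # "we can roll back", no level thought through
    "rollback_level": 2, "alters_schema": True, "rehearsed_in_staging": False,
}

if __name__ == "__main__":
    for name, b in (("flag-wrapped", FLAG_WRAPPED_BRIEF),
                    ("we-can-roll-back", WE_CAN_ROLL_BACK_BRIEF)):
        print(name, check_rollback_plan(b))
```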
What good practice looks like
The JWT outage: detection 4 minutes, root cause 8 minutes, revert 5 minutes. 17 minutes total. The rollback was at level 2 (deploy rollback) because the change was not flag-wrapped. Had it been flag-wrapped — level 1 — the recovery would have been seconds and 12,400 users would not have been locked out. The lesson: anything that can lock users out deserves the flag.
A team that says "we can roll back" without naming the level or the time is a team that has not yet planned the rollback. On the day, that team discovers the migration is one-way; the flag was never wired; the deploy rolled back to a state the migration left incompatible. The four-level discipline forces the team to name what they are committing to before they need it.
Related crafts
- Runbooks — where each rollback level's procedure is written down
- Migration Design — where reversibility is built in
- Incident Management — where rollback is exercised