Pipeline & Operations · master area
Environment Management
Dev → staging → production. Three environments, three purposes, three rules. Test in staging unless the alternative is impossible. The JWT outage is the cautionary tale for the rule.
Owners: Tech Lead, DevOps
Phase it lives in: How We Build (Volume IV)
The corpus principle this enacts: Configuration changes are code changes.
Where it lives in the chain
- How We Build · The Pipeline — three environments, trunk-based
The three environments
| Environment | Purpose | Rules |
|---|---|---|
| Dev | Developers iterate locally and on shared dev. | Move fast. Reset freely. Data is synthetic or anonymised. Failure is cheap. |
| Staging | Pre-production verification — same shape as prod, smaller scale. | Migrations run here first. Load tests run here. Soak time before promotion. Data is anonymised production sample or seeded. |
| Production | The users live here. | Changes arrive via the pipeline, never directly. Migrations are tested in staging first. Rollback is rehearsed, not improvised. |
How to do this
- Same code in all three. If staging passes and production fails, the difference is data, config, or scale — not code.
- Migrations follow dev → staging → prod, with a soak window in each. Skipping staging is what produced the JWT outage — the 6-line XML change deployed to all environments simultaneously.
- Production access is read-only by default. Write access is paged-in for incidents and signed-off, not casual.
- Each environment has its own runbooks. "Reset dev" is one document; "Restore prod from backup" is another, and they share nothing.
What good practice looks like
A migration PR's lifecycle:
- Dev — local migration runs; developer iterates.
- Staging — migration runs in CI against staging snapshot; soak for 30 min; load test confirms SLO.
- Production — migration runs at low-traffic window; rollback rehearsed in staging same week; monitoring watched for 48 hours.
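The lifecycle above can be enforced as a pipeline gate that runs before the production step. This is an added example, not code from this handbook's own pipeline: the `Run` record, the `production_blockers` function, the commit id and all sample data are constructed, and "same week" is read as seven days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STAGING_SOAK = timedelta(minutes=30)     # "soak for 30 min"
REHEARSAL_MAX_AGE = timedelta(days=7)    # "same week", read as 7 days (assumption)

@dataclass
class Run:                               # hypothetical pipeline record
    env: str                             # "dev" or "staging"
    commit: str                          # same code in every environment
    finished: datetime
    ok: bool
    load_test_ok: bool = False           # staging only: load test met the SLO

def production_blockers(commit, runs, rollback_rehearsed_at, now):
    """Return reasons `commit` must not reach production; empty list means go."""
    latest = {}
    for r in runs:                       # newest successful run per environment
        if r.commit == commit and r.ok:
            if r.env not in latest or r.finished > latest[r.env].finished:
                latest[r.env] = r
    blockers = [f"no successful {env} run for {commit}"
                for env in ("dev", "staging") if env not in latest]
    stg = latest.get("staging")
    if stg:
        if "dev" in latest and latest["dev"].finished > stg.finished:
            blockers.append("staging ran before dev")
        if now - stg.finished < STAGING_SOAK:
            blockers.append("staging soak under 30 min")
        if not stg.load_test_ok:
            blockers.append("staging load test did not confirm SLO")
    if rollback_rehearsed_at is None or now - rollback_rehearsed_at > REHEARSAL_MAX_AGE:
        blockers.append("rollback not rehearsed in staging this week")
    return blockers

# Constructed sample data: a clean promotion, a skipped staging, an all-at-once deploy
NOW = datetime(2024, 5, 10, 3, 0)
REHEARSED = NOW - timedelta(days=2)
GOOD = [Run("dev", "abc123", NOW - timedelta(hours=5), True),
        Run("staging", "abc123", NOW - timedelta(hours=1), True, load_test_ok=True)]
SKIPPED = [Run("dev", "abc123", NOW - timedelta(hours=5), True)]
SIMULTANEOUS = [Run("dev", "abc123", NOW, True), Run("staging", "abc123", NOW, True)]

if __name__ == "__main__":
    for name, runs in [("good", GOOD), ("skipped", SKIPPED), ("simultaneous", SIMULTANEOUS)]:
        print(name, production_blockers("abc123", runs, REHEARSED, NOW))
```

The `SIMULTANEOUS` case models a change pushed to every environment at once: a staging run exists, but with zero soak time and no load test the gate still blocks it.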
A team that skips environments to ship faster is a team that pays for the skip in incidents. The cost of staging is a few hours; the cost of an outage that staging would have caught is days.
Related crafts
- Developer Experience (DX) — what dev should feel like
- CI/CD Pipeline — what runs across environments
- Rollback Discipline — the path back