Post-Release & Learning · master area
Escalation
Escalation is information flow, not blame flow. When you escalate, you say: "This problem has exceeded my ability to resolve it within the expected timeframe." Three principles. One rule.
- Owners: On-call, Leadership, Tech Lead
- Phase it lives in: After We Build (Volume V)
- The corpus principle this enacts: Trace to levels, never to people.
Where it lives in the chain
The three principles
- Speed over perfection. A 90%-accurate message in 5 minutes beats a perfect one in 30.
- Over-escalate, then stand down. De-escalating a false alarm is always cheaper than under-escalating a real incident.
- Escalation is a skill. Teams must practice it, not discover it under pressure.
The no-surprises rule
Leadership should never learn about a problem from a customer. If a customer knows before your manager does, the escalation process has failed.
This doesn't mean every bug is escalated — it means any issue with customer-visible impact is communicated upward before customers start calling.
The matrix
| Severity | Who hears, and by when | War room | Status page | Postmortem |
|---|---|---|---|---|
| P0 | Leadership + on-call (15 min) | 30 min | Automatic | 24h |
| P1 | Eng manager + PM (1h) | If unresolved at 2h | Manual | 48h |
| P2 | PM (4h) | — | Internal note | Optional |
| P3 | Team lead at next standup | — | — | — |
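
The matrix can also be kept as data, so that tooling and people read the same thresholds. The following is a minimal sketch, not an existing implementation: the `Policy` class, the `MATRIX` table, and the `war_room_needed` helper are hypothetical names introduced for illustration. It assumes the P0 "30 min" entry works like the P1 entry, as a point at which a war room opens if the incident is still unresolved.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """One row of the escalation matrix (hypothetical structure)."""
    notify: str                         # who hears about the incident
    notify_within: Optional[timedelta]  # None means "at next standup"
    war_room_at: Optional[timedelta]    # None means no war room
    status_page: Optional[str]          # None means no status update
    postmortem: Optional[str]           # None means no postmortem


MATRIX = {
    "P0": Policy("Leadership + on-call", timedelta(minutes=15),
                 timedelta(minutes=30), "automatic", "24h"),
    "P1": Policy("Eng manager + PM", timedelta(hours=1),
                 timedelta(hours=2), "manual", "48h"),
    "P2": Policy("PM", timedelta(hours=4), None, "internal note", "optional"),
    "P3": Policy("Team lead", None, None, None, None),
}


def war_room_needed(severity: str, elapsed: timedelta, resolved: bool) -> bool:
    """True once an unresolved incident reaches its war-room threshold.

    Assumption: the threshold applies the same way to P0 and P1.
    """
    policy = MATRIX[severity]
    if policy.war_room_at is None or resolved:
        return False
    return elapsed >= policy.war_room_at
```

For example, `war_room_needed("P1", timedelta(hours=2), resolved=False)` is true, while the same call for a P2 is false at any elapsed time because the matrix lists no war room for that severity.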
How to escalate well
The escalation message has four parts:
- What is happening — "P1 incident: grading submission failures, 200+ users affected since 09:42."
- What we are doing — "Flag disabled at 09:50, investigating root cause."
- What we need — "None — will update in 30 minutes. If unresolved at 11:00 will escalate to P0."
- Who is the commander — "I am commanding; @Esti investigating; @Maya on comms."
That is escalation. "Things might be broken, looking into it" is not.
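
One way to make the four parts hard to skip is to build the message from a small template that refuses to render when a part is empty. The sketch below is illustrative only: `EscalationMessage` and its field names are hypothetical, and the sample text is adapted from the example above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EscalationMessage:
    """Hypothetical container for the four parts of an escalation message."""
    what_is_happening: str
    what_we_are_doing: str
    what_we_need: str
    who_is_commander: str

    def render(self) -> str:
        """Return four labelled lines; raise if any part is blank."""
        lines = []
        for f in fields(self):
            text = getattr(self, f.name).strip()
            if not text:
                raise ValueError(f"missing part: {f.name}")
            # Turn the field name into a label, e.g. "What is happening".
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {text}")
        return "\n".join(lines)


message = EscalationMessage(
    what_is_happening="P1 incident: grading submission failures, "
                      "200+ users affected since 09:42.",
    what_we_are_doing="Flag disabled at 09:50, investigating root cause.",
    what_we_need="None. Will update in 30 minutes. "
                 "If unresolved at 11:00, will escalate to P0.",
    who_is_commander="I am commanding; @Esti investigating; @Maya on comms.",
)
print(message.render())
```

A message like "Things might be broken, looking into it" cannot be expressed in this shape without leaving fields blank, which is the point: the template surfaces what is missing before the message is sent.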
Related crafts
- Incident Management — the larger flow
- De-escalation — as important as escalation