Escalation

Escalation is information flow, not blame flow. When you escalate, you say: "This problem has exceeded my ability to resolve it within the expected timeframe." Three principles. One rule.

Owners: On-call, Leadership, Tech Lead
Phase it lives in: After We Build (Volume V)
The corpus principle this enacts: Trace to levels, never to people.

The three principles

  1. Speed over perfection. A 90%-accurate message in 5 minutes beats a perfect one in 30.
  2. Over-escalate, then stand down. De-escalating a false alarm is always cheaper than under-escalating a real incident.
  3. Escalation is a skill. Teams must practice it, not discover it under pressure.

The no-surprises rule

Leadership should never learn about a problem from a customer. If a customer knows before your manager does, the escalation process has failed.

This doesn't mean every bug is escalated — it means any issue with customer-visible impact is communicated upward before customers start calling.

The matrix

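| Severity | Who hears in 15 min       | War room            | Status page   | Postmortem |
|----------|---------------------------|---------------------|---------------|------------|
| P0       | Leadership + on-call      | 30 min              | Automatic     | 24h        |
| P1       | Eng manager + PM (1h)     | If unresolved at 2h | Manual        | 48h        |
| P2       | PM (4h)                   |                     | Internal note | Optional   |
| P3       | Team lead at next standup |                     |               |            |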
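
For teams that want the matrix in tooling (a paging bot, an incident form), it can be expressed as a small lookup table. The sketch below is illustrative only: `EscalationPolicy`, `MATRIX`, and `policy_for` are hypothetical names, the P2 row is read as "no war room, internal note, optional postmortem", and falling back to P0 for an unrecognized severity is an assumption borrowed from the "over-escalate, then stand down" principle rather than something the matrix itself states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationPolicy:
    """One row of the escalation matrix (hypothetical structure)."""
    notify: str                  # who hears about it, and by when
    war_room: Optional[str]      # when a war room opens; None if not specified
    status_page: Optional[str]   # how the status page is updated; None if not specified
    postmortem: Optional[str]    # postmortem deadline; None if not specified


# Values copied from the matrix; None marks a cell the matrix leaves empty.
MATRIX = {
    "P0": EscalationPolicy("Leadership + on-call", "30 min", "Automatic", "24h"),
    "P1": EscalationPolicy("Eng manager + PM (1h)", "If unresolved at 2h", "Manual", "48h"),
    "P2": EscalationPolicy("PM (4h)", None, "Internal note", "Optional"),
    "P3": EscalationPolicy("Team lead at next standup", None, None, None),
}


def policy_for(severity: str) -> EscalationPolicy:
    """Look up a severity label such as "P1" (case-insensitive).

    Assumption: an unknown label is treated as P0, on the reasoning that
    standing down a false alarm is cheaper than under-escalating.
    """
    return MATRIX.get(severity.strip().upper(), MATRIX["P0"])


if __name__ == "__main__":
    print(policy_for("p1"))
```

Keeping the rows as data rather than branching logic means a change to the matrix is a one-line edit that a reviewer can compare directly against the table.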

How to escalate well

The escalation message has four parts:

  1. What is happening: "P1 incident: grading submission failures, 200+ users affected since 09:42."
  2. What we are doing: "Flag disabled at 09:50, investigating root cause."
  3. What we need: "None; will update in 30 minutes. If unresolved at 11:00, will escalate to P0."
  4. Who is the commander: "I am commanding; @Esti investigating; @Maya on comms."

That is escalation. "Things might be broken, looking into it" is not.
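
One way to make the four-part structure hard to skip under pressure is to have a template refuse to render until every part is filled in. This is a minimal sketch, not an existing tool: `EscalationMessage` and its field names are hypothetical, and the example values are taken from the sample message above.

```python
from dataclasses import dataclass


@dataclass
class EscalationMessage:
    """The four parts of an escalation message (hypothetical helper)."""
    happening: str   # what is happening
    doing: str       # what we are doing
    need: str        # what we need (say "None" explicitly if nothing)
    commander: str   # who is the commander

    def render(self) -> str:
        parts = [
            ("What is happening", self.happening),
            ("What we are doing", self.doing),
            ("What we need", self.need),
            ("Who is the commander", self.commander),
        ]
        # Reject the message if any part is blank, so a vague
        # "looking into it" cannot be sent through this template.
        missing = [label for label, value in parts if not value.strip()]
        if missing:
            raise ValueError("Missing parts: " + ", ".join(missing))
        return "\n".join(f"{label}: {value.strip()}" for label, value in parts)


if __name__ == "__main__":
    message = EscalationMessage(
        happening="P1 incident: grading submission failures, 200+ users affected since 09:42.",
        doing="Flag disabled at 09:50, investigating root cause.",
        need="None; will update in 30 minutes. If unresolved at 11:00, will escalate to P0.",
        commander="I am commanding; @Esti investigating; @Maya on comms.",
    )
    print(message.render())
```

Requiring an explicit "None" for the third part is deliberate: it tells the reader that the sender considered what help is needed, which an empty field does not.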

200apps · How We Work · NWIRE