part one · the first 48 hours

The First 48 Hours

The period between the flag being enabled and the first honest picture. What to watch, what to act on, and what to let settle.

Events in this phase

On-call rotation is active for the 48 hours after the flag is enabled; that was a release-gate condition. Watching is loose during the first hour, sharper after, then settles into normal-cadence dashboard checks. There is no new ceremony: this is a heightened state of normal flow.

This is where meaning meets the world for the first time. The flag is enabled. The feature is live. Volume IV's machinery is now the watching apparatus — runbooks armed, SLOs baselined, prediction "before" numbers captured.

The first 48 hours are the period when the team has the most attention and the least data. The instinct is to act on every signal. The discipline is not reacting fast but reacting correctly: acting early is not a sign of control, acting correctly is. Knowing which signals warrant action and which need time to stabilise is the difference between a team that contains problems and a team that creates new ones.

What to watch

The monitoring dashboards are the primary source of truth. Not support tickets — dashboards. Support tickets lag reality by hours. The specific SLIs defined in the ADRs are what the team watches: error rates, latency percentiles, queue depths. The SLO thresholds trigger action. The leading signals from Volume IV — adoption, completion, error encounter rate — tell the early story.

The first hour is the noisiest. People click things in unexpected orders, submit forms twice, navigate away mid-flow. Some of this produces errors that are not bugs — they are the normal shape of first contact. The question is not whether errors are occurring. It is whether the error rate is above the SLO threshold and trending up.
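The question in the paragraph above can be sketched as a small check. This is a minimal illustration, not the team's actual alerting logic: the function names, the per-interval list of error rates, and the use of a least-squares slope as the meaning of "trending up" are all assumptions made for the example.

```python
from statistics import mean


def error_rate_trend(rates: list[float]) -> float:
    """Least-squares slope of the error rate per sample interval.

    Hypothetical helper: a positive slope is treated as "trending up".
    """
    n = len(rates)
    if n < 2:
        return 0.0  # a single sample has no trend
    x_mean = (n - 1) / 2
    y_mean = mean(rates)
    numerator = sum((i - x_mean) * (r - y_mean) for i, r in enumerate(rates))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def is_concerning(rates: list[float], slo_threshold: float) -> bool:
    """True only when the latest rate is above the SLO threshold AND rising.

    Errors that sit below the threshold, or that are above it but
    falling, are treated as first-contact noise (an assumption of this sketch).
    """
    if not rates:
        return False
    return rates[-1] > slo_threshold and error_rate_trend(rates) > 0
```

With this shape, a burst of first-hour errors that is already falling does not flag, while a rate that has crossed the threshold and keeps climbing does.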

When to act

Three conditions warrant immediate action. (1) An SLO threshold crossed for more than 5 minutes: open the runbook and start from step one. (2) Any data integrity concern: disable the flag immediately and investigate in staging. (3) Any security-relevant behaviour: disable the flag, full stop. Everything else is logged, prioritised using the bug taxonomy, and addressed in normal flow.
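The three conditions can be written down as a small decision function. This is a sketch under stated assumptions: the `Signal` fields, the `Action` names, and the choice to check the flag-disabling conditions before the SLO breach are all hypothetical, since the text does not say which rule wins when several apply at once.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISABLE_FLAG = "disable the flag"
    OPEN_RUNBOOK = "open the runbook at step one"
    LOG_AND_TRIAGE = "log and prioritise in normal flow"


@dataclass
class Signal:
    """Hypothetical summary of one observed signal."""
    slo_breach_minutes: float = 0.0
    data_integrity_concern: bool = False
    security_relevant: bool = False


# From the text: an SLO threshold crossed for more than 5 minutes.
SLO_BREACH_LIMIT_MINUTES = 5


def decide(signal: Signal) -> Action:
    # Assumption: security and data integrity take precedence over an SLO breach.
    if signal.security_relevant or signal.data_integrity_concern:
        return Action.DISABLE_FLAG
    if signal.slo_breach_minutes > SLO_BREACH_LIMIT_MINUTES:
        return Action.OPEN_RUNBOOK
    # Everything else goes through the bug taxonomy in normal flow.
    return Action.LOG_AND_TRIAGE
```

For example, `decide(Signal(slo_breach_minutes=3))` returns `Action.LOG_AND_TRIAGE`: a breach that has not yet lasted more than 5 minutes is given time to settle.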

By hour 48, the noisy first-contact patterns have settled. The team has a first honest picture — not the prediction check yet, but the data the prediction check will draw from.

Resolution gate — 48 hours to signal check

Enough to know the feature is live and stable.

Dashboards are within SLO. No P0 incidents open. The "before" baseline is captured. Early usage patterns are visible.
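The gate conditions above could be checked mechanically along these lines. The `GateStatus` fields and the function name are hypothetical, and how each value is gathered (dashboards, incident tracker, baseline store) is left outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class GateStatus:
    """Hypothetical snapshot taken at hour 48."""
    dashboards_within_slo: bool
    open_p0_incidents: int
    baseline_captured: bool
    early_usage_visible: bool


def unmet_gate_conditions(status: GateStatus) -> list[str]:
    """Return the gate conditions that are not yet satisfied.

    An empty list means the gate is passed.
    """
    unmet = []
    if not status.dashboards_within_slo:
        unmet.append("dashboards are not within SLO")
    if status.open_p0_incidents > 0:
        unmet.append(f"{status.open_p0_incidents} P0 incident(s) still open")
    if not status.baseline_captured:
        unmet.append('the "before" baseline is not captured')
    if not status.early_usage_visible:
        unmet.append("early usage patterns are not yet visible")
    return unmet
```

Returning the list of unmet conditions, rather than a single boolean, keeps the reason for a failed gate visible to whoever is reviewing it.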

