Skip to content

Feature Flag Platform

Runtime toggles, targeting rules, audit trail, SDK. Default-off when the platform is unreachable — silent reachability failure must not silently enable behaviour.

Owners: Tech Lead, DevOps Phase it lives in: How We Build (Volume IV) The corpus principle this enacts: Rollback possible is not. A confirmed time is.

Where it lives in the chain

What the platform provides

  • Runtime toggle — flip a flag and have all consumers respond within seconds. Not minutes. Not after restart.
  • Targeting — by user, segment, percentage, environment. "On for 10% of users in pilot region; everyone else default-off."
  • Audit trail — every flag change recorded with who, when, why. The postmortem reads the audit when an incident traces to a flag flip.
  • SDK — typed, with sensible defaults. Default-off when the SDK can't reach the platform. The wallet bug would have shipped a lot worse if the flag was default-on under reachability failure.
  • Approval gates for high-risk flags — production-on requires two approvers; staging-on requires one.

How to do this

  • Choose vs build — buy unless the platform itself is your product. Building flag infrastructure is a known sinkhole that pays back at scale most teams never reach.
  • One source of truth — the flag definition lives in source control alongside the code that reads it. The platform's UI mirrors the file; the file is authoritative. Diff-on-deploy catches drift.
  • Targeting rules are versionednot edited in the UI without a record. Production targeting changed in the UI without a PR is shadow deployment.

What good practice looks like

A team uses the flag platform for gradual rollout — 1% → 5% → 25% → 100% — watching the dashboards between each step. The first 1% is the most expensive observation the team will ever make; it surfaces issues at low blast radius. The platform makes that observation cheap and reversible.

The wallet bug shipped to staging behind a flag. That is what flags are for. The JWT outage shipped to production simultaneously across all environments because the change was not flag-wrapped. The lesson: anything that can lock out users deserves the flag.

200apps · How We Work · NWIRE