Infrastructure as Code

Environment provisioning that is reproducible, reviewable, and reversible. Cloud resources are defined in code (Terraform, Pulumi, CloudFormation), committed to source control, and applied through the pipeline. Configuration changes are code changes: that is the lesson of the JWT outage.

Owners: DevOps, Tech Lead
Phase it lives in: How We Build (Volume IV)
The corpus principle this enacts: Configuration changes are code changes, with the same pipeline, the same gates, and the same review.

Where it lives in the chain

How to do this

The discipline is identical to application code:

  • All infrastructure in source control. Networks, databases, secrets stores, queues, CDN config, gateway policies. The JWT outage was a gateway policy change that bypassed the pipeline.
  • Reviewed by PR. Two reviewers for production-touching changes; one for staging-only.
  • Applied through CI/CD. No manual terraform apply from a laptop. Manual apply is shadow deployment.
  • State stored centrally and locked during apply. Never run terraform init against production state from a developer machine.
  • Plan before apply — the plan output is part of the PR. Review reads the plan, not just the source.
  • Rollback rehearsed. Reverting an IaC change is itself a PR; rehearse it in staging before relying on it.
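
As an illustration of "review reads the plan", here is a minimal Python sketch of a CI step that summarizes a plan for the PR. It assumes the plan was exported with terraform show -json, whose output lists each planned change under resource_changes with a change.actions array. The function names summarize_plan and required_reviewers are hypothetical, and the reviewer counts simply restate the rule above. Treat it as a sketch, not as the pipeline's actual implementation.

```python
import json


def summarize_plan(plan_json: str) -> dict:
    """Count planned actions and list destructive changes.

    Expects the JSON text produced by `terraform show -json <planfile>`.
    """
    plan = json.loads(plan_json)
    counts = {"create": 0, "update": 0, "delete": 0, "replace": 0}
    destructive = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Skip entries that do not change anything.
        if actions in (["no-op"], ["read"]):
            continue
        # A replacement is reported as a delete/create pair (in either order).
        if "delete" in actions and "create" in actions:
            kind = "replace"
        else:
            kind = actions[0]
        counts[kind] = counts.get(kind, 0) + 1
        if kind in ("delete", "replace"):
            destructive.append(rc["address"])
    return {"counts": counts, "destructive": destructive}


def required_reviewers(touches_production: bool) -> int:
    """Two reviewers for production-touching changes, one for staging-only."""
    return 2 if touches_production else 1
```

Posting the counts and the list of destructive addresses as a PR comment gives reviewers the plan's effect at a glance; the full plan output should still be attached for a line-by-line read.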

What good practice looks like

A team adding a new region's infrastructure does it as PRs:

  1. PR 1 — defines the new region's network, DNS, certificates. Plan reviewed. Applied to staging-equivalent for verification.
  2. PR 2 — defines the data-plane resources (database, queues). Migration plan referenced. Applied to staging.
  3. PR 3 — connects the region to production traffic at 1%. Watched. Then 5%, then 25%.
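
The staged traffic shift in PR 3 can be sketched as a small decision function. This is a hypothetical illustration in Python: the stages 1, 5, and 25 come from the example above, while the final step to 100%, the 1% error-rate threshold, and the name next_traffic_percent are assumptions made for the sketch. In practice each step would itself land as a reviewed IaC change.

```python
# Stages 1, 5, 25 are from the example above; 100 is an assumed final stage.
STAGES = [1, 5, 25, 100]


def next_traffic_percent(current: int, error_rate: float,
                         max_error_rate: float = 0.01) -> int:
    """Return the next traffic percentage for the new region.

    Returns 0 (roll back) when the observed error rate exceeds the
    threshold; otherwise advances to the next stage, or stays at the
    last stage once it has been reached.
    """
    if error_rate > max_error_rate:
        return 0  # roll back: send no production traffic to the new region
    for stage in STAGES:
        if stage > current:
            return stage
    return current
```

For example, a region serving 1% of traffic with a healthy error rate moves to 5%, while a region at any stage whose error rate exceeds the threshold drops back to 0%.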

A team that clicks through the cloud console ends up with infrastructure that no document describes, no PR records, and no rollback path covers. The blast radius of a misclick is the same as that of a misdeployment, but there is no audit trail to recover from.

200apps · How We Work · NWIRE