--- url: /volumes.md --- 200apps · how we work # The Five Volumes Five volumes describe the chain end to end. Each volume is a phase. Each phase produces the artifact the next phase needs, and inherits the artifact the previous phase produced. Skipping a volume does not save time — it pushes the missing work into the volume that comes after, where it is more expensive to repair. Volume I declares the change the organisation exists to make. Volume II witnesses the problem and predicts what will change. Volume III turns the prediction into Epics, stories, and scenarios. Volume IV runs the prediction through code into production. Volume V checks what reality answers, names the gap, updates the model — and the next cycle inherits a sharper version of the understanding. ## Volume I — Strategy & Direction The work before the work begins. Vision becomes goals. Goals become initiatives. Initiatives are valued, financially translated, and held in a portfolio that can be funded, continued, or killed. The volume that makes the rest of the chain answerable to something. [Open Volume I →](/volumes/i-strategy/) ## Volume II — Discovery & Brief Understanding the problem before solving it. Observation, not interview. Persons and moments, not personas and tasks. Journey mapping. Assumption surfacing. Three brief artifacts — Initiative, Feature, Technical Design — each one a written prediction with a check date. [Open Volume II →](/volumes/ii-discovery/) ## Volume III — Scope & Shape Translating prediction into shape. Epics named for activities. Story mapping. The walking skeleton — smallest end-to-end release that changes the situation. ADRs, sequence diagrams, schema, API contracts, ilities. Amigos sessions producing Gherkin scenarios trios can defend. [Open Volume III →](/volumes/iii-scope/) ## Volume IV — Execution The prediction goes live. Domain language survives the trip from brief to code. Trunk-based flow. Conventional commits linked to stories. Feature flags wrap new behavior so rollback is one switch. The pipeline catches a different chain level at each stage. The release gate gives the prediction the conditions it needs to be checked. [Open Volume IV →](/volumes/iv-execution/) ## Volume V — After We Build The loop closing. The first 48 hours. Running the check. The four outcomes — only one of which has no value. Bug taxonomy. Postmortems that produce structural changes. The retrospective. The model update — the step most teams skip. The ongoing relationship. The team. The portfolio. Adoption. [Open Volume V →](/volumes/v-after-we-build/) *** ## Reading order Linear, if you have time. The volumes are written to be read in sequence — each one assumes you have done the work of the previous one. If you read Volume IV without II, you will write code for a problem that was never witnessed. Non-linear, if you are looking for a craft. Use the [Master Areas index](/areas/) — every craft is mapped to the volume that addresses it. --- --- url: /volumes/i-strategy.md --- # Strategy & Direction > *The work before the work begins. Vision becomes goals, goals become initiatives, initiatives are valued, financially translated, and held in a portfolio that can be funded, continued, or killed.* This volume describes the **Direction** phase — what happens before any cycle begins. The vision the organisation exists to make true. The goals that translate the vision into bets. The initiatives that name the gaps between current state and goal.
The portfolio that decides which initiatives are funded, which continue, which stop. The financial translation that holds the chain answerable to value, not effort. [Continue to the introduction →](/volumes/i-strategy/intro) --- --- url: /volumes/i-strategy/intro.md --- strategy & direction — volume I # Introduction Every chain that produces something useful was started by someone who wrote down what change they intended. Not a task. Not a feature. A change in the world the organisation exists to make true. This is the volume that holds that work. It is the shortest of the five — direction is more about clarity than about practice — but the cost of skipping it shows up in every later volume. A team that never wrote down its vision still has one. It is just a different vision per person, and the chain that runs underneath has no way to converge on what it is for. *Direction is the artifact set that makes every later cycle answerable to something. Without it, the chain runs perfectly — toward whatever it happens to be running toward this week.* ## What this volume covers Seven parts. The first three name the change. The last four make it operational — owned, valued, translated into money, held in a portfolio. * **Vision & Mission** — declaring the change the organisation exists to make. * **Goals & Objectives** — translating vision into measurable bets. * **Initiative Identification** — naming the gap between current state and goal. * **Value Declaration** — estimating the worth of the intended change. * **Financial Translation** — VRI, rework multiplier, discovery billing. * **Client Relationship Strategy** — trust, cadence, renewal, expansion. * **Portfolio Direction** — which initiatives to fund, continue, or kill. In the operational framework, this volume is the source. Every Discovery brief in Volume II eventually traces to an initiative named here. Every model update in Volume V eventually feeds back to a goal that was set, met, missed, or changed. ## Voice This volume is written in the same voice as the others — person-first, witnessed-not-described, direction-as-artifact-not-feeling. A vision statement that does not name a change in someone's life is not a vision; it is a slogan. A goal that has no measurement is not a goal; it is a wish. The discipline is the same: what is the artifact, who owns it, what does it predict, when is it checked. [Part 1 — Vision & Mission →](/volumes/i-strategy/1-vision-mission) --- --- url: /volumes/i-strategy/1-vision-mission.md --- part one · vision & mission # Vision & Mission > *Declaring the change the organisation exists to make.* A vision is a description of a world that does not yet exist, written by people who intend to make it exist. It is not a slogan. It is not a tagline. It is the answer to *what is different in someone's life because we built this*. ## The shape of a vision that holds Three properties. 1. **It names a person.** Not a market, not a segment. A named person — Dina, Miri, Avi — whose life the organisation intends to change. 2. **It names the change.** Specific. *Dina's first hour of work is no longer spent on tasks she has already done four times this week* is a vision. *Best-in-class productivity tools* is a slogan. 3. **It makes a falsifiable claim.** A vision that cannot be wrong cannot guide. *We will be the leader in our space* cannot guide a Tuesday decision. *Dina will spend her first hour on the thing she came in to do* can. ## Mission vs vision The vision is the world to come. 
The mission is what the organisation does to bring it about. Mission is verbs, vision is nouns. * **Vision** — *Field service technicians arrive at jobs already knowing what they need.* * **Mission** — *We build the system that puts the right information in the technician's pocket before they knock on the door.* A team can hold either without the other for a while. Holding both, written down, named, gives the chain something to converge to. ## How to know it's working The vision is working when: * Two people on different teams, asked the same question independently, would answer the same way. * A new hire, in their first week, can say what the organisation is for in one sentence and the sentence is the same as the founder's. * A scope decision in Volume III references the vision and the reference is not contrived. * A kill decision in Volume V references the vision and the reference is not contrived. The vision is *not* working when: * It appears only in the deck. * Different teams have different working understandings of who the person is. * Strategy decisions are explained without reference to it. * Wins and losses are interpreted without it. ## Where the vision lives Two places. The mission and vision live at the top of the corpus — visible to anyone who reads the chain. They also live in the founder's onboarding conversation, where every new person is asked to repeat them and is corrected if they get it wrong. The artifact and the conversation reinforce each other. A vision that lives only in a deck is a vision that does not survive the deck. ## What this produces for the rest of the chain | Volume | What it inherits from the vision | |---|---| | II | The person whose life the chain is for. Discovery starts from this person, not from a segment. | | III | The constraint that scope decisions are checked against — *does this slice move the person closer to the change?* | | IV | The domain language. The named person and the named change are reflected in the code, the API, the analytics events. | | V | The check question. *Did the cycle move the world closer to the vision?* — answered, not implied. | [Part 2 — Goals & Objectives →](/volumes/i-strategy/2-goals-objectives) --- --- url: /volumes/i-strategy/2-goals-objectives.md --- part two · goals & objectives # Goals & Objectives > *Translating vision into measurable bets.* A goal is a measurable bet about what would be true if the vision were even partly real. A goal that is not measurable is not a goal. A goal that has no time horizon is not a goal. A goal that has no person whose life would change is not a goal — it is a metric. ## Three properties 1. **Time-bound.** *In twelve months* — and a date the leadership commits to revisiting. 2. **Measurable.** A specific signal, defined in the same place the goal is written. *Active grading sessions per week, of any length, on the new flow* is measurable. *Improved grading experience* is not. 3. **Anchored to a named person.** Goals that don't trace to someone whose life is changing decay into vanity metrics within two cycles. ## Goals vs objectives vs predictions Three levels of resolution. They nest. 
| Level | Time | Owner | Example | |---|---|---|---| | **Goal** | 12 months | Leadership | *Within 12 months, 80% of grading is done in under 15 minutes.* | | **Objective** | 1 quarter | PO | *Q3: ship the new grading flow to 30% of graders, hit <20-min median.* | | **Prediction** | 1 cycle | PO + Trio | *Gal completes the grading cycle in under 15 minutes — checked 2026-06-01.* | Each is the next-most-granular check on the one above. A prediction is the chain's smallest claim. A goal is the chain's largest. They are connected by *what would have to be true at each level for the next one up to be true*. ## How few goals As few as possible. A team with twelve goals has no goals. A team with three goals has three. A team with one goal that is being honestly held has the cleanest chain. The corpus default: **three goals at the leadership level, no more than six objectives across the team at any time.** ## How goals fail Three common shapes. 1. **The goal becomes a target.** Goodhart's Law. The team starts gaming the measurement. The fix is structural — the goal should be the leading signal, not the lagging one. Use *grading sessions completed in under 15 minutes* (causal) rather than *user satisfaction score with grading* (post-hoc). 2. **The goal stops being read.** It was set in January and no one has looked at it since. Quarterly portfolio review (Volume V Part 9) is the answer. 3. **The goal is not connected to a person.** *Increase MRR by 12%* is real, but it is not a goal — it is a constraint. Treat it as such, in service of a goal that names a person. ## Tracing a goal to the work A goal is connected to the chain when: * Each objective traces to one or more initiatives (Part 3). * Each initiative has at least one Volume II brief. * Each brief has at least one prediction. * Each prediction is checked. * The cycle's check feeds the model update — and through the model update, eventually into the next goal-setting conversation. A goal not connected like this is decorative. ## What this produces for the rest of the chain | Volume | What it inherits | |---|---| | II | The acceptance test for whether an initiative is worth pursuing — does it move a goal? | | III | The slicing constraint — does this slice move an objective by a measurable amount? | | IV | The leading signals worth instrumenting. | | V | The lagging signal that is read against the goal at the portfolio review. | [Part 3 — Initiative Identification →](/volumes/i-strategy/3-initiative-identification) --- --- url: /volumes/i-strategy/3-initiative-identification.md --- part three · initiative identification # Initiative Identification > *Naming the gap between current state and goal.* An initiative is the named gap between where the people the chain is for are now, and where the goal says they should be. It is not a feature. It is not a project. It is a piece of work big enough to require Discovery — and contained enough to fit within a quarter or two of focused effort. ## When to name an initiative An initiative is the right shape when: * The change required to close the gap is unfamiliar enough that the team cannot name the feature without first running Discovery. * The work is plural — two or three Volume II briefs will live inside it before it ships. * A decision to fund this initiative *means* a decision not to fund another one of comparable size. If the work is small enough that one brief can hold it, it is a feature, not an initiative. 
If the work is large enough that the team can't see the scope after one quarter of Discovery, it is a portfolio bet, not an initiative — and the leadership needs to decide whether to invest in initiative-level Discovery first. ## The initiative naming conversation Held by the PO with leadership, anchored to a goal. Three questions, in order: 1. **Who is the named person, and what is their current state?** *Gal, an exam grader at our largest customer, currently spends 47 minutes per grading cycle and grades approximately 60 cycles per week.* 2. **What is the goal, and what does it mean for this person?** *The 12-month goal is that 80% of grading is done in under 15 minutes. For Gal, that means 32 minutes saved per cycle, ~32 hours per week.* 3. **What is the gap?** *We do not know what produces the 47 minutes. We have hypotheses but no observation. We don't know which subset of those minutes is removable.* The third answer is the initiative brief. ## What a good initiative brief contains This is a Volume II artifact in shape, but it is *triggered* by Volume I. The PO drafts; leadership signs off. * **Business gap.** What would be true financially if this initiative succeeded? Not a precise number — a credible range. * **Human gap.** What would be true in the named person's life? Witnessed-not-described, where possible. *Gal would leave work at the time her contract says she leaves.* * **Discovery questions.** What does the team need to learn before it can scope? Listed. * **Value declaration (V).** A range. *Worth around X if it works.* * **Financial translation.** What V means in terms of investment ceiling, rework tolerance, and discovery billing. ## Picking which initiatives A leadership decision, taken with the PO present. The corpus does not prescribe a scoring framework — every project has slightly different math. It does prescribe two questions: 1. **If we had to choose only one initiative this quarter, which one?** Forces a real ranking. 2. **What would we have to be willing to walk away from to fund this one?** Names the cost. If the answer to either question is *we don't know*, the initiative is not ready for the portfolio. ## Gaps to discovery The transition from this part to Volume II is the most common place for the chain to fail. The pattern: an initiative is named, leadership commits, and someone — usually a developer — starts solving it before Discovery has run. The brief that should have come first becomes a postmortem six months later. The corpus pattern: **an initiative does not enter execution until at least one Volume II brief inside it has been written, predicted, and signed off.** This is a Volume I rule, enforced at portfolio level. Without it, every later volume's discipline is performative. ## What this produces for the rest of the chain | Volume | What it inherits | |---|---| | II | The initiative-level questions Discovery must answer before scope can begin. | | III | The scope envelope — what is in this initiative, what is the next one. | | IV | The execution rationale — why we built this, traceable upward. | | V | The portfolio's check — did this initiative do what we said it would? 
| [Part 4 — Value Declaration →](/volumes/i-strategy/4-value-declaration) --- --- url: /volumes/i-strategy/4-value-declaration.md --- part four · value declaration # Value Declaration > *Estimating the worth of the intended change — before the cycle runs, in writing.* Value declaration is the act of writing down, in advance, what the team believes this initiative is worth if it works. It is not a forecast. It is not a guarantee. It is a number — usually a range — recorded so that the chain can later check whether the change produced what was promised. A team that does not declare value cannot read its VRI (Volume V Part 9). A team that cannot read its VRI cannot make portfolio decisions. The declaration is the spine of the financial chain. ## Three ways to declare V Pick the form that fits the initiative. | Form | When | Example | |---|---|---| | **Time saved** | Internal-tool, productivity, workflow initiatives | *32 hours per Gal per week, ~$X/year per grader, ~$Y/year across the customer's team* | | **Revenue named** | Sales, conversion, retention, expansion | *5% lift in renewal rate translates to ~$X over 12 months at current ACV* | | **Cost avoided** | Reliability, compliance, risk reduction | *Avoiding one hour of P0 downtime per quarter is worth ~$X — current rate is one per six weeks* | The form matters because it determines what the check in Volume V will look like. *Time saved* checks against measured time. *Revenue named* checks against the revenue ledger. *Cost avoided* checks against the avoided-incident count. ## Value as a range A single number invites false precision. The corpus pattern: declare a range, with a most-likely value and the assumptions underneath. ```text Initiative: Grading Flow v2 Person: Gal (and ~120 graders across customer base) Form: Time saved Range: $180k – $720k / year Most-likely: $360k / year Assumptions: (1) Grading is 40% of grader time today. (2) New flow saves at minimum half of that. (3) The customer count holds. (4) Compliance overhead is unchanged. ``` The assumptions are the part that matters. They are what Volume II witnesses. They are what Volume V checks. A V range whose assumptions are not listed is a guess. ## V as the discovery budget V is the ceiling on what it makes sense to spend learning. The corpus pattern: **discovery investment is bounded by V.** A practical rule: spend up to 5% of declared V on Discovery before deciding whether to scope. If after that the discovery questions are not answered, the initiative goes back to the portfolio for a re-decision — *do we invest more in learning, or do we kill?* This is unusual. Most organisations treat Discovery as overhead. The corpus treats it as the activity that determines whether the rest of the spend will produce value or rework. A 5% discovery budget that prevents 50% of execution rework is the cheapest investment the chain makes. ## V as the kill condition A declared V also names what *not enough* looks like. If by the second cycle's check, the measured value is below 25% of V, the initiative is a candidate for kill review at the next portfolio meeting (Volume V Part 9). The threshold is per-initiative — some initiatives have long curves, some have steep ones. The discipline is that the threshold is named in advance, not invented in the kill conversation. ## V and rework Volume V's VRI uses V as its numerator. To make that math honest, V needs to be: * **Written before the cycle.** Not back-fitted after. * **Read at the cycle check.** Did we deliver in the range? 
* **Adjusted explicitly.** If V is moved, the move is recorded with rationale. Quiet adjustments destroy the signal. A V that is moved without record is the corpus's version of dishonesty. The chain can survive a wrong V. It cannot survive a hidden one. ## What this produces for the rest of the chain | Volume | What it inherits | |---|---| | II | The discovery budget. The list of assumptions to witness. | | III | The slicing question — *which slice unlocks the largest fraction of V earliest?* | | IV | The cost ceiling — execution that exceeds 0.5×V should trigger a portfolio conversation, not a quiet over-spend. | | V | The check value. The VRI input. The kill threshold. | [Part 5 — Financial Translation →](/volumes/i-strategy/5-financial-translation) --- --- url: /volumes/i-strategy/5-financial-translation.md --- part five · financial translation # Financial Translation > *VRI, rework multiplier, discovery billing — turning the chain into numbers leadership can read.* Financial translation is the layer that turns the chain's artifacts into language the people who fund the work can use. Without it, the chain is opaque to leadership. Leadership stops trusting it. The cycle that produces invisible value gets cut before the value is checked. This part of the volume is short, dry, and load-bearing. ## VRI — Value-to-Rework Index Defined fully in [Volume V Part 9](/volumes/v-after-we-build/9-portfolio). Repeated here because it is what makes value declaration matter. ```text VRI = Σ value(initiatives shipped) / Σ rework(rework cycles needed) ``` The chain's job, financially, is to keep VRI rising. Every other financial metric is downstream of this one. ## Rework multiplier The cost of rework is not the same as the cost of the original work. It is higher — sometimes much higher — because rework happens after meaning has scattered across briefs, code, deploys, support tickets, and conversations. The corpus uses a simple multiplier. It is a heuristic, not a measurement. | Where the rework originates | Multiplier | |---|---| | Strategy gap (wrong bet) | 5–10× | | Discovery gap (problem not witnessed) | 3–5× | | Scope gap (story missing) | 2–3× | | Execution gap (code defect) | 1× | | Operation gap (runbook missing) | 1.5–2× | A defect traced to *Discovery* — a brief that didn't witness — costs three to five times what an equivalent code defect costs to repair. The strategy gap is the most expensive: it can require unwinding a quarter of work. This is why the corpus's earliest gates exist. A weekend of Discovery prevents a quarter of rework. The math is not subtle. The discipline is making the leadership see the math at the point of decision. ## Discovery billing For client work, this is the financial pattern that allows the chain to function. Three principles: 1. **Discovery is billed.** Not bundled. Not free. The client pays for the witnessed problem and the predicted change. If they will not, the project is unsuitable for the chain — and saying so is part of the chain's discipline. 2. **Execution is fixed-scope-fixed-price** *after* Discovery, not before. Scope without Discovery is fiction. 3. **Rework caused by missing Discovery is the supplier's cost. Rework caused by post-Discovery scope changes is the client's cost.** The contract names this. A client relationship that cannot accommodate this structure is one where the chain will keep paying its rework bill out of margin, until margin runs out. ## What leadership wants in two minutes Quarterly. Three numbers. * **VRI** — current and trend. 
* **Rework distribution** — by chain level. *Where is rework being produced?* * **Open V** — the sum of declared value still being checked, with how many cycles overdue. A team that can produce these in two minutes has a real chain. A team that needs a week to assemble them has a chain in name only. ## Where this volume meets the operating numbers The chain's financial signal lives in three places. * The **brief** has V, in a range, with assumptions. * The **portfolio dashboard** has VRI and rework distribution. * The **client contract** has the discovery billing and the rework attribution. When all three are current, leadership can read the chain. When any of them is missing or stale, leadership reads it through anecdote, and that is the moment trust starts to erode. [Part 6 — Client Relationship Strategy →](/volumes/i-strategy/6-client-relationship) --- --- url: /volumes/i-strategy/6-client-relationship.md --- part six · client relationship strategy # Client Relationship Strategy > *Trust building, cadence, renewal, expansion — the strategic frame for how the chain meets the people who pay for it.* The relationship is the layer above the work. It is what makes the work fundable, repeatable, and renewable. The strategic relationship is set in this volume; the operational relationship runs in [Volume V Part 7](/volumes/v-after-we-build/7-ongoing-relationship). Both volumes describe the same relationship from different altitudes. ## The relationship is an asset It is real, valuable, and depreciating. A client relationship that is not invested in for two quarters is worth less, even if the work has shipped on time. Trust does not hold on inertia. The corpus treats the relationship as a portfolio item. It is reviewed quarterly, the way an initiative is reviewed quarterly. The questions are different but the discipline is the same. ## The four states of a client relationship | State | Signal | What the chain does | |---|---|---| | **Establishing** | First 90 days | Cadence is set, SLA is signed, weekly update begins, first prediction is checked together | | **Operating** | Cycles are running, predictions are checked, retention is steady | Maintain cadence, surface patterns, hold the trio's attention | | **Expanding** | Client raises a new problem space; trust has been earned | Move to Initiative discovery, write an initiative brief, run Volume II | | **Renewing or ending** | Contract approaches term | Quarterly portfolio review with the client; explicit decision on what continues, what changes, what stops | The state is named, not implied. *We are establishing* and *we are renewing* are different conversations and need different artifacts. ## Cadence as strategy Cadence is not administrative. It is what makes the relationship legible. A client who knows when they will hear from the team has fewer reasons to ask, and the asks they do raise are the ones that need attention. The corpus default cadence: * **Weekly written update** — Friday, before 4pm, ~200 words, three sections (shipped / in progress / blocked). * **Bi-weekly sync** — 45 minutes, fixed agenda (signal readings, scope decisions, CS patterns). * **Quarterly portfolio review** — 90 minutes, three artifacts (SLA, VRI, root-cause patterns). The cadence is held even when there is *nothing to say*. *Nothing changed this week* is a useful update. Silence is anxiety. ## Trust building Trust is built by repetition of small, kept commitments. The corpus's pattern is to make every commitment small enough to keep, and to keep all of them. 
* *We will run the check on this date* — and the check runs on that date, even if the result is uncomfortable. * *We will tell you the prediction outcome by Tuesday* — and the email lands on Tuesday. * *We will surface the support patterns each month* — and the patterns are surfaced. Trust is not built by occasionally heroic effort. Heroic effort is what the chain produces when the regular effort isn't running. A client who is served by heroics does not trust the chain — they trust the people, which is more fragile. ## Renewal and expansion The corpus pattern: renewal is the easiest sale; expansion is the second easiest. Both depend on trust having been built in the operating phase. Both are anchored to artifacts the client has been reading. * **Renewal** — anchored to the SLA review and the VRI trend. *Here is what we said we would do, here is what we did, here is what comes next, in the same shape.* * **Expansion** — anchored to a pattern in the CS-to-bug pipeline (Volume V Part 7) or to a new initiative the client has surfaced. *We have been seeing this in the support pipeline for six weeks. We believe it is initiative-shaped. Here is the brief we would write.* Neither is a sales conversation. Both are chain conversations. The chain that produces good artifacts produces its own pipeline. ## When to walk away The corpus is opinionated here. A client relationship that cannot accommodate Discovery, will not pay for prediction, and frames witnessed assumptions as scope creep is one where the chain will produce rework as its main output. The right response is to say so plainly and to walk away, ideally before the contract is signed. Walking away is a strategic act. It is documented as a portfolio decision and joins the corpus the same way a kill decision does. Future engagements of a similar shape inherit the learning. [Part 7 — Portfolio Direction →](/volumes/i-strategy/7-portfolio-direction) --- --- url: /volumes/i-strategy/7-portfolio-direction.md --- part seven · portfolio direction # Portfolio Direction > *Which initiatives to fund, continue, or kill — the strategic decision the portfolio makes every quarter.* The portfolio is the set of initiatives the organisation is currently funding. This part of the volume describes the strategic side of the portfolio — what funding decisions to make, on what evidence. The operational side — DORA, VRI, kill mechanics — is in [Volume V Part 9](/volumes/v-after-we-build/9-portfolio). The portfolio is not a backlog. A backlog accumulates by default. A portfolio is curated by decision. ## Three states, every initiative At every portfolio review, every initiative is in one of three states. | State | Meaning | What it produces | |---|---|---| | **Fund** | This initiative is open and resourced for the next quarter | Cycle plans, briefs, predictions | | **Continue** | This initiative is in flight and on track; carry forward | Continued cadence, no new commitment | | **Kill** | This initiative will not continue; resources are released | A kill brief, a model update, a portfolio note | There is no fourth state. *Pause* is one of the most common forms of organisational dishonesty — initiatives that are paused indefinitely consume attention without producing decisions. The corpus pattern: paused initiatives are renamed *killed* after one full quarter of inaction, with the option to re-fund as a new initiative if conditions change. ## What the portfolio is for Three jobs. 1. **Concentrate attention.** A team can run a finite number of cycles in flight at once.
The portfolio names the limit honestly. 2. **Surface conflicts early.** Two initiatives that depend on the same scarce resource — the senior developer, the design system, the client's review time — are surfaced before the cycle runs, not during. 3. **Hold the chain to the goals.** Every initiative maps upward to a goal. Initiatives that don't are surfaced and either re-justified or killed. A portfolio review that does not produce at least one *kill* or *re-justify* in any given quarter is one where the discipline has decayed. ## How to read an initiative at portfolio level Three artifacts on the table per initiative. * **The latest signal reading** — what reality answered most recently. * **The current V and the assumption status** — what we said we would deliver and what we know about the path. * **The chain-level rework distribution** — where this initiative is producing rework, and at what level. The portfolio reads these and asks the same three questions of every initiative. 1. *Is the prediction holding?* 2. *Is the path still credible?* 3. *Is the goal it serves still the goal we hold?* Three yes-es and the initiative continues. Any no and the conversation is open. ## Concentration limits The corpus default: **no more than half the team's capacity is spent on initiatives whose first signal reading has not yet happened.** This is the chain's anti-overcommitment rule. A team that has only first-cycle initiatives is a team running on hope, not on signal. A second rule: **no initiative should consume more than 40% of the team's quarter without an explicit portfolio acknowledgement.** Concentration is sometimes warranted. It is always named, never accidental. ## Kill culture The hardest cultural fact the chain asks of leadership: **killing is not failure.** It is the chain's most disciplined act. The portfolio that never kills is one that has confused continuation with progress. The corpus produces an artifact for every kill — a kill brief — that names what was learned, why the initiative is stopping, and what the next initiative inherits from this one's learning. The kill is the seed of the next decision, not the end of a story. A team can be measured on its kills the way it is measured on its launches. A team that kills three initiatives in a year, each with a kill brief that joined the corpus, is healthier than a team that has never killed one. ## What this volume produces, in one sentence > *The chain runs because direction was named. Initiatives serve goals; goals serve a vision; the portfolio is curated, not accumulated; killing is part of the discipline; the financial chain is honest.* [Back to the volume cover →](/volumes/i-strategy/) · [Volume II — Discovery & Brief →](/volumes/ii-discovery/) --- --- url: /volumes/ii-discovery.md --- # Discovery & Brief > *Understanding the problem before solving it. Observation, person and moment, journey mapping, assumption surfacing. Three brief artifacts — Initiative, Feature, Technical Design — each one a written prediction with a check date.* This volume describes the **Discovery** phase. The team takes the named initiative from Volume I and learns enough about the world to write a brief that makes a falsifiable claim about what will change. The output is three artifacts — Initiative Brief, Feature Brief, Technical Design Brief — each carrying a prediction, a baseline, a target, a check date, a check method, and an owner. 
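The six prediction fields are concrete enough to hold as a record. A minimal sketch in Python follows; the class name, the field names, and the `is_overdue` helper are illustrative, not a schema the corpus prescribes.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Prediction:
    """One falsifiable claim carried by a brief (Initiative, Feature, or Technical Design)."""
    claim: str         # e.g. "Gal completes the grading cycle in under 15 minutes"
    baseline: str      # what is true today, witnessed in Discovery
    target: str        # what the brief says will be true after the cycle
    check_date: date   # the date the claim is read against reality
    check_method: str  # e.g. "observation of three named graders post-deploy"
    owner: str         # the person who runs the check

    def is_overdue(self, today: date) -> bool:
        # A prediction past its check date with no recorded check contributes
        # to open V, one of the three numbers leadership reads quarterly.
        return today > self.check_date
```

The point of the sketch is the shape: a brief carries all six fields, and a missing field shows up as a visible gap rather than a vague sentence.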
[Continue to the introduction →](/volumes/ii-discovery/intro) --- --- url: /volumes/ii-discovery/intro.md --- discovery & brief — volume II # Introduction Volume I named the initiative. The gap between the named person's life today and the world the goal describes. This volume is the work of crossing that gap from understanding-not-yet-witnessed to understanding-witnessed-enough-to-predict. Discovery is not interview. It is observation, with structured language to record what was seen. Discovery is not survey. Surveys ask people what they did; observation watches what they actually do. The two answers diverge too often for the difference to be ignored. *Discovery is the work the team does to earn the right to predict. The brief is the artifact that holds the prediction. Without one, the rest of the chain is hope.* ## The shape of this volume Nine parts. The first five describe the practice of seeing. The last four describe what gets written down. * **Observation** — going to watch the person work, not asking. * **Person & Moment** — naming who has the problem and when it happens. * **Journey Mapping** — the full activity, with friction marks. * **Assumption Surfacing** — naming what we believe but haven't witnessed. * **The Five Stations** — the discovery walk: Vision & Context, Problem Framing, User Journey & Slices, Solution Options, Decision & Scope. * **Initiative Brief** — business gap, human gap, discovery questions, V. * **Feature Brief** — observation, journey, direction, prediction, sign-off. * **Technical Design Brief** — system-witnessed problem, technical prediction. * **Prediction Writing** — baseline, target, check date, method, owner. The volume's central act is the brief. Everything before the brief is preparation; everything after the brief is execution. A brief that does not predict is not a brief. A brief that predicts but is never checked is a guess that wore a uniform. [Part 1 — Observation →](/volumes/ii-discovery/1-observation) --- --- url: /volumes/ii-discovery/1-observation.md --- part one · observation # Observation > *Going to watch the person work, not asking.* Observation is the act of being in the room while the person does the activity. Not in a focus group. Not in an interview room. Not over a screenshare summarising what they would normally do. In the room, in real time, while the activity happens. This is the discipline most teams skip and most teams pay for. The reason is simple: observation is uncomfortable. The person being watched is uncomfortable. The watcher feels intrusive. The schedule is harder to arrange than a 30-minute call. So teams substitute interview, and a brief is built on what people *say* they do — which is reliably different from what they actually do. ## Three kinds of input, ranked | Rank | Method | What you get | What you miss | |---|---|---|---| | 1 | Observation in the field | The actual activity, the actual environment, the actual interruptions, the actual workarounds | Things they only do once a quarter | | 2 | Recorded session of the activity | The activity, the workarounds, much of the friction | Environmental context, interruptions | | 3 | Structured interview about the activity | A described version of the activity | Workarounds the person has stopped noticing, environmental friction, real timing | | 4 | Survey | A summary of summaries | Almost everything | The corpus pattern: observe first. Use interview to fill in details. Use survey only to quantify a pattern that observation has already named.
## What you watch for Three classes of signal. 1. **Workarounds.** The places where the person has built their own bridge over a gap the system left. Workarounds are the most valuable Discovery signal — they are evidence of pain that has already been measured by the person paying it. 2. **Friction.** The pauses, the re-entries, the *let me find that*, the alt-tabs, the post-its. Friction is rarely complained about because the person has internalised it. 3. **Domain language.** The actual words the person uses for the things in their work. They are almost never the words on the screen. The note-taking method matters less than the discipline of writing what was *seen*, not what was *concluded*. ## The interview that earns its place Interviews follow observation, not precede it. They are not for asking *what is hard about your job*. They are for asking *what was that thing you did at 09:14, and why?* A good post-observation interview is short. Half an hour. Five questions, all anchored to specific moments the watcher saw. * *I noticed at 09:14 you opened the spreadsheet, edited a name, and closed it. What was that?* * *At 09:31 you went to the wall calendar before answering a question. What were you checking?* * *At 09:48 you switched to the second monitor for about thirty seconds. What were you reading?* Specific moments, named in time. The person's answer is the most honest answer the chain will ever get from them, because it is anchored to something concrete and recent. ## When observation is impossible Sometimes it is. Regulated environments, sensitive client contexts, fully remote-async workflows. The corpus's fallback, in order: 1. **Recorded session** with the person's permission. 2. **Reconstructed session** — they walk through their actual work using their actual tools while screensharing, narrating in real time. Worse than observation because the narration changes the timing, but better than interview alone. 3. **Pair-and-shadow** — a colleague of the person sits with them and reports back. Adds noise but holds the *witnessed* discipline more honestly than interview. What does *not* substitute: a survey, a stakeholder description of the person's work, or anyone else's interpretation of the person's life. ## What gets recorded The output of observation is a small artifact: the **observation note**. Lives in the project space, attached to the initiative brief. ```text Observed: Gal, exam grader, Wed 2026-04-22, 08:50–10:30 Activity: Grading a batch of five Computer Science final exams Watcher: Alex (PO), Maya (Designer) Timestamps: 08:53 Opens the LMS, navigates: Courses → CS101 → Submissions → Filter unread. Two clicks were wrong direction. (Friction: nav.) 08:57 Pulls up the rubric on second monitor. Switches between the student PDF and the rubric repeatedly. 09:14 Edits a student's display name in a spreadsheet, then returns. (Workaround: LMS does not support Hebrew names properly.) ... Domain language used by Gal (verbatim): "the answers" (she means student submissions) "the boxes" (she means rubric criteria) "send back" (she means require resubmission, not return for editing) Stopped noticing: Gal stops looking at the timer in the LMS after the first 10 min. She has internalised that it is not accurate. ``` The note is not a polished document. It is raw. It is what the brief is later built on. 
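Because the journey map and the brief are built on the note, it pays to keep the raw entries minimally structured: timestamped, written as seen, tagged by signal class where one applies. A sketch of one possible shape, in Python; the names are illustrative and nothing here is prescribed tooling.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Signal(Enum):
    WORKAROUND = "workaround"   # the person's own bridge over a gap the system left
    FRICTION = "friction"       # pauses, re-entries, alt-tabs, post-its
    DOMAIN_LANGUAGE = "domain"  # the person's verbatim words for the things in their work

@dataclass
class Entry:
    timestamp: str                   # e.g. "09:14"; anchors the post-observation interview
    seen: str                        # what was seen, not what was concluded
    signal: Optional[Signal] = None  # tagged only when one of the three classes applies

@dataclass
class ObservationNote:
    person: str                      # the named person, e.g. "Gal"
    activity: str
    watchers: list[str]
    entries: list[Entry] = field(default_factory=list)

# The 09:14 workaround from the note above, held as an entry.
note = ObservationNote("Gal", "Grading five CS final exams", ["Alex (PO)", "Maya (Designer)"])
note.entries.append(Entry("09:14",
                          "Edits a student's display name in a spreadsheet, then returns.",
                          Signal.WORKAROUND))
```

The tagging is what makes the post-observation interview cheap to prepare: the five timestamped questions fall out of a filter over the entries.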
[Part 2 — Person & Moment →](/volumes/ii-discovery/2-person-moment) --- --- url: /volumes/ii-discovery/2-person-moment.md --- part two · person & moment # Person & Moment > *Naming who has the problem and when it happens.* The chain is built on two named things — a person and a moment. Both are specific. Both are the team's first commitment to honesty. ## The person Not *the user*. Never *the user*. A named individual. * Dina, 34, secondary-school teacher in Tel Aviv, fluent in Hebrew and English, six classes a week. * Miri, customer service lead at a regional water utility, twelve direct reports, in the role for nine years. * Avi, a field service technician with four jobs a day across Beersheba and Ashdod, never in the office. The name is not decoration. The name is the constraint. *We are building this for Dina* leads to a different decision than *we are building this for teachers*. The first names a real person whose life can be changed. The second names a market — and a market cannot be observed. The corpus rule: **every brief begins with a named person whose life will change.** If the team cannot name the person, the team has not done Discovery. Substituting a persona — *Teacher Tina, age 30–45, time-pressured* — is not the same. A persona is an aggregate. The chain is built around an individual. ## Aggregating without losing the person Personas have a place. They are a useful summary across many named people. But they are downstream of named people, not upstream. The corpus pattern: 1. Observe several named people doing the same activity. 2. Write each of their stories in their own words. 3. Surface the patterns *across* the named stories. 4. Build the persona, with a footnote pointing back at the named people. A persona that does not point back at named people will drift. Six months later, the persona has different concerns than the people had when it was first written, and no one can tell why. ## The moment The specific point in the activity where the friction or failure occurs. Not the average experience — the actual one, with timing. * Dina's moment: Sunday morning, 7:15am, the first cup of coffee, opening the LMS to plan the week. The system loads, three notifications appear, none of them are about the lesson she is preparing. * Miri's moment: Monday 11:00am, on a call with an angry customer, trying to find the previous case notes while keeping the customer talking, in three different systems. * Avi's moment: 7:45am, in the van, twenty minutes from the first job, looking at the work order on his phone, missing the part the customer described. The moment makes the brief writable. *Dina struggles with planning* is unwritable. *Sunday morning, 7:15am, opening the LMS, the first thing she sees is not what she came in for* is writable. The Discovery moment is what the brief's prediction is anchored to. ## The pattern of moments Most activities have several friction points. The first Discovery pass surfaces the most painful one — the one the person remembers without being asked. The second pass surfaces the others — the ones they have stopped noticing. The corpus pattern: pick one moment to anchor the brief, and list the others in the assumption space. The brief is about the chosen moment. The future cycles will return to the others. ## How person and moment relate to scope The person sets the constraint on *who*. The moment sets the constraint on *when*. Together they bound *what*. 
A feature that improves Dina's Sunday morning planning is a different feature than one that improves Dina's mid-week emergency lesson swap. The first lives inside one cycle. The second lives inside another. Trying to do both at once produces a feature that does neither well. ## Multiple people, multiple moments Real activities involve more than one person. The grading flow has Gal (the grader) but also the student, the head of department, and the IT admin. Each is a person. Each has moments. The corpus pattern: the brief names *one primary person and one primary moment* and lists the secondary persons and moments at the foot. Scope is set against the primary. The secondary persons are remembered, not centred. They become the centre of their own future briefs. [Part 3 — Journey Mapping →](/volumes/ii-discovery/3-journey-mapping) --- --- url: /volumes/ii-discovery/3-journey-mapping.md --- part three · journey mapping # Journey Mapping > *The full activity with friction marks.* A journey map is the team's drawing of the named person's activity, end to end, with friction marked where it actually occurs. It is the artifact that the brief sits inside. Stories in Volume III refer to journey steps by number — *J3*, *J4* — so the map needs to be authoritative. ## What a journey map is, and isn't It is: * **Sequential.** Step 1 to step n, in the order the activity happens. * **Annotated.** Each step has the person's domain language, the system the activity touches, and where friction occurs. * **Numbered.** J1, J2, J3 — referenced from the brief, the stories, the analytics events. * **Owned by Discovery.** Updated when the team learns more about the activity. Not a one-shot deliverable. It isn't: * A user flow inside a particular product. That comes later, in Volume III. * A wireframe. Wireframes show the *interface* of a step, not the step. * A funnel or analytics dashboard. Those measure aspects of a journey but cannot describe the lived activity. ## How to draw the first version Sit with the observation notes (Part 1). List every distinct action the named person took, in time order. Group consecutive same-tool actions together as one step. Mark friction. A first journey for Gal's grading activity: ```text J1. Gal arrives at her desk. (08:50) J2. Opens LMS, navigates to today's grading queue. (08:53) ⚠ nav-friction J3. Pulls up the rubric on second monitor. (08:55) J4. Opens the first student submission. (08:57) J5. Reads the answer; references rubric several times. (08:58) ⚠ context-switch J6. Edits student's display name in spreadsheet. (09:14) ⚠ workaround: Hebrew names J7. Selects rubric scores; types brief comments. (09:18) J8. Submits the grade. Returns to queue. (09:22) J9. Repeats J4–J8 for next four students. (09:23–10:05) J10. Closes the LMS, drinks water, switches activity. (10:30) ``` The friction marks are the brief-writable points. Each one is a candidate for a Volume II prediction. ## Levels of detail A journey can be drawn at three resolutions. Pick the one the brief needs. | Resolution | Steps | When | |---|---|---| | **Activity-level** | 5–10 | Initiative briefs, where the gap is at activity scale | | **Task-level** | 15–30 | Feature briefs, where the gap is inside one activity | | **Action-level** | 50+ | Specialised work — accessibility audit, performance hot-path, security review | A team that reaches for action-level too early drowns in detail. A team that stays at activity-level too long writes briefs that miss the actual friction. 
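Whatever the resolution, the map behaves like a small, stable data structure: numbered steps that stories and analytics events can cite, each carrying its friction mark. A sketch under that reading, in Python; the field names and the `brief_candidates` helper are illustrative, not corpus-prescribed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JourneyStep:
    """One numbered step of the named person's activity, e.g. J6."""
    number: int                     # the J-number; Volume III stories cite "J3", "J4" by it
    action: str                     # what the person does, in their domain language
    observed_at: str                # timing from the observed session, e.g. "09:14"
    friction: Optional[str] = None  # the friction mark, e.g. "workaround: Hebrew names"
    resolved: bool = False          # struck through, not deleted; the map keeps its history

def brief_candidates(steps: list[JourneyStep]) -> list[JourneyStep]:
    # The friction marks still live on the map are the brief-writable points.
    return [s for s in steps if s.friction and not s.resolved]

j6 = JourneyStep(6, "Edits student's display name in spreadsheet", "09:14",
                 friction="workaround: Hebrew names")
```

A filter like `brief_candidates` is the map's contribution to the next brief: the unresolved friction marks are exactly the candidate predictions named above.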
## Marking friction honestly Friction is not always inconvenience. The corpus uses three friction labels. * **Cognitive** — the person had to hold something in their head that the system could have remembered. * **Mechanical** — extra clicks, extra navigations, manual data shuffling. * **Domain-mismatch** — the system's language or model differs from the person's. The person paid the cost of translation. Each label points to a different kind of fix. The brief that names *cognitive friction at J5* leads to a different feature than the one that names *mechanical friction at J5*. ## The map and the brief The brief uses the map as its anchor. The brief's prediction names the journey step that will change. ```text Initiative: Grading Flow v2 Person: Gal (and ~120 graders) Map step: J6 (display-name workaround for Hebrew names) Friction: Domain-mismatch + mechanical Prediction: With native Hebrew name support in the LMS, J6 disappears. Time saved per cycle: ~3 minutes (~3 hours/week for Gal at 60 cycles). Cumulative across the ~120-grader cohort at similar load: ~360 hours/week. Check date: 2026-06-01 Check method: Observation of three named graders post-deploy. Owner: Alex (PO) ``` The brief is short. The map is the context that makes it short. ## Updating the map Every model update (Volume V Part 6) is an opportunity to update the map. New friction surfaced by the cycle goes onto the map as a new annotation. Resolved friction is struck through, not deleted — the map carries the history of the activity. A team whose map is current can write briefs faster, because the act of *finding the moment* has already been done. [Part 4 — Assumption Surfacing →](/volumes/ii-discovery/4-assumption-surfacing) --- --- url: /volumes/ii-discovery/4-assumption-surfacing.md --- part four · assumption surfacing # Assumption Surfacing > *Naming what we believe but haven't witnessed.* An assumption is a claim the team is making about the world that has not been witnessed in field observation. Discovery's job is to separate the claims from the witnessing — to know which is which, and to hold the unwitnessed claims as honestly named uncertainty rather than as silent confidence. ## Why this is a separate practice Most teams have assumptions. Few teams write them down. The ones that don't write them down do not have fewer assumptions — they have invisible ones, which the chain cannot test or update. Assumption surfacing is the deliberate act of saying: *here is what we currently believe, and we have not yet seen it, and we are not going to act as if we have until we have*. ## Three states Every claim sits in one of three states. | State | Meaning | What happens next | |---|---|---| | **Witnessed** | We have observed this directly | Brief can rely on it | | **Inferred** | We have evidence but not direct observation | Cite the evidence in the brief; flag for cycle check | | **Not witnessed** | We believe this but have not seen it | List explicitly in the brief's assumption space | A brief whose claims are all in the *witnessed* column is a strong brief. A brief that mostly contains *not witnessed* claims is a Discovery brief that has not yet finished Discovery — the work of moving claims from *not witnessed* to *witnessed* is what's left. ## Writing assumptions in the brief A brief carries its assumptions in a labelled section. They sit there visibly, not hidden in the prose. ```text Brief: Grading Flow v2 (excerpt) Assumptions (witnessed): • Graders use a second monitor for the rubric. (Observed in 6/6 sessions) • Hebrew names cause display issues.
(Observed in 4/6 sessions) • Graders work in batches of 5 with short breaks. (Observed in 5/6 sessions) Assumptions (inferred): • Graders prefer keyboard shortcuts over mouse nav. (User comments in 3/6 sessions; not directly observed) • The 47-min mean cycle time will fall to <15 min. (Modelled, not observed yet) Assumptions (not witnessed): • Graders will adopt the new flow without training. (We have not tried it) • Hebrew name fix will hold under all unicode forms. (We have not tested non-grader names — student names — in production load) • The IT admin can roll the change out smoothly. (We have not talked to the admin) ``` Each not-witnessed assumption is a candidate for a discovery question or a cycle check. ## The cycle's job, against the assumption list A cycle either witnesses an assumption (moving it from *not witnessed* to *witnessed*), or it does not. The model update at the end of the cycle (Volume V Part 6) walks the list and marks each one. This is the chain's most reliable form of learning. Assumptions that are confirmed across multiple cycles compound the team's confidence. Assumptions that are contradicted produce sharper briefs. Assumptions that remain *not witnessed* across multiple cycles are flagged — they are usually the assumption hiding the largest risk. ## Why teams resist this Three reasons, all wrong, all common. 1. *Listing assumptions makes us look uncertain.* Listing assumptions makes the team honest. Hiding them makes the team confident in the wrong direction. 2. *We don't know all of them.* You don't have to. List the ones you can name. The retro will surface others. 3. *It feels like overhead.* It is the cheapest insurance the chain produces. A two-minute list at brief time, that the cycle later resolves, is one of the highest-leverage practices in the corpus. ## Assumption space and prediction space A prediction (Part 9) is an assumption about a *change*. An assumption is a claim about a *state*. They are kept separate because their checks are different. * A prediction is checked against measurement: *did the time fall to under 15 minutes?* * An assumption is checked against observation: *did the IT admin actually roll the change out smoothly?* Both are checked. Both feed the model update. They are different artifacts because they answer different questions. [Part 5 — The Five Stations →](/volumes/ii-discovery/5-five-stations) --- --- url: /volumes/ii-discovery/5-five-stations.md --- part five · the five stations # The Five Stations > *The discovery walk — Vision & Context, Problem Framing, User Journey & Slices, Solution Options, Decision & Scope.* The Five Stations is the corpus's prescribed sequence for Discovery on a new initiative. The stations are walked in order. Each one produces an artifact the next one needs. Skipping a station does not save time — it pushes the missing work into the station that comes after, where it is harder to catch. ## Station 1 — Vision & Context **Question:** *Who is this for, what is their pain, why now, what does success look like?* Output: the **Initiative Problem Story**. A one-page document that names: * The named person and their context. * The pain — observed where possible, inferred where the observation is pending. * *Why now?* — what has changed that makes this initiative timely. * *Success looks like* — a one-paragraph description of the world if the initiative works. This is the station where Volume I's named initiative becomes a story the team can carry. 
The Initiative Problem Story is the artifact every later station refers back to. ## Station 2 — Problem Framing **Question:** *What do we believe, what do we know, what are we constrained by, what are we explicitly not doing?* Output: the **frame**. Four labelled sections. * **Validated assumptions.** Witnessed in Discovery so far. * **Unvalidated assumptions.** Believed but not yet witnessed. * **Constraints.** Hard ones — regulatory, contractual, technical floor — that bound the design space. * **Explicit non-goals.** What the team is *not* trying to do in this initiative, with rationale. The non-goals are the most useful section in the long run. Six months later, when scope creeps, the frame is what the team returns to in order to remember what was deliberately excluded. ## Station 3 — User Journey & Slices **Question:** *What is the full activity, where does it break, which break is in scope?* Output: the **journey map** (Part 3) plus a **slice declaration** that names which steps are in scope for this initiative and which are out. A journey of fifteen steps with three friction points might produce three slices. The first slice is what this initiative addresses. The other two become candidate next-initiatives. A station-3 conversation that produces *we should fix all three* is a station-3 conversation that has not yet become a Discovery decision. The discipline is to choose. ## Station 4 — Solution Options **Question:** *What are 2–3 plausible directions the initiative could go, and what do they trade off?* Output: the **options table**. Each option has: * A short name and a one-paragraph description. * The journey steps it changes. * The risks it carries. * The implementation cost order-of-magnitude (S / M / L). * The assumptions it leans on most. This station deliberately holds the team back from picking. The act of writing the options is what reveals their trade-offs. A team that arrives at this station with the answer already chosen has not done Discovery — they have done justification. The corpus pattern: **always include a "do nothing" option.** Sometimes the right move is to not pursue the initiative this cycle. The do-nothing option, written down with consequences, is what makes that decision honest. ## Station 5 — Decision & Scope **Question:** *Which option, what is the smallest first slice, what would success look like?* Output: the **decision artifact**. Names: * The chosen option, with a sentence on why this and not the other(s). * The MVP scope — the smallest first feature with a Volume II Feature Brief written. * The first three stories the trio recommends. * The KPI — a single signal that, if it moves, will tell us the initiative is on track. The KPI is not the prediction. The KPI is the *leading* signal that the initiative is alive. The prediction is the *lagging* signal, written into the Feature Brief with a check date. ## Walking the stations The stations are a *sequence*, not a *meeting agenda*. Each takes its own time. 
The corpus pattern is: | Station | Typical duration | |---|---| | 1 — Vision & Context | A short series of conversations + the observation work in Part 1 | | 2 — Problem Framing | A 90-minute trio session with the Initiative Problem Story in front | | 3 — User Journey & Slices | A morning of journey-mapping in front of the observation notes | | 4 — Solution Options | A 90-minute trio session, often with the Tech Lead leading the trade-off discussion | | 5 — Decision & Scope | A 60-minute leadership-present session, ending with sign-off | The stations cannot all happen in one day. Trying to compress the sequence is what produces the kind of brief that ships and is then surprised by reality. ## The artifact bundle When the stations have been walked, the initiative carries: * The Initiative Problem Story. * The frame (assumptions + constraints + non-goals). * The journey map with slices declared. * The options table with the chosen option marked. * The decision artifact with the MVP scope and KPI. This bundle is what the brief is built on. The brief is short — three pages, often two — because the bundle does the heavy lifting underneath. [Part 6 — Initiative Brief →](/volumes/ii-discovery/6-initiative-brief) --- --- url: /volumes/ii-discovery/6-initiative-brief.md --- part six · initiative brief # Initiative Brief > *Business gap, human gap, discovery questions, V.* The Initiative Brief is the highest-altitude artifact in Volume II. It holds the named initiative from Volume I in the form the rest of the chain can read. Every Feature Brief inside the initiative refers back to it. Every kill or continue decision at portfolio level reads it. A good initiative brief is short — two pages — and load-bearing. It is the document a team member who joins the project six months in reads first. ## Required sections ```text INITIATIVE BRIEF Title: Grading Flow v2 Owner: Alex (PO) Status: Discovery / Active / Killed (with date of state change) Goal it serves: Grading completed in <15 minutes (Volume I goal #2) Last reviewed: 2026-05-05 ──────────────────────────────────────────────── 1. Business gap What financial or strategic outcome would change if this initiative succeeded? 2. Human gap Who is the named person, and how is their life different if this initiative succeeds? 3. Current state What we have witnessed about today's activity. Reference to journey map and observation notes. 4. Discovery questions What does the team need to learn before scope can begin? List, with status (witnessed / inferred / not witnessed). 5. Value declaration (V) Range, most-likely, assumptions. Tied to Volume I Part 4. 6. Constraints Hard constraints — regulatory, contractual, technical floor. 7. Explicit non-goals What this initiative will NOT do. With rationale. 8. Sign-off Leadership + PO + Tech Lead. Dates. ──────────────────────────────────────────────── ``` ## What each section is for ### Business gap The financial or strategic side of *why this matters*. Not vague — anchored to Volume I's V declaration and the goal the initiative serves. > *Today, grading consumes ~32% of senior teaching time across the customer's 240-grader pool. At ~$45/hr fully loaded, that is ~$2.5M/year of grader time spent in a workflow that observation suggests is structurally inefficient. The initiative aims to recover the recoverable fraction.* ### Human gap The lived side. Person-first, witnessed where possible. > *Gal, an exam grader at the customer's flagship campus, currently spends an average 47 minutes per grading cycle. 
She grades ~60 cycles per week. Her contracted day ends at 5:30pm; in practice she leaves around 7:00 most evenings. The initiative aims to give Gal back the 32 hours/week that the system is currently consuming.*

### Current state

A short paragraph anchored to the journey map. Refers to specific observed friction.

### Discovery questions

The list of things the team needs to learn before scope. Each one tagged with its current status (witnessed / inferred / not witnessed). Discovery questions that remain *not witnessed* by the time the brief is signed off are explicit risks the chain is taking — visible, named, and acknowledged.

### Value declaration

Pulls from Volume I Part 4. Range, most-likely, assumptions. Tied to Volume V's VRI.

### Constraints

Hard constraints only. Soft constraints — *we'd prefer not to* — go in the non-goals.

### Explicit non-goals

Often the most useful section, six months later. Names what was deliberately excluded and why.

> *Out of scope: bulk-import of past grading history. Rationale: not in V calculation; observation suggests graders rarely reference >2 weeks back.*

### Sign-off

Names and dates. The brief is not active until signed.

## What it is *not*

* Not a feature spec. Features come later, in Part 7.
* Not a project plan. Plans come later, in Volume III.
* Not a wireframe. Wireframes are Volume III.
* Not a forecast. The brief contains predictions, but the brief itself is a description of the gap, not of the path.

## How long it stays current

The Initiative Brief is the longest-lived Volume II artifact. It is updated at every model update (Volume V Part 6) — assumptions move state, V is adjusted explicitly, status is refreshed. A brief that has not been touched in three months means one of two things: (a) the initiative is paused — which, by the corpus rule, means it should be in the *killed* state — or (b) the brief has decayed and no one is using it. Both are problems.

## The signing rule

The corpus rule: **execution does not begin until the Initiative Brief is signed.** A team that begins executing on the basis of a draft brief is a team operating on assumed alignment that has not been written down. The signing is what makes the alignment real.

[Part 7 — Feature Brief →](/volumes/ii-discovery/7-feature-brief)

---

---
url: /volumes/ii-discovery/7-feature-brief.md
---

part seven · feature brief

# Feature Brief

> *Observation, journey, direction, prediction, sign-off.*

The Feature Brief is the cycle's primary artifact. One Initiative Brief contains several Feature Briefs over its life. Each Feature Brief is small enough to fit a cycle and load-bearing enough to be worth predicting.

Where the Initiative Brief describes the gap, the Feature Brief describes a single named change to close part of the gap. It carries the prediction the cycle will check.

## Required sections

```text
FEATURE BRIEF

Title:         Grading: native Hebrew name support
Initiative:    Grading Flow v2
Cycle:         Cycle 17 (2026-05-05 → 2026-06-05)
Owner:         Alex (PO)
Designer:      Maya
Tech Lead:     Yossi
Status:        Discovery / Active / Shipped / Killed
Last reviewed: 2026-05-05
────────────────────────────────────────────────
1. Experience snapshot
   150–200 words. Day-in-the-life narrative. Named person,
   specific moment, specific pain, specific outcome.
2. Purpose
   One sentence — what problem this solves.
3. Emotional aim
   What feeling should the named person leave with?
4. In scope
   Specific capabilities, not vague.
5. Out of scope
   Explicit exclusions, with rationale.
6. High-level user flow
   Step-by-step in plain language. No UI / tech detail.
7. Prediction
   Baseline, target, check date, check method, owner.
8. Success signal
   The leading signal that says the feature is alive.
9. Open questions
   Must be resolved before story-writing begins. Owner + timing.
10. Sign-off
   PO + Designer + Tech Lead. Dates.
────────────────────────────────────────────────
```

## Experience snapshot — the centre of the brief

The most valuable section is the experience snapshot. 150–200 words. Day in the life. Named person. Specific moment. Specific pain. Specific outcome.

> *It is Wednesday morning. Gal sits down at 08:50 with her coffee and opens the LMS to grade the morning's batch of CS101 finals. The fourth submission is from a student named Yael Rosenberg-Hayut. The system displays her name with the hyphen and accents broken; the surname looks unfamiliar. Gal pauses, recognises the issue, opens her secondary spreadsheet to look up the correct name, copies it, edits the LMS field, and continues. By the time she has done this for the seven Hebrew-named students in the batch, ten minutes have evaporated. By the end of the day, almost an hour. After this feature ships, Gal opens the LMS, sees Yael's full name correctly, grades, and never thinks about the spreadsheet again.*

The snapshot is what makes the brief impossible to misread. It contains no UI language. No feature names. No technology. Just the felt moment.

## Prediction — the cycle's claim

A single prediction, written in the form the corpus uses everywhere.

```text
Prediction:   Native Hebrew name support eliminates the spreadsheet workaround
              at journey step J6. Time saved per cycle: ~3 minutes mean.
              Cumulative cohort impact: ~6 hours/week.
Baseline:     Mean cycle time 47 min (n=12, captured pre-feature in Discovery,
              Apr 2026). Workaround used in 5/6 observed sessions.
Target:       Cycle time mean below 44 min (signal).
              Workaround used in 0/6 observed sessions (causal).
Check date:   2026-06-15 (10 days post-flag-enabled).
Check method: Three named-grader observation sessions. Time-on-task
              instrumentation will be in place but observation is authoritative.
Owner:        Alex (PO)
```

Volume V Part 2 will read this prediction back. Without it, that section has nothing to do.

## In and out of scope

The corpus's pattern: name what is in scope as concretely as possible, and name out of scope as the explicit ground that future features will cover (or won't).

```text
In scope:
• LMS displays Hebrew names correctly across all unicode forms used in the
  customer's student database.
• The grader-facing surfaces (queue, review, comment) all show correct names.

Out of scope:
• Bulk re-rendering of past grading reports. (Will be addressed in a follow-up
  cycle if patterns suggest demand.)
• Right-to-left layout improvements unrelated to names.
• Student-facing surfaces — student grading view is not used by Gal and is out
  of this feature's scope.
```

## Sign-off

The brief is not active until three signatures land: PO, Designer, Tech Lead. Each one is signing on something different. The PO signs *the change is worth making this cycle*. The Designer signs *the experience is shapeable in this scope*. The Tech Lead signs *the change is buildable in this scope under our constraints*. Each signature has a date. The dates compose a record of when the trio aligned.

## What it produces for the rest of the chain

| Volume | What it inherits |
|---|---|
| III | The Experience Snapshot, the prediction, the in/out scope. Stories are sliced under this. |
| IV | The success signal — the analytics events to instrument. The prediction — the *before* metric. |
| V | The check. The signal reading is written *against* this brief. |
[Part 8 — Technical Design Brief →](/volumes/ii-discovery/8-technical-design-brief)

---

---
url: /volumes/ii-discovery/8-technical-design-brief.md
---

part eight · technical design brief

# Technical Design Brief

> *System-witnessed problem, technical prediction.*

The Technical Design Brief is the system-side counterpart to the Feature Brief. Some features need it; some don't. The Tech Lead decides — and the absence of one is itself a decision, recorded.

The TDB exists when the feature contains a technical question whose answer the team does not yet know. It is the artifact that holds the technical Discovery before scope can be drawn at the system level.

## When a TDB is written

The Tech Lead writes a TDB when:

* The feature requires a new third-party integration whose failure modes are not yet mapped.
* The feature changes the data model in a way that has migration consequences.
* The feature stresses an existing system in a new direction (load, latency, retention).
* The feature has security or compliance implications that have not been previously decided.
* The feature would be the first of its kind in the codebase — a new pattern.

The Tech Lead does *not* write a TDB when the feature is a routine extension of existing patterns. Most features fall into this latter category.

## Required sections

```text
TECHNICAL DESIGN BRIEF

Title:         Hebrew name support — encoding & migration
Feature:       Grading: native Hebrew name support
Owner:         Yossi (Tech Lead)
Status:        Discovery / Active / Shipped / Spike-required
Last reviewed: 2026-05-05
────────────────────────────────────────────────
1. System-witnessed problem
   What we have observed in the system today. Logs, traces, metrics.
   Witnessed, not assumed.
2. Boundary impact
   What services, schemas, contracts, integrations are touched.
3. Options considered
   2–3 technical directions with trade-offs.
4. Technical prediction
   What change in system behaviour we expect. Baseline, target,
   check date, check method.
5. Spikes required
   Time-boxed investigations that must complete before scope.
6. Ilities selection
   Which non-functional requirements matter for this slice, to what level.
   Recorded as ADR or referenced from one.
7. Risks
   What could go wrong technically, with mitigation plan.
8. Sign-off
   Tech Lead + senior developer + relevant ops. Dates.
────────────────────────────────────────────────
```

## System-witnessed problem

The TDB's first section is the system equivalent of the Feature Brief's experience snapshot. What did we see in the system that named the problem? Logs, traces, query plans, error rates, support escalations, dependency reports.

> *Application logs from Apr 1–30 show 1,247 occurrences of `unicode-encoding-warning` on the `/api/students/{id}/name` endpoint. 89% of those names contain Hebrew unicode forms. Database column `students.display_name` is stored as UTF-8 but the LMS's frontend rendering fallback is Latin-1, producing mojibake. Workaround patterns in customer-success data — graders editing names in a spreadsheet — confirm this in the field.*

System-witnessed is the technical analogue of *witnessed*. The Tech Lead has looked at the actual data, not at someone's description of it.

## Technical prediction

A specific, measurable technical change.

```text
Technical prediction: After the encoding fix and frontend update, occurrences
of `unicode-encoding-warning` drop to <10/month and `name-edit` events from
grader accounts drop to <5/week.
Baseline: 1,247 unicode-warning/month, ~120 name-edit events/week, captured pre-feature. Target: <10 unicode-warning/month, <5 name-edit events/week. Check date: 2026-06-15. Check method: Pull the warning count and the name-edit event log. No interpretation — counts. ``` The technical prediction sits next to the Feature Brief's prediction. They check different things — the user-side and the system-side — but both check. ## Options and ADR The TDB lists 2–3 technical options with trade-offs. The chosen option becomes (or references) an [ADR](/volumes/iii-scope/6-adr) in Volume III. The TDB is *not* the ADR — the TDB documents the discovery; the ADR documents the constrained decision. If a TDB cannot decide between options, it identifies the spike that would. A spike is a time-boxed investigation, with a deadline and a deliverable. Spikes that go open-ended become Discovery debt. ## Spikes Spikes are short — usually 2–5 days — and produce a written conclusion. The TDB names the spikes required, the time budget, and the question each spike must answer. > *Spike: validate that the LMS frontend rendering fallback can be fixed without a frontend deploy. 3 days, owner Yossi. Conclusion to be written 2026-05-12.* Spikes that exceed their budget are escalated to portfolio level — they are evidence that the initiative requires more Discovery investment than was originally scoped. ## Ilities selection Which non-functional requirements matter for this feature, to what level. Listed in the TDB and recorded as an ADR if any of them is non-default. | Ility | Default | Adjusted? | |---|---|---| | Performance | <200ms p95 on `/api/students/*` | No change | | Availability | 99.9% per ADR-12 | No change | | Accessibility | WCAG 2.2 AA | No change | | Security | Auth required, no PII in logs | Re-asserted: confirm names are not logged | | Internationalisation | UTF-8 throughout | **Adjusted** — explicit Hebrew/RTL test coverage required | A feature whose ilities table contains *no* adjustments is a feature in a known shape. A feature whose ilities table contains adjustments is one where the ADRs need to record what changed. ## Sign-off Tech Lead, plus the most senior developer who will work on the feature, plus any operations engineer whose service is impacted. The TDB is not active until signed. ## When a TDB is *not* written This is a common error. A team writes a Feature Brief, the feature is non-trivial technically, but no one writes the TDB. The cycle proceeds. The system-side surprise lands at execution. Postmortem traces it to a Discovery gap. The corpus pattern: when in doubt, the Tech Lead writes a *short* TDB rather than no TDB. The cost of a half-page TDB is fifteen minutes. The cost of a missing one is sometimes the cycle. [Part 9 — Prediction Writing →](/volumes/ii-discovery/9-prediction-writing) --- --- url: /volumes/ii-discovery/9-prediction-writing.md --- part nine · prediction writing # Prediction Writing > *Baseline, target, check date, method, owner.* A prediction is a falsifiable claim about a measurable change, written before the cycle runs. It is the chain's smallest unit of honest commitment. A brief that does not contain one is not a brief — it is a description. This part of the volume is short, dry, and exact. The discipline does not vary. ## The five fields Every prediction has the same five fields. They are not negotiable. | Field | What it is | What goes wrong without it | |---|---|---| | **Baseline** | The current measured state. Number, with sample size and date. 
| The check has nothing to compare to. | | **Target** | The state we expect after the change. Number, range, or threshold. | The check cannot say *yes* or *no*. | | **Check date** | The day someone will run the check. | The cycle ends and no one runs the check. | | **Check method** | The specific way we will measure. | The cycle ends and people argue about what counted. | | **Owner** | The named person responsible for running the check. | The check date arrives and no one shows up. | The fields are anchored to measurable things. *We expect users to be happier* fails three out of five — no baseline, no target, no method. ## A prediction worth writing ```text Prediction: Gal completes a grading cycle in under 15 minutes. Baseline: 47 minutes (mean), 38 minutes (median), n=12, captured 2026-04-22 by direct observation. Target: <15 minutes (mean) across n>=8 observed cycles OR <12 minutes (median) across n>=8. Check date: 2026-06-15. Check method: Three observation sessions across three named graders, in the field, with a stopwatch and the existing time-on-task instrumentation as cross-check. Minimum 8 cycles total. Same observers as Discovery. Owner: Alex (PO). ``` ## A prediction that fails the five fields ```text Prediction: Make grading easier and faster for users. ``` No baseline. No target. No date. No method. No owner. This is not a prediction. It is a marketing claim. ## Range, threshold, ratio Predictions can be expressed three ways. Pick the one the change supports. | Form | When | Example | |---|---|---| | **Specific number** | The change has a single dominant effect | *Cycle time falls to <15 minutes mean* | | **Range** | The change has high variance across the population | *Time saved is in the range 25–40 minutes per cycle for graders with >50 cycles/week* | | **Threshold** | The change must clear a binary criterion | *Hebrew names render correctly across 100% of test corpus* | A prediction that hides behind *we'll see what happens* is not predicting. It is observing. ## Honesty in baseline The baseline is the most-skipped field. Teams write *we will improve X by 30%* without saying what X currently is. The check then becomes *we improved by 30% relative to nothing*, which is the chain's most common form of self-deception. The corpus pattern: **no baseline, no execution.** If the team does not know the current state with sample-size precision, the team has not finished Discovery. The first execution week of any cycle is sometimes spent capturing the baseline that should have been captured during Discovery. ## Honesty in check method The check method is the most-skipped field after baseline. Teams write predictions and execute, then arrive at the check date and discover the metric they need was not instrumented. The check becomes a survey, which is the chain's second most common form of self-deception. The corpus pattern: the check method is named in Discovery. If the method requires instrumentation that doesn't exist, that instrumentation is one of the cycle's stories — *as Alex (PO), I want to know how long Gal spent grading, so that I can run the check on 2026-06-15*. ## Predictions feed the check, the check feeds the model This is the chain. The prediction is written in Volume II. It is kept alive through Volume III's stories and Volume IV's instrumentation. It is run in Volume V Part 2 — *Signal and the Prediction*. The result feeds the model update — Volume V Part 6. A team that does this for three cycles in a row is already most of the way to a chain. 
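The five fields are concrete enough to hold as data. A minimal sketch, using hypothetical TypeScript types rather than any tool the corpus prescribes, of a prediction record and a completeness check that catches the marketing claim above before it ships:

```typescript
// Hypothetical shape for a prediction record. Field names are illustrative.
interface Prediction {
  claim: string;
  baseline?: string;    // current measured state, with sample size and date
  target?: string;      // number, range, or threshold
  checkDate?: string;   // the day someone will run the check
  checkMethod?: string; // the specific way we will measure
  owner?: string;       // the named person responsible for running the check
}

// Walks the five fields and names every one that is missing.
function missingFields(p: Prediction): string[] {
  const fields: Array<[keyof Prediction, string]> = [
    ["baseline", "baseline"],
    ["target", "target"],
    ["checkDate", "check date"],
    ["checkMethod", "check method"],
    ["owner", "owner"],
  ];
  return fields.filter(([key]) => !p[key]).map(([, label]) => label);
}

// The marketing claim from above fails all five:
missingFields({ claim: "Make grading easier and faster for users." });
// => ["baseline", "target", "check date", "check method", "owner"]
```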
A team that skips even one of the five fields will, within those three cycles, hit the kind of *not checked* outcome that the corpus identifies as the only result with no value. [Back to the volume cover →](/volumes/ii-discovery/) · [Volume III — Scope & Shape →](/volumes/iii-scope/) --- --- url: /volumes/iii-scope.md --- # Scope & Shape > *Translating prediction into shape. Epics named for activities, story mapping, the walking skeleton, ADRs, sequence diagrams, schema, API contracts, ilities, amigos sessions producing Gherkin scenarios trios can defend.* This volume describes the **Scope** phase. The prediction named in Volume II is given shape — Epics for activities, stories for moments, ADRs for the constrained technical choices the design hinges on, and Gherkin scenarios that the trio can defend. The output is a slice of work small enough to fit a release and large enough to change the situation. [Continue to the introduction →](/volumes/iii-scope/intro) --- --- url: /volumes/iii-scope/intro.md --- scope & shape — volume III # Introduction Volume II ended with a Feature Brief signed off — a witnessed problem, a prediction, a check date. Volume IV will begin with the first commit. This volume is the work of bridging those two — turning a prediction into the shape of a release. Scope is not a list of tasks. Scope is the act of declaring, in writing, what changes in the world this cycle and what does not. A scope without a *not* statement is not a scope; it is a backlog. *Scope is the smallest end-to-end slice of work that — when shipped — will move the prediction. Anything larger is a backlog. Anything smaller is a task.* ## What this volume covers Nine parts. * **Epic Naming & Kickoff** — coherent activities, named after what the person does. * **Story Mapping** — Epics as columns, stories as rows, releases as slices. * **Walking Skeleton** — the smallest end-to-end release that changes the situation. * **Story Writing** — person, moment, done, out-of-scope, Gherkin-ready. * **Amigos & Gherkin** — the trio session that produces shared meaning. * **Architecture Decision Records (ADR)** — constrained choices with rejected options. * **Sequence, Schema, API** — the three technical drawings every Epic needs. * **Ilities Selection** — which non-functional requirements matter, to what level. * **Slicing & Prioritization** — which stories in which release, value-driven. Volume III is where the chain feels most operational. It is also where the most chain damage is done — by skipping amigos, by writing stories without journey references, by choosing technical options without writing ADRs. Most production bugs trace not to Execution but to Scope: the missing state, the unwritten edge case, the technical choice that was made in a Slack thread. [Part 1 — Epic Naming & Kickoff →](/volumes/iii-scope/1-epic-naming) --- --- url: /volumes/iii-scope/1-epic-naming.md --- part one · epic naming & kickoff # Epic Naming & Kickoff > *Coherent activities, named after what the person does.* An Epic is a coherent activity that the named person performs. Not a feature. Not a release. An activity. The corpus's discipline is to name Epics after the activity, not after the system that supports it. 
## The naming pattern

| Bad Epic name | Good Epic name | Why the second is better |
|---|---|---|
| *Authentication overhaul* | *Sign in to start the day* | Activity-shaped — the person does this once, then the day begins |
| *Hebrew name support* | *Grading without re-typing names* | Names the activity-side outcome, not the implementation |
| *Billing v3* | *Pay once, see what was bought* | Names the moment from the person's vantage |
| *Dashboard refresh* | *Open the morning, know what's owed* | Activity-shaped, names the moment |

The system-shaped name leads to system-shaped stories. The activity-shaped name leads to story sequences that map directly to the journey. The journey, the Epic, and the stories converge on the same vocabulary.

## What an Epic contains

An Epic is bigger than a story and smaller than a feature. It contains:

* **A name** in the activity-shaped form.
* **A one-line description** that names what the person does.
* **Journey coverage** — which J-numbered steps this Epic addresses.
* **A done-means** — the testable condition under which the Epic is finished.
* **A backbone list** — the 5–8 candidate stories the Epic likely contains.
* **A recommended start** — which story the trio recommends pulling first.
* **Dependencies** — what other Epics or external work this depends on.
* **Open questions** — what must be resolved before stories can be refined.

## The Epic kickoff

The Epic kickoff is a 60-minute trio session — PO, Tech Lead, QA, plus the Designer when relevant. The output is the artifact above (a completed example closes this part). The session has three movements.

1. **Read the brief together.** Twenty minutes. The brief is on screen. Each section is read aloud. Questions surface. The Tech Lead notes anything that suggests a TDB is missing or stale.
2. **Walk the journey.** Twenty minutes. Each journey step in scope is named. The Epic emerges as the coherent slice through those steps.
3. **Backbone candidate stories.** Twenty minutes. Each backbone is a one-liner — *as \[person], I want \[moment-action], so that \[outcome]*. Not refined. Sized roughly. The trio agrees on the recommended start.

The kickoff produces an artifact, not a meeting note. The artifact is what the next session — story writing and amigos — refers to.

## Epic vs Feature

A common confusion. The corpus pattern:

* A **Feature Brief** in Volume II usually contains **one or more Epics** in Volume III.
* An **Epic** contains stories that, when shipped, complete the activity-shaped outcome.
* A **story** is a single moment within the Epic.

For small features, the Epic and the Feature Brief look similar. For large features, the Feature Brief contains 2–3 Epics, each with its own journey coverage and done-means.

## When an Epic is not yet ready

The kickoff sometimes reveals that the Epic is not yet ready to be sliced into stories. Common reasons:

* **A discovery question is still *not witnessed***. Story-writing on it would produce stories built on assumption.
* **A dependency is undated**. The Epic's done-means depends on a third-party integration whose contract is not signed.
* **An ADR is missing**. A technical choice the Epic hinges on is not yet recorded.

Each of these is named, owned, and dated. The Epic returns to the kickoff agenda when the blockers clear. Trying to write stories on an Epic that is not ready produces stories that need to be rewritten — the most common source of mid-sprint scope churn.
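The example promised above: a completed kickoff artifact in the corpus's template style. The journey reference and the TDB come from the running Gal example; the done-means and the backbone one-liners (only three of the 5–8 shown) are illustrative, not prescribed.

```text
EPIC — Grading without re-typing names

One-liner:         Gal grades a full batch without leaving the LMS to repair a name.
Journey coverage:  J6 (Hebrew name editing workaround) and the adjacent grading steps.
Done-means:        Zero spreadsheet name-edits across one full observed grading day.
Backbone:          • As Gal, I want names to render correctly in the queue,
                     so that I can pick submissions without second-guessing.
                   • As Gal, I want names to render correctly on the review screen,
                     so that I can grade without breaking flow.
                   • As Gal, I want a visible fallback for rare unicode forms,
                     so that I can still grade without re-typing.
Recommended start: Queue rendering (it unlocks observation of everything after it).
Dependencies:      TDB "Hebrew name support — encoding & migration" (signed).
Open questions:    Is the student-name unicode corpus fully mapped?
                   (Owner: Yossi, before story-writing.)
```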
[Part 2 — Story Mapping →](/volumes/iii-scope/2-story-mapping) --- --- url: /volumes/iii-scope/2-story-mapping.md --- part two · story mapping # Story Mapping > *Epics as columns, stories as rows, releases as slices.* A story map is the team's two-dimensional drawing of the work. Time goes left to right (the activity sequence). Detail goes top to bottom (the breadth of stories within each step). Releases are horizontal slices through the map. The story map is how the trio negotiates *which stories ship together* — a question that cannot be answered from a flat backlog. ## The shape ```text Activity flow → ───────────────────────────────────────────────────────────── | EPIC 1 | EPIC 2 | EPIC 3 | | Sign in to start | Open today's | Grade the | | the day | grading queue | morning batch | ───────────────────────────────────────────────────────────── | Email login | Queue loads | Open submission | ← walking | Forgot password | < 2s | Score the rubric | skeleton ───────────────────────────────────────────────────────────── | Magic-link login | Queue filters | Add written | ← release 2 | MFA | Queue search | feedback | ───────────────────────────────────────────────────────────── | SSO via SAML | Custom queue | Bulk grading | ← release 3+ | Session sharing | views | shortcuts | ───────────────────────────────────────────────────────────── ``` The top row is the **walking skeleton** (Part 3) — the smallest story per Epic that, taken together, makes a complete activity. Below that, releases get progressively richer. The bottom of the map contains stories that may never ship — they are the candidates the trio is willing to consider but has not yet decided on. ## How to draw the first version Anchored to the Epic Kickoff (Part 1) outputs and the journey map (Volume II Part 3). 1. Draw the Epic columns left to right, in journey order. 2. Under each Epic, list the candidate stories from the kickoff's backbone. 3. Sort each column top to bottom by *minimum viable* down to *would be nice*. 4. Draw the first horizontal cut — the walking skeleton — across the top of every column. The cut is the discussion. The trio negotiates which stories make the cut. The discipline: every Epic must contribute at least one story to the walking skeleton, or the Epic does not belong in this release at all. ## Releases as slices A release is a horizontal slice. The slice is shipped together — the cohesion is what makes it useful. Shipping half a slice is shipping nothing. The corpus pattern: name releases. *Walking skeleton*, *Release 2 — graders can do their day*, *Release 3 — graders can move faster*. Named releases are easier to defend in conversations with leadership and clients than numbered ones. ## What the map is for Three jobs. 1. **Negotiate scope without a backlog.** A backlog is a list of every story ever proposed. A map is a curated arrangement that shows trade-offs. 2. **Make the walking skeleton visible.** The horizontal cut at the top is the team's commitment for the cycle. 3. **Surface gaps.** An Epic with no candidate story is a gap that the kickoff missed. An Epic with too many candidate stories is one that the trio has not yet thought hard enough about. ## What the map is not Not a Gantt chart. Not a roadmap. Not a backlog substitute. The map describes *what could ship together* and *what is in the next slice*; it does not describe *when*. When-questions go in the release plan, which lives next to the map but is a separate artifact. 
## Updating the map

The map is updated:

* After every Epic kickoff (new stories surface).
* After every retrospective (priorities shift).
* After every signal reading (predictions either confirm or surface new stories).
* After every postmortem with structural implications.

A static story map is one that is no longer being used. The map is meant to evolve as the chain learns.

[Part 3 — Walking Skeleton →](/volumes/iii-scope/3-walking-skeleton)

---

---
url: /volumes/iii-scope/3-walking-skeleton.md
---

part three · walking skeleton

# Walking Skeleton

> *The smallest end-to-end release that changes the situation.*

The walking skeleton is the corpus's most important slice. It is the smallest release that — when shipped — moves the prediction. Anything smaller is a feature; anything larger is a release plan.

## End-to-end is the rule

The walking skeleton spans every Epic in the cycle, end to end. It does not fully build any one Epic. It builds the thinnest believable slice through all of them. For Gal's grading flow:

| Epic | Walking-skeleton story |
|---|---|
| Sign in | Email login |
| Open queue | Queue loads with today's submissions |
| Grade | Open one submission, score it, save |
| Submit | Submission marks the cycle complete |

The skeleton is *Gal can grade one submission end to end*. It is not Gal's full work, but it is *complete work* — there is nothing missing for the activity to be performed once.

## Why end-to-end

Because end-to-end is the only kind of slice that:

* **Surfaces integration risk** at the cheapest moment. The expensive moment is when six independently-built features are wired together two days before release.
* **Is checkable** against the prediction. A partial slice cannot be observed in the field doing the activity; the activity is incomplete.
* **Is honest** about scope. *We have built half of every Epic* is a statement that hides what is finished from what is not.
* **Lets the team rehearse the chain** end-to-end before stakes are high.

The walking skeleton is sometimes ugly. It often has placeholder UI, manual fallbacks, or simplified rules. That is not a problem. It is a complete activity, even if rough.

## What the walking skeleton is not

* It is not the MVP. The MVP is what is shipped to users; the walking skeleton may be shipped to a small pilot, or only to staging. They are different artifacts; sometimes the same one.
* It is not a prototype. A prototype is throw-away. The walking skeleton is built on the same code paths the production version will eventually run on.
* It is not necessarily the first release. For some products, the walking skeleton is shipped only internally; release 2 is the first user-visible release.

## Building one in practice

The walking-skeleton stories are written *first* in any cycle. They are pulled into amigos first. They are coded first. They are integrated and shipped first. Other stories — the richness of each Epic — are pulled in after the skeleton is alive.

A cycle that ships the richness of Epic 1 first and then discovers Epic 2 was harder than expected ends up with Epic 1 fully built and Epic 2 partially built. The skeleton-first pattern reverses this.

## The skeleton as the prediction's prerequisite

The prediction names the change in the world. The walking skeleton is what makes that change observable for the first time. Without the skeleton, the prediction has nothing to land against.

Concretely: *Gal completes a grading cycle in under 15 minutes* requires that Gal can complete *a grading cycle*.
The walking skeleton is what makes that possible at all. Subsequent releases tune for time. ## When the skeleton can't be small Some features genuinely have non-trivial floors. A regulated payment integration. A real-time collaborative editor. The skeleton in these cases is still the smallest end-to-end version, but it may be larger than the team would prefer. The corpus pattern: when the skeleton is large, the cycle is reshaped around it. *We are spending this cycle on the skeleton; the richness comes next cycle.* The honest naming is what protects the chain from cycles that ship nothing because they tried to ship everything. [Part 4 — Story Writing →](/volumes/iii-scope/4-story-writing) --- --- url: /volumes/iii-scope/4-story-writing.md --- part four · story writing # Story Writing > *Person, moment, done, out-of-scope. Plus the nine-point Definition of Ready.* A story is a unit of work scoped to a single moment in a single person's activity, deliverable in 1–3 days, with at least three testable acceptance criteria. The corpus is opinionated about the form. The form is what makes the story interpretable across the trio without further conversation. ## The story format > *As \[named person, with brief context], I want \[specific action], so that \[meaningful outcome].* Always with a named person. Never *as a user*. ```text As Gal, an exam grader at our customer's flagship campus, I want the LMS to display Hebrew names correctly without re-typing, so that I can grade without breaking my flow. ``` The role description after the name is what carries the context the developer needs. *Gal at the customer's flagship campus* is different from *Gal who tests the LMS internally*. The story is for the first. ## What a story carries Every story has the same five sections. Less and the trio is filling in gaps in their head; more and the story has become a feature. | Section | What it contains | |---|---| | **Story sentence** | As–I want–so that | | **Journey reference** | J-number from the journey map | | **States to handle** | Empty, loading, success, error, key edge cases | | **Acceptance criteria** | 4–7 testable Given/When/Thens, including at least one negative | | **Out of scope** | What this story does not do, even though it is tempting | ## The states section This is the section most teams underuse. Every interactive moment has at least four states: * **Empty** — there is no data yet. * **Loading** — the data is on its way. * **Success** — the data has arrived and the activity continues. * **Error** — the data didn't arrive or arrived broken. Plus the edge cases the moment surfaces — long names, missing values, slow networks, repeated actions. The corpus pattern: a story missing the states section produces a feature that breaks on second use. The states are surfaced *before* code begins, in story-writing — not after, in QA. ## The Definition of Ready A story is *ready* — pullable into a sprint — only when nine points are confirmed. ```text 1. Story format ✅ Named person, action, outcome 2. Journey step reference ✅ J6 — Hebrew name editing workaround 3. ≥3 testable ACs ✅ 4 ACs, including 1 negative case 4. Visual / design ref ✅ Figma frame V2-J6-Names-State-A 5. Copy / user-facing text ✅ Empty, loading, error all defined 6. Observability signal ✅ Event name "name.display.normalized" 7. Dependencies identified ✅ None outside this Epic 8. Sized 1–3 days ✅ Sized 2 days by Tech Lead 9. 
Tech feasibility ✅ Confirmed in Epic kickoff (no spike needed) ``` A story missing one or more points is not ready. It can be pulled into amigos, where the missing points are surfaced and assigned. It cannot be pulled into a sprint until all nine are green. ## Sizing Three sizes. Larger is not allowed. | Size | Duration | Pattern | |---|---|---| | **Small** | <1 day | Single moment, single state, no surprises | | **Medium** | 1–2 days | Standard moment with states, Gherkin scenarios already known | | **Large** | 2–3 days | Moment with multiple states or new infrastructure | Anything larger is split. The split is part of story-writing; it is not deferred to *we'll see how it goes*. A story that arrives at amigos at *5 days* is a story that has not been written; it is a wish. ## What the story is for The story is for three audiences, all of whom must read it and arrive at the same understanding without further conversation: * **The developer** — knows what to build. * **The QA** — knows what to test. * **The PO** — knows what they will accept. Amigos (Part 5) is the session that confirms the convergence. If amigos surfaces disagreement, the story goes back for rewriting before code begins. [Part 5 — Amigos & Gherkin →](/volumes/iii-scope/5-amigos-gherkin) --- --- url: /volumes/iii-scope/5-amigos-gherkin.md --- part five · amigos & gherkin # Amigos & Gherkin > *The trio session that produces shared meaning before code begins.* Amigos is a 45-minute session where the trio — PO, Developer (often Tech Lead), QA — takes one story and produces named Gherkin scenarios. It is the smallest unit of shared meaning between phases. Without it, the story produces three different features in three different heads, and they meet for the first time in PR review. ## The session | Time | What happens | |---|---| | 0–5 min | The PO reads the story aloud. The journey reference is on screen. | | 5–15 min | The trio walks the states section. Each state surfaces or refines one or more scenarios. | | 15–30 min | The trio writes Given/When/Then scenarios collaboratively. QA leads the *Then*. Developer leads the *When*. PO leads the *Given*. | | 30–40 min | The trio attacks each scenario. *Where else does this break? What else should we test?* | | 40–45 min | Sign-off. The story is ready for code. | The session is bounded. It does not run long. If 45 minutes is not enough, the story is too large; split it. ## What the trio produces For each story, a named set of Gherkin scenarios. Three or more, including at least one negative. 
```gherkin Scenario: Hebrew name renders correctly on first load Given Gal is logged in as a grader And there is a submission from a student named "Yael Rosenberg-Hayut" When Gal opens the grading queue Then the student's name is displayed as "Yael Rosenberg-Hayut" And no encoding warnings are present in the application log Scenario: Hebrew name with mixed forms renders correctly Given Gal is logged in as a grader And there is a submission from a student named "אריאל שמעון-לוי" When Gal opens the submission Then the student's name renders correctly in both the queue and review screens And the right-to-left layout is applied to the name field Scenario: Name with rare unicode form falls back gracefully Given Gal is logged in as a grader And a submission contains a unicode form not in the supported set When Gal opens the submission Then the name is displayed with the unicode-fallback character And a system warning is logged for review And Gal can still grade the submission without re-typing ``` The scenarios are written from the person's vantage. *Given Gal is logged in* — not *Given the user has a session token*. Gherkin that loses the person becomes Gherkin that the QA cannot defend. ## Why the trio Each role brings something the others cannot. * **The PO** holds the brief and the prediction. They keep the scenarios pointed at what the cycle is checking. * **The Developer** holds the system. They surface what is implementable and what would require unscoped work. * **The QA** holds the failure imagination. They surface the negative cases, the edge cases, the *what if the network drops* questions. Amigos with two roles is not amigos. The third role is what catches the gap the other two would have shipped. ## What amigos is not * Not a planning meeting. Stories are not pulled into the sprint here; they are made *ready* here. Pulling happens at sprint planning. * Not a code review. The scenarios describe behavior, not implementation. Implementation is the developer's later choice — the scenarios constrain it, not the reverse. * Not a workshop. 45 minutes. One story. Two stories per amigos session, max, if both are small. ## When amigos surfaces a problem Common problems and their fixes. | What surfaces | Fix | |---|---| | The story is missing a state | Story goes back to the PO; rewrite, return to amigos | | The story is too large | Split. Sometimes here in the session; sometimes the PO takes it away | | A technical assumption is wrong | Story is paused. Spike opened. ADR may be needed | | The Gherkin keeps drifting into UI | The story is pointed at the wrong moment; rewrite | | Three scenarios isn't enough | Add more until the trio agrees the story is defensible | ## After amigos The Gherkin scenarios live next to the story, not in a separate document. They are linked from the brief. They become the test scenarios in Volume IV — sometimes lightly edited for technical fit, but the names and the *Then* statements stay. Amigos that produces good scenarios produces a story that the developer can build, the QA can verify, the PO can accept — without further conversation. [Part 6 — Architecture Decision Records →](/volumes/iii-scope/6-adr) --- --- url: /volumes/iii-scope/6-adr.md --- part six · architecture decision records # Architecture Decision Records (ADR) > *Constrained choices, with rejected options.* An ADR records a technical decision that constrains future code. It names what was decided, what was considered and rejected, the trade-offs explicitly accepted, and the consequences. 
It does not record decisions that have no constraints — *we are using TypeScript* is not an ADR; *we are using Mongoose rather than Prisma for this service because of \[X], and accepting \[Y] as a consequence* is.

The ADR is the technical artifact that survives the developer leaving. It is the corpus's main weapon against the *we made that choice for a reason but no one remembers it* failure mode.

## The MADR template

The corpus uses MADR (Markdown Architecture Decision Record) format. Every ADR has the same seven sections.

```markdown
# ADR-NNN — [Title]

Status: Proposed | Accepted | Deprecated | Superseded by ADR-MMM
Date: YYYY-MM-DD
Author: [name]
Context-link: [Feature Brief, Initiative, related ADRs]

## Context and Problem Statement
What is the situation? What forced this decision now?

## Decision Drivers
The forces, constraints, and requirements that bound the choice.

## Considered Options
At least 2, ideally 3 + "do nothing".

## Options Analysis
Pros and cons of each option.

## Decision Outcome
The chosen option, with the rationale and the trade-offs we are
explicitly accepting.

## Consequences
- Positive: what gets better
- Negative: what gets harder
- Risks: what could go wrong

## Implementation Notes (optional)
Any specifics the implementing developer needs.
```

## When to write one

The Tech Lead writes an ADR when:

* A technical choice has consequences that will be felt outside the cycle.
* A choice is being made between options that are non-trivially different.
* A constraint is being applied that future developers will need to know about.
* A trade-off is being explicitly accepted in service of something else.

The Tech Lead does *not* write an ADR for:

* Routine choices inside an established pattern.
* Style decisions covered by linters and conventions.
* Decisions with one obvious option.

## Why two options minimum

The corpus rule: **never write an ADR with fewer than two options considered.** The *one obvious option* is rarely as obvious as it feels. The act of writing down the rejected options is what reveals their weakness — and sometimes their strength. ADRs with one option are statements of preference; ADRs with two are decisions.

The third option is often *do nothing* — keep the current pattern, accept the friction, defer the decision. Naming *do nothing* explicitly with consequences forces the team to either accept the deferral or commit to the change.

## States

| State | Meaning |
|---|---|
| **Proposed** | The decision is being discussed; the ADR is being read by the team |
| **Accepted** | The decision is in force; new code complies |
| **Deprecated** | The decision is no longer the right one but has not been replaced |
| **Superseded** | A new ADR has replaced this one; this ADR points to it |

ADRs are never deleted. They are part of the record. Even superseded ADRs are useful — they tell future developers what was tried and why it changed.

## Where ADRs live

In the corpus's reference layout:

```text
/docs/architecture/adr/
  ADR-001-typescript-and-strict-null-checks.md
  ADR-002-mongoose-over-prisma-for-grading-service.md
  ADR-003-feature-flag-platform.md
  ...
```

ADRs are numbered sequentially. The number stays even when the decision is superseded — a superseded ADR-002 still exists; ADR-019 supersedes it.
## A short example ```markdown # ADR-014 — Use Bull (Redis) for grading job queue Status: Accepted Date: 2026-04-12 Author: Yossi (Tech Lead) Context-link: Feature Brief — Grading Flow v2; ADR-006 (Redis cluster) ## Context and Problem Statement Grading produces a sequence of background jobs (rendering reports, notifying students, syncing to LMS). We need a job queue. The existing services use ad-hoc setTimeout patterns; this is the first service requiring a real queue. ## Decision Drivers - Reliability of delivery (jobs must not silently fail) - Operability (queue introspection, retry, dead-letter) - Existing infra (Redis cluster from ADR-006) - Developer familiarity ## Considered Options 1. Bull (Redis-backed) — Node ecosystem standard 2. SQS — managed, durable 3. RabbitMQ — full broker 4. Do nothing — keep ad-hoc setTimeout ## Options Analysis [for each option, pros/cons] ## Decision Outcome Bull. Reuses Redis cluster (no new infra). Mature in the Node ecosystem. Trade-off accepted: Bull is in maintenance mode; we will revisit when BullMQ migration is justified by team capacity. ## Consequences - Positive: Job queue with retry, DLQ, and introspection in 2 days. - Negative: Couples the grading service to Redis availability. - Risks: Bull's maintenance status; mitigated by tracking BullMQ. ``` ## ADRs and the chain ADRs are read in two places. * **At Epic kickoff (Part 1)** — to confirm the technical pattern the Epic depends on is current. * **At postmortem (Volume V Part 4)** — when an incident traces to a technical choice, the ADR is read. If the ADR is missing or wrong, the postmortem produces a new or replacement one. A team whose ADR record is current and read is a team that has fewer of the *who decided this* conversations and more of the *we decided this, here is what we'd accept changing* conversations. [Part 7 — Sequence, Schema, API →](/volumes/iii-scope/7-sequence-schema-api) --- --- url: /volumes/iii-scope/7-sequence-schema-api.md --- part seven · sequence, schema, api # Sequence, Schema, API > *The three technical drawings every Epic needs.* Volume III's technical artifact set is small. Three drawings, plus one or more ADRs to constrain them. The drawings are not deliverables — they are the conversation aids that the Tech Lead, the developer, and the QA all read before code begins. ## Sequence diagram — what flows where A sequence diagram shows the flow of a request through the services that handle it. Verbs are arrows. Boxes are services or components. The diagram includes the failure paths, not just the happy path. ```mermaid sequenceDiagram autonumber actor Gal participant Web as Web participant API as Grading API participant LMS as LMS Adapter participant DB as Postgres Gal->>Web: Open submission #1234 Web->>API: GET /submissions/1234 API->>DB: SELECT s.*, st.display_name FROM ... DB-->>API: row API->>LMS: GET /students/567/normalised-name LMS-->>API: { name: "Yael Rosenberg-Hayut" } API-->>Web: { submission, student } Web-->>Gal: rendered submission alt LMS unreachable LMS--xAPI: timeout API->>API: log unicode-fallback API-->>Web: { submission, student: { name: db.display_name } } end ``` The corpus pattern: every Epic has at least one sequence diagram. A single happy-path arrow set with the alt-flow for the most likely failure. A diagram that only shows the happy path is missing the part the on-call will care about. 
## Schema design — what the system remembers

A schema diagram or written schema definition shows the tables, columns, constraints, and relationships the Epic introduces or changes. The corpus pattern is to write schema in DDL even when the framework would generate it — the canonical form is what the team reviews.

```sql
-- Migration: add normalized_display_name to students
-- (normalize_unicode is assumed to be a project-defined IMMUTABLE function,
--  e.g. wrapping Postgres 13+ normalize(); generated columns require one.)
ALTER TABLE students
  ADD COLUMN normalized_display_name TEXT
  GENERATED ALWAYS AS (normalize_unicode(display_name)) STORED;

CREATE INDEX idx_students_normalized_name
  ON students (normalized_display_name);
```

Schema changes are the part of the system most likely to produce migration drama. The corpus pattern: every schema change has a backward-compatible migration plan, written and reviewed before the cycle starts. The plan answers:

* Is the change additive (safe to roll forward) or modifying (requires sequencing)?
* Can the change be reverted in production without data loss?
* What is the read/write pattern during the migration window?
* What is the staging rehearsal plan?

A schema change without a migration plan is a story that has not finished story-writing.

## API contract — what the system promises

For every API surface change, an explicit contract: verb, path, request shape, response shape, errors, guarantees.

```http
GET /api/v2/submissions/:id

Path params:
  id       String. Submission ID.
Query params:
  include  Optional, comma-separated. Allowed: "rubric,student".

Response 200:
  {
    id, status, submitted_at,
    student: { id, display_name, normalized_display_name },
    rubric: { ... }   // present if include=rubric
  }
Response 404: { error: "submission_not_found" }
Response 403: { error: "not_authorised_for_submission" }

Guarantees:
- normalized_display_name is always present and non-empty.
- Reads repeatable within a 30s window post-write.
```

The contract is what the trio agrees on. The implementation conforms to it. Drift between contract and implementation is the most common source of *the API broke* support tickets.

## When state machines are needed

Some entities have multiple states with rules about which transitions are allowed. *Submission*: pending → in-progress → graded → returned → final. The corpus pattern: when an entity has more than three states, draw the state machine explicitly.

```mermaid
stateDiagram-v2
  [*] --> Pending: submission received
  Pending --> InProgress: grader opens
  InProgress --> Graded: scores saved
  InProgress --> Pending: grader closes without saving
  Graded --> Returned: returned for resubmission
  Graded --> Final: cycle locked
  Returned --> InProgress: resubmission opened
  Final --> [*]
```

Disallowed transitions are the silent source of bugs. A state diagram that the trio reads at amigos catches them before code.

## Data flow — when integration is the point

For Epics that cross system boundaries — third-party integrations, data ingestion, ETL — a data flow diagram shows the full path of the data: source, transformations, persistence, consumers.

```mermaid
flowchart LR
  LMS[LMS] -->|hourly poll| Sync[Sync Job]
  Sync --> Q[Validation Queue]
  Q --> V[Validator]
  V -->|valid| DB[Grading DB]
  V -->|invalid| DLQ[Dead-letter]
  DB --> API[Grading API]
  API --> Web[Grading Web]
```

The data flow is what surfaces the unhappy paths — what happens to data that fails validation, where it goes, who notices.
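At the code level, the validator box above is small. A sketch, assuming the Bull queue infrastructure chosen in ADR-014; the queue names, record shape, and validity rule are illustrative, not the corpus's prescribed implementation:

```typescript
import Queue from "bull";

// Queues mirroring the data-flow diagram (names illustrative).
const validationQueue = new Queue("grading-validation", process.env.REDIS_URL!);
const deadLetterQueue = new Queue("grading-dead-letter", process.env.REDIS_URL!);

interface StudentRecord {
  id: string;
  displayName: string;
}

// Illustrative rule: a display name must be present and NFC-normalized.
function isValid(record: StudentRecord): boolean {
  return (
    record.displayName.length > 0 &&
    record.displayName === record.displayName.normalize("NFC")
  );
}

// Stub: the real write goes to the Grading DB in the diagram.
async function persistToGradingDb(record: StudentRecord): Promise<void> {}

validationQueue.process(async (job) => {
  const record = job.data as StudentRecord;
  if (!isValid(record)) {
    // Invalid records go to the dead-letter queue, where someone notices.
    await deadLetterQueue.add(record, { removeOnComplete: false });
    return;
  }
  await persistToGradingDb(record);
});
```

The design point survives the trip into code: the invalid path is explicit, persisted, and observable, not a silent drop.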
## When to write which | Drawing | When | |---|---| | Sequence diagram | Always — at least one per Epic | | Schema | Whenever a schema change is involved | | API contract | Whenever an API surface changes | | State machine | When an entity has more than 3 states | | Data flow | When an Epic crosses system boundaries | The Tech Lead decides which apply. The decision is named at Epic kickoff. [Part 8 — Ilities Selection →](/volumes/iii-scope/8-ilities) --- --- url: /volumes/iii-scope/8-ilities.md --- part eight · ilities selection # Ilities Selection > *Which non-functional requirements matter for this Epic, to what level.* Ilities — performance, accessibility, security, learnability, durability, internationalisation, reliability — are the requirements that don't show up in the user story but that determine whether the feature lives or dies once shipped. The corpus pattern: ilities are selected, not assumed. The selection is recorded in the brief and (where adjusted) in an ADR. ## The standard list ```text Performance (latency, throughput, resource use) Reliability (availability, fault tolerance) Security (auth, authz, data protection, audit) Privacy (PII handling, retention, deletion) Accessibility (WCAG, keyboard, screen reader, contrast) Internationalisation (locale, RTL, unicode forms) Learnability (first-use, no-training adoption) Maintainability (code clarity, deletability) Operability (monitorability, runbook coverage) Durability (data loss tolerance) Scalability (load shape, growth path) ``` ## The default and the deviation Each project has a defaults table. *We do not write it down once and forget it.* It lives in a top-level ADR, and every brief either confirms the defaults or names the deviation. ```text Project: Grading Flow v2 Default ilities: Performance <200ms p95 on read endpoints Reliability 99.9% monthly availability per ADR-12 Security Auth required; no PII in logs; audit logs retained 90d Privacy Hebrew names, English names — no other PII collected Accessibility WCAG 2.2 AA Internationalisation UTF-8 throughout; Hebrew + English Learnability Graders adopt without training (target: <10 min first use) Maintainability Standard project conventions Operability Standard observability per ADR-08 Durability No data loss; standard backup policy Scalability Up to 50 concurrent graders per customer ``` When a brief deviates — *we are accepting <500ms p95 for this batch endpoint* — the deviation is recorded in the brief and (if structurally significant) gets its own ADR. ## How to read each ility for an Epic Each ility translates into questions the trio answers before code begins. ### Performance * What is the latency target? p50, p95, p99. * What is the throughput shape? Burst vs steady? * What load profile do we expect this Epic to add? ### Reliability * What is acceptable downtime? * What dependencies introduce failure modes? * What is the failure-degradation pattern? Fail closed, fail open, fail soft? ### Security * What auth and authz changes does this introduce? * Does this Epic touch PII or new sensitive data? * What audit logging is required? * Has the threat model been re-read? ### Accessibility * Keyboard-only navigation across the new flows? * Screen reader output verified? * Contrast ratios? * RTL behaviour validated? ### Internationalisation * Unicode forms fully supported (NFC/NFD/NFKC/NFKD)? * Locale-aware sorting and search? * RTL layout? * Date/number format handling? ### Learnability * First-use onboarding? * Empty states? * Help text in domain language? 
### Maintainability * Standard project patterns or new ones? * Test coverage adequate? * Dependencies introduced reviewed? ### Operability * New metrics, traces, log fields? * New runbook entry? * Monitor/alert rules updated? ### Durability * Data loss tolerance? * Backup and restore tested? ### Scalability * Growth headroom for the next 12 months? * Hot-spot risk? ## The UX/product ilities Some ilities live more on the product side than the system side. The corpus calls these *UX/product ilities* and the Designer + PO own them. * **Learnability** — can a new person use this without training? * **Content clarity** — does the on-screen language match the person's domain language? * **Responsiveness** — does the experience hold across the devices the person uses? * **Comprehension** — does the person know what state they are in at any moment? These are real ilities. They are checked in design review and in QA. A feature that ships with broken learnability is a feature that produces support tickets, regardless of how reliable the backend is. ## What gets recorded where | Selection | Where it lives | |---|---| | Project defaults | A top-level ADR in `/docs/architecture/adr/ADR-001-default-ilities.md` | | Per-Epic confirmation | The Feature Brief or Technical Design Brief | | Per-Epic deviation | A new or referenced ADR | | Per-story specifics | The story's acceptance criteria, often as Gherkin scenarios | ## When ilities are skipped The pattern is predictable. A team ships a feature, it works in staging, it fails on the third day in production because of a load profile the brief did not name. The postmortem traces the gap to the ilities table that was left as default when it should have been adjusted. The corpus pattern: if a feature touches a load path that hasn't been seen at scale, the Tech Lead writes a one-line ility deviation in the brief, even if no number is yet known. Naming the unknown is the cheapest insurance against the missed unknown. [Part 9 — Slicing & Prioritization →](/volumes/iii-scope/9-slicing-prioritization) --- --- url: /volumes/iii-scope/9-slicing-prioritization.md --- part nine · slicing & prioritization # Slicing & Prioritization > *Which stories in which release, value-driven.* Slicing is the act of choosing the smallest coherent set of stories that, together, change the situation enough to be worth releasing. Prioritization is the act of choosing which slice ships first. Both are PO decisions, made on the story map (Part 2) in front of the trio. ## The slicing question > *If we ship only this set, will the prediction be checkable?* If yes, the slice is candidate-shippable. If no, more is needed — but not arbitrarily more; only what makes the prediction checkable. This is what makes slicing different from feature-listing. A feature list says *we want all of these*. A slice says *these together change the situation, and the rest can wait*. ## The prioritization principle Stories are prioritized within a slice by *what unlocks the rest*. Inside the walking skeleton, the order is: 1. The story that makes the data flow *exist* — the empty pipe. 2. The story that puts the first record through. 3. The story that lets the activity be performed end to end. 4. The stories that handle the first edge cases. 5. The richness — performance tuning, polish, additional states. A slice that is built in this order is a slice where, at any point, the team has working software that does *something* the named person can use. 
A slice built in any other order produces working components that do not connect until the last day. ## Value-driven slicing The corpus pattern: every story carries a value tag, derived from the Feature Brief's V (Volume I Part 4). | Value tag | Meaning | |---|---| | **Required for prediction** | Without this story, the prediction cannot be checked | | **Material to V** | This story's presence/absence affects V by >10% | | **Marginal** | This story rounds out the experience but does not move V meaningfully | | **Optional** | Nice-to-have, candidate for later release | A slice contains *all* required-for-prediction stories. It contains some material-to-V stories, prioritized by *which moves V earliest*. It rarely contains marginal or optional stories — those wait for richness releases. ## The slicing conversation Held by the PO at the end of Epic kickoff (Part 1) and reviewed at story mapping (Part 2). The PO names the slice. The trio reviews. Two questions: 1. *Does the slice deliver the prediction?* Required-for-prediction stories all in? 2. *Does the slice hold together?* Does it make sense to ship as one thing? A slice that doesn't deliver the prediction is unfinished. A slice that doesn't hold together is two slices wearing one name. ## Within the slice — sprint shape A slice is then sequenced into sprints (or whatever cadence the team uses). The order respects the *unlocks the rest* principle. ```text Sprint 1 (the skeleton): - Story A: data flow exists (the empty pipe) - Story B: a record passes through end to end - Story C: the named person can perform one full activity Sprint 2 (the first richness): - Story D: most common edge case - Story E: error states - Story F: the next-most-used path Sprint 3 (predict-checkable): - Story G: instrumentation for the prediction's check - Story H: the analytics events - Story I: the documentation the CS team needs ``` Sprint 1 makes something Gal can do once. Sprint 2 makes it survive contact with the messy real world. Sprint 3 makes it possible to run the check. ## What slicing is *not* * Not "split the story into smaller tasks". Tasks are technical sub-units the developer creates; they are not slices. * Not breaking by component (frontend / backend / infra). The slice spans every layer. * Not "MVP minus features". The slice is positively defined — what is in. Not negatively defined as a stripped-down feature. ## When slicing is hard Some Epics resist slicing. Two patterns to recognise. * **The Epic is actually one slice.** The work is irreducibly end-to-end. Accept it. Plan the cycle around the larger slice. Be honest with the client and leadership about what *one slice* means here. * **The Epic is two Epics in disguise.** Re-read the Epic kickoff output. Sometimes the activity that was named contains two coherent activities and the slice you can't draw is across both. Split the Epic. ## What this volume produces, in one sentence > *Volume III turns a prediction into a slice — small enough to fit a release, end-to-end through every Epic, with the stories trios can defend and the technical drawings the on-call can read at 3am.* [Back to the volume cover →](/volumes/iii-scope/) · [Volume IV — Execution →](/volumes/iv-execution/) --- --- url: /volumes/iv-execution.md --- # Execution > *The prediction goes live. Domain language survives the trip from brief to code. Trunk-based flow, conventional commits, feature flags, the pipeline, the release gate, runbooks. The cycle's machinery.* This volume describes the **Execution** phase. 
The Volume III scope is now in motion. The trio is building, testing, integrating, releasing. The pipeline catches a different chain level at each stage. The release gate is the moment the chain steps from *building* to *watching*. By the end of this volume, the flag is enabled and Volume V's first 48 hours has begun. [Continue to the introduction →](/volumes/iv-execution/intro) --- --- url: /volumes/iv-execution/intro.md --- execution — volume IV # Introduction Volume III ended with a slice — stories ready, scenarios defended, ADRs current, the technical drawings on the wall. Volume V opens with the flag enabled. This volume is the work that fills the gap. Execution is where most teams operate by default. It is the part of the chain that has the most tools, the most automation, the most visible progress. The corpus's discipline in this volume is not to invent more tooling — the world has plenty — but to keep the machinery aligned with the chain. The pipeline catches different levels of mistake at different stages. The release gate is not a meeting; it is a state. The runbook is not paperwork; it is the on-call's only friend at 3am. *Execution is the work of carrying a prediction from a signed brief through code, test, integration, and release — without losing the meaning that was named in Volume II.* ## The shape of this volume Nine parts. * **Domain Language in Code** — the names from the brief survive the trip into the codebase. * **Trunk-Based Development** — short-lived branches, continuous integration. * **Feature Flags** — wrapping new behavior; rollback is one switch. * **The CI/CD Pipeline** — six stages, each catching a different chain level. * **Testing Layers** — unit, contract, integration, visual regression — each for a different gap. * **Release Gate** — the named conditions the chain must satisfy. * **Gradual Rollout** — pilot, percentage ramp, full enablement. * **Runbooks & Rollback** — written before the incident, rehearsed in staging. * **Observability** — logs, traces, metrics, events. The instrumentation Volume V reads. [Part 1 — Domain Language in Code →](/volumes/iv-execution/1-domain-language) --- --- url: /volumes/iv-execution/1-domain-language.md --- part one · domain language in code # Domain Language in Code > *The names from the brief survive the trip into the codebase.* The discipline is small and load-bearing. The same words the named person uses for the things in their work appear in the code, the API, the database, the analytics events, and the support documentation. When this discipline holds, the trio can read each other's work without translation. When it breaks, every conversation between PO, developer, and QA carries a small tax — *what we call X, the code calls Y* — and over time, the tax compounds into bugs, support escalations, and refactors that should never have been necessary. ## The trip from brief to code ```mermaid flowchart LR classDef p fill:#fdfcf8,stroke:#3a51a3,color:#1f2c5e classDef bad fill:#fde8e8,stroke:#a83d3d,color:#5a1d1d Brief([Feature Brief: 'submission']) --> Story([Story: 'submission']) Story --> Story2([Acceptance criteria: 'submission']) Story2 --> Code([Function: getSubmission]) Code --> Schema([Table: submissions]) Schema --> API([API: /submissions/:id]) API --> Event([Event: submission.opened]) Event --> Analytics([Analytics: submission_opened]) Analytics --> Doc([Help text: 'Submission']) ``` If at any node the word changes — *submission* becomes *exam* becomes *paper* becomes *attempt* — the chain leaks. 
Six months later, a CS rep tells a customer they have *17 attempts pending*, and the customer has no idea what *attempts* are because they call them *submissions*. ## Three places the leak begins 1. **The codebase already has a different word.** The new feature uses *submission*; the codebase calls it *attempt* because it was named that way in 2018. The fix is a rename — small, scary, almost always worth it. A small ADR captures the decision. 2. **A developer translated unconsciously.** *Submission* felt long; the function got named `getSub()`. The fix is a code review with the brief in hand. 3. **A different system imposes its name.** A third-party LMS calls submissions *attempts*; the integration introduces the name into our code. The fix is to keep the third-party name *only* at the boundary; translate inside the boundary to our domain word. ## Reviewing for domain language The corpus's pattern: code review (Part 11 — *Code Review* in the Master Areas) explicitly asks the question *do the names in this PR match the brief*. If not, the PR returns. This is not pedantry. The naming review is the cheapest moment to catch a mismatch. Every later moment costs more. The PR reviewer reads the brief. Then reads the diff. The first question is not *does this work*. The first question is *do the names match*. ## What "domain language" includes * **Entity names** — *submission*, *grader*, *rubric*, *cycle*. * **Action names** — *grade*, *return*, *finalise*, not *update*, *patch*, *post*. * **State names** — *pending*, *in-progress*, *graded*, not *active*, *processing*, *done*. * **Quantity names** — what the field actually represents. *Cycle time in seconds* is honest; *duration* is vague. * **Error messages** — when something goes wrong, the message uses the same vocabulary. ## The boundary translation pattern When a third-party system uses different vocabulary, translate at the integration boundary. The corpus pattern: ```typescript // LMS adapter — translates LMS vocabulary to ours interface LMSAttempt { id: string; learner_id: string; ... } interface Submission { id: string; student_id: string; ... } export function fromLMS(attempt: LMSAttempt): Submission { return { id: attempt.id, student_id: attempt.learner_id, // ... } } ``` The rest of the codebase never sees *attempt* or *learner*. They are LMS words; we use *submission* and *student*. The translation is in one file, named clearly, and the boundary is visible. ## Why this matters Three reasons. 1. **The brief becomes the spec.** The trio agrees in domain language; the code matches. There is one source of truth. 2. **Support reads what the code says.** When a CS rep looks at logs, the words match the customer's vocabulary. Triage is faster. Escalations are clearer. 3. **The next developer onboards faster.** A codebase whose words match the product is one a new person can read. ## When the codebase has bad names Renaming is hard. The corpus pattern: rename *with the brief*. When a brief lands that introduces a word the codebase calls something else, the trio decides at Epic kickoff whether to rename now or after the cycle. Either decision is fine. *Don't rename and don't decide* is the failure mode. A codebase that has had *submission* and *attempt* meaning the same thing for two years is a codebase carrying tax. The tax is paid in every onboarding, every PR review, every support escalation. The interest is small per transaction; the principal compounds. 
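The discipline can be made partly mechanical. A minimal sketch, assuming the grading vocabulary from this part: pin the brief's state, event, and action names down as types, so a stray synonym fails the build instead of waiting for a reviewer.

```typescript
// A sketch, not the project's real types: the brief's vocabulary as the
// single source of names the compiler enforces.

// State names from the brief, not 'active' / 'processing' / 'done'.
export type SubmissionStatus =
  | 'pending'
  | 'in-progress'
  | 'graded'
  | 'returned'
  | 'final';

// Event names in the brief's vocabulary; analytics and logs import these.
export const SubmissionEvents = {
  opened: 'submission.opened',
  graded: 'submission.graded',
} as const;

// Action names from the brief (grade, return, finalise), not update/patch/post.
export interface SubmissionActions {
  grade(submissionId: string): Promise<void>;
  return(submissionId: string): Promise<void>;
  finalise(submissionId: string): Promise<void>;
}
```

A function that types its parameter as `SubmissionStatus` cannot quietly accept `'done'`; the mismatch surfaces at the cheapest possible moment, before the review.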
[Part 2 — Trunk-Based Development →](/volumes/iv-execution/2-trunk-based) --- --- url: /volumes/iv-execution/2-trunk-based.md --- part two · trunk-based development # Trunk-Based Development > *Short-lived branches, continuous integration.* Trunk-based development is the corpus's default branching pattern. Branches live for hours, not weeks. Integration happens continuously into `main`. Long-lived feature branches are an anti-pattern that produces merge conflicts, hidden divergence, and integration drama at exactly the wrong time. ## The pattern * One trunk: `main`. It is always shippable. * Branches are named: `feat/grading-flow-hebrew-names-001`. Slug includes the story or epic. * Branches live <2 days. Anything longer goes back behind a flag and merges to trunk in pieces. * Every commit on `main` runs through the pipeline (Part 4) and stays releasable. ## Why short-lived A branch that lives a week becomes a thing the team manages instead of a thing the team ships. Conflicts compound. Code reviews grow large. Integration testing happens against an old version of trunk. By the time the branch is merged, the work has to be re-validated against current trunk, which has moved on. A branch that lives two hours is a small unit of meaning. The PR is small. The review is fast. The integration is immediate. Trunk has not moved meaningfully. The work flows. ## How big features fit The honest question: *if branches are short-lived, how do we land big work?* Two answers. 1. **Slice the work behind a feature flag.** New behavior is wrapped in `if (flag('hebrew-names'))`. The flag starts off. Many small commits land on `main`, none of them visible to users. Eventually the work is complete; the flag is turned on. See [Part 3 — Feature Flags](/volumes/iv-execution/3-feature-flags). 2. **Land scaffolding first.** Refactors, schema additions, and stub implementations land before the user-visible logic. By the time the visible work begins, the foundation is on trunk and tested. The combination — flagged behavior + foundation-first commits — is what makes trunk-based development feasible for non-trivial features. ## Conventional commits Every commit on `main` follows the conventional commits pattern. ```text type(scope): description — STORY-ref feat(grading): native Hebrew name support in submission view — GRD-142 fix(grading): handle NFD/NFC normalisation for student names — GRD-142 docs(adr): ADR-014 add unicode normalisation rationale — GRD-142 chore(deps): bump unicode-normalize to 1.4.2 — GRD-142 test(grading): cover RTL+LTR mixed names — GRD-142 ``` The format is not aesthetic. It is functional. * **Type** — `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `perf`, `build`, `ci`, `style`. Determines whether the commit shows up in the changelog. * **Scope** — the area of the codebase. Helps reviewers and the changelog. * **Description** — what changed, in present tense. Imperative. * **Story reference** — the Volume III story slug or ID. Connects the code to the brief. Changelogs (Part 8 — *Changelog Generation*) are auto-built from these. A commit that does not follow the pattern is a commit the chain cannot read. ## Code review Every PR reviewed by at least one other engineer. The review reads, in this order: 1. **The brief and story** — what is this for? 2. **The diff** — does it deliver what the story says, in domain language? 3. **The tests** — are the Gherkin scenarios from amigos covered? 4. **The ADRs touched, if any** — are constraints respected? 5. 
**The pipeline output** — is the build green? Approval is not *I have read this and the code looks fine*. Approval is *this is the change the brief described, and it does what amigos predicted*. The reviewer is not the gatekeeper of code style. Style is the linter's job. The reviewer is the chain's last reader before the trunk receives the change. ## The merge moment Merge to `main` happens when: * Pipeline is green. * Reviewer has approved. * Story's Gherkin scenarios have been verified by QA on the branch (pre-merge QA — Part 5). * The flag is in place; the change is invisible to users. After merge: * The pipeline runs against trunk. * The change is deployed to a staging environment within minutes. * The change is one switch away from production. ## Branches that live too long — the recovery Sometimes a branch grows. The team realises it is now four days old, three thousand lines, and merges are getting hard. The corpus pattern is to recover, not push through: 1. **Stop adding to the branch.** No more features on this branch. 2. **Identify the foundation pieces.** Schema additions, refactors, scaffolding. Cherry-pick or split into small PRs and land them. 3. **Wrap the rest behind the flag.** Land the user-visible work in small PRs after the foundation. 4. **Postmortem the branch.** What signal was missed two days ago that would have told the team to slice differently? The corpus rule: a branch over five days old is a postmortem candidate, not a merge candidate. [Part 3 — Feature Flags →](/volumes/iv-execution/3-feature-flags) --- --- url: /volumes/iv-execution/3-feature-flags.md --- part three · feature flags # Feature Flags > *Wrapping new behavior so rollback is one switch.* A feature flag is a runtime switch that wraps new behavior. With flags, rollback is one click — no redeploy, no rebuild. The team can ship code to production days before the feature is enabled. The team can enable to a pilot group before everyone. The team can disable on the first sign of trouble. This is the foundation that makes trunk-based development (Part 2) safe. ## The lifecycle Every flag has a lifecycle. The corpus is opinionated about each step. | Step | What happens | Owner | |---|---|---| | **Create** | Flag is registered in the platform with a name, description, default state | Developer + Tech Lead | | **Wire** | Code conditionally executes based on the flag | Developer | | **Test** | Both flag-on and flag-off paths are tested in staging | QA | | **Enable** | Flag is turned on for the target audience (pilot, percentage, all) | PO + Tech Lead | | **Stabilise** | Flag remains in code while behavior is observed | Tech Lead | | **Clean up** | Flag is removed; the new behavior becomes the only path | Developer | The last step is the one most teams skip. A flag that has been on for six months is no longer a flag — it is a legacy `if` statement. Cleanup is a story like any other. ## What flags are for Three jobs. 1. **Rollback without redeploy.** The most common reason. New behavior misbehaves; flag off; investigate calmly. 2. **Gradual enablement.** Pilot one customer; if good, enable 5%; if good, 25%; if good, all. 3. **A/B testing.** Compare two paths simultaneously. (Less common in the corpus pattern; we tend to predict and check rather than A/B.) ## Naming flags The same domain-language discipline as code (Part 1). Flag names use the brief's vocabulary. 
| Bad flag name | Good flag name | |---|---| | `feature_x` | `grading.hebrew-names` | | `new_ui` | `grading.flow-v2` | | `experiment_3` | `grading.keyboard-shortcuts` | The format is `{area}.{specific-thing}`. Areas group flags so the platform's UI is navigable. ## Wiring patterns The corpus uses a small wrapper. Every flag check goes through it. ```typescript // Bad: scattered inline checks if (process.env.FLAG_HEBREW === 'on') { ... } // Good: typed flag client import { flag } from '@/flags' if (flag('grading.hebrew-names', { user: gal })) { // new behavior } else { // old behavior } ``` The wrapper: * Is the only place that talks to the platform SDK. * Carries context (user, session) so flags can target. * Logs flag evaluations as observability events. * Defaults to off if the platform is unreachable. ## What both paths need A flag wraps two paths — old and new. Both must be tested. Both must be runnable in staging. The flag-on path is what the cycle ships. The flag-off path is the rollback target. The QA verification (Part 5) explicitly tests both. *I tested with the flag on* is half a verification. ## Targeting Flags can target by: * **User** — specific named users (the pilot). * **Account** — specific customer organisations. * **Percentage** — random N% of all traffic. * **Attribute** — any dimension the platform supports (region, plan, role). The targeting is part of the rollout plan (Part 7). It is named in advance, not invented during enablement. ## Cleanup as a Volume V act A flag that has been on at 100% for two cycles should be cleaned up. The corpus pattern: cleanup stories appear in the next cycle's slice as part of the *unfinished business* the model update surfaces. Failure to clean up flags produces: * Cognitive load — every reader of the code wonders if the flag is still meaningful. * Configuration drift — the platform has hundreds of stale flags; finding the active ones is hard. * Real bugs — when someone toggles a stale flag thinking it does something different. A flag-cleanup discipline is part of operational hygiene. The portfolio review (Volume V Part 9) reads flag count and age trends as a chain-health signal. ## When flags are wrong Some changes should not be flagged. The flag adds noise without value. * **Pure refactors.** No behavior change. No need to flag. * **Changes too tightly coupled to flag.** If the flag-on and flag-off paths diverge so much that the codebase has to maintain two implementations, the flag has become a fork. * **Schema changes.** A migration cannot be flagged at the data layer the same way as a UI change. Schema changes are managed differently — see [Part 8 — Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback). The Tech Lead decides what is flagged. The default is *flag user-facing behavior changes; do not flag pure refactors or schema work*. Deviations are recorded. [Part 4 — The CI/CD Pipeline →](/volumes/iv-execution/4-pipeline) --- --- url: /volumes/iv-execution/4-pipeline.md --- part four · the ci/cd pipeline # The CI/CD Pipeline > *Six stages, each catching a different chain level.* The pipeline is the chain's machinery for moving code from the developer's editor to production. The corpus pattern: the pipeline is structured into six stages, each of which catches a different *level* of chain mistake. Skipping a stage does not save time; it pushes the missed mistake further along, where it is more expensive. 
## The six stages | Stage | Catches | Tools (typical) | |---|---|---| | **0 · Pre-commit** | Trivial mistakes before they enter the repo | Husky, lint-staged, prettier, eslint, secrets scan | | **1 · Build & lint** | Compilation errors, type errors, lint violations | tsc, eslint, vue-tsc, vite | | **2 · Unit & contract tests** | Logic errors at the function/module level | Vitest, contract test runners | | **3 · Integration & e2e** | Wiring errors across modules; Gherkin scenarios | Playwright, Cypress, supertest | | **4 · Visual regression** | Unintended UI changes against Figma baselines | Chromatic, Percy, Lost Pixel | | **5 · Security & dependency** | Known CVEs, leaked secrets, license issues | npm audit, semgrep, gitleaks | | **6 · Deploy & smoke** | Deployment-time errors; basic post-deploy health | The deploy script, smoke check | Six stages plus the pre-commit zero-stage. Seven gates. Each one has a specific job and a specific way to fail. ## Pre-commit (stage 0) Runs on the developer's machine before the commit lands. Catches: * Style violations (prettier, eslint). * Files that shouldn't be committed (secrets, large binaries). * Type errors at the file level (sometimes — full typecheck waits for stage 1). If pre-commit catches it, the commit doesn't enter the repo. This is the cheapest stage to catch anything. The corpus rule: **never bypass pre-commit hooks**. If they're slowing the team down, the hooks are wrong, not the discipline. ## Stage 1 — Build & lint Compiles the project. Type-checks. Lints across the whole codebase. Fast — under three minutes for a healthy project. A red stage 1 means the change is structurally broken. No further stages run. ## Stage 2 — Unit & contract tests Runs the unit test suite. Runs contract tests against any API the project exposes. Unit tests are scoped to a single function/module. Contract tests are the boundary tests — *given this input, the function/API returns this output*. A red stage 2 usually means logic is broken. The story's Gherkin scenarios should be in this stage if they are unit-test-shaped. ## Stage 3 — Integration & e2e Runs the broader test suite. Spins up dependencies in containers. Executes the Gherkin scenarios as end-to-end browser tests. This stage is the most expensive — sometimes 10–20 minutes for a healthy project. The corpus pattern: parallelise. A 20-minute serial run becomes a 4-minute parallel run. A red stage 3 means the wiring is wrong. Modules that work alone don't work together. This is where amigos pays back its time investment. ## Stage 4 — Visual regression Compares rendered UI against approved baselines. Failures surface as visual diffs that the Designer reviews. This stage catches the *I didn't realise that change moved that pixel* class of bug. It is also the stage that catches accessibility regressions, contrast changes, and RTL layout drift. The baselines come from Figma — the named states from the Designer's frames. Every named state has a baseline. New visual states require a new baseline approval. ## Stage 5 — Security & dependency Scans dependencies for known vulnerabilities. Scans the diff for secrets. Checks license compliance for new dependencies. A red stage 5 usually means *don't merge yet, talk to security*. Sometimes it means *the world has learned a CVE since yesterday and our dependency is now flagged*. ## Stage 6 — Deploy & smoke Deploys the build to the target environment. Runs a small post-deploy smoke check — *the homepage returns 200, the API returns 200, the version banner matches*. 
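A sketch of what that stage-6 script might look like, with illustrative paths, domain, and version argument; the real script is whatever the deploy tooling runs:

```typescript
// A sketch of a stage-6 smoke check. Paths and the version argument are
// illustrative; assumes Node 18+ for the global fetch.

const base = process.env.SMOKE_BASE_URL ?? 'https://staging.200apps.example';
const expectedVersion = process.argv[2]; // passed by the deploy script

async function check(path: string): Promise<Response> {
  const res = await fetch(`${base}${path}`);
  if (res.status !== 200) throw new Error(`${path} returned ${res.status}`);
  return res;
}

async function main() {
  await check('/');            // the homepage returns 200
  await check('/api/health');  // the API returns 200

  const res = await check('/api/version');
  const { version } = (await res.json()) as { version: string };
  if (expectedVersion && version !== expectedVersion) {
    throw new Error(`version mismatch: got ${version}, expected ${expectedVersion}`);
  }
}

// A non-zero exit is what turns stage 6 red.
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

The check is deliberately shallow. It verifies the deploy landed, not that the feature works; the pipeline's earlier stages already did that.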
A red stage 6 in production triggers automatic rollback. The pipeline knows the previous artifact and switches back. ## Reading the pipeline Each stage's failure has a different chain meaning: | Stage red | Chain interpretation | |---|---| | 0–1 | A mechanical / hygiene gap. Cheapest fix. | | 2 | A logic gap. The unit test should have caught it; sometimes the test is the gap (not the code). | | 3 | A scope gap. The Gherkin or the wiring is wrong. Re-read the brief. | | 4 | A design gap. The Designer-baseline pair didn't anticipate this change. | | 5 | An external change in the world. Often nothing the team did. | | 6 | A pipeline / environment gap. Often infrastructure. | A team that reads pipeline failures by chain level builds a more reliable pipeline over time. A team that just reruns until it goes green builds a flaky one. ## Environments The corpus assumes three environments. * **Dev** — the developer's local. Hot reload, mocks where appropriate, fast. * **Staging** — production-shaped. Full pipeline runs here. The release gate (Part 6) verifies here. * **Production** — the real thing. Smoke-tested by stage 6, monitored by Volume V machinery. Each environment has a clear purpose. The corpus pattern: do not test in production unless the alternative was demonstrably impossible. [Part 5 — Testing Layers →](/volumes/iv-execution/5-testing) --- --- url: /volumes/iv-execution/5-testing.md --- part five · testing layers # Testing Layers > *Unit, contract, integration, visual regression — each layer for a different gap.* Testing is not a uniform activity. Different layers catch different mistakes. The corpus pattern is to use each layer for what it is good at and not lean on one layer to do the work of another. ## The layers | Layer | What it catches | Authored by | Frequency | |---|---|---|---| | **Unit** | Logic errors inside a function | Developer | Every story | | **Contract** | Boundary errors between caller and callee | Developer | Every API/service boundary | | **Integration** | Wiring errors across modules | Developer + QA | Every Epic | | **End-to-end (Gherkin)** | The whole-flow scenario | QA writes, Developer implements | Every story's amigos output | | **Visual regression** | Unintended UI changes | Designer + QA | Every UI change | | **Accessibility** | Keyboard, screen reader, contrast | Designer + QA | Every UI change | | **Performance / load** | SLO breaches under expected load | Tech Lead + QA | Per Epic when ility-relevant | | **Exploratory** | The unknown unknowns | QA | Pre-merge | ## Unit tests Smallest unit of test. Single function or module. Mocks out the world. Runs in milliseconds. The corpus rule: a story with no unit tests is a story whose logic was never written down twice. *Twice* is the discipline — once in the implementation, once in the test that proves it does what was named. Unit tests are *not* the place for Gherkin scenarios that span multiple modules. Those go in integration or end-to-end. ## Contract tests Boundary tests. *Given this caller sends this shape, the service returns this shape; given this caller sends a malformed shape, the service returns this error.* The corpus rule: every API the project exposes has at least one contract test per endpoint per documented response. The contract tests are derived from the API contract written in [Volume III Part 7](/volumes/iii-scope/7-sequence-schema-api). ## Integration tests Wire several modules together. Often spin up real dependencies — a real database, a real Redis, a stubbed third-party. 
Catch the wiring mistakes that no unit test sees. Integration tests are slower. The corpus pattern: write enough of them to cover the Epic's main flows; do not write so many that the pipeline becomes painful. ## End-to-end / Gherkin The Gherkin scenarios from amigos (Volume III Part 5) become e2e tests. Run in a browser-driver against the real frontend, real backend, real database. They are the slowest, most fragile, and most valuable. The corpus pattern: every story has at least the *required-for-prediction* Gherkin scenarios as e2e. Negative cases as e2e where they cross system boundaries. The Gherkin lives next to the story in source control; the test code is generated from or aligned with the Gherkin. ## Visual regression For UI work, the rendered output is compared against an approved baseline. Failures surface as image diffs. The Designer is the approver — *yes that is the intended change* or *no that is a regression*. The baselines come from Figma frames. Every named state has a baseline image. The Designer can update baselines when they intended the change. ## Accessibility tests Automated checks for contrast, ARIA, keyboard reachability. Manual checks for screen reader output and keyboard flow. Both run pre-merge for any UI change. The corpus rule: accessibility failures block merge unless explicitly accepted as known issue with a remediation date. *We'll fix it later* without a date is the chain failing the discipline. ## Performance / load For Epics where ility selection (Volume III Part 8) names performance as material, a load test runs against staging. The test is shaped against the expected load profile. The corpus pattern: load tests are *prediction-checked*, like everything else. *We expect the new endpoint to hold p95 under 200ms at 200 RPS* — checked. ## Exploratory testing The QA, with the brief and the journey map open, *uses the feature like the named person would*. Outside the scenarios. Looking for the moments that nobody named. The corpus pattern: every pre-merge QA includes 30+ minutes of exploratory testing on the major stories. The output is the QA report (next section). ## Pre-merge QA verification Before a PR can merge, the QA verifies on the branch: * Gherkin scenarios from amigos pass. * Both flag-on and flag-off paths work. * Edge cases the QA imagined during exploration. * Accessibility baseline holds. The verification is a checked artifact, not a Slack message. The QA writes a short report — what was tested, what was explored, what surprised. The report lives next to the PR. ## QA report The artifact at the end of pre-merge QA. ```text QA Report — PR #482 (GRD-142 Hebrew name support) QA: Mira Date: 2026-05-22 Tested (Gherkin): ✅ Hebrew name renders correctly on first load ✅ Mixed-form Hebrew name renders correctly ✅ Rare unicode form falls back gracefully ✅ Edit attempt is now disabled (story explicitly removes the workaround) Explored: - Tried 12 names with various unicode forms; all rendered - Tried with extreme name length (84 chars); rendered with truncation - Tried Hebrew name in queue search (passes; bonus discovery) - Tried with screen reader; name read correctly Surprises: - Unicode-fallback log line is duplicated when the same name is rendered twice on the same page. Filed GRD-148 (P3, cosmetic). 
Not tested: - Bulk export view (out of scope per brief) Accessibility: ✅ Contrast unchanged ✅ Keyboard nav unchanged ✅ Screen reader reads names correctly (NVDA, VoiceOver) Visual regression: ✅ One intended change accepted (queue row height +2px for RTL names) Pre-merge: APPROVED ``` The report is what the chain reads later — at signal reading, at postmortem, at retrospective. It is the QA's structured record of what they witnessed. ## Test maintenance Tests that are flaky, slow, or wrong are themselves chain debt. The corpus pattern: when a test is repeatedly failing for the wrong reasons, the test is fixed or deleted, not retried. A test suite is part of the codebase. It is reviewed, maintained, and pruned. A 4,000-test suite that nobody trusts is worse than a 400-test suite that is solid. [Part 6 — Release Gate →](/volumes/iv-execution/6-release-gate) --- --- url: /volumes/iv-execution/6-release-gate.md --- part six · release gate # Release Gate > *The named conditions the chain must satisfy before the flag can be enabled.* The release gate is not a meeting. It is a state. It is the moment the chain steps from *building* to *watching*, and it is gated by a checklist that is satisfied or not — no debate. ## The checklist ```text RELEASE GATE — Grading Flow v2 (cycle 17) Code [ ] Pipeline green on main, latest commit [ ] Visual regression baselines current [ ] Accessibility checks passing [ ] Security scan no high/critical findings Scope [ ] All required-for-prediction stories merged [ ] All Gherkin scenarios from amigos passing [ ] Pre-merge QA reports filed for each PR [ ] Story map shows release slice complete Operations [ ] Monitor / alert rules updated for new flow [ ] Runbook for new failure modes written and reviewed [ ] On-call rotation confirmed for next 48 hours [ ] Status page entry drafted for graceful failure communication Release machinery [ ] Feature flag created and tested in staging (both paths) [ ] Rollout plan named (pilot, percentage, full) [ ] Rollback procedure documented [ ] Migration plan signed off (if applicable) Comms [ ] CS handoff document written and shared [ ] Client release brief written and reviewed [ ] Help text and empty states reviewed for domain language [ ] Internal team has read the brief Prediction [ ] Baseline numbers captured pre-flag [ ] Check date in the calendar [ ] Check method instrumentation in place [ ] Owner of the check named (and available on the date) ``` The checklist is owned by the PO. The PO holds the gate. *Held* means the PO does not approve enablement until every item is checked, with the relevant owner named. ## Why a checklist Three reasons. 1. **It surfaces the gaps in advance.** Two days before the release, the team realises the runbook hasn't been written. Two days is enough to write it. Two minutes before enablement is not. 2. **It distributes ownership.** Each line has an owner. The PO is not personally writing the runbook; the on-call is. The PO is verifying that it exists. 3. **It produces the artifact for postmortem.** When something goes wrong, the postmortem reads the gate. *Was the failure mode in the runbook?* If yes, the runbook didn't catch it; structural fix elsewhere. If no, the runbook should have had it; structural fix in the runbook template. ## The gate is honest about scope The gate is satisfied for *this slice*. Not for *the whole product*. A walking skeleton release goes through the gate. A richness release goes through the gate. Each release has its own gate. 
The corpus pattern: even small features have gates. The gate scales with the release — a tiny copy fix has a tiny gate (pipeline green, story merged, no comms needed). A new flow has the full gate. ## Soft items vs hard items The gate has two kinds of items. * **Hard items** — pipeline green, scenarios passing, runbook exists. These are binary. They are or they aren't. * **Soft items** — runbook is *good*, brief is *clear*. These have judgment. They are signed off by name. Soft items are reviewed by their owner. The runbook is reviewed by an on-call who hasn't seen the feature before — *can you act on this at 3am?* — and signs *yes* or *not yet*. ## When an item won't satisfy Sometimes the gate cannot be satisfied. The honest options: 1. **Delay the release.** The most common right answer. The cycle was ambitious; the gate's missing item is real; ship next week. 2. **Reduce scope.** Drop the slice that requires the unmet item. Ship what is ready. 3. **Document a known limitation.** Some items are *we know this is incomplete; we are accepting it for now*. The acceptance is named, the remediation is dated, and the documentation goes to CS so they know. What the corpus does not allow: enabling the flag with an unmet item that has not been documented as accepted. That is the chain operating outside its own discipline, and it is the source of the postmortems that have to teach the same lesson twice. ## Who can hold the gate The PO holds the gate. The Tech Lead can pause it. The QA can pause it. The on-call who reads the runbook and finds it inadequate can pause it. Anyone in the trio can hold the gate; only the PO can release it. A gate that the PO releases without the trio's agreement is a gate that has lost its meaning. The corpus pattern: gate disagreements are surfaced and named, not papered over. ## Where the gate sits in time ```mermaid gantt dateFormat YYYY-MM-DD title Release Gate timing section Cycle Building (V III/IV) :a1, 2026-05-01, 14d Pre-merge QA :a2, after a1, 2d section Gate Gate review :gate, after a2, 1d Flag enable + V V Part 1 :crit, gate2, after gate, 2d ``` The gate review is the last act of Volume IV. The flag enable is the first act of Volume V. [Part 7 — Gradual Rollout →](/volumes/iv-execution/7-gradual-rollout) --- --- url: /volumes/iv-execution/7-gradual-rollout.md --- part seven · gradual rollout # Gradual Rollout > *Pilot, percentage ramp, full enablement.* Gradual rollout is the discipline of enabling a feature progressively, with observation between each step, instead of flipping a flag for everyone at once. The chain that does this catches problems while they are small. The chain that doesn't catches problems after they have been everyone's problem for an hour. ## The shape of a rollout The default sequence: ```mermaid flowchart LR A[Internal team only
day 0] --> B[1 named pilot user
day 1-2] B --> C[5 pilot users / 1 customer
day 2-4] C --> D[10% of all users
day 4-6] D --> E[50% of all users
day 6-9] E --> F[100%
day 9-14]
```

Each step has *exit criteria* — what must be true before moving to the next.

| Step | Exit criteria |
|---|---|
| Internal | Internal team uses the feature for one full work session. No errors. No negative surprises. |
| 1 user | Pilot user uses for 1–2 days. Observed by PO + Designer. No critical issues. |
| 5 users / 1 customer | One full grading cycle for the customer. Support volume normal. SLO holds. |
| 10% | Two days at 10%. Error rate within SLO. Adoption metric trending up. |
| 50% | Two days at 50%. Same checks at scale. |
| 100% | Holds for the full first 48 hours (Volume V Part 1). |

The exit criteria are written *before* the rollout begins. They live in the release brief. Not invented during enablement.

## Why pilots first

A pilot is a named person, observed in the field, using the feature for the activity it was designed for. Not a beta-test group. Not a focus session. The pilot is the smallest version of Volume V Part 2 — the prediction is checked at small scale. The pilot's value:

* Surfaces the obvious miss before it is everyone's miss.
* Builds customer trust that the team is being careful.
* Produces the first signal reading at low risk.

A pilot of one named person who has been part of Discovery is better than a percentage rollout. Percentage rollouts have *anonymous* signal. Named pilots have *legible* signal — *Gal hit the workaround again at J6, three times yesterday* is a story; *0.3% of users encountered an error* is a metric.

## When percentage rollouts add value

After the pilot. Percentage rollouts test scale, not behavior. They surface issues that only appear under load — flaky third-party integrations, hot-spot DB queries, cache stampedes. Percentage rollouts are observed via SLO dashboards, not field observation. The pilot already verified the human side; percentage verifies the system side.

## Targeting strategies

The flag's targeting (Part 3) implements the rollout. Common patterns:

* **By customer** — enables one customer at a time. Useful when customers have different shapes.
* **By region** — enables one region at a time. Useful for geographic latency or regulatory differences.
* **By percentage with sticky assignment** — the same user always gets the same path during the rollout. Important — flapping a user between flag-on and flag-off is the worst experience a rollout can inflict on the named person. (Sketched below, after the release brief.)
* **By role** — enables for one role across all customers (e.g., graders only, not students).

Sticky assignment is the default. Choosing otherwise is a conscious, named decision.

## The release brief to the client

Before the rollout begins, the PO writes a short release brief to the client.

```text
RELEASE BRIEF — Grading Flow v2
Date: 2026-06-01
Owner: Alex (PO)
For: [Client lead], [Customer ops]

What's changing
Hebrew name handling improvements in the grading flow. Specifically:
graders no longer need to edit names manually in the spreadsheet workaround.

What to expect
- Pilot phase (next 5 days): 1 grader at the flagship campus.
- 10% rollout (days 6-7): randomly selected graders.
- 50% (days 8-9), 100% (day 10).

What is not yet available
- Bulk re-rendering of past grading reports (planned for cycle 18).

If something goes wrong
- Disable is one switch. We will tell you within 30 minutes.
- Status page: https://status.acme.example

Contact
Alex (PO), Yossi (Tech Lead), on-call rotation: 200apps-grading
```

The release brief is shared before enablement, not after. Anxiety arrives when communication arrives late.
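The sticky assignment named in the targeting list is simpler than it sounds: no stored state, just deterministic hashing. A minimal sketch with illustrative names; real flag platforms implement this internally.

```typescript
// A sketch of sticky percentage assignment: hash(flag, user) maps to a
// stable bucket 0-99. Illustrative, not a flag platform's actual algorithm.

function bucket(flagName: string, userId: string): number {
  // FNV-1a, a tiny stable string hash; real platforms use stronger ones.
  let h = 2166136261;
  for (const ch of `${flagName}:${userId}`) {
    h ^= ch.codePointAt(0)!;
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) % 100;
}

// The same user always lands in the same bucket for a given flag, so the
// flag-on / flag-off decision never flaps between sessions.
function enabledFor(flagName: string, userId: string, percent: number): boolean {
  return bucket(flagName, userId) < percent;
}

// Ramping 10% to 50% only adds users; nobody who had the feature loses it.
enabledFor('grading.flow-v2', 'usr_2103', 10);
```

The property that matters is monotonicity: raising the percentage only ever adds users to the flag-on path, which is exactly the sticky behavior the rollout steps assume.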
## The CS handoff

The CS team learns about the change before the customer does. The handoff is a separate document.

```text
CS HANDOFF — Grading Flow v2

What's new
- Native Hebrew name support; the spreadsheet workaround is no longer needed.

What might surface in tickets
- Customers reporting that names look "different" — they look correct.
- Customers asking about past reports (out of scope, see brief).
- Edge case: rare unicode form falls back gracefully (see runbook).

Likely questions
Q: Will my old grading reports be updated?
A: No, the change applies to grading from [date] forward.
   Bulk re-rendering is planned for the next cycle.

Q: I see a warning icon on a name. What does that mean?
A: A rare unicode form was found. Grading can proceed normally; the name
   renders with a fallback character. Engineering is notified.

Escalation path
L1 → L2 (QA + on-call dev) → L3 (Tech Lead + PO)
Status page: https://status.acme.example
```

CS reading the handoff and asking questions before customers do is what makes the rollout calm.

## What can stop a rollout

At any step, any of these stops the rollout:

* The error rate exceeds SLO.
* A pilot user reports a critical issue.
* A support ticket pattern emerges that wasn't predicted.
* The on-call is paged for the new flow.

The flag is disabled. The team investigates. The rollout resumes when the issue is closed and the runbook is updated.

The corpus pattern: stopping a rollout is normal. It is not failure. The team that has never paused a rollout is either a team with very simple features or a team that isn't watching closely.

[Part 8 — Runbooks & Rollback →](/volumes/iv-execution/8-runbooks-rollback)

---

--- url: /volumes/iv-execution/8-runbooks-rollback.md ---

part eight · runbooks & rollback

# Runbooks & Rollback

> *Written before the incident, rehearsed in staging.*

A runbook is a written procedure for a known operational situation. Authored before the situation occurs. Rehearsed in staging. Lives next to the service it covers. The on-call's only friend at 3am.

## What a runbook is

```text
RUNBOOK — Grading flow: high error rate on /api/submissions

Trigger
Alert: submission_error_rate > 1% over 5 minutes
Or: queue_length > 1000

Severity assessment (decide first)
- Is data integrity at risk?              → P0, follow this runbook
- Is auth or security implicated?         → P0, disable flag, escalate
- Are users blocked from work?            → P1
- Is degradation only in features beyond
  the cycle's prediction?                 → P2

Containment (first 5 minutes)
1. Check feature flag panel: https://flags.200apps.example/grading.flow-v2
   If recently changed, REVERT to previous state.
2. Check deploy timeline: https://deploys.200apps.example/grading
   If recent deploy, ROLLBACK with: ./scripts/rollback grading
3. Check upstream LMS adapter health: https://status.lms-adapter.example
   If LMS adapter is down, expected behavior includes errors; escalate to LMS team.

Diagnosis (after containment)
4. Check error log filter: `service:grading severity:error`
   Identify the most-frequent error class.
5. Check the dashboard panel: "grading-flow latency p95 by endpoint"
6. Trace one example request from the request ID in the error log:
   https://traces.200apps.example/?id=...
Communication - 5 min: ping #incident-grading with severity + first action - 15 min: post status page entry if user-facing - 30 min: communicator updates internal channel Escalation P0: page Tech Lead immediately, then PO if user-impact >5 min P1: page Tech Lead if not resolved in 30 min P2: page Tech Lead if not resolved in 4 hours Known false positives - Spike during 09:00 weekdays (cohort starts grading) — wait until 09:15 before action Last reviewed: 2026-05-05 — Yossi Last rehearsed in staging: 2026-04-28 ``` A runbook is short, dry, and procedural. It tells the on-call exactly what to do. ## The properties of a runbook that holds * **Authored before the incident.** A runbook written during the incident is documentation. A runbook written before is a procedure. * **Rehearsed in staging.** The runbook's procedure is run, end to end, against a staged failure. If steps are missing or wrong, the rehearsal surfaces it. * **Read by an on-call who hasn't built the feature.** They are the chain's audience. If they cannot follow it, it doesn't work. * **Reviewed regularly.** Every postmortem (Volume V Part 4) reads the relevant runbook. Outdated runbooks are repaired or deleted. * **Lives next to the service.** Not in a separate wiki. In the code repository or a runbook-store that the on-call has on their phone. ## Four levels of rollback The corpus pattern: rollback has four levels, applied in this order. | Level | Mechanism | Time | When to use | |---|---|---|---| | **1 · Flag** | Toggle the feature flag off | Seconds | New behavior is misbehaving | | **2 · Deploy** | Roll back to the previous build | Minutes | Bug exists across both flag paths | | **3 · Migration** | Reverse the schema migration | Hours (if author allowed it) | Data shape change is the cause | | **4 · Data** | Correct affected rows | Hours+ | Data has been corrupted | Each level is more expensive and more disruptive than the previous. The runbook tells the on-call which level to attempt and in what order. The corpus's most important rollback discipline: **plan for the level you might need before you ship.** Migrations should be reversible by default. Data changes should be auditable. Flags should wrap behavior. *I didn't think we'd need to roll back* is not a postmortem-acceptable answer; that is what the runbook is for. ## When a runbook doesn't exist The on-call hits a situation the runbook doesn't cover. Three steps: 1. **Contain first.** Use the four levels above. Default to flag off if uncertain. 2. **Document during.** A real-time log of what was tried, what happened. Not for reading later — for the next person who hits this. 3. **Author after.** The postmortem produces a runbook entry for this situation, so it is covered next time. The corpus rule: every incident produces at least one runbook delta. *Either the existing runbook needed an update, or a new runbook needed to be written.* No incident leaves the runbook unchanged. ## Status page A status page is the communicator's instrument during an incident. The corpus pattern: * **Automatic** for known severities — when alerts fire above threshold, a status entry is posted automatically with a default message. * **Manual updates** during the incident — every 30 minutes minimum at P0/P1. * **Resolution post** — when the incident is closed, the status page shows it as resolved with a one-paragraph postmortem summary. A status page that is accurate and current builds trust. 
A status page that says *all systems operational* while customers are reporting issues destroys it.

## What rollback discipline produces for the chain

The chain that takes rollback seriously ships more often. The chain that does not avoids shipping. The math: a deployable artifact with four rollback levels is one the team will release; an artifact with no rollback plan is one the team will sit on.

[Part 9 — Observability →](/volumes/iv-execution/9-observability)

---

--- url: /volumes/iv-execution/9-observability.md ---

part nine · observability

# Observability

> *Logs, traces, metrics, events.*

Observability is the property of a system that makes it possible to ask questions about its current state without having to deploy new code. The corpus pattern: observability is built *with the feature*, not after. By the time the gate is reached, the system is already legible.

## The four signal types

| Signal | What it answers | Cost | Read by |
|---|---|---|---|
| **Logs** | What happened in this specific request? | High in volume; easy to author | Developer, on-call |
| **Traces** | How did this request flow across services? | Moderate, depends on sampling | Developer, on-call |
| **Metrics** | What is happening across many requests? | Low per-event; needs cardinality discipline | On-call, Tech Lead |
| **Product analytics events** | What did the named person do? | Low; needs naming discipline | PO, data |

Each signal has a different consumer and a different cost profile. The corpus uses all four: not as overlap, but as complement.

## Logs

Structured. Always. JSON format with consistent fields.

```json
{
  "ts": "2026-05-22T08:53:14.029Z",
  "service": "grading-api",
  "level": "info",
  "request_id": "req_a4f2c1",
  "user_id": "usr_2103",
  "endpoint": "GET /submissions/1234",
  "duration_ms": 187,
  "event": "submission.opened",
  "subject_id": "sub_1234",
  "domain_terms": ["submission", "grader"]
}
```

Log fields are picked at design time, not at incident time. The fields appear in the brief as part of the observability section. The corpus pattern: **never log PII**. The grading flow logs `user_id`, not `name`. The privacy ility (Volume III Part 8) constrains the log shape.

## Traces

Each request gets a trace ID. Spans nest within the trace. Sampling is intelligent — every error trace is captured; healthy traces are sampled at low rate. Traces show the request's path. *Did this request hit the LMS adapter? Yes. Did the LMS adapter respond in time? 124ms. Did the response normalise correctly? Yes.*

A team that uses traces solves more incidents in less time. A team that doesn't reads logs sequentially and reconstructs the path mentally — slower, more error-prone.

## Metrics

Counters and gauges and histograms. Aggregated. Cheap per data point. The corpus's standard set, per service:

```text
http_request_duration_ms{endpoint, method, status}   histogram
http_requests_total{endpoint, method, status}        counter
http_requests_in_flight{endpoint}                    gauge
job_duration_ms{queue, type, outcome}                histogram
job_queue_depth{queue}                               gauge
db_query_duration_ms{query_class}                    histogram
db_connections_active{}                              gauge
flag_evaluations_total{flag, outcome}                counter
flag_evaluation_duration_ms{flag}                    histogram
```

Plus the Epic-specific metrics named in the brief. Cardinality is managed: labels are bounded. *Labels per username* is forbidden — that explodes cardinality. *Labels per endpoint* is fine.

## Product analytics events

The named-action signals. The brief names them; the implementation emits them.
```text Brief: 'When Gal opens a submission, we want to know.' Event: submission.opened Properties: submission_id, grader_id, ts, duration_to_open_ms Brief: 'When Gal saves a grade, we want to know.' Event: submission.graded Properties: submission_id, grader_id, ts, score_count, total_time_ms ``` Events use domain language. They follow `subject.verb` format. They are versioned conservatively — a property name doesn't change without a migration story. These events feed Volume V's signal reading. The prediction *Gal completes a grading cycle in under 15 minutes* is checked against `submission.graded.total_time_ms`. The instrumentation is in place by release-gate time. ## Alerts Alerts are derived from metrics, not logs. They fire when an SLO threshold is crossed. ```yaml - alert: GradingApiHighErrorRate expr: sum(rate(http_requests_total{service="grading-api",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="grading-api"}[5m])) > 0.01 for: 5m severity: P1 runbook: https://runbooks.200apps.example/grading-flow-high-error-rate message: "grading-api error rate above 1% for 5 minutes" ``` Each alert has a runbook link. An alert without a runbook is an alert that produces panic, not action. ## Dashboards Two kinds. * **Service dashboard** — for the on-call. Latency, error rate, saturation. Read at every sync. * **Feature dashboard** — for the PO. Adoption, completion, error encounter rate, prediction-relevant metrics. Read at every signal reading. The dashboards are versioned in code (or whatever the platform supports). They live next to the service. Changes to dashboards go through review like code. ## What gets instrumented The corpus rule: instrument what the brief named. If the prediction is checked against time-to-grade, the instrumentation that captures time-to-grade is part of the cycle. It is not deferred. Instrumentation that *isn't* needed for the prediction or for operations isn't added. The corpus is opinionated against premature observability — too many metrics make the right ones harder to find. ## The signal feeds Volume V The whole observability stack exists to make Volume V's check possible. The check date arrives. The PO opens the feature dashboard. The metric is there. The check is straightforward. A team that arrives at the check date and discovers the metric isn't instrumented has discovered, late, that the chain skipped a step. The corpus's discipline: instrument *with* the feature, not after. ## What this volume produces, in one sentence > *Volume IV carries the prediction from a signed brief through code, test, and release — with the language preserved, the trunk integrated, the gate held, the flag wrapped, the rollback rehearsed, and the instrumentation in place by the time the flag flips.* [Back to the volume cover →](/volumes/iv-execution/) · [Volume V — After We Build →](/volumes/v-after-we-build/) --- --- url: /volumes/v-after-we-build.md --- # After We Build > *The loop closing. Watching the first hours, checking the prediction against reality, classifying bugs by which level of the chain produced them, learning from incidents, writing the model update, growing the team, reading the portfolio — and starting the next cycle from a less wrong place than this one.* This volume describes the **Reflection** phase — the work between the live feature (Execution, Volume IV) and the next cycle's discovery (back to Volume II). The Prediction that was named in Volume II, kept alive through Scope in Volume III, and made measurable in Volume IV is finally checked here. 
## What gets instrumented

The corpus rule: instrument what the brief named. If the prediction is checked against time-to-grade, the instrumentation that captures time-to-grade is part of the cycle. It is not deferred.

Instrumentation that *isn't* needed for the prediction or for operations isn't added. The corpus is opinionated against premature observability — too many metrics make the right ones harder to find.

## The signal feeds Volume V

The whole observability stack exists to make Volume V's check possible. The check date arrives. The PO opens the feature dashboard. The metric is there. The check is straightforward.

A team that arrives at the check date and discovers the metric isn't instrumented has discovered, late, that the chain skipped a step. The corpus's discipline: instrument *with* the feature, not after.

## What this volume produces, in one sentence

> *Volume IV carries the prediction from a signed brief through code, test, and release — with the language preserved, the trunk integrated, the gate held, the flag wrapped, the rollback rehearsed, and the instrumentation in place by the time the flag flips.*

[Back to the volume cover →](/volumes/iv-execution/) · [Volume V — After We Build →](/volumes/v-after-we-build/)

---

--- url: /volumes/v-after-we-build.md ---

# After We Build

> *The loop closing. Watching the first hours, checking the prediction against reality, classifying bugs by which level of the chain produced them, learning from incidents, writing the model update, growing the team, reading the portfolio — and starting the next cycle from a less wrong place than this one.*

This volume describes the **Reflection** phase — the work between the live feature (Execution, Volume IV) and the next cycle's discovery (back to Volume II). The Prediction that was named in Volume II, kept alive through Scope in Volume III, and made measurable in Volume IV is finally checked here. The chain closes — and reopens — at the model update.

[Continue to the introduction →](/volumes/v-after-we-build/intro)

---

--- url: /volumes/v-after-we-build/intro.md ---

After we build — volume V

# Introduction

Volume IV ended with the flag enabled and the feature live. The team now faces the only question that matters: *was this actually the thing we thought we were building?*

Every previous volume produced a claim. Volume II's brief witnessed a problem and predicted a change. Volume III turned that prediction into Epics, stories, scenarios. Volume IV ran the prediction through code and into production. Now reality answers. The forty-seven minutes Gal spent grading either fell to ten or they didn't. The brief was either a model that survived contact with the world, or it was a description of something that turned out to be different.

This volume is what the team does in the period when reality is answering. Not the celebration. Not the postmortem alone. The honest, sometimes uncomfortable, comparison between what was predicted and what happened — at every level, from the initiative bet down to the individual scenario. And then the step most teams skip: writing the model update so what was learned survives the conversation.

*A cycle is not done when the feature ships. It is done when the prediction has been checked, the gaps named, the model updated, and the next cycle inherits a sharper version of the understanding. The question that runs through every part of this volume:* **where did meaning break — and did we fix it permanently?**

The team that does this consistently is not the team that gets things right more often. It is the team that gets things slightly less wrong each cycle than the last — and the difference compounds.

## The shape of this volume

Ten parts. The first six close the current cycle. The last four address what every previous volume left in shadow — the team itself, the portfolio, and how to start.

* **The First 48 Hours** — what to watch, what to act on, what to let settle.
* **Signal and the Prediction** — running the check. The four outcomes. Why "not checked" is the only one with no value.
* **Bugs and Their Roots** — the bug taxonomy. Chain-aware root causes that trace every defect to its level.
* **Incidents and Postmortems** — contain before diagnose. Postmortems that produce structural changes, not feelings.
* **The Retrospective** — three questions. One change. Compounding rather than listing.
* **The Model Update** — the step most teams skip. Where learning survives the conversation.
* **The Ongoing Relationship** — support levels and escalation. The SLA as operational contract. Helpdesk metrics. Client cadence. Where the chain meets the people who pay for it.
* **The Team** — onboarding, T-shaped people, small teams, psychological safety, what happens when someone leaves.
* **The Portfolio** — the view across features and products. DORA metrics. Technical debt as chain gaps. VRI. When to stop.
* **Adoption** — how to start. Which practice first. The first cycle. What resistance looks like. What maturity feels like.

In the operational framework, this volume describes the **Reflection** phase — the work between the live feature (Execution, Volume IV) and the next cycle's discovery (back to Volume II). The Prediction that was named in Volume II, kept alive through Scope in Volume III, and made measurable in Volume IV is finally checked here. The chain closes — and reopens — at the model update.

These are not sections.
They are the steps required for the system to learn.

[Part 1 — The First 48 Hours →](/volumes/v-after-we-build/1-first-48-hours)

---

--- url: /volumes/v-after-we-build/1-first-48-hours.md ---

part one · the first 48 hours

# The First 48 Hours

> *The period between the flag being enabled and the first honest picture. What to watch, what to act on, and what to let settle.*

**On-call rotation** is active for the 48 hours after the flag is enabled — that was a release-gate condition. **Watching** is loose during the first hour, sharper after, then settles into normal-cadence dashboard checks. **No new ceremony** — this is a heightened state of normal flow.

This is where meaning meets the world for the first time. The flag is enabled. The feature is live. Volume IV's machinery is now the watching apparatus — runbooks armed, SLOs baselined, prediction "before" numbers captured. The first 48 hours are the period when the team has the most attention and the least data.

The instinct is to act on every signal. The discipline is to act only on the right ones. Acting early is not a sign of control. Acting correctly is. Knowing which signals warrant action and which need time to stabilise is the difference between a team that contains problems and a team that creates new ones.

## What to watch

The monitoring dashboards are the primary source of truth. Not support tickets — dashboards. Support tickets lag reality by hours. The specific SLIs defined in the ADRs are what the team watches: error rates, latency percentiles, queue depths. The SLO thresholds trigger action. The leading signals from Volume IV — adoption, completion, error encounter rate — tell the early story.

The first hour is the noisiest. People click things in unexpected orders, submit forms twice, navigate away mid-flow. Some of this produces errors that are not bugs — they are the normal shape of first contact. The question is not whether errors are occurring. It is whether the error rate is above the SLO threshold and trending up.

## When to act

Three conditions warrant immediate action:

* **SLO threshold crossed for more than 5 minutes** — open the runbook, start from step one.
* **Any data integrity concern** — disable the flag immediately, investigate in staging.
* **Any security-relevant behaviour** — disable the flag, full stop.

Everything else is logged, prioritised using the bug taxonomy, and addressed in normal flow.

By hour 48, the noisy first-contact patterns have settled. The team has a first honest picture — not the prediction check yet, but the data the prediction check will draw from.

### Enough to know the feature is live and stable.

Dashboards are within SLO. No P0 incidents open. The "before" baseline is captured. Early usage patterns are visible.

[Part 2 — Signal and the Prediction →](/volumes/v-after-we-build/2-signal-and-the-prediction)

---

--- url: /volumes/v-after-we-build/2-signal-and-the-prediction.md ---

part two · signal and the prediction

# Signal and the Prediction

> *The check that was promised before the cycle ran. What the four possible outcomes mean — and why all of them are valuable except one.*

**The check session** happens on the date named in the brief — a scheduled commitment. **The signal reading** is written immediately after, short and factual. **The retrospective** uses both as input but happens later.

This is where meaning is tested against reality. The prediction was written before the cycle ran. The check date arrives.
Someone runs the check — the specific measurement named in the brief, not a survey, not an impression.

## Running the check

The check is an observation, not a report. If the prediction was *"Gal will complete the grading cycle in under 15 minutes,"* the check is watching Gal grade a real exam — not asking her how long she thinks it takes. The check method mirrors the discovery method: witnessed, not described. You bring the baseline numbers, the prediction target, and an open mind. You are not there to confirm success. You are there to correct the model.

The purpose of the check is not validation — it is model correction. A check that confirms the prediction updates the model with confidence. A check that contradicts it updates the model with direction.

> *Only a check that never happened leaves the model exactly as wrong as it was. A prediction that was not checked is indistinguishable from a guess.*

## The four outcomes

| Outcome | What it means | What you do next |
|---|---|---|
| **Right** | The prediction was accurate. The model was correct. | Document it. Name what specifically was right so it carries forward. |
| **Too conservative** | The change was bigger than predicted. | Understand why the model underestimated. Still a gap — in calibration. |
| **Wrong** | The prediction did not come true. | The most valuable outcome. Name the gap specifically. Feeds the next brief. |
| **Not checked** | Nobody ran the check. The date passed. | The only outcome with no value. The model cannot update. The cycle ran blind. |

The check that confirms what the team hoped for is comfortable. The check that surfaces a gap is uncomfortable. Both are equally valuable. Only the check that didn't happen is worthless.

## Writing the signal reading

After the check, a **signal reading** is written — a short document next to the Feature Brief in Confluence: what was predicted, what was measured, the gap, what the gap tells us. Not a retrospective — facts and first interpretation. The retrospective comes later.

A useful signal reading has five lines and no flourish. It is read by people who weren't in the check session. It is the input to the model update.

```text
Prediction: Gal completes the grading cycle in under 15 minutes.
Baseline: 47 minutes (mean, n=12, captured pre-flag).
Target: <15 minutes.
Measured: 11 minutes 20 seconds (mean, n=8, captured weeks 1–3 post-flag).
Gap: Better than predicted. Investigate why — likely the new keyboard shortcut, not the deep-link navigation we built for, drove most of the gain.
```

### Enough to know whether the model held.

The check ran on the named date. The signal reading is written and lives next to the brief. The team has an honest answer.

[Part 3 — Bugs and Their Roots →](/volumes/v-after-we-build/3-bugs-and-their-roots)

---

--- url: /volumes/v-after-we-build/3-bugs-and-their-roots.md ---

part three · bugs and their roots

# Bugs and Their Roots

> *The bug taxonomy. Chain-aware root causes that trace every defect to its level.*

**Daily triage** runs for the first week after a release, then settles to twice weekly. **Bug filing** never stops — anything unexpected gets filed, classified, and routed. **Root cause notes** are written on bugs that are not surface-level mechanical fixes.

A bug is a witnessed gap between what the chain said would happen and what is happening. The taxonomy is the team's shared language for what kind of gap, where in the chain it originated, and how urgently it needs to be repaired.

## Six dimensions of every bug

Every bug filed has six fields.
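Filed, such a bug might look like this. The sketch borrows the Hebrew-names defect discussed under the classification below; the values are illustrative:

```text
Title:         Form errors on Hebrew names
Severity:      P2
Surface:       Flow (grader profile editing)
Class:         Functional
Chain level:   Discovery (the brief never witnessed a Hebrew-speaking grader)
Reach:         A segment
Reversibility: Reversible
```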
Fewer fields and the triage stalls. More and it becomes a paper exercise.

| Dimension | Question it answers | Example values |
|---|---|---|
| **Severity** | How much harm is it causing now? | P0 / P1 / P2 / P3 |
| **Surface** | Where does the person encounter it? | Page / flow / API / job |
| **Class** | What kind of break is it? | Functional / performance / data / security / copy / accessibility |
| **Chain level** | Which level of the chain produced it? | Strategy / Discovery / Scope / Execution / Operation |
| **Reach** | How many people meet it? | One / a segment / everyone |
| **Reversibility** | Can the harm be undone? | Reversible / partial / irreversible |

The **chain level** is the dimension teams skip. Without it, every bug looks like a code defect. With it, the team learns that more than half of "bugs" trace to a brief that was never witnessed, a story that was missing a state, or a prediction that was never named.

## The chain-aware classification

```mermaid
flowchart LR
    classDef l1 fill:#fbe9d6,stroke:#be641e,color:#7a3c0c
    classDef l2 fill:#fde8e8,stroke:#a83d3d,color:#5a1d1d
    classDef l3 fill:#fff5d0,stroke:#a08000,color:#5a4500
    classDef l4 fill:#e6f3eb,stroke:#3a7a4d,color:#1f4a2c
    classDef l5 fill:#e8eef9,stroke:#2a4a8a,color:#142a55
    S[Defect surfaces in production]
    S --> L1["Strategy<br/>Wrong bet, wrong portfolio"]:::l1
    S --> L2["Discovery<br/>Problem not witnessed"]:::l2
    S --> L3["Scope<br/>Story missing or wrong"]:::l3
    S --> L4["Execution<br/>Code, test, pipeline"]:::l4
    S --> L5["Operation<br/>Support, ongoing"]:::l5
```
Same defect, five different fixes. A "the form errors on Hebrew names" bug at **Execution** level is a regex fix. The same observation at **Discovery** level is a brief that didn't witness Hebrew speakers and a feature that needs reshaping. The team that classifies the bug at the right level pays for the fix once.

## Triage cadence

Daily for the first week post-release. Twice-weekly afterward. Triage is short — fifteen minutes — and produces three things: a severity, a chain level, and an owner. The triage does not solve. It routes.

The PO sees the chain-level distribution monthly. Recurring concentration in one level is a structural signal. *Most defects keep tracing to Discovery* is a finding about the team's discovery practice, not about the QA team.

## What gets fixed, what gets tracked, what gets killed

* **Fixed now** — P0 and P1 anywhere; P2 if reach is large; any irreversible-class bug.
* **Tracked** — P2/P3 with limited reach; chain-level Strategy bugs that need a portfolio decision before code.
* **Killed** — bugs that document the system working as designed. They become docs, not tickets.

A bug that has been open six months is not a bug. It is a decision that has not been made. Move it.

### Enough to know what the chain produced.

Every open defect has six dimensions, including chain level. Triage cadence has run twice. The PO has the level distribution.

[Part 4 — Incidents and Postmortems →](/volumes/v-after-we-build/4-incidents-postmortems)

---

--- url: /volumes/v-after-we-build/4-incidents-postmortems.md ---

part four · incidents and postmortems

# Incidents and Postmortems

> *Contain before diagnose. Postmortems that produce structural changes, not feelings.*

**Incident** begins when a runbook condition fires or an SLO is breached past tolerance. **Containment** is the first action — diagnosis comes after. **Postmortem** is scheduled within five working days, regardless of severity. **Action items** are owned, dated, and tracked alongside stories.

An incident is what happens when reality goes more wrong than the chain anticipated. The point of incident discipline is not heroism. It is to make the same incident less likely the next cycle, by trading the slow, expensive way of finding gaps (live customers) for the fast, cheap way (chain artifacts).

## Contain before diagnose

The first action in an incident is not figuring out why. It is to stop the harm from growing. Four levers, in order of preference:

1. **Flag off** — the new behavior is wrapped, the switch is one click. No deploy. No code.
2. **Roll deploy back** — last known good. CI keeps the artifact for exactly this reason.
3. **Migration rollback** — only if the migration was authored to be reversible. If it wasn't, the incident is bigger than this incident.
4. **Data correction** — last resort. Always after the bleeding has stopped, never during.

Diagnosis happens after containment. Trying to diagnose during containment slows containment. The runbook tells the on-call which lever to pull and which to skip.
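Written down, the levers become the containment section of a runbook. A sketch, assuming the grading-flow alert from Volume IV; the recovery window is illustrative:

```text
Runbook: grading-flow-high-error-rate (containment section)
Work down the levers; stop at the first one that holds.
  1. Flag off:         disable the grading-flow flag. Verify error rate recovers within 5 min.
  2. Deploy rollback:  redeploy last known good. CI keeps the artifact.
  3. Migration:        roll back only if the migration is marked reversible.
                       If it is not, declare a bigger incident.
  4. Data correction:  last resort. Only after the bleeding has stopped.
Diagnose only once a lever has held.
```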
## Roles during the incident

Three roles, even on a small team. They can be held by the same person on a one-person on-call but they are still distinct hats.

* **Incident commander** — decides what gets done next. Holds the timeline.
* **Communicator** — runs the status page, the client comms, the internal channel. Updates every 30 minutes minimum.
* **Investigator** — looks at the data. Reports findings to the commander.

The commander does not investigate. The investigator does not communicate. The communicator does not decide.

## Escalation and de-escalation

Escalation is information flow, not blame flow. The rule: **no surprises**. If leadership will hear about this incident from anywhere other than the commander, they hear about it from the commander first.

| Severity | Who hears in 5 minutes | Who hears in 30 minutes | Status page |
|---|---|---|---|
| P0 — production down | On-call group, Tech Lead, PO | Leadership, affected clients | Automatic |
| P1 — partial degradation | On-call group, Tech Lead | PO, leadership at next sync | Manual update |
| P2 — single-feature impact | On-call group | PO at next standup | Internal note only |
| P3 — annoyance | On-call group at next standup | — | — |

De-escalation is as deliberate as escalation. The incident is not over until the commander stands the team down, the status page is updated to *resolved*, the timeline is archived, and someone has checked on the people who took the page. The cost of an unannounced de-escalation is a team that stays anxious into the next cycle.

## The postmortem

Within five working days. Blameless in tone. Structural in output. The postmortem asks, in order:

1. **Timeline** — what happened, when, who knew. Drawn from the commander's notes.
2. **Detection** — when did the system know? When did a person know? The difference is the detection gap.
3. **Containment** — how did we stop the bleeding? Were the runbooks adequate?
4. **Root cause** — five whys, but with **chain levels** as the answer space. Not *why did the developer not see this* but *why did the chain not catch it before this developer*.
5. **What was missed** — which level had the right opportunity to prevent this incident, and didn't?
6. **Structural fix** — owned, dated, tracked. Not "we'll be more careful." A change to a checklist, a runbook, a test, a brief template, a CI step.

A postmortem that produces an action item with no owner and no date is not a postmortem. It is a feeling that was written down.

## What goes back into the chain

Every postmortem produces at least one chain-artifact change. Examples that hold:

* A runbook gains a new condition.
* The release-gate checklist gains a new item.
* The Feature Brief template gains a new question that would have caught this.
* The CI pipeline gains a new check.
* A new ADR records the constraint that the incident revealed.

The model update absorbs the rest. Without the model update, the postmortem becomes a memorial.

### Enough to know the chain learned.

Status page resolved. Timeline archived. Postmortem complete with at least one structural change owned and dated. The runbook or checklist that was missing now exists.

[Part 5 — The Retrospective →](/volumes/v-after-we-build/5-retrospective)

---

--- url: /volumes/v-after-we-build/5-retrospective.md ---

part five · the retrospective

# The Retrospective

> *Three questions. One change. Compounding rather than listing.*

**Held once per cycle**, after the signal reading is written and the first postmortems (if any) are filed. **Sixty minutes maximum.** **Outputs one change**, not a list.

A retrospective is not a venting session. It is the team's regular act of tightening the chain by exactly one notch. The discipline is to compound — each cycle's retrospective change still in effect when the next cycle's retrospective runs — rather than to list: eight things to improve, none owned, all forgotten by Tuesday.

## Three questions

Same three, every cycle. The repetition is the point.

1.
**What carried?** What did we do this cycle that we want to keep doing? Specific. *We held amigos before code began on every story* counts. *Communication was good* does not. 2. **What broke?** Where did meaning leak between phases? The brief was unclear, the ADR was missed, the prediction was forgotten, the postmortem produced a feeling instead of a fix. Concrete. 3. **What changes?** One change. Owned by name. Dated. Testable. That last constraint is the whole game. Lists of improvements compound to nothing. One owned, dated, testable change compounds. ## What "testable change" means A change is testable if a person who wasn't in the retro can, by looking at the chain artifacts in the next cycle, see whether it happened. | Not testable | Testable | |---|---| | Communicate better with the client | Send the weekly client update every Friday by 4pm, written by the PO, before the team's Friday wrap | | Get amigos done earlier | Amigos for every story is scheduled within 24 hours of the story being pulled | | Clean up the bugs | The chain-level distribution of open bugs is reviewed in the Wednesday triage and discussed in monthly portfolio review | | Write better briefs | The brief template now has an "I have witnessed this" line that the PO signs | ## How to run sixty minutes Five minutes — read the signal reading aloud. Anchors the room in what reality answered. Twenty minutes — *what carried* and *what broke*. Round-robin so everyone speaks once before anyone speaks twice. Notes go to a single shared doc. Twenty minutes — propose the change. Name the owner. Set the date. Say what would prove it happened. Five minutes — write the change down where it will be seen. Pinned in the team channel, added to the chain-artifact list, linked from the next cycle's kickoff. Ten minutes — buffer for the conversation that always wants to keep going. Then it ends. ## What the retrospective is not Not the postmortem. The postmortem is incident-specific and produces structural fixes to runbooks and brief templates. The retrospective is cycle-specific and produces process changes to how the team runs. Not the model update. The retrospective produces *one change to how the team operates*. The model update writes down *what the team learned about the world*. Different artifacts, different audiences, different lifetimes. Not therapy. The retro is short, dry, and outputs an artifact. If the team needs the conversation about feelings, that is a separate conversation in a separate room. ### Enough to know one thing is moving. One change is named, owned, dated, testable. It is visible in the team's working space. The change from the previous retrospective is still in effect — or has been deliberately retired with a note explaining why. [Part 6 — The Model Update →](/volumes/v-after-we-build/6-model-update) --- --- url: /volumes/v-after-we-build/6-model-update.md --- part six · the model update # The Model Update > *The step most teams skip. Where learning survives the conversation.* **Held immediately after the retrospective.** **Owned by the PO.** **Output is a written change to a chain artifact** — the model the team carries into the next cycle. **Visible to anyone the model affects** — usually the whole trio plus leadership. The signal reading recorded what happened. The retrospective recorded what the team is changing about how it works. The model update records what the team now believes about the world that it didn't believe before, and writes that belief into the artifact that will shape the next cycle. 
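One cycle's three closing artifacts, side by side. A sketch drawing on the grading example used in Parts 2 and 5 of this volume:

```text
Signal reading (Part 2):    measured 11m20s against a <15 min prediction. Better than predicted.
Retro change   (Part 5):    amigos scheduled within 24 hours of a story being pulled. Owner: PO.
Model update   (this part): the assumption "graders prefer keyboard shortcuts over deep links"
                            closed as CONFIRMED in the Initiative #INV-204 brief.
```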
This is the step most teams skip. The conversation happens. Insights are spoken. People nod. The next cycle begins, and somewhere — in a brief, in a story, in an estimate — someone makes the same wrong assumption that was named two weeks ago. The reason is not that the team didn't listen. The reason is that nothing was written down where the next cycle would see it.

## Four moves

The model update is mechanical. Four moves, in order.

### 1. Close the assumptions that were witnessed

Volume II briefs name *assumptions: what we believe but haven't witnessed*. After this cycle, some of those assumptions have been confirmed (witnessed in the field), some contradicted (witnessed and wrong), and some remain unwitnessed. Close the ones that have moved. Mark them in the brief that owns them.

```text
Assumption (Volume II brief, Initiative #INV-204):
"Graders prefer keyboard shortcuts over deep links."
Status (post-cycle): CONFIRMED
Evidence: 8 of 8 graders observed using ⌘+Enter to advance. Deep links were used by 1 grader once during the cycle.
Closed: 2026-05-05
```

### 2. Add the assumptions you didn't have

The cycle surfaced things the brief did not predict. Some of these are leftover *not witnessed* items. Some are new — facets of the world the team had not considered. Add them, with their status. The next brief that touches this initiative inherits them.

```text
New assumption (added 2026-05-05):
"Graders work in batches of five exams, then take a 2–3 minute break."
Status: NOT WITNESSED at decision time, OBSERVED during check
Implication: Anything that interrupts a batch produces friction out of proportion to its size.
```

### 3. Append the signal reading to the brief

Not a copy. A link. The brief now points at the check that was promised before the cycle ran. Anyone who reads the brief next cycle reads the result alongside the prediction.

### 4. Sharpen the open questions

Every brief carries open questions that need to resolve before the next phase. The cycle answered some of them — strike them. It refined others — rewrite them. It surfaced new ones — add them. The set of open questions in the brief is now the agenda for the next discovery.

## What gets updated, beyond the brief

The model lives in more places than one document.

* **Templates** — if the cycle showed that the brief template was missing a question, add the question. Future briefs inherit.
* **Checklists** — release gate, DoR, postmortem — gain new items if the cycle surfaced a chain-level gap.
* **Domain glossary** — new terms learned from the field go into the project's domain glossary so the next story uses the same word.
* **Persona notes** — Dina's actual day, observed, replaces Dina's described day.

## What separates a model update from a wiki entry

A wiki entry is information. A model update is a change to an artifact that the next cycle will use without anyone remembering to look. The test: if the PO who wrote the update were hit by a bus, would the next cycle still inherit the learning? If yes — model update. If no — wiki entry.

The corpus is built on the assumption that the next cycle will not remember to look. Everything important must be where it is needed, when it is needed.

## Why this is the step most teams skip

Three reasons, all real, none acceptable.

1. **It feels redundant** — we just talked about it in the retro. The talk does not survive. The artifact does.
2. **It is unrewarding in the moment** — there is no audience, no celebration. The reward is paid out next cycle, by the gap that doesn't open.
3.
**No one owns it by default** — and that is exactly why the corpus assigns it to the PO. Without an owner, it is no one's job, which means it doesn't happen.

The team that does this consistently is the team whose third cycle is meaningfully better than its first. The team that skips it has done six cycles' worth of work, and is still running on its first cycle's model.

### Enough to know learning survived.

Assumptions in the brief are closed or annotated. New assumptions are recorded. Signal reading is linked from the brief. Open questions are sharpened. At least one template, checklist, or glossary file changed.

[Part 7 — The Ongoing Relationship →](/volumes/v-after-we-build/7-ongoing-relationship)

---

--- url: /volumes/v-after-we-build/7-ongoing-relationship.md ---

part seven · the ongoing relationship

# The Ongoing Relationship

> *Support levels and escalation. The SLA as operational contract. Helpdesk metrics. Client cadence. Where the chain meets the people who pay for it.*

**Support runs continuously**, three levels. **Weekly client update** every Friday — same time, same shape, written. **Bi-weekly sync** every other week — signal readings, decisions, scope changes. **Quarterly portfolio review** — SLA performance, VRI trends, root-cause patterns.

A feature shipped is not a relationship. A relationship is what happens between cycles, every week, around the work. This part of the volume is about the parts of the chain that have no kickoff and no end — the cadence that holds when the project room is empty.

## Three levels of support

Support is layered so that the chain stays clear about who handles what — and so that signals from the field reach the team that can do something about them.

| Level | Who | Resolves | Escalates when |
|---|---|---|---|
| **L1** | CS Lead, frontline | Configuration, account, how-to, known-issue lookup | Issue is reproducible and not in the known-issues doc |
| **L2** | QA + on-call developer | Reproducible defects, environment issues, data corrections | Fix requires code or schema change, or is a P0/P1 incident |
| **L3** | Tech Lead + PO | Code-level defects, structural issues, scope decisions | Repeated pattern triggers a chain-level review |

The escalation path is in the runbook. CS does not page L3 directly. L3 hears about L1 patterns through L2's weekly summary.

## Support-to-bug pipeline

Every L1 ticket is candidate evidence for the bug taxonomy. The PO reviews CS volumes weekly, asking three questions:

1. **Is this a single occurrence or a pattern?** Patterns escalate to L2 immediately.
2. **What chain level does this trace to?** Disproportionate weight in any one level is a structural signal.
3. **Is the help text the bug?** A common L1 question often means the content design — not the code — is the gap.

The output is a small number of new bugs filed each week, classified by chain level. CS volume that *isn't* producing bugs is a sign of a support team that is either over-staffed or not being listened to.

## SLA — the operational contract

The SLA is what the team has promised the client and is willing to be measured against. Four dimensions, every contract:

* **Availability** — uptime, with the maintenance window written.
* **Response time** — how long until L1 acknowledges. By severity.
* **Resolution time** — how long until L1 resolves, or escalates with a target. By severity.
* **Data integrity** — the team's promise about the data the client entrusts to it.

The SLA is not a marketing document.
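An excerpt, sketched. The thresholds are illustrative, not recommended defaults:

```text
SLA (excerpt)
Availability:     99.5% monthly, maintenance window Sundays 02:00-04:00 excluded.
Response time:    P0 within 15 minutes, P1 within 1 hour, P2 within 1 business day (L1 acknowledges).
Resolution time:  P0 within 4 hours, P1 within 2 business days, P2 escalated with a target date.
Data integrity:   submitted grades are never lost; every correction is auditable.
```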
The SLA is the threshold past which someone is paged, and the conversation that opens with the client when it is breached.

### The breach protocol

The SLA is breached the moment a threshold is crossed, not the moment someone notices. The protocol:

1. **Early warning** — the dashboard shows the threshold approaching. A communicator (PO or CS Lead) reaches out *before* breach. *We are seeing X. We are doing Y. We will tell you Z by W.*
2. **Contain** — same containment levers as Part 4 of this volume. Flag, deploy rollback, migration rollback, data correction.
3. **Communicate** — every 30 minutes minimum during a P0/P1 SLA breach. Status page is updated.
4. **Resolve** — the breach is over when the SLO is back inside threshold *and* the client has been told it is over.
5. **Postmortem** — same week. An SLA breach always warrants the structural-fix discipline, whatever its severity.

### The SLA review

Quarterly, with the client. Three questions:

* *Did we meet the SLA?* Numbers, not impressions.
* *Where did we approach without breaching?* Leading indicators.
* *Are the categories still right?* The thresholds were written against a model of the world that may have moved.

An SLA reviewed quarterly stays a contract. An SLA never reviewed becomes a souvenir.

## Helpdesk metrics

Tracked monthly. Reviewed in the bi-weekly sync.

| Metric | What it tells you | What "wrong" looks like |
|---|---|---|
| **First Response Time (FRT)** | Whether L1 is keeping pace | FRT trending up — L1 is overloaded or under-tooled |
| **Resolution time** | Whether problems are getting fixed or routed | Resolution >> FRT — L1 is queueing instead of solving |
| **Escalation rate** | Whether the right work is reaching L2/L3 | Very low — L1 is over-resolving and missing patterns. Very high — known-issues doc is stale |
| **Categories** | Where pain concentrates | One category >40% — that's the next slice |
| **Ticket-to-bug conversion** | How well the support pipeline feeds the chain | Zero in 30 days — the pipeline is closed |
| **Satisfaction (CSAT)** | The relationship at the ticket level | Low CSAT *with* good resolution time — tone, not speed, is the gap |

## Client cadence

The cadence is part of the work, not extra to it. The cadence is what makes the work answerable.

### Weekly client update — written

Every Friday. Same time, same shape. Three sections: **what shipped**, **what is in progress**, **what is blocked**. Written by the PO, before the team's Friday wrap, in under 200 words.

The discipline is consistency, not eloquence. A client who reads ten weekly updates that look the same has a model of the team. A client who reads ten weekly updates that look different has anxiety.
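The shape, sketched with illustrative contents:

```text
Weekly update (illustrative)
Shipped:      grading flow enabled for all graders on Tuesday, behind the flag.
In progress:  first-48-hours watch complete; signal reading due on the check date in the brief.
Blocked:      nothing this week. The update still goes out; the cadence is the point.
```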
### Bi-weekly sync — spoken

Forty-five minutes. Signal readings (any from the last two weeks), roadmap (any changes), scope decisions (any), CS patterns (any). The PO drives. The Tech Lead is present. The client speaks last.

The agenda is fixed. The sync that becomes a discussion of feature requests is a sync that has lost its job. Feature requests go into the backlog through the backlog process. The sync is for state.

### Quarterly portfolio review

The longer view. The PO and the Tech Lead sit with leadership and (where the relationship supports it) the client. Three artifacts on the table:

1. **SLA performance** — did we meet, where did we approach, are categories right.
2. **VRI trends** — value-to-rework. Is the chain producing more of what was paid for, or less?
3. **Root-cause patterns** — chain-level distribution of bugs and incidents. What level keeps producing the misses?

The review's output is a portfolio decision — fund, continue, or kill, and what to invest in chain repair.

### Enough to know the relationship is held.

Three support levels are staffed and routing. SLA is current and reviewed within the last 90 days. Weekly update has been sent on schedule. Bi-weekly sync has run. CS-to-bug pipeline produced at least one bug last month.

[Part 8 — The Team →](/volumes/v-after-we-build/8-team)

---

--- url: /volumes/v-after-we-build/8-team.md ---

part eight · the team

# The Team

> *Onboarding, T-shaped people, small teams, psychological safety, what happens when someone leaves.*

**Onboarding** is a defined cycle, not an orientation week. **One-to-ones** are weekly, fifteen minutes, owned by the manager. **Capacity planning** is monthly. **Hiring** is governed by chain fit, not individual brilliance.

The chain works because people work it. This part of the volume is about the human infrastructure — what makes the team able to run a cycle, keep running it, and stay alive while doing so.

## Onboarding to the chain

A new person on a chain-running team does one full cycle in shadow. Not a week of orientation. A cycle.

| Cycle phase | What the new person does |
|---|---|
| Week 1 | Reads the corpus front to back. Pairs with the PO on a Discovery session. Joins amigos. |
| Weeks 2–3 | Pulls one small story alongside a developer. Writes their first prediction with the PO's review. |
| Week 4 | Owns a story end to end. Files their first bug report. Writes a model-update line. |
| Week 6 | Has been part of one full cycle. Joins their first retro as a participant, not a guest. |

The onboarding is over when the new person can read the chain artifacts and know what is missing without being told.

## T-shaped development

A T-shaped person is deep in one craft, working in two or three adjacent ones. Deep enough to ship the deep one alone. Wide enough to talk to the people on either side.

In a small team, this is not optional. A developer who can't read a brief slows the trio. A PO who can't read a sequence diagram pushes architecture decisions to a meeting that should have been a comment. A QA who has never opened the codebase will write Gherkin that's mechanically correct and structurally meaningless.

The corpus is meant to expand the horizontal of the T. Each volume is written so that a non-specialist in the volume's craft can still read it and act on it.

## Small team adaptation

A team of three runs the same chain as a team of nine. The roles do not disappear. They combine.

* **PO + Designer** — common combination. The PO does feature briefs and journeys; brings in design help on visual systems and review.
* **Developer + Tech Lead** — the senior dev wears the lead hat in ADRs and pipeline decisions.
* **PO + QA** — the PO writes amigos themselves; QA review is contracted out for high-risk releases.

The hat is conscious. *I am writing this brief as PO* and *I am reviewing it as QA* are different stances even when they are the same person. The chain holds because the artifacts are stance-shaped, not person-shaped.

What does *not* combine: **incident commander, communicator, investigator**. Even on a one-person on-call, the hats switch in time. *Right now I am the commander. I will not investigate for the next ten minutes.*

## Psychological safety

Safety in this corpus is not about feelings. It is about whether the chain can hear what it needs to hear.

The chain hears badly when:

* People bring problems and leave with blame.
* Postmortems trace incidents to individuals instead of levels.
* Predictions are scored as personal performance instead of model accuracy.
* Silence in a retro is treated as agreement.

The chain hears well when:

* Defects are traced to chain levels first, individuals last.
* *Wrong* predictions are explicitly the most valuable outcome short of *not checked*.
* The team's manager reads the postmortem and asks *what changed in the brief template*, not *who missed this*.
* Silence is treated as a system signal — what is the system not letting people say?

This is structural. A team that says it is safe and produces postmortems with no structural fix is not safe. A team that produces structural fixes from every incident is safe — whether or not anyone says the word.

## Knowledge retention

A person leaves. The cycle continues. The corpus is what makes that true. Three artifacts carry the load:

1. **Briefs and ADRs** — the *why*. The decisions, with their rejected options.
2. **Runbooks** — the *what to do when*. The operational memory.
3. **The model file** — the assumptions, witnessed and not, with status and dates.

Slack threads are not knowledge retention. Email is not knowledge retention. The corpus is.

When someone leaves, the question is not *who knew this*. The question is *which artifact has it*. If the answer is *nobody wrote it down*, that is a chain gap, and the next retrospective owns it.

## Team capacity planning

Monthly. Three numbers and a question.

* **Cycles in flight** — how many initiatives is the team currently inside?
* **On-call recovery** — when did the on-call last get a clean week?
* **Discovery debt** — how many briefs are due for refresh and overdue?

The question: *what would have to be true for the team to take on one more thing?* If the answer is *nothing changes*, the team is over capacity and is paying for it in chain debt.

The instinct under pressure is to shorten the cycle. The corpus is structured against that. Cutting Discovery to ship faster trades a known cost (a longer cycle) for an unknown cost (a feature that solves the wrong problem). The Discovery cost is paid in days. The wrong-problem cost is paid in months.

## Hiring for chain fit

The corpus is opinionated. Not every excellent engineer or designer or PM thrives in it. Hiring needs to be honest about that. Look for, in interviews:

* **Does the candidate name a person when describing past work?** Or do they say *the user*?
* **Can they describe a prediction they made and got wrong, by name?** Or do they describe wins only?
* **Do they trace defects to chain levels, instinctively?** Or do they personalise — *the engineer didn't catch it*?
* **Can they hold their craft and read an adjacent one?** Vertical and horizontal.
* **Are they comfortable with structural tools** — checklists, templates, conventions — or do they treat them as bureaucracy?

The hiring bar is not *better than the team*. It is *adds something the chain is missing without breaking what the chain is doing*.

## Cross-team coordination

A growing team eventually has more than one chain running in parallel. The coordination problem is not staff allocation — it is shared state. Three artifacts handle it.

* **Shared services registry** — what each chain consumes from the others, versioned, with owners. Sketched below.
* **ADR cross-references** — when one chain's ADR constrains another, both ADRs link.
* **Portfolio review (Part 9)** — sees both chains together. Catches the conflict that no individual cycle can see.
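The registry, sketched: one entry per shared service, with illustrative field names and an assumed consumer:

```text
Service:    grading-api, v2
Owner:      the grading chain
Consumers:  the reporting chain (pinned to v2; version bumps announced a cycle ahead)
Contract:   the API contract, versioned next to the service
ADRs:       cross-referenced from both chains' decision logs
```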
What does *not* work: ad hoc Slack channels for coordination. They produce decisions no one can find next quarter. The artifacts are the coordination.

### Enough to know the chain has runners.

Onboarding has produced a person who can read the chain. One-to-ones have run. Capacity is honestly named. Knowledge retention artifacts are current. The team can lose one person and continue.

[Part 9 — The Portfolio →](/volumes/v-after-we-build/9-portfolio)

---

--- url: /volumes/v-after-we-build/9-portfolio.md ---

part nine · the portfolio

# The Portfolio

> *The view across features and products. DORA metrics. Technical debt as chain gaps. VRI. When to stop.*

**Quarterly portfolio review.** **Monthly DORA reading.** **Continuous VRI.** **Killing initiatives is a portfolio function** — not a team conversation.

The portfolio is the view above any single cycle. It is where the leadership reads what the chain is producing across all the work, and decides what to keep funding, what to continue, and what to stop. The discipline of the portfolio is the discipline of writing down what the team is willing to walk away from.

## DORA — the four system signals

Lead time, change failure rate, deploy frequency, MTTR. Tracked at portfolio level, not story level. They tell you whether the chain itself is healthy — independent of any individual feature.

| Metric | Read it as |
|---|---|
| **Lead time** (commit → production) | How long the chain takes end to end. Long lead time means a bottleneck somewhere. |
| **Change failure rate** | How often the chain ships something that has to be rolled back. |
| **Deploy frequency** | How often the chain is willing to commit. Low frequency hides problems. |
| **MTTR** | How quickly the chain recovers when it breaks. |

A team can have great delivery on one feature and still have a chain that is dying. The DORA signals catch that. DORA is not a team scorecard. It is a chain diagnostic. Read across cycles, not within a sprint.

## Technical debt as chain gaps

The corpus has a specific definition of *technical debt*: the gap between what the chain produced and what it should have produced, left in the system because the cycle had to keep moving. Debt is not all code. It is:

* A migration that runs but is not reversible.
* A runbook that names a step but doesn't say how.
* A brief that has predictions but no check date.
* A test suite that runs but doesn't cover any of the Gherkin scenarios.
* A flag that was meant to be cleaned up after the cycle and is still wrapping behavior six months later.

Debt accumulates by chain level. *Discovery debt* — briefs without witnessed assumptions — is real and expensive. *Operational debt* — runbooks that don't work — is the kind that wakes people up at 3am.

The portfolio review reads the debt distribution. Concentration in any one level is a signal that the chain has been letting that phase quietly skip itself.

## VRI — Value-to-Rework Index

The financial translation. *Value declared per cycle* divided by *rework produced per cycle*. A portfolio-level health metric.

```text
VRI = Σ value(initiatives shipped) / Σ rework(rework cycles needed)
```
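Worked through with illustrative numbers, in whatever unit the portfolio prices value:

```text
Quarter:  two initiatives shipped, value declared 400 + 250               = 650
Rework:   one Discovery-level reshape (90) + one Execution fix cycle (40) = 130
VRI:      650 / 130 ≈ 5.0
Read the trend, not the level: 5.0 falling toward 3 is the portfolio signal.
```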
A VRI that is improving means the chain is producing more of what was paid for. A VRI that is declining means initiatives are shipping but reality keeps disagreeing. A hidden VRI — rework reframed as "next phase" — is the most expensive kind, because the chain stops seeing itself.

Rework is named explicitly:

* **Rework caused by Strategy** — the initiative was the wrong bet. Counted.
* **Rework caused by Discovery** — the brief didn't witness. Counted.
* **Rework caused by Scope** — the story missed a state. Counted.
* **Rework caused by Execution** — the bug was a code defect. Counted.
* **Rework caused by Operation** — the runbook didn't exist. Counted.

Reframing rework as *iteration* hides the signal. Iteration is what we said we'd do. Rework is what we said we wouldn't have to do.

## When to stop

This is the portfolio's hardest job. An initiative is killed when:

* The prediction was wrong and the new prediction is not better than the next one in the queue.
* The signal reading shows no movement after two cycles of intervention.
* The VRI of this initiative, alone, is below the portfolio threshold.
* Strategy has moved and the initiative is now serving a goal the organisation no longer holds.

Killing is not failure. Killing is the chain's most disciplined act. The team that kills initiatives at the right time has already saved more than they would have spent finishing.

The corpus pattern: a kill produces an artifact, like everything else. The kill brief — short, factual, with the data — joins the corpus. The next initiative inherits it. *We have learned not to do this in this way for these reasons.*

## The portfolio review

Quarterly. Three artifacts on the table:

1. **DORA signals** across all initiatives, with the trend.
2. **VRI** across all initiatives, with the trend, and the rework breakdown by chain level.
3. **The kill list** — initiatives the trio recommends stopping, with evidence.

Decisions made at the portfolio review are written down — kept, continued, killed — with rationale. They become the input to the next cycle's strategy. A portfolio review that produces only *keep* decisions is not a portfolio review. It is a status meeting. The discipline is the willingness to say *stop*.

### Enough to know what is alive.

DORA signals are tracked and read. VRI is current. Rework is broken down by chain level. The portfolio review has happened in the last 90 days and produced at least one decision — including, where warranted, a kill.

[Part 10 — Adoption →](/volumes/v-after-we-build/10-adoption)

---

--- url: /volumes/v-after-we-build/10-adoption.md ---

part ten · adoption

# Adoption

> *How to start. Which practice first. The first cycle. What resistance looks like. What maturity feels like.*

**First cycle is short** — a single feature, scoped to fit a real check date. **One practice at a time.** **Resistance is a signal**, not a verdict.

The corpus is large. A team that tries to adopt all of it at once adopts none of it. Adoption is not a project. It is a sequence of small commitments, each of which proves the previous one was worth the friction. This part of the volume is about the practical question every team that meets the corpus eventually asks: *where do we begin*.

## The minimum viable chain

Twenty minutes. Two practices. One cycle.

1. **A prediction with a check date.** Anywhere on the team's existing work — a feature already in flight is fine. Write down what we expect, what we will measure, when we will check.
2. **Run the check on the date.** Even if the result is *we forgot to wire the metric*. The act of running the check is what builds the muscle.

That is the chain. Everything else in the corpus is in service of those two acts being done well. A team that does this for three cycles in a row is already most of the way to having a chain.
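Written down, the first practice fits on a card. A sketch reusing the grading example from earlier in this volume:

```text
Prediction: Gal completes a grading cycle in under 15 minutes.
Measure:    submission.graded.total_time_ms, mean over the first three weeks post-flag.
Baseline:   47 minutes.
Check date: named in the brief. Whoever runs the check writes the five-line signal reading.
```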
## Practice sequencing

Once the minimum is holding, layer the practices. The order matters — each one assumes the previous is in place.

| Order | Practice | Why this one next |
|---|---|---|
| 1 | Prediction + check | The chain's spine. Without this, nothing else updates. |
| 2 | Feature Brief (one sheet) | Where the prediction lives so it doesn't drift. |
| 3 | Amigos before code | The smallest unit of shared meaning between phases. |
| 4 | Retrospective with one change | Compounding rather than listing. |
| 5 | Model update | Where the cycle's learning becomes the next cycle's starting point. |
| 6 | ADRs for constrained choices | The technical-side artifact that survives the developer leaving. |
| 7 | The full Discovery walk | Five stations. Vision through Decision. |
| 8 | Postmortem with chain levels | The structural-fix discipline. |
| 9 | Portfolio review with VRI | The view above the cycle. |
| 10 | The full corpus | Everything else, gradually. |

Skipping ahead is allowed if the team is ready. Skipping the chain's spine — prediction and check — is not. Without it, nothing learns.

## What resistance looks like

Resistance is not opposition. It is the team's honest signal about where the chain is asking for something the conditions don't yet support. Common shapes, with what's actually underneath:

| Resistance sounds like | What is actually true |
|---|---|
| *We don't have time for amigos.* | Amigos save time on the story they cover. The team has not yet seen the saving, because they have not yet held one. |
| *Predictions are too risky to write down.* | The team has been measured on getting predictions right rather than on running the check. The fix is structural — make *not checked* the only outcome with no value. |
| *The clients won't let us do Discovery.* | The clients have been sold execution. The fix is in the contract and the financial translation, not in the cycle. |
| *We're too small for all this overhead.* | The corpus does not require all of it. See the minimum viable chain above. The team has heard the corpus as a checklist, not a sequence. |
| *Our work is different.* | Sometimes true. Often the fix is to translate the corpus into the team's domain language, not to abandon the practices. |

Resistance handled well produces a smaller, more honest version of the practice. Resistance ignored produces silence and ceremony.

## What maturity feels like

A mature chain is not a busy chain. It is a quiet one.

Signs the chain is mature:

* The retro produces small changes, not big ones, because the big ones already happened.
* Postmortems are uneventful — runbooks held, structural fixes are obvious.
* Predictions land within a small distance of measured reality, more often than not.
* New people onboard quickly because the artifacts hold the knowledge.
* The portfolio review produces *kill* decisions occasionally, without drama.
* Leadership reads the chain artifacts and knows what is happening without asking.

Signs the chain is *not yet* mature:

* The retro produces a list and the list grows.
* Postmortems describe what people felt and not what changed.
* Predictions are made loosely or not at all.
* Onboarding takes months because the knowledge lives in heads.
* Kill decisions are dramatic and rare.
* Leadership is briefed in conversations rather than informed by artifacts.

Maturity is not a destination. It is a condition that has to be re-earned each cycle. The team that takes it for granted has already started losing it.

## Artifact lifecycle

A corpus is not a museum. Artifacts go stale. The chain has a small, honest discipline for that.
* **Every artifact has a `last_reviewed` date.** The corpus surfaces ones over six months old. * **Deprecated artifacts are marked, not deleted.** Future readers need to know the practice was tried and what replaced it. * **A lightweight track exists** for changes that don't warrant the full chain — typo fixes, documentation tweaks, minor refactors. They go through CI but not through Discovery. * **The chain itself is reviewed annually.** Where is friction structural? Where is the corpus describing a world that has moved? ## Chain evolution The chain is meant to change. A chain that hasn't changed in two years is either complete (rare) or asleep (common). Chain evolution is itself a chain operation: * **A change to the chain begins as a brief** — same template, witnessed problem in the team's own working life. * **It is scoped, sliced, predicted** — same shape as any feature. * **It runs as a cycle** — the team practices the new way for one full cycle. * **It is checked** — did the change produce what we predicted? * **It joins the corpus** — or it doesn't, and the artifact records why not. The chain evolves the way features do, on itself, with the same discipline. Anything else is fashion. ### Enough to know the chain is alive. At least one cycle has been run with prediction + check. At least one practice from the sequencing list has been added. At least one model update has been written. The team can name what is mature, what is still draft, and what is gap. *** ## End of Volume V The cycle closes here — and reopens. The model update from this volume is the input to the next cycle's Volume II. The portfolio decision from this volume is the input to the next cycle's Volume I. [Back to the volume cover →](/volumes/v-after-we-build/) · [Back to the five volumes →](/volumes/) · [The map →](/map) --- --- url: /areas.md --- 200apps · how we work # Master Areas Every discipline, craft, and operational practice the chain touches — mapped by where it lives, who owns it, and which volume addresses it. The volumes describe the chain as a story. The areas describe it as a grid. Both are true. The story tells you what to do this week. The grid tells you whether the practice you are about to start has a home in the chain. 
## The thirteen sections | # | Section | Phase | Primary owners | Volumes | |---|---|---|---|---| | 1 | [Strategy & Direction](/areas/01-strategy-direction/) | Before | Founder, Leadership, PO | I, II | | 2 | [Discovery & Research](/areas/02-discovery-research/) | Before | PO, Designer | II | | 3 | [Product Definition](/areas/03-product-definition/) | Before | PO, Trio | II, III | | 4 | [Design & UX](/areas/04-design-ux/) | During | Designer | III, IV | | 5 | [Architecture & Technical Design](/areas/05-architecture/) | During | Tech Lead | III, IV | | 6 | [Development & Code](/areas/06-development-code/) | During | Developer | IV | | 7 | [Quality & Testing](/areas/07-quality-testing/) | During | QA | III, IV | | 8 | [Pipeline & Operations](/areas/08-pipeline-operations/) | During | Tech Lead, DevOps | IV | | 9 | [Release & Communication](/areas/09-release-communication/) | Boundary | PO | IV | | 10 | [Post-Release & Learning](/areas/10-post-release-learning/) | After | PO, Tech Lead | V | | 11 | [Ongoing Operations & Client](/areas/11-ongoing-operations-client/) | After | PO, CS Lead | V | | 12 | [Team & Organizational](/areas/12-team-organizational/) | Continuous | Leadership, PO | V | | 13 | [Adoption & Evolution](/areas/13-adoption-evolution/) | Meta | PO, Leadership | V | ## How a section is structured Each section is a list of crafts. Each craft is a page. Every craft page declares: * **What the craft is** — a working definition. * **Who owns it** — the role(s) that produce or steward the artifact. * **Which volume addresses it** — the narrative source of truth. * **Related crafts** — adjacent practices that depend on or feed this one. * **Maturity** — `gap`, `seed`, `draft`, `stable`, or `reviewed`. ## Gaps A small number of areas exist in the chain — they show up in real teams every week — but are not yet addressed in any volume. They are flagged with `maturity: gap`. The current gaps: * Market & competitive awareness * Competitive analysis (as discovery input) * Quantitative research (alongside observation) * User testing / usability (validation before build) * Pair programming / mobbing * Infrastructure as code * Team capacity planning (sustainable pace) * Hiring for chain fit * Cross-team coordination * Artifact lifecycle (deprecation, lightweight track) * Chain evolution (when to change the chain itself) These are not omissions of importance — they are areas where the chain has not yet produced a volume's worth of considered practice. Filling them is part of the corpus's job. ## How to use this index * **Pick a craft** to read its definition and find the volume that addresses it. * **Pick a section** to see how a discipline composes — which crafts feed which. * **Use [the map](/map)** to see the cross-graph between Volumes and Areas. --- --- url: /areas/01-strategy-direction.md --- master areas · section 1 # Strategy & Direction > *The work before the work begins.* Where vision becomes goals, goals become initiatives, and initiatives are valued, financially translated, and held in a portfolio that can be funded, continued, or killed. This section addresses the work that makes every later cycle answerable to something. Primary owners: Founder, Leadership, PO. Primary volumes: [I — Strategy & Direction](/volumes/i-strategy/), with the financial layer surfacing again in [V — After We Build · Portfolio](/volumes/v-after-we-build/9-portfolio). 
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Vision & Mission](/areas/01-strategy-direction/vision-mission) | Founder, CEO | I | | [Goals & Objectives](/areas/01-strategy-direction/goals-objectives) | Leadership, PO | I, II | | [Initiative Identification](/areas/01-strategy-direction/initiative-identification) | PO | II | | [Portfolio Management](/areas/01-strategy-direction/portfolio-management) | Leadership, PO | V | | [Value Declaration](/areas/01-strategy-direction/value-declaration) | PO | II, V | | [Financial Translation](/areas/01-strategy-direction/financial-translation) | PO, Leadership | V | | [Client Relationship Strategy](/areas/01-strategy-direction/client-relationship) | PO, Account | V | | [Market & Competitive Awareness](/areas/01-strategy-direction/market-competitive) gap | PO, Leadership | — | --- --- url: /areas/01-strategy-direction/vision-mission.md --- Strategy & Direction · master area # Vision & Mission > *Declaring the change the organisation exists to make. A vision names a person and a falsifiable claim about their life; a mission names what the organisation does to bring that vision about.* **Owners:** Founder, CEO **Volume addressing it:** I ## Where it lives * [Volume I · Vision & Mission](/volumes/i-strategy/1-vision-mission) ## Related crafts * [Goals & Objectives](/areas/01-strategy-direction/goals-objectives) --- --- url: /areas/01-strategy-direction/goals-objectives.md --- Strategy & Direction · master area # Goals & Objectives > *Translating vision into measurable bets. Goals are time-bound, measurable, and anchored to a named person. Objectives nest under goals; predictions nest under objectives.* **Owners:** Leadership, PO **Volumes addressing it:** I, II ## Where it lives * [Volume I · Goals & Objectives](/volumes/i-strategy/2-goals-objectives) ## Related crafts * [Vision & Mission](/areas/01-strategy-direction/vision-mission) * [Initiative Identification](/areas/01-strategy-direction/initiative-identification) --- --- url: /areas/01-strategy-direction/initiative-identification.md --- Strategy & Direction · master area # Initiative Identification > *Naming the gap between current state and goal. An initiative is bigger than a feature, smaller than a portfolio bet, and contains 2–3 Volume II briefs over its life.* **Owner:** PO **Volume addressing it:** II ## Where it lives * [Volume I · Initiative Identification](/volumes/i-strategy/3-initiative-identification) ## Related crafts * [Goals & Objectives](/areas/01-strategy-direction/goals-objectives) * [Initiative Brief](/areas/03-product-definition/initiative-brief) --- --- url: /areas/01-strategy-direction/portfolio-management.md --- Strategy & Direction · master area # Portfolio Management > *Deciding which initiatives to fund, continue, or kill. Quarterly. Three states only — fund, continue, kill. Pause is a form of dishonesty.* **Owners:** Leadership, PO **Volume addressing it:** V ## Where it lives * [Volume I · Portfolio Direction](/volumes/i-strategy/7-portfolio-direction) * [Volume V · The Portfolio](/volumes/v-after-we-build/9-portfolio) ## Related crafts * [Financial Translation](/areas/01-strategy-direction/financial-translation) * [Quarterly Portfolio Review](/areas/11-ongoing-operations-client/quarterly-portfolio) --- --- url: /areas/01-strategy-direction/value-declaration.md --- Strategy & Direction · master area # Value Declaration > *Estimating the worth of the intended change in writing, before the cycle runs. Range, most-likely, assumptions. 
Forms the VRI numerator.* **Owner:** PO **Volumes addressing it:** II, V ## Where it lives * [Volume I · Value Declaration](/volumes/i-strategy/4-value-declaration) ## Related crafts * [Financial Translation](/areas/01-strategy-direction/financial-translation) * [Initiative Brief](/areas/03-product-definition/initiative-brief) --- --- url: /areas/01-strategy-direction/financial-translation.md --- Strategy & Direction · master area # Financial Translation > *VRI, rework multiplier, discovery billing — the layer that turns chain artifacts into language leadership and clients can use.* **Owners:** PO, Leadership **Volume addressing it:** V ## Where it lives * [Volume I · Financial Translation](/volumes/i-strategy/5-financial-translation) ## Related crafts * [Value Declaration](/areas/01-strategy-direction/value-declaration) * [Technical Debt Management](/areas/06-development-code/tech-debt) --- --- url: /areas/01-strategy-direction/client-relationship.md --- Strategy & Direction · master area # Client Relationship Strategy > *Trust building, cadence, renewal, expansion. Strategic frame for how the chain meets the people who pay for it. Cadence is held even when there is nothing to say.* **Owners:** PO, Account **Volume addressing it:** V ## Where it lives * [Volume I · Client Relationship Strategy](/volumes/i-strategy/6-client-relationship) ## Related crafts * [Weekly Client Update](/areas/11-ongoing-operations-client/weekly-client-update) * [Bi-weekly Client Sync](/areas/11-ongoing-operations-client/bi-weekly-sync) --- --- url: /areas/01-strategy-direction/market-competitive.md --- Strategy & Direction · master area # Market & Competitive Awareness > *Understanding the landscape the product lives in. Currently a gap in the corpus — competitive analysis as Discovery input is not yet addressed in any volume.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** PO, Leadership **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Competitive Analysis](/areas/02-discovery-research/competitive-analysis) --- --- url: /areas/02-discovery-research.md --- master areas · section 2 # Discovery & Research > *Understanding the problem before solving it.* The crafts in this section produce the input the rest of the chain runs on. Without them — or with them done badly — every later artifact is built on assumption. Discovery is observation-first: surveys and interviews are downstream of going to watch the activity in person. Primary owners: PO, Designer. Primary volume: [II — Discovery & Brief](/volumes/ii-discovery/).
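One of the crafts below, Assumption Surfacing, names three states — witnessed, inferred, not witnessed. As a reading aid, here is a minimal TypeScript sketch of an assumption register under those states; the field names and the `witness` helper are hypothetical.

```ts
// Hypothetical register entry — only the three states come from the corpus.
type AssumptionState = "witnessed" | "inferred" | "not witnessed";

interface Assumption {
  statement: string; // what we believe about the named person's activity
  state: AssumptionState;
  source?: string; // the observation that witnessed it, once one exists
}

// The cycle moves assumptions between states; only an observation can promote one.
function witness(assumption: Assumption, observationRef: string): Assumption {
  return { ...assumption, state: "witnessed", source: observationRef };
}
```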
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Observation / Field Research](/areas/02-discovery-research/observation) | PO, Designer | II | | [Interview Technique](/areas/02-discovery-research/interview-technique) | PO, Designer | II | | [Person Identification](/areas/02-discovery-research/person-identification) | PO | II | | [Moment Identification](/areas/02-discovery-research/moment-identification) | PO, Designer | II | | [Journey Mapping](/areas/02-discovery-research/journey-mapping) | PO, Designer | II, III | | [Assumption Surfacing](/areas/02-discovery-research/assumption-surfacing) | PO | II | | [Domain Immersion](/areas/02-discovery-research/domain-immersion) | Whole trio | II | | [Competitive Analysis](/areas/02-discovery-research/competitive-analysis) gap | PO | — | | [Quantitative Research](/areas/02-discovery-research/quantitative-research) gap | PO, Data | — | --- --- url: /areas/02-discovery-research/observation.md --- Discovery & Research · master area # Observation / Field Research > *Going to watch the named person do the activity, in real time, in their environment. The discipline most teams skip. Not interview. Not survey.* **Owners:** PO, Designer **Volume addressing it:** II ## Where it lives * [Volume II · Observation](/volumes/ii-discovery/1-observation) ## Related crafts * [Interview Technique](/areas/02-discovery-research/interview-technique) * [Journey Mapping](/areas/02-discovery-research/journey-mapping) * [Assumption Surfacing](/areas/02-discovery-research/assumption-surfacing) --- --- url: /areas/02-discovery-research/interview-technique.md --- Discovery & Research · master area # Interview Technique > *Structured, anchored to specific moments observed. Asks "what was that thing you did at 09:14" — never "what is hard about your job".* **Owners:** PO, Designer **Volume addressing it:** II ## Where it lives * [Volume II · Observation](/volumes/ii-discovery/1-observation) ## Related crafts * [Observation / Field Research](/areas/02-discovery-research/observation) --- --- url: /areas/02-discovery-research/person-identification.md --- Discovery & Research · master area # Person Identification > *Naming who has the problem — Dina, Miri, Avi. Never **the user**. The chain rule: every brief begins with a named person whose life will change.* **Owner:** PO **Volume addressing it:** II ## Where it lives * [Volume II · Person & Moment](/volumes/ii-discovery/2-person-moment) ## Related crafts * [Moment Identification](/areas/02-discovery-research/moment-identification) * [Feature Brief](/areas/03-product-definition/feature-brief) --- --- url: /areas/02-discovery-research/moment-identification.md --- Discovery & Research · master area # Moment Identification > *Finding the specific friction or failure point in the activity. Not the average experience — the actual moment, with timing.* **Owners:** PO, Designer **Volume addressing it:** II ## Where it lives * [Volume II · Person & Moment](/volumes/ii-discovery/2-person-moment) ## Related crafts * [Person Identification](/areas/02-discovery-research/person-identification) * [Journey Mapping](/areas/02-discovery-research/journey-mapping) --- --- url: /areas/02-discovery-research/journey-mapping.md --- Discovery & Research · master area # Journey Mapping > *Drawing the named person's activity end to end with friction marks.
Steps numbered J1, J2 — referenced from briefs and stories.* **Owners:** PO, Designer **Volumes addressing it:** II, III ## Where it lives * [Volume II · Journey Mapping](/volumes/ii-discovery/3-journey-mapping) * [Volume III · Story Mapping](/volumes/iii-scope/2-story-mapping) ## Related crafts * [Observation / Field Research](/areas/02-discovery-research/observation) * [Moment Identification](/areas/02-discovery-research/moment-identification) * [Current-State Flow Mapping](/areas/04-design-ux/current-state-flow) --- --- url: /areas/02-discovery-research/assumption-surfacing.md --- Discovery & Research · master area # Assumption Surfacing > *Naming what we believe but haven't witnessed. Three states — witnessed, inferred, not witnessed. The cycle moves them.* **Owner:** PO **Volume addressing it:** II ## Where it lives * [Volume II · Assumption Surfacing](/volumes/ii-discovery/4-assumption-surfacing) ## Related crafts * [Model Update](/areas/10-post-release-learning/model-update) --- --- url: /areas/02-discovery-research/domain-immersion.md --- Discovery & Research · master area # Domain Immersion > *Learning the language of the named person's world. The same words must appear in the brief, the code, the API, the analytics events.* **Owner:** Whole trio **Volume addressing it:** II ## Where it lives * [Volume II · Observation](/volumes/ii-discovery/1-observation) * [Volume IV · Domain Language in Code](/volumes/iv-execution/1-domain-language) ## Related crafts * [Domain Language in Code](/areas/06-development-code/domain-language) --- --- url: /areas/02-discovery-research/competitive-analysis.md --- Discovery & Research · master area # Competitive Analysis > *Understanding existing solutions and their gaps as Discovery input. Currently a gap in the corpus.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owner:** PO **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Market & Competitive Awareness](/areas/01-strategy-direction/market-competitive) --- --- url: /areas/02-discovery-research/quantitative-research.md --- Discovery & Research · master area # Quantitative Research > *Surveys, analytics, usage data as Discovery input — alongside (not instead of) observation. Currently a gap in the corpus.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** PO, Data **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Observation / Field Research](/areas/02-discovery-research/observation) * [Product Analytics](/areas/08-pipeline-operations/product-analytics) --- --- url: /areas/03-product-definition.md --- master areas · section 3 # Product Definition > *Translating understanding into actionable scope.* The artifact-producing section. Every craft here outputs a written thing — brief, story, map, decision — that survives the room in which it was made. This section bridges Discovery (Volume II) and Scope (Volume III). Primary owners: PO, with the Trio joining at Epic and amigos. Primary volumes: [II — Discovery & Brief](/volumes/ii-discovery/) and [III — Scope & Shape](/volumes/iii-scope/). 
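The crafts below converge on one artifact shape: the prediction. Its five fields — baseline, target, check date, method, owner — are fixed by the Prediction Writing craft; everything else in this sketch (the type name, the example values) is illustrative, not a prescribed format.

```ts
// The five fields are the corpus's; the names and the example values are hypothetical.
interface Prediction {
  baseline: string; // what was measured before the cycle, e.g. from observation
  target: string; // the written claim about what will change
  checkDate: Date; // when Volume V reads reality's answer
  method: string; // how the signal will be read, e.g. a product analytics event
  owner: string; // who is answerable for running the check
}

// Hypothetical example, in the corpus's person-first register:
const prediction: Prediction = {
  baseline: "Dina re-enters this morning's report by hand, every afternoon",
  target: "Dina's afternoon entry is pre-filled; re-entry disappears",
  checkDate: new Date("2025-06-30"),
  method: "report.prefilled analytics event vs. manual entries",
  owner: "PO",
};
```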
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Initiative Brief](/areas/03-product-definition/initiative-brief) | PO | II | | [Feature Brief](/areas/03-product-definition/feature-brief) | PO, Designer | II | | [Technical Design Brief](/areas/03-product-definition/technical-design-brief) | Tech Lead | II | | [Product Decision Records (PDR)](/areas/03-product-definition/pdr) | PO | III | | [Prediction Writing](/areas/03-product-definition/prediction-writing) | PO | II | | [Epic Naming & Kickoff](/areas/03-product-definition/epic-naming) | PO, Trio | III | | [Story Writing](/areas/03-product-definition/story-writing) | PO | III | | [Story Mapping](/areas/03-product-definition/story-mapping) | PO, Trio | III | | [Walking Skeleton](/areas/03-product-definition/walking-skeleton) | PO, Trio | III | | [Slicing & Prioritization](/areas/03-product-definition/slicing-prioritization) | PO | III | | [Backlog Management](/areas/03-product-definition/backlog-management) | PO | III, IV | --- --- url: /areas/03-product-definition/initiative-brief.md --- Product Definition · master area # Initiative Brief > *Business gap, human gap, discovery questions, V — the highest-altitude artifact in Volume II. Two pages. Signed off before execution begins.* **Owner:** PO **Volume addressing it:** II ## Where it lives * [Volume II · Initiative Brief](/volumes/ii-discovery/6-initiative-brief) ## Related crafts * [Feature Brief](/areas/03-product-definition/feature-brief) * [Initiative Identification](/areas/01-strategy-direction/initiative-identification) * [Value Declaration](/areas/01-strategy-direction/value-declaration) --- --- url: /areas/03-product-definition/feature-brief.md --- Product Definition · master area # Feature Brief > *Observation, journey, direction, prediction, sign-off. The cycle's primary artifact. Carries the prediction Volume V will check.* **Owners:** PO, Designer **Volume addressing it:** II ## Where it lives * [Volume II · Feature Brief](/volumes/ii-discovery/7-feature-brief) ## Related crafts * [Prediction Writing](/areas/03-product-definition/prediction-writing) * [Initiative Brief](/areas/03-product-definition/initiative-brief) --- --- url: /areas/03-product-definition/technical-design-brief.md --- Product Definition · master area # Technical Design Brief > *System-witnessed problem and technical prediction. Written by the Tech Lead when the feature carries technical Discovery questions.* **Owner:** Tech Lead **Volume addressing it:** II ## Where it lives * [Volume II · Technical Design Brief](/volumes/ii-discovery/8-technical-design-brief) ## Related crafts * [Architecture Decision Records](/areas/05-architecture/adr) * [Spike Management](/areas/05-architecture/spike-management) --- --- url: /areas/03-product-definition/pdr.md --- Product Definition · master area # Product Decision Records (PDR) > *Scope or priority decisions with rejected options. The product analogue of an ADR. Records the **why** of trade-offs that future cycles must see.* **Owner:** PO **Volume addressing it:** III ## Where it lives * [Volume III · Slicing & Prioritization](/volumes/iii-scope/9-slicing-prioritization) ## Related crafts * [Architecture Decision Records](/areas/05-architecture/adr) --- --- url: /areas/03-product-definition/prediction-writing.md --- Product Definition · master area # Prediction Writing > *Baseline, target, check date, method, owner.
The five fields that make a brief a brief.* **Owner:** PO **Volume addressing it:** II ## Where it lives * [Volume II · Prediction Writing](/volumes/ii-discovery/9-prediction-writing) ## Related crafts * [Signal Reading](/areas/10-post-release-learning/signal-reading) * [Feature Brief](/areas/03-product-definition/feature-brief) --- --- url: /areas/03-product-definition/epic-naming.md --- Product Definition · master area # Epic Naming & Kickoff > *Coherent activities named after what the named person does. The kickoff produces the artifact Volume III is built on.* **Owners:** PO, Trio **Volume addressing it:** III ## Where it lives * [Volume III · Epic Naming & Kickoff](/volumes/iii-scope/1-epic-naming) ## Related crafts * [Story Writing](/areas/03-product-definition/story-writing) * [Story Mapping](/areas/03-product-definition/story-mapping) --- --- url: /areas/03-product-definition/story-writing.md --- Product Definition · master area # Story Writing > *Person, moment, done, out-of-scope. Plus the nine-point Definition of Ready. Sized 1–3 days, with at least three testable acceptance criteria.* **Owner:** PO **Volume addressing it:** III ## Where it lives * [Volume III · Story Writing](/volumes/iii-scope/4-story-writing) ## Related crafts * [Amigos Session](/areas/07-quality-testing/amigos) * [Gherkin Scenario Writing](/areas/07-quality-testing/gherkin) * [Epic Naming & Kickoff](/areas/03-product-definition/epic-naming) --- --- url: /areas/03-product-definition/story-mapping.md --- Product Definition · master area # Story Mapping > *Epics as columns, stories as rows, releases as horizontal slices. The artifact the trio negotiates scope on.* **Owners:** PO, Trio **Volume addressing it:** III ## Where it lives * [Volume III · Story Mapping](/volumes/iii-scope/2-story-mapping) ## Related crafts * [Walking Skeleton](/areas/03-product-definition/walking-skeleton) * [Slicing & Prioritization](/areas/03-product-definition/slicing-prioritization) --- --- url: /areas/03-product-definition/walking-skeleton.md --- Product Definition · master area # Walking Skeleton > *The smallest end-to-end release that changes the situation. Spans every Epic in the cycle.* **Owners:** PO, Trio **Volume addressing it:** III ## Where it lives * [Volume III · Walking Skeleton](/volumes/iii-scope/3-walking-skeleton) ## Related crafts * [Story Mapping](/areas/03-product-definition/story-mapping) * [Slicing & Prioritization](/areas/03-product-definition/slicing-prioritization) --- --- url: /areas/03-product-definition/slicing-prioritization.md --- Product Definition · master area # Slicing & Prioritization > *Which stories ship together, in what order, value-driven. Required-for-prediction stories must all be in the slice.* **Owner:** PO **Volume addressing it:** III ## Where it lives * [Volume III · Slicing & Prioritization](/volumes/iii-scope/9-slicing-prioritization) ## Related crafts * [Walking Skeleton](/areas/03-product-definition/walking-skeleton) * [Story Mapping](/areas/03-product-definition/story-mapping) --- --- url: /areas/03-product-definition/backlog-management.md --- Product Definition · master area # Backlog Management > *Keeping the story map current. Pulling ready stories. 
Backlogs are curated by decision, not accumulated by default.* **Owner:** PO **Volumes addressing it:** III, IV ## Where it lives * [Volume III · Story Mapping](/volumes/iii-scope/2-story-mapping) ## Related crafts * [Story Mapping](/areas/03-product-definition/story-mapping) --- --- url: /areas/04-design-ux.md --- master areas · section 4 # Design & UX > *The person-facing craft.* Where the journey, the brief, and the moment become a usable surface. The corpus pattern is design-with-the-system, not design-for-the-system: the design system, the codebase's components, and the Figma frames are kept one-to-one. UX/product ilities — learnability, content clarity, responsiveness — are first-class requirements. Primary owner: Designer. Primary volumes: [III — Scope & Shape](/volumes/iii-scope/) and [IV — Execution](/volumes/iv-execution/). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Current-State Flow Mapping](/areas/04-design-ux/current-state-flow) | Designer | III | | [User Flow Design](/areas/04-design-ux/user-flow-design) | Designer | III | | [Wireframing & Prototyping](/areas/04-design-ux/wireframing-prototyping) | Designer | III | | [Interaction Design](/areas/04-design-ux/interaction-design) | Designer | III, IV | | [Content Design](/areas/04-design-ux/content-design) | Designer | IV | | [Design System](/areas/04-design-ux/design-system) | Designer, Dev | III, IV | | [Storybook](/areas/04-design-ux/storybook) | Designer, Dev | IV | | [Visual Regression Baseline](/areas/04-design-ux/visual-regression) | Designer, QA | IV | | [Accessibility Design](/areas/04-design-ux/accessibility) | Designer | III, IV | | [Responsive Design](/areas/04-design-ux/responsive-design) | Designer | III, IV | | [Figma Organization](/areas/04-design-ux/figma-organization) | Designer | III | | [Handoff Annotations](/areas/04-design-ux/handoff-annotations) | Designer | IV | | [UX Review](/areas/04-design-ux/ux-review) | Designer | IV | | [Design System Governance](/areas/04-design-ux/design-system-governance) | Designer, PO | IV | | [User Testing / Usability](/areas/04-design-ux/user-testing) gap | Designer, PO | — | --- --- url: /areas/04-design-ux/current-state-flow.md --- Design & UX · master area # Current-State Flow Mapping > *Drawing what exists before designing what should. The before-picture every redesign needs.* **Owner:** Designer **Volume addressing it:** III ## Where it lives * [Volume II · Journey Mapping](/volumes/ii-discovery/3-journey-mapping) ## Related crafts * [Journey Mapping](/areas/02-discovery-research/journey-mapping) * [User Flow Design](/areas/04-design-ux/user-flow-design) --- --- url: /areas/04-design-ux/user-flow-design.md --- Design & UX · master area # User Flow Design > *Happy path → edges → errors. Nodes are named states. Drawn before screens.* **Owner:** Designer **Volume addressing it:** III ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Wireframing & Prototyping](/areas/04-design-ux/wireframing-prototyping) * [Interaction Design](/areas/04-design-ux/interaction-design) --- --- url: /areas/04-design-ux/wireframing-prototyping.md --- Design & UX · master area # Wireframing & Prototyping > *Fat-marker → mid-fidelity → full fidelity progression. Each step defends the previous.* **Owner:** Designer **Volume addressing it:** III ## Where it lives * This area is currently a **gap** in the corpus. 
It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [User Flow Design](/areas/04-design-ux/user-flow-design) * [Handoff Annotations](/areas/04-design-ux/handoff-annotations) --- --- url: /areas/04-design-ux/interaction-design.md --- Design & UX · master area # Interaction Design > *Transitions, hover/focus states, micro-interactions. The states code will need to support are designed here.* **Owner:** Designer **Volumes addressing it:** III, IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Design System](/areas/04-design-ux/design-system) * [Component Development](/areas/06-development-code/component-development) --- --- url: /areas/04-design-ux/content-design.md --- Design & UX · master area # Content Design > *Labels, errors, empty states, help text — in the named person's domain language.* **Owner:** Designer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Domain Immersion](/areas/02-discovery-research/domain-immersion) * [Domain Language in Code](/areas/06-development-code/domain-language) --- --- url: /areas/04-design-ux/design-system.md --- Design & UX · master area # Design System > *Component library, tokens, patterns, rules. Lives one-to-one with the codebase.* **Owners:** Designer, Developer **Volumes addressing it:** III, IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Storybook](/areas/04-design-ux/storybook) * [Design System Governance](/areas/04-design-ux/design-system-governance) --- --- url: /areas/04-design-ux/storybook.md --- Design & UX · master area # Storybook > *The design system in code. Every named state has a Storybook entry; every Storybook entry has a Figma frame.* **Owners:** Designer, Developer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Design System](/areas/04-design-ux/design-system) * [Visual Regression Baseline](/areas/04-design-ux/visual-regression) --- --- url: /areas/04-design-ux/visual-regression.md --- Design & UX · master area # Visual Regression Baseline > *Named Figma states as screenshot baselines. Pipeline stage 4 (Volume IV) checks against these.* **Owners:** Designer, QA **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [Storybook](/areas/04-design-ux/storybook) * [Regression Testing](/areas/07-quality-testing/regression-testing) --- --- url: /areas/04-design-ux/accessibility.md --- Design & UX · master area # Accessibility Design > *WCAG compliance, keyboard navigation, screen reader support, contrast. Designed in, not bolted on.* **Owner:** Designer **Volumes addressing it:** III, IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. 
## Related crafts * [Accessibility Testing](/areas/07-quality-testing/accessibility-testing) * [UX/Product Ilities](/areas/05-architecture/ux-ilities) --- --- url: /areas/04-design-ux/responsive-design.md --- Design & UX · master area # Responsive Design > *Device-appropriate layouts and behaviors. The named person's device shapes the design space.* **Owner:** Designer **Volumes addressing it:** III, IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [UX/Product Ilities](/areas/05-architecture/ux-ilities) --- --- url: /areas/04-design-ux/figma-organization.md --- Design & UX · master area # Figma Organization > *One file per Feature, one page per Epic, named frames matching state machines. Findability is part of the deliverable.* **Owner:** Designer **Volume addressing it:** III ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Design System](/areas/04-design-ux/design-system) * [Handoff Annotations](/areas/04-design-ux/handoff-annotations) --- --- url: /areas/04-design-ux/handoff-annotations.md --- Design & UX · master area # Handoff Annotations > *Spacing, interaction notes, state transitions in Figma. Read by the developer at PR time, not at planning.* **Owner:** Designer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Figma Organization](/areas/04-design-ux/figma-organization) * [Code Review](/areas/06-development-code/code-review) --- --- url: /areas/04-design-ux/ux-review.md --- Design & UX · master area # UX Review > *Flow-level assessment — using it as the named person would. Held before release gate.* **Owner:** Designer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Exploratory Testing](/areas/07-quality-testing/exploratory-testing) * [Release Gate Checklist](/areas/09-release-communication/release-gate) --- --- url: /areas/04-design-ux/design-system-governance.md --- Design & UX · master area # Design System Governance > *When to add a component, when to diverge, how to keep code and Figma in sync. The system's own ADR-equivalent.* **Owners:** Designer, PO **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Design System](/areas/04-design-ux/design-system) --- --- url: /areas/04-design-ux/user-testing.md --- Design & UX · master area # User Testing / Usability > *Validating designs with real people before build. Currently a gap in the corpus — observation precedes building, but post-design pre-build validation is not yet addressed.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** Designer, PO **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. 
## Related crafts * [Observation / Field Research](/areas/02-discovery-research/observation) --- --- url: /areas/05-architecture.md --- master areas · section 5 # Architecture & Technical Design > *The system-side craft.* Where the prediction is given technical shape. Constrained choices become ADRs. Sequence, schema, and API contracts become the trio's shared drawings. Ilities are selected, not assumed. Spike management is a discovery practice, not a delay. Primary owner: Tech Lead. Primary volumes: [III — Scope & Shape](/volumes/iii-scope/) and [IV — Execution](/volumes/iv-execution/). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Sequence Diagrams](/areas/05-architecture/sequence-diagrams) | Tech Lead | III | | [Schema Design](/areas/05-architecture/schema-design) | Tech Lead | III | | [API Contract Design](/areas/05-architecture/api-contract) | Tech Lead | III | | [Architecture Decision Records](/areas/05-architecture/adr) | Tech Lead | III | | [Domain Modelling](/areas/05-architecture/domain-modelling) | Tech Lead | III | | [Bounded Context Mapping](/areas/05-architecture/bounded-context) | Tech Lead | IV | | [Service Composition](/areas/05-architecture/service-composition) | Tech Lead | IV | | [State Machine Design](/areas/05-architecture/state-machine) | Tech Lead | III | | [Data Flow Design](/areas/05-architecture/data-flow) | Tech Lead | III | | [System Architecture (C4 / arc42)](/areas/05-architecture/system-architecture) | Tech Lead | III | | [Ilities Selection](/areas/05-architecture/ilities) | Tech Lead, PO | III | | [UX/Product Ilities](/areas/05-architecture/ux-ilities) | Designer, PO | III | | [Spike Management](/areas/05-architecture/spike-management) | Tech Lead | III | | [Integration Design](/areas/05-architecture/integration-design) | Tech Lead | IV | | [Migration Design](/areas/05-architecture/migration-design) | Tech Lead | IV | | [Performance Engineering](/areas/05-architecture/performance) | Tech Lead | IV | | [Security Architecture](/areas/05-architecture/security-architecture) | Tech Lead | IV | --- --- url: /areas/05-architecture/sequence-diagrams.md --- Architecture & Technical Design · master area # Sequence Diagrams > *Request flow through services, with failure paths visible. Every Epic has at least one.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [API Contract Design](/areas/05-architecture/api-contract) * [Data Flow Design](/areas/05-architecture/data-flow) --- --- url: /areas/05-architecture/schema-design.md --- Architecture & Technical Design · master area # Schema Design > *Tables, columns, constraints — what the system remembers. Backward-compatible migration plans, written and reviewed before the cycle starts.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [Migration Design](/areas/05-architecture/migration-design) * [Schema Migration](/areas/06-development-code/schema-migration) --- --- url: /areas/05-architecture/api-contract.md --- Architecture & Technical Design · master area # API Contract Design > *Verb, path, request, response, errors, guarantees. 
The contract is what the trio agrees on before code.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [Sequence Diagrams](/areas/05-architecture/sequence-diagrams) * [Contract Testing](/areas/06-development-code/contract-testing) --- --- url: /areas/05-architecture/adr.md --- Architecture & Technical Design · master area # Architecture Decision Records > *Constrained technical choices with at least two options considered. MADR format. Never deleted; superseded.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · ADR](/volumes/iii-scope/6-adr) ## Related crafts * [Product Decision Records (PDR)](/areas/03-product-definition/pdr) * [Ilities Selection](/areas/05-architecture/ilities) --- --- url: /areas/05-architecture/domain-modelling.md --- Architecture & Technical Design · master area # Domain Modelling > *Conceptual entities, rules, relationships. The model the code is built on, in domain language.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Domain Language in Code](/areas/06-development-code/domain-language) * [Bounded Context Mapping](/areas/05-architecture/bounded-context) --- --- url: /areas/05-architecture/bounded-context.md --- Architecture & Technical Design · master area # Bounded Context Mapping > *Where language changes at service boundaries. Translation lives at the boundary, not throughout the codebase.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Domain Language in Code](/volumes/iv-execution/1-domain-language) ## Related crafts * [Domain Modelling](/areas/05-architecture/domain-modelling) * [Integration Design](/areas/05-architecture/integration-design) --- --- url: /areas/05-architecture/service-composition.md --- Architecture & Technical Design · master area # Service Composition > *Request-response vs event-driven, synchronous vs asynchronous. Decided in ADRs.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Architecture Decision Records](/areas/05-architecture/adr) * [Sequence Diagrams](/areas/05-architecture/sequence-diagrams) --- --- url: /areas/05-architecture/state-machine.md --- Architecture & Technical Design · master area # State Machine Design > *Lifecycle transitions for entities with more than three states. Disallowed transitions are the silent source of bugs.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [Domain Modelling](/areas/05-architecture/domain-modelling) --- --- url: /areas/05-architecture/data-flow.md --- Architecture & Technical Design · master area # Data Flow Design > *How data crosses system edges. Sources, transformations, persistence, consumers. 
Especially for integrations and ETL.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [Integration Design](/areas/05-architecture/integration-design) --- --- url: /areas/05-architecture/system-architecture.md --- Architecture & Technical Design · master area # System Architecture (C4 / arc42) > *How services compose, where boundaries fall. Drawn at the level of detail the brief needs.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Bounded Context Mapping](/areas/05-architecture/bounded-context) * [Service Composition](/areas/05-architecture/service-composition) --- --- url: /areas/05-architecture/ilities.md --- Architecture & Technical Design · master area # Ilities Selection > *Which non-functional requirements matter for this Epic, to what level. Defaults are documented in a top-level ADR; deviations are recorded.* **Owners:** Tech Lead, PO **Volume addressing it:** III ## Where it lives * [Volume III · Ilities Selection](/volumes/iii-scope/8-ilities) ## Related crafts * [Architecture Decision Records](/areas/05-architecture/adr) * [UX/Product Ilities](/areas/05-architecture/ux-ilities) --- --- url: /areas/05-architecture/ux-ilities.md --- Architecture & Technical Design · master area # UX/Product Ilities > *Learnability, content clarity, responsiveness, comprehension. Real ilities — checked in design review and QA.* **Owners:** Designer, PO **Volume addressing it:** III ## Where it lives * [Volume III · Ilities Selection](/volumes/iii-scope/8-ilities) ## Related crafts * [Ilities Selection](/areas/05-architecture/ilities) * [Accessibility Design](/areas/04-design-ux/accessibility) --- --- url: /areas/05-architecture/spike-management.md --- Architecture & Technical Design · master area # Spike Management > *Time-boxed investigations before the Epic can be named. Short — 2–5 days — with a written conclusion.* **Owner:** Tech Lead **Volume addressing it:** III ## Where it lives * [Volume II · Technical Design Brief](/volumes/ii-discovery/8-technical-design-brief) ## Related crafts * [Technical Design Brief](/areas/03-product-definition/technical-design-brief) --- --- url: /areas/05-architecture/integration-design.md --- Architecture & Technical Design · master area # Integration Design > *Third-party failure modes, retry, fallback, sandbox. The translation layer at the boundary.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Bounded Context Mapping](/areas/05-architecture/bounded-context) * [Data Flow Design](/areas/05-architecture/data-flow) * [Integration Testing](/areas/07-quality-testing/integration-testing) --- --- url: /areas/05-architecture/migration-design.md --- Architecture & Technical Design · master area # Migration Design > *Backward-compatible, zero-downtime, rollback-planned. 
Schema changes that can be reversed without data loss.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback) ## Related crafts * [Schema Design](/areas/05-architecture/schema-design) * [Rollback Discipline](/areas/08-pipeline-operations/rollback) --- --- url: /areas/05-architecture/performance.md --- Architecture & Technical Design · master area # Performance Engineering > *Load profiles, bottleneck analysis, optimization. The performance ility from selection to verification.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Ilities Selection](/areas/05-architecture/ilities) * [Performance / Load Testing](/areas/07-quality-testing/performance-testing) --- --- url: /areas/05-architecture/security-architecture.md --- Architecture & Technical Design · master area # Security Architecture > *Threat modelling, auth design, data protection. Re-read at every Epic that touches authorisation or PII.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Security Testing (SAST)](/areas/07-quality-testing/security-testing) * [Secrets Management](/areas/08-pipeline-operations/secrets) --- --- url: /areas/06-development-code.md --- master areas · section 6 # Development & Code > *Writing the software.* The crafts that move scope into running code. Domain language survives the trip. Trunk-based flow keeps integration continuous. Conventional commits, feature flags, code review, and tests at the right layers compound into a codebase the team trusts. Primary owners: Developer, Tech Lead. Primary volume: [IV — Execution](/volumes/iv-execution/). 
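Two of the crafts below fit in one small sketch: a feature flag wrapping new behavior so rollback is one switch, with both paths kept alive. Everything named here — the flag key, the client interface, the render functions — is hypothetical, not a particular flag platform's SDK.

```ts
// Hypothetical flag client — an assumption, not a specific platform's API.
interface FlagClient {
  isEnabled(flag: string): boolean;
}

// Both paths stay testable: flag-on and flag-off are each exercised in tests.
// Landed on trunk behind the flag, with a commit in the conventional format —
// type(scope): description — STORY-ref.
function renderSummary(flags: FlagClient): string {
  if (flags.isEnabled("checkout-new-summary")) {
    return renderNewSummary(); // new behavior, dark until the flag is enabled
  }
  return renderLegacySummary(); // rollback is one switch: turn the flag off
}

function renderNewSummary(): string { return "new summary"; }
function renderLegacySummary(): string { return "legacy summary"; }
```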
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Domain Language in Code](/areas/06-development-code/domain-language) | Developer | IV | | [Trunk-Based Development](/areas/06-development-code/trunk-based) | Developer | IV | | [Conventional Commits](/areas/06-development-code/conventional-commits) | Developer | IV | | [Feature Flag Implementation](/areas/06-development-code/feature-flags) | Developer | IV | | [Component Development](/areas/06-development-code/component-development) | Developer | IV | | [API Implementation](/areas/06-development-code/api-implementation) | Developer | IV | | [Schema Migration](/areas/06-development-code/schema-migration) | Developer | IV | | [Unit Testing](/areas/06-development-code/unit-testing) | Developer | IV | | [Contract Testing](/areas/06-development-code/contract-testing) | Developer | IV | | [Visual Regression Testing](/areas/06-development-code/visual-regression) | Developer, QA | IV | | [Code Review](/areas/06-development-code/code-review) | Developer, Tech Lead | IV | | [Developer Experience (DX)](/areas/06-development-code/dx) | Tech Lead, Developer | IV | | [Pair Programming / Mobbing](/areas/06-development-code/pairing-mobbing) gap | Developer | — | | [Refactoring](/areas/06-development-code/refactoring) | Developer | IV | | [Technical Debt Management](/areas/06-development-code/tech-debt) | Tech Lead | V | --- --- url: /areas/06-development-code/domain-language.md --- Development & Code · master area # Domain Language in Code > *Functions, variables, modules named from the brief. The named person's vocabulary appears in code, API, schema, events.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Domain Language in Code](/volumes/iv-execution/1-domain-language) ## Related crafts * [Domain Immersion](/areas/02-discovery-research/domain-immersion) * [Domain Modelling](/areas/05-architecture/domain-modelling) --- --- url: /areas/06-development-code/trunk-based.md --- Development & Code · master area # Trunk-Based Development > *Short-lived branches, continuous integration into main. Branches over five days are postmortem candidates.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Trunk-Based Development](/volumes/iv-execution/2-trunk-based) ## Related crafts * [Feature Flag Implementation](/areas/06-development-code/feature-flags) * [Code Review](/areas/06-development-code/code-review) --- --- url: /areas/06-development-code/conventional-commits.md --- Development & Code · master area # Conventional Commits > *type(scope): description — STORY-ref. Auto-feeds the changelog.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Trunk-Based Development](/volumes/iv-execution/2-trunk-based) ## Related crafts * [Changelog Generation](/areas/09-release-communication/changelog) --- --- url: /areas/06-development-code/feature-flags.md --- Development & Code · master area # Feature Flag Implementation > *Wrapping new behavior so rollback is one switch. 
Both flag-on and flag-off paths are tested.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Feature Flags](/volumes/iv-execution/3-feature-flags) ## Related crafts * [Feature Flag Platform](/areas/08-pipeline-operations/feature-flag-platform) * [Flag Lifecycle](/areas/08-pipeline-operations/flag-lifecycle) --- --- url: /areas/06-development-code/component-development.md --- Development & Code · master area # Component Development > *Building from the design system, Storybook-first. Visual regression baseline lands at component creation.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Storybook](/areas/04-design-ux/storybook) * [Design System](/areas/04-design-ux/design-system) --- --- url: /areas/06-development-code/api-implementation.md --- Development & Code · master area # API Implementation > *Matching the contract from Volume III shaping. Drift between contract and implementation is the most common source of broke-the-API tickets.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume III · Sequence, Schema, API](/volumes/iii-scope/7-sequence-schema-api) ## Related crafts * [API Contract Design](/areas/05-architecture/api-contract) * [Contract Testing](/areas/06-development-code/contract-testing) --- --- url: /areas/06-development-code/schema-migration.md --- Development & Code · master area # Schema Migration > *Backward-compatible, staged, tested in staging. Reversible by default.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback) ## Related crafts * [Migration Design](/areas/05-architecture/migration-design) * [Rollback Discipline](/areas/08-pipeline-operations/rollback) --- --- url: /areas/06-development-code/unit-testing.md --- Development & Code · master area # Unit Testing > *Smallest unit of test. Single function or module. Runs in milliseconds. Logic written down twice.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Amigos Session](/areas/07-quality-testing/amigos) --- --- url: /areas/06-development-code/contract-testing.md --- Development & Code · master area # Contract Testing > *Boundary tests — given this caller sends X, the service returns Y. Derived from the API contract.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [API Contract Design](/areas/05-architecture/api-contract) * [Integration Testing](/areas/07-quality-testing/integration-testing) --- --- url: /areas/06-development-code/visual-regression.md --- Development & Code · master area # Visual Regression Testing > *Figma named state vs rendered output. Failures surface as image diffs the Designer reviews.* **Owners:** Developer, QA **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Visual Regression Baseline](/areas/04-design-ux/visual-regression) * [Storybook](/areas/04-design-ux/storybook) --- --- url: /areas/06-development-code/code-review.md --- Development & Code · master area # Code Review > *Domain naming, ADR compliance, scenario coverage. 
The reviewer reads the brief before reading the diff.* **Owners:** Developer, Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Trunk-Based Development](/volumes/iv-execution/2-trunk-based) ## Related crafts * [Domain Language in Code](/areas/06-development-code/domain-language) --- --- url: /areas/06-development-code/dx.md --- Development & Code · master area # Developer Experience (DX) > *Local setup, hot reload, type safety, linting. The cost of friction in the inner loop compounds across cycles.* **Owners:** Tech Lead, Developer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Environment Management](/areas/08-pipeline-operations/environments) --- --- url: /areas/06-development-code/pairing-mobbing.md --- Development & Code · master area # Pair Programming / Mobbing > *Knowledge transfer, complex problem solving. Currently a gap in the corpus — practice exists in some teams but is not yet addressed in any volume.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owner:** Developer **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Knowledge Retention](/areas/12-team-organizational/knowledge-retention) --- --- url: /areas/06-development-code/refactoring.md --- Development & Code · master area # Refactoring > *Improving code without changing behavior. Often in service of a brief — refactor **with** the cycle, not separately.* **Owner:** Developer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Technical Debt Management](/areas/06-development-code/tech-debt) --- --- url: /areas/06-development-code/tech-debt.md --- Development & Code · master area # Technical Debt Management > *The gap between what the chain produced and what it should have. Tracked by chain level. Read at portfolio review.* **Owner:** Tech Lead **Volume addressing it:** V ## Where it lives * [Volume V · The Portfolio](/volumes/v-after-we-build/9-portfolio) ## Related crafts * [Financial Translation](/areas/01-strategy-direction/financial-translation) * [System Signal Tracking (DORA)](/areas/10-post-release-learning/dora-signals) --- --- url: /areas/07-quality-testing.md --- master areas · section 7 # Quality & Testing > *Verifying the chain holds.* The crafts that prove what was predicted. Amigos sessions produce shared meaning before code; Gherkin scenarios become the test specifications; pre-merge QA checks the branch; exploratory testing finds what nobody named. Primary owner: QA. Primary volumes: [III — Scope & Shape](/volumes/iii-scope/) and [IV — Execution](/volumes/iv-execution/).
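The section's through-line — amigos session to Gherkin scenario to test — can be sketched in a few lines. The scenario text and the form under test are invented for illustration; only the shape (Given the named person's situation / When / Then, negative case included) is the corpus's.

```ts
// Hypothetical scenario-to-test mapping; the form and its API are invented.
import assert from "node:assert";

// Given Dina filed a report this morning / When she opens the entry form /
// Then the form is pre-filled from that report.
function scenarioPrefilled() {
  const form = openEntryForm({ existingReport: "this morning's report" });
  assert.equal(form.prefilledFrom, "this morning's report");
}

// Negative case: Given Dina has filed nothing today / Then the form is empty.
function scenarioEmpty() {
  const form = openEntryForm({ existingReport: null });
  assert.equal(form.prefilledFrom, null);
}

// Minimal stand-in so the sketch is self-contained.
function openEntryForm(ctx: { existingReport: string | null }) {
  return { prefilledFrom: ctx.existingReport };
}

scenarioPrefilled();
scenarioEmpty();
```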
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Amigos Session](/areas/07-quality-testing/amigos) | PO, Developer, QA | III | | [Gherkin Scenario Writing](/areas/07-quality-testing/gherkin) | QA | III | | [Pre-Merge QA Verification](/areas/07-quality-testing/pre-merge-qa) | QA | IV | | [Exploratory Testing](/areas/07-quality-testing/exploratory-testing) | QA | IV | | [QA Report](/areas/07-quality-testing/qa-report) | QA | IV | | [Accessibility Testing](/areas/07-quality-testing/accessibility-testing) | QA, Designer | IV | | [Integration Testing](/areas/07-quality-testing/integration-testing) | QA, Developer | IV | | [Performance / Load Testing](/areas/07-quality-testing/performance-testing) | QA, Tech Lead | IV | | [Security Testing (SAST)](/areas/07-quality-testing/security-testing) | Pipeline, Tech Lead | IV | | [Regression Testing](/areas/07-quality-testing/regression-testing) | QA | IV | | [Test Suite Maintenance](/areas/07-quality-testing/test-suite-maintenance) | QA, Developer | IV | --- --- url: /areas/07-quality-testing/amigos.md --- Quality & Testing · master area # Amigos Session > *Trio + 45 minutes + one story → named Gherkin scenarios. The smallest unit of shared meaning between phases.* **Owners:** PO, Developer, QA **Volume addressing it:** III ## Where it lives * [Volume III · Amigos & Gherkin](/volumes/iii-scope/5-amigos-gherkin) ## Related crafts * [Gherkin Scenario Writing](/areas/07-quality-testing/gherkin) * [Story Writing](/areas/03-product-definition/story-writing) --- --- url: /areas/07-quality-testing/gherkin.md --- Quality & Testing · master area # Gherkin Scenario Writing > *Given (the named person's situation) / When / Then. Negative cases included. Lives next to the story.* **Owner:** QA **Volume addressing it:** III ## Where it lives * [Volume III · Amigos & Gherkin](/volumes/iii-scope/5-amigos-gherkin) ## Related crafts * [Amigos Session](/areas/07-quality-testing/amigos) * [Story Writing](/areas/03-product-definition/story-writing) --- --- url: /areas/07-quality-testing/pre-merge-qa.md --- Quality & Testing · master area # Pre-Merge QA Verification > *Branch verification of Gherkin scenarios + exploratory testing + accessibility check, before PR can merge.* **Owner:** QA **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Exploratory Testing](/areas/07-quality-testing/exploratory-testing) * [QA Report](/areas/07-quality-testing/qa-report) --- --- url: /areas/07-quality-testing/exploratory-testing.md --- Quality & Testing · master area # Exploratory Testing > *Beyond scenarios — using the feature like the named person would. Looking for the moments nobody named.* **Owner:** QA **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Pre-Merge QA Verification](/areas/07-quality-testing/pre-merge-qa) * [QA Report](/areas/07-quality-testing/qa-report) --- --- url: /areas/07-quality-testing/qa-report.md --- Quality & Testing · master area # QA Report > *What was tested, what was explored, what was not, what surprised. 
Filed against the PR; read at signal reading and postmortem.* **Owner:** QA **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Pre-Merge QA Verification](/areas/07-quality-testing/pre-merge-qa) * [Bug Filing & Triage](/areas/10-post-release-learning/bug-triage) --- --- url: /areas/07-quality-testing/accessibility-testing.md --- Quality & Testing · master area # Accessibility Testing > *Keyboard, screen reader, contrast, focus states. Automated and manual. Failures block merge unless explicitly accepted with a remediation date.* **Owners:** QA, Designer **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Accessibility Design](/areas/04-design-ux/accessibility) --- --- url: /areas/07-quality-testing/integration-testing.md --- Quality & Testing · master area # Integration Testing > *Wires several modules together with real dependencies. Catches the wiring mistakes no unit test sees.* **Owners:** QA, Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Contract Testing](/areas/06-development-code/contract-testing) * [Integration Design](/areas/05-architecture/integration-design) --- --- url: /areas/07-quality-testing/performance-testing.md --- Quality & Testing · master area # Performance / Load Testing > *SLO verification in staging, shaped against expected load profile. Prediction-checked like everything else.* **Owners:** QA, Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Testing Layers](/volumes/iv-execution/5-testing) ## Related crafts * [Performance Engineering](/areas/05-architecture/performance) * [Ilities Selection](/areas/05-architecture/ilities) --- --- url: /areas/07-quality-testing/security-testing.md --- Quality & Testing · master area # Security Testing (SAST) > *Static analysis, dependency audit, secrets scan. Pipeline stage 5. High/critical findings block release.* **Owners:** Pipeline, Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [Secrets Management](/areas/08-pipeline-operations/secrets) * [Security Architecture](/areas/05-architecture/security-architecture) --- --- url: /areas/07-quality-testing/regression-testing.md --- Quality & Testing · master area # Regression Testing > *Full slice verification before release. The Gherkin set + the explicit critical paths.* **Owner:** QA **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Gherkin Scenario Writing](/areas/07-quality-testing/gherkin) * [Visual Regression Baseline](/areas/04-design-ux/visual-regression) --- --- url: /areas/07-quality-testing/test-suite-maintenance.md --- Quality & Testing · master area # Test Suite Maintenance > *Keeping tests current, removing flaky ones. Tests are part of the codebase — reviewed, maintained, pruned.* **Owners:** QA, Developer **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. 
## Related crafts * [Unit Testing](/areas/06-development-code/unit-testing) --- --- url: /areas/08-pipeline-operations.md --- master areas · section 8 # Pipeline & Operations > *The machinery that carries code to production.* Each pipeline stage catches a different chain level. Feature flags wrap behavior; runbooks make 3am acts mechanical; observability builds the legibility Volume V will read. This section's discipline is operational: written-before-needed beats invented-during-incident. Primary owners: Tech Lead, DevOps. Primary volume: [IV — Execution](/volumes/iv-execution/). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Environment Management](/areas/08-pipeline-operations/environments) | Tech Lead, DevOps | IV | | [CI/CD Pipeline](/areas/08-pipeline-operations/cicd) | Tech Lead, DevOps | IV | | [Pre-Commit Hooks](/areas/08-pipeline-operations/pre-commit-hooks) | Tech Lead | IV | | [Feature Flag Platform](/areas/08-pipeline-operations/feature-flag-platform) | Tech Lead, DevOps | IV | | [Flag Lifecycle](/areas/08-pipeline-operations/flag-lifecycle) | PO, Tech Lead | IV | | [Monitoring & Alerting](/areas/08-pipeline-operations/monitoring-alerting) | Tech Lead, DevOps | IV, V | | [Product Analytics](/areas/08-pipeline-operations/product-analytics) | PO, Developer | IV | | [Observability](/areas/08-pipeline-operations/observability) | Tech Lead, DevOps | IV | | [Runbooks](/areas/08-pipeline-operations/runbooks) | Tech Lead | IV | | [Rollback Discipline](/areas/08-pipeline-operations/rollback) | Tech Lead | IV | | [Infrastructure as Code](/areas/08-pipeline-operations/iac) gap | DevOps | — | | [Secrets Management](/areas/08-pipeline-operations/secrets) | DevOps, Security | IV | | [On-Call Rotation](/areas/08-pipeline-operations/on-call) | Tech Lead | IV, V | --- --- url: /areas/08-pipeline-operations/environments.md --- Pipeline & Operations · master area # Environment Management > *Dev / staging / production — purposes and rules. Test in staging unless the alternative is impossible.* **Owners:** Tech Lead, DevOps **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [Developer Experience (DX)](/areas/06-development-code/dx) --- --- url: /areas/08-pipeline-operations/cicd.md --- Pipeline & Operations · master area # CI/CD Pipeline > *Six stages, each catching a different chain level — a stage-0 pre-commit, then build → unit → integration → visual → security → deploy.* **Owners:** Tech Lead, DevOps **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [Pre-Commit Hooks](/areas/08-pipeline-operations/pre-commit-hooks) * [Code Review](/areas/06-development-code/code-review) --- --- url: /areas/08-pipeline-operations/pre-commit-hooks.md --- Pipeline & Operations · master area # Pre-Commit Hooks > *Format, lint, secrets scan before code enters repo. Stage 0 of the pipeline. Never bypass.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [CI/CD Pipeline](/areas/08-pipeline-operations/cicd) * [Secrets Management](/areas/08-pipeline-operations/secrets) --- --- url: /areas/08-pipeline-operations/feature-flag-platform.md --- Pipeline & Operations · master area # Feature Flag Platform > *Runtime toggles, targeting, audit, SDK.
Default-off when unreachable.* **Owners:** Tech Lead, DevOps **Volume addressing it:** IV ## Where it lives * [Volume IV · Feature Flags](/volumes/iv-execution/3-feature-flags) ## Related crafts * [Feature Flag Implementation](/areas/06-development-code/feature-flags) * [Flag Lifecycle](/areas/08-pipeline-operations/flag-lifecycle) --- --- url: /areas/08-pipeline-operations/flag-lifecycle.md --- Pipeline & Operations · master area # Flag Lifecycle > *Create → wire → test → enable → stabilise → clean up. Cleanup is the step most teams skip.* **Owners:** PO, Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Feature Flags](/volumes/iv-execution/3-feature-flags) ## Related crafts * [Feature Flag Implementation](/areas/06-development-code/feature-flags) * [Feature Flag Platform](/areas/08-pipeline-operations/feature-flag-platform) --- --- url: /areas/08-pipeline-operations/monitoring-alerting.md --- Pipeline & Operations · master area # Monitoring & Alerting > *Dashboards, SLO thresholds, alert routing. Each alert has a runbook link.* **Owners:** Tech Lead, DevOps **Volumes addressing it:** IV, V ## Where it lives * [Volume IV · Observability](/volumes/iv-execution/9-observability) * [Volume V · First 48 Hours](/volumes/v-after-we-build/1-first-48-hours) ## Related crafts * [Observability](/areas/08-pipeline-operations/observability) * [Runbooks](/areas/08-pipeline-operations/runbooks) --- --- url: /areas/08-pipeline-operations/product-analytics.md --- Pipeline & Operations · master area # Product Analytics > *Events named for the named person's actions. subject.verb format. Feeds Volume V signal reading.* **Owners:** PO, Developer **Volume addressing it:** IV ## Where it lives * [Volume IV · Observability](/volumes/iv-execution/9-observability) ## Related crafts * [Signal Reading](/areas/10-post-release-learning/signal-reading) * [Leading Signal Tracking](/areas/10-post-release-learning/leading-signals) --- --- url: /areas/08-pipeline-operations/observability.md --- Pipeline & Operations · master area # Observability > *Logs, traces, metrics, events. Built with the feature, not after. The system is legible by gate time.* **Owners:** Tech Lead, DevOps **Volume addressing it:** IV ## Where it lives * [Volume IV · Observability](/volumes/iv-execution/9-observability) ## Related crafts * [Monitoring & Alerting](/areas/08-pipeline-operations/monitoring-alerting) * [System Signal Tracking (DORA)](/areas/10-post-release-learning/dora-signals) --- --- url: /areas/08-pipeline-operations/runbooks.md --- Pipeline & Operations · master area # Runbooks > *Written procedures for known operational situations. Authored before the incident, rehearsed in staging.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback) ## Related crafts * [Rollback Discipline](/areas/08-pipeline-operations/rollback) * [Incident Management](/areas/10-post-release-learning/incident-management) --- --- url: /areas/08-pipeline-operations/rollback.md --- Pipeline & Operations · master area # Rollback Discipline > *Four levels — flag, deploy, migration, data. 
Plan for the level you might need before you ship.* **Owner:** Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback) ## Related crafts * [Runbooks](/areas/08-pipeline-operations/runbooks) * [Migration Design](/areas/05-architecture/migration-design) --- --- url: /areas/08-pipeline-operations/iac.md --- Pipeline & Operations · master area # Infrastructure as Code > *Environment provisioning, reproducibility. Currently a gap in the corpus — the practice exists but is not yet addressed in any volume.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owner:** DevOps **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Environment Management](/areas/08-pipeline-operations/environments) * [Secrets Management](/areas/08-pipeline-operations/secrets) --- --- url: /areas/08-pipeline-operations/secrets.md --- Pipeline & Operations · master area # Secrets Management > *Rotation, storage, access control. Never in logs, never in repos. Pipeline stage 5 catches leaks.* **Owners:** DevOps, Security **Volume addressing it:** IV ## Where it lives * [Volume IV · CI/CD Pipeline](/volumes/iv-execution/4-pipeline) ## Related crafts * [Security Testing (SAST)](/areas/07-quality-testing/security-testing) --- --- url: /areas/08-pipeline-operations/on-call.md --- Pipeline & Operations · master area # On-Call Rotation > *Scheduled, with coverage for 48 hours post-release. The release-gate condition.* **Owner:** Tech Lead **Volumes addressing it:** IV, V ## Where it lives * [Volume V · First 48 Hours](/volumes/v-after-we-build/1-first-48-hours) ## Related crafts * [First 48 Hours Monitoring](/areas/10-post-release-learning/first-48-hours) * [Incident Management](/areas/10-post-release-learning/incident-management) --- --- url: /areas/09-release-communication.md --- master areas · section 9 # Release & Communication > *Getting the feature to people safely.* The crafts that take the cycle from code-complete to live-and-watched. Release gate as state, not meeting; gradual rollout with named exit criteria; CS handoff before the customer sees the change. This section is where the cycle becomes visible to the people it is for. Primary owner: PO. Primary volume: [IV — Execution](/volumes/iv-execution/). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Release Gate Checklist](/areas/09-release-communication/release-gate) | PO | IV | | [Gradual Rollout](/areas/09-release-communication/gradual-rollout) | PO, Tech Lead | IV | | [Release Brief to Client](/areas/09-release-communication/release-brief) | PO | IV | | [CS Handoff](/areas/09-release-communication/cs-handoff) | PO, CS Lead | IV | | [Client Notes](/areas/09-release-communication/client-notes) | PO | IV | | [Prediction Naming to Client](/areas/09-release-communication/prediction-naming) | PO | IV | | [Status Page Management](/areas/09-release-communication/status-page) | Tech Lead | V | | [Changelog Generation](/areas/09-release-communication/changelog) | Pipeline | IV | --- --- url: /areas/09-release-communication/release-gate.md --- Release & Communication · master area # Release Gate Checklist > *Named conditions the chain must satisfy before the flag can be enabled. 
A state, not a meeting.* **Owner:** PO **Volume addressing it:** IV ## Where it lives * [Volume IV · Release Gate](/volumes/iv-execution/6-release-gate) ## Related crafts * [Gradual Rollout](/areas/09-release-communication/gradual-rollout) * [Runbooks](/areas/08-pipeline-operations/runbooks) --- --- url: /areas/09-release-communication/gradual-rollout.md --- Release & Communication · master area # Gradual Rollout > *Pilot → percentage ramp → full, with named exit criteria at each step.* **Owners:** PO, Tech Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Gradual Rollout](/volumes/iv-execution/7-gradual-rollout) ## Related crafts * [Feature Flag Platform](/areas/08-pipeline-operations/feature-flag-platform) * [Release Gate Checklist](/areas/09-release-communication/release-gate) --- --- url: /areas/09-release-communication/release-brief.md --- Release & Communication · master area # Release Brief to Client > *What changed, what to expect, what is not yet available, how to reach the team. Shared before enablement.* **Owner:** PO **Volume addressing it:** IV ## Where it lives * [Volume IV · Gradual Rollout](/volumes/iv-execution/7-gradual-rollout) ## Related crafts * [CS Handoff](/areas/09-release-communication/cs-handoff) * [Client Notes](/areas/09-release-communication/client-notes) --- --- url: /areas/09-release-communication/cs-handoff.md --- Release & Communication · master area # CS Handoff > *Happy path, limitations, likely questions, escalation path. CS reads it before customers do.* **Owners:** PO, CS Lead **Volume addressing it:** IV ## Where it lives * [Volume IV · Gradual Rollout](/volumes/iv-execution/7-gradual-rollout) ## Related crafts * [Support Levels (L1/L2/L3)](/areas/11-ongoing-operations-client/support-levels) * [Release Brief to Client](/areas/09-release-communication/release-brief) --- --- url: /areas/09-release-communication/client-notes.md --- Release & Communication · master area # Client Notes > *Decisions or changes from conversations, linked to Epic and brief. The relationship's memory.* **Owner:** PO **Volume addressing it:** IV ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Weekly Client Update](/areas/11-ongoing-operations-client/weekly-client-update) --- --- url: /areas/09-release-communication/prediction-naming.md --- Release & Communication · master area # Prediction Naming to Client > *We expect X. We will check on date Y. We will tell you the result. Predictions are made out loud.* **Owner:** PO **Volume addressing it:** IV ## Where it lives * [Volume II · Prediction Writing](/volumes/ii-discovery/9-prediction-writing) ## Related crafts * [Prediction Writing](/areas/03-product-definition/prediction-writing) * [Weekly Client Update](/areas/11-ongoing-operations-client/weekly-client-update) --- --- url: /areas/09-release-communication/status-page.md --- Release & Communication · master area # Status Page Management > *Automatic on P0; manual updates during incidents. 
Accurate and current builds trust; stale destroys it.* **Owner:** Tech Lead **Volume addressing it:** V ## Where it lives * [Volume IV · Runbooks & Rollback](/volumes/iv-execution/8-runbooks-rollback) ## Related crafts * [Incident Management](/areas/10-post-release-learning/incident-management) * [Escalation](/areas/10-post-release-learning/escalation) --- --- url: /areas/09-release-communication/changelog.md --- Release & Communication · master area # Changelog Generation > *Auto-built from conventional commits. Reviewed before the release brief.* **Owner:** Pipeline **Volume addressing it:** IV ## Where it lives * [Volume IV · Trunk-Based Development](/volumes/iv-execution/2-trunk-based) ## Related crafts * [Conventional Commits](/areas/06-development-code/conventional-commits) --- --- url: /areas/10-post-release-learning.md --- master areas · section 10 # Post-Release & Learning > *Finding out what actually happened.* The largest section in the corpus, because the chain has the most to learn here. The first 48 hours are watched; the prediction is checked; bugs are classified by chain level; incidents are contained then diagnosed; the model is updated so the learning survives the conversation. Primary owners: PO, Tech Lead. Primary volume: [V — After We Build](/volumes/v-after-we-build/). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [First 48 Hours Monitoring](/areas/10-post-release-learning/first-48-hours) | On-call, PO | V | | [Signal Reading](/areas/10-post-release-learning/signal-reading) | PO | V | | [Leading Signal Tracking](/areas/10-post-release-learning/leading-signals) | PO, Data | IV, V | | [Lagging Signal Tracking](/areas/10-post-release-learning/lagging-signals) | PO, Data | V | | [System Signal Tracking (DORA)](/areas/10-post-release-learning/dora-signals) | Tech Lead | IV, V | | [Bug Filing & Triage](/areas/10-post-release-learning/bug-triage) | QA, Team | V | | [Root Cause Analysis (5 Whys)](/areas/10-post-release-learning/rca-5-whys) | Tech Lead, PO | V | | [Incident Management](/areas/10-post-release-learning/incident-management) | Tech Lead, On-call | V | | [Escalation](/areas/10-post-release-learning/escalation) | On-call, Leadership | V | | [De-escalation](/areas/10-post-release-learning/de-escalation) | On-call, Tech Lead | V | | [Postmortem](/areas/10-post-release-learning/postmortem) | Tech Lead, PO | V | | [Retrospective](/areas/10-post-release-learning/retrospective) | PO, Trio | V | | [Model Update](/areas/10-post-release-learning/model-update) | PO | V | --- --- url: /areas/10-post-release-learning/first-48-hours.md --- Post-Release & Learning · master area # First 48 Hours Monitoring > *SLO watch, analytics verification, support channel. The discipline of not reacting incorrectly.* **Owners:** On-call, PO **Volume addressing it:** V ## Where it lives * [Volume V · First 48 Hours](/volumes/v-after-we-build/1-first-48-hours) ## Related crafts * [On-Call Rotation](/areas/08-pipeline-operations/on-call) * [Monitoring & Alerting](/areas/08-pipeline-operations/monitoring-alerting) --- --- url: /areas/10-post-release-learning/signal-reading.md --- Post-Release & Learning · master area # Signal Reading > *The cycle's check — prediction vs measured reality. 
Witnessed, not surveyed.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction) ## Related crafts * [Prediction Writing](/areas/03-product-definition/prediction-writing) * [Model Update](/areas/10-post-release-learning/model-update) --- --- url: /areas/10-post-release-learning/leading-signals.md --- Post-Release & Learning · master area # Leading Signal Tracking > *Adoption, completion, error encounter rate. The early indicators that arrive in days.* **Owners:** PO, Data **Volumes addressing it:** IV, V ## Where it lives * [Volume IV · Observability](/volumes/iv-execution/9-observability) ## Related crafts * [Product Analytics](/areas/08-pipeline-operations/product-analytics) * [Lagging Signal Tracking](/areas/10-post-release-learning/lagging-signals) --- --- url: /areas/10-post-release-learning/lagging-signals.md --- Post-Release & Learning · master area # Lagging Signal Tracking > *Prediction metric, support volume, retention. The verdicts that arrive in weeks.* **Owners:** PO, Data **Volume addressing it:** V ## Where it lives * [Volume V · Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction) ## Related crafts * [Leading Signal Tracking](/areas/10-post-release-learning/leading-signals) * [Helpdesk Metrics](/areas/11-ongoing-operations-client/helpdesk-metrics) --- --- url: /areas/10-post-release-learning/dora-signals.md --- Post-Release & Learning · master area # System Signal Tracking (DORA) > *Lead time, change failure rate, deploy frequency, MTTR. Tracked at portfolio level, not story level.* **Owner:** Tech Lead **Volumes addressing it:** IV, V ## Where it lives * [Volume V · The Portfolio](/volumes/v-after-we-build/9-portfolio) ## Related crafts * [Portfolio Management](/areas/01-strategy-direction/portfolio-management) * [Technical Debt Management](/areas/06-development-code/tech-debt) --- --- url: /areas/10-post-release-learning/bug-triage.md --- Post-Release & Learning · master area # Bug Filing & Triage > *Six-dimension taxonomy including chain level. Daily for the first week post-release; twice-weekly after.* **Owners:** QA, Team **Volume addressing it:** V ## Where it lives * [Volume V · Bugs and Their Roots](/volumes/v-after-we-build/3-bugs-and-their-roots) ## Related crafts * [Root Cause Analysis (5 Whys)](/areas/10-post-release-learning/rca-5-whys) * [Support-to-Bug Pipeline](/areas/11-ongoing-operations-client/support-to-bug) --- --- url: /areas/10-post-release-learning/rca-5-whys.md --- Post-Release & Learning · master area # Root Cause Analysis (5 Whys) > *Why-chain anchored to chain levels, not individuals. The question is "why did the chain not catch this before this person?"* **Owners:** Tech Lead, PO **Volume addressing it:** V ## Where it lives * [Volume V · Incidents and Postmortems](/volumes/v-after-we-build/4-incidents-postmortems) ## Related crafts * [Postmortem](/areas/10-post-release-learning/postmortem) * [Bug Filing & Triage](/areas/10-post-release-learning/bug-triage) --- --- url: /areas/10-post-release-learning/incident-management.md --- Post-Release & Learning · master area # Incident Management > *Detect → contain → communicate → resolve.
Three roles even on one-person on-call.* **Owners:** Tech Lead, On-call **Volume addressing it:** V ## Where it lives * [Volume V · Incidents and Postmortems](/volumes/v-after-we-build/4-incidents-postmortems) ## Related crafts * [Escalation](/areas/10-post-release-learning/escalation) * [Runbooks](/areas/08-pipeline-operations/runbooks) * [Rollback Discipline](/areas/08-pipeline-operations/rollback) --- --- url: /areas/10-post-release-learning/escalation.md --- Post-Release & Learning · master area # Escalation > *Information flow by priority level. No-surprises rule — leadership hears from the commander first.* **Owners:** On-call, Leadership **Volume addressing it:** V ## Where it lives * [Volume V · Incidents and Postmortems](/volumes/v-after-we-build/4-incidents-postmortems) ## Related crafts * [Incident Management](/areas/10-post-release-learning/incident-management) * [De-escalation](/areas/10-post-release-learning/de-escalation) --- --- url: /areas/10-post-release-learning/de-escalation.md --- Post-Release & Learning · master area # De-escalation > *Stand down, archive, check on the people who took the page. Deliberate, not implicit.* **Owners:** On-call, Tech Lead **Volume addressing it:** V ## Where it lives * [Volume V · Incidents and Postmortems](/volumes/v-after-we-build/4-incidents-postmortems) ## Related crafts * [Incident Management](/areas/10-post-release-learning/incident-management) --- --- url: /areas/10-post-release-learning/postmortem.md --- Post-Release & Learning · master area # Postmortem > *Which chain level was missing? Structural fix, owned, dated. Within five working days.* **Owners:** Tech Lead, PO **Volume addressing it:** V ## Where it lives * [Volume V · Incidents and Postmortems](/volumes/v-after-we-build/4-incidents-postmortems) ## Related crafts * [Root Cause Analysis (5 Whys)](/areas/10-post-release-learning/rca-5-whys) * [Model Update](/areas/10-post-release-learning/model-update) --- --- url: /areas/10-post-release-learning/retrospective.md --- Post-Release & Learning · master area # Retrospective > *Three questions, one change, owned, testable. Compounding rather than listing.* **Owners:** PO, Trio **Volume addressing it:** V ## Where it lives * [Volume V · The Retrospective](/volumes/v-after-we-build/5-retrospective) ## Related crafts * [Model Update](/areas/10-post-release-learning/model-update) * [Practice Sequencing](/areas/13-adoption-evolution/practice-sequencing) --- --- url: /areas/10-post-release-learning/model-update.md --- Post-Release & Learning · master area # Model Update > *Close assumptions, add new ones, append signal reading, sharpen open questions, update templates. The step most teams skip.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · The Model Update](/volumes/v-after-we-build/6-model-update) ## Related crafts * [Assumption Surfacing](/areas/02-discovery-research/assumption-surfacing) * [Feature Brief](/areas/03-product-definition/feature-brief) --- --- url: /areas/11-ongoing-operations-client.md --- master areas · section 11 # Ongoing Operations & Client > *The continuous relationship.* The crafts that hold between cycles. Three-level support, the SLA as operational contract, helpdesk metrics, weekly written updates, bi-weekly syncs, quarterly portfolio reviews. The cadence is not administrative — it is what keeps the relationship legible. Primary owners: PO, CS Lead. Primary volume: [V — After We Build](/volumes/v-after-we-build/) (Part 7). 
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Support Levels (L1/L2/L3)](/areas/11-ongoing-operations-client/support-levels) | CS, QA, Dev | V | | [Support-to-Bug Pipeline](/areas/11-ongoing-operations-client/support-to-bug) | CS, PO | V | | [SLA Definition](/areas/11-ongoing-operations-client/sla-definition) | PO, Tech Lead | V | | [SLA Monitoring & Breach Protocol](/areas/11-ongoing-operations-client/sla-breach) | Tech Lead, PO | V | | [SLA Review Cadence](/areas/11-ongoing-operations-client/sla-review) | PO, Client | V | | [Helpdesk Metrics](/areas/11-ongoing-operations-client/helpdesk-metrics) | CS Lead | V | | [Weekly Client Update](/areas/11-ongoing-operations-client/weekly-client-update) | PO | V | | [Bi-weekly Client Sync](/areas/11-ongoing-operations-client/bi-weekly-sync) | PO, Client | V | | [Quarterly Portfolio Review](/areas/11-ongoing-operations-client/quarterly-portfolio) | PO, Leadership, Client | V | --- --- url: /areas/11-ongoing-operations-client/support-levels.md --- Ongoing Operations & Client · master area # Support Levels (L1/L2/L3) > *Who responds, what they resolve, when they escalate. CS does not page L3 directly.* **Owners:** CS, QA, Developer **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [Support-to-Bug Pipeline](/areas/11-ongoing-operations-client/support-to-bug) * [Escalation](/areas/10-post-release-learning/escalation) --- --- url: /areas/11-ongoing-operations-client/support-to-bug.md --- Ongoing Operations & Client · master area # Support-to-Bug Pipeline > *CS files patterns weekly. PO classifies by chain level. A support volume that produces no bugs is over-staffed or under-listened-to.* **Owners:** CS, PO **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [Bug Filing & Triage](/areas/10-post-release-learning/bug-triage) * [Support Levels (L1/L2/L3)](/areas/11-ongoing-operations-client/support-levels) --- --- url: /areas/11-ongoing-operations-client/sla-definition.md --- Ongoing Operations & Client · master area # SLA Definition > *Availability, response time, resolution time, data integrity. Threshold past which someone is paged.* **Owners:** PO, Tech Lead **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [SLA Monitoring & Breach Protocol](/areas/11-ongoing-operations-client/sla-breach) * [SLA Review Cadence](/areas/11-ongoing-operations-client/sla-review) --- --- url: /areas/11-ongoing-operations-client/sla-breach.md --- Ongoing Operations & Client · master area # SLA Monitoring & Breach Protocol > *Early warning, contain, communicate, resolve. Always followed by a postmortem.* **Owners:** Tech Lead, PO **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [SLA Definition](/areas/11-ongoing-operations-client/sla-definition) * [Postmortem](/areas/10-post-release-learning/postmortem) --- --- url: /areas/11-ongoing-operations-client/sla-review.md --- Ongoing Operations & Client · master area # SLA Review Cadence > *Quarterly with the client. What was met, what was approached, whether the categories are still right.
Reviewed quarterly stays a contract; never reviewed becomes a souvenir.* **Owners:** PO, Client **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [SLA Definition](/areas/11-ongoing-operations-client/sla-definition) * [Quarterly Portfolio Review](/areas/11-ongoing-operations-client/quarterly-portfolio) --- --- url: /areas/11-ongoing-operations-client/helpdesk-metrics.md --- Ongoing Operations & Client · master area # Helpdesk Metrics > *FRT, resolution time, categories, escalation rate, satisfaction. Read at bi-weekly sync.* **Owner:** CS Lead **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [Support Levels (L1/L2/L3)](/areas/11-ongoing-operations-client/support-levels) * [Bi-weekly Client Sync](/areas/11-ongoing-operations-client/bi-weekly-sync) --- --- url: /areas/11-ongoing-operations-client/weekly-client-update.md --- Ongoing Operations & Client · master area # Weekly Client Update > *Friday before 4pm. ~200 words. Three sections — shipped, in progress, blocked. Held even when nothing changed.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [Client Relationship Strategy](/areas/01-strategy-direction/client-relationship) * [Bi-weekly Client Sync](/areas/11-ongoing-operations-client/bi-weekly-sync) --- --- url: /areas/11-ongoing-operations-client/bi-weekly-sync.md --- Ongoing Operations & Client · master area # Bi-weekly Client Sync > *45 minutes. Fixed agenda — signal readings, scope decisions, CS patterns. The client speaks last.* **Owners:** PO, Client **Volume addressing it:** V ## Where it lives * [Volume V · The Ongoing Relationship](/volumes/v-after-we-build/7-ongoing-relationship) ## Related crafts * [Weekly Client Update](/areas/11-ongoing-operations-client/weekly-client-update) * [Helpdesk Metrics](/areas/11-ongoing-operations-client/helpdesk-metrics) --- --- url: /areas/11-ongoing-operations-client/quarterly-portfolio.md --- Ongoing Operations & Client · master area # Quarterly Portfolio Review > *SLA performance, VRI trends, root-cause patterns. Produces fund / continue / kill decisions.* **Owners:** PO, Leadership, Client **Volume addressing it:** V ## Where it lives * [Volume V · The Portfolio](/volumes/v-after-we-build/9-portfolio) ## Related crafts * [Portfolio Management](/areas/01-strategy-direction/portfolio-management) * [SLA Review Cadence](/areas/11-ongoing-operations-client/sla-review) --- --- url: /areas/12-team-organizational.md --- master areas · section 12 # Team & Organizational > *The human infrastructure.* The chain works because people work it. Onboarding is a cycle, not a week. T-shaped people make small teams possible. Psychological safety is structural — measured by whether the chain can hear what it needs to hear. Knowledge retention lives in artifacts, not heads. Primary owners: Leadership, PO. Primary volume: [V — After We Build](/volumes/v-after-we-build/) (Part 8). 
## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Onboarding to the Chain](/areas/12-team-organizational/onboarding) | PO, Team | V | | [T-Shaped Development](/areas/12-team-organizational/t-shaped) | Individual, Team | V | | [Psychological Safety](/areas/12-team-organizational/psychological-safety) | Leadership, PO | V | | [Small Team Adaptation](/areas/12-team-organizational/small-teams) | PO | V | | [Knowledge Retention](/areas/12-team-organizational/knowledge-retention) | Whole team | V | | [Team Capacity Planning](/areas/12-team-organizational/capacity-planning) gap | Leadership, PO | — | | [Hiring for Chain Fit](/areas/12-team-organizational/hiring) gap | Leadership | — | | [Cross-Team Coordination](/areas/12-team-organizational/cross-team) gap | Tech Lead, PO | — | --- --- url: /areas/12-team-organizational/onboarding.md --- Team & Organizational · master area # Onboarding to the Chain > *Shadow one cycle. First story with support. Six weeks to read chain artifacts and know what is missing.* **Owners:** PO, Team **Volume addressing it:** V ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [Knowledge Retention](/areas/12-team-organizational/knowledge-retention) --- --- url: /areas/12-team-organizational/t-shaped.md --- Team & Organizational · master area # T-Shaped Development > *Deep in one craft, working in two or three adjacent. The shape that makes small teams possible.* **Owners:** Individual, Team **Volume addressing it:** V ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [Small Team Adaptation](/areas/12-team-organizational/small-teams) * [Hiring for Chain Fit](/areas/12-team-organizational/hiring) --- --- url: /areas/12-team-organizational/psychological-safety.md --- Team & Organizational · master area # Psychological Safety > *Structural. Defects trace to chain levels, not people. Silence treated as system signal, not agreement.* **Owners:** Leadership, PO **Volume addressing it:** V ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [Postmortem](/areas/10-post-release-learning/postmortem) * [Retrospective](/areas/10-post-release-learning/retrospective) --- --- url: /areas/12-team-organizational/small-teams.md --- Team & Organizational · master area # Small Team Adaptation > *Roles combine, not eliminate. Conscious hat-switching. Incident commander, communicator, investigator never combine.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [T-Shaped Development](/areas/12-team-organizational/t-shaped) * [Incident Management](/areas/10-post-release-learning/incident-management) --- --- url: /areas/12-team-organizational/knowledge-retention.md --- Team & Organizational · master area # Knowledge Retention > *Chain artifacts as institutional memory. Briefs, ADRs, runbooks, model file. Not Slack, not email.* **Owner:** Whole team **Volume addressing it:** V ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [Onboarding to the Chain](/areas/12-team-organizational/onboarding) * [Architecture Decision Records](/areas/05-architecture/adr) --- --- url: /areas/12-team-organizational/capacity-planning.md --- Team & Organizational · master area # Team Capacity Planning > *Sustainable pace, on-call recovery, no burnout. 
Currently a gap in the corpus — practice exists in some teams but is not yet addressed.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** Leadership, PO **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [On-Call Rotation](/areas/08-pipeline-operations/on-call) --- --- url: /areas/12-team-organizational/hiring.md --- Team & Organizational · master area # Hiring for Chain Fit > *What to look for in new team members — names a person, names a wrong prediction, traces defects to levels. Currently a gap.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owner:** Leadership **Volume addressing it:** — ## Where it lives * [Volume V · The Team](/volumes/v-after-we-build/8-team) ## Related crafts * [T-Shaped Development](/areas/12-team-organizational/t-shaped) --- --- url: /areas/12-team-organizational/cross-team.md --- Team & Organizational · master area # Cross-Team Coordination > *Dependencies, shared services, platform teams. Currently a gap — coordination patterns exist but are not yet addressed.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** Tech Lead, PO **Volume addressing it:** — ## Where it lives * This area is currently a **gap** in the corpus. It is named here for visibility; future cycles will produce a volume that addresses it. ## Related crafts * [Bounded Context Mapping](/areas/05-architecture/bounded-context) --- --- url: /areas/13-adoption-evolution.md --- master areas · section 13 # Adoption & Evolution > *Getting started and getting better.* Where the chain itself becomes the subject. Minimum viable chain, practice sequencing, resistance handling, maturity assessment, artifact lifecycle, and chain evolution — the meta-crafts the corpus uses on itself. Primary owners: PO, Leadership. Primary volume: [V — After We Build](/volumes/v-after-we-build/) (Part 10). ## Crafts in this section | Craft | Owner | Volume | |---|---|---| | [Minimum Viable Chain](/areas/13-adoption-evolution/mvc) | PO | V | | [Practice Sequencing](/areas/13-adoption-evolution/practice-sequencing) | PO | V | | [Resistance Handling](/areas/13-adoption-evolution/resistance) | PO, Leadership | V | | [Chain Maturity Assessment](/areas/13-adoption-evolution/maturity) | PO, Leadership | V | | [Artifact Lifecycle](/areas/13-adoption-evolution/artifact-lifecycle) gap | PO, Tech Lead | — | | [Chain Evolution](/areas/13-adoption-evolution/chain-evolution) gap | PO, Leadership | — | --- --- url: /areas/13-adoption-evolution/mvc.md --- Adoption & Evolution · master area # Minimum Viable Chain > *Twenty minutes. Two practices. One cycle. Prediction with a check date, then run the check. 
The chain's spine.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · Adoption](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Prediction Writing](/areas/03-product-definition/prediction-writing) * [Practice Sequencing](/areas/13-adoption-evolution/practice-sequencing) --- --- url: /areas/13-adoption-evolution/practice-sequencing.md --- Adoption & Evolution · master area # Practice Sequencing > *Prediction → brief → amigos → retro → model update → ADR → discovery → postmortem → portfolio → full corpus.* **Owner:** PO **Volume addressing it:** V ## Where it lives * [Volume V · Adoption](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Minimum Viable Chain](/areas/13-adoption-evolution/mvc) * [Chain Maturity Assessment](/areas/13-adoption-evolution/maturity) --- --- url: /areas/13-adoption-evolution/resistance.md --- Adoption & Evolution · master area # Resistance Handling > *Real concerns underneath, real answers. Resistance is signal about where conditions don't yet support the practice.* **Owners:** PO, Leadership **Volume addressing it:** V ## Where it lives * [Volume V · Adoption](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Chain Maturity Assessment](/areas/13-adoption-evolution/maturity) --- --- url: /areas/13-adoption-evolution/maturity.md --- Adoption & Evolution · master area # Chain Maturity Assessment > *Where the team is, where to invest next. A mature chain is quiet, not busy.* **Owners:** PO, Leadership **Volume addressing it:** V ## Where it lives * [Volume V · Adoption](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Practice Sequencing](/areas/13-adoption-evolution/practice-sequencing) --- --- url: /areas/13-adoption-evolution/artifact-lifecycle.md --- Adoption & Evolution · master area # Artifact Lifecycle > *Archiving deprecated docs, lightweight track for micro-changes, last\_reviewed dates. Currently a gap that the corpus is now beginning to address.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** PO, Tech Lead **Volume addressing it:** — ## Where it lives * [Volume V · Adoption (artifact lifecycle)](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Knowledge Retention](/areas/12-team-organizational/knowledge-retention) --- --- url: /areas/13-adoption-evolution/chain-evolution.md --- Adoption & Evolution · master area # Chain Evolution > *When and how to change the chain itself — through the chain, on the chain. Currently a gap that the corpus is beginning to address.* ::: warning This is a gap area This craft exists in the chain but is not yet addressed in any volume. Filling it is part of the corpus's job. ::: **Owners:** PO, Leadership **Volume addressing it:** — ## Where it lives * [Volume V · Adoption (chain evolution)](/volumes/v-after-we-build/10-adoption) ## Related crafts * [Chain Maturity Assessment](/areas/13-adoption-evolution/maturity) --- --- url: /practice.md --- how we work · practice # Practice > *Operational guides anchored to the canon. Each Practice page derives from a Volume section, takes 10–25 minutes, and tells you how to do the thing — with the failure modes named.* The Volumes are the canon. Practice pages are *projections* of the canon at the cognitive load you need when you're about to do the thing. Every Practice page declares `derives_from:` — the Volume anchor that owns the truth — so when the canon updates, the practice doesn't drift. 
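A minimal sketch of what that declaration might look like in a Practice page's frontmatter. The `derives_from` key is the corpus's own; the `title` and `last_reviewed` fields and their exact placement are assumptions for illustration:

```text
---
title: Writing predictions                                  # illustrative
derives_from: /volumes/ii-discovery/9-prediction-writing
last_reviewed: 2026-04-22                                   # illustrative; see Artifact Lifecycle
---
```

Whatever the exact shape, the point is mechanical traceability: when the anchored Volume part changes, the corpus can list every Practice page that now needs re-review.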
## Shape Every Practice page follows the same shape: | Section | What it is | |---|---| | **TL;DR** | One paragraph. The thing in its smallest defensible form. | | **What it is** | Definition. The artefact or activity. | | **Why it matters** | What breaks without it. | | **How to do it** | Step-by-step. Templates and checklists embedded or linked. | | **Evidence** | What we know about this from our cycles or external research. | | **Anti-patterns** | Common ways this goes wrong. Often pointed at a [Clinic](/clinics/). | | **Further reading** | The Volume part it derives from + adjacent Practice pages. | ## Browse ### Briefs & Discovery * [Writing predictions](/practice/writing-predictions) — the five-field discipline that makes a brief a brief. ### Coming next The corpus is being filled. Planned shape: * **Briefs & Discovery** — Initiative briefs · Feature briefs · Technical Design briefs · Observation sessions · The Five Stations * **Stories & Scope** — Epic kickoff · Story writing · Amigos · Story mapping · Walking skeleton * **Releases** — Release gate · Gradual rollout · CS handoff · Release brief * **Checks & Signal** — Check sessions · Signal reading · The model update * **Incidents & Postmortems** — Containment · Postmortem session · Runbook authoring * **Operations** — Daily triage · Weekly client update · Quarterly portfolio review ## Clinics [Clinics](/clinics/) are the corpus's anti-pattern teachers — corrupt artifacts with diagnoses. Each Practice page links to the Clinics that show its failure modes. --- --- url: /practice/writing-predictions.md --- practice · briefs & discovery # Writing predictions > *The five-field discipline that makes a brief a brief. A claim about a measurable change, written before the cycle runs, with a date someone has committed to running the check.* ## TL;DR Every prediction names five things, all together: the **baseline** (current measured state, with sample size and date), the **target** (what we expect after), the **check date** (when someone runs the measurement), the **check method** (the specific way), and the **owner** (the named person responsible). Fewer than five fields and the brief is decoration. The corpus rule: **no baseline, no execution.** ## What it is A prediction is a falsifiable claim about a measurable change, written before the cycle runs. It is the smallest unit of honest commitment in the chain. It lives inside a [Feature Brief](/volumes/ii-discovery/7-feature-brief) or [Initiative Brief](/volumes/ii-discovery/6-initiative-brief). It is what [Volume V Part 2 — Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction) checks at the end of the cycle. ::: tip Distinguish from **Feature Brief** — the document that *contains* a prediction. **KPI** — a leading or lagging indicator at portfolio scale; not cycle-bound. **Success metric** — vague; the corpus does not use this term. *See [Confusable with](#confusable-with) at the foot.* ::: ## Why it matters The cycle is a closed loop. The prediction is the loop's anchor. Without a prediction: * There is nothing for [Volume V Part 2](/volumes/v-after-we-build/2-signal-and-the-prediction) to check. The cycle ran *blind* — the worst of the four outcomes. * The model never updates. The next cycle inherits the same wrong model. * The team's calibration over time cannot be measured. You cannot improve at predicting if you never recorded a prediction.
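On that last point, one way a team might keep calibration visible is a running log of predictions and their outcomes. A minimal sketch; the entries are invented for illustration, and the outcome labels follow the ones this page uses (*too optimistic*, *too conservative*, *not checked*), plus an assumed label for the outcome where reality matched:

```text
# prediction-log (illustrative entries), one line appended at each check
2026-03  grading cycle < 15 min       too optimistic   (measured 19 min)
2026-04  onboarding drop-off < 10%    matched          (measured 8%)
2026-05  export error rate < 0.5%     not checked      (the only worthless outcome)
```

Three entries are noise; thirty are a calibration curve.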
Without a prediction, the rest of the chain still functions — code ships, features go live — but the team is now building software whose effect on the world is unmeasured. That is the most expensive way to work. ## How to do it ### Step 1 — Capture the baseline before the cycle starts A baseline without a sample size and a date is a guess. Witness the current state directly. ```text Baseline: 47 minutes (mean), 38 minutes (median), n=12, captured 2026-04-22 by direct observation of three named graders. ``` If the metric the prediction will check has no instrumentation today, the cycle's first story is *as PO, I want to know how long Gal spends grading, so that I can run the check on 2026-06-15.* The instrumentation lands as part of the slice. See [Volume IV · Observability](/volumes/iv-execution/9-observability). ### Step 2 — Name the target with the form the change supports Pick one of three forms: | Form | When to use | |---|---| | **Specific number** | The change has a single dominant effect | | **Range** | The change has high variance across the population | | **Threshold** | The change must clear a binary criterion | Don't say "we'll see what happens." That's not a prediction. ### Step 3 — Set the check date with a calendar commitment ```text Check date: 2026-06-15. ``` The check date is a date the **named owner** has committed to. It is in their calendar. It is at most six weeks after the flag is enabled — long enough for first-contact noise to settle (per [Volume V · First 48 Hours](/volumes/v-after-we-build/1-first-48-hours)), short enough that the cycle is still fresh in everyone's head. ### Step 4 — Specify the check method The method is named in Discovery, not invented at check time. If the method requires instrumentation, the instrumentation is a story. If the method requires observation, the observation sessions are scheduled. ```text Check method: Three observation sessions across three named graders, in the field, with a stopwatch and the time-on-task event log as cross-check. Minimum 8 cycles total. Same observers as Discovery. ``` ### Step 5 — Name the owner A role is not an owner. *PO* is not an owner. *Alex (PO)* is. ```text Owner: Alex (PO). ``` ## A complete prediction ```text Prediction: Gal completes a grading cycle in under 15 minutes. Baseline: 47 minutes (mean), 38 minutes (median), n=12, captured 2026-04-22 by direct observation. Target: <15 minutes (mean) across n>=8 observed cycles OR <12 minutes (median) across n>=8. Check date: 2026-06-15. Check method: Three observation sessions across three named graders, in the field, with a stopwatch and the existing time-on-task instrumentation as cross-check. Minimum 8 cycles total. Same observers as Discovery. Owner: Alex (PO). ``` [Copy the template →](/templates/prediction) ## Evidence Across our cycles, the predictions that survived contact with reality shared three properties. 1. **Baseline captured by observation, not by query.** Cycles where the baseline was retrieved from a dashboard had a 2.3× higher *too conservative / too optimistic* rate than cycles where the baseline came from sitting next to the named person. 2. **Check method named in Discovery, not at check time.** Cycles where the check method was decided after the cycle ran produced *not checked* outcomes 4× more often. 3. **Owner is a person, not a role.** When the brief said "PO" instead of "Alex (PO)", the check happened on time 60% of the time. With a name, 95%.
The largest gap remains: in two cycles out of five, the baseline was witnessed for the wrong moment. The grader's *47 minutes* was wall-clock; the prediction implicitly meant *focused minutes*. That gap belongs to [Discovery](/volumes/ii-discovery/2-person-moment) — see Clinic below. ## Anti-patterns These are the failure shapes worth seeing first. | Pattern | What it looks like | Where to fix | |---|---|---| | **No baseline** | "We expect users to be happier." No number. | The corpus rule: *no baseline, no execution.* Capture before the cycle starts. | | **Vanity baseline** | The baseline is the metric you wanted to see, not the metric you measured. | Witness in person. See [Volume II · Observation](/volumes/ii-discovery/1-observation). | | **Unnamed owner** | Brief says *PO will check.* | Replace with a named person. The calendar commitment is the discipline. | | **Survey-shaped check method** | "We will ask graders if it feels faster." | Replace with observation. See [Volume V · Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction). | | **Drifting check date** | The date passes. The PO moves it without recording the move. | Mark as `not_checked` (the only worthless outcome). The chain treats moved dates as a chain-level signal at retro. | The clinic for the most common failure: [A brief that didn't witness](/clinics/a-brief-that-didnt-witness). ## Confusable with | This | Not this | Difference | |---|---|---| | **Prediction** | KPI / OKR | Cycle-bound, falsifiable, owned. KPIs are portfolio-scale. | | **Prediction** | Hypothesis | Hypothesis is exploratory; prediction is committed. *Predictions are made before, checked after.* | | **Prediction** | Goal (Volume I) | Goal is 12-month; prediction is 4–6 weeks. | | **Baseline** | Benchmark | Benchmark is industry-comparison; baseline is *our person, our cycle, our number.* | ## Further reading * **Canon** — [Volume II · Part 9 — Prediction Writing](/volumes/ii-discovery/9-prediction-writing) * **Canon** — [Volume V · Part 2 — Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction) * **Reference** — [Master Area · Prediction Writing](/areas/03-product-definition/prediction-writing) * **Template** — [Prediction skeleton](/templates/prediction) * **Checklist** — [Five fields](/checklists/prediction-five-fields) * **Clinic** — [A brief that didn't witness](/clinics/a-brief-that-didnt-witness) * **Skill path** — [PO foundations](/skills/po-foundations) (steps 4–6) --- --- url: /clinics.md --- how we work · clinics # Clinics > *Anti-pattern teachers. Each clinic shows a corrupt artefact — a brief that didn't witness, a story missing a state, a postmortem that produced a feeling — and walks the diagnosis. The corpus's most direct way to teach what wrong looks like.* Clinics exist because the corpus otherwise only shows good shape. A new practitioner needs to see *bad*, examined honestly, more than they need another *good*. A senior practitioner uses clinics to catch themselves — fine-grained discrimination between *almost right* and *right* is what the senior years demand. ## Shape Every clinic follows the same shape: | Section | What it is | |---|---| | **The artefact** | A real (or anonymised real) document the corpus has seen go wrong. Often a brief, a story, a postmortem, a runbook. | | **What's wrong?** | One direct prompt. Reader pauses to look before reading the diagnosis. | | **Diagnosis** | What's missing or broken, named at the right chain level.
| | **The fix** | What the artefact should look like. | | **Where this comes from in the chain** | The level the failure traces to. | ## Available * [A brief that didn't witness](/clinics/a-brief-that-didnt-witness) — predictions written without observation. The most common Volume II failure. ## Coming next * A story without a state * A postmortem that produced a feeling * An ADR with one option * A runbook that doesn't run * A retro that listed instead of compounded * An initiative without a goal * An SLA review that became a sales conversation * A flag that never got cleaned up --- --- url: /clinics/a-brief-that-didnt-witness.md --- clinic · a brief that didn't witness # A brief that didn't witness > *The most common Volume II failure. The brief looks structurally correct. The prediction has all five fields. And reality, when it answers, says the team measured the wrong thing.* ## The artefact > ::: warning Excerpt — Feature Brief, "Grading Flow v2", March 2026 > > **Experience snapshot:** Graders today are slowed down by repetitive friction in the LMS. The new grading flow will reduce this friction and improve their grading experience. > > **Prediction:** > > * Baseline: 45 minutes per grading cycle (estimated from internal usage data) > * Target: under 20 minutes per cycle > * Check date: 2026-06-15 > * Check method: query the time-on-task analytics dashboard > * Owner: Alex (PO) > > **Sign-off:** PO ✅ · Designer ✅ · Tech Lead ✅ > ::: The brief looks structurally complete. Every Volume II checkbox is ticked. The trio signed off. The cycle ran. Six weeks later, the dashboard query returned **18 minutes** — better than the target. The team celebrated. Two weeks after that, support tickets started coming in: graders saying the new flow was *worse*. CSAT was down. The customer's head of department asked for a meeting. ## What's wrong? Stop. Read the artefact again. Three things are wrong. Find at least two before you read the diagnosis. ::: details Diagnosis (open when ready) ### 1. The experience snapshot is generic > *Graders today are slowed down by repetitive friction in the LMS.* This is not an experience snapshot. It is a sentence about a category of person, not a moment in a named person's day. Compare with the corpus standard: > *It is Wednesday morning. Gal sits down at 08:50 with her coffee and opens the LMS to grade the morning's batch of CS101 finals. The fourth submission is from a student named Yael Rosenberg-Hayut. The system displays her name with the hyphen and accents broken…* The corpus rule, from [Volume II · Person & Moment](/volumes/ii-discovery/2-person-moment): *every brief begins with a named person whose life will change.* Without a named person, the brief is a hypothesis about an aggregate. The team is now solving for an average that no one experiences. ### 2. The baseline was not witnessed > *Baseline: 45 minutes per grading cycle (estimated from internal usage data)* *Estimated from internal usage data* is the warning sign. The dashboard's *time-on-task* metric measures wall-clock — the time the LMS tab is in the foreground. It does not measure focus. A grader who alt-tabs to a spreadsheet to fix a Hebrew name is *off-task* in the dashboard but *fully on-task* in their actual work. The brief should have witnessed the activity directly — sat next to a named grader for one full session, with a stopwatch. *47 minutes mean, n=12, captured 2026-04-22 by direct observation of three named graders* is what the brief should say.
The 45-minute number is a query result, not a measurement of the activity. When the cycle ran, the dashboard reported 18 minutes — but it was now reporting on a different pattern: graders had stopped alt-tabbing because the new flow handled Hebrew names. The dashboard time fell. The actual focused-grading time *also* fell, but less than the dashboard suggested. And graders disliked the new flow for an unrelated reason that the brief never witnessed: it removed the workaround they had built for *something else* — a typo-correction step they had been doing as a side effect. ### 3. The check method is the same as the baseline method > *Check method: query the time-on-task analytics dashboard.* Same dashboard, same metric. This is the third silent failure. A check method that re-uses the baseline's measurement infrastructure is not a check — it's a recursion. Reality cannot disagree with the prediction because the dashboard cannot disagree with itself. The check method should have been the same shape as the baseline witnessing: *three observation sessions across three named graders, in the field.* If the cycle had run that check, the third grader would have surfaced the typo-correction loss in the first 90 seconds. ::: ## The fix ```text Experience snapshot: It is Wednesday morning. Gal sits down at 08:50 with her coffee and opens the LMS to grade the morning's batch of CS101 finals. Of the seven submissions in front of her, four have Hebrew names. Each one she opens, she reads, copies the name to a spreadsheet to verify it, types her grade, types feedback. By 09:14 she has graded one. By 10:00 she has graded four. By 10:30 she stops for water and the morning is gone. Prediction: Gal completes a grading cycle in under 15 minutes of focused work. Baseline: 47 minutes per cycle (mean), n=12, captured 2026-04-22 by direct observation of three named graders. Note: time-on-task dashboard reports 38 min for the same sessions — the gap is alt-tab time graders spend correcting Hebrew names. Target: <15 minutes (mean) of focused-grading time across n>=8 observed cycles. Check date: 2026-06-15. Check method: Three observation sessions across three named graders, in the field, with a stopwatch. Time-on-task event log used as cross-check, NOT as primary signal. Owner: Alex (PO). ``` The diagnosis names *three* changes. The team would have caught the third one — the typo-correction loss — in the first observation session. It would have been a Discovery finding before code was written. ## Where this comes from in the chain This failure traces to **Discovery (Level 2)**. The brief looks like a Volume II artefact and ticks Volume II's checkboxes, but the underlying observation work was skipped. The cost of that skip was paid downstream — across a full Volume IV execution and into a Volume V signal reading that contradicted the team's customer relationship. The remedy is Volume II discipline: * [Observation](/volumes/ii-discovery/1-observation) — witness, do not survey. * [Person & Moment](/volumes/ii-discovery/2-person-moment) — name the person, name the moment. * [Assumption Surfacing](/volumes/ii-discovery/4-assumption-surfacing) — the brief should have listed *graders use the LMS time-on-task dashboard time as their actual time* as a *not witnessed* assumption. 
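A sketch of what that entry might have looked like in the brief's assumption list. The *not witnessed* status is the corpus's vocabulary; the entry format and the numbering are assumptions for illustration:

```text
# illustrative assumption-list entry
Assumption A3: the dashboard's time-on-task metric matches graders' actual grading time.
Status: not witnessed.
Risk if wrong: baseline and check method both mismeasure the activity.
To witness: one observation session with a named grader, before trio sign-off.
```

Written down like this, the assumption is checkable before code, and the check-method recursion (the third diagnosis above) becomes visible at sign-off.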
## See also * **Practice** — [Writing predictions](/practice/writing-predictions) * **Practice** — *Running an observation session* (planned) * **Canon** — [Volume II · Part 1 — Observation](/volumes/ii-discovery/1-observation) * **Volume V** — [The Model Update](/volumes/v-after-we-build/6-model-update) (where the team would have updated the model after this failure) --- --- url: /skills.md --- how we work · skills hub # Skills > *Scaffolded curricula, anchored to your real work. Each path runs read → practice → check → reflect, with a real artefact at every step. The path is doing its job when your work this month is better than last month.* Skills are not reading lists. They are the corpus's pedagogy — read a Volume part, hold a session, write an artefact, run a check, retro on your own output, teach back to a junior. Pick a path that matches your role and level. Bring a real story from your sprint. ## How a path works | Step shape | Purpose | |---|---| | **Read** | Volume canon or Practice page | | **Worked example** | A real artefact from the corpus, annotated | | **Practice prompt** | Apply to your current work — bring a real story | | **Mini-check** | 3–5 questions, including a negative case | | **Pair task** | With a more senior role; explicit handoff | | **Retro on own work** | Use the corpus's own checklists on yourself | | **Teach-back** | Write a one-pager for a junior — the Feynman gate | | **Authoring contribution** | PR a refinement to the corpus | Each step belongs to one of these shapes. A path that doesn't reach *retro on own work* and *teach-back* by the end has not finished — these are the two steps that compound. ## What if my role isn't here yet? Most non-PO paths are still being authored. The rough cut you can rely on today, building on Volumes I–V: * **Tech Lead foundations** — Volume III parts 6–8 (ADRs, sequence/schema/API, ilities) → Volume IV parts 4, 8, 9 (pipeline, runbooks, observability). * **Designer foundations** — Volume II part 3 (journey) → Volume III parts 1, 2, 5 (epic kickoff, story mapping, amigos) → Volume IV part 1 (domain language). * **QA foundations** — Volume III part 5 (amigos) → Volume IV part 5 (testing layers) → Volume V part 3 (bug taxonomy). * **On-call competence** — Volume IV part 8 (runbooks) → Volume V parts 1, 4 (first 48h, postmortems). When the formal path lands, this list will move into the Skills Hub above. --- --- url: /skills/po-foundations.md --- skill · foundations # PO foundations > *A six-week, ten-step path. By the end of it you have run one full cycle — predicted something, witnessed something, run a check, written a model update, and contributed one refinement back to the corpus. You are not a senior PO. You are a PO whose first cycle produced a real artefact the team can use.* ::: tip How a skill works in this corpus A skill is not a reading list. It is a scaffolded sequence of read → practice → check → reflect, anchored to your real cycle. **Bring a real story from your sprint** to each step that asks for it. The skill is doing its job when your work output this month is better than last month, not when you have ticked ten boxes. ::: ## Mastery looks like When you finish this path, you can: * Sit with a named person and write a brief that begins with their day, not your concept. * Write a prediction that has all five fields, including a baseline you witnessed. * Hold a 45-minute amigos session and produce Gherkin a developer and QA can defend. * Run a check on the date in the brief and write a five-line signal reading.
***

## Step 3 — Map a journey

**Goal:** convert the observation into a journey map.

**Read:**

* [Volume II · Part 3 — Journey Mapping](/volumes/ii-discovery/3-journey-mapping)

**Practice prompt:** draw J1–Jn for the activity you observed. Mark friction with the three labels (cognitive / mechanical / domain-mismatch). Pick one moment to anchor a brief.

**Mini-check:** if you have more than 15 steps, you are at task level, not activity level. Compress.

***

## Step 4 — Write a prediction

**Goal:** the central act of Volume II.

**Read:**

* [Volume II · Part 9 — Prediction Writing](/volumes/ii-discovery/9-prediction-writing)
* [Practice · Writing predictions](/practice/writing-predictions)

**Worked example:** the *Gal grading cycle* prediction in [Practice · Writing predictions · A complete prediction](/practice/writing-predictions#a-complete-prediction).

**Practice prompt:** write a prediction for the journey-step friction you anchored in step 3. Use the [template](/templates/prediction). Do not skip the baseline. Witness it — sit with the same person again with a stopwatch.

**Mini-check (Clinic):** read [A brief that didn't witness](/clinics/a-brief-that-didnt-witness). Find the three errors before reading the diagnosis. Then look at your own prediction. Does it contain any of those three patterns?

***

## Step 5 — Sign a brief and book a check date

**Goal:** the brief becomes a commitment.

**Read:**

* [Volume II · Part 7 — Feature Brief](/volumes/ii-discovery/7-feature-brief)

**Practice prompt:** write the full Feature Brief — experience snapshot (150–200 words, named person), purpose, in scope, out of scope, prediction, success signal, open questions. Get a Designer and a Tech Lead to sign. **Put the check date in your calendar with a 1-hour block.**

**Pair task:** review your brief with a more senior PO. Specifically ask them: *which of the experience snapshots in our corpus would mine most resemble — and what's still missing?*

***

## Step 6 — Hold amigos before code begins

**Goal:** the smallest unit of shared meaning.

**Read:**

* [Volume III · Part 5 — Amigos & Gherkin](/volumes/iii-scope/5-amigos-gherkin)

**Practice prompt:** when one of the Epic's stories is ready, hold a 45-minute amigos session — you, the developer assigned, the QA. Produce three Gherkin scenarios, including at least one negative case. Use the corpus's Given/When/Then form (the Given names the person's situation, not the system state).
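A hedged sketch of that shape, borrowing the clinic's grading scenario. The scenario content is invented; the point is that each Given opens with Gal's situation, not with system state:

```text
Scenario: Gal grades a submission with a Hebrew name
  Given Gal is mid-way through her morning grading batch
  And the next submission carries a Hebrew name
  When she opens the submission
  Then the name is verified in place, with no copy-out to a spreadsheet

Scenario: Gal reaches a name the flow cannot verify (negative case)
  Given Gal is grading under time pressure
  And the next submission carries a name that fails verification
  When she opens the submission
  Then she sees an explicit fallback check, not a silent pass
```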
**Mini-check:** if the developer or QA asks you a question during amigos that you cannot answer from the brief, the brief is not ready. Pause amigos. Re-walk the brief. Resume.

***

## Step 7 — Watch the first 48 hours

**Goal:** the transition from build to watch.

**Read:**

* [Volume V · Part 1 — The First 48 Hours](/volumes/v-after-we-build/1-first-48-hours)

**Practice prompt:** when the flag is enabled, you watch the dashboards. Not support tickets. Use the SLO thresholds. Hold the discipline of *not reacting incorrectly* — note things, do not act on first-hour noise.

**Pair task:** sit with the Tech Lead for the first hour. They watch the system signals; you watch the leading product signals (adoption, completion). Compare notes at hour 2.

***

## Step 8 — Run the check

**Goal:** run Volume V Part 2 against your own prediction.

**Read:**

* [Volume V · Part 2 — Signal and the Prediction](/volumes/v-after-we-build/2-signal-and-the-prediction)

**Practice prompt:** on the date in your brief, **run the check.** Do not move the date. If the result is *not what you hoped*, that is the second-most-valuable outcome. If you forgot to run it, that is the worthless outcome — name it as such, do not dress it up. Write a five-line signal reading next to the brief. Use the corpus form: prediction / baseline / target / measured / gap.

**Mini-check:** does your signal reading have a gap line that names one specific thing? *"Better than predicted"* alone is not a gap line. *"Better than predicted — we suspect the keyboard shortcut absorbed more time than the deep-link navigation we built for"* is.
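A hedged sketch of the five-line form, reusing the grading-cycle clinic's brief. The measured value and the gap line are invented for illustration:

```text
Prediction: Gal completes a grading cycle in under 15 minutes of focused work.
Baseline:   47 min (mean), n=12, witnessed 2026-04-22.
Target:     <15 min (mean) across n>=8 observed cycles.
Measured:   19 min (mean), n=9, observed in the field 2026-06-15.
Gap:        4 min over target. Name verification is in place, but the
            fallback check interrupts the batch more often than predicted.
```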
***

## Step 9 — Retrospective and model update

**Goal:** Volume V Parts 5 and 6 — the step most teams skip.

**Read:**

* [Volume V · Part 5 — The Retrospective](/volumes/v-after-we-build/5-retrospective)
* [Volume V · Part 6 — The Model Update](/volumes/v-after-we-build/6-model-update)

**Practice prompt:** hold the 60-minute retrospective with the trio. Three questions, one change, owned, dated, testable. Then — separately, the same week — write the model update. Close assumptions in your brief. Add new ones. Update at least one template, checklist, or glossary entry.
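A hedged sketch of a model update that stands on its own, reusing the grading-cycle clinic's facts. The field names are illustrative, not a prescribed template:

```text
Model update — grading-cycle brief, checked 2026-06-15
Believed: the dashboard's time-on-task number is the graders' actual time.
Answered: witnessed mean was 47 min against a dashboard reading of 38;
the gap is alt-tab time the dashboard never sees.
Assumptions: "dashboard equals actual time" closed as wrong. New, not
witnessed: the typo-correction workaround has no replacement in the new flow.
Artifact changed: prediction template now asks the check method to name a
witnessing method, not a query.
```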
**Retro on your own work:** look back at your prediction (step 4). Were the five fields all *truly* there, or did one decay during the cycle? Most often the *check method* is the field that drifts. If yours did, write a one-line note for your future self: *next cycle, do this differently.*

***

## Step 10 — Teach back, contribute back

**Goal:** the Feynman gate. You don't know it until you can write it.

**Practice prompt:** write a one-page guide for the next new PO joining your team. *What I wish I'd known before my first cycle.* Use the corpus's voice — person-first, terse, witnessed-not-described. Limit yourself to one page.

**Authoring contribution:** find one thing in this corpus that, after running your cycle, you now know is wrong, missing, or could be sharper. Open a PR. The corpus is meant to evolve; this is how it does.

**Self-rating after:** rate yourself again on the same six dimensions you rated at the start. Compare. Where did you move? Where didn't you? Bring that gap to your next cycle.

***

## After this path

You have run one cycle. Two more and *foundational* becomes *practitioner*. The next paths in sequence:

* **PO · Practitioner** *(coming)* — running multiple Epics, holding the trio across two simultaneous cycles, writing PDRs that other POs read.
* **PO · Advanced** *(coming)* — initiative-level discovery, portfolio decisions, kill briefs.

## Stuck?

Common stuck points and where to go:

| If you got stuck at | Read |
|---|---|
| Step 2 — couldn't find someone to observe | [Volume II · When observation is impossible](/volumes/ii-discovery/1-observation#when-observation-is-impossible) |
| Step 4 — prediction feels arbitrary | [Clinic · A brief that didn't witness](/clinics/a-brief-that-didnt-witness) |
| Step 6 — amigos didn't produce shared meaning | [Volume III · When amigos surfaces a problem](/volumes/iii-scope/5-amigos-gherkin#when-amigos-surfaces-a-problem) |
| Step 8 — date passed without the check | This is the worthless outcome. Name it. Schedule a retrospective topic. Do not paper over it. |
| Step 9 — retrospective produced a list, not a change | Re-read [Volume V · The Retrospective](/volumes/v-after-we-build/5-retrospective) — *one change. owned. dated. testable.* |

---

---
url: /roles.md
---

how we work · roles

# Roles

> *A role is a stance, not a title. The hub for each role names what good looks like, the three artifacts a competent practitioner produces, and a scaffolded path from first day to depth.*

Each role hub has four entry points:

* **Landing** — what this role is for, and what good looks like.
* **First 30 days** — the gated linear track for new hires. Six steps, one cycle.
* **Deepening** — a menu of skills with a self-rated maturity matrix.
* **You're stuck if…** — the anti-symptom catalogue. Senior practitioners catch themselves here.

## Roles in the corpus

* [**Product Owner**](/roles/po) · the chain's spine — briefs, predictions, signal, model update
* [**Tech Lead**](/roles/tech-lead) · ADRs, sequence/schema/API, runbooks, postmortem structure *(coming)*
* [**Designer**](/roles/designer) · journey, flow, design system, content design *(coming)*
* [**Developer**](/roles/developer) · domain language, trunk-based, pair, review *(coming)*
* [**QA**](/roles/qa) · amigos, Gherkin, exploratory, accessibility *(coming)*
* [**CS Lead**](/roles/cs-lead) · L1/2/3, support-to-bug pipeline, helpdesk metrics *(coming)*
* [**On-call**](/roles/on-call) · runbooks, incidents, communication during an incident, postmortem *(coming)*
* [**Leadership**](/roles/leadership) · vision, goals, portfolio, kill decisions *(coming)*

## Cross-role artifacts

Some artifacts are produced by the trio, not by a single role. The hubs link to the same artifact from each angle:

* The **Feature Brief** appears in PO (writes), Designer (signs), Tech Lead (signs), QA (reads at amigos).
* The **postmortem** appears in Tech Lead (drives), On-call (timeline), PO (chain-level), Leadership (no surprises).
* The **release brief to client** appears in PO (writes), CS Lead (handoff), Tech Lead (technical confirmation).

---

---
url: /roles/po.md
---

role · product owner

# Product Owner

> *The chain's spine. The PO names the change, predicts what will happen, runs the check, and writes the model update so the next cycle inherits a sharper version of the understanding.*