A brief that didn't witness
The most common Volume II failure. The brief looks structurally correct. The prediction has all five fields. And reality, when it answers, says the team measured the wrong thing.
The artefact
Excerpt — Feature Brief, "Grading Flow v2", March 2026
Experience snapshot: Graders today are slowed down by repetitive friction in the LMS. The new grading flow will reduce this friction and improve their grading experience.
Prediction:
- Baseline: 45 minutes per grading cycle (estimated from internal usage data)
- Target: under 20 minutes per cycle
- Check date: 2026-06-15
- Check method: query the time-on-task analytics dashboard
- Owner: Alex (PO)
Sign-off: PO ✅ · Designer ✅ · Tech Lead ✅
The brief looks structurally complete. Every Volume II checkbox is ticked. The trio signed off. The cycle ran. Six weeks later, the dashboard query returned 18 minutes — better than the target. The team celebrated. Two weeks after that, support tickets started coming in: graders saying the new flow was worse. CSAT was down. The customer's head of department asked for a meeting.
What's wrong?
Stop. Read the artefact again. Three things are wrong. Find at least two before you read the diagnosis.
Diagnosis (open when ready)
1. The experience snapshot is generic
Graders today are slowed down by repetitive friction in the LMS.
This is not an experience snapshot. It is a sentence about a category of person, not a moment in a named person's day. Compare with the corpus standard:
It is Wednesday morning. Gal sits down at 08:50 with her coffee and opens the LMS to grade the morning's batch of CS101 finals. The fourth submission is from a student named Yael Rosenberg-Hayut. The system displays her name with the hyphen and accents broken…
The corpus rule, from Volume II · Person & Moment: every brief begins with a named person whose life will change. Without a named person, the brief is a hypothesis about an aggregate. The team is now solving for an average that no one experiences.
2. The baseline was not witnessed
Baseline: 45 minutes per grading cycle (estimated from internal usage data)
"Estimated from internal usage data" is the warning sign. The dashboard's time-on-task metric measures wall-clock time with the LMS tab in the foreground. It does not measure focus. A grader who alt-tabs to a spreadsheet to fix a Hebrew name is off-task in the dashboard but fully on-task in their actual work.
The brief should have witnessed the activity directly: sat next to a named grader for one full session, with a stopwatch. It should say something like "47 minutes per cycle (mean), n=12, captured 2026-04-22 by direct observation of three named graders." The 45-minute number is a query result, not a measurement of the activity.
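The gap between a witnessed baseline and a dashboard query is plain arithmetic. A minimal sketch, with hypothetical per-session stopwatch readings (the real n=12 figures are not in the brief):

```python
from statistics import mean

# Hypothetical stopwatch readings (minutes of focused grading per cycle)
# from three observed graders, four cycles each -- n=12, illustrative only.
observed = [49, 44, 51, 46, 48, 45, 50, 43, 47, 46, 49, 46]

# Dashboard time-on-task for the same sessions (tab-in-foreground only).
dashboard = [40, 35, 42, 37, 39, 36, 41, 34, 38, 37, 40, 37]

witnessed_baseline = mean(observed)   # what the brief should record
dashboard_baseline = mean(dashboard)  # what the query returns
alt_tab_gap = witnessed_baseline - dashboard_baseline

print(f"witnessed: {witnessed_baseline:.0f} min, "
      f"dashboard: {dashboard_baseline:.0f} min, "
      f"gap: {alt_tab_gap:.0f} min")
# -> witnessed: 47 min, dashboard: 38 min, gap: 9 min
```

The nine-minute gap is the alt-tab time the dashboard cannot see, which is exactly why a brief that quotes only the query result has not measured the activity.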
When the cycle ran, the dashboard reported 18 minutes — but it was now reporting on a different pattern: graders had stopped alt-tabbing because the new flow handled Hebrew names. The dashboard time fell. The actual focused-grading time also fell, but less than the dashboard suggested. And graders disliked the new flow for an unrelated reason that the brief never witnessed: it removed the workaround they had built for something else — a typo-correction step they had been doing as a side effect.
3. The check method is the same as the baseline method
Check method: query the time-on-task analytics dashboard.
Same dashboard, same metric. This is the third silent failure. A check method that re-uses the baseline's measurement infrastructure is not a check — it's a recursion. Reality cannot disagree with the prediction because the dashboard cannot disagree with itself.
The check method should have been the same shape as the baseline witnessing: three observation sessions across three named graders, in the field. If the cycle had run that check, the third grader would have surfaced the typo-correction loss in the first 90 seconds.
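The recursion is mechanical enough to lint for. A minimal sketch, assuming the check method is a plain-text field; the marker words and function name are hypothetical, not from the corpus:

```python
# Heuristic lint, illustrative only: a check method with no
# direct-observation component can only re-read the instrumentation
# the baseline came from, so reality has no channel to disagree.
OBSERVATION_MARKERS = ("observation", "stopwatch", "in the field")

def is_recursive_check(check_method: str) -> bool:
    """Return True when the check method lacks any direct-observation
    component and is therefore a recursion, not a check."""
    method = check_method.lower()
    return not any(marker in method for marker in OBSERVATION_MARKERS)

# The artefact as written: same dashboard on both sides -- flagged.
print(is_recursive_check("query the time-on-task analytics dashboard"))
# -> True

# The fix: observation is the primary signal -- passes.
print(is_recursive_check(
    "three observation sessions across three named graders, "
    "in the field, with a stopwatch"))
# -> False
```

Note that the fixed check passes even though it shares a shape with the witnessed baseline: fresh observation sessions are new measurements, not a re-query of the same infrastructure.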
The fix
Experience snapshot:
It is Wednesday morning. Gal sits down at 08:50 with her coffee
and opens the LMS to grade the morning's batch of CS101 finals.
Of the seven submissions in front of her, four have Hebrew names.
Each one she opens, she reads, copies the name to a spreadsheet
to verify it, types her grade, types feedback. By 09:14 she has
graded one. By 10:00 she has graded four. By 10:30 she stops
for water and the morning is gone.
Prediction: Gal completes a grading cycle in under 15 minutes
of focused work.
Baseline: 47 minutes per cycle (mean), n=12, captured
2026-04-22 by direct observation of three named
graders. Note: time-on-task dashboard reports 38 min
for the same sessions — the gap is alt-tab time
graders spend correcting Hebrew names.
Target: <15 minutes (mean) of focused-grading time across
n>=8 observed cycles.
Check date: 2026-06-15.
Check method: Three observation sessions across three named
graders, in the field, with a stopwatch. Time-on-task
event log used as cross-check, NOT as primary signal.
Owner: Alex (PO).

The diagnosis names three changes. The team would have caught the third one — the typo-correction loss — in the first observation session. It would have been a Discovery finding before code was written.
Where this comes from in the chain
This failure traces to Discovery (Level 2). The brief looks like a Volume II artefact and ticks Volume II's checkboxes, but the underlying observation work was skipped. The cost of that skip was paid downstream: a full Volume IV execution, and then a Volume V signal reading that put the team at odds with its customer.
The remedy is Volume II discipline:
- Observation — witness, do not survey.
- Person & Moment — name the person, name the moment.
- Assumption Surfacing — the brief should have listed "the time-on-task dashboard reflects graders' actual time" as a not-witnessed assumption.
See also
- Practice — Writing predictions
- Practice — Running an observation session (planned)
- Canon — Volume II · Part 1 — Observation
- Volume V — The Model Update (where the team would have updated the model after this failure)