part nine · observability
Observability
Logs, traces, metrics, events.
Observability is the property of a system that makes it possible to ask questions about its current state without having to deploy new code. The corpus pattern: observability is built with the feature, not after. By the time the gate is reached, the system is already legible.
The four signal types
| Signal | What it answers | Cost | Read by |
|---|---|---|---|
| Logs | What happened in this specific request? | High in volume; easy to author | Developer, on-call |
| Traces | How did this request flow across services? | Moderate, depends on sampling | Developer, on-call |
| Metrics | What is happening across many requests? | Low per-event; needs cardinality discipline | On-call, Tech Lead |
| Product analytics events | What did the named person do? | Low; needs naming discipline | PO, data |
Each signal has a different consumer and a different cost profile. The corpus uses all four, not as overlapping redundancy but as complements.
Logs
Structured. Always. JSON format with consistent fields.
```json
{
  "ts": "2026-05-22T08:53:14.029Z",
  "service": "grading-api",
  "level": "info",
  "request_id": "req_a4f2c1",
  "user_id": "usr_2103",
  "endpoint": "GET /submissions/1234",
  "duration_ms": 187,
  "event": "submission.opened",
  "subject_id": "sub_1234",
  "domain_terms": ["submission", "grader"]
}
```

Log fields are picked at design time, not at incident time. The fields appear in the brief as part of the observability section.
The corpus pattern: never log PII. The grading flow logs user_id, not name. The privacy ility (Volume III Part 8) constrains the log shape.
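A minimal sketch of a logger that enforces this shape, in Python. The log_event helper and the exact required and forbidden field sets are assumptions for illustration, not the corpus's implementation.

```python
import json
import sys
from datetime import datetime, timezone

# Fields chosen at design time, per the brief's observability section.
REQUIRED_FIELDS = {"service", "level", "request_id", "event"}
# PII never enters the log shape: opaque ids only, never names or emails.
FORBIDDEN_FIELDS = {"name", "email", "address"}

def log_event(**fields):
    """Emit one structured JSON log line to stdout (hypothetical helper)."""
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"log line missing required fields: {missing}")
    leaked = FORBIDDEN_FIELDS & fields.keys()
    if leaked:
        raise ValueError(f"PII fields are forbidden in logs: {leaked}")
    fields["ts"] = datetime.now(timezone.utc).isoformat()
    sys.stdout.write(json.dumps(fields) + "\n")

log_event(
    service="grading-api",
    level="info",
    request_id="req_a4f2c1",
    user_id="usr_2103",  # opaque id, not a name
    event="submission.opened",
    subject_id="sub_1234",
    duration_ms=187,
)
```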
Traces
Each request gets a trace ID. Spans nest within the trace. Sampling is intelligent: every error trace is captured; healthy traces are sampled at a low rate.
Traces show the request's path. Did this request hit the LMS adapter? Yes. Did the LMS adapter respond in time? 124ms. Did the response normalise correctly? Yes.
A team that uses traces resolves incidents faster. A team that doesn't reads logs sequentially and reconstructs the request path mentally, which is slower and more error-prone.
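The sampling rule above can be made concrete as a tail-sampling decision over a completed trace: keep everything that errored, keep a small fraction of the rest. A sketch, assuming a simplified Span shape and a 1% healthy-trace rate; neither detail is from the corpus.

```python
import random
from dataclasses import dataclass

@dataclass
class Span:
    name: str
    duration_ms: float
    error: bool = False

HEALTHY_SAMPLE_RATE = 0.01  # assumed: keep 1% of healthy traces

def keep_trace(spans: list[Span]) -> bool:
    """Tail-sampling decision: every error trace is captured;
    healthy traces are kept at a low fixed rate."""
    if any(span.error for span in spans):
        return True
    return random.random() < HEALTHY_SAMPLE_RATE

trace = [
    Span("GET /submissions/1234", 187),
    Span("lms-adapter.fetch", 124),
    Span("response.normalise", 9),
]
print(keep_trace(trace))  # False ~99% of the time; always True on errors
```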
Metrics
Counters and gauges and histograms. Aggregated. Cheap per data point.
The corpus's standard set, per service:
```
http_request_duration_ms{endpoint, method, status}  histogram
http_requests_total{endpoint, method, status}       counter
http_requests_in_flight{endpoint}                   gauge
job_duration_ms{queue, type, outcome}               histogram
job_queue_depth{queue}                              gauge
db_query_duration_ms{query_class}                   histogram
db_connections_active{}                             gauge
flag_evaluations_total{flag, outcome}               counter
flag_evaluation_duration_ms{flag}                   histogram
```

Plus the Epic-specific metrics named in the brief.
Cardinality is managed: labels are bounded. A label per username is forbidden, because it explodes cardinality. A label per endpoint is fine.
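A sketch of registering part of the standard set, assuming the prometheus_client Python library. The bucket edges and the route-template value of the endpoint label are assumptions; the label sets match the list above.

```python
from prometheus_client import Counter, Gauge, Histogram

# Bounded labels only: a route template, never a raw path or a username.
HTTP_REQUEST_DURATION_MS = Histogram(
    "http_request_duration_ms",
    "HTTP request duration in milliseconds",
    ["endpoint", "method", "status"],
    buckets=[10, 50, 100, 250, 500, 1000, 5000],  # assumed edges
)
HTTP_REQUESTS_TOTAL = Counter(
    "http_requests_total",
    "Total HTTP requests served",
    ["endpoint", "method", "status"],
)
HTTP_REQUESTS_IN_FLIGHT = Gauge(
    "http_requests_in_flight",
    "Requests currently being served",
    ["endpoint"],
)

# Inside a request handler:
labels = dict(endpoint="/submissions/{id}", method="GET", status="200")
HTTP_REQUESTS_IN_FLIGHT.labels(endpoint=labels["endpoint"]).inc()
HTTP_REQUEST_DURATION_MS.labels(**labels).observe(187)
HTTP_REQUESTS_TOTAL.labels(**labels).inc()
HTTP_REQUESTS_IN_FLIGHT.labels(endpoint=labels["endpoint"]).dec()
```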
Product analytics events
The named-action signals. The brief names them; the implementation emits them.
Brief: 'When Gal opens a submission, we want to know.'
Event: submission.opened
Properties: submission_id, grader_id, ts, duration_to_open_ms

Brief: 'When Gal saves a grade, we want to know.'
Event: submission.graded
Properties: submission_id, grader_id, ts, score_count, total_time_ms

Events use domain language. They follow subject.verb format. They are versioned conservatively: a property name doesn't change without a migration story.
These events feed Volume V's signal reading. The prediction 'Gal completes a grading cycle in under 15 minutes' is checked against submission.graded.total_time_ms. The instrumentation is in place by release-gate time.
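A sketch of an emitter that enforces the subject.verb convention at the call site; the emit_event helper and its print transport are hypothetical stand-ins for the real analytics pipeline.

```python
import re
import time

# Event names follow subject.verb, lowercase, in domain language.
EVENT_NAME = re.compile(r"[a-z_]+\.[a-z_]+")

def emit_event(name: str, **properties):
    """Emit one product analytics event (hypothetical helper)."""
    if not EVENT_NAME.fullmatch(name):
        raise ValueError(f"event name must be subject.verb: {name!r}")
    payload = {"event": name, "ts": time.time(), **properties}
    print(payload)  # stand-in for the real transport

emit_event(
    "submission.graded",
    submission_id="sub_1234",
    grader_id="usr_2103",
    score_count=4,
    total_time_ms=612_000,  # feeds the under-15-minutes prediction check
)
```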
Alerts
Alerts are derived from metrics, not logs. They fire when an SLO threshold is crossed.
```yaml
- alert: GradingApiHighErrorRate
  expr: |
    sum(rate(http_requests_total{service="grading-api",status=~"5.."}[5m]))
      / sum(rate(http_requests_total{service="grading-api"}[5m])) > 0.01
  for: 5m
  labels:
    severity: P1
  annotations:
    runbook: https://runbooks.200apps.example/grading-flow-high-error-rate
    message: "grading-api error rate above 1% for 5 minutes"
```

Each alert has a runbook link. An alert without a runbook is an alert that produces panic, not action.
Dashboards
Two kinds.
- Service dashboard — for the on-call. Latency, error rate, saturation. Read at every sync.
- Feature dashboard — for the PO. Adoption, completion, error encounter rate, prediction-relevant metrics. Read at every signal reading.
The dashboards are versioned in code (or whatever the platform supports). They live next to the service. Changes to dashboards go through review like code.
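What 'versioned in code' can look like: a declarative definition that lives in the service's repository and changes only through review. A sketch; the schema, file path, and the histogram metric name are illustrative, not a specific platform's format.

```python
# dashboards/grading_feature.py -- lives next to the service,
# changed only through review, like any other code.
FEATURE_DASHBOARD = {
    "title": "Grading flow (feature)",
    "audience": "PO",
    "panels": [
        {
            "title": "Grading cycle time, p90",
            # hypothetical histogram; feeds the prediction check
            "query": "histogram_quantile(0.9, sum(rate("
                     "submission_graded_total_time_ms_bucket[1d])) by (le))",
        },
        {
            "title": "Error encounter rate",
            "query": 'sum(rate(http_requests_total{service="grading-api",'
                     'status=~"5.."}[1h]))',
        },
    ],
}
```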
What gets instrumented
The corpus rule: instrument what the brief named. If the prediction is checked against time-to-grade, the instrumentation that captures time-to-grade is part of the cycle. It is not deferred.
Instrumentation that isn't needed for the prediction or for operations isn't added. The corpus is opinionated against premature observability — too many metrics make the right ones harder to find.
The signal feeds Volume V
The whole observability stack exists to make Volume V's check possible. The check date arrives. The PO opens the feature dashboard. The metric is there. The check is straightforward.
A team that arrives at the check date and discovers the metric isn't instrumented has discovered, late, that the chain skipped a step. The corpus's discipline: instrument with the feature, not after.
What this volume produces, in one sentence
Volume IV carries the prediction from a signed brief through code, test, and release — with the language preserved, the trunk integrated, the gate held, the flag wrapped, the rollback rehearsed, and the instrumentation in place by the time the flag flips.