Architecture & Technical Design · master area
Data Flow Design
How data crosses system edges — sources, transformations, persistence, consumers. The big-picture diagram that makes integrations and ETL legible before the first pipeline runs in anger.
Owners: Tech Lead, Developer
Phase it lives in: What We Build (Volume III)
The corpus principle this enacts: Translation lives at the boundary, not throughout the codebase.
Where it lives in the chain
- What We Build · Sequence, Schema, API — the canon
How to do this
A data-flow diagram answers five questions for every piece of data:
- Source — where does it originate? User input, third-party event, internal system, batch job.
- Transformations — what happens between source and destination? Validation, enrichment, aggregation, anonymisation.
- Persistence — where does it land? OLTP database, data warehouse, event log, cache, file storage.
- Consumers — who reads it? UI, reports, downstream services, analytics, regulators.
- Lifetime — how long does it live? Forever, 30 days, until the next batch, transient.
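The five questions above can be kept as one record per piece of data, alongside the diagram. This is a minimal sketch, not a prescribed format: the `DataFlow` class, the `unanswered` helper, and the example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class DataFlow:
    """One inventory entry per piece of data (hypothetical structure)."""
    name: str
    source: str                       # where it originates
    transformations: Tuple[str, ...]  # what happens on the way
    persistence: str                  # where it lands
    consumers: Tuple[str, ...]        # who reads it
    lifetime: str                     # how long it lives


def unanswered(flow: DataFlow) -> List[str]:
    """Return the names of the fields that were left empty."""
    return [f.name for f in fields(flow) if not getattr(flow, f.name)]


# Illustrative entries only; none of these values come from a real system.
submission = DataFlow(
    name="submission",
    source="user input",
    transformations=("validation", "anonymisation"),
    persistence="OLTP database",
    consumers=("UI", "reports"),
    lifetime="30 days",
)
clickstream = DataFlow(
    name="clickstream",
    source="third-party event",
    transformations=(),
    persistence="event log",
    consumers=(),
    lifetime="",
)

print(unanswered(submission))
print(unanswered(clickstream))
```

An empty field is treated as an unanswered question. If a flow genuinely has no transformation, recording an explicit value such as "none" keeps the entry honest.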
What good practice looks like
The diagram lives next to the bounded-context map. Each line in the diagram is annotated with the schema (a link to the contract), the cadence (real-time, hourly, nightly), and the failure behaviour (drop, retry, dead-letter, alert). For PII or regulated data, the line also names the legal basis (consent, contract, legitimate interest) and the retention rule.
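One way to keep these annotations checkable is to store each line of the diagram as data and flag what is missing. The sketch below is an assumption about how that could look, not an established format: `FlowEdge`, `annotation_gaps`, the `contains_pii` flag, and the `example.com` contract URLs are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Cadence(Enum):
    REAL_TIME = "real-time"
    HOURLY = "hourly"
    NIGHTLY = "nightly"


class OnFailure(Enum):
    DROP = "drop"
    RETRY = "retry"
    DEAD_LETTER = "dead-letter"
    ALERT = "alert"


class LegalBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGITIMATE_INTEREST = "legitimate interest"


@dataclass(frozen=True)
class FlowEdge:
    """One annotated line of the data-flow diagram (hypothetical structure)."""
    source: str
    destination: str
    schema_url: str                  # link to the contract
    cadence: Cadence
    on_failure: OnFailure
    contains_pii: bool = False       # assumed flag for PII or regulated data
    legal_basis: Optional[LegalBasis] = None
    retention: Optional[str] = None


def annotation_gaps(edge: FlowEdge) -> List[str]:
    """List the annotations the edge still lacks."""
    gaps = []
    if not edge.schema_url:
        gaps.append("schema link")
    if edge.contains_pii and edge.legal_basis is None:
        gaps.append("legal basis")
    if edge.contains_pii and not edge.retention:
        gaps.append("retention rule")
    return gaps


# Illustrative edges; names and URLs are placeholders.
complete = FlowEdge(
    source="signup-service",
    destination="warehouse",
    schema_url="https://example.com/contracts/signup-v1",
    cadence=Cadence.NIGHTLY,
    on_failure=OnFailure.DEAD_LETTER,
    contains_pii=True,
    legal_basis=LegalBasis.CONSENT,
    retention="30 days",
)
incomplete = FlowEdge(
    source="signup-service",
    destination="analytics",
    schema_url="",
    cadence=Cadence.HOURLY,
    on_failure=OnFailure.RETRY,
    contains_pii=True,
)

print(annotation_gaps(complete))
print(annotation_gaps(incomplete))
```

Using enums for cadence, failure behaviour, and legal basis restricts each annotation to the options named above, so a typo fails when the edge is defined rather than at audit time.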
A team that doesn't draw the data flow ends up with shadow ETLs — pipelines invented inside services for local reasons, duplicating each other, with no shared definition of what counts as a Submission across the warehouse. The cost shows up at the first compliance audit, the first analytics question that crosses two services, or the first PII deletion request.
Related crafts
- Integration Design — the third-party edges
- Schema Design — where the data lands
- Bounded Context Mapping — where the language changes along the flow