From 06e3c2e3e6248b3ead8ffcf2045e022b10f6427a Mon Sep 17 00:00:00 2001
From: Liam Wynne
Date: Thu, 7 May 2026 13:03:09 +1000
Subject: [PATCH] docs: add collapsible "How it works" section to README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a collapsible `details` block at the end of Getting started covering
the file format, folder layout, and lifecycle. Renders as expand-on-click
on GitHub; degrades to a long visible section if the marketplace renderer
strips the wrapper. Single source of truth: no separate doc file.

Covers: the two-file mental model (model YAML + domain JSON), the on-disk
folder layout, anatomy of a model file and a domain file with
field-reference tables, the Logical/Physical stage distinction with
cardinality inference rules, AI harness install targets and the edit
guard, auto-generated outputs (selectors.yml, discrepancy reports), and
when to edit by hand vs the AI vs the canvas.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md | 215 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 215 insertions(+)

diff --git a/README.md b/README.md
index 9d86ec5..b25d7f4 100644
--- a/README.md
+++ b/README.md
@@ -119,6 +119,221 @@ The harness installs the right file for your assistant:
 The baseline harness teaches your assistant the domain format and sync workflow, with a guard that blocks AI edits to `erd-studio/` until the spec is loaded. Layer your own skills, prompts, and style guides on top so the generated dbt reflects your team's conventions.
+
+
+How it works — file format, folders, and lifecycle  (click to expand)
+
+A guide to the file format and lifecycle, so you can edit by hand, debug AI output, and understand what the canvas is showing.
+
+### The mental model
+
+ERD Studio stores your semantic model as plain-text files in your dbt repo. The visual canvas is a renderer for those files: you can edit them in any text editor and the canvas updates, or edit on the canvas and the files update.
+
+Two kinds of file do all the work:
+
+- **Model YAML**: one file per table. Describes the table itself: columns, types, grain, role.
+- **Domain JSON**: one file per diagram. References models by name, draws relationships between them, and stores canvas positions.
+
+A model is reusable across diagrams. `dim_customer` can appear in both your "Sales" and "Marketing" domains, but there's still only one YAML file for it. A domain is a single ERD view.
+
+### Folder layout
+
+After setup, your dbt project gets:
+
+```
+your-dbt-project/
+├── erd-studio/
+│   ├── layers.json              # Layer definitions (silver, gold, etc.)
+│   ├── logical-models/          # Model definitions, one YAML per table
+│   │   ├── dim_customer.yml
+│   │   ├── dim_project.yml
+│   │   └── fct_order.yml
+│   ├── silver/                  # Domains in the silver layer
+│   │   ├── customer-360.json
+│   │   └── orders.json
+│   ├── gold/                    # Domains in the gold layer
+│   │   └── reporting.json
+│   └── templates/               # Optional starting points for new models
+│       ├── dimension.json
+│       ├── fact.json
+│       ├── bridge.json
+│       └── scd2.json
+├── selectors.yml                # Auto-generated dbt selectors (one per domain)
+└── target/manifest.json         # Your dbt build artifact; ERD Studio reads it
+```
+
+Layer folders match what's in `layers.json`. You're not stuck with `silver`/`gold`: rename them to fit your warehouse (`staging`/`mart`, `bronze`/`silver`/`gold`/`platinum`, or your own scheme).
+
+### Anatomy of a model file
+
+`erd-studio/logical-models/dim_customer.yml`:
+
+```yaml
+name: dim_customer
+schema: silver
+description: Customer master, deduplicated and SCD2-tracked.
+grain: One row per customer (current + history)
+modelRole: conformed-dim
+columns:
+  - name: customer_sk
+    dataType: BIGINT
+    description: Surrogate key
+    isPrimaryKey: true
+  - name: customer_id
+    dataType: VARCHAR
+    description: Source system identifier
+    isNaturalKey: true
+    scdType: 1
+  - name: email
+    dataType: VARCHAR
+    scdType: 2
+```
+
+| Field | Purpose |
+|---|---|
+| `grain` | The "one row per ___" statement. The single most important design decision. |
+| `modelRole` | Architecture role: `conformed-dim`, `domain-dim`, `transaction-fact`, `periodic-snapshot`, `accumulating-snapshot`, `factless-fact`, `bridge`, `reference`. |
+| `isPrimaryKey` | Surrogate or business PK. |
+| `isNaturalKey` | Business identifier (email, SKU, source system ID). |
+| `isForeignKey` | Design intent, tracked separately from the relationships you draw in the domain JSON. |
+| `scdType` | `0` fixed, `1` overwrite, `2` track history. |
+| `additiveType` | For fact measures: `additive`, `semi-additive`, `non-additive`. |
+| `rationale` | Optional. Free-text fields capturing *why*: `purpose`, `design`, `grainChoice`, `roleChoice`, `scdStrategy`, `measures`. |
+
+### Anatomy of a domain file
+
+`erd-studio/silver/orders.json`:
+
+```json
+{
+  "schemaVersion": 5,
+  "domain": "orders",
+  "layer": "silver",
+  "description": "Order transactions and project dimensions",
+  "logical": {
+    "models": ["fct_order", "dim_project"],
+    "relationships": [
+      {
+        "fromModel": "fct_order",
+        "fromColumn": "project_id",
+        "toModel": "dim_project",
+        "toColumn": "project_id",
+        "cardinality": "many-to-one"
+      }
+    ]
+  },
+  "viewConfig": {
+    "positions": {
+      "fct_order": { "x": 100, "y": 200 },
+      "dim_project": { "x": 400, "y": 100 }
+    }
+  }
+}
+```
+
+The thing to notice: `logical.models` is an array of **strings**, the names of files in `logical-models/`. The domain doesn't duplicate columns; it just references the model.
+This is why you can edit `dim_customer.yml` once and have every domain that includes it pick up the change.
+
+| Field | Purpose |
+|---|---|
+| `schemaVersion: 5` | Required. The current format version. |
+| `logical.models[]` | Names matching `logical-models/{name}.yml`. |
+| `logical.relationships[]` | FK relationships drawn between models in this diagram. |
+| `viewConfig.positions` | Canvas x/y per model. Updated when you drag. |
+| `viewConfig.annotations` | Sticky-note "build notes" you can pin to the canvas. |
+| `stubColumns[]` | Optional. Suppresses "missing physical columns" warnings for conformed dimensions where you only model the keys. |
+
+### The two stages
+
+Open a domain on the canvas and you'll see a **Logical / Physical** toggle.
+
+- **Logical stage**: what you and the AI designed. Reads from `logical-models/*.yml` and the domain JSON.
+- **Physical stage**: what dbt actually built. Derived at runtime from `target/manifest.json` and your existing dbt schema YAMLs (`models/**/*.yml`). **Nothing is written to disk for the Physical stage**; it's recomputed from the manifest each time you switch.
+
+Cardinality on the Physical stage is inferred from your existing dbt tests:
+
+| dbt test on FK side | dbt test on PK side | Cardinality shown |
+|---|---|---|
+| no `unique` test | `unique` | many-to-one |
+| `unique` | `unique` | one-to-one |
+| `unique_combination_of_columns` | `unique_combination_of_columns` | composite key |
+
+So an existing dbt project with decent test coverage gets a useful Physical stage on day one, with no extra config.
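The inference table above is small enough to express as a lookup. The sketch below is a hypothetical illustration in Python, not ERD Studio's actual implementation: it assumes you have already collected the test names attached to each side of a relationship (for example from your dbt schema YAMLs), and the function names are made up for this example. It strips a package prefix so that `dbt_utils.unique_combination_of_columns` matches the bare name used in the table.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Hypothetical sketch of the README's inference table; not ERD Studio's code.
COMPOSITE = "unique_combination_of_columns"


def _test_names(tests: Iterable[str]) -> set[str]:
    # Drop a package prefix so "dbt_utils.unique_combination_of_columns"
    # and "unique_combination_of_columns" compare equal.
    return {t.rsplit(".", 1)[-1] for t in tests}


def infer_cardinality(fk_tests: Iterable[str], pk_tests: Iterable[str]) -> Optional[str]:
    """Map the dbt tests on each side of a relationship to a cardinality label."""
    fk, pk = _test_names(fk_tests), _test_names(pk_tests)
    if COMPOSITE in fk and COMPOSITE in pk:
        return "composite key"
    if "unique" in pk:
        # Unique on both sides means at most one row per key on each side.
        return "one-to-one" if "unique" in fk else "many-to-one"
    return None  # no row of the table applies


print(infer_cardinality([], ["unique", "not_null"]))
```

Here the PK side carries `unique` and the FK side does not, so the call prints `many-to-one`. Returning `None` rather than guessing keeps "no test coverage" distinct from a real inference.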
+
+### How the AI is wired in
+
+When you run **Install AI Coding Harness**, ERD Studio writes the schema spec to the location your assistant reads by default:
+
+| Assistant | File written |
+|---|---|
+| Claude Code | `.claude/skills/erd-studio/SKILL.md` |
+| GitHub Copilot | `.github/instructions/erd-studio.instructions.md` |
+| Google Gemini | `.gemini/styleguide.md` |
+| OpenAI Codex | `AGENTS.md` (appended) |
+
+The spec teaches the AI:
+
+- Which file to edit for each operation (column changes go in the YAML; relationship changes go in the JSON)
+- The naming conventions (`dim_`, `fct_`, `ref_`, `brg_` prefixes for dimensions, facts, references, and bridges)
+- The full field reference (every key documented above)
+
+For Claude Code, the install also adds a **PreToolUse hook** to `.claude/settings.local.json` that blocks the first edit to any file under `erd-studio/` in a session until the assistant has loaded the skill. No half-read spec, no drift.
+
+The harness embeds a version marker. When you upgrade ERD Studio, the extension detects out-of-date harness files and prompts you to update them.
+
+### Auto-generated outputs
+
+Two outputs are generated for you; don't hand-edit the generated parts.
+
+**`selectors.yml`** lives at the dbt project root. ERD Studio writes one selector per domain:
+
+```yaml
+selectors:
+  - name: domain_silver_orders
+    definition:
+      union:
+        - method: fqn
+          value: fct_order
+        - method: fqn
+          value: dim_project
+```
+
+So `dbt run --selector domain_silver_orders` runs every model in your "orders" diagram. The file is regenerated whenever a domain changes. Selectors you write yourself (anything not prefixed `domain_`) are preserved across regenerations.
+
+**Discrepancy reports**: toggle "Compare to Physical" on the canvas. ERD Studio compares the Logical stage (the source) with the Physical stage (the target) and overlays the result:
+
+| Status | Meaning |
+|---|---|
+| Matched | Same model / column / relationship on both sides |
+| Extra | In source, not in target (e.g. you designed it, dbt hasn't built it) |
+| Missing | In target, not in source (e.g. dbt has it, your model doesn't) |
+| Type mismatch | Same column, different `dataType` |
+| Cardinality mismatch | Same relationship, different cardinality |
+
+Type comparison normalises common aliases (`varchar`/`string`, `int`/`integer`, `timestamp_ntz`/`timestamp`, etc.) so equivalent types don't show as mismatches.
+
+For larger drift, the AI can generate a **sync plan** at `erd-studio/.sync-plan.json` that maps every discrepancy to a concrete action (`add-to-logical`, `update-type-in-physical`, etc.). You pick the source of truth per item; the AI executes it.
+
+### Editing by hand vs by AI vs on the canvas
+
+All three write to the same files. Pick whichever fits the task.
+
+| If you want to… | Easiest path |
+|---|---|
+| Add a column to one model | Edit the YAML directly. |
+| Reshape a domain (add models, draw relationships) | The canvas. |
+| Backfill many models from sources | Ask the AI ("read the bronze layer and draft a star schema for orders"). |
+| Document a design decision | Edit the YAML's `rationale` field. |
+| Change a cardinality on the diagram | The canvas, or the JSON's `relationships[]`. |
+
+VS Code dirty-state, undo/redo, and git all work as you'd expect, because every write goes through VS Code's `WorkspaceEdit` API.
+
+That's the whole system. Three folders, two file types, one diagram per JSON, one model per YAML.
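The discrepancy statuses and type normalisation described under *Auto-generated outputs* boil down to a set comparison plus an alias lookup. Below is a hypothetical Python sketch covering columns only (relationships and cardinality mismatches are left out). The alias map holds just the three pairs the README lists; ERD Studio's own list covers more (the README says "etc."), and the function names are invented for illustration.

```python
from __future__ import annotations

# Hypothetical alias map built only from the README's examples.
TYPE_ALIASES = {
    "string": "varchar",
    "integer": "int",
    "timestamp_ntz": "timestamp",
}


def normalise_type(data_type: str) -> str:
    # Case-fold, then collapse known aliases onto one canonical name.
    t = data_type.strip().lower()
    return TYPE_ALIASES.get(t, t)


def compare_columns(logical: dict[str, str], physical: dict[str, str]) -> dict[str, str]:
    """Classify each column as Matched / Extra / Missing / Type mismatch.

    Both arguments map column name -> dataType. Logical is the source and
    Physical the target, as in the status table.
    """
    result: dict[str, str] = {}
    for name in sorted(logical.keys() | physical.keys()):
        if name not in physical:
            result[name] = "Extra"  # designed, but dbt hasn't built it
        elif name not in logical:
            result[name] = "Missing"  # dbt has it, the model doesn't
        elif normalise_type(logical[name]) != normalise_type(physical[name]):
            result[name] = "Type mismatch"
        else:
            result[name] = "Matched"
    return result


logical = {"customer_sk": "BIGINT", "email": "VARCHAR", "loyalty_tier": "VARCHAR"}
physical = {"customer_sk": "bigint", "email": "string", "created_at": "timestamp_ntz"}
print(compare_columns(logical, physical))
```

With these inputs, `email` counts as Matched because `string` normalises to `varchar`, `loyalty_tier` is Extra, and `created_at` is Missing. Doing the alias lookup before comparing is what keeps equivalent warehouse spellings out of the mismatch list.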
+
 ---