|
| 1 | +# Schemas |
| 2 | + |
| 3 | +Formal JSON Schema (draft 2020-12) definitions for the canonical slide specs |
| 4 | +and handshake files proposed in [`../v0.2-scope.md`](../v0.2-scope.md). |
| 5 | + |
| 6 | +These schemas turn the v0.2 proposal into a machine-checkable artifact. A tool |
| 7 | +that wants to speak the canonical format does not need to read any prose — it |
| 8 | +needs to validate against these files. An LLM that wants to generate specs |
| 9 | +can be grounded on them directly. A reviewer who wants to know "is this |
| 10 | +proposal concrete enough to react to" gets a definitive answer by running a |
| 11 | +validator against the examples. |
| 12 | + |
| 13 | +## What is here |
| 14 | + |
| 15 | +### Template schemas (5, canonical core) |
| 16 | + |
| 17 | +| File | Canonical name | Status in v0.1 | Status in v0.2 | |
| 18 | +|---|---|---|---| |
| 19 | +| [`financial_summary.schema.json`](financial_summary.schema.json) | `financial_summary` | Implemented as `render_financial_summary` | Schema matches current shape, plus optional `source_refs` and `unit` | |
| 20 | +| [`trading_comps.schema.json`](trading_comps.schema.json) | `trading_comps` | Implemented as `render_trading_comps` | Schema matches current shape, plus optional `source_refs` and `unit` | |
| 21 | +| [`transaction_comps.schema.json`](transaction_comps.schema.json) | `transaction_comps` | **Not implemented** | **New template.** Schema documents the proposed shape; renderer implementation is a v0.2 deliverable. | |
| 22 | +| [`sensitivity.schema.json`](sensitivity.schema.json) | `sensitivity` | Implemented as `render_sensitivity` | Schema matches current shape, plus optional 2D `source_refs` grid | |
| 23 | +| [`operating_metrics.schema.json`](operating_metrics.schema.json) | `operating_metrics` | Implemented as `render_dual_chart` | Canonical rename. `template` field accepts both `operating_metrics` and `dual_chart` (the latter as a deprecation alias that prints a one-line warning). | |
| 24 | + |
| 25 | +The remaining 8 templates in the repo (`cover`, `section_divider`, `toc`, |
| 26 | +`exec_summary`, `investment_highlights`, `stacked_bar_table`, `football_field`, |
| 27 | +`sources_uses`) are **reference extensions**. They are not part of the |
| 28 | +canonical core and do not have formal schemas in this directory. They remain |
| 29 | +available through the renderer and are documented in |
| 30 | +[`../../ib-deck-engine/skills/ib-deck-engine/reference/`](../../ib-deck-engine/skills/ib-deck-engine/reference/). |
| 31 | + |
| 32 | +### Handshake file schemas (4, plus one shared fragment) |
| 33 | + |
| 34 | +| File | Purpose | |
| 35 | +|---|---| |
| 36 | +| [`source_refs.schema.json`](source_refs.schema.json) | Shared schema fragment for the `source_refs` arrays used by every template schema. Defines the four accepted forms: `null` (derived), bare cell string, explicit cell object, range object with aggregator. Referenced via `$ref` from each template schema. | |
| 37 | +| [`provenance.schema.json`](provenance.schema.json) | Workbook-to-canonical-metric map produced by `/ib-import-excel`. Lives alongside the workbook file. | |
| 38 | +| [`import_config.schema.json`](import_config.schema.json) | Workspace-local variant of provenance. Functionally equivalent; chosen when the analyst lacks write permission to the workbook's directory. | |
| 39 | +| [`deck_exceptions.schema.json`](deck_exceptions.schema.json) | Durable record of reviewed intentional discrepancies. Consumed by `/ib-audit` to downgrade matched findings from FAIL to INFO. | |
| 40 | +| [`audit_report.schema.json`](audit_report.schema.json) | Machine-readable output of `/ib-audit`. Every audit run produces exactly one of these. | |
| 41 | + |
| 42 | +## Versioning |
| 43 | + |
| 44 | +These schemas are the **v0.2 proposal** and will remain at this version for |
| 45 | +as long as they are actively being reviewed. Once the proposal is either |
| 46 | +accepted or iterated on meaningfully, a versioned subdirectory (e.g., |
| 47 | +`docs/schemas/v0.3/`) will be introduced. Until then, this directory holds |
| 48 | +exactly one version of each schema and the `$id` fields point at the `main` |
| 49 | +branch. |
| 50 | + |
| 51 | +**Breaking change policy:** a breaking change to any schema requires bumping |
| 52 | +the version and moving the old files to a versioned subdirectory so existing |
| 53 | +consumers are not silently broken. Additive changes (new optional properties, |
| 54 | +new enum values) do not require a version bump. |
| 55 | + |
| 56 | +## Using the schemas |
| 57 | + |
| 58 | +### Validation with Python's jsonschema |
| 59 | + |
| 60 | +```python |
| 61 | +import json |
| 62 | +from jsonschema import Draft202012Validator |
| 63 | + |
| 64 | +with open("docs/schemas/financial_summary.schema.json") as f: |
| 65 | +    schema = json.load(f) |
| 66 | + |
| 67 | +with open("my_slide.spec.json") as f: |
| 68 | +    spec = json.load(f) |
| 69 | + |
| 70 | +validator = Draft202012Validator(schema) |
| 71 | +errors = sorted(validator.iter_errors(spec), key=lambda e: e.path) |
| 72 | + |
| 73 | +if errors: |
| 74 | +    for e in errors: |
| 75 | +        print(f" {list(e.path)}: {e.message}") |
| 76 | +else: |
| 77 | +    print("OK") |
| 78 | +``` |
| 79 | + |
| 80 | +### Resolving cross-schema `$ref` |
| 81 | + |
| 82 | +The template schemas reference `source_refs.schema.json#/$defs/sourceRefArray` |
| 83 | +via a relative `$ref`. A relative `$ref` resolves against the referring |
| 84 | +schema's `$id`, not its location on disk, so a validator that does not fetch |
| 85 | +remote references needs the local copy registered under its absolute `$id` URI |
| 86 | +(these point at raw GitHub URLs). In Python, see `referencing.Registry`. |
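
To make the resolution step concrete, here is a minimal standard-library sketch of how a relative `$ref` is turned into an absolute URI and looked up. It is not how any particular validator is implemented. The `example.invalid` base URL, the `resolve_ref` helper, and the stand-in body of `sourceRefArray` are all assumptions for illustration; the real `$id` values live in the schema files.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical base URL; the real $id values point at raw GitHub URLs.
BASE = "https://example.invalid/docs/schemas/"

# A store keyed by absolute $id, which is what an explicit resolver needs.
schemas = {
    BASE + "source_refs.schema.json": {
        "$id": BASE + "source_refs.schema.json",
        # Stand-in definition, not the real sourceRefArray.
        "$defs": {"sourceRefArray": {"type": "array"}},
    },
    BASE + "financial_summary.schema.json": {
        "$id": BASE + "financial_summary.schema.json",
        "properties": {
            "source_refs": {"$ref": "source_refs.schema.json#/$defs/sourceRefArray"}
        },
    },
}


def resolve_ref(referrer_id, ref, store):
    """Resolve a (possibly relative) $ref against the referrer's $id."""
    absolute = urljoin(referrer_id, ref)
    doc_uri, fragment = urldefrag(absolute)
    node = store[doc_uri]
    # Walk the JSON Pointer in the fragment (object keys only in this sketch).
    for token in fragment.split("/")[1:]:
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return absolute, node


absolute, node = resolve_ref(
    BASE + "financial_summary.schema.json",
    "source_refs.schema.json#/$defs/sourceRefArray",
    schemas,
)
print(absolute)
```

The lookup key is the absolute URI derived from `$id`, which is why a tool that keeps local copies has to register them under that URI rather than under a file path.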
| 87 | + |
| 88 | +### What the schemas do *not* enforce |
| 89 | + |
| 90 | +Some rules in [`../v0.2-scope.md`](../v0.2-scope.md) cannot be expressed in |
| 91 | +JSON Schema alone. These need to be enforced by the validator or the audit |
| 92 | +layer, not by schema validation: |
| 93 | + |
| 94 | +- **Parallel array length invariants.** `source_refs[i]` must be parallel to |
| 95 | + `values[i]`. JSON Schema cannot express "array A has the same length as |
| 96 | + array B at the same nesting level." The validator must check this |
| 97 | + separately. |
| 98 | +- **`headers` length vs `rows[i].values` length.** Every row's values array |
| 99 | + must have `len(headers) - 1` entries. Not expressible in JSON Schema. |
| 100 | +- **Unique canonical metric names within a provenance file.** Not expressible |
| 101 | + in JSON Schema 2020-12 alone; enforcing it requires custom logic. |
| 102 | +- **`base_row` / `base_col` within data dimensions.** Sensitivity's base-case |
| 103 | + indices must be within the row/column header arrays. Schema validates the |
| 104 | + types but not the bounds. |
| 105 | +- **Audit check semantics.** The 10 checks in v0.2-scope.md §6 are documented |
| 106 | + by the `audit_report.schema.json` `check` enum but their behavior is a |
| 107 | + validator responsibility, not a schema responsibility. |
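
The first two invariants in the list above could be checked along these lines. This is a sketch under assumptions, not the project's validator: the function name `check_table_invariants` is hypothetical, and it assumes a table spec with top-level `headers` and `rows`, where each row has `values` and an optional `source_refs` (`null` meaning the row is derived).

```python
def check_table_invariants(spec):
    """Return a list of problems that schema validation alone would miss."""
    problems = []
    # Per the rule above, each row carries len(headers) - 1 values.
    expected = len(spec["headers"]) - 1
    for i, row in enumerate(spec["rows"]):
        values = row["values"]
        if len(values) != expected:
            problems.append(
                f"rows[{i}].values has {len(values)} entries, expected {expected}"
            )
        refs = row.get("source_refs")
        # source_refs may be absent or null; only check length when present.
        if refs is not None and len(refs) != len(values):
            problems.append(
                f"rows[{i}].source_refs has {len(refs)} entries, "
                f"but values has {len(values)}"
            )
    return problems


good = {
    "headers": ["Metric", "FY2023", "FY2024"],
    "rows": [{"values": ["1", "2"], "source_refs": ["B2", None]}],
}
bad = {
    "headers": ["Metric", "FY2023", "FY2024"],
    "rows": [{"values": ["1"], "source_refs": ["B2", "C2"]}],
}
print(check_table_invariants(good))
print(check_table_invariants(bad))
```

Returning a list of messages rather than raising on the first problem mirrors the `iter_errors` style used for schema validation, so both kinds of finding can be reported together.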
| 108 | + |
| 109 | +The intention is: schemas catch shape errors at authoring time; the audit |
| 110 | +layer and validator catch semantic errors at run time. |
| 111 | + |
| 112 | +### Value type conventions |
| 113 | + |
| 114 | +The canonical templates use two different conventions for storing values. This |
| 115 | +is a current-state observation, not an aspiration. |
| 116 | + |
| 117 | +- **Table templates** (`financial_summary`, `trading_comps`, `transaction_comps`, |
| 118 | + `sensitivity`) store values as **pre-formatted display strings**. The caller |
| 119 | + is responsible for formatting (`"1,058,651"`, `"32.5%"`, `"$3.83"`) before |
| 120 | + handing the spec to the renderer. |
| 121 | +- **Chart templates** (`operating_metrics`) store values as **numbers** (because |
| 122 | + bar heights are derived from them). The `secondary_values` array in |
| 123 | + `operating_metrics` stores pre-formatted strings for the display row below |
| 124 | + the bars. |
| 125 | + |
| 126 | +A future version could unify this by allowing numeric values everywhere and |
| 127 | +having the renderer format them via a declared `unit`. That is a v0.3 |
| 128 | +question and is not part of this proposal. |
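
As an illustration of the caller-side formatting that table templates expect, the display strings quoted above can be produced with Python format specs. The helper names are hypothetical and the chosen precision is an assumption; the renderer does not prescribe them.

```python
def fmt_int(n):
    """Thousands-separated integer, e.g. for revenue rows."""
    return f"{n:,.0f}"


def fmt_pct(x):
    """Fraction rendered as a percentage with one decimal place."""
    return f"{x:.1%}"


def fmt_usd(x):
    """Dollar amount with two decimal places, e.g. for per-share values."""
    return f"${x:.2f}"


# Table templates receive the formatted strings, not the raw numbers.
values = [fmt_int(1058651), fmt_pct(0.325), fmt_usd(3.83)]
print(values)
```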
| 129 | + |
| 130 | +### Audit integration |
| 131 | + |
| 132 | +A spec author who wants the audit to run against their slide should include: |
| 133 | + |
| 134 | +1. `source_workbook` and `source_sheet_default` at the top of the spec |
| 135 | +2. A `source_refs` array parallel to each row's `values` array |
| 136 | +3. A `unit` declaration on each row (for the unit_mismatch check) |
| 137 | + |
| 138 | +Any of the three can be omitted. When any is omitted, the audit emits |
| 139 | +WARN-level findings rather than FAIL. Partial adoption is a design goal. |
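
A spec carrying all three audit inputs might look roughly like the sketch below. The workbook name, sheet name, cell addresses, and unit strings are made-up placeholders, and any other row fields the real schema requires are omitted. The `missing_audit_inputs` helper is hypothetical and only reports which of the three inputs are absent; it does not reproduce the audit's own logic.

```python
spec = {
    "template": "financial_summary",
    "source_workbook": "model.xlsx",      # placeholder file name
    "source_sheet_default": "Summary",    # placeholder sheet name
    "headers": ["Metric", "FY2023", "FY2024"],
    "rows": [
        {
            "values": ["1,058,651", "1,204,330"],
            "source_refs": ["B12", "C12"],  # bare cell strings (assumed format)
            "unit": "USD_thousands",        # placeholder unit name
        },
        {
            "values": ["32.5%", "34.1%"],
            "source_refs": None,            # derived row, nothing to trace
            "unit": "percent",              # placeholder unit name
        },
    ],
}


def missing_audit_inputs(spec):
    """List which of the three optional audit inputs a spec leaves out."""
    missing = [k for k in ("source_workbook", "source_sheet_default") if k not in spec]
    rows = spec.get("rows", [])
    for field in ("source_refs", "unit"):
        if any(field not in row for row in rows):
            missing.append(f"rows[].{field}")
    return missing


print(missing_audit_inputs(spec))
```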
| 140 | + |
| 141 | +## What is deliberately not here |
| 142 | + |
| 143 | +- **Schemas for the 8 reference-extension templates** (cover, section_divider, |
| 144 | + toc, exec_summary, investment_highlights, stacked_bar_table, football_field, |
| 145 | + sources_uses). These are not being pitched as canonical primitives and do not |
| 146 | + need formal schemas in v0.2. If the canonical core is accepted, the |
| 147 | + extensions can graduate to the schemas directory one at a time. |
| 148 | +- **A schema for the analyst workspace layout.** The example layout in |
| 149 | + `../v0.2-scope.md` §2 is a suggestion, not a mandated structure. Any tool |
| 150 | + that speaks these schemas can organize files however it wants as long as the |
| 151 | + individual files validate. |
| 152 | +- **A schema for the `format` callable in `operating_metrics` charts.** The |
| 153 | + current renderer's Python API accepts an optional `format` callable per |
| 154 | + chart for custom value formatting. Callables cannot appear in a JSON spec, |
| 155 | + so the schema omits the field entirely. Specs that need custom formatting |
| 156 | + should pre-format their values or rely on the renderer default. |
| 157 | +- **Excel formula references in `source_refs`.** The `source_refs` schema |
| 158 | + intentionally forbids formula references. The audit compares evaluated values |
| 159 | + only, not formula trees. If a spec needs a computed value, it lives in the |
| 160 | + spec as a derived row (with `source_refs: null`), not in the audit engine. |
| 161 | + |
| 162 | +## Sanity check against existing examples |
| 163 | + |
| 164 | +The schemas in this directory have been designed to validate the example JSON |
| 165 | +files in |
| 166 | +[`../../ib-deck-engine/skills/ib-deck-engine/reference/examples/`](../../ib-deck-engine/skills/ib-deck-engine/reference/examples/) |
| 167 | +with the caveat that those examples do not carry a `template` discriminator |
| 168 | +field. To validate an existing example against its schema, add |
| 169 | +`"template": "<name>"` at the top level and remove the `_comment` field. |
| 170 | + |
| 171 | +| Example file | Validates against | |
| 172 | +|---|---| |
| 173 | +| `financial_summary.json` | `financial_summary.schema.json` | |
| 174 | +| `trading_comps.json` | `trading_comps.schema.json` | |
| 175 | +| `sensitivity.json` | `sensitivity.schema.json` | |
| 176 | +| `dual_chart.json` | `operating_metrics.schema.json` (via the `dual_chart` alias) | |
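
The preparation step described above (add a `template` field, drop `_comment`) can be scripted before validation. This is a small sketch with a hypothetical helper name, `prepare_example`, and an invented example payload; it leaves the input untouched and returns a new dict.

```python
import json


def prepare_example(example, template):
    """Return a copy with a top-level `template` and without `_comment`."""
    prepared = {"template": template}
    prepared.update({k: v for k, v in example.items() if k not in ("_comment", "template")})
    return prepared


# Invented payload standing in for one of the example files.
raw = json.loads('{"_comment": "demo only", "title": "Summary", "rows": []}')
prepared = prepare_example(raw, "financial_summary")
print(json.dumps(prepared, indent=2))
```

For `dual_chart.json`, passing `"dual_chart"` as the template name exercises the alias noted in the table.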
| 177 | + |
| 178 | +The `comparison/input/financial_summary_spec.json` used by the |
| 179 | +[`comparison/`](../../comparison/) artifact also validates, with the same |
| 180 | +caveat about the `template` field. |
| 181 | + |
| 182 | +## Open questions |
| 183 | + |
| 184 | +Reproduced from [`../v0.2-scope.md`](../v0.2-scope.md) §10 for convenience, |
| 185 | +scoped to the schemas specifically: |
| 186 | + |
| 187 | +1. **Parallel array vs inline objects for `source_refs`.** v0.2 picks parallel |
| 188 | + arrays for compactness; inline-per-value objects would be more robust to |
| 189 | + row reordering but much more verbose. Pushback welcome. |
| 190 | +2. **Should the value type convention be unified?** Today table templates use |
| 191 | + strings and chart templates use numbers. Unifying on numbers + unit-driven |
| 192 | + formatting is a v0.3 question. |
| 193 | +3. **Should the canonical core be 5 or 7?** Adding `football_field` and |
| 194 | + `sources_uses` would cover M&A sell-side pitches; today they remain as |
| 195 | + reference extensions. Easy to promote if the Claude team prefers coverage |
| 196 | + over focus. |