Commit bd783be

gorajin and claude committed
Add docs/schemas — formal JSON Schemas for the v0.2 proposal
Ten files (5 template + 4 handshake + 1 shared fragment) plus a schemas README. Each schema is draft 2020-12 and carries an absolute $id pointing at its raw GitHub URL. Template schemas reference source_refs.schema.json via $ref for the parallel-array source_refs fragment, keeping the schema set consistent without duplication.

Schemas:
- source_refs.schema.json: shared fragment; null / cell string / explicit cellRef / rangeRef with aggregator. Formula refs forbidden (audit compares values, not formulas).
- financial_summary.schema.json: matches current renderer shape.
- trading_comps.schema.json: matches current renderer shape.
- transaction_comps.schema.json: new in v0.2, marked pending impl.
- sensitivity.schema.json: matches current renderer shape.
- operating_metrics.schema.json: canonical rename of dual_chart; both names accepted during transition.
- provenance.schema.json: workbook-to-canonical-metric map produced by /ib-import-excel, lives alongside the workbook.
- import_config.schema.json: workspace-local variant of provenance.
- deck_exceptions.schema.json: reviewed intentional discrepancies.
- audit_report.schema.json: machine-readable /ib-audit output.

Validation verified end-to-end:
- All 10 schemas self-validate as draft 2020-12.
- All 4 existing example JSON files validate against their schemas (after adding the template discriminator).
- Constructed-minimal fixtures validate against the 5 handshake schemas that don't have pre-existing examples.
- source_refs accepts all 4 valid forms and rejects 4 invalid forms (formula ref, lowercase cell, missing sheet, bad aggregator).

README documents what the schemas do NOT enforce (parallel array length invariants, header-vs-row-count, unique canonical names) since those are validator responsibilities rather than schema ones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 95487fe commit bd783be

11 files changed

Lines changed: 1200 additions & 0 deletions

docs/schemas/README.md

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
# Schemas

Formal JSON Schema (draft 2020-12) definitions for the canonical slide specs
and handshake files proposed in [`../v0.2-scope.md`](../v0.2-scope.md).

These schemas turn the v0.2 proposal into a machine-checkable artifact. A tool
that wants to speak the canonical format does not need to read any prose; it
needs to validate against these files. An LLM that wants to generate specs
can be grounded on them directly. A reviewer who wants to know "is this
proposal concrete enough to react to" gets a definitive answer by running a
validator against the examples.

## What is here

### Template schemas (5, canonical core)

| File | Canonical name | Status in v0.1 | Status in v0.2 |
|---|---|---|---|
| [`financial_summary.schema.json`](financial_summary.schema.json) | `financial_summary` | Implemented as `render_financial_summary` | Schema matches current shape, plus optional `source_refs` and `unit` |
| [`trading_comps.schema.json`](trading_comps.schema.json) | `trading_comps` | Implemented as `render_trading_comps` | Schema matches current shape, plus optional `source_refs` and `unit` |
| [`transaction_comps.schema.json`](transaction_comps.schema.json) | `transaction_comps` | **Not implemented** | **New template.** Schema documents the proposed shape; renderer implementation is a v0.2 deliverable. |
| [`sensitivity.schema.json`](sensitivity.schema.json) | `sensitivity` | Implemented as `render_sensitivity` | Schema matches current shape, plus optional 2D `source_refs` grid |
| [`operating_metrics.schema.json`](operating_metrics.schema.json) | `operating_metrics` | Implemented as `render_dual_chart` | Canonical rename. `template` field accepts both `operating_metrics` and `dual_chart` (the latter as a deprecation alias that prints a one-line warning). |
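
The `dual_chart` alias in the last row could be handled roughly as sketched below. `TEMPLATE_ALIASES` and `canonical_template` are hypothetical names, and emitting the notice through Python's `warnings` module is an assumption; the renderer may print its one-line warning differently.

```python
import warnings

# Hypothetical alias table: deprecated template name -> canonical name.
TEMPLATE_ALIASES = {"dual_chart": "operating_metrics"}


def canonical_template(name: str) -> str:
    """Return the canonical template name, warning when a deprecated alias is used."""
    canonical = TEMPLATE_ALIASES.get(name)
    if canonical is None:
        # Not an alias: pass the name through unchanged.
        return name
    warnings.warn(
        f"template '{name}' is deprecated; use '{canonical}'",
        DeprecationWarning,
        stacklevel=2,
    )
    return canonical
```

Keeping the mapping in one table means the alias can be dropped after the transition by deleting a single entry.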

The remaining 8 templates in the repo (`cover`, `section_divider`, `toc`,
`exec_summary`, `investment_highlights`, `stacked_bar_table`, `football_field`,
`sources_uses`) are **reference extensions**. They are not part of the
canonical core and do not have formal schemas in this directory. They remain
available through the renderer and are documented in
[`../../ib-deck-engine/skills/ib-deck-engine/reference/`](../../ib-deck-engine/skills/ib-deck-engine/reference/).

### Handshake file schemas (4) and shared fragment (1)

| File | Purpose |
|---|---|
| [`source_refs.schema.json`](source_refs.schema.json) | Shared schema fragment for the `source_refs` arrays used by every template schema. Defines the four accepted forms: `null` (derived), bare cell string, explicit cell object, range object with aggregator. Referenced via `$ref` from each template schema. |
| [`provenance.schema.json`](provenance.schema.json) | Workbook-to-canonical-metric map produced by `/ib-import-excel`. Lives alongside the workbook file. |
| [`import_config.schema.json`](import_config.schema.json) | Workspace-local variant of provenance. Functionally equivalent; chosen when the analyst lacks write permission to the workbook's directory. |
| [`deck_exceptions.schema.json`](deck_exceptions.schema.json) | Durable record of reviewed intentional discrepancies. Consumed by `/ib-audit` to downgrade matched findings from FAIL to INFO. |
| [`audit_report.schema.json`](audit_report.schema.json) | Machine-readable output of `/ib-audit`. Every audit run produces exactly one of these. |

## Versioning

These schemas are the **v0.2 proposal** and will remain at this version for
as long as they are actively being reviewed. Once the proposal is either
accepted or iterated on meaningfully, a versioned subdirectory (e.g.,
`docs/schemas/v0.3/`) will be introduced. Until then, this directory holds
exactly one version of each schema and the `$id` fields point at the `main`
branch.

**Breaking change policy:** a breaking change to any schema requires bumping
the version and moving the old files to a versioned subdirectory so existing
consumers are not silently broken. Additive changes (new optional properties,
new enum values) do not require a version bump.

## Using the schemas

### Validation with Python's jsonschema

```python
import json

from jsonschema import Draft202012Validator

# Load the schema and the spec to check.
with open("docs/schemas/financial_summary.schema.json") as f:
    schema = json.load(f)

with open("my_slide.spec.json") as f:
    spec = json.load(f)

# Collect every violation (not just the first), ordered by location in the spec.
validator = Draft202012Validator(schema)
errors = sorted(validator.iter_errors(spec), key=lambda e: e.path)

if errors:
    for e in errors:
        # e.path is the location of the offending value inside the spec.
        print(f"  {list(e.path)}: {e.message}")
else:
    print("OK")
```

### Resolving cross-schema `$ref`

The template schemas reference `source_refs.schema.json#/$defs/sourceRefArray`
via a relative `$ref`. Because every schema carries an absolute `$id`, a
validator resolves that relative reference against the `$id` base URI (a raw
GitHub URL on `main`), not against the directory the file was loaded from.
Python's `jsonschema` can fetch such a URL over the network, but recent
releases deprecate automatic remote retrieval. For offline or pinned
validation, register the local copy of `source_refs.schema.json` under its
absolute `$id` with an explicit resolver or registry.

### What the schemas do *not* enforce

Some rules in [`../v0.2-scope.md`](../v0.2-scope.md) cannot be expressed in
JSON Schema alone. These need to be enforced by the validator or the audit
layer, not by schema validation:

- **Parallel array length invariants.** Each `source_refs` array must have the
  same length as the `values` array it annotates, so that `source_refs[i]`
  describes `values[i]`. JSON Schema cannot express "array A has the same
  length as array B at the same nesting level." The validator must check this
  separately.
- **`headers` length vs `rows[i].values` length.** Every row's values array
  must have `len(headers) - 1` entries. Not expressible in JSON Schema.
- **Unique canonical metric names within a provenance file.** Not expressible
  in JSON Schema: `uniqueItems` compares whole array items rather than a
  single field, so this needs custom logic.
- **`base_row` / `base_col` within data dimensions.** Sensitivity's base-case
  indices must be within the row/column header arrays. Schema validates the
  types but not the bounds.
- **Audit check semantics.** The 10 checks in v0.2-scope.md §6 are enumerated
  by the `check` enum in `audit_report.schema.json`, but their behavior is a
  validator responsibility, not a schema responsibility.

The intention is: schemas catch shape errors at authoring time; the audit
layer and validator catch semantic errors at run time.
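
The first two invariants above can be checked in a few lines after schema validation passes. This is a sketch under assumptions: `check_table_invariants` is a hypothetical name, `source_refs` is assumed to sit on each row next to `values`, and the sample labels and cell strings are illustrative only.

```python
def check_table_invariants(spec: dict) -> list[str]:
    """Return human-readable problems for length rules JSON Schema cannot express."""
    problems = []
    headers = spec.get("headers", [])
    expected = len(headers) - 1  # rule from the README: len(headers) - 1 values per row
    for i, row in enumerate(spec.get("rows", [])):
        values = row.get("values", [])
        if len(values) != expected:
            problems.append(
                f"rows[{i}].values: expected {expected} entries, got {len(values)}"
            )
        refs = row.get("source_refs")
        # source_refs is optional; when present it must be parallel to values.
        if refs is not None and len(refs) != len(values):
            problems.append(
                f"rows[{i}].source_refs: length {len(refs)} != values length {len(values)}"
            )
    return problems


# Illustrative spec: the second row is one value short.
sample = {
    "headers": ["Metric", "FY2024A", "FY2025A"],
    "rows": [
        {"label": "Revenue", "values": ["1,058,651", "1,200,000"],
         "source_refs": ["Model!B4", "Model!C4"]},
        {"label": "EBITDA", "values": ["250,000"],
         "source_refs": ["Model!B9", "Model!C9"]},
    ],
}
problems = check_table_invariants(sample)
```

Returning a list of messages instead of raising lets a caller report every violation in one pass, mirroring how `iter_errors` is used for shape errors.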

### Value type conventions

The canonical templates use two different conventions for storing values. This
is a current-state observation, not an aspiration.

- **Table templates** (`financial_summary`, `trading_comps`, `transaction_comps`,
  `sensitivity`) store values as **pre-formatted display strings**. The caller
  is responsible for formatting (`"1,058,651"`, `"32.5%"`, `"$3.83"`) before
  handing the spec to the renderer.
- **Chart templates** (`operating_metrics`) store values as **numbers** (because
  bar heights are derived from them). The `secondary_values` array in
  `operating_metrics` stores pre-formatted strings for the display row below
  the bars.

A future version could unify this by allowing numeric values everywhere and
having the renderer format them via a declared `unit`. That is a v0.3
question and is not part of this proposal.
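
For table templates, the caller-side formatting step might look like the sketch below. The helper names are hypothetical and the rounding choices (no decimals for counts, one for percentages, two for currency) are assumptions inferred from the three sample strings above.

```python
def fmt_count(x: float) -> str:
    """Thousands-separated integer, e.g. for revenue lines."""
    return f"{x:,.0f}"


def fmt_percent(x: float) -> str:
    """Fraction rendered as a percentage with one decimal place."""
    return f"{x:.1%}"


def fmt_currency(x: float) -> str:
    """Dollar amount with two decimal places, e.g. for per-share figures."""
    return f"${x:,.2f}"


# Raw numbers are formatted before they are placed in a table row's values.
values = [fmt_count(1058651), fmt_percent(0.325), fmt_currency(3.83)]
```

Chart templates would skip this step for the bar values and pass the raw numbers through.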

### Audit integration

A spec author who wants the audit to run against their slide should include:

1. `source_workbook` and `source_sheet_default` at the top of the spec
2. A `source_refs` array parallel to each row's `values` array
3. A `unit` declaration on each row (for the unit_mismatch check)

Any of the three can be omitted. When they are, the audit emits WARN-level
findings rather than FAIL. Partial adoption is a design goal.
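
A spec author could check which of the three audit inputs are present before running the audit. The sketch below uses the field names from the list above; `audit_readiness` is a hypothetical helper, and the workbook name, sheet name, unit string and cell strings are made-up sample values.

```python
def audit_readiness(spec: dict) -> list[str]:
    """List the audit inputs that are missing from a table spec."""
    missing = []
    for key in ("source_workbook", "source_sheet_default"):
        if not spec.get(key):
            missing.append(key)
    for i, row in enumerate(spec.get("rows", [])):
        if row.get("source_refs") is None:
            missing.append(f"rows[{i}].source_refs")
        if "unit" not in row:
            missing.append(f"rows[{i}].unit")
    return missing


# Illustrative spec: the second row carries neither source_refs nor unit.
spec = {
    "template": "financial_summary",
    "source_workbook": "model.xlsx",
    "source_sheet_default": "Model",
    "headers": ["Metric", "FY2025A"],
    "rows": [
        {"label": "Revenue", "values": ["1,058,651"],
         "source_refs": ["Model!B4"], "unit": "USD thousands"},
        {"label": "Margin", "values": ["32.5%"]},
    ],
}
missing = audit_readiness(spec)
```

An empty result means all three inputs are present. Anything listed would be expected to surface as WARN-level findings, per the paragraph above.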

## What is deliberately not here

- **Schemas for the 8 reference-extension templates** (cover, section_divider,
  toc, exec_summary, investment_highlights, stacked_bar_table, football_field,
  sources_uses). These are not being pitched as canonical primitives and do not
  need formal schemas in v0.2. If the canonical core is accepted, the
  extensions can graduate to the schemas directory one at a time.
- **A schema for the analyst workspace layout.** The example layout in
  `../v0.2-scope.md` §2 is a suggestion, not a mandated structure. Any tool
  that speaks these schemas can organize files however it wants as long as the
  individual files validate.
- **A schema for the `format` callable in `operating_metrics` charts.** The
  current renderer's Python API accepts an optional `format` callable per
  chart for custom value formatting. Callables cannot appear in a JSON spec,
  so the schema omits the field entirely. Specs that need custom formatting
  should pre-format their values or rely on the renderer default.
- **Excel formula references in `source_refs`.** The `source_refs` schema
  intentionally forbids formula references. The audit compares evaluated values
  only, not formula trees. If a spec needs a computed value, it lives in the
  spec as a derived row (with `source_refs: null`), not in the audit engine.

## Sanity check against existing examples

The schemas in this directory have been designed to validate the example JSON
files in
[`../../ib-deck-engine/skills/ib-deck-engine/reference/examples/`](../../ib-deck-engine/skills/ib-deck-engine/reference/examples/)
with the caveat that those examples do not carry a `template` discriminator
field. To validate an existing example against its schema, add
`"template": "<name>"` at the top level and remove the `_comment` field.

| Example file | Validates against |
|---|---|
| `financial_summary.json` | `financial_summary.schema.json` |
| `trading_comps.json` | `trading_comps.schema.json` |
| `sensitivity.json` | `sensitivity.schema.json` |
| `dual_chart.json` | `operating_metrics.schema.json` (via the `dual_chart` alias) |

The `comparison/input/financial_summary_spec.json` used by the
[`comparison/`](../../comparison/) artifact also validates, with the same
caveat about the `template` field.
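
The two adjustments described above (add `template`, drop `_comment`) can be applied in memory before validation. This is a minimal sketch; `prepare_example` is a hypothetical helper and the sample dictionary is illustrative, not one of the real example files.

```python
import copy


def prepare_example(example: dict, template: str) -> dict:
    """Return a copy of an example spec that carries the template discriminator."""
    prepared = copy.deepcopy(example)  # leave the caller's data untouched
    prepared.pop("_comment", None)     # examples may carry a free-text comment
    prepared["template"] = template
    return prepared


raw = {"_comment": "illustrative only", "headers": ["Metric", "FY2025A"], "rows": []}
ready = prepare_example(raw, "financial_summary")
```

The prepared dictionary can then be passed to a validator in place of the file contents.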

## Open questions

Reproduced from [`../v0.2-scope.md`](../v0.2-scope.md) §10 for convenience,
scoped to the schemas specifically:

1. **Parallel array vs inline objects for `source_refs`.** v0.2 picks parallel
   arrays for compactness; inline-per-value objects would be more robust to
   row reordering but much more verbose. Pushback welcome.
2. **Should the value type convention be unified?** Today table templates use
   strings and chart templates use numbers. Unifying on numbers + unit-driven
   formatting is a v0.3 question.
3. **Should the canonical core be 5 or 7?** Adding `football_field` and
   `sources_uses` would cover M&A sell-side pitches; today they remain as
   reference extensions. Easy to promote if the Claude team prefers coverage
   over focus.
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/gorajing/ib-deck-plugin/main/docs/schemas/audit_report.schema.json",
  "title": "audit report",
  "description": "Schema for an audit.report.json file: machine-readable output of /ib-audit. Designed so that other tools (a CI job, a review dashboard, a Slack bot) can consume audit results without parsing human-readable text. Every run of /ib-audit produces exactly one audit report.",
  "type": "object",
  "required": ["run_id", "generated_at", "workbook", "specs", "summary", "findings"],
  "additionalProperties": false,
  "properties": {
    "run_id": {
      "type": "string",
      "minLength": 1,
      "description": "Unique identifier for this audit run. Convention: '<iso8601>-<short_hash>' (e.g., '2026-04-09T14-22-00Z-a7c3')."
    },
    "generated_at": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 timestamp of when this audit report was generated."
    },
    "workbook": {
      "type": "string",
      "minLength": 1,
      "description": "Filename of the workbook the audit ran against."
    },
    "workbook_hash": {
      "type": "string",
      "pattern": "^sha256:[0-9a-f]{64}$",
      "description": "SHA-256 hash of the workbook at audit time. A mismatch against the provenance file's workbook_hash surfaces as a WARN."
    },
    "workbook_last_recalc": {
      "type": "string",
      "format": "date-time",
      "description": "Best-effort timestamp of the workbook's last full recalculation. Used to compute workbook_recalc_staleness_hours."
    },
    "workbook_recalc_staleness_hours": {
      "type": "number",
      "minimum": 0,
      "description": "Hours elapsed between workbook_last_recalc and generated_at. The audit emits a WARN when this exceeds 24 (configurable)."
    },
    "specs": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Paths (relative or absolute) of the spec files included in this audit run."
    },
    "exceptions_file": {
      "type": ["string", "null"],
      "description": "Optional path to the deck.exceptions.json file used for this audit run. Null or omitted if no exceptions file was provided."
    },
    "summary": {
      "type": "object",
      "required": ["pass", "fail", "warn", "info"],
      "additionalProperties": false,
      "properties": {
        "pass": {
          "type": "integer",
          "minimum": 0
        },
        "fail": {
          "type": "integer",
          "minimum": 0
        },
        "warn": {
          "type": "integer",
          "minimum": 0
        },
        "info": {
          "type": "integer",
          "minimum": 0
        }
      },
      "description": "Counts of findings by severity. The audit command's exit code is 0 when fail == 0 and 1 otherwise."
    },
    "findings": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/finding"
      },
      "description": "List of individual audit findings, in the order they were generated. May be empty if no checks fired."
    }
  },
  "$defs": {
    "finding": {
      "type": "object",
      "required": ["severity", "check", "message"],
      "additionalProperties": false,
      "properties": {
        "severity": {
          "enum": ["PASS", "FAIL", "WARN", "INFO"],
          "description": "Finding severity. PASS findings are usually omitted from the findings array (they're counted in summary.pass); FAIL, WARN, and INFO are always listed."
        },
        "check": {
          "type": "string",
          "enum": [
            "value_mismatch",
            "unit_mismatch",
            "sign_mismatch",
            "subtotal_tie_out",
            "cross_slide_consistency",
            "missing_source_ref",
            "workbook_recalc_staleness",
            "required_source_line",
            "exception_matched",
            "exception_expired"
          ],
          "description": "Check identifier. The initial set of 10 checks is defined in docs/v0.2-scope.md section 6. New checks may be added in v0.3+ without breaking this schema."
        },
        "slide_id": {
          "type": "string",
          "description": "Identifier of the slide this finding applies to (when applicable; not every check is slide-scoped)."
        },
        "metric": {
          "type": "string",
          "description": "Metric name (e.g., 'EBITDA'). Present for slide-scoped findings."
        },
        "period": {
          "type": "string",
          "description": "Period label (e.g., 'FY2025A'). Present for slide-scoped findings."
        },
        "expected": {
          "description": "The value as it appears in the spec. Type matches the spec's value type (number or string).",
          "oneOf": [
            {"type": "number"},
            {"type": "string"},
            {"type": "null"}
          ]
        },
        "actual": {
          "description": "The value as it appears in the workbook. Type matches the spec's value type (number or string).",
          "oneOf": [
            {"type": "number"},
            {"type": "string"},
            {"type": "null"}
          ]
        },
        "source_ref": {
          "description": "The source_ref that was evaluated to produce this finding. Useful for debugging mapping issues.",
          "oneOf": [
            {"type": "null"},
            {
              "type": "object",
              "required": ["workbook", "sheet", "cell"],
              "properties": {
                "workbook": {"type": "string"},
                "sheet": {"type": "string"},
                "cell": {"type": "string"}
              }
            }
          ]
        },
        "message": {
          "type": "string",
          "minLength": 1,
          "description": "Human-readable message describing the finding. Always present."
        },
        "exception_matched": {
          "description": "When this finding was downgraded from FAIL to INFO by a matching entry in deck.exceptions.json, this field carries the exception id. Null otherwise.",
          "oneOf": [
            {"type": "null"},
            {"type": "string"}
          ]
        }
      }
    }
  }
}
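
Three of the descriptions in this schema imply small computations: the `workbook_hash` pattern, `workbook_recalc_staleness_hours`, and the exit-code rule on `summary`. The sketch below shows one way a producer or consumer might implement them. All function names are hypothetical, and timestamps are written with a `+00:00` offset because `datetime.fromisoformat` only accepts a trailing `Z` on Python 3.11 and later.

```python
import hashlib
import re
from datetime import datetime

# Same pattern the schema declares for workbook_hash.
HASH_PATTERN = re.compile(r"^sha256:[0-9a-f]{64}$")


def workbook_hash(data: bytes) -> str:
    """Hash workbook bytes into the 'sha256:<hex>' form the schema expects."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def staleness_hours(workbook_last_recalc: str, generated_at: str) -> float:
    """Hours between the last recalculation and report generation, floored at 0."""
    recalc = datetime.fromisoformat(workbook_last_recalc)
    generated = datetime.fromisoformat(generated_at)
    return max(0.0, (generated - recalc).total_seconds() / 3600)


def exit_code(report: dict) -> int:
    """0 when the report has no FAIL findings, 1 otherwise."""
    return 0 if report["summary"]["fail"] == 0 else 1


hours = staleness_hours("2026-04-08T12:00:00+00:00", "2026-04-09T14:00:00+00:00")
```

A CI job could read `audit.report.json`, call `exit_code`, and fail the build without parsing any human-readable output.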
