
Commit 2eeceec

shaypal5 and claude authored
PR 7.2.1: agent-reviewable release artifacts (#78)
* PR 7.2.1: agent-reviewable release artifacts

Make the published Kaggle / HuggingFace bundle self-contained for AI and offline review. Every numerical / structural claim in the README is now verifiable from inside the bundle without following a github.com/blob/main/... link.

What's new

- release/metrics.json (root) + release/<tier>/metrics.json (per tier): deterministic JSON view of LR AUC / AP / P@100 / Brier / conversion rate / cohort-shift / cross-tier ordering medians, with JSON-path back-references to release/validation/validation_report.json. Built by scripts/build_release_metrics.py (--check mode for CI).
- release/docs/: vendored copies of generation_method.md, channel_signal_audit.md, break_me_guide.md, feature_dictionary.md, v1_acceptance_gates_bands.yaml, and v2_decision_log.md, kept in sync by scripts/sync_release_docs.py (--check mode for CI).
- release/docs/relational_table_schemas.csv: per-column documentation for all 9 relational tables (64 columns), validated against live parquet schemas in the new tests. The Kaggle packager threads these descriptions into resources[].schema.fields[].description, so the preview's previously empty col__desc cells are now populated for every relational table.
- release/claims_register_source.yaml (hand-edited) + release/claims_register.{md,json} (rendered by scripts/build_claims_register.py): 26 claims across nine categories, each paired with a backing artifact + JSON / YAML path. The JSON output carries a schema block so an agent landing on the file with no context can interpret its own fields.
- A schema.org/Dataset JSON-LD block is injected into the <head> of both the Kaggle and HuggingFace preview HTML pages; the shared render_jsonld_dataset helper in scripts/_preview_common.py HTML-escapes <, >, & inside the rendered JSON.
- The instructor HF README gets an "Agent-reviewable artifacts" section pointing reviewers at docs/, claims_register.{md,json}, the per-tier manifest, and feature_dictionary.csv. Cross-tier metrics.json is intentionally omitted from the instructor bundle (single-tier dataset).

Both platform packagers extended

- scripts/package_kaggle_release.py and scripts/package_hf_release.py copy the new root files (metrics.json, claims_register.*) and the docs/ subtree into their upload trees so platform agents and offline reviewers see the same files. Kaggle additionally enumerates them in resources[] so the published "Data Files" panel lists them.
- scripts/_release_common.py: new AGENT_REVIEWABLE_ROOT_FILES / AGENT_REVIEWABLE_DOCS_DIR constants and a load_relational_column_descriptions() helper. SOURCE_TREE_BLOCK is updated in lockstep with the source-repo tree diagram in release/README.md.
- release/README.md "What's inside" grows an "Agent-reviewable artifacts" subsection mirroring the upload trees.

Tests

- 28 new cases across tests/scripts/test_sync_release_docs.py, test_build_release_metrics.py, and test_build_claims_register.py covering the happy path, idempotence, --check drift, missing-source paths, invalid-YAML rejection, per-tier skipping when bundle dirs aren't materialised, and audit-sync against the real release/ tree.
- 4 new cases in test_preview_{kaggle,hf}_page.py pinning JSON-LD presence in <head>, byte-equality of JSON-LD across HF variants, and the SPDX-URL form of the license field.
- test_package_kaggle_release.py extended to assert that per-table parquet schemas now carry column descriptions and that the new agent-reviewable root resources land in resources[].
- Committed previews (release/_preview_committed/*.html) regenerated.

Net: 1400/1400 tests pass + 5 publish-extra-gated skips; ruff clean across the touched scripts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 7.2.1 self-review: actual claims verifier + drift guards

Hostile self-review of PR 7.2.1 turned up six gaps; this commit addresses the highest-value ones.

(1) **scripts/verify_claims_register.py**: the verifier the original PR was missing. Walks every claim in claims_register_source.yaml, expands `<tier>` placeholders + brace/comma multi-paths, resolves the JSON path inside the backing artifact, and confirms that numerics embedded in the claim text match the resolved value within tolerance. Catches numeric drift (claim says 0.879, artifact says 0.823), broken paths, and resolution errors. Wired into CI as a new job.

(2) **scripts/build_release_metrics.py** no longer hardcodes difficulty knobs; it reads them live from ``leadforge/recipes/b2b_saas_procurement_v1/difficulty_profiles.yaml`` via a new ``load_difficulty_knobs`` helper. Each tier's metrics file records a ``difficulty_knobs_source`` JSON-path pointer, so the artifact itself documents that the recipe YAML stays authoritative.

(3) **scripts/sync_release_docs.py** now refuses to clobber a vendored destination whose mtime is newer than the source, the sentinel that someone edited ``release/docs/X.md`` rather than the canonical ``docs/release/X.md``. ``--force`` bypasses the guard with an explicit opt-in. Returns a ``_SyncResult`` dataclass instead of a tuple. A new ``release/docs/README.md`` explains the vendoring direction loudly at the front of the directory.

(4) **JSON-LD constants single-sourced** in scripts/_preview_common.py (``LICENSE_URL_MIT``, ``JSONLD_CITATION``, ``JSONLD_CREATOR``, ``JSONLD_VERSION``). Both preview scripts now import them instead of duplicating the literal strings, so there is no more drift surface between the Kaggle and HF previews on citation / recipe / seed.

(5) **CI integration**: a new ``release-artifacts-sync`` job in .github/workflows/ci.yml runs the three ``--check``-mode commands (sync_release_docs, build_release_metrics, build_claims_register) plus the new verify_claims_register. Without this job, the audit-sync guarantee was theatre.

(6) **Stronger validation of relational_table_schemas.csv** via the new tests/release/test_relational_table_schemas.py: descriptions must be >=12 chars and non-TODO; dtypes come from a closed vocabulary; bundle_visibility is in {public+instructor, instructor_only}; no duplicate rows; parity with live parquet schemas.

Tests: 33 new cases across test_verify_claims_register.py (16: multi-path expansion, wildcard resolution, numeric extraction, drift detection, and an audit-sync gate against the real tree), test_relational_table_schemas.py (8), plus 4 new sync-script tests for the orphan-destination guard. Existing tests updated for the new _SyncResult shape and profiles_path parameter. 1425 passed total + 5 publish-extra-gated skips.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* verify_claims_register: soft-skip gitignored bundle dirs

CI's release-artifacts-sync job failed on a fresh checkout because ``release/intermediate_instructor/manifest.json`` is gitignored, and the verifier was treating it as a hard error. Demote missing files under any of the four gitignored bundle prefixes (intro/, intermediate/, advanced/, intermediate_instructor/) to a soft skip; ``--strict`` upgrades them back to errors for release-readiness runs. The same posture is applied to the test failure in ``test_committed_claims_register_verifies_against_release_tree``, which exercised the same path on Python 3.11 / 3.12 CI runners.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* .agent-plan: record PR 7.2.1 self-review pass

Append the six additional gaps closed in the post-PR hostile self-review (verifier, recipe-driven knobs, doc-vendoring guard, single-sourced JSON-LD constants, CI integration, stronger schema CSV validation) to the PR 7.2.1 entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
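The first self-review item says the verifier expands `<tier>` placeholders and brace/comma multi-paths before resolving anything. A minimal sketch of that expansion step follows; it is not the code in `scripts/verify_claims_register.py`. The function name `expand_claim_path`, the tier tuple, and the assumption that braces are single-level (`{md,json}`, no nesting) are all illustrative.

```python
import itertools
import re

# Tier names taken from the bundle directories in this commit (assumed complete).
TIERS = ("intro", "intermediate", "advanced")

# Hypothetical brace syntax: one level of "{a,b,c}" alternatives, no nesting.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_claim_path(path: str, tiers: tuple[str, ...] = TIERS) -> list[str]:
    """Expand <tier> placeholders and {a,b} alternatives into concrete paths."""
    # Substitute each tier for the <tier> placeholder, if the path has one.
    if "<tier>" in path:
        candidates = [path.replace("<tier>", t) for t in tiers]
    else:
        candidates = [path]

    expanded: list[str] = []
    for candidate in candidates:
        # With one capture group, re.split alternates: literal, group, literal, ...
        pieces = _BRACE.split(candidate)
        literals = pieces[0::2]
        alternatives = [group.split(",") for group in pieces[1::2]]
        # One output path per combination of brace alternatives.
        for combo in itertools.product(*alternatives):
            parts = [literals[0]]
            for choice, literal in zip(combo, literals[1:]):
                parts.extend([choice.strip(), literal])
            expanded.append("".join(parts))
    return expanded
```

For example, `expand_claim_path("release/claims_register.{md,json}")` yields the `.md` and `.json` paths as two separate entries. One reason to expand up front is that each concrete file can then be resolved and reported on its own.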
1 parent 28b4c95 commit 2eeceec

39 files changed

Lines changed: 5523 additions & 163 deletions
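The verifier's drift check compares numbers embedded in a claim's text with the value resolved from the backing artifact, within a tolerance. The commit does not state the tolerance or the matching rule, so the sketch below assumes an absolute tolerance of 0.005 and treats a claim as matching when any number in its text is close to the resolved value; both choices, and the function names, are hypothetical.

```python
import math
import re

# Plain decimals such as 0.879, 12 or -0.03 (no exponents, no percent handling).
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Return every decimal number that appears in a claim's text."""
    return [float(m) for m in _NUMBER.findall(text)]


def claim_matches(text: str, resolved: float, abs_tol: float = 0.005) -> bool:
    """True if at least one number in the claim text is within abs_tol of resolved."""
    return any(
        math.isclose(n, resolved, rel_tol=0.0, abs_tol=abs_tol)
        for n in extract_numbers(text)
    )
```

With the commit message's own example, `claim_matches("LR AUC 0.879", 0.823)` returns `False`, which is the kind of drift the new CI job is meant to catch. A stricter variant would require a specific number (for instance the last one in the sentence) to match, at the cost of more brittle claim wording.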

.agent-plan.md

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
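The third squashed commit demotes a missing file under one of the four gitignored bundle directories to a soft skip, with `--strict` restoring the hard error. That rule reduces to a small classification step, sketched below. The `classify_missing` name, its string return values, and the assumption that paths are given relative to `release/` are hypothetical.

```python
from pathlib import PurePosixPath

# The four gitignored bundle directories named in the commit message.
GITIGNORED_BUNDLE_PREFIXES = ("intro", "intermediate", "advanced", "intermediate_instructor")


def classify_missing(rel_path: str, strict: bool = False) -> str:
    """Decide how to report a backing artifact that does not exist on disk.

    rel_path is assumed to be relative to release/, for example
    "intermediate_instructor/manifest.json".
    """
    parts = PurePosixPath(rel_path).parts
    # Compare whole path components so "intermediate" never matches a prefix of
    # "intermediate_instructor" by accident.
    under_bundle = bool(parts) and parts[0] in GITIGNORED_BUNDLE_PREFIXES
    if under_bundle and not strict:
        # A fresh checkout never contains these directories, so this is expected.
        return "skip"
    # Anything else that is missing, or any miss in --strict mode, is a real error.
    return "error"
```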

.github/workflows/ci.yml

Lines changed: 23 additions & 0 deletions
@@ -62,6 +62,29 @@ jobs:
           include-hidden-files: true
           if-no-files-found: ignore
 
+  release-artifacts-sync:
+    name: Release artifacts in sync (PR 7.2.1)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install -e ".[dev]"
+      # Each script's --check mode reports drift as exit-code-1 without
+      # touching disk; the verifier is exit-code-1 on a real claim drift.
+      # Running them in CI is the only way the audit-sync guarantee
+      # actually holds — without this job, a stale metrics.json /
+      # claims_register / docs/ copy could land on main unnoticed.
+      - name: docs/ vendored copies are in sync
+        run: python scripts/sync_release_docs.py --check
+      - name: release/metrics.json + per-tier metrics.json are in sync
+        run: python scripts/build_release_metrics.py --check
+      - name: release/claims_register.{md,json} are in sync with source.yaml
+        run: python scripts/build_claims_register.py --check
+      - name: every claim in claims_register_source.yaml resolves & values match
+        run: python scripts/verify_claims_register.py
+
   validate-dataset:
     name: Validate lead scoring dataset
     runs-on: ubuntu-latest
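The comment in this job states the `--check` contract: report drift through the exit code and never write to disk. The self-review commit adds a second rule for `scripts/sync_release_docs.py`: refuse to overwrite a vendored copy whose mtime is newer than its canonical source unless `--force` is passed. Both rules are combined in the sketch below. `sync_one` and its status strings are hypothetical; the real script returns a `_SyncResult` dataclass whose fields are not shown in this commit.

```python
import filecmp
import shutil
from pathlib import Path


def sync_one(src: Path, dst: Path, check: bool = False, force: bool = False) -> str:
    """Vendor src into dst, or (with check=True) only report whether dst has drifted."""
    if dst.exists() and filecmp.cmp(src, dst, shallow=False):
        return "in-sync"
    if check:
        # --check never touches disk; the caller maps "drift" to exit code 1.
        return "drift"
    if dst.exists() and not force and dst.stat().st_mtime > src.stat().st_mtime:
        # A vendored copy newer than its source suggests someone edited the copy
        # instead of the canonical file, so refuse unless explicitly forced.
        return "refused"
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also copies the source mtime, so a fresh copy never looks "newer".
    shutil.copy2(src, dst)
    return "written"
```

A caller would turn any `"drift"` result in check mode into `sys.exit(1)`, which is what would make the corresponding step in this job fail on a stale copy.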

release/README.md

Lines changed: 33 additions & 0 deletions
@@ -29,13 +29,17 @@ rose materially in 2024).
 release/
 ├── intro/ intermediate/ advanced/ # student_public bundles, one per difficulty tier
 │ ├── manifest.json # provenance + file hashes
+│ ├── metrics.json # per-tier headline metrics (medians + spreads)
 │ ├── dataset_card.md # auto-rendered per-bundle card
 │ ├── feature_dictionary.csv # authoritative column spec
 │ ├── lead_scoring.csv # flat convenience CSV (all splits)
 │ ├── tables/*.parquet # 7 snapshot-safe relational tables
 │ └── tasks/converted_within_90_days/{train,valid,test}.parquet
 ├── intermediate_instructor/ # research companion: full-horizon tables + metadata/
+├── docs/ # vendored DGP / leakage / break-me docs (agent-readable)
 ├── notebooks/ # 01 baseline · 02 relational · 03 leakage · 04 calibration
+├── metrics.json # top-level cross-tier metrics summary
+├── claims_register.{md,json} # claims → backing-artifact map (agent-readable)
 └── validation/ # validation_report.{json,md} + figures
 ```

@@ -45,6 +49,35 @@ hidden causal structure (DAG, latent registry, mechanism summary)
 under `metadata/`. The full layout is documented in each bundle's
 `manifest.json`.
 
+### Agent-reviewable artifacts
+
+The published bundle is self-contained for AI review and offline
+auditing — every numeric / structural claim on this page can be
+verified without following an external link:
+
+- **`metrics.json` (root) + `<tier>/metrics.json`** — deterministic
+  JSON view of the headline LR AUC / AP / P@100 / Brier / conversion
+  rate / cohort-shift / cross-tier-ordering medians, with JSON-path
+  back-references to `validation/validation_report.json` (the
+  source of truth).
+- **`claims_register.{md,json}`** — every numerical or structural
+  claim on this page paired with the artifact and path that backs it.
+  Rendered from `claims_register_source.yaml` by
+  `scripts/build_claims_register.py`.
+- **`docs/`** — vendored copies of `generation_method.md`,
+  `channel_signal_audit.md`, `break_me_guide.md`,
+  `feature_dictionary.md`, `v1_acceptance_gates_bands.yaml`,
+  `v2_decision_log.md`, plus a hand-authored
+  `relational_table_schemas.csv` documenting every column of every
+  relational table. These match the GitHub-blob links cited below but
+  ship inside the bundle so a reviewer never needs network access.
+- **`<tier>/manifest.json`** — SHA-256 hash for every file plus the
+  full redaction contract (`structural_redactions.columns`,
+  `omitted_tables`, `relational_snapshot_safe`, `snapshot_day`).
+- Kaggle / HuggingFace preview pages additionally inject a
+  `schema.org/Dataset` JSON-LD block in their `<head>` for agent
+  ingestion without HTML parsing.
+
 ## Quick start
 
 ```python

release/_preview_committed/huggingface_instructor.html

Lines changed: 63 additions & 0 deletions
@@ -35,6 +35,50 @@
 .dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
 .dataset-footer__note { font-style: italic; margin-top: 8px; }
 </style>
+<script type="application/ld+json">{
+  "@context": "https://schema.org",
+  "@type": "Dataset",
+  "citation": "Generated by leadforge (https://github.com/leadforge-dev/leadforge); recipe b2b_saas_procurement_v1, seed 42.",
+  "creator": {
+    "@type": "Organization",
+    "name": "leadforge"
+  },
+  "description": "Hugging Face preview of leadforge-lead-scoring-v1.",
+  "distribution": [
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/train.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/valid.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/test.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    }
+  ],
+  "isAccessibleForFree": true,
+  "keywords": [
+    "b2b",
+    "crm",
+    "datasets",
+    "lead-scoring",
+    "pandas",
+    "synthetic-data",
+    "tabular"
+  ],
+  "license": "https://opensource.org/licenses/MIT",
+  "name": "LeadForge: Synthetic B2B Lead Scoring (v1) — Instructor companion",
+  "sameAs": [
+    "https://github.com/leadforge-dev/leadforge",
+    "https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1"
+  ],
+  "version": "v1"
+}</script>
 </head>
 <body>
 <main class="container">
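The JSON-LD block added above is meant to let an agent read the dataset metadata without parsing the page body. A minimal consumer using only the standard library is sketched below: `html.parser` delivers `<script>` contents as raw text, which is then decoded with `json.loads`. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list[dict]:
    """Return every JSON-LD object embedded in an HTML page."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    return [json.loads(block) for block in collector.blocks]
```

Fed the HTML of this preview page, it should return a one-element list whose `"@type"` is `"Dataset"`.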
@@ -92,6 +136,8 @@ <h2>What this companion contains</h2>
 │ ├── tables/*.parquet # full-horizon tables (incl. customers, subscriptions)
 │ ├── tasks/converted_within_90_days/{train,valid,test}.parquet
 │ └── metadata/ # world_spec, graph.{graphml,json}, latent_registry, etc.
+├── docs/ # vendored DGP / leakage / break-me docs (agent-readable)
+├── claims_register.{md,json} # claims → backing-artifact map (agent-readable)
 ├── README.md # this file (HF dataset card)
 ├── dataset-cover-image.png # dataset thumbnail
 └── LICENSE
@@ -218,6 +264,23 @@ <h2>Composition</h2>
 every parquet file.</li>
 <li><strong>Bundle schema version.</strong> 5 (matches the public dataset).</li>
 </ul>
+<h2>Agent-reviewable artifacts</h2>
+<p>The companion ships the same self-contained review surface as the public
+bundle so an AI reviewer (or a researcher without GitHub access) can
+verify claims locally:</p>
+<ul>
+<li><code>docs/</code> — vendored copies of the generation method, leakage probes
+contract, acceptance bands, break-me guide, v2 decision log, and the
+per-relational-table column descriptions (<code>relational_table_schemas.csv</code>).</li>
+<li><code>claims_register.{md,json}</code> — every numerical / structural claim
+in this card paired with the artifact and path that backs it.</li>
+<li><code>intermediate/manifest.json</code> and <code>intermediate/feature_dictionary.csv</code>
+— SHA-256-hashed provenance and the authoritative column spec.</li>
+</ul>
+<p>The instructor companion intentionally omits the top-level
+<code>metrics.json</code> (cross-tier medians would be misleading for a single
+tier). Use the public dataset's <code>metrics.json</code> when comparing tier
+behaviour.</p>
 <h2>Maintenance, license</h2>
 <p>We <em>want</em> the dataset to be broken. See the
 <a href="https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1">public dataset card</a>
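The card directs reviewers to `intermediate/manifest.json` for SHA-256-hashed provenance. The manifest's layout is not part of this diff, so the sketch below assumes a hypothetical `{"files": {"<relative path>": "<hex digest>"}}` mapping; adapt the lookup to the real schema before relying on it. Files are hashed in 1 MiB chunks so large parquet files never need to fit in memory.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bundle_dir: Path) -> list[str]:
    """Return the relative paths whose on-disk hash does not match the manifest.

    Assumes a hypothetical layout: {"files": {rel_path: sha256_hex}}.
    A file listed in the manifest but missing on disk counts as a mismatch.
    """
    manifest = json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))
    mismatches = []
    for rel_path, expected in sorted(manifest["files"].items()):
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```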

release/_preview_committed/huggingface_public.html

Lines changed: 106 additions & 0 deletions
@@ -35,6 +35,80 @@
 .dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
 .dataset-footer__note { font-style: italic; margin-top: 8px; }
 </style>
+<script type="application/ld+json">{
+  "@context": "https://schema.org",
+  "@type": "Dataset",
+  "citation": "Generated by leadforge (https://github.com/leadforge-dev/leadforge); recipe b2b_saas_procurement_v1, seed 42.",
+  "creator": {
+    "@type": "Organization",
+    "name": "leadforge"
+  },
+  "description": "Hugging Face preview of leadforge-lead-scoring-v1.",
+  "distribution": [
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intro/tasks/converted_within_90_days/train.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intro/tasks/converted_within_90_days/valid.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intro/tasks/converted_within_90_days/test.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/train.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/valid.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "intermediate/tasks/converted_within_90_days/test.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "advanced/tasks/converted_within_90_days/train.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "advanced/tasks/converted_within_90_days/valid.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    },
+    {
+      "@type": "DataDownload",
+      "contentUrl": "advanced/tasks/converted_within_90_days/test.parquet",
+      "encodingFormat": "application/vnd.apache.parquet"
+    }
+  ],
+  "isAccessibleForFree": true,
+  "keywords": [
+    "b2b",
+    "crm",
+    "datasets",
+    "lead-scoring",
+    "pandas",
+    "synthetic-data",
+    "tabular"
+  ],
+  "license": "https://opensource.org/licenses/MIT",
+  "name": "LeadForge: Synthetic B2B Lead Scoring (v1)",
+  "sameAs": [
+    "https://github.com/leadforge-dev/leadforge",
+    "https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1"
+  ],
+  "version": "v1"
+}</script>
 </head>
 <body>
 <main class="container">
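The commit message says the shared `render_jsonld_dataset` helper escapes `<`, `>` and `&` in the rendered JSON, so that no string value can close the `<script>` element early. Its implementation is not in this diff. One common way to get that guarantee, shown here as a sketch and not necessarily what the helper does, is to replace those characters with JSON `\uXXXX` escapes: browsers leave them untouched inside `<script>`, and `json.loads` decodes them back. `sort_keys=True` mirrors the alphabetical key order visible in the block above; `indent=2` and the function name `render_jsonld_block` are assumptions.

```python
import json

# JSON string escapes for the three characters that matter inside an HTML <script>.
_SCRIPT_SAFE = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}


def render_jsonld_block(payload: dict) -> str:
    """Serialise payload as a <script type="application/ld+json"> element."""
    text = json.dumps(payload, indent=2, sort_keys=True, ensure_ascii=False)
    # Escaping after dumping is safe: these characters can only occur inside JSON
    # strings (never in JSON syntax), where \uXXXX escapes are always valid.
    for raw, escaped in _SCRIPT_SAFE.items():
        text = text.replace(raw, escaped)
    return f'<script type="application/ld+json">{text}</script>'
```

`ensure_ascii=False` keeps non-ASCII characters such as the em-dash in the instructor dataset's `name` readable in the page source, as in the committed preview.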
@@ -115,11 +189,15 @@ <h2>What's inside</h2>
 <pre><code>.
 ├── intro/ intermediate/ advanced/ # student_public bundles, one per difficulty tier
 │ ├── manifest.json # provenance + file hashes
+│ ├── metrics.json # per-tier headline metrics (medians + spreads)
 │ ├── dataset_card.md # auto-rendered per-bundle card
 │ ├── feature_dictionary.csv # authoritative column spec
 │ ├── lead_scoring.csv # flat convenience CSV (all splits)
 │ ├── tables/*.parquet # 7 snapshot-safe relational tables
 │ └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── docs/ # vendored DGP / leakage / break-me docs (agent-readable)
+├── metrics.json # top-level cross-tier metrics summary
+├── claims_register.{md,json} # claims → backing-artifact map (agent-readable)
 ├── README.md # this file (HF dataset card)
 ├── dataset-cover-image.png # dataset thumbnail
 └── LICENSE
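The tree now lists a per-tier `metrics.json` next to the root summary, each value described as carrying a JSON-path back-reference into `validation/validation_report.json`. Neither the path syntax nor the entry layout appears in this commit, so the sketch below assumes a restricted JSONPath subset (`$`, `.key`, `[index]`) and a hypothetical entry shape `{"value": ..., "source": "$.<path>"}`. It reports metrics whose value no longer equals what their pointer resolves to; exact equality is used because the metrics are described as a deterministic view of the report. Hyphenated keys are not handled by this regex.

```python
import re
from typing import Any

# Tokens of a restricted JSONPath subset: ".key" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve_json_path(document: Any, path: str) -> Any:
    """Resolve a restricted JSONPath such as "$.tiers.intro.lr_auc.median"."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    node, pos = document, 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos} in {path!r}")
        key, index = match.groups()
        node = node[key] if key is not None else node[int(index)]
        pos = match.end()
    return node


def stale_entries(metrics: dict, report: dict) -> list[str]:
    """Names of metrics whose value differs from what their source path resolves to."""
    return [
        name
        for name, entry in sorted(metrics.items())
        if resolve_json_path(report, entry["source"]) != entry["value"]
    ]
```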
@@ -129,6 +207,34 @@ <h2>What's inside</h2>
 hidden causal structure (DAG, latent registry, mechanism summary)
 under <code>metadata/</code>. The full layout is documented in each bundle's
 <code>manifest.json</code>.</p>
+<h3>Agent-reviewable artifacts</h3>
+<p>The published bundle is self-contained for AI review and offline
+auditing — every numeric / structural claim on this page can be
+verified without following an external link:</p>
+<ul>
+<li><strong><code>metrics.json</code> (root) + <code>&lt;tier&gt;/metrics.json</code></strong> — deterministic
+JSON view of the headline LR AUC / AP / P@100 / Brier / conversion
+rate / cohort-shift / cross-tier-ordering medians, with JSON-path
+back-references to <code>validation/validation_report.json</code> (the
+source of truth).</li>
+<li><strong><code>claims_register.{md,json}</code></strong> — every numerical or structural
+claim on this page paired with the artifact and path that backs it.
+Rendered from <code>claims_register_source.yaml</code> by
+<code>scripts/build_claims_register.py</code>.</li>
+<li><strong><code>docs/</code></strong> — vendored copies of <code>generation_method.md</code>,
+<code>channel_signal_audit.md</code>, <code>break_me_guide.md</code>,
+<code>feature_dictionary.md</code>, <code>v1_acceptance_gates_bands.yaml</code>,
+<code>v2_decision_log.md</code>, plus a hand-authored
+<code>relational_table_schemas.csv</code> documenting every column of every
+relational table. These match the GitHub-blob links cited below but
+ship inside the bundle so a reviewer never needs network access.</li>
+<li><strong><code>&lt;tier&gt;/manifest.json</code></strong> — SHA-256 hash for every file plus the
+full redaction contract (<code>structural_redactions.columns</code>,
+<code>omitted_tables</code>, <code>relational_snapshot_safe</code>, <code>snapshot_day</code>).</li>
+<li>Kaggle / HuggingFace preview pages additionally inject a
+<code>schema.org/Dataset</code> JSON-LD block in their <code>&lt;head&gt;</code> for agent
+ingestion without HTML parsing.</li>
+</ul>
 <h2>Quick start</h2>
 <pre><code class="language-python"># Flat CSV
 df = pd.read_csv(&quot;intermediate/lead_scoring.csv&quot;)
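The commit message lists the row rules that `tests/release/test_relational_table_schemas.py` enforces on `relational_table_schemas.csv`: descriptions of at least 12 characters that are not TODO placeholders, dtypes from a closed vocabulary, `bundle_visibility` limited to `public+instructor` or `instructor_only`, and no duplicate rows. A stdlib-only sketch of those checks follows. The column names (`table`, `column`, `dtype`, `description`, `bundle_visibility`), the contents of `ALLOWED_DTYPES`, and treating a repeated `(table, column)` pair as a duplicate are assumptions, since the CSV header is not shown. The parity check against live parquet schemas is omitted because it needs a parquet reader.

```python
import csv
import io

# Hypothetical closed dtype vocabulary; the real list lives in the test module.
ALLOWED_DTYPES = {"string", "int64", "float64", "bool", "date", "timestamp"}
ALLOWED_VISIBILITY = {"public+instructor", "instructor_only"}
MIN_DESCRIPTION_CHARS = 12


def schema_csv_problems(csv_text: str) -> list[str]:
    """List every rule violation in a relational-table schema CSV (empty list = clean)."""
    problems: list[str] = []
    seen: set[tuple[str, str]] = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        key = (row["table"], row["column"])
        if key in seen:
            problems.append(f"row {row_no}: duplicate entry for {key}")
        seen.add(key)
        description = row["description"].strip()
        if len(description) < MIN_DESCRIPTION_CHARS or "TODO" in description.upper():
            problems.append(f"row {row_no}: placeholder or too-short description")
        if row["dtype"] not in ALLOWED_DTYPES:
            problems.append(f"row {row_no}: dtype {row['dtype']!r} not in vocabulary")
        if row["bundle_visibility"] not in ALLOWED_VISIBILITY:
            problems.append(f"row {row_no}: bad bundle_visibility {row['bundle_visibility']!r}")
    return problems
```

Collecting every problem instead of stopping at the first one lets a single test run report all weak rows at once.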
