diff --git a/.agent-plan.md b/.agent-plan.md
index 4f4c290..47c1e6c 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -64,7 +64,7 @@ Goal: ship a best-in-class educational synthetic CRM lead-scoring dataset family
 
 ### Phase 7 — LLM critique + publish (3 PRs)
 - [x] PR 7.1: LLM critique module + prompt + driver landed.  `leadforge/validation/llm_critique.py` (new) — single-provider Anthropic critique core via an `LLMCritiqueClient` protocol (no preemptive OpenAI/Gemini stubs); `_AnthropicCritiqueClient` lazy-imports the SDK so the module imports cleanly even on machines without `anthropic` installed (the skip-cleanly path needs to work without the SDK).  `has_anthropic_credentials` / `api_key_or_skip` treat unset and empty-after-strip identically as "absent", explicitly to handle the `env -i` / stale `.envrc` case where the shell sets `ANTHROPIC_API_KEY=""` and the SDK would otherwise 401 instead of cleanly skipping.  Default model `claude-opus-4-7` with `thinking={"type": "adaptive", "display": "summarized"}` (only mode supported on Opus 4.7 — manual `budget_tokens` 400s) and `output_config={"effort": "high"}` (recommended minimum for intelligence-sensitive work per the `claude-api` skill); two prompt-cache breakpoints (rubric + input bundle) per the design doc's caching strategy so the common adjudication-loop workflow hits cache on both layers; streamed via `messages.stream(...).get_final_message()` to dodge the 10-min idle-connection timeout on long adaptive-thinking responses.  
`build_input_bundle` is pure (same `release_dir` → byte-identical bytes → identical `sha256`) and assembles eleven blocks: `release/README.md`, per-tier `dataset_card.md`, `docs/release/generation_method.md`, `manifest.json`, `feature_dictionary.csv`, `validation_report.{md,json}`, the first 100 test-split rows rendered as deterministic CSV, the public/instructor diff summary (live-derived from the `BANNED_LEAD_COLUMNS` / `BANNED_OPP_COLUMNS` / `BANNED_TABLES` / `SNAPSHOT_FILTERED_TABLES` constants in `leakage_probes.py` — single source of truth, auto-stays-in-sync, sync-tested), the public-safe mechanism summary (motif family **names** + difficulty knob **names**, never values — same redaction posture as `student_public`), and the break-me guide verbatim ("avoid re-deriving" the existing nine patterns).  `parse_critique_response` schema-validator pins eleven malformations (missing required field, wrong severity, wrong category, wrong rubric dimension, finding-id collision, findings non-list, top-level non-object, non-JSON, score out of range, defensive code-fence stripping, empty findings list valid) and returns every problem in one error rather than the first one.  Output schema is a frozen dataclass (no pydantic dependency) with the nine-value `category` vocabulary lifted **verbatim** from `break_me_guide.md` so findings route to existing issue-template labels without translation; `rubric_dimension: str` is required on every finding (D1-D14) so reviewers can audit clustering.  Provenance triple (`model` / `effort` / `thinking_mode`) plus per-source-file `bundle_hashes` and the assembled `input_bundle_sha256` are carried on every result for audit-artifact-sync — re-runs on the same RC produce the same bundle hashes.  
`docs/release/llm_critique_prompt.md` (new) — the rubric document the driver feeds to Claude, parseable via `<system_prompt>` / `<user_cue>` section markers with surrounding prose ignored; fourteen rubric dimensions (D1 documentation truthfulness · D2 leakage discipline · D3 realism vs disclosure · D4 difficulty signal · D5 calibration / value-aware ranking · D6 cohort/time-window discipline · D7 notebook integrity · D8 platform packaging hygiene · D9 adversarial-framing completeness · D10 pedagogy of the documented `total_touches_all` trap · D11 effective semantic diversity per recommendation #12 v1 scope · D12 Datasheets-for-Datasets composition · D13 manifest/provenance integrity · D14 out-of-scope guard).  Severity calibration explicitly written to discourage padding the report with low-severity nits and to surface "no high-severity findings" as a positive signal vs "the critique didn't surface any".  `scripts/run_llm_critique.py` (new) — driver mirroring `validate_release_candidate.py`'s posture (free-function `parse_args`, frozen `DriverConfig`, `run_critique(config) -> DriverResult`, `main(argv)` returning an exit code).  Skip-cleanly path triggers BEFORE any I/O — no rubric read, no bundle build, no out-dir creation; tested explicitly with `not (tmp_path / "out").exists()` after the skip.  Three modes alongside the live path: `--dry-run` writes the rendered input bundle to `<out-dir>/llm_critique_input_<ts>.md` for human inspection (different filename from the real raw JSON, can't be confused); `--no-execute` calls `api_key_or_skip` + `build_anthropic_client()` to prove the SDK is installed and creds are present without burning an API call (CI smoke); `--out-tag` suffixes the raw filename so adjudication re-runs don't shadow the canonical run.  Outputs: timestamped `llm_critique_raw_<UTC-iso>.json` (accumulates per run, no clobber) + canonical `llm_critique_summary.md` (overwritten in place so dataset-card links don't rot).  
Exit codes mirror `validate_release_candidate.py`: 0 pass (skip-cleanly counts as pass), 1 high-severity surfaced and unresolved, 2 pre-flight error or schema-validation failure (every problem rendered to stderr, not just the first).  Adjudication is **maintainer-driven** post-exit — resolve in code OR log to `v2_decision_log.md`, then re-run; the next critique's exit code is the gate.  Tests: 61 cases across `tests/validation/test_llm_critique.py` (48) and `tests/scripts/test_run_llm_critique.py` (13), no live API; the protocol is exercised via a small in-process `_CannedClient` fake.  Sync tests pin: every `VALID_CATEGORIES` entry appears in `break_me_guide.md` (vocabulary doesn't drift), `VALID_RUBRIC_DIMENSIONS` is exactly D1-D14, the live-derived public/instructor diff names every banned-column/banned-table constant (live reference, not duplicated string).  Audit-artifact-sync smoke test (`test_real_release_dir_smoke`) builds the input bundle against the actual `release/intermediate/` artefacts and pins determinism on the real input, skipping cleanly when bundles aren't present.  `docs/release/llm_critique_design.md` (new) records the nine load-bearing design calls before implementation so a reviewer can audit the choice (provider abstraction, skip-cleanly, model+caching+thinking, output schema, input-bundle composition, determinism via provenance, CLI flags, test posture, first-run adjudication workflow).  Live first-run deferred to maintainer (no `ANTHROPIC_API_KEY` available to the agent); the dry-run path was exercised against the real release dir end-to-end, producing a 148KB byte-stable input bundle from the actual artefacts.  
Hostile self-review pass before requesting review caught and folded back twelve findings against the diff, including two BLOCKERs (`--no-execute` was performing pre-flight I/O before the credentials check, contradicting the design doc; raw-output filename collision at second-precision contradicted the "append-only history" promise — fixed with microsecond precision and a pinning test) and five HIGHs (silent `release_id` default that defeated the audit-artifact-sync gate; design-doc lies about a never-existing `temperature` field and "malformed timestamp" malformation that's driver-generated; dead `if/else` branches in `_safe_difficulty_knobs`; greedy regex for the rubric section markers so the prompt-injection warning paragraph that legitimately references `</user_cue>` doesn't break the parser).  Prompt-injection mitigation added to the rubric (treat-input-as-data preamble) since the input bundle inlines user-authored content (dataset_card.md, break_me_guide.md).  Schema validator hardened against silent `str()` coercion of finding prose fields (an int "claim" would have landed on disk as the string "5" — now rejected).  Net: 1321/1321 tests pass + 5 publish-extra-gated skips; ruff + mypy clean (83 source files); leakage probes 0/3 on every tier; hash determinism PASS 67/67; `validate_release_candidate --no-rebuild` exits 0; `BUNDLE_SCHEMA_VERSION` unchanged at 5; validation_report timestamp drift reverted before commit per the brief.  
Second senior-dev review pass after PR #76 was opened caught and folded back 9 more issues, several of which were real bugs the first hostile pass missed: (B1) `--out-tag` suffixed only the raw JSON, leaving `llm_critique_summary.md` clobbered on adjudication runs — fix suffixes both files (`summary_output_path` now takes `tag`); (B2) skip-cleanly silently passed a release-readiness gate, contradicting `v1_release_roadmap.md`'s line-35 acceptance criterion that the critique must actually run — added `--require-execute` flag (default off; release-readiness CI sets it) that converts the skip path into `MissingCredentialsError` exit 2, plus a loud `WARNING — release-readiness gate has NOT been evaluated` stderr line on the regular skip path; (A2) two prompt-cache breakpoints cut to one — system content already sits inside the cached prefix on `messages.create` (system → messages render order), so the second breakpoint bought nothing and burned a slot; (M1) design doc cut from 394 lines to 73 — the 9-decision table replaces the multi-paragraph rationale-per-call shape that read as documentation theater; (M2) rubric cut from 420 lines to ~210 — each dimension now one paragraph instead of 3-6, dropped D14 ("out-of-scope guard") which was meta-instruction not a rubric dimension, made it a "What is NOT yours to audit" appendix at the end; rubric is now D1-D13 and `VALID_RUBRIC_DIMENSIONS` updated in lockstep; (M3) test-split sample replaced 100 raw rows of CSV with `df.describe(include="all")` per-column statistics + a 20-row head — distributional conclusions need statistics not raw rows, and the rendered input bundle dropped from 148KB to 128KB; (M5) streaming-via-`messages.stream` replaced with `messages.create(timeout=600.0)` — no stream events were processed anyway, the contract is just "don't time out on long adaptive-thinking responses" and an explicit timeout is the right way to spell that; (M6) `render_input_bundle_text` free function moved to 
`InputBundle.render()` method — leaky abstraction; the audit-artifact-sync framing was misleading (no committed-artefact diff) and was renamed to "smoke test against the real release dir" / "staleness check vs committed result" throughout the module and design doc.  Net after the second pass: 1323/1323 tests pass + 5 publish-extra-gated skips; ruff + mypy clean; leakage probes 0/3 on every tier; hash determinism PASS 67/67; `validate_release_candidate --no-rebuild` exits 0; `BUNDLE_SCHEMA_VERSION` unchanged at 5; validation_report timestamp drift reverted again before this commit.  First live critique run executed by the maintainer with a dedicated Anthropic project key (`leadforge-llm-critique-v1-prod`): score 7/10, six findings (1 high, 4 medium, 1 low), exit code 1 as designed for unresolved high-severity findings.  Adjudication: F001 high-severity (93 % `account_id` overlap between train/test documented only in break_me_guide §5, missing from README/dataset_card) — **resolved in code** by adding a "Group-leakage warning" paragraph to `release/README.md` "Splits" subsection citing the 518/557 figure and a `GroupKFold(account_id)` recipe; the parallel disclosure on the auto-rendered `dataset_card.md` is logged as `accepted-for-v2` because the renderer change is out of scope for PR 7.1's no-bundle-regen rule.  F004 medium (break_me_guide pattern 5 covered `account_id` but not `contact_id`, despite contacts being shared across the lead-keyed split at the same magnitude) — **resolved in code** by extending §5 to enumerate both keys and any reusable foreign-key column as group-leakage axes.  F006 low (README "Conversion rate (recipe band)" column header didn't make clear it was a recipe-acceptance window not an observed range) — **resolved in code** by renaming to "(acceptance band, gate G7.\*)" and adding a one-sentence note that observed five-seed spreads sit comfortably inside the band.  
F002 medium (Gaussian noise produces non-physical values: negative ACV, negative day-deltas, day-deltas > snapshot_day=30, undisclosed in dataset card) — `accepted-for-v2`; requires `leadforge/narrative/dataset_card.py` change.  F003 medium (`](../foo)` relative links would 404 on Kaggle/HF) — `wont-fix`: already treated by `scripts/_release_common.py::rewrite_release_links()` which both platform packagers (PR 5.1, 5.2) call at packaging time; the LLM didn't have visibility into the platform packagers and made a wrong inference.  F005 medium (advanced-tier `calibration_max_bin_error = 0.5234` driven by an n=2 high-probability bin, no minimum-bin-count footnote) — `accepted-for-v2`; not a 1-line change, touches `release_quality.py` metric definition and would require regenerating `validation_report.{json,md}` which PR 7.1's brief explicitly forbids.  Three missing-section callouts (Datasheets §Biases, §Privacy, per-bundle group-split warning) and three maintainer questions (noise/windowing interaction, `top_decile_rate` naming, Kaggle/HF docs subtree) all logged to `docs/release/v2_decision_log.md`.  README edits cascaded into the platform packager artefacts; `release/kaggle/dataset-metadata.json` and `release/huggingface/README.md` regenerated cleanly via the existing packagers (`scripts/package_{kaggle,hf}_release.py`).  Critique run output committed to `release/validation/llm_critique_raw_20260508T204359.124834Z.json` + `release/validation/llm_critique_summary.md`.  Final net: 1325/1325 tests pass + 5 publish-extra-gated skips; ruff + mypy clean (83 source files); leakage probes 0/3 on every tier; hash determinism PASS 67/67; `validate_release_candidate --no-rebuild` exits 0; `BUNDLE_SCHEMA_VERSION` unchanged at 5.  Phase 7 PR 7.1 closed; PR 7.2 (local Kaggle/HF mock-page preview) is next.
-- [ ] **PR 7.2** — local Kaggle + HuggingFace mock-page preview tooling (must land before PR 7.3): `scripts/preview_kaggle_page.py` and `scripts/preview_hf_page.py` render offline HTML mocks of the public Kaggle and HF dataset pages from the *exact* upload artefacts (metadata JSON, README, cover image), serve over `localhost`, and let the maintainer click through both pages in a browser before any platform upload — catches styling / link / YAML-rendering issues before they hit cached previews on the live page. Tests cover required-field presence, link resolution, schema column listing, configs-block round-trip.
+- [x] PR 7.2: local Kaggle + HuggingFace mock-page preview tooling landed.  `scripts/preview_kaggle_page.py` (new) — reads the *exact* artefacts the publish PR will upload (`release/kaggle/dataset-metadata.json` + the inlined README body + the cover image, prefer `release/kaggle/dataset-cover-image.png` then fall back to the gitignore-resilient `release/dataset-cover-image.png` master copy) and renders an offline HTML page mocking the public Kaggle dataset view: header (title / subtitle / id pill / licence / update-frequency / visibility), cover image, rendered description (the inlined README body), file tree of declared resources grouped by tier with per-tier counts, schema/columns table for every tabular resource (`resources[].schema.fields[].name/type/description`) with per-table column counts in the heading, user-specified-sources block (rendered only when present), keywords + licence footer.  Serves on `http://localhost:8765` via stdlib `http.server.ThreadingHTTPServer` (the threading variant inherits `allow_reuse_address=True` from `HTTPServer`, so Ctrl-C → re-run within ~60s does not raise `OSError [Errno 48] Address already in use` while the socket sits in TIME_WAIT — caught and folded back in self-review pass 1, the initial draft used `socketserver.ThreadingTCPServer` which defaults to `False`).  `--no-serve` builds the HTML and exits (CI / inspection mode); `--open-browser` pops a tab on startup; `--port` / `--release-dir` / `--out-dir` round out the surface.  
`scripts/preview_hf_page.py` (new) — reads `release/huggingface/README.md` (or `release/huggingface-instructor/README.md` per `--variant=public|instructor`) and parses YAML frontmatter + Markdown body via a single anchored regex (`r"\A---\n(?P<yaml>.*?)\n---\n(?P<body>.*)\Z"` with `re.DOTALL`); renders the analogous HF view: header pills (pretty_name + license + task_categories + size_categories + language), tag chips, configs dropdown (one details-block per `configs[]` entry with the default config flagged via a single `badge--default` instance, data_files split→path table per config), file tree of declared YAML paths bucketed by config, README body, footer carrying the variant for human visual confirmation.  `--variant` defaults `--out-dir` to `release/_preview/huggingface/` (public) or `release/_preview/huggingface-instructor/` (instructor); the instructor path also reads its README from a different location (`huggingface-instructor/README.md`) and looks for the cover under the variant directory first.  Both scripts share the validation discipline from the Phase 5 packagers: build → validate → write; pre-flight failures (missing metadata, malformed JSON / YAML, unknown variant, missing cover) raise and the CLI converts to rc=2 without touching disk; runtime success exits 0.  Markdown rendering via `markdown-it-py` in `gfm-like` preset (tables / fenced code / strikethrough on; `linkify` explicitly disabled so the optional `linkify-it-py` transitive dep is not required); the dep is added to the `[publish]` extra alongside `datasets` / `kaggle` (mirrors the PR 5.1 / 5.2 gating posture for publish-pipeline tooling), and absent imports raise a clean `ImportError` pointing at `pip install -e ".[publish]"` instead of a cryptic stdlib `ModuleNotFoundError`.  Both renderers are pure: same `(metadata|doc, cover_filename|variant)` → byte-identical HTML (no `now()`, no random, no clock).  
Output landing at `release/_preview/<platform>/index.html` is gitignored (`.gitignore` adds `release/_preview/`); the audit-artefact-sync gate lives at `release/_preview_committed/{kaggle,huggingface_public,huggingface_instructor}.html` (committed alongside the scripts, mirrors the PR 4.1 / 5.1 / 5.2 / 7.1 audit-sync pattern).  HTML is wrapped in a single self-contained file (CSS inlined, no external stylesheet) so each committed sample is human-inspectable directly from `git show` or a browser without a server.  XSS-safety: every user-controlled string passes through a hand-rolled `_escape` (`&`, `<`, `>`, `"`, `'`); kept hand-rolled rather than `html.escape` so the committed samples' `&#39;` (decimal) escapes don't churn against `html.escape`'s `&#x27;` (hex) entity.  Tests: 48 cases across `tests/scripts/test_preview_kaggle_page.py` (20) and `tests/scripts/test_preview_hf_page.py` (28); no live HTTP, no network, no socket open.  The four roadmap-mandated checks per script: required field labels appear in rendered HTML (Kaggle: title / subtitle / id / license / file count / schema column count; HF: pretty_name / license / configs / tags); any Markdown link in the source that resolves to a non-allowlisted URL pattern fails the test (allow-list: `https://github.com/leadforge-dev/leadforge`, `https://huggingface.co/datasets/leadforge`, sibling-relative `LICENSE`, in-document `#` anchors; anything else is a 404 risk on the live page); the Kaggle schema table lists every column declared in `resources[].schema.fields` (iterates the committed metadata, asserts each `<code>{name}</code>` appears); every `configs[]` block in the HF YAML round-trips into the rendered dropdown.  Determinism is double-tested: `test_render_is_byte_deterministic` runs two passes against the real release artefact and pins equality; `test_committed_*_sample_matches_fresh_regeneration` pins the committed HTML against fresh regeneration byte-for-byte (the audit-sync gate).  
Pre-flight error paths exercised end-to-end: missing artefact (`FileNotFoundError`), malformed JSON / YAML (`ValueError`), unknown variant, missing cover image — all return rc=2 via `main()` with informative stderr.  HTML escape coverage: `test_render_escapes_html_in_field_values` asserts a `<script>` payload in the title / pretty_name field is rendered as `&lt;script&gt;`, not as a live tag (XSS guard for any future recipe that surfaces unescaped user content).  `parse_hf_readme` rejects missing-frontmatter and non-mapping-frontmatter inputs explicitly so the renderer never sees half-parsed input.  `pyproject.toml` `[tool.ruff.lint.per-file-ignores]` adds `E501` for both preview scripts — inlined CSS strings inside f-string templates are the rendered product, not source code that benefits from a 100c wrap (mirrors the existing `scripts/build_release_notebook_*.py` ignore for the same reason).  `docs/release/preview_pages_design.md` (new, 59 lines) records the ten load-bearing design calls in the same decision-table shape as `llm_critique_design.md`: two scripts vs unified renderer, stdlib server vs Flask, f-string templates vs Jinja2, `markdown-it-py` via `[publish]` extra (with rationale for why this differs from the PR 5.1 / 5.2 *test* gating — preview scripts' runtime path requires the renderer, not just the smoke test), output-dir convention, cover-image inlining, HF variant flag, CLI shape, audit-sync, test posture (no live HTTP, no BeautifulSoup dep), plus the link-resolution rule (every rendered href must be in the allow-list — guards against the rewrite-stops-firing regression for `](../foo)` and `](validation/...)`).  
Hostile self-review pass 1 caught and folded back three findings: (B1) BUG — `socketserver.ThreadingTCPServer` defaults `allow_reuse_address=False`, restart-after-Ctrl-C would 60-second TIME_WAIT; switched to `http.server.ThreadingHTTPServer`; (D1) DEAD CODE — `COMMITTED_SAMPLE_PATH` (Kaggle) and `_VARIANT_SAMPLE_PATH` (HF) module-level constants defined but never read at runtime (tests use their own `_REPO_ROOT`-rooted paths); deleted both, dropped the now-unused `socketserver` import; (M1) DOC LIE — `_resolve_cover_image` Kaggle docstring claimed "we prefer the kaggle-tree copy" without acknowledging that `release/kaggle/dataset-cover-image.png` is gitignored on a fresh checkout (only the committed master copy at `release/dataset-cover-image.png` is guaranteed present); reworded to call out the lookup order + gitignore reality.  Pass 2 found no significant architectural / scope issues — the ~30 lines of intentional duplication between the two scripts (`_escape`, `_serve`, `_make_handler_factory`, partly-duplicated CSS) are below the threshold where a `_preview_common.py` extraction would pay back; the Phase 5 `_release_common.py` exists for things shared between two callers, and a third caller is not on the horizon.  Net: 1373/1373 tests pass (1325 baseline + 48 new) + 5 publish-extra-gated skips; ruff + mypy clean (83 source files); leakage probes 0/3 on every tier; hash determinism PASS 67/67; `validate_release_candidate --no-rebuild` exits 0 (3 tiers, 5 seeds, 0 leakage findings); `BUNDLE_SCHEMA_VERSION` unchanged at 5; validation_report timestamp drift reverted before commit per the brief.  Phase 7 PR 7.2 closed; PR 7.3 (`publish_kaggle.py` + `publish_hf.py` + `docs/release/v1_release_notes.md` + tag `leadforge-lead-scoring-v1`) is next, and its publish runbook will cite the two preview commands as a required pre-flight step before `kaggle datasets create` / `huggingface-cli upload`.
 - [ ] **PR 7.3** — `scripts/{publish_kaggle,publish_hf}.py` (dry-run → local mock-page review → private/draft → public). Tag `leadforge-lead-scoring-v1`; `docs/release/v1_release_notes.md` (cites PR 7.2's preview commands as required pre-flight).
 
 ---
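Reviewer sketch for the frontmatter split described in the PR 7.2 plan entry above. It uses the anchored regex quoted there; the helper name `split_frontmatter` and the sample README are hypothetical, and the YAML-mapping validation that `parse_hf_readme` is said to perform is left out. Treat it as an illustration under those assumptions, not the script's actual code.

```python
import re

# Pattern quoted in the plan entry: YAML frontmatter between the first two
# `---` lines, then everything that follows as the Markdown body.
FRONTMATTER_RE = re.compile(
    r"\A---\n(?P<yaml>.*?)\n---\n(?P<body>.*)\Z",
    re.DOTALL,
)


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (yaml_text, markdown_body); raise ValueError without frontmatter.

    Hypothetical helper: the real parser would also load the YAML and
    reject non-mapping frontmatter, which this sketch does not attempt.
    """
    match = FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("README has no leading YAML frontmatter block")
    return match.group("yaml"), match.group("body")


# Made-up README content, for illustration only.
readme = "---\nlicense: mit\npretty_name: Demo\n---\n# Title\n\nBody text.\n"
yaml_text, body = split_frontmatter(readme)
```

Because the YAML group is lazy (`.*?`), the match stops at the first closing `---` line, so a later horizontal rule in the body does not get swallowed into the frontmatter.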
diff --git a/.gitignore b/.gitignore
index a991601..e9893bd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -233,3 +233,9 @@ release/huggingface/*
 !release/huggingface/README.md
 release/huggingface-instructor/*
 !release/huggingface-instructor/README.md
+
+# Generated local preview-page output (PR 7.2) — runtime HTML rendered
+# by scripts/preview_{kaggle,hf}_page.py.  The committed sample HTML
+# under release/_preview_committed/ is the audit-artefact-sync gate
+# and is checked into git separately.
+release/_preview/
diff --git a/docs/release/preview_pages_design.md b/docs/release/preview_pages_design.md
new file mode 100644
index 0000000..7ad607c
--- /dev/null
+++ b/docs/release/preview_pages_design.md
@@ -0,0 +1,68 @@
+# PR 7.2 — Local Kaggle / HF preview-page design notes
+
+Working notes for `scripts/preview_kaggle_page.py`,
+`scripts/preview_hf_page.py`, the shared `scripts/_preview_common.py`,
+their tests, and the committed sample HTML used as the
+regeneration-discipline gate. Captured before implementation; revised
+after self-review pass 3 to be honest about scope. Kept short on
+purpose.
+
+The PR's pedagogical role is the *staging gate* before PR 7.3: the
+maintainer renders both platforms locally from the same artefacts the
+publish PR will upload, clicks through them in a browser, and catches
+link / config / column-listing issues before they hit cached
+previews on the live page.
+
+This is a **publication-readiness preview**, not a Kaggle / HF
+look-alike. Pixel fidelity is explicitly out of scope; the chrome
+(CSS palette, layout) is approximate. The tool's job is structured
+rendering of the upload artefacts so a maintainer can verify the
+content; visual brand matching is not its job.
+
+## Decisions
+
+| # | Decision | Why |
+|---|---|---|
+| 1 | Two scripts, one per platform, sharing `scripts/_preview_common.py` for `escape` / `make_server` / `serve`. | Inputs are different (`dataset-metadata.json` vs YAML-frontmatter README) and page structures differ enough that one renderer per platform reads better. The 3-helper shared module replaces what was duplicated byte-for-byte. |
+| 2 | Server: stdlib `http.server.ThreadingHTTPServer` via the shared `make_server(directory, port) -> ThreadingHTTPServer`. | `ThreadingHTTPServer` inherits `allow_reuse_address=True` from `HTTPServer` (bare `socketserver.ThreadingTCPServer` does not, so Ctrl-C → re-run within ~60s would fail with `Address already in use` while the old socket sits in TIME_WAIT). The `make_server` seam is what lets the tests bind on port 0, GET `/`, and shut down cleanly without forking a subprocess. |
+| 3 | Templates: f-string helpers, not Jinja2. | The layout is stable; two pages don't justify a templating engine. |
+| 4 | Markdown→HTML via `markdown-it-py` (in `[dev]` AND `[publish]`). `gfm-like` preset; `linkify` disabled to avoid the `linkify-it-py` transitive dep. | The dep is small and pure-Python; the renderer is the test surface (not a smoke), so gating it behind `[publish]` would mean CI's `[dev]`-only test job ImportErrors on every render. Listed in both extras so neither path breaks. |
+| 5 | Output dir: `release/_preview/<platform>/` (gitignored). Committed regeneration samples at `release/_preview_committed/{kaggle,huggingface_public,huggingface_instructor}.html`. | Mirrors `release/_release_quality/` convention. The committed samples double as human-inspectable rendered output for code reviewers who don't want to install the dep and run the script. |
+| 6 | Cover image copied into the preview tree (sibling-relative `<img src=>`). | Both platforms inline-display the cover image; serving it under the preview root means the rendered HTML works without absolute paths. |
+| 7 | HF `--variant=public|instructor` reads either `release/huggingface/README.md` or `release/huggingface-instructor/README.md`. Kaggle has no instructor variant. | Matches the PR 5.2 publish reality. |
+| 8 | CLI mirrors `validate_release_candidate.py` / `run_llm_critique.py`: free-function `parse_args`, frozen `Config`, `run_preview(config) -> Outcome`, `main(argv) -> int`. Exit codes 0 / 2. Flags: `--release-dir`, `--port` (8765 Kaggle / 8766 HF), `--out-dir`, `--variant` (HF only), `--open-browser`, `--no-serve`. | Maintainer muscle memory + small surface. `--no-serve` is the CI / inspection mode; `--open-browser` pops a tab on startup. |
+| 9 | The byte-equality test against the committed sample is a **regeneration-discipline gate**, NOT a renderer audit. The renderer-audit work is done by the structural tests (schema-column exhaustiveness, link allow-list, configs round-trip), each of which compares rendered output against an independent source of truth. | A bug in the renderer propagates to both the committed sample (regenerated by the same code) and the test, so byte-equality alone catches "someone forgot to regenerate", not correctness. Calling it "audit-artefact-sync" oversells; the structural tests are the real audits. |
+| 10 | Test posture: in-process. No live HTTP for the rendering tests; `_preview_common.make_server` is exercised by a single port-0 smoke test that GETs `/` and shuts down. The render functions are pure and tested via substring + regex on the rendered string. | No new test deps (no BeautifulSoup). Substring assertions on deterministic rendered HTML give the same coverage with less surface. The smoke test covers the previously-untested server glue. |
+| 11 | Visibility pill (Kaggle) NOT rendered. HF "Files declared in YAML" section NOT rendered. | Both were fidelity bugs caught in self-review pass 3. Kaggle's public page does not display `isPrivate` — showing it would mislead the maintainer about what public viewers see. HF's "Files declared in YAML" surfaced an internal concept (the configs[].data_files paths) that the configs dropdown already lists, while omitting most of the actual upload tree (manifest.json, tables/, feature_dictionary.csv, …). |
+
+## Link-resolution rule (test pin)
+
+Every Markdown link `](URL)` in the README body the renderer ingests
+must satisfy ONE of:
+
+1. Absolute `https://github.com/leadforge-dev/leadforge/...` URL (the
+   rewrite output of `_release_common.py::rewrite_release_links()`).
+2. External absolute URL on a known-OK domain (`https://huggingface.co`).
+3. Relative path that resolves to a file under the upload tree
+   (e.g. `LICENSE` → `release/<platform>/LICENSE`).
+4. In-document anchor (`#footnote-1` etc.).
+
+A `](../foo)` or `](validation/...)` link in the rendered HTML is a
+regression — those are exactly what the platform packagers' rewrite
+is supposed to canonicalise away. The test fires the moment the
+rewrite stops doing its job for the upstream artefact the preview
+renders.
+
+## What this PR does not touch
+
+- `BUNDLE_SCHEMA_VERSION` stays at 5.
+- `release/validation/validation_report.{json,md}` does not regenerate
+  (revert any timestamp drift before commit).
+- PR 7.3 (publish + tag) is a separate PR; the runbook there will cite
+  the two preview commands as a required pre-flight step.
+- No change to the platform packagers (`scripts/package_{kaggle,hf}_release.py`)
+  or `_release_common.py`. The preview reads what the packagers wrote.
+- Live Kaggle / HF API calls — pure local rendering only.
+- Pixel-perfect cloning of the live pages. The bar is structured
+  rendering of the upload artefacts; visual brand matching is out of
+  scope.
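Reviewer sketch of the link-resolution rule from the design notes above, written as a small checker. The function name `disallowed_links`, the `upload_tree` argument (a set of relative paths standing in for the files under `release/<platform>/`), and the sample links are all assumptions for illustration; the real test may extract and classify links differently.

```python
import re

# Prefixes taken from rules 1 and 2 of the design notes.
GITHUB_PREFIX = "https://github.com/leadforge-dev/leadforge/"
OK_DOMAINS = ("https://huggingface.co",)

# Inline Markdown links only, i.e. `](URL)`; reference-style links are ignored.
LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def _on_ok_domain(url: str) -> bool:
    # Require an exact domain match so look-alike hosts such as
    # "https://huggingface.co.example" are not accepted.
    return any(url == d or url.startswith(d + "/") for d in OK_DOMAINS)


def disallowed_links(markdown: str, upload_tree: set[str]) -> list[str]:
    """Return every inline link target that satisfies none of the four rules."""
    bad = []
    for url in LINK_RE.findall(markdown):
        if url.startswith(GITHUB_PREFIX):  # rule 1: rewritten repo URL
            continue
        if _on_ok_domain(url):  # rule 2: known-OK external domain
            continue
        if url in upload_tree:  # rule 3: file shipped in the upload tree
            continue
        if url.startswith("#"):  # rule 4: in-document anchor
            continue
        bad.append(url)
    return bad


# Made-up README body; `validation/example.md` is a hypothetical path.
sample = (
    "[repo](https://github.com/leadforge-dev/leadforge/blob/main/README.md) "
    "[hub](https://huggingface.co/datasets/leadforge) "
    "[licence](LICENSE) [note](#footnote-1) "
    "[stale](../foo) [report](validation/example.md)"
)
bad_links = disallowed_links(sample, upload_tree={"LICENSE"})
```

In this sample only the `../foo` and `validation/...` style targets are reported, which is the regression the rule is meant to catch when the packagers' rewrite stops firing.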
diff --git a/pyproject.toml b/pyproject.toml
index 79c250c..871cd2a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -40,6 +40,13 @@ dev = [
     "types-pyyaml>=6.0",
     "scikit-learn>=1.3",
     "matplotlib>=3.7",
+    # PR 7.2: the preview-page renderers (scripts/preview_{kaggle,hf}_page.py)
+    # call into markdown-it-py at test time via render_*_html().  Keeping
+    # the dep here as well as in [publish] means CI's "test" job (which
+    # installs only [dev]) does not ImportError mid-test.  pytest.importorskip
+    # would also work, but the rendering tests are the primary coverage of
+    # this PR — gating them off would defeat the purpose.
+    "markdown-it-py>=3.0",
 ]
 scripts = [
     "scikit-learn>=1.3",
@@ -49,10 +56,14 @@ scripts = [
 # this extra (``pip install -e ".[publish]"``) enables the gated
 # ``load_dataset()`` / Kaggle-CLI smoke tests that verify G11.3 (Kaggle
 # package) and G12.3 / G12.4 (HF load_dataset round-trip) without
-# pulling the heavy SDKs into the default dev install.
+# pulling the heavy SDKs into the default dev install.  PR 7.2 adds
+# ``markdown-it-py`` for the local Kaggle / HF preview pages
+# (``scripts/preview_{kaggle,hf}_page.py``) with the same publish-extra
+# posture: a missing import raises a clean error pointing at this extra.
 publish = [
     "datasets>=2.14",
     "kaggle>=1.6",
+    "markdown-it-py>=3.0",
 ]
 # Optional dependencies for executing the public release notebooks.
 # Installing this extra (``pip install -e ".[notebooks]"``) enables the
@@ -103,6 +114,13 @@ select = ["E", "F", "I", "N", "W", "UP", "B", "C4", "PT", "S"]
 # Line length is a property of the rendered cell, not the .py source,
 # so 100c is the wrong yardstick here.
 "scripts/build_release_notebook_*.py" = ["E501"]
+# Preview-page scripts (PR 7.2) carry inlined CSS + multi-attribute
+# HTML strings inside f-string templates; the rendered HTML is the
+# product, so wrapping the source CSS at 100c is line noise.
+"scripts/preview_kaggle_page.py" = ["E501"]
+"scripts/preview_hf_page.py" = ["E501"]
+# _preview_common is plain Python (no inline HTML / CSS); leaving
+# E501 enabled.
 
 [tool.mypy]
 python_version = "3.11"
diff --git a/release/_preview_committed/huggingface_instructor.html b/release/_preview_committed/huggingface_instructor.html
new file mode 100644
index 0000000..0d81395
--- /dev/null
+++ b/release/_preview_committed/huggingface_instructor.html
@@ -0,0 +1,276 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <title>HF preview — LeadForge: Synthetic B2B Lead Scoring (v1) — Instructor companion</title>
+  <style>:root { --bg:#fff; --fg:#1f2937; --muted:#6b7280; --accent:#ff9d00; --border:#e5e7eb; --pill-bg:#f3f4f6; --code-bg:#f9fafb; }
+body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, sans-serif; color: var(--fg); background: var(--bg); margin: 0; padding: 0; line-height: 1.6; }
+.container { max-width: 1100px; margin: 0 auto; padding: 24px 32px; }
+.dataset-header { border-bottom: 1px solid var(--border); padding-bottom: 16px; margin-bottom: 24px; }
+.dataset-header__namespace { color: var(--muted); font-size: 0.85em; font-family: monospace; margin-bottom: 4px; }
+.dataset-header__title { font-size: 1.8em; margin: 0 0 12px 0; }
+.dataset-header__pills { list-style: none; padding: 0; margin: 0; display: flex; flex-wrap: wrap; gap: 8px; }
+.pill { background: var(--pill-bg); border-radius: 12px; padding: 4px 12px; font-size: 0.85em; color: var(--fg); }
+.cover { margin: 0 0 24px 0; border: 1px solid var(--border); border-radius: 4px; overflow: hidden; }
+.cover__image { display: block; max-width: 100%; height: auto; }
+.tags { margin: 0 0 24px 0; }
+.chip { display: inline-block; background: var(--pill-bg); border-radius: 12px; padding: 2px 10px; margin: 2px 4px 2px 0; font-size: 0.85em; color: var(--fg); }
+.section__heading { font-size: 1.3em; border-bottom: 2px solid var(--accent); padding-bottom: 4px; margin-top: 32px; }
+.section__count { color: var(--muted); font-size: 0.7em; font-weight: normal; }
+.config { border: 1px solid var(--border); border-radius: 4px; padding: 8px 12px; margin: 8px 0; }
+.config__name { cursor: pointer; font-weight: 600; }
+.config__count { color: var(--muted); font-weight: normal; font-size: 0.85em; }
+.badge { display: inline-block; padding: 1px 8px; border-radius: 4px; font-size: 0.75em; font-weight: 600; vertical-align: middle; margin-left: 4px; }
+.badge--default { background: var(--accent); color: white; }
+.config__table { width: 100%; border-collapse: collapse; margin-top: 8px; font-size: 0.9em; }
+.config__table th, .config__table td { text-align: left; padding: 6px 8px; border-bottom: 1px solid var(--border); }
+.config__table th { background: var(--pill-bg); font-weight: 600; }
+.readme { margin: 24px 0; }
+.readme code { background: var(--code-bg); padding: 1px 4px; border-radius: 2px; font-size: 0.9em; }
+.readme pre { background: var(--code-bg); padding: 12px; border-radius: 4px; overflow-x: auto; }
+.readme pre code { background: none; padding: 0; }
+.readme table { border-collapse: collapse; margin: 12px 0; }
+.readme th, .readme td { border: 1px solid var(--border); padding: 6px 10px; text-align: left; }
+.readme blockquote { border-left: 3px solid var(--accent); padding-left: 12px; color: var(--muted); margin: 12px 0; }
+.dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
+.dataset-footer__note { font-style: italic; margin-top: 8px; }
+</style>
+</head>
+<body>
+<main class="container">
+<header class="dataset-header">
+  <div class="dataset-header__namespace">huggingface.co/datasets</div>
+  <h1 class="dataset-header__title">LeadForge: Synthetic B2B Lead Scoring (v1) — Instructor companion</h1>
+  <ul class="dataset-header__pills">
+    <li class="pill pill--license">License: mit</li>
+    <li class="pill pill--task">Task: tabular-classification</li>
+    <li class="pill pill--size">Size: 1K&lt;n&lt;10K</li>
+    <li class="pill pill--language">Language: en</li>
+  </ul>
+</header>
+<section class="cover">
+  <img class="cover__image" src="dataset-cover-image.png" alt="Dataset cover image">
+</section>
+<section class="tags">
+  <span class="chip">b2b</span> <span class="chip">crm</span> <span class="chip">datasets</span> <span class="chip">lead-scoring</span> <span class="chip">pandas</span> <span class="chip">synthetic-data</span> <span class="chip">tabular</span>
+</section>
+<section class="configs">
+  <h2 class="section__heading">Configurations / Subsets <span class="section__count">(1 config)</span></h2>
+  <details class="config" open>
+    <summary class="config__name"><code>intermediate</code> <span class="badge badge--default">default</span> <span class="config__count">(3 splits)</span></summary>
+    <table class="config__table">
+      <thead><tr><th>Split</th><th>Path</th></tr></thead>
+      <tbody>
+      <tr><td>train</td><td><code>intermediate/tasks/converted_within_90_days/train.parquet</code></td></tr>
+      <tr><td>validation</td><td><code>intermediate/tasks/converted_within_90_days/valid.parquet</code></td></tr>
+      <tr><td>test</td><td><code>intermediate/tasks/converted_within_90_days/test.parquet</code></td></tr>
+      </tbody>
+    </table>
+  </details>
+</section>
+<section class="readme">
+<h1>LeadForge: Synthetic B2B Lead Scoring (v1) — Instructor companion</h1>
+<p>This is the <strong>research / instructor companion</strong> to the public
+<a href="https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1"><code>leadforge/leadforge-lead-scoring-v1</code></a>
+dataset.  It exposes the <strong>full-horizon</strong> view of a single difficulty
+tier (<code>intermediate</code>) plus the <strong>hidden causal structure</strong> that the
+public dataset deliberately redacts: the world graph (DAG), latent
+trait registry, mechanism summary, and full-horizon relational tables
+including <code>customers</code> and <code>subscriptions</code>.</p>
+<p>It exists for instructors who want to walk students through how the
+public dataset was generated, and for researchers who want to verify
+that the public redactions actually remove the leakage paths the
+dataset advertises.  <strong>It is not a replacement for the public dataset
+in any teaching or modelling context</strong> — students should still train
+on the public bundle.</p>
+<h2>What this companion contains</h2>
+<pre><code>.
+├── intermediate/                     # research_instructor companion: full-horizon
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── tables/*.parquet              # full-horizon tables (incl. customers, subscriptions)
+│   ├── tasks/converted_within_90_days/{train,valid,test}.parquet
+│   └── metadata/                     # world_spec, graph.{graphml,json}, latent_registry, etc.
+├── README.md                         # this file (HF dataset card)
+├── dataset-cover-image.png           # dataset thumbnail
+└── LICENSE
+</code></pre>
+<p>The single <code>intermediate</code> config exposes the same train/valid/test
+parquet splits as the public dataset's <code>intermediate</code> config — same
+seeds, same row counts (3,500 / 750 / 750), same target.  The
+difference lives in the relational tables and metadata:</p>
+<table>
+<thead>
+<tr>
+<th>File</th>
+<th>Public <code>intermediate</code></th>
+<th>Instructor companion</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>tables/leads.parquet</code></td>
+<td>redacted (label dropped)</td>
+<td>full (label retained)</td>
+</tr>
+<tr>
+<td><code>tables/opportunities.parquet</code></td>
+<td>snapshot-filtered + redacted</td>
+<td>full-horizon, full columns</td>
+</tr>
+<tr>
+<td><code>tables/customers.parquet</code></td>
+<td>omitted (would leak label)</td>
+<td>included</td>
+</tr>
+<tr>
+<td><code>tables/subscriptions.parquet</code></td>
+<td>omitted (would leak label)</td>
+<td>included</td>
+</tr>
+<tr>
+<td><code>tables/touches.parquet</code> etc.</td>
+<td>filtered to ≤ snapshot day</td>
+<td>full 90-day horizon</td>
+</tr>
+<tr>
+<td><code>metadata/world_spec.json</code></td>
+<td>absent</td>
+<td>included (DGP + recipe)</td>
+</tr>
+<tr>
+<td><code>metadata/graph.{graphml,json}</code></td>
+<td>absent</td>
+<td>included (hidden DAG)</td>
+</tr>
+<tr>
+<td><code>metadata/latent_registry.json</code></td>
+<td>absent</td>
+<td>included (latent traits)</td>
+</tr>
+<tr>
+<td><code>metadata/mechanism_summary.json</code></td>
+<td>absent</td>
+<td>included (per-edge mechanisms)</td>
+</tr>
+</tbody>
+</table>
+<p>The redaction contract is single-sourced in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/leadforge/validation/leakage_probes.py"><code>leadforge/validation/leakage_probes.py</code></a>
+and re-applied by
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/leadforge/render/relational_snapshot_safe.py"><code>leadforge/render/relational_snapshot_safe.py</code></a>
+when the public bundle is built; this companion is the unfiltered
+source view, so the two are always consistent by construction.</p>
+<h2>Quick start</h2>
+<pre><code class="language-python">from datasets import load_dataset
+
+# Loads the same train/valid/test splits as the public 'intermediate'
+# config; differs only in what `tables/` and `metadata/` provide.
+ds = load_dataset(
+    &quot;leadforge/leadforge-lead-scoring-v1-instructor&quot;,
+    name=&quot;intermediate&quot;,
+)
+train = ds[&quot;train&quot;].to_pandas()
+
+# Full-horizon relational tables — includes customers and subscriptions
+# (omitted from the public dataset because their existence reconstructs
+# the conversion label).
+import pandas as pd
+customers = pd.read_parquet(
+    &quot;hf://datasets/leadforge/leadforge-lead-scoring-v1-instructor/intermediate/tables/customers.parquet&quot;
+)
+</code></pre>
+<h2>Intended uses</h2>
+<ul>
+<li>Teaching the <strong>public-vs-instructor split</strong> itself: load both
+datasets side-by-side, show students which columns and tables were
+redacted, and walk through why each was a leakage path.</li>
+<li><strong>Verifying the redaction contract:</strong> train a model on the
+full-horizon tables, train another on the snapshot-safe public
+tables, compare AUC.  The gap is the redaction's effect.</li>
+<li>Teaching <strong>causal structure and DGP transparency</strong> using
+<code>metadata/world_spec.json</code> + <code>metadata/graph.json</code>.</li>
+<li>Reproducing the public dataset from the instructor view via
+<a href="https://github.com/leadforge-dev/leadforge/blob/main"><code>leadforge</code></a> source code.</li>
+</ul>
+<h2>Out-of-scope uses</h2>
+<ul>
+<li><strong>Production lead scoring.</strong>  Same as the public dataset; the
+company, product, and customers are fictional.</li>
+<li><strong>Modelling with the unredacted view as a baseline.</strong>  Models
+trained against the full-horizon tables look strong because they're
+directly seeing post-conversion events.  That number is not a
+baseline; it's the ceiling.</li>
+<li><strong>Demographic / fairness research.</strong>  v1 does not model protected
+attributes.</li>
+</ul>
+<h2>Composition</h2>
+<ul>
+<li><strong>Entities.</strong>  9 relational tables (accounts, contacts, leads,
+touches, sessions, sales_activities, opportunities, customers,
+subscriptions); per-row counts in <code>manifest.json</code>.</li>
+<li><strong>Splits.</strong>  Identical to the public <code>intermediate</code> config: 70/15/15
+train/valid/test, deterministic given seed 42, recorded in
+<code>tasks/converted_within_90_days/task_manifest.json</code>.</li>
+<li><strong>Provenance.</strong>  Recipe <code>b2b_saas_procurement_v1</code>, seed 42, package
+version stamped in <code>manifest.json</code> along with SHA-256 hashes for
+every parquet file.</li>
+<li><strong>Bundle schema version.</strong>  5 (matches the public dataset).</li>
+</ul>
+<h2>Maintenance, license</h2>
+<p>We <em>want</em> the dataset to be broken.  See the
+<a href="https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1">public dataset card</a>
+for the adversarial-framing pointers, the issue templates, and the
+break-me guide.  File issues at
+<a href="https://github.com/leadforge-dev/leadforge">leadforge-dev/leadforge</a>;
+PRs welcome.</p>
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Generator</td>
+<td>leadforge <code>1.0.0+</code></td>
+</tr>
+<tr>
+<td>Recipe</td>
+<td><code>b2b_saas_procurement_v1</code></td>
+</tr>
+<tr>
+<td>Canonical seed</td>
+<td>42</td>
+</tr>
+<tr>
+<td>Bundle schema version</td>
+<td>5</td>
+</tr>
+<tr>
+<td>Format</td>
+<td>Parquet (canonical)</td>
+</tr>
+<tr>
+<td>License</td>
+<td>MIT — see <a href="LICENSE">LICENSE</a></td>
+</tr>
+<tr>
+<td>Public dataset</td>
+<td><a href="https://huggingface.co/datasets/leadforge/leadforge-lead-scoring-v1">link</a></td>
+</tr>
+</tbody>
+</table>
+<p>Verify integrity with <code>leadforge validate &lt;bundle_dir&gt;</code>; every file is
+hashed in <code>manifest.json</code>.</p>
+</section>
+<footer class="dataset-footer">
+  <div class="dataset-footer__license">License: mit</div>
+  <div class="dataset-footer__variant">Variant: <code>instructor</code></div>
+  <div class="dataset-footer__note">Local Hugging Face publication-readiness preview rendered by scripts/preview_hf_page.py — not the live dataset page.</div>
+</footer>
+</main>
+</body>
+</html>
diff --git a/release/_preview_committed/huggingface_public.html b/release/_preview_committed/huggingface_public.html
new file mode 100644
index 0000000..3f1df70
--- /dev/null
+++ b/release/_preview_committed/huggingface_public.html
@@ -0,0 +1,466 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <title>HF preview — LeadForge: Synthetic B2B Lead Scoring (v1)</title>
+  <style>:root { --bg:#fff; --fg:#1f2937; --muted:#6b7280; --accent:#ff9d00; --border:#e5e7eb; --pill-bg:#f3f4f6; --code-bg:#f9fafb; }
+body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, sans-serif; color: var(--fg); background: var(--bg); margin: 0; padding: 0; line-height: 1.6; }
+.container { max-width: 1100px; margin: 0 auto; padding: 24px 32px; }
+.dataset-header { border-bottom: 1px solid var(--border); padding-bottom: 16px; margin-bottom: 24px; }
+.dataset-header__namespace { color: var(--muted); font-size: 0.85em; font-family: monospace; margin-bottom: 4px; }
+.dataset-header__title { font-size: 1.8em; margin: 0 0 12px 0; }
+.dataset-header__pills { list-style: none; padding: 0; margin: 0; display: flex; flex-wrap: wrap; gap: 8px; }
+.pill { background: var(--pill-bg); border-radius: 12px; padding: 4px 12px; font-size: 0.85em; color: var(--fg); }
+.cover { margin: 0 0 24px 0; border: 1px solid var(--border); border-radius: 4px; overflow: hidden; }
+.cover__image { display: block; max-width: 100%; height: auto; }
+.tags { margin: 0 0 24px 0; }
+.chip { display: inline-block; background: var(--pill-bg); border-radius: 12px; padding: 2px 10px; margin: 2px 4px 2px 0; font-size: 0.85em; color: var(--fg); }
+.section__heading { font-size: 1.3em; border-bottom: 2px solid var(--accent); padding-bottom: 4px; margin-top: 32px; }
+.section__count { color: var(--muted); font-size: 0.7em; font-weight: normal; }
+.config { border: 1px solid var(--border); border-radius: 4px; padding: 8px 12px; margin: 8px 0; }
+.config__name { cursor: pointer; font-weight: 600; }
+.config__count { color: var(--muted); font-weight: normal; font-size: 0.85em; }
+.badge { display: inline-block; padding: 1px 8px; border-radius: 4px; font-size: 0.75em; font-weight: 600; vertical-align: middle; margin-left: 4px; }
+.badge--default { background: var(--accent); color: white; }
+.config__table { width: 100%; border-collapse: collapse; margin-top: 8px; font-size: 0.9em; }
+.config__table th, .config__table td { text-align: left; padding: 6px 8px; border-bottom: 1px solid var(--border); }
+.config__table th { background: var(--pill-bg); font-weight: 600; }
+.readme { margin: 24px 0; }
+.readme code { background: var(--code-bg); padding: 1px 4px; border-radius: 2px; font-size: 0.9em; }
+.readme pre { background: var(--code-bg); padding: 12px; border-radius: 4px; overflow-x: auto; }
+.readme pre code { background: none; padding: 0; }
+.readme table { border-collapse: collapse; margin: 12px 0; }
+.readme th, .readme td { border: 1px solid var(--border); padding: 6px 10px; text-align: left; }
+.readme blockquote { border-left: 3px solid var(--accent); padding-left: 12px; color: var(--muted); margin: 12px 0; }
+.dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
+.dataset-footer__note { font-style: italic; margin-top: 8px; }
+</style>
+</head>
+<body>
+<main class="container">
+<header class="dataset-header">
+  <div class="dataset-header__namespace">huggingface.co/datasets</div>
+  <h1 class="dataset-header__title">LeadForge: Synthetic B2B Lead Scoring (v1)</h1>
+  <ul class="dataset-header__pills">
+    <li class="pill pill--license">License: mit</li>
+    <li class="pill pill--task">Task: tabular-classification</li>
+    <li class="pill pill--size">Size: 1K&lt;n&lt;10K</li>
+    <li class="pill pill--language">Language: en</li>
+  </ul>
+</header>
+<section class="cover">
+  <img class="cover__image" src="dataset-cover-image.png" alt="Dataset cover image">
+</section>
+<section class="tags">
+  <span class="chip">b2b</span> <span class="chip">crm</span> <span class="chip">datasets</span> <span class="chip">lead-scoring</span> <span class="chip">pandas</span> <span class="chip">synthetic-data</span> <span class="chip">tabular</span>
+</section>
+<section class="configs">
+  <h2 class="section__heading">Configurations / Subsets <span class="section__count">(3 configs)</span></h2>
+  <details class="config" open>
+    <summary class="config__name"><code>intro</code> <span class="config__count">(3 splits)</span></summary>
+    <table class="config__table">
+      <thead><tr><th>Split</th><th>Path</th></tr></thead>
+      <tbody>
+      <tr><td>train</td><td><code>intro/tasks/converted_within_90_days/train.parquet</code></td></tr>
+      <tr><td>validation</td><td><code>intro/tasks/converted_within_90_days/valid.parquet</code></td></tr>
+      <tr><td>test</td><td><code>intro/tasks/converted_within_90_days/test.parquet</code></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="config" open>
+    <summary class="config__name"><code>intermediate</code> <span class="badge badge--default">default</span> <span class="config__count">(3 splits)</span></summary>
+    <table class="config__table">
+      <thead><tr><th>Split</th><th>Path</th></tr></thead>
+      <tbody>
+      <tr><td>train</td><td><code>intermediate/tasks/converted_within_90_days/train.parquet</code></td></tr>
+      <tr><td>validation</td><td><code>intermediate/tasks/converted_within_90_days/valid.parquet</code></td></tr>
+      <tr><td>test</td><td><code>intermediate/tasks/converted_within_90_days/test.parquet</code></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="config" open>
+    <summary class="config__name"><code>advanced</code> <span class="config__count">(3 splits)</span></summary>
+    <table class="config__table">
+      <thead><tr><th>Split</th><th>Path</th></tr></thead>
+      <tbody>
+      <tr><td>train</td><td><code>advanced/tasks/converted_within_90_days/train.parquet</code></td></tr>
+      <tr><td>validation</td><td><code>advanced/tasks/converted_within_90_days/valid.parquet</code></td></tr>
+      <tr><td>test</td><td><code>advanced/tasks/converted_within_90_days/test.parquet</code></td></tr>
+      </tbody>
+    </table>
+  </details>
+</section>
+<section class="readme">
+<h1>LeadForge: Synthetic B2B Lead Scoring Dataset (<code>leadforge-lead-scoring-v1</code>)</h1>
+<p>A relational, reproducible, three-tier synthetic CRM dataset family for
+teaching lead scoring at scale. Generated by
+<a href="https://github.com/leadforge-dev/leadforge">leadforge</a>, an
+open-source Python framework for synthetic CRM/funnel data. The
+framework version is decoupled from the dataset version: the package
+stays at <code>1.x</code>; the dataset is published under the explicit <code>…-v1</code>
+tag.</p>
+<h2>Why lead scoring matters in 2024–2026</h2>
+<p>Mid-market SaaS vendors entered 2024–2026 with growth slowing and
+customer-acquisition costs rising[^macro], so predicting <em>which</em> leads
+convert within a fixed window has moved from a marketing nicety to a
+survival skill. This dataset teaches that skill on a relational
+substrate, with the realistic confusions (snapshot-window discipline,
+leakage traps, channel signal weaker than vendor blogs imply) that
+students will hit when they finally get hands on real CRM data.</p>
+<p>[^macro]: Macroeconomic framing summarised in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/external_review/summaries/gemini_v2_summary.md"><code>docs/external_review/summaries/gemini_v2_summary.md</code></a>
+(median public-SaaS growth 30%→25% from 2023 to 2025; New CAC Ratio
+rose materially in 2024).</p>
+<h2>What's inside</h2>
+<pre><code>.
+├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── lead_scoring.csv              # flat convenience CSV (all splits)
+│   ├── tables/*.parquet              # 7 snapshot-safe relational tables
+│   └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── README.md                         # this file (HF dataset card)
+├── dataset-cover-image.png           # dataset thumbnail
+└── LICENSE
+</code></pre>
+<p><code>student_public</code> bundles ship the snapshot-safe relational view;
+<code>research_instructor</code> companions ship the full-horizon view plus the
+hidden causal structure (DAG, latent registry, mechanism summary)
+under <code>metadata/</code>. The full layout is documented in each bundle's
+<code>manifest.json</code>.</p>
+<h2>Quick start</h2>
+<pre><code class="language-python">import pandas as pd
+df = pd.read_csv(&quot;intermediate/lead_scoring.csv&quot;)  # Flat CSV
+
+# Parquet task splits (recommended)
+train = pd.read_parquet(&quot;intermediate/tasks/converted_within_90_days/train.parquet&quot;)
+test  = pd.read_parquet(&quot;intermediate/tasks/converted_within_90_days/test.parquet&quot;)
+
+# Relational tables (feature engineering — example)
+leads   = pd.read_parquet(&quot;intermediate/tables/leads.parquet&quot;)
+touches = pd.read_parquet(&quot;intermediate/tables/touches.parquet&quot;)
+my_touch_count = (
+    touches.groupby(&quot;lead_id&quot;).size().rename(&quot;my_touch_count&quot;).reset_index()
+)
+features = leads.merge(my_touch_count, on=&quot;lead_id&quot;, how=&quot;left&quot;)
+
+# Reproduce from source
+# pip install leadforge
+# leadforge generate --recipe b2b_saas_procurement_v1 --seed 42 \
+#                    --mode student_public --difficulty intermediate --out my_bundle
+</code></pre>
+<p>The label <code>converted_within_90_days</code> resolves over a 90-day window;
+engagement features (<code>touch_count</code>, <code>session_count</code>, etc.) are
+computed strictly over events on days <code>[0, 30]</code>. The deliberate
+exception is <code>total_touches_all</code>, the leakage trap — flagged
+<code>leakage_risk=True</code> in <code>feature_dictionary.csv</code>. Drop it from your
+feature set unless you're demonstrating leakage detection.</p>
+<h2>Dataset summary</h2>
+<table>
+<thead>
+<tr>
+<th></th>
+<th>Intro</th>
+<th>Intermediate</th>
+<th>Advanced</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Leads</td>
+<td>5,000</td>
+<td>5,000</td>
+<td>5,000</td>
+</tr>
+<tr>
+<td>Accounts</td>
+<td>1,500</td>
+<td>1,500</td>
+<td>1,500</td>
+</tr>
+<tr>
+<td>Contacts</td>
+<td>4,200</td>
+<td>4,200</td>
+<td>4,200</td>
+</tr>
+<tr>
+<td>Snapshot columns</td>
+<td>32 / 34*</td>
+<td>32 / 34*</td>
+<td>32 / 34*</td>
+</tr>
+<tr>
+<td>Target</td>
+<td><code>converted_within_90_days</code></td>
+<td><code>converted_within_90_days</code></td>
+<td><code>converted_within_90_days</code></td>
+</tr>
+<tr>
+<td>Conversion rate (acceptance band, gate G7.*)</td>
+<td>24–61%</td>
+<td>12–31%</td>
+<td>4–12%</td>
+</tr>
+<tr>
+<td>Conversion rate (observed median, seeds 42–46)</td>
+<td>42.67%</td>
+<td>21.60%</td>
+<td>8.40%</td>
+</tr>
+<tr>
+<td>Signal strength</td>
+<td>0.90</td>
+<td>0.70</td>
+<td>0.50</td>
+</tr>
+<tr>
+<td>Noise scale</td>
+<td>0.10</td>
+<td>0.30</td>
+<td>0.55</td>
+</tr>
+<tr>
+<td>Missing rate</td>
+<td>2%</td>
+<td>8%</td>
+<td>18%</td>
+</tr>
+</tbody>
+</table>
+<p>* <code>student_public</code> / <code>research_instructor</code>. Difficulty is modulated
+by the simulation engine — signal strength on latent-trait weights,
+Gaussian noise on float features, MCAR missingness, outlier rate —
+not post-hoc label flipping. The acceptance band is the recipe
+gate's tolerance window (<code>v1_acceptance_gates_bands.yaml</code> G7.*),
+not the achievable range — observed five-seed spreads sit
+comfortably inside the band.</p>
+<h2>The scenario</h2>
+<p><strong>Veridian Technologies</strong> is a fictional Series B startup (Austin, US)
+selling <strong>Veridian Procure</strong>, a procurement / AP automation SaaS, to
+mid-market firms (200–2,000 employees) in the US and UK. The funnel
+runs through inbound marketing (45%), SDR outbound (35%), and
+partner referrals (20%); four personas drive deals (VP Finance, AP
+Manager, IT Director, Procurement Manager). <strong>Task:</strong> predict whether
+a lead converts (<code>closed_won</code>) within 90 days. ACV bands are
+$18k–$120k. See
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/generation_method.md"><code>docs/release/generation_method.md</code></a>
+for the full DGP, and the deeper &quot;what's modelled / approximate / not
+modelled&quot; breakdown that this README only summarises.</p>
+<h2>Public vs instructor: what's redacted</h2>
+<p>Filtering happens <strong>during rendering</strong>, not during simulation. The
+redaction contract is single-sourced in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/leadforge/validation/leakage_probes.py"><code>leadforge/validation/leakage_probes.py</code></a>;
+the snapshot-safe writer and the validator import the same constants,
+so they cannot drift apart.</p>
+<table>
+<thead>
+<tr>
+<th>Source-of-truth constant</th>
+<th>Public bundle treatment</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>BANNED_LEAD_COLUMNS = (&quot;converted_within_90_days&quot;, &quot;conversion_timestamp&quot;)</code></td>
+<td>Dropped from <code>tables/leads.parquet</code></td>
+</tr>
+<tr>
+<td><code>BANNED_OPP_COLUMNS = (&quot;close_outcome&quot;, &quot;closed_at&quot;)</code></td>
+<td>Dropped from <code>tables/opportunities.parquet</code></td>
+</tr>
+<tr>
+<td><code>BANNED_TABLES = (&quot;customers&quot;, &quot;subscriptions&quot;)</code></td>
+<td>Omitted from public bundles</td>
+</tr>
+<tr>
+<td><code>SNAPSHOT_FILTERED_TABLES</code> (touches, sessions, sales_activities, opportunities)</td>
+<td>Filtered per-lead by <code>lead_created_at + snapshot_day</code></td>
+</tr>
+<tr>
+<td>Snapshot redaction (<code>current_stage</code>, <code>is_sql</code>)</td>
+<td>Stripped from <code>tasks/</code> splits and <code>tables/leads.parquet</code></td>
+</tr>
+<tr>
+<td><code>total_touches_all</code> (deliberate trap)</td>
+<td><strong>Retained in both modes</strong>; flagged <code>leakage_risk=True</code></td>
+</tr>
+</tbody>
+</table>
+<p>Each bundle's <code>manifest.json</code> records <code>relational_snapshot_safe</code>,
+<code>redacted_columns</code>, and <code>snapshot_day</code>, so the bundle is
+self-describing.</p>
+<h2>Calibration</h2>
+<p>Every realism / calibration / difficulty claim in this README is
+backed by
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/release/validation/validation_report.md"><code>validation/validation_report.md</code></a>,
+regenerated by
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/scripts/validate_release_candidate.py"><code>scripts/validate_release_candidate.py</code></a>
+with bands declared in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/v1_acceptance_gates_bands.yaml"><code>docs/release/v1_acceptance_gates_bands.yaml</code></a>.
+Headline cross-seed medians (seeds 42–46):</p>
+<table>
+<thead>
+<tr>
+<th>Tier</th>
+<th>LR AUC</th>
+<th>AP</th>
+<th>P@100</th>
+<th>Brier</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>intro</td>
+<td>0.879</td>
+<td>0.761</td>
+<td>0.80</td>
+<td>0.130</td>
+</tr>
+<tr>
+<td>intermediate</td>
+<td>0.886</td>
+<td>0.575</td>
+<td>0.59</td>
+<td>0.110</td>
+</tr>
+<tr>
+<td>advanced</td>
+<td>0.886</td>
+<td>0.351</td>
+<td>0.34</td>
+<td>0.061</td>
+</tr>
+</tbody>
+</table>
+<p>AP, P@100, conversion-rate, and lift orderings hold across the
+intended difficulty axis (intro &gt; intermediate &gt; advanced).</p>
+<h2>Intended uses</h2>
+<ul>
+<li>Teaching baseline lead-scoring on a flat snapshot.</li>
+<li>Teaching relational feature engineering against snapshot-safe tables.</li>
+<li>Teaching leakage detection (the <code>total_touches_all</code> trap is
+designed to be discoverable).</li>
+<li>Teaching calibration, lift, P@K, value-aware ranking
+(<code>expected_acv × P(convert)</code>), and cohort-shift evaluation.</li>
+<li>Comparing model families under a controlled DGP.</li>
+</ul>
+<h2>Out-of-scope uses</h2>
+<ul>
+<li><strong>Production lead scoring.</strong> The company, product, and customers are
+fictional.</li>
+<li><strong>Vendor benchmarking / paper baselines.</strong> Difficulty tiers are
+calibrated for pedagogy, not cross-paper comparability.</li>
+<li><strong>Causal-inference research that requires recovery of the true DGP.</strong>
+The instructor companion exposes the hidden graph for teaching, not
+designed counterfactuals.</li>
+<li><strong>Demographic / fairness research.</strong> v1 does not model protected
+attributes.</li>
+</ul>
+<h2>Known limitations</h2>
+<ul>
+<li><strong>Difficulty signal on raw AUC is flat.</strong> LR AUC is ~0.88 across
+every tier. Difficulty is visible in AP, P@K, Brier, and value
+capture. Treat AUC as a sanity check, not a difficulty signal.</li>
+<li><strong>GBM does not consistently beat LR (gate G7.4.4).</strong> GBM−LR AUC delta
+is slightly negative in every tier (intro −0.0045, intermediate
+−0.0072, advanced −0.0133); v1's snapshot is dominated by linear
+features. v2 will inject non-linear interactions in the simulator.</li>
+<li><strong>Channel signal is weak.</strong> Per
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/channel_signal_audit.md"><code>docs/release/channel_signal_audit.md</code></a>,
+out-of-sample univariate AUC of <code>lead_source</code> is ≈0.50–0.52 across
+all tiers and the per-channel rate spread is ≤0.05. The simulator
+does not encode channel-conditional probabilities; channel-conditional
+encoding is post-v1 work.</li>
+<li><strong>Cohort-shift degradation is small.</strong> v1 has no time-of-year drift
+baked in; the cohort-shift gate (G6.4) is informational and will
+bite in v2.</li>
+</ul>
+<h2>Composition</h2>
+<ul>
+<li><strong>Entities.</strong> Accounts, contacts, leads, touches, sessions,
+sales_activities, opportunities (public); plus customers and
+subscriptions (instructor only). Per-table row counts for each bundle live in
+<code>manifest.json</code>.</li>
+<li><strong>Features.</strong> 32 public columns grouped by analytical role in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/feature_dictionary.md"><code>docs/release/feature_dictionary.md</code></a>;
+the per-bundle <code>feature_dictionary.csv</code> is the authoritative
+machine-readable spec.</li>
+<li><strong>Label.</strong> <code>converted_within_90_days</code> (boolean), event-derived from
+the simulator. Never sampled directly.</li>
+<li><strong>Splits.</strong> 70/15/15 train/valid/test, deterministic given seed;
+recorded in <code>tasks/converted_within_90_days/task_manifest.json</code>.
+<strong>Group-leakage warning:</strong> the splitter is keyed on <code>lead_id</code> only,
+not on <code>account_id</code> or <code>contact_id</code>. On the as-shipped intermediate
+bundle, <strong>518 of 557 test accounts (≈93%) also appear in train</strong>;
+the contact-level overlap is similar in magnitude. A flat baseline
+trained on the random split rides account-level signal across the
+split boundary. For a generalisation-faithful number, retrain with
+<code>GroupKFold(account_id)</code> (or <code>contact_id</code>) and report both — see
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/break_me_guide.md"><code>break_me_guide.md</code></a> §5 for the
+detection recipe.</li>
+<li><strong>Provenance.</strong> Recipe <code>b2b_saas_procurement_v1</code>, seed 42, package
+version stamped in <code>manifest.json</code>.</li>
+</ul>
+<h2>Maintenance, adversarial framing, license</h2>
+<p>We <em>want</em> the dataset to be broken. The
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/break_me_guide.md">break-me guide</a> catalogues
+nine adversarial patterns to look for (including leakage, split
+contamination, ranking inversions, calibration drift) with
+worked-example pointers back into the notebooks. Issue
+templates ship under <code>.github/ISSUE_TEMPLATE/</code>: a
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/.github/ISSUE_TEMPLATE/dataset_breakage_report.yml">breakage report</a>
+form for findings on the bundle itself, and a
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/.github/ISSUE_TEMPLATE/realism_feedback.yml">realism feedback</a>
+form for distributional critiques. Accepted findings are
+logged in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/v2_decision_log.md"><code>docs/release/v2_decision_log.md</code></a>.
+File issues at
+<a href="https://github.com/leadforge-dev/leadforge">leadforge-dev/leadforge</a>;
+PRs welcome.</p>
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Generator</td>
+<td>leadforge <code>1.0.0+</code></td>
+</tr>
+<tr>
+<td>Recipe</td>
+<td><code>b2b_saas_procurement_v1</code></td>
+</tr>
+<tr>
+<td>Canonical seed</td>
+<td>42 (cross-seed sweep: 42–46)</td>
+</tr>
+<tr>
+<td>Bundle schema version</td>
+<td>5</td>
+</tr>
+<tr>
+<td>Format</td>
+<td>Parquet (canonical) + CSV (convenience)</td>
+</tr>
+<tr>
+<td>License</td>
+<td>MIT — see <a href="LICENSE">LICENSE</a></td>
+</tr>
+</tbody>
+</table>
+<p>Verify integrity with <code>leadforge validate &lt;bundle_dir&gt;</code>; every file
+is hashed in <code>manifest.json</code>.</p>
+</section>
+<footer class="dataset-footer">
+  <div class="dataset-footer__license">License: MIT</div>
+  <div class="dataset-footer__variant">Variant: <code>public</code></div>
+  <div class="dataset-footer__note">Local Hugging Face publication-readiness preview rendered by scripts/preview_hf_page.py — not the live dataset page.</div>
+</footer>
+</main>
+</body>
+</html>
diff --git a/release/_preview_committed/kaggle.html b/release/_preview_committed/kaggle.html
new file mode 100644
index 0000000..d0ee29a
--- /dev/null
+++ b/release/_preview_committed/kaggle.html
@@ -0,0 +1,1302 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <title>Kaggle preview — LeadForge: Synthetic B2B Lead Scoring (v1)</title>
+  <style>:root { --bg:#fff; --fg:#202124; --muted:#5f6368; --accent:#20beff; --border:#e0e0e0; --pill-bg:#f1f3f4; }
+body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; color: var(--fg); background: var(--bg); margin: 0; padding: 0; line-height: 1.5; }
+.container { max-width: 1100px; margin: 0 auto; padding: 24px 32px; }
+.dataset-header { border-bottom: 1px solid var(--border); padding-bottom: 16px; margin-bottom: 24px; }
+.dataset-header__id { color: var(--muted); font-size: 0.85em; font-family: monospace; margin-bottom: 4px; }
+.dataset-header__title { font-size: 1.8em; margin: 0 0 4px 0; }
+.dataset-header__subtitle { color: var(--muted); margin: 0 0 12px 0; }
+.dataset-header__pills { list-style: none; padding: 0; margin: 0; display: flex; flex-wrap: wrap; gap: 8px; }
+.pill { background: var(--pill-bg); border-radius: 12px; padding: 4px 12px; font-size: 0.85em; color: var(--fg); }
+.cover { margin: 0 0 24px 0; border: 1px solid var(--border); border-radius: 4px; overflow: hidden; }
+.cover__image { display: block; max-width: 100%; height: auto; }
+.section__heading { font-size: 1.3em; border-bottom: 2px solid var(--accent); padding-bottom: 4px; margin-top: 32px; }
+.section__count { color: var(--muted); font-size: 0.7em; font-weight: normal; }
+.tier, .schema { border: 1px solid var(--border); border-radius: 4px; padding: 8px 12px; margin: 8px 0; }
+.tier__name, .schema__path { cursor: pointer; font-weight: 600; }
+.tier__count, .schema__count { color: var(--muted); font-weight: normal; font-size: 0.85em; }
+.tier__files { list-style: none; padding: 8px 0 0 0; margin: 0; }
+.file { display: flex; gap: 12px; padding: 4px 0; border-bottom: 1px dotted var(--border); }
+.file:last-child { border-bottom: none; }
+.file__path { color: var(--accent); flex-shrink: 0; }
+.file__desc { color: var(--muted); font-size: 0.9em; }
+.schema__table { width: 100%; border-collapse: collapse; margin-top: 8px; font-size: 0.9em; }
+.schema__table th, .schema__table td { text-align: left; padding: 6px 8px; border-bottom: 1px solid var(--border); vertical-align: top; }
+.schema__table th { background: var(--pill-bg); font-weight: 600; }
+.col__name code { background: none; }
+.col__type { color: var(--muted); font-family: monospace; }
+.description { margin: 24px 0; }
+.description code { background: var(--pill-bg); padding: 1px 4px; border-radius: 2px; font-size: 0.9em; }
+.description pre { background: var(--pill-bg); padding: 12px; border-radius: 4px; overflow-x: auto; }
+.description pre code { background: none; padding: 0; }
+.description table { border-collapse: collapse; margin: 12px 0; }
+.description th, .description td { border: 1px solid var(--border); padding: 6px 10px; text-align: left; }
+.description blockquote { border-left: 3px solid var(--accent); padding-left: 12px; color: var(--muted); margin: 12px 0; }
+.sources__list { padding-left: 20px; }
+.dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
+.dataset-footer__keywords { margin-bottom: 8px; }
+.chip { display: inline-block; background: var(--pill-bg); border-radius: 12px; padding: 2px 10px; margin: 2px; font-size: 0.85em; }
+.dataset-footer__note { font-style: italic; margin-top: 8px; }
+</style>
+</head>
+<body>
+<main class="container">
+<header class="dataset-header">
+  <div class="dataset-header__id">leadforge/leadforge-lead-scoring-v1</div>
+  <h1 class="dataset-header__title">LeadForge: Synthetic B2B Lead Scoring (v1)</h1>
+  <p class="dataset-header__subtitle">Three-tier synthetic CRM funnel for leakage-aware lead scoring</p>
+  <ul class="dataset-header__pills">
+    <li class="pill pill--license">License: MIT</li>
+    <li class="pill pill--frequency">Updates: never</li>
+  </ul>
+</header>
+<section class="cover">
+  <img class="cover__image" src="dataset-cover-image.png" alt="Dataset cover image">
+</section>
+<section class="description">
+<h1>LeadForge: Synthetic B2B Lead Scoring Dataset (<code>leadforge-lead-scoring-v1</code>)</h1>
+<p>A relational, reproducible, three-tier synthetic CRM dataset family for
+teaching lead scoring at scale. Generated by
+<a href="https://github.com/leadforge-dev/leadforge">leadforge</a>, an
+open-source Python framework for synthetic CRM/funnel data. The
+framework version is decoupled from the dataset version: the package
+stays at <code>1.x</code>; the dataset is published under the explicit <code>…-v1</code>
+tag.</p>
+<h2>Why lead scoring matters in 2024–2026</h2>
+<p>Mid-market SaaS vendors entered 2024–2026 with growth slowing and
+customer-acquisition costs rising<sup>1</sup>, so predicting <em>which</em> leads
+convert within a fixed window has moved from a marketing nicety to a
+survival skill. This dataset teaches that skill on a relational
+substrate, with the realistic confusions (snapshot-window discipline,
+leakage traps, channel signal weaker than vendor blogs imply) that
+students will hit when they finally get their hands on real CRM data.</p>
+<p><sup>1</sup> Macroeconomic framing summarised in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/external_review/summaries/gemini_v2_summary.md"><code>docs/external_review/summaries/gemini_v2_summary.md</code></a>
+(median public-SaaS growth fell from 30% to 25% between 2023 and 2025; New CAC
+Ratio rose materially in 2024).</p>
+<h2>What's inside</h2>
+<pre><code>.
+├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── lead_scoring.csv              # flat convenience CSV (all splits)
+│   ├── tables/*.parquet              # 7 snapshot-safe relational tables
+│   └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── dataset-metadata.json             # Kaggle dataset metadata
+├── dataset-cover-image.png           # Kaggle cover image
+├── README.md                         # Kaggle package README
+└── LICENSE
+</code></pre>
+<p><code>student_public</code> bundles ship the snapshot-safe relational view;
+<code>research_instructor</code> companions ship the full-horizon view plus the
+hidden causal structure (DAG, latent registry, mechanism summary)
+under <code>metadata/</code>. The full layout is documented in each bundle's
+<code>manifest.json</code>.</p>
+<h2>Quick start</h2>
+<pre><code class="language-python">import pandas as pd  # read_parquet also needs pyarrow or fastparquet
+df = pd.read_csv(&quot;intermediate/lead_scoring.csv&quot;)  # flat CSV, all splits
+
+# Parquet task splits (recommended)
+train = pd.read_parquet(&quot;intermediate/tasks/converted_within_90_days/train.parquet&quot;)
+test  = pd.read_parquet(&quot;intermediate/tasks/converted_within_90_days/test.parquet&quot;)
+
+# Relational tables (feature engineering — example)
+leads   = pd.read_parquet(&quot;intermediate/tables/leads.parquet&quot;)
+touches = pd.read_parquet(&quot;intermediate/tables/touches.parquet&quot;)
+my_touch_count = (
+    touches.groupby(&quot;lead_id&quot;).size().rename(&quot;my_touch_count&quot;).reset_index()
+)
+features = leads.merge(my_touch_count, on=&quot;lead_id&quot;, how=&quot;left&quot;).fillna({&quot;my_touch_count&quot;: 0})
+
+# Reproduce from source
+# pip install leadforge
+# leadforge generate --recipe b2b_saas_procurement_v1 --seed 42 \
+#                    --mode student_public --difficulty intermediate --out my_bundle
+</code></pre>
+<p>The label <code>converted_within_90_days</code> resolves over a 90-day window;
+engagement features (<code>touch_count</code>, <code>session_count</code>, etc.) are
+computed strictly over events on days <code>[0, 30]</code>. The deliberate
+exception is <code>total_touches_all</code>, the leakage trap — flagged
+<code>leakage_risk=True</code> in <code>feature_dictionary.csv</code>. Drop it from your
+feature set unless you're demonstrating leakage detection.</p>
+<h2>Dataset summary</h2>
+<table>
+<thead>
+<tr>
+<th></th>
+<th>Intro</th>
+<th>Intermediate</th>
+<th>Advanced</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Leads</td>
+<td>5,000</td>
+<td>5,000</td>
+<td>5,000</td>
+</tr>
+<tr>
+<td>Accounts</td>
+<td>1,500</td>
+<td>1,500</td>
+<td>1,500</td>
+</tr>
+<tr>
+<td>Contacts</td>
+<td>4,200</td>
+<td>4,200</td>
+<td>4,200</td>
+</tr>
+<tr>
+<td>Snapshot columns</td>
+<td>32 / 34*</td>
+<td>32 / 34*</td>
+<td>32 / 34*</td>
+</tr>
+<tr>
+<td>Target</td>
+<td><code>converted_within_90_days</code></td>
+<td><code>converted_within_90_days</code></td>
+<td><code>converted_within_90_days</code></td>
+</tr>
+<tr>
+<td>Conversion rate (acceptance band, gate G7.*)</td>
+<td>24–61%</td>
+<td>12–31%</td>
+<td>4–12%</td>
+</tr>
+<tr>
+<td>Conversion rate (observed median, seeds 42–46)</td>
+<td>42.67%</td>
+<td>21.60%</td>
+<td>8.40%</td>
+</tr>
+<tr>
+<td>Signal strength</td>
+<td>0.90</td>
+<td>0.70</td>
+<td>0.50</td>
+</tr>
+<tr>
+<td>Noise scale</td>
+<td>0.10</td>
+<td>0.30</td>
+<td>0.55</td>
+</tr>
+<tr>
+<td>Missing rate</td>
+<td>2%</td>
+<td>8%</td>
+<td>18%</td>
+</tr>
+</tbody>
+</table>
+<p>* <code>student_public</code> / <code>research_instructor</code>. Difficulty is modulated
+by the simulation engine — signal strength on latent-trait weights,
+Gaussian noise on float features, MCAR missingness, outlier rate —
+not post-hoc label flipping. The acceptance band is the recipe
+gate's tolerance window (<code>v1_acceptance_gates_bands.yaml</code> G7.*),
+not the achievable range — observed five-seed spreads sit
+comfortably inside the band.</p>
+<h2>The scenario</h2>
+<p><strong>Veridian Technologies</strong> is a fictional Series B startup (Austin, US)
+selling <strong>Veridian Procure</strong>, a procurement / AP automation SaaS, to
+mid-market firms (200–2,000 employees) in the US and UK. The funnel
+runs through inbound marketing (45%), SDR outbound (35%), and
+partner referrals (20%); four personas drive deals (VP Finance, AP
+Manager, IT Director, Procurement Manager). <strong>Task:</strong> predict whether
+a lead converts (<code>closed_won</code>) within 90 days. ACV bands are
+$18k–$120k. See
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/generation_method.md"><code>docs/release/generation_method.md</code></a>
+for the full DGP, and the deeper &quot;what's modelled / approximate / not
+modelled&quot; breakdown that this README only summarises.</p>
+<h2>Public vs instructor: what's redacted</h2>
+<p>Filtering happens <strong>during rendering</strong>, not during simulation. The
+redaction contract is single-sourced in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/leadforge/validation/leakage_probes.py"><code>leadforge/validation/leakage_probes.py</code></a>;
+the snapshot-safe writer and the validator import the same constants,
+so they cannot drift apart.</p>
+<table>
+<thead>
+<tr>
+<th>Source-of-truth constant</th>
+<th>Public bundle treatment</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>BANNED_LEAD_COLUMNS = (&quot;converted_within_90_days&quot;, &quot;conversion_timestamp&quot;)</code></td>
+<td>Dropped from <code>tables/leads.parquet</code></td>
+</tr>
+<tr>
+<td><code>BANNED_OPP_COLUMNS = (&quot;close_outcome&quot;, &quot;closed_at&quot;)</code></td>
+<td>Dropped from <code>tables/opportunities.parquet</code></td>
+</tr>
+<tr>
+<td><code>BANNED_TABLES = (&quot;customers&quot;, &quot;subscriptions&quot;)</code></td>
+<td>Omitted from public bundles</td>
+</tr>
+<tr>
+<td><code>SNAPSHOT_FILTERED_TABLES</code> (touches, sessions, sales_activities, opportunities)</td>
+<td>Filtered per-lead by <code>lead_created_at + snapshot_day</code></td>
+</tr>
+<tr>
+<td>Snapshot redaction (<code>current_stage</code>, <code>is_sql</code>)</td>
+<td>Stripped from <code>tasks/</code> splits and <code>tables/leads.parquet</code></td>
+</tr>
+<tr>
+<td><code>total_touches_all</code> (deliberate trap)</td>
+<td><strong>Retained in both modes</strong>; flagged <code>leakage_risk=True</code></td>
+</tr>
+</tbody>
+</table>
+<p>Each bundle's <code>manifest.json</code> records <code>relational_snapshot_safe</code>,
+<code>redacted_columns</code>, and <code>snapshot_day</code>, so the bundle is
+self-describing.</p>
+<h2>Calibration</h2>
+<p>Every realism / calibration / difficulty claim in this README is
+backed by
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/release/validation/validation_report.md"><code>validation/validation_report.md</code></a>,
+regenerated by
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/scripts/validate_release_candidate.py"><code>scripts/validate_release_candidate.py</code></a>
+with bands declared in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/v1_acceptance_gates_bands.yaml"><code>docs/release/v1_acceptance_gates_bands.yaml</code></a>.
+Headline cross-seed medians (seeds 42–46):</p>
+<table>
+<thead>
+<tr>
+<th>Tier</th>
+<th>LR AUC</th>
+<th>AP</th>
+<th>P@100</th>
+<th>Brier</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>intro</td>
+<td>0.879</td>
+<td>0.761</td>
+<td>0.80</td>
+<td>0.130</td>
+</tr>
+<tr>
+<td>intermediate</td>
+<td>0.886</td>
+<td>0.575</td>
+<td>0.59</td>
+<td>0.110</td>
+</tr>
+<tr>
+<td>advanced</td>
+<td>0.886</td>
+<td>0.351</td>
+<td>0.34</td>
+<td>0.061</td>
+</tr>
+</tbody>
+</table>
+<p>AP, P@100, conversion-rate, and lift orderings hold across the
+intended difficulty axis (intro &gt; intermediate &gt; advanced).</p>
+<h2>Intended uses</h2>
+<ul>
+<li>Teaching baseline lead-scoring on a flat snapshot.</li>
+<li>Teaching relational feature engineering against snapshot-safe tables.</li>
+<li>Teaching leakage detection (the <code>total_touches_all</code> trap is
+designed to be discoverable).</li>
+<li>Teaching calibration, lift, P@K, value-aware ranking
+(<code>expected_acv × P(convert)</code>), and cohort-shift evaluation.</li>
+<li>Comparing model families under a controlled DGP.</li>
+</ul>
+<h2>Out-of-scope uses</h2>
+<ul>
+<li><strong>Production lead scoring.</strong> The company, product, and customers are
+fictional.</li>
+<li><strong>Vendor benchmarking / paper baselines.</strong> Difficulty tiers are
+calibrated for pedagogy, not cross-paper comparability.</li>
+<li><strong>Causal-inference research that requires recovery of the true DGP.</strong>
+The instructor companion exposes the hidden graph for teaching; it does
+not provide designed counterfactuals.</li>
+<li><strong>Demographic / fairness research.</strong> v1 does not model protected
+attributes.</li>
+</ul>
+<h2>Known limitations</h2>
+<ul>
+<li><strong>Difficulty signal on raw AUC is flat.</strong> LR AUC is ~0.88 across
+every tier. Difficulty is visible in AP, P@K, Brier, and value
+capture. Treat AUC as a sanity check, not a difficulty signal.</li>
+<li><strong>GBM does not consistently beat LR (gate G7.4.4).</strong> GBM−LR AUC delta
+is slightly negative in every tier (intro −0.0045, intermediate
+−0.0072, advanced −0.0133); v1's snapshot is dominated by linear
+features. v2 will inject non-linear interactions in the simulator.</li>
+<li><strong>Channel signal is weak.</strong> Per
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/channel_signal_audit.md"><code>docs/release/channel_signal_audit.md</code></a>,
+out-of-sample univariate AUC of <code>lead_source</code> is ≈0.50–0.52 across
+all tiers and the per-channel rate spread is ≤0.05. The simulator
+does not encode channel-conditional probabilities; channel-conditional
+encoding is post-v1 work.</li>
+<li><strong>Cohort-shift degradation is small.</strong> v1 has no time-of-year drift
+baked in; the cohort-shift gate (G6.4) is informational and will
+bite in v2.</li>
+</ul>
+<h2>Composition</h2>
+<ul>
+<li><strong>Entities.</strong> Accounts, contacts, leads, touches, sessions,
+sales_activities, opportunities (public); plus customers and
+subscriptions (instructor only). Per-table row counts for each bundle live in
+<code>manifest.json</code>.</li>
+<li><strong>Features.</strong> 32 public columns grouped by analytical role in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/feature_dictionary.md"><code>docs/release/feature_dictionary.md</code></a>;
+the per-bundle <code>feature_dictionary.csv</code> is the authoritative
+machine-readable spec.</li>
+<li><strong>Label.</strong> <code>converted_within_90_days</code> (boolean), event-derived from
+the simulator. Never sampled directly.</li>
+<li><strong>Splits.</strong> 70/15/15 train/valid/test, deterministic given seed;
+recorded in <code>tasks/converted_within_90_days/task_manifest.json</code>.
+<strong>Group-leakage warning:</strong> the splitter is keyed on <code>lead_id</code> only,
+not on <code>account_id</code> or <code>contact_id</code>. On the as-shipped intermediate
+bundle, <strong>518 of 557 test accounts (≈93%) also appear in train</strong>;
+the contact-level overlap is similar in magnitude. A flat baseline
+trained on the random split rides account-level signal across the
+split boundary. For a generalisation-faithful number, retrain with
+<code>GroupKFold(account_id)</code> (or <code>contact_id</code>) and report both — see
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/break_me_guide.md"><code>break_me_guide.md</code></a> §5 for the
+detection recipe.</li>
+<li><strong>Provenance.</strong> Recipe <code>b2b_saas_procurement_v1</code>, seed 42, package
+version stamped in <code>manifest.json</code>.</li>
+</ul>
+<h2>Maintenance, adversarial framing, license</h2>
+<p>We <em>want</em> the dataset to be broken. The
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/break_me_guide.md">break-me guide</a> catalogues
+nine adversarial patterns to look for (including leakage, split
+contamination, ranking inversions, calibration drift) with
+worked-example pointers back into the notebooks. Issue
+templates ship under <code>.github/ISSUE_TEMPLATE/</code>: a
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/.github/ISSUE_TEMPLATE/dataset_breakage_report.yml">breakage report</a>
+form for findings on the bundle itself, and a
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/.github/ISSUE_TEMPLATE/realism_feedback.yml">realism feedback</a>
+form for distributional critiques. Accepted findings are
+logged in
+<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/release/v2_decision_log.md"><code>docs/release/v2_decision_log.md</code></a>.
+File issues at
+<a href="https://github.com/leadforge-dev/leadforge">leadforge-dev/leadforge</a>;
+PRs welcome.</p>
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Generator</td>
+<td>leadforge <code>1.0.0+</code></td>
+</tr>
+<tr>
+<td>Recipe</td>
+<td><code>b2b_saas_procurement_v1</code></td>
+</tr>
+<tr>
+<td>Canonical seed</td>
+<td>42 (cross-seed sweep: 42–46)</td>
+</tr>
+<tr>
+<td>Bundle schema version</td>
+<td>5</td>
+</tr>
+<tr>
+<td>Format</td>
+<td>Parquet (canonical) + CSV (convenience)</td>
+</tr>
+<tr>
+<td>License</td>
+<td>MIT — see <a href="LICENSE">LICENSE</a></td>
+</tr>
+</tbody>
+</table>
+<p>Verify integrity with <code>leadforge validate &lt;bundle_dir&gt;</code>; every file
+is hashed in <code>manifest.json</code>.</p>
+</section>
+<section class="files">
+  <h2 class="section__heading">Data Files <span class="section__count">(42 total)</span></h2>
+  <details class="tier" open>
+    <summary class="tier__name">intro/ <span class="tier__count">(14 files)</span></summary>
+    <ul class="tier__files">
+    <li class="file"><code class="file__path">intro/lead_scoring.csv</code><span class="file__desc">Intro tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.</span></li>
+    <li class="file"><code class="file__path">intro/feature_dictionary.csv</code><span class="file__desc">Intro tier feature dictionary (canonical column spec).</span></li>
+    <li class="file"><code class="file__path">intro/tasks/converted_within_90_days/train.parquet</code><span class="file__desc">Intro tier train split for `converted_within_90_days` (3,500 rows).</span></li>
+    <li class="file"><code class="file__path">intro/tasks/converted_within_90_days/valid.parquet</code><span class="file__desc">Intro tier valid split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">intro/tasks/converted_within_90_days/test.parquet</code><span class="file__desc">Intro tier test split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">intro/tables/accounts.parquet</code><span class="file__desc">Intro tier `accounts` relational table (1,500 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/contacts.parquet</code><span class="file__desc">Intro tier `contacts` relational table (4,200 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/leads.parquet</code><span class="file__desc">Intro tier `leads` relational table (5,000 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/touches.parquet</code><span class="file__desc">Intro tier `touches` relational table (38,561 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/sessions.parquet</code><span class="file__desc">Intro tier `sessions` relational table (10,171 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/sales_activities.parquet</code><span class="file__desc">Intro tier `sales_activities` relational table (21,358 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/tables/opportunities.parquet</code><span class="file__desc">Intro tier `opportunities` relational table (4,426 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intro/dataset_card.md</code><span class="file__desc">Intro tier auto-rendered dataset card.</span></li>
+    <li class="file"><code class="file__path">intro/manifest.json</code><span class="file__desc">Intro tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).</span></li>
+    </ul>
+  </details>
+  <details class="tier" open>
+    <summary class="tier__name">intermediate/ <span class="tier__count">(14 files)</span></summary>
+    <ul class="tier__files">
+    <li class="file"><code class="file__path">intermediate/lead_scoring.csv</code><span class="file__desc">Intermediate tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.</span></li>
+    <li class="file"><code class="file__path">intermediate/feature_dictionary.csv</code><span class="file__desc">Intermediate tier feature dictionary (canonical column spec).</span></li>
+    <li class="file"><code class="file__path">intermediate/tasks/converted_within_90_days/train.parquet</code><span class="file__desc">Intermediate tier train split for `converted_within_90_days` (3,500 rows).</span></li>
+    <li class="file"><code class="file__path">intermediate/tasks/converted_within_90_days/valid.parquet</code><span class="file__desc">Intermediate tier valid split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">intermediate/tasks/converted_within_90_days/test.parquet</code><span class="file__desc">Intermediate tier test split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/accounts.parquet</code><span class="file__desc">Intermediate tier `accounts` relational table (1,500 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/contacts.parquet</code><span class="file__desc">Intermediate tier `contacts` relational table (4,200 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/leads.parquet</code><span class="file__desc">Intermediate tier `leads` relational table (5,000 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/touches.parquet</code><span class="file__desc">Intermediate tier `touches` relational table (38,724 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/sessions.parquet</code><span class="file__desc">Intermediate tier `sessions` relational table (10,012 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/sales_activities.parquet</code><span class="file__desc">Intermediate tier `sales_activities` relational table (20,679 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/tables/opportunities.parquet</code><span class="file__desc">Intermediate tier `opportunities` relational table (4,255 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">intermediate/dataset_card.md</code><span class="file__desc">Intermediate tier auto-rendered dataset card.</span></li>
+    <li class="file"><code class="file__path">intermediate/manifest.json</code><span class="file__desc">Intermediate tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).</span></li>
+    </ul>
+  </details>
+  <details class="tier" open>
+    <summary class="tier__name">advanced/ <span class="tier__count">(14 files)</span></summary>
+    <ul class="tier__files">
+    <li class="file"><code class="file__path">advanced/lead_scoring.csv</code><span class="file__desc">Advanced tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.</span></li>
+    <li class="file"><code class="file__path">advanced/feature_dictionary.csv</code><span class="file__desc">Advanced tier feature dictionary (canonical column spec).</span></li>
+    <li class="file"><code class="file__path">advanced/tasks/converted_within_90_days/train.parquet</code><span class="file__desc">Advanced tier train split for `converted_within_90_days` (3,500 rows).</span></li>
+    <li class="file"><code class="file__path">advanced/tasks/converted_within_90_days/valid.parquet</code><span class="file__desc">Advanced tier valid split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">advanced/tasks/converted_within_90_days/test.parquet</code><span class="file__desc">Advanced tier test split for `converted_within_90_days` (750 rows).</span></li>
+    <li class="file"><code class="file__path">advanced/tables/accounts.parquet</code><span class="file__desc">Advanced tier `accounts` relational table (1,500 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/contacts.parquet</code><span class="file__desc">Advanced tier `contacts` relational table (4,200 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/leads.parquet</code><span class="file__desc">Advanced tier `leads` relational table (5,000 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/touches.parquet</code><span class="file__desc">Advanced tier `touches` relational table (38,208 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/sessions.parquet</code><span class="file__desc">Advanced tier `sessions` relational table (9,942 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/sales_activities.parquet</code><span class="file__desc">Advanced tier `sales_activities` relational table (19,995 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/tables/opportunities.parquet</code><span class="file__desc">Advanced tier `opportunities` relational table (4,004 rows) — snapshot-safe.</span></li>
+    <li class="file"><code class="file__path">advanced/dataset_card.md</code><span class="file__desc">Advanced tier auto-rendered dataset card.</span></li>
+    <li class="file"><code class="file__path">advanced/manifest.json</code><span class="file__desc">Advanced tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).</span></li>
+    </ul>
+  </details>
+</section>
+<section class="schemas">
+  <h2 class="section__heading">Schema / Columns <span class="section__count">(534 columns across 33 tabular files)</span></h2>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/lead_scoring.csv</code> <span class="schema__count">(33 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>split</code></td><td class="col__type">string</td><td class="col__desc">Task-split membership: one of `train`, `valid`, `test`. Matches the per-row split assignment in `tasks/converted_within_90_days/`.</td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque account identifier.</td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc">Industry vertical of the buying organization.</td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc">Geographic region of the account&#39;s headquarters.</td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc">Banded employee headcount of the account.</td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc">Banded estimated annual revenue of the account.</td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc">Banded internal process maturity score (latent).</td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque contact identifier.</td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc">Functional area of the primary contact (e.g. finance, ops).</td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc">Seniority band of the primary contact.</td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc">Buyer role classification (economic_buyer, champion, etc.).</td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque lead identifier.</td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc">ISO-8601 timestamp when the lead was created.</td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc">Origination source of the lead (e.g. inbound_form, sdr_outbound).</td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc">Marketing channel responsible for the first recorded touch.</td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Total number of marketing/sales touches recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of inbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of outbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of web/trial sessions recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative pricing page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative demo page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc">Sum of session durations (seconds) before snapshot.</td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the first 7 days after lead creation.</td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the last 7 days before snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc">Days between first touch and snapshot cutoff (NaN if no touches).</td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of sales activities logged before snapshot.</td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc">Days elapsed between the most recent touch and the snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc">Whether any opportunity was created by snapshot date (open or closed).</td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc">Whether an open opportunity existed at snapshot date.</td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc">Estimated ACV of the most recent open opportunity (NaN if none).</td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc">Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).</td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">integer</td><td class="col__desc">Total touches over the full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.</td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc">Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.</td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tasks/converted_within_90_days/train.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tasks/converted_within_90_days/valid.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tasks/converted_within_90_days/test.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/accounts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>company_name</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/contacts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>job_title</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>email_domain_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/leads.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>owner_rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/touches.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>touch_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_direction</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>campaign_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/sessions.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>session_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/sales_activities.parquet</code> <span class="schema__count">(6 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>activity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_outcome</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intro/tables/opportunities.parquet</code> <span class="schema__count">(5 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>opportunity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>stage</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_acv</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/lead_scoring.csv</code> <span class="schema__count">(33 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>split</code></td><td class="col__type">string</td><td class="col__desc">Task-split membership: one of `train`, `valid`, `test`. Matches the per-row split assignment in `tasks/converted_within_90_days/`.</td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque account identifier.</td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc">Industry vertical of the buying organization.</td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc">Geographic region of the account&#39;s headquarters.</td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc">Banded employee headcount of the account.</td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc">Banded estimated annual revenue of the account.</td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc">Banded internal process maturity score (latent).</td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque contact identifier.</td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc">Functional area of the primary contact (e.g. finance, ops).</td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc">Seniority band of the primary contact.</td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc">Buyer role classification (economic_buyer, champion, etc.).</td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque lead identifier.</td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc">ISO-8601 timestamp when the lead was created.</td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc">Origination source of the lead (e.g. inbound_form, sdr_outbound).</td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc">Marketing channel responsible for the first recorded touch.</td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Total number of marketing/sales touches recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of inbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of outbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of web/trial sessions recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative pricing page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative demo page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc">Sum of session durations (seconds) before snapshot.</td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the first 7 days after lead creation.</td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the last 7 days before snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc">Days between first touch and snapshot cutoff (NaN if no touches).</td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of sales activities logged before snapshot.</td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc">Days elapsed between most recent touch and snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc">Whether any opportunity was created by snapshot date (open or closed).</td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc">Whether an open opportunity existed at snapshot date.</td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc">Estimated ACV of the most recent open opportunity (NaN if none).</td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc">Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).</td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">integer</td><td class="col__desc">Total touches over full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.</td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc">Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.</td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tasks/converted_within_90_days/train.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tasks/converted_within_90_days/valid.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tasks/converted_within_90_days/test.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/accounts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>company_name</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/contacts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>job_title</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>email_domain_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/leads.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>owner_rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/touches.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>touch_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_direction</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>campaign_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/sessions.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>session_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/sales_activities.parquet</code> <span class="schema__count">(6 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>activity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_outcome</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>intermediate/tables/opportunities.parquet</code> <span class="schema__count">(5 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>opportunity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>stage</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_acv</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/lead_scoring.csv</code> <span class="schema__count">(33 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>split</code></td><td class="col__type">string</td><td class="col__desc">Task-split membership: one of <code>train</code>, <code>valid</code>, <code>test</code>. Matches the per-row split assignment in <code>tasks/converted_within_90_days/</code>.</td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque account identifier.</td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc">Industry vertical of the buying organization.</td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc">Geographic region of the account&#39;s headquarters.</td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc">Banded employee headcount of the account.</td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc">Banded estimated annual revenue of the account.</td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc">Banded internal process maturity score (latent).</td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque contact identifier.</td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc">Functional area of the primary contact (e.g. finance, ops).</td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc">Seniority band of the primary contact.</td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc">Buyer role classification (economic_buyer, champion, etc.).</td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc">Opaque lead identifier.</td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc">ISO-8601 timestamp when the lead was created.</td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc">Origination source of the lead (e.g. inbound_form, sdr_outbound).</td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc">Marketing channel responsible for the first recorded touch.</td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Total number of marketing/sales touches recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of inbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of outbound touches before snapshot.</td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of web/trial sessions recorded before snapshot.</td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative pricing page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc">Cumulative demo page views across all sessions before snapshot.</td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc">Sum of session durations (seconds) before snapshot.</td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the first 7 days after lead creation.</td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">integer</td><td class="col__desc">Number of touches in the last 7 days before snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc">Days between first touch and snapshot cutoff (NaN if no touches).</td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">integer</td><td class="col__desc">Number of sales activities logged before snapshot.</td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc">Days elapsed between most recent touch and snapshot cutoff.</td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc">Whether any opportunity was created by snapshot date (open or closed).</td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc">Whether an open opportunity existed at snapshot date.</td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc">Estimated ACV of the most recent open opportunity (NaN if none).</td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc">Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).</td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">integer</td><td class="col__desc">Total touches over full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.</td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc">Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.</td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tasks/converted_within_90_days/train.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tasks/converted_within_90_days/valid.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tasks/converted_within_90_days/test.parquet</code> <span class="schema__count">(32 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>inbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>outbound_touch_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_session_duration_seconds</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_week_1</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touches_last_7_days</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_first_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_count</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>days_since_last_touch</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_created</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>has_open_opportunity</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>opportunity_estimated_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>expected_acv</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>total_touches_all</code></td><td class="col__type">number</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>converted_within_90_days</code></td><td class="col__type">boolean</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/accounts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>company_name</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>industry</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>region</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>employee_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_revenue_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>process_maturity_band</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/contacts.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>job_title</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>role_function</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>seniority</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>buyer_role</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>email_domain_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/leads.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>contact_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>account_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_source</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>first_touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>owner_rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/touches.parquet</code> <span class="schema__count">(7 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>touch_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_channel</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>touch_direction</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>campaign_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/sessions.parquet</code> <span class="schema__count">(8 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>session_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>pricing_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>demo_page_views</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>session_duration_seconds</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/sales_activities.parquet</code> <span class="schema__count">(6 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>activity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>rep_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_timestamp</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_type</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>activity_outcome</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+  <details class="schema" open>
+    <summary class="schema__path"><code>advanced/tables/opportunities.parquet</code> <span class="schema__count">(5 columns)</span></summary>
+    <table class="schema__table">
+      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>
+      <tbody>
+      <tr><td class="col__name"><code>opportunity_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>lead_id</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>created_at</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>stage</code></td><td class="col__type">string</td><td class="col__desc"></td></tr>
+      <tr><td class="col__name"><code>estimated_acv</code></td><td class="col__type">integer</td><td class="col__desc"></td></tr>
+      </tbody>
+    </table>
+  </details>
+</section>
+<section class="sources">
+  <h2 class="section__heading">Sources</h2>
+  <ul class="sources__list">
+    <li><a href="https://github.com/leadforge-dev/leadforge" target="_blank" rel="noopener noreferrer">leadforge source repository</a></li>
+    <li><a href="https://github.com/leadforge-dev/leadforge/tree/main/release/validation" target="_blank" rel="noopener noreferrer">v1 release validation report</a></li>
+  </ul>
+</section>
+<footer class="dataset-footer">
+  <div class="dataset-footer__keywords"><span class="chip">b2b</span> <span class="chip">classification</span> <span class="chip">crm</span> <span class="chip">education</span> <span class="chip">lead-scoring</span> <span class="chip">saas</span> <span class="chip">synthetic-data</span> <span class="chip">tabular</span></div>
+  <div class="dataset-footer__license">License: MIT</div>
+  <div class="dataset-footer__note">Local Kaggle publication-readiness preview rendered by scripts/preview_kaggle_page.py — not the live dataset page.</div>
+</footer>
+</main>
+</body>
+</html>
diff --git a/scripts/_preview_common.py b/scripts/_preview_common.py
new file mode 100644
index 0000000..2338a1d
--- /dev/null
+++ b/scripts/_preview_common.py
@@ -0,0 +1,133 @@
+"""Shared primitives for the local Kaggle / HF preview-page scripts.
+
+PR 7.2 — both ``scripts/preview_kaggle_page.py`` and
+``scripts/preview_hf_page.py`` need to:
+
+* HTML-escape user-controlled strings the same way (and emit the
+  same entity form so committed sample HTML doesn't churn between
+  scripts);
+* construct an ``http.server.ThreadingHTTPServer`` rooted at a
+  preview-output directory (chosen for ``allow_reuse_address=True``
+  inheritance from ``HTTPServer``);
+* start serving + optionally pop a browser tab.
+
+Splitting ``make_server`` away from ``serve`` is what lets the test
+suite stand the server up on port 0 in a thread, GET ``/``, and
+shut down cleanly — the alternative (calling ``serve_forever``
+directly) would require subprocess management and a real port
+allocation race.
+"""
+
+from __future__ import annotations
+
+import http.server
+import sys
+import webbrowser
+from pathlib import Path
+from typing import Any
+
+
+def escape(value: str) -> str:
+    """HTML-escape a single attribute / text value.
+
+    Hand-rolled rather than using ``html.escape`` so the committed
+    sample HTML uses the decimal ``&#39;`` entity for ``'`` (matching
+    what the preview scripts emitted at PR-open time) — switching to
+    ``html.escape``'s ``&#x27;`` would force a regen of every
+    committed sample with no observable rendering difference.
+    """
+
+    return (
+        str(value)
+        .replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace('"', "&quot;")
+        .replace("'", "&#39;")
+    )
+
+
+def plural(n: int, singular: str, plural_form: str | None = None) -> str:
+    """Return ``f"{n} <word>"`` with ``<word>`` pluralised when ``n != 1``.
+
+    Default pluralisation is the trailing-``s`` rule; pass
+    ``plural_form`` for irregular cases (none today).  Used by the
+    preview-page section-heading counts so output reads as "1 config"
+    rather than "1 configs" — the latter was caught in PR review on
+    the instructor sample (Copilot finding COPILOT-3).
+    """
+
+    word = singular if n == 1 else (plural_form or singular + "s")
+    return f"{n} {word}"
+
+
+def render_cover(filename: str) -> str:
+    """Render a sibling-relative cover-image block.
+
+    Used by both preview scripts; the HF preview previously copied
+    the cover into the preview tree without ever rendering it
+    (Copilot finding COPILOT-2 — either drop the copy or display
+    it; we picked display for symmetry with Kaggle and because HF's
+    live page shows the dataset cover too).  Sibling-relative
+    ``src`` so the same HTML works for both the runtime preview
+    tree (where the image was copied in) and the committed sample
+    (which is byte-compared, not served).
+    """
+
+    src = escape(filename)
+    return f"""<section class="cover">
+  <img class="cover__image" src="{src}" alt="Dataset cover image">
+</section>"""
+
+
+def _make_handler_factory(directory: Path) -> type[http.server.SimpleHTTPRequestHandler]:
+    """Build a handler subclass that serves from ``directory``.
+
+    ``SimpleHTTPRequestHandler`` accepts a ``directory=`` kwarg in
+    Python 3.7+, but threading the path through ``ThreadingHTTPServer``'s
+    ``RequestHandlerClass`` requires either a ``functools.partial`` or
+    a subclass; subclassing keeps the import surface stdlib-only.
+    """
+
+    resolved = str(directory.resolve())
+
+    class _Handler(http.server.SimpleHTTPRequestHandler):
+        def __init__(self, *args: Any, **kwargs: Any) -> None:
+            super().__init__(*args, directory=resolved, **kwargs)
+
+    return _Handler
+
+
+def make_server(directory: Path, port: int) -> http.server.ThreadingHTTPServer:
+    """Build (don't start) an HTTP server rooted at ``directory``.
+
+    ``ThreadingHTTPServer`` (unlike bare ``socketserver.ThreadingTCPServer``)
+    inherits ``allow_reuse_address = True`` from ``HTTPServer`` —
+    matters because Ctrl-C → re-run within ~60s would otherwise raise
+    ``OSError: Address already in use`` (errno 48 on macOS, 98 on
+    Linux) while the socket sits in TIME_WAIT.
+
+    Pass ``port=0`` to let the kernel pick a free port; the bound
+    port is then on ``server.server_address[1]``.  This is the seam
+    that makes ``serve`` testable (a test starts the server in a
+    thread, fetches one URL, shuts down).
+    """
+
+    return http.server.ThreadingHTTPServer(("", port), _make_handler_factory(directory))
+
+
+def serve(directory: Path, port: int, *, open_browser: bool) -> None:
+    """Start the HTTP server rooted at ``directory`` and block.
+
+    Blocks on ``serve_forever()``; KeyboardInterrupt (Ctrl-C) is the
+    documented exit path.  Not covered by unit tests because it never
+    returns; ``make_server`` is the testable seam.
+    """
+
+    httpd = make_server(directory, port)
+    bound_port = httpd.server_address[1]
+    url = f"http://localhost:{bound_port}/"
+    print(f"serving {directory} at {url} — Ctrl-C to stop", file=sys.stderr)
+    if open_browser:
+        webbrowser.open(url)
+    httpd.serve_forever()
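Reviewer sketch (not part of the patch): the ``make_server`` docstring describes a test that binds port 0, serves from a thread, fetches one URL, and shuts down. A minimal illustration of that pattern follows. It is an assumption-laden stand-in, not the repository's test: ``make_server`` is re-declared locally with ``functools.partial`` so the block runs on its own, it binds ``127.0.0.1`` rather than all interfaces, and ``fetch_index`` is a hypothetical helper name.

```python
import http.server
import tempfile
import threading
import urllib.request
from functools import partial
from pathlib import Path


def make_server(directory: Path, port: int) -> http.server.ThreadingHTTPServer:
    # Local stand-in for scripts/_preview_common.make_server (assumption:
    # partial() is used here instead of the handler subclass in the patch).
    handler = partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory.resolve())
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def fetch_index(directory: Path) -> tuple[int, bytes]:
    """Serve ``directory`` on a kernel-chosen port, GET ``/``, shut down."""
    httpd = make_server(directory, 0)  # port 0: the kernel picks a free port
    port = httpd.server_address[1]  # the port that was actually bound
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    try:
        # Empty ProxyHandler so an http_proxy environment variable is ignored.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
            return resp.status, resp.read()
    finally:
        httpd.shutdown()  # unblocks serve_forever() in the worker thread
        httpd.server_close()  # releases the listening socket
        thread.join(timeout=5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "index.html").write_text("<h1>preview</h1>", encoding="utf-8")
        status, body = fetch_index(Path(tmp))
        print(status, body)
```

Because the server is constructed but not started by ``make_server``, the caller controls the thread and the shutdown order, which is what avoids subprocess management and a fixed-port race.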
diff --git a/scripts/preview_hf_page.py b/scripts/preview_hf_page.py
new file mode 100644
index 0000000..91b5448
--- /dev/null
+++ b/scripts/preview_hf_page.py
@@ -0,0 +1,506 @@
+#!/usr/bin/env python3
+"""Local publication-readiness preview for the Hugging Face dataset page.
+
+PR 7.2.  Reads the artefact the publish PR will upload
+(``release/huggingface/README.md`` or ``release/huggingface-instructor/README.md``
+per ``--variant=public|instructor``), parses the YAML frontmatter +
+Markdown body, renders an offline HTML page that surfaces the
+published structure (header pills, tag chips, configs dropdown,
+README body, footer), and optionally serves it on
+``http://localhost:8766``.
+
+This is a *publication-readiness* preview — structured rendering of
+the upload artefact that helps catch link / config / YAML-rendering
+issues before the real ``huggingface-cli upload``.  It is
+deliberately NOT an HF look-alike: pixel fidelity is out of scope
+and the chrome is approximate.
+
+Design rationale + decision log: ``docs/release/preview_pages_design.md``.
+
+Usage::
+
+    python scripts/preview_hf_page.py --open-browser              # public variant
+    python scripts/preview_hf_page.py --variant=instructor        # companion repo
+    python scripts/preview_hf_page.py --no-serve                  # build only
+
+Exit codes: 0 success / 2 pre-flight error.
+"""
+
+from __future__ import annotations
+
+import argparse
+import re
+import sys
+from collections.abc import Sequence
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Final
+
+import yaml
+
+# Make ``scripts/`` importable regardless of how this file is loaded.
+sys.path.insert(0, str(Path(__file__).resolve().parent))
+
+from _preview_common import (  # noqa: E402 — must follow sys.path insert
+    escape,
+    plural,
+    render_cover,
+    serve,
+)
+from _release_common import replace_file  # noqa: E402
+
+# ---------------------------------------------------------------------------
+# Defaults
+# ---------------------------------------------------------------------------
+
+DEFAULT_RELEASE_DIR: Final[Path] = Path("release")
+DEFAULT_OUT_DIR_PUBLIC: Final[Path] = Path("release/_preview/huggingface")
+DEFAULT_OUT_DIR_INSTRUCTOR: Final[Path] = Path("release/_preview/huggingface-instructor")
+DEFAULT_PORT: Final[int] = 8766
+
+#: Per-variant relative path to the README (under ``release_dir``).
+_VARIANT_README_REL: Final[dict[str, Path]] = {
+    "public": Path("huggingface/README.md"),
+    "instructor": Path("huggingface-instructor/README.md"),
+}
+VALID_VARIANTS: Final[tuple[str, ...]] = ("public", "instructor")
+
+
+# ---------------------------------------------------------------------------
+# Markdown rendering (markdown-it-py is in [dev] AND [publish])
+# ---------------------------------------------------------------------------
+
+
+def _render_markdown(text: str) -> str:
+    """Render ``text`` to HTML.  See preview_kaggle_page._render_markdown."""
+
+    try:
+        from markdown_it import MarkdownIt
+    except ImportError as exc:  # pragma: no cover — dep is in [dev]
+        raise ImportError(
+            "markdown-it-py is required.  pip install -e '.[dev]' (or [publish])."
+        ) from exc
+    return MarkdownIt("gfm-like").disable("linkify").render(text)
+
+
+# ---------------------------------------------------------------------------
+# Frontmatter parsing
+# ---------------------------------------------------------------------------
+
+#: HF dataset cards open with a ``---`` block of YAML, then the body.
+#: ``re.DOTALL`` matters because the YAML spans multiple lines.
+_FRONTMATTER_RE: Final[re.Pattern[str]] = re.compile(
+    r"\A---\n(?P<yaml>.*?)\n---\n(?P<body>.*)\Z",
+    re.DOTALL,
+)
+
+
+@dataclass(frozen=True)
+class HuggingFaceDoc:
+    """Parsed HF README — frontmatter dict + body markdown."""
+
+    frontmatter: dict[str, Any]
+    body: str
+
+
+def parse_hf_readme(text: str) -> HuggingFaceDoc:
+    """Split an HF README into YAML frontmatter + Markdown body.
+
+    Raises ``ValueError`` if the document does not open with a
+    ``---``-delimited frontmatter block, or if the YAML is not a
+    mapping (every HF dataset card MUST satisfy both).
+    """
+
+    match = _FRONTMATTER_RE.match(text)
+    if not match:
+        raise ValueError(
+            "HF README is missing a YAML frontmatter block (expected '---\\n<yaml>\\n---\\n<body>')"
+        )
+    parsed = yaml.safe_load(match.group("yaml")) or {}
+    if not isinstance(parsed, dict):
+        raise ValueError(
+            f"HF README frontmatter is not a YAML mapping (got {type(parsed).__name__})"
+        )
+    return HuggingFaceDoc(frontmatter=parsed, body=match.group("body"))
+
+
+# ---------------------------------------------------------------------------
+# Section renderers — pure, deterministic
+# ---------------------------------------------------------------------------
+
+
+def _render_header(frontmatter: dict[str, Any]) -> str:
+    """Render the page header — pretty_name + licence / task / size pills."""
+
+    pretty_name = escape(str(frontmatter.get("pretty_name", "")))
+    license_id = escape(str(frontmatter.get("license", "")))
+    languages = ", ".join(escape(str(x)) for x in frontmatter.get("language", []) or [])
+    sizes = ", ".join(escape(str(x)) for x in frontmatter.get("size_categories", []) or [])
+    tasks = ", ".join(escape(str(x)) for x in frontmatter.get("task_categories", []) or [])
+    return f"""<header class="dataset-header">
+  <div class="dataset-header__namespace">huggingface.co/datasets</div>
+  <h1 class="dataset-header__title">{pretty_name}</h1>
+  <ul class="dataset-header__pills">
+    <li class="pill pill--license">License: {license_id}</li>
+    <li class="pill pill--task">Task: {tasks}</li>
+    <li class="pill pill--size">Size: {sizes}</li>
+    <li class="pill pill--language">Language: {languages}</li>
+  </ul>
+</header>"""
+
+
+def _render_tags(frontmatter: dict[str, Any]) -> str:
+    """Render the tag chip row (omitted when no tags)."""
+
+    tags = frontmatter.get("tags", []) or []
+    if not tags:
+        return ""
+    chips = " ".join(f'<span class="chip">{escape(str(t))}</span>' for t in tags)
+    return f'<section class="tags">\n  {chips}\n</section>'
+
+
+def _render_configs(frontmatter: dict[str, Any]) -> str:
+    """Render the configs dropdown — one entry per ``configs[]`` block.
+
+    This is the load-bearing inventory of what the YAML declares: each
+    config + its train/validation/test data_files.  HF's "Subset"
+    selector at the top of the dataset viewer maps to this.  Default
+    config is flagged with a single ``badge--default`` instance.
+    """
+
+    configs = frontmatter.get("configs", []) or []
+    if not configs:
+        return '<section class="configs"><p>No configs declared.</p></section>'
+    blocks: list[str] = []
+    for config in configs:
+        config_name = escape(str(config.get("config_name", "")))
+        is_default = bool(config.get("default"))
+        default_badge = ' <span class="badge badge--default">default</span>' if is_default else ""
+        data_files = config.get("data_files", []) or []
+        rows = "\n".join(
+            f"      <tr><td>{escape(str(df.get('split', '')))}</td>"
+            f"<td><code>{escape(str(df.get('path', '')))}</code></td></tr>"
+            for df in data_files
+        )
+        blocks.append(
+            f'  <details class="config" open>\n'
+            f'    <summary class="config__name"><code>{config_name}</code>{default_badge} '
+            f'<span class="config__count">({plural(len(data_files), "split")})</span>'
+            f"</summary>\n"
+            f'    <table class="config__table">\n'
+            f"      <thead><tr><th>Split</th><th>Path</th></tr></thead>\n"
+            f"      <tbody>\n{rows}\n      </tbody>\n"
+            f"    </table>\n"
+            f"  </details>"
+        )
+    return f"""<section class="configs">
+  <h2 class="section__heading">Configurations / Subsets <span class="section__count">({plural(len(configs), "config")})</span></h2>
+{chr(10).join(blocks)}
+</section>"""
+
+
+def _render_readme_body(body_md: str) -> str:
+    """Render the README body (everything after the YAML)."""
+
+    return f'<section class="readme">\n{_render_markdown(body_md)}</section>'
+
+
+def _render_footer(frontmatter: dict[str, Any], variant: str) -> str:
+    """Render the licence + variant note footer."""
+
+    license_id = escape(str(frontmatter.get("license", "")))
+    return f"""<footer class="dataset-footer">
+  <div class="dataset-footer__license">License: {license_id}</div>
+  <div class="dataset-footer__variant">Variant: <code>{escape(variant)}</code></div>
+  <div class="dataset-footer__note">Local Hugging Face publication-readiness preview rendered by scripts/preview_hf_page.py — not the live dataset page.</div>
+</footer>"""
+
+
+# ---------------------------------------------------------------------------
+# HTML wrapper + minimal CSS
+# ---------------------------------------------------------------------------
+
+_PAGE_CSS: Final[str] = """\
+:root { --bg:#fff; --fg:#1f2937; --muted:#6b7280; --accent:#ff9d00; --border:#e5e7eb; --pill-bg:#f3f4f6; --code-bg:#f9fafb; }
+body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, sans-serif; color: var(--fg); background: var(--bg); margin: 0; padding: 0; line-height: 1.6; }
+.container { max-width: 1100px; margin: 0 auto; padding: 24px 32px; }
+.dataset-header { border-bottom: 1px solid var(--border); padding-bottom: 16px; margin-bottom: 24px; }
+.dataset-header__namespace { color: var(--muted); font-size: 0.85em; font-family: monospace; margin-bottom: 4px; }
+.dataset-header__title { font-size: 1.8em; margin: 0 0 12px 0; }
+.dataset-header__pills { list-style: none; padding: 0; margin: 0; display: flex; flex-wrap: wrap; gap: 8px; }
+.pill { background: var(--pill-bg); border-radius: 12px; padding: 4px 12px; font-size: 0.85em; color: var(--fg); }
+.cover { margin: 0 0 24px 0; border: 1px solid var(--border); border-radius: 4px; overflow: hidden; }
+.cover__image { display: block; max-width: 100%; height: auto; }
+.tags { margin: 0 0 24px 0; }
+.chip { display: inline-block; background: var(--pill-bg); border-radius: 12px; padding: 2px 10px; margin: 2px 4px 2px 0; font-size: 0.85em; color: var(--fg); }
+.section__heading { font-size: 1.3em; border-bottom: 2px solid var(--accent); padding-bottom: 4px; margin-top: 32px; }
+.section__count { color: var(--muted); font-size: 0.7em; font-weight: normal; }
+.config { border: 1px solid var(--border); border-radius: 4px; padding: 8px 12px; margin: 8px 0; }
+.config__name { cursor: pointer; font-weight: 600; }
+.config__count { color: var(--muted); font-weight: normal; font-size: 0.85em; }
+.badge { display: inline-block; padding: 1px 8px; border-radius: 4px; font-size: 0.75em; font-weight: 600; vertical-align: middle; margin-left: 4px; }
+.badge--default { background: var(--accent); color: white; }
+.config__table { width: 100%; border-collapse: collapse; margin-top: 8px; font-size: 0.9em; }
+.config__table th, .config__table td { text-align: left; padding: 6px 8px; border-bottom: 1px solid var(--border); }
+.config__table th { background: var(--pill-bg); font-weight: 600; }
+.readme { margin: 24px 0; }
+.readme code { background: var(--code-bg); padding: 1px 4px; border-radius: 2px; font-size: 0.9em; }
+.readme pre { background: var(--code-bg); padding: 12px; border-radius: 4px; overflow-x: auto; }
+.readme pre code { background: none; padding: 0; }
+.readme table { border-collapse: collapse; margin: 12px 0; }
+.readme th, .readme td { border: 1px solid var(--border); padding: 6px 10px; text-align: left; }
+.readme blockquote { border-left: 3px solid var(--accent); padding-left: 12px; color: var(--muted); margin: 12px 0; }
+.dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
+.dataset-footer__note { font-style: italic; margin-top: 8px; }
+"""
+
+
+def _wrap_html(*, title: str, body: str) -> str:
+    return f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <title>HF preview — {escape(title)}</title>
+  <style>{_PAGE_CSS}</style>
+</head>
+<body>
+<main class="container">
+{body}
+</main>
+</body>
+</html>
+"""
+
+
+# ---------------------------------------------------------------------------
+# Top-level renderer
+# ---------------------------------------------------------------------------
+
+
+#: Cover-image filename in the HF upload tree.  Pinned (not derived
+#: from the YAML — HF's dataset card doesn't reference the cover; the
+#: file lives at the root of the upload directory and is consumed by
+#: HF's UI, not the README body) so the preview's cover render is
+#: deterministic given just the parsed doc.
+HF_COVER_IMAGE_FILENAME: Final[str] = "dataset-cover-image.png"
+
+
+def render_hf_html(
+    doc: HuggingFaceDoc,
+    *,
+    variant: str,
+    cover_image_filename: str = HF_COVER_IMAGE_FILENAME,
+) -> str:
+    """Render the full HF preview HTML.
+
+    Pure: same ``(doc, variant, cover_image_filename)`` → byte-identical
+    HTML.  No I/O, no clock, no random.  The cover-image block was
+    added in self-review pass 4 (Copilot finding COPILOT-2 — the
+    driver was copying the cover into the preview tree without ever
+    rendering it).
+    """
+
+    body_parts = [
+        _render_header(doc.frontmatter),
+        render_cover(cover_image_filename),
+        _render_tags(doc.frontmatter),
+        _render_configs(doc.frontmatter),
+        _render_readme_body(doc.body),
+        _render_footer(doc.frontmatter, variant=variant),
+    ]
+    return _wrap_html(
+        title=str(doc.frontmatter.get("pretty_name", "")),
+        body="\n".join(p for p in body_parts if p),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Driver
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class PreviewConfig:
+    """Frozen driver config — built from CLI args or test input."""
+
+    release_dir: Path
+    out_dir: Path
+    port: int
+    variant: str
+    open_browser: bool
+    serve: bool
+
+
+@dataclass(frozen=True)
+class PreviewOutcome:
+    """Return value from :func:`run_preview`.
+
+    ``cover_path`` is always set on success — the driver always
+    copies the cover into the preview tree.
+    """
+
+    html_path: Path
+    cover_path: Path
+
+
+#: Required frontmatter keys the renderer indexes directly; validated
+#: up-front in ``run_preview`` so a malformed README surfaces as
+#: ``ValueError`` → CLI rc=2 rather than silently rendering empty
+#: pretty_name / license pills (Copilot finding COPILOT-1, applied
+#: symmetrically to the HF script).
+_REQUIRED_FRONTMATTER_KEYS: Final[tuple[str, ...]] = ("pretty_name", "license")
+
+
+def _validate_required_frontmatter(frontmatter: dict[str, Any], path: Path) -> None:
+    """Raise ``ValueError`` if required HF frontmatter keys are missing.
+
+    ``pretty_name`` and ``license`` are the two keys HF requires *and*
+    the two we display prominently; missing, empty, or null values
+    would render a half-blank header that's easy to miss.
+    """
+
+    missing = sorted(
+        k for k in _REQUIRED_FRONTMATTER_KEYS if not str(frontmatter.get(k) or "").strip()
+    )
+    if missing:
+        raise ValueError(f"{path} frontmatter is missing required key(s): {', '.join(missing)}")
+
+
+def _resolve_cover_image(release_dir: Path, variant: str) -> Path:
+    """Locate the cover image for the variant.
+
+    Lookup order: variant-specific upload tree (assembled by the HF
+    packager — gitignored, absent on a fresh checkout) → committed
+    master copy under ``release_dir``.
+    """
+
+    variant_dir = "huggingface" if variant == "public" else "huggingface-instructor"
+    for candidate in (
+        release_dir / variant_dir / HF_COVER_IMAGE_FILENAME,
+        release_dir / HF_COVER_IMAGE_FILENAME,
+    ):
+        if candidate.is_file():
+            return candidate
+    return release_dir / variant_dir / HF_COVER_IMAGE_FILENAME
+
+
+def run_preview(config: PreviewConfig) -> PreviewOutcome:
+    """Render the preview HTML, optionally serve it.
+
+    Pre-flight failures (missing README, malformed YAML, missing
+    cover, unknown variant) raise; the CLI converts to rc=2.
+    """
+
+    if config.variant not in VALID_VARIANTS:
+        raise ValueError(f"unknown --variant {config.variant!r}; expected one of {VALID_VARIANTS}")
+
+    readme_path = config.release_dir / _VARIANT_README_REL[config.variant]
+    if not readme_path.is_file():
+        raise FileNotFoundError(
+            f"HF README not found at {readme_path}; "
+            f"regenerate via scripts/package_hf_release.py --variant={config.variant} first"
+        )
+    doc = parse_hf_readme(readme_path.read_text(encoding="utf-8"))
+    _validate_required_frontmatter(doc.frontmatter, readme_path)
+
+    cover_src = _resolve_cover_image(config.release_dir, config.variant)
+    if not cover_src.is_file():
+        raise FileNotFoundError(
+            f"cover image not found at {cover_src} "
+            f"(looked in the {config.variant} upload tree and {config.release_dir}/)"
+        )
+
+    config.out_dir.mkdir(parents=True, exist_ok=True)
+    html_path = config.out_dir / "index.html"
+    html_path.write_text(render_hf_html(doc, variant=config.variant), encoding="utf-8")
+
+    cover_dst = config.out_dir / HF_COVER_IMAGE_FILENAME
+    replace_file(cover_src, cover_dst)
+
+    if config.serve:
+        serve(config.out_dir, config.port, open_browser=config.open_browser)
+
+    return PreviewOutcome(html_path=html_path, cover_path=cover_dst)
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def parse_args(argv: Sequence[str] | None = None) -> argparse.Namespace:
+    """Parse the CLI.  Free function so tests can build a Namespace."""
+
+    parser = argparse.ArgumentParser(
+        prog="preview_hf_page",
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "--release-dir",
+        type=Path,
+        default=DEFAULT_RELEASE_DIR,
+        help="release tree containing huggingface[-instructor]/README.md (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--out-dir",
+        type=Path,
+        default=None,
+        help=(
+            "where to write the rendered preview "
+            "(default: release/_preview/huggingface for variant=public, "
+            "release/_preview/huggingface-instructor for variant=instructor)"
+        ),
+    )
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=DEFAULT_PORT,
+        help="port for the local HTTP server (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--variant",
+        choices=VALID_VARIANTS,
+        default="public",
+        help="public (3-tier) or instructor (companion repo); default: %(default)s",
+    )
+    parser.add_argument(
+        "--open-browser",
+        action="store_true",
+        help="pop a browser tab on the served URL after the page renders",
+    )
+    parser.add_argument(
+        "--no-serve",
+        action="store_true",
+        help="render the HTML and exit; don't start the server (CI / inspection mode)",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = parse_args(argv)
+    out_dir: Path = args.out_dir or (
+        DEFAULT_OUT_DIR_PUBLIC if args.variant == "public" else DEFAULT_OUT_DIR_INSTRUCTOR
+    )
+    config = PreviewConfig(
+        release_dir=args.release_dir,
+        out_dir=out_dir,
+        port=args.port,
+        variant=args.variant,
+        open_browser=args.open_browser,
+        serve=not args.no_serve,
+    )
+    try:
+        outcome = run_preview(config)
+    except FileNotFoundError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 2
+    except (ValueError, yaml.YAMLError) as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 2
+    print(f"wrote {outcome.html_path}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
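
Review note on the frontmatter split in ``scripts/preview_hf_page.py``. The sketch below reuses the same pattern as ``_FRONTMATTER_RE`` to show two behaviours that are easy to overlook: a later ``---`` horizontal rule stays in the body because the YAML group is non-greedy, and a CRLF-encoded README does not match at all. ``split_frontmatter`` is a hypothetical helper for illustration; it skips the YAML parsing step so the example needs only the standard library.

```python
from __future__ import annotations

import re

# Same pattern as _FRONTMATTER_RE in the diff above.
FRONTMATTER_RE = re.compile(r"\A---\n(?P<yaml>.*?)\n---\n(?P<body>.*)\Z", re.DOTALL)


def split_frontmatter(text: str) -> tuple[str, str] | None:
    """Return (yaml_text, body), or None when no frontmatter block opens the text."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        return None
    return match.group("yaml"), match.group("body")


sample = "\n".join(
    [
        "---",
        "pretty_name: Demo",
        "license: mit",
        "---",
        "# Title",
        "",
        "Section one",
        "",
        "---",
        "",
        "Section two",
        "",
    ]
)

result = split_frontmatter(sample)
assert result is not None
yaml_text, body = result

# The same document with Windows line endings fails the \n-anchored pattern.
crlf_result = split_frontmatter(sample.replace("\n", "\r\n"))
```

If CRLF inputs ever need to be supported, normalising line endings before matching would be one option; the scripts in this diff read files the packager generates, so that may never come up.
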
diff --git a/scripts/preview_kaggle_page.py b/scripts/preview_kaggle_page.py
new file mode 100644
index 0000000..de5a61b
--- /dev/null
+++ b/scripts/preview_kaggle_page.py
@@ -0,0 +1,512 @@
+#!/usr/bin/env python3
+"""Local publication-readiness preview for the Kaggle dataset page.
+
+PR 7.2.  Reads the artefacts the publish PR will upload
+(``release/kaggle/dataset-metadata.json`` + cover image), renders an
+offline HTML page that surfaces the published structure (header,
+cover, description, file tree, schema tables, sources, footer), and
+optionally serves it on ``http://localhost:8765``.
+
+This is a *publication-readiness* preview — structured rendering of
+the upload artefacts that helps catch link / config / column-listing
+issues before the real ``kaggle datasets create`` upload.  It is
+deliberately NOT a Kaggle look-alike: pixel fidelity is out of scope
+and the chrome (CSS palette, layout) is approximate.
+
+Design rationale + decision log: ``docs/release/preview_pages_design.md``.
+
+Usage::
+
+    python scripts/preview_kaggle_page.py --open-browser  # serve + browser
+    python scripts/preview_kaggle_page.py --no-serve      # build only
+
+Exit codes: 0 success / 2 pre-flight error.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from collections.abc import Sequence
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Final
+
+# Make ``scripts/`` importable regardless of how this file is loaded
+# (CLI entrypoint, ``importlib.util.spec_from_file_location`` from tests).
+sys.path.insert(0, str(Path(__file__).resolve().parent))
+
+from _preview_common import (  # noqa: E402 — must follow sys.path insert
+    escape,
+    plural,
+    render_cover,
+    serve,
+)
+from _release_common import replace_file  # noqa: E402
+
+# ---------------------------------------------------------------------------
+# Defaults
+# ---------------------------------------------------------------------------
+
+DEFAULT_RELEASE_DIR: Final[Path] = Path("release")
+DEFAULT_OUT_DIR: Final[Path] = Path("release/_preview/kaggle")
+DEFAULT_PORT: Final[int] = 8765
+
+
+# ---------------------------------------------------------------------------
+# Markdown rendering (markdown-it-py is in [dev] AND [publish])
+# ---------------------------------------------------------------------------
+
+
+def _render_markdown(text: str) -> str:
+    """Render ``text`` (the inlined README body) to HTML.
+
+    ``gfm-like`` preset gives tables / fenced code / strikethrough;
+    ``linkify`` is explicitly disabled so the optional
+    ``linkify-it-py`` transitive dep is not required.
+    """
+
+    try:
+        from markdown_it import MarkdownIt
+    except ImportError as exc:  # pragma: no cover — dep is in [dev]
+        raise ImportError(
+            "markdown-it-py is required.  pip install -e '.[dev]' (or [publish])."
+        ) from exc
+    return MarkdownIt("gfm-like").disable("linkify").render(text)
+
+
+# ---------------------------------------------------------------------------
+# Tier inference
+# ---------------------------------------------------------------------------
+
+
+def _tier_of(resource_path: str) -> str:
+    """Return the leading path segment of ``resource_path``, or ``""``.
+
+    Used to bucket resources by tier in the file tree.  Empty string
+    means top-level (none today, tolerated for forward compatibility).
+    """
+
+    parts = resource_path.split("/", 1)
+    return parts[0] if len(parts) > 1 else ""
+
+
+# ---------------------------------------------------------------------------
+# Section renderers — pure, deterministic
+# ---------------------------------------------------------------------------
+
+
+def _render_header(metadata: dict[str, Any]) -> str:
+    """Render the page header — title, subtitle, id, licence, frequency.
+
+    Visibility is intentionally NOT rendered: Kaggle's public dataset
+    page does not display ``isPrivate``, so showing it here would
+    misrepresent what public viewers see.
+    """
+
+    title = escape(metadata["title"])
+    subtitle = escape(metadata["subtitle"])
+    dataset_id = escape(metadata["id"])
+    license_name = escape(metadata["licenses"][0]["name"]) if metadata.get("licenses") else ""
+    update_freq = escape(metadata.get("expectedUpdateFrequency", ""))
+
+    return f"""<header class="dataset-header">
+  <div class="dataset-header__id">{dataset_id}</div>
+  <h1 class="dataset-header__title">{title}</h1>
+  <p class="dataset-header__subtitle">{subtitle}</p>
+  <ul class="dataset-header__pills">
+    <li class="pill pill--license">License: {license_name}</li>
+    <li class="pill pill--frequency">Updates: {update_freq}</li>
+  </ul>
+</header>"""
+
+
+def _render_description(description_md: str) -> str:
+    """Render the inlined README body as HTML."""
+
+    return f'<section class="description">\n{_render_markdown(description_md)}</section>'
+
+
+def _render_file_tree(resources: list[dict[str, Any]]) -> str:
+    """Render the file tree, grouped by tier (leading path segment)."""
+
+    by_tier: dict[str, list[dict[str, Any]]] = {}
+    for resource in resources:
+        by_tier.setdefault(_tier_of(resource["path"]), []).append(resource)
+
+    blocks: list[str] = []
+    for tier, tier_resources in by_tier.items():
+        tier_label = escape(tier) if tier else "(top-level)"
+        items: list[str] = []
+        for resource in tier_resources:
+            path = escape(resource["path"])
+            description = escape(resource.get("description", ""))
+            items.append(
+                f'    <li class="file"><code class="file__path">{path}</code>'
+                f'<span class="file__desc">{description}</span></li>'
+            )
+        blocks.append(
+            f'  <details class="tier" open>\n'
+            f'    <summary class="tier__name">{tier_label}/ '
+            f'<span class="tier__count">({plural(len(tier_resources), "file")})</span>'
+            f"</summary>\n"
+            f'    <ul class="tier__files">\n' + "\n".join(items) + "\n    </ul>\n"
+            "  </details>"
+        )
+    return f"""<section class="files">
+  <h2 class="section__heading">Data Files <span class="section__count">({len(resources)} total)</span></h2>
+{chr(10).join(blocks)}
+</section>"""
+
+
+def _render_schema_tables(resources: list[dict[str, Any]]) -> str:
+    """Render one schema/columns table per tabular resource."""
+
+    blocks: list[str] = []
+    total_columns = 0
+    for resource in resources:
+        schema = resource.get("schema")
+        if not schema:
+            continue
+        fields = schema.get("fields", [])
+        if not fields:
+            continue
+        total_columns += len(fields)
+        path = escape(resource["path"])
+        rows: list[str] = []
+        for fd in fields:
+            name = escape(fd.get("name", ""))
+            ftype = escape(fd.get("type", ""))
+            description = escape(fd.get("description", ""))
+            rows.append(
+                f"      <tr>"
+                f'<td class="col__name"><code>{name}</code></td>'
+                f'<td class="col__type">{ftype}</td>'
+                f'<td class="col__desc">{description}</td>'
+                f"</tr>"
+            )
+        blocks.append(
+            f'  <details class="schema" open>\n'
+            f'    <summary class="schema__path"><code>{path}</code> '
+            f'<span class="schema__count">({plural(len(fields), "column")})</span>'
+            f"</summary>\n"
+            f'    <table class="schema__table">\n'
+            f"      <thead><tr><th>Column</th><th>Type</th><th>Description</th></tr></thead>\n"
+            f"      <tbody>\n" + "\n".join(rows) + "\n      </tbody>\n"
+            "    </table>\n"
+            "  </details>"
+        )
+    return f"""<section class="schemas">
+  <h2 class="section__heading">Schema / Columns <span class="section__count">({plural(total_columns, "column")} across {plural(len(blocks), "tabular file")})</span></h2>
+{chr(10).join(blocks)}
+</section>"""
+
+
+def _render_sources(metadata: dict[str, Any]) -> str:
+    """Render the user-specified sources block (omitted when empty)."""
+
+    sources = metadata.get("userSpecifiedSources", []) or []
+    if not sources:
+        return ""
+    items = "\n".join(
+        f'    <li><a href="{escape(s["url"])}" target="_blank" rel="noopener noreferrer">'
+        f"{escape(s['title'])}</a></li>"
+        for s in sources
+    )
+    return f"""<section class="sources">
+  <h2 class="section__heading">Sources</h2>
+  <ul class="sources__list">
+{items}
+  </ul>
+</section>"""
+
+
+def _render_footer(metadata: dict[str, Any]) -> str:
+    """Render the licence + keywords footer."""
+
+    keywords = metadata.get("keywords", []) or []
+    keyword_chips = " ".join(f'<span class="chip">{escape(k)}</span>' for k in keywords)
+    license_name = escape(metadata["licenses"][0]["name"]) if metadata.get("licenses") else ""
+    return f"""<footer class="dataset-footer">
+  <div class="dataset-footer__keywords">{keyword_chips}</div>
+  <div class="dataset-footer__license">License: {license_name}</div>
+  <div class="dataset-footer__note">Local Kaggle publication-readiness preview rendered by scripts/preview_kaggle_page.py — not the live dataset page.</div>
+</footer>"""
+
+
+# ---------------------------------------------------------------------------
+# HTML wrapper + minimal CSS
+# ---------------------------------------------------------------------------
+
+#: Inlined for a single self-contained HTML file (easier inspection,
+#: simpler byte-compare in the regeneration-discipline test, works
+#: without a server).  Palette is approximate, not branded.
+_PAGE_CSS: Final[str] = """\
+:root { --bg:#fff; --fg:#202124; --muted:#5f6368; --accent:#20beff; --border:#e0e0e0; --pill-bg:#f1f3f4; }
+body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; color: var(--fg); background: var(--bg); margin: 0; padding: 0; line-height: 1.5; }
+.container { max-width: 1100px; margin: 0 auto; padding: 24px 32px; }
+.dataset-header { border-bottom: 1px solid var(--border); padding-bottom: 16px; margin-bottom: 24px; }
+.dataset-header__id { color: var(--muted); font-size: 0.85em; font-family: monospace; margin-bottom: 4px; }
+.dataset-header__title { font-size: 1.8em; margin: 0 0 4px 0; }
+.dataset-header__subtitle { color: var(--muted); margin: 0 0 12px 0; }
+.dataset-header__pills { list-style: none; padding: 0; margin: 0; display: flex; flex-wrap: wrap; gap: 8px; }
+.pill { background: var(--pill-bg); border-radius: 12px; padding: 4px 12px; font-size: 0.85em; color: var(--fg); }
+.cover { margin: 0 0 24px 0; border: 1px solid var(--border); border-radius: 4px; overflow: hidden; }
+.cover__image { display: block; max-width: 100%; height: auto; }
+.section__heading { font-size: 1.3em; border-bottom: 2px solid var(--accent); padding-bottom: 4px; margin-top: 32px; }
+.section__count { color: var(--muted); font-size: 0.7em; font-weight: normal; }
+.tier, .schema { border: 1px solid var(--border); border-radius: 4px; padding: 8px 12px; margin: 8px 0; }
+.tier__name, .schema__path { cursor: pointer; font-weight: 600; }
+.tier__count, .schema__count { color: var(--muted); font-weight: normal; font-size: 0.85em; }
+.tier__files { list-style: none; padding: 8px 0 0 0; margin: 0; }
+.file { display: flex; gap: 12px; padding: 4px 0; border-bottom: 1px dotted var(--border); }
+.file:last-child { border-bottom: none; }
+.file__path { color: var(--accent); flex-shrink: 0; }
+.file__desc { color: var(--muted); font-size: 0.9em; }
+.schema__table { width: 100%; border-collapse: collapse; margin-top: 8px; font-size: 0.9em; }
+.schema__table th, .schema__table td { text-align: left; padding: 6px 8px; border-bottom: 1px solid var(--border); vertical-align: top; }
+.schema__table th { background: var(--pill-bg); font-weight: 600; }
+.col__name code { background: none; }
+.col__type { color: var(--muted); font-family: monospace; }
+.description { margin: 24px 0; }
+.description code { background: var(--pill-bg); padding: 1px 4px; border-radius: 2px; font-size: 0.9em; }
+.description pre { background: var(--pill-bg); padding: 12px; border-radius: 4px; overflow-x: auto; }
+.description pre code { background: none; padding: 0; }
+.description table { border-collapse: collapse; margin: 12px 0; }
+.description th, .description td { border: 1px solid var(--border); padding: 6px 10px; text-align: left; }
+.description blockquote { border-left: 3px solid var(--accent); padding-left: 12px; color: var(--muted); margin: 12px 0; }
+.sources__list { padding-left: 20px; }
+.dataset-footer { margin-top: 48px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--muted); font-size: 0.9em; }
+.dataset-footer__keywords { margin-bottom: 8px; }
+.chip { display: inline-block; background: var(--pill-bg); border-radius: 12px; padding: 2px 10px; margin: 2px; font-size: 0.85em; }
+.dataset-footer__note { font-style: italic; margin-top: 8px; }
+"""
+
+
+def _wrap_html(*, title: str, body: str) -> str:
+    return f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <title>Kaggle preview — {escape(title)}</title>
+  <style>{_PAGE_CSS}</style>
+</head>
+<body>
+<main class="container">
+{body}
+</main>
+</body>
+</html>
+"""
+
+
+# ---------------------------------------------------------------------------
+# Top-level renderer
+# ---------------------------------------------------------------------------
+
+
+def render_kaggle_html(metadata: dict[str, Any], cover_image_filename: str) -> str:
+    """Render the full Kaggle preview HTML.
+
+    Pure: same ``(metadata, cover_image_filename)`` → byte-identical
+    HTML.  No I/O, no clock, no randomness.
+    """
+
+    body_parts = [
+        _render_header(metadata),
+        render_cover(cover_image_filename),
+        _render_description(metadata.get("description", "")),
+        _render_file_tree(metadata.get("resources", [])),
+        _render_schema_tables(metadata.get("resources", [])),
+        _render_sources(metadata),
+        _render_footer(metadata),
+    ]
+    return _wrap_html(title=metadata.get("title", ""), body="\n".join(p for p in body_parts if p))
+
+
+# ---------------------------------------------------------------------------
+# Driver
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class PreviewConfig:
+    """Frozen driver config — built from CLI args or test input."""
+
+    release_dir: Path
+    out_dir: Path
+    port: int
+    open_browser: bool
+    serve: bool
+
+
+@dataclass(frozen=True)
+class PreviewOutcome:
+    """Return value from :func:`run_preview`.
+
+    ``cover_path`` is always set on success — the driver always
+    copies the cover into the preview tree.
+    """
+
+    html_path: Path
+    cover_path: Path
+
+
+#: Required keys the renderer indexes directly (without ``.get``);
+#: validated up-front in ``run_preview`` so a malformed metadata file
+#: surfaces as ``ValueError`` → CLI rc=2 rather than a ``KeyError``
+#: traceback mid-render (Copilot finding COPILOT-1).
+_REQUIRED_METADATA_KEYS: Final[tuple[str, ...]] = (
+    "title",
+    "subtitle",
+    "id",
+    "expectedUpdateFrequency",
+    "image",
+)
+
+
+def _validate_required_metadata(metadata: dict[str, Any], path: Path) -> None:
+    """Raise ``ValueError`` if required Kaggle metadata keys are missing.
+
+    Catches the case where ``dataset-metadata.json`` is hand-edited or
+    produced by a future broken packager; the renderer's
+    ``_render_header`` / ``_render_footer`` index these directly and
+    would otherwise raise ``KeyError`` mid-render, bypassing
+    ``main()``'s rc=2 handling.
+    """
+
+    missing = sorted(k for k in _REQUIRED_METADATA_KEYS if k not in metadata)
+    licenses = metadata.get("licenses")
+    if (
+        not isinstance(licenses, list)
+        or not licenses
+        or not isinstance(licenses[0], dict)
+        or "name" not in licenses[0]
+    ):
+        missing.append("licenses[0].name")
+    if missing:
+        raise ValueError(f"{path} is missing required key(s): {', '.join(missing)}")
+
+
+def _resolve_cover_image(release_dir: Path, image_name: str) -> Path:
+    """Locate the cover image referenced by the metadata's ``image``.
+
+    Lookup order: ``release/kaggle/<image_name>`` (assembled
+    upload-tree copy, present after the maintainer runs the Kaggle
+    packager — gitignored, so absent on a fresh checkout) →
+    ``release/<image_name>`` (the committed master copy).
+    """
+
+    for candidate in (release_dir / "kaggle" / image_name, release_dir / image_name):
+        if candidate.is_file():
+            return candidate
+    return release_dir / "kaggle" / image_name  # neither exists; the caller's is_file() check reports this path
+
+
+def run_preview(config: PreviewConfig) -> PreviewOutcome:
+    """Render the preview HTML, optionally serve it.
+
+    Validation discipline: build → validate → write.  Pre-flight
+    failures (missing metadata, malformed JSON, missing cover) raise;
+    the CLI converts to rc=2.
+    """
+
+    metadata_path = config.release_dir / "kaggle" / "dataset-metadata.json"
+    if not metadata_path.is_file():
+        raise FileNotFoundError(
+            f"Kaggle dataset metadata not found at {metadata_path}; "
+            f"regenerate via scripts/package_kaggle_release.py first"
+        )
+    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
+    if not isinstance(metadata, dict):
+        raise ValueError(f"{metadata_path} is not a JSON object")
+    _validate_required_metadata(metadata, metadata_path)
+
+    cover_name = metadata["image"]
+    if not cover_name:
+        raise ValueError(f"{metadata_path} declares no 'image' (cover image filename)")
+    cover_src = _resolve_cover_image(config.release_dir, cover_name)
+    if not cover_src.is_file():
+        raise FileNotFoundError(f"cover image declared as {cover_name!r} not found at {cover_src}")
+
+    config.out_dir.mkdir(parents=True, exist_ok=True)
+    html_path = config.out_dir / "index.html"
+    html_path.write_text(render_kaggle_html(metadata, cover_name), encoding="utf-8")
+
+    cover_dst = config.out_dir / cover_name
+    replace_file(cover_src, cover_dst)
+
+    if config.serve:
+        serve(config.out_dir, config.port, open_browser=config.open_browser)
+
+    return PreviewOutcome(html_path=html_path, cover_path=cover_dst)
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def parse_args(argv: Sequence[str] | None = None) -> argparse.Namespace:
+    """Parse the CLI.  Free function so tests can build a Namespace."""
+
+    parser = argparse.ArgumentParser(
+        prog="preview_kaggle_page",
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "--release-dir",
+        type=Path,
+        default=DEFAULT_RELEASE_DIR,
+        help="release tree containing kaggle/dataset-metadata.json (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--out-dir",
+        type=Path,
+        default=DEFAULT_OUT_DIR,
+        help="where to write the rendered preview (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=DEFAULT_PORT,
+        help="port for the local HTTP server (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--open-browser",
+        action="store_true",
+        help="open a browser tab at the served URL after the page renders",
+    )
+    parser.add_argument(
+        "--no-serve",
+        action="store_true",
+        help="render the HTML and exit; don't start the server (CI / inspection mode)",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = parse_args(argv)
+    config = PreviewConfig(
+        release_dir=args.release_dir,
+        out_dir=args.out_dir,
+        port=args.port,
+        open_browser=args.open_browser,
+        serve=not args.no_serve,
+    )
+    try:
+        outcome = run_preview(config)
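+    except (FileNotFoundError, ValueError) as exc:
+        # Pre-flight failures (missing metadata or cover, malformed JSON,
+        # missing required keys) all map to rc=2.  json.JSONDecodeError is
+        # a ValueError subclass, so malformed JSON is covered here too.
+        print(f"error: {exc}", file=sys.stderr)
+        return 2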
+    print(f"wrote {outcome.html_path}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
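Reviewer note (illustrative, not part of the patch): the pre-flight contract above, where missing metadata keys raise ``ValueError`` and ``main()`` maps that to rc=2, can be sketched in isolation. The names ``REQUIRED_KEYS``, ``missing_metadata_keys`` and ``exit_code_for`` are hypothetical stand-ins, not the script's own API; only the key list is copied from ``_REQUIRED_METADATA_KEYS``.

```python
from typing import Any

# Hypothetical stand-in for the script's _REQUIRED_METADATA_KEYS.
REQUIRED_KEYS = ("title", "subtitle", "id", "expectedUpdateFrequency", "image")


def missing_metadata_keys(metadata: dict[str, Any]) -> list[str]:
    """Return the required keys absent from ``metadata`` (sorted), with the
    nested ``licenses[0].name`` check appended last."""
    missing = sorted(k for k in REQUIRED_KEYS if k not in metadata)
    licenses = metadata.get("licenses")
    has_license_name = (
        isinstance(licenses, list)
        and bool(licenses)
        and isinstance(licenses[0], dict)
        and "name" in licenses[0]
    )
    if not has_license_name:
        missing.append("licenses[0].name")
    return missing


def exit_code_for(metadata: dict[str, Any]) -> int:
    """Mimic the CLI mapping: 2 when pre-flight validation fails, else 0."""
    try:
        missing = missing_metadata_keys(metadata)
        if missing:
            raise ValueError(f"missing required key(s): {', '.join(missing)}")
    except ValueError:
        return 2
    return 0
```

Collecting every missing key before raising, rather than stopping at the first, means one run reports all the problems in a hand-edited metadata file.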
diff --git a/tests/scripts/test_preview_hf_page.py b/tests/scripts/test_preview_hf_page.py
new file mode 100644
index 0000000..7369e9b
--- /dev/null
+++ b/tests/scripts/test_preview_hf_page.py
@@ -0,0 +1,532 @@
+"""Tests for ``scripts/preview_hf_page.py`` (PR 7.2).
+
+Locks the local Hugging Face preview-page contract:
+
+* required field labels appear in the rendered HTML (pretty_name,
+  licence, configs, tags) — the four roadmap-mandated HF checks;
+* every Markdown link in the README body resolves to a non-404 URL
+  pattern (no ``](../`` survives, no ``](validation/...)``);
+* every ``configs[]`` block in the YAML round-trips through to the
+  rendered configs dropdown;
+* the renderer is byte-deterministic and the committed samples at
+  ``release/_preview_committed/huggingface_{public,instructor}.html``
+  match a fresh regeneration (audit-artefact-sync gate);
+* the ``--variant`` flag wires up the right input README, output
+  dir, and footer label;
+* the driver exits with rc=2 on missing artefacts (no live HTTP).
+
+No network. No live HTTP.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import re
+import sys
+from pathlib import Path
+
+import pytest
+
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_SCRIPT_PATH = _REPO_ROOT / "scripts" / "preview_hf_page.py"
+_spec = importlib.util.spec_from_file_location("preview_hf_page", _SCRIPT_PATH)
+assert _spec is not None
+assert _spec.loader is not None
+preview = importlib.util.module_from_spec(_spec)
+sys.modules["preview_hf_page"] = preview
+_spec.loader.exec_module(preview)
+
+
+_RELEASE_DIR = _REPO_ROOT / "release"
+_PUBLIC_README = _RELEASE_DIR / "huggingface" / "README.md"
+_INSTRUCTOR_README = _RELEASE_DIR / "huggingface-instructor" / "README.md"
+_PUBLIC_SAMPLE = _REPO_ROOT / "release" / "_preview_committed" / "huggingface_public.html"
+_INSTRUCTOR_SAMPLE = _REPO_ROOT / "release" / "_preview_committed" / "huggingface_instructor.html"
+_PUBLIC_PRESENT = _PUBLIC_README.exists()
+_INSTRUCTOR_PRESENT = _INSTRUCTOR_README.exists()
+
+# Same allow-list rule as the Kaggle preview tests — see
+# ``test_preview_kaggle_page.py`` for rationale.
+_LINK_OK_PREFIXES = (
+    "https://github.com/leadforge-dev/leadforge",
+    "https://huggingface.co/datasets/leadforge",
+    "https://example.com",
+    "LICENSE",
+    "#",
+)
+
+
+# ---------------------------------------------------------------------------
+# Frontmatter parsing
+# ---------------------------------------------------------------------------
+
+
+def test_parse_hf_readme_extracts_yaml_and_body() -> None:
+    text = "---\npretty_name: Test\nlicense: mit\n---\n# Body\n\nText.\n"
+    doc = preview.parse_hf_readme(text)
+    assert doc.frontmatter == {"pretty_name": "Test", "license": "mit"}
+    assert doc.body == "# Body\n\nText.\n"
+
+
+def test_parse_hf_readme_rejects_missing_frontmatter() -> None:
+    with pytest.raises(ValueError, match="missing a YAML frontmatter"):
+        preview.parse_hf_readme("# No frontmatter here\n")
+
+
+def test_parse_hf_readme_rejects_non_mapping_frontmatter() -> None:
+    with pytest.raises(ValueError, match="not a YAML mapping"):
+        preview.parse_hf_readme("---\n- 1\n- 2\n---\nbody\n")
+
+
+# ---------------------------------------------------------------------------
+# Pure-renderer fixtures
+# ---------------------------------------------------------------------------
+
+
+def _minimal_doc() -> preview.HuggingFaceDoc:
+    """A minimum-viable HF doc exercising every renderer branch."""
+
+    return preview.HuggingFaceDoc(
+        frontmatter={
+            "pretty_name": "TestSet: Mock HF Dataset",
+            "license": "mit",
+            "language": ["en"],
+            "task_categories": ["tabular-classification"],
+            "size_categories": ["1K<n<10K"],
+            "tags": ["b2b", "tabular"],
+            "configs": [
+                {
+                    "config_name": "intro",
+                    "data_files": [
+                        {"split": "train", "path": "intro/train.parquet"},
+                        {"split": "validation", "path": "intro/valid.parquet"},
+                        {"split": "test", "path": "intro/test.parquet"},
+                    ],
+                },
+                {
+                    "config_name": "intermediate",
+                    "default": True,
+                    "data_files": [
+                        {"split": "train", "path": "intermediate/train.parquet"},
+                    ],
+                },
+            ],
+        },
+        body="# Mock\n\nA [link](https://github.com/leadforge-dev/leadforge).\n",
+    )
+
+
+# ---------------------------------------------------------------------------
+# Required field labels (the four roadmap-mandated HF checks)
+# ---------------------------------------------------------------------------
+
+
+def test_render_includes_pretty_name_and_license() -> None:
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert "TestSet: Mock HF Dataset" in html
+    assert "License: mit" in html
+    assert "Task: tabular-classification" in html
+    assert "Size: 1K&lt;n&lt;10K" in html  # HTML-escaped
+    assert "Language: en" in html
+
+
+def test_render_emits_one_chip_per_tag() -> None:
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert '<span class="chip">b2b</span>' in html
+    assert '<span class="chip">tabular</span>' in html
+
+
+def test_render_configs_dropdown_lists_every_config() -> None:
+    """The roadmap-mandated round-trip: every configs[] block from the
+    YAML appears in the rendered dropdown."""
+
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert "<code>intro</code>" in html
+    assert "<code>intermediate</code>" in html
+    assert "(2 configs)" in html
+
+
+def test_render_configs_flags_the_default() -> None:
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    # The default badge appears next to the default config.
+    assert '<span class="badge badge--default">default</span>' in html
+    # Exactly one badge instance — no other config gets it.
+    assert html.count("badge badge--default") == 1
+
+
+def test_render_data_files_appear_under_each_config() -> None:
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert "intro/train.parquet" in html
+    assert "intro/valid.parquet" in html
+    assert "intro/test.parquet" in html
+    assert "intermediate/train.parquet" in html
+
+
+def test_render_includes_cover_image_block() -> None:
+    """The HF preview must render the cover image (Copilot finding
+    COPILOT-2 — the driver was copying the cover into the preview
+    tree without ever rendering it; either drop the copy or display
+    it; we picked display for symmetry with Kaggle and because HF's
+    live page shows the dataset cover too)."""
+
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert 'class="cover"' in html
+    assert 'src="dataset-cover-image.png"' in html
+    assert 'alt="Dataset cover image"' in html
+
+
+def test_render_configs_heading_uses_singular_for_one_config() -> None:
+    """Instructor sample previously rendered "(1 configs)" — plural()
+    helper now uses the singular form when n == 1 (Copilot finding
+    COPILOT-3)."""
+
+    one_config_doc = preview.HuggingFaceDoc(
+        frontmatter={
+            "pretty_name": "T",
+            "license": "mit",
+            "configs": [
+                {
+                    "config_name": "intermediate",
+                    "default": True,
+                    "data_files": [{"split": "train", "path": "intermediate/train.parquet"}],
+                },
+            ],
+        },
+        body="body\n",
+    )
+    html = preview.render_hf_html(one_config_doc, variant="instructor")
+    assert "(1 config)" in html  # heading
+    assert "(1 split)" in html  # per-config splits count
+    assert "(1 configs)" not in html
+    assert "(1 splits)" not in html
+
+
+def test_render_does_not_emit_files_declared_section() -> None:
+    """Real HF doesn't surface a "files declared in YAML" section —
+    showing one would be an internal-concept leak that omits the bulk
+    of the actual upload tree (manifest.json, tables/*.parquet, etc.).
+    The configs dropdown already lists every YAML-declared path; a
+    parallel files section would be misleading duplicate noise.
+    Folded back from self-review pass 3.
+    """
+
+    html = preview.render_hf_html(_minimal_doc(), variant="public")
+    assert "Files declared" not in html
+    assert "files / variant:" not in html  # legacy heading text
+    assert 'class="files"' not in html
+
+
+def test_render_includes_variant_in_footer() -> None:
+    public = preview.render_hf_html(_minimal_doc(), variant="public")
+    instructor = preview.render_hf_html(_minimal_doc(), variant="instructor")
+    assert "Variant: <code>public</code>" in public
+    assert "Variant: <code>instructor</code>" in instructor
+    # Variant differences are localised to the footer; the rest of
+    # the output is identical between variants.  Replace via the
+    # full ``Variant: <code>X</code>`` marker (not the bare word)
+    # so this assertion does not match "public" inside "publication"
+    # in the footer note (regression caught + folded back during
+    # self-review pass 3 reframing).
+    public_normalised = public.replace("Variant: <code>public</code>", "Variant: <code>X</code>")
+    instructor_normalised = instructor.replace(
+        "Variant: <code>instructor</code>", "Variant: <code>X</code>"
+    )
+    assert public_normalised == instructor_normalised
+
+
+def test_render_handles_no_configs_gracefully() -> None:
+    """Edge case: a malformed dataset card with no ``configs`` should
+    still render rather than crash."""
+
+    doc = preview.HuggingFaceDoc(
+        frontmatter={"pretty_name": "X", "license": "mit"},
+        body="body\n",
+    )
+    html = preview.render_hf_html(doc, variant="public")
+    assert "No configs declared." in html
+
+
+def test_render_escapes_html_in_field_values() -> None:
+    """Same XSS-safety guard as the Kaggle preview."""
+
+    doc = preview.HuggingFaceDoc(
+        frontmatter={"pretty_name": "<script>x</script>", "license": "mit"},
+        body="body\n",
+    )
+    html = preview.render_hf_html(doc, variant="public")
+    assert "<script>x</script>" not in html
+    assert "&lt;script&gt;x&lt;/script&gt;" in html
+
+
+# ---------------------------------------------------------------------------
+# Markdown link resolution (the leakage / link-rewrite regression guard)
+# ---------------------------------------------------------------------------
+
+_HREF_RE = re.compile(r'href="([^"]+)"')
+
+
+@pytest.mark.skipif(not _PUBLIC_PRESENT, reason="public README not present")
+def test_public_readme_has_no_unrewritten_relative_links() -> None:
+    """Same source-side regression guard as the Kaggle preview."""
+
+    body = _PUBLIC_README.read_text(encoding="utf-8")
+    assert "](../" not in body, "unrewritten parent-relative link in public README"
+    assert "](validation/" not in body, "unrewritten validation-relative link in public README"
+
+
+@pytest.mark.skipif(not _PUBLIC_PRESENT, reason="public README not present")
+def test_public_rendered_links_point_at_known_targets() -> None:
+    """Every rendered href in the public preview points at one of the
+    allow-listed prefixes — anything else would 404 on the live HF
+    page."""
+
+    doc = preview.parse_hf_readme(_PUBLIC_README.read_text(encoding="utf-8"))
+    html = preview.render_hf_html(doc, variant="public")
+    bad: list[str] = []
+    for href in _HREF_RE.findall(html):
+        if any(href.startswith(prefix) for prefix in _LINK_OK_PREFIXES):
+            continue
+        bad.append(href)
+    assert not bad, f"non-allowlisted hrefs would 404 on HF: {bad[:5]}"
+
+
+@pytest.mark.skipif(not _INSTRUCTOR_PRESENT, reason="instructor README not present")
+def test_instructor_rendered_links_point_at_known_targets() -> None:
+    doc = preview.parse_hf_readme(_INSTRUCTOR_README.read_text(encoding="utf-8"))
+    html = preview.render_hf_html(doc, variant="instructor")
+    bad: list[str] = []
+    for href in _HREF_RE.findall(html):
+        if any(href.startswith(prefix) for prefix in _LINK_OK_PREFIXES):
+            continue
+        bad.append(href)
+    assert not bad, f"non-allowlisted hrefs would 404 on HF: {bad[:5]}"
+
+
+@pytest.mark.skipif(not _PUBLIC_PRESENT, reason="public README not present")
+def test_public_yaml_configs_round_trip_into_html() -> None:
+    """Every ``configs[].config_name`` declared in the YAML appears in
+    the rendered HTML — the round-trip the roadmap mandates."""
+
+    doc = preview.parse_hf_readme(_PUBLIC_README.read_text(encoding="utf-8"))
+    html = preview.render_hf_html(doc, variant="public")
+    for config in doc.frontmatter["configs"]:
+        name = config["config_name"]
+        assert f"<code>{name}</code>" in html, (
+            f"config {name!r} declared in YAML but missing from rendered HTML"
+        )
+
+
+# ---------------------------------------------------------------------------
+# Determinism + audit-artefact-sync (against committed samples)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _PUBLIC_PRESENT, reason="public README not present")
+def test_render_is_byte_deterministic() -> None:
+    doc = preview.parse_hf_readme(_PUBLIC_README.read_text(encoding="utf-8"))
+    a = preview.render_hf_html(doc, variant="public")
+    b = preview.render_hf_html(doc, variant="public")
+    assert a == b
+
+
+@pytest.mark.skipif(
+    not (_PUBLIC_PRESENT and _PUBLIC_SAMPLE.exists()),
+    reason="public README or committed sample missing",
+)
+def test_committed_public_sample_matches_fresh_regeneration() -> None:
+    """Audit-sync gate for the public variant.
+
+    Regenerate via::
+
+        python scripts/preview_hf_page.py --no-serve
+        cp release/_preview/huggingface/index.html \\
+            release/_preview_committed/huggingface_public.html
+    """
+
+    doc = preview.parse_hf_readme(_PUBLIC_README.read_text(encoding="utf-8"))
+    fresh = preview.render_hf_html(doc, variant="public")
+    committed = _PUBLIC_SAMPLE.read_text(encoding="utf-8")
+    assert fresh == committed
+
+
+@pytest.mark.skipif(
+    not (_INSTRUCTOR_PRESENT and _INSTRUCTOR_SAMPLE.exists()),
+    reason="instructor README or committed sample missing",
+)
+def test_committed_instructor_sample_matches_fresh_regeneration() -> None:
+    """Audit-sync gate for the instructor variant."""
+
+    doc = preview.parse_hf_readme(_INSTRUCTOR_README.read_text(encoding="utf-8"))
+    fresh = preview.render_hf_html(doc, variant="instructor")
+    committed = _INSTRUCTOR_SAMPLE.read_text(encoding="utf-8")
+    assert fresh == committed
+
+
+# ---------------------------------------------------------------------------
+# Driver — pre-flight error paths (no server start)
+# ---------------------------------------------------------------------------
+
+
+def _make_config(release_dir: Path, out_dir: Path, *, variant: str = "public") -> object:
+    return preview.PreviewConfig(
+        release_dir=release_dir,
+        out_dir=out_dir,
+        port=8766,
+        variant=variant,
+        open_browser=False,
+        serve=False,
+    )
+
+
+def test_run_preview_raises_on_unknown_variant(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    config = _make_config(fake_release, tmp_path / "preview", variant="bogus")
+    with pytest.raises(ValueError, match="unknown --variant"):
+        preview.run_preview(config)  # type: ignore[arg-type]
+
+
+def test_run_preview_raises_on_missing_readme(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    config = _make_config(fake_release, tmp_path / "preview")
+    with pytest.raises(FileNotFoundError, match="HF README not found"):
+        preview.run_preview(config)  # type: ignore[arg-type]
+
+
+def test_run_preview_raises_on_malformed_readme(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    (fake_release / "huggingface").mkdir(parents=True)
+    (fake_release / "huggingface" / "README.md").write_text("# No frontmatter\n", encoding="utf-8")
+    config = _make_config(fake_release, tmp_path / "preview")
+    with pytest.raises(ValueError, match="missing a YAML frontmatter"):
+        preview.run_preview(config)  # type: ignore[arg-type]
+
+
+def test_run_preview_raises_on_missing_required_frontmatter_keys(tmp_path: Path) -> None:
+    """Pre-flight required-key check (Copilot finding COPILOT-1,
+    applied symmetrically to the HF script).  Missing pretty_name /
+    license would otherwise render a half-blank header."""
+
+    fake_release = tmp_path / "release"
+    (fake_release / "huggingface").mkdir(parents=True)
+    (fake_release / "huggingface" / "README.md").write_text(
+        "---\nlanguage:\n  - en\n---\nbody\n", encoding="utf-8"
+    )
+    config = _make_config(fake_release, tmp_path / "preview")
+    with pytest.raises(ValueError, match="missing required key") as exc_info:
+        preview.run_preview(config)  # type: ignore[arg-type]
+    msg = str(exc_info.value)
+    assert "pretty_name" in msg
+    assert "license" in msg
+
+
+def test_validate_required_frontmatter_treats_empty_string_as_missing(tmp_path: Path) -> None:
+    """Whitespace-only or empty values count as missing — a blank
+    pretty_name renders an empty <h1>, which is what the validator
+    is supposed to prevent."""
+
+    with pytest.raises(ValueError, match="missing required key"):
+        preview._validate_required_frontmatter(
+            {"pretty_name": "   ", "license": ""}, tmp_path / "any.md"
+        )
+
+
+def test_run_preview_raises_on_missing_cover(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    (fake_release / "huggingface").mkdir(parents=True)
+    (fake_release / "huggingface" / "README.md").write_text(
+        "---\npretty_name: T\nlicense: mit\n---\nbody\n", encoding="utf-8"
+    )
+    config = _make_config(fake_release, tmp_path / "preview")
+    with pytest.raises(FileNotFoundError, match="cover image"):
+        preview.run_preview(config)  # type: ignore[arg-type]
+
+
+def test_run_preview_writes_html_and_copies_cover(tmp_path: Path) -> None:
+    """End-to-end no-serve: HTML lands at out_dir/index.html and the
+    cover image is copied as a real file."""
+
+    fake_release = tmp_path / "release"
+    (fake_release / "huggingface").mkdir(parents=True)
+    (fake_release / "huggingface" / "README.md").write_text(
+        "---\npretty_name: T\nlicense: mit\n---\nbody\n", encoding="utf-8"
+    )
+    cover = fake_release / "huggingface" / "dataset-cover-image.png"
+    cover.write_bytes(b"\x89PNG\r\n\x1a\nfake")
+    out_dir = tmp_path / "preview"
+    outcome = preview.run_preview(_make_config(fake_release, out_dir))  # type: ignore[arg-type]
+    assert outcome.html_path == out_dir / "index.html"
+    assert outcome.html_path.is_file()
+    assert outcome.cover_path.is_file()
+    assert not outcome.cover_path.is_symlink()
+
+
+def test_run_preview_instructor_variant_uses_companion_paths(tmp_path: Path) -> None:
+    """``--variant=instructor`` reads the companion README and writes
+    to the companion-flavoured out_dir."""
+
+    fake_release = tmp_path / "release"
+    (fake_release / "huggingface-instructor").mkdir(parents=True)
+    (fake_release / "huggingface-instructor" / "README.md").write_text(
+        "---\npretty_name: I\nlicense: mit\n---\nbody\n", encoding="utf-8"
+    )
+    cover = fake_release / "huggingface-instructor" / "dataset-cover-image.png"
+    cover.write_bytes(b"\x89PNG\r\n\x1a\nfake")
+    out_dir = tmp_path / "preview-instructor"
+    outcome = preview.run_preview(
+        _make_config(fake_release, out_dir, variant="instructor")  # type: ignore[arg-type]
+    )
+    assert outcome.html_path.is_file()
+    assert "Variant: <code>instructor</code>" in outcome.html_path.read_text(encoding="utf-8")
+
+
+def test_main_returns_2_on_missing_release(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    rc = preview.main(
+        [
+            "--release-dir",
+            str(tmp_path / "missing"),
+            "--out-dir",
+            str(tmp_path / "preview"),
+            "--no-serve",
+        ]
+    )
+    assert rc == 2
+    captured = capsys.readouterr()
+    assert "HF README not found" in captured.err
+
+
+def test_main_default_out_dir_depends_on_variant(tmp_path: Path) -> None:
+    """``--out-dir`` defaults to the variant-flavoured location."""
+
+    args_public = preview.parse_args(["--no-serve"])
+    args_instructor = preview.parse_args(["--no-serve", "--variant=instructor"])
+    assert args_public.out_dir is None  # resolved in main()
+    assert args_instructor.out_dir is None
+    # Sanity: ``main`` resolves the default per variant.
+    rc = preview.main(
+        [
+            "--release-dir",
+            str(tmp_path / "missing"),
+            "--variant=instructor",
+            "--no-serve",
+        ]
+    )
+    assert rc == 2  # missing README; we just want to confirm CLI parsing didn't crash
+
+
+def test_parse_args_defaults() -> None:
+    args = preview.parse_args(["--no-serve"])
+    assert args.release_dir == preview.DEFAULT_RELEASE_DIR
+    assert args.out_dir is None  # variant-resolved in main()
+    assert args.port == preview.DEFAULT_PORT
+    assert args.variant == "public"
+    assert args.open_browser is False
+    assert args.no_serve is True
+
+
+def test_parse_args_rejects_unknown_variant() -> None:
+    with pytest.raises(SystemExit):
+        preview.parse_args(["--variant=bogus"])
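Reviewer note (illustrative, not part of the patch): the link allow-list check that both rendered-link tests above repeat inline can be sketched as a single helper. ``disallowed_hrefs`` and the shortened ``OK_PREFIXES`` tuple are hypothetical; the regex mirrors the tests' ``_HREF_RE``.

```python
import re

# Same pattern as the tests' _HREF_RE: capture every double-quoted href value.
HREF_RE = re.compile(r'href="([^"]+)"')

# Hypothetical, shortened allow-list; the real tests use _LINK_OK_PREFIXES.
OK_PREFIXES = (
    "https://github.com/leadforge-dev/leadforge",
    "LICENSE",
    "#",
)


def disallowed_hrefs(html: str, ok_prefixes: tuple[str, ...] = OK_PREFIXES) -> list[str]:
    """Return hrefs in ``html`` that match none of the allow-listed prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [href for href in HREF_RE.findall(html) if not href.startswith(ok_prefixes)]
```

A test could then assert ``not disallowed_hrefs(html)`` for each variant instead of repeating the loop.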
diff --git a/tests/scripts/test_preview_kaggle_page.py b/tests/scripts/test_preview_kaggle_page.py
new file mode 100644
index 0000000..e056a7b
--- /dev/null
+++ b/tests/scripts/test_preview_kaggle_page.py
@@ -0,0 +1,545 @@
+"""Tests for ``scripts/preview_kaggle_page.py`` (PR 7.2).
+
+Locks the local Kaggle preview-page contract:
+
+* required field labels appear in the rendered HTML (title, subtitle,
+  licence, file count, schema column count) — the roadmap-mandated
+  Kaggle checks;
+* every Markdown link in the inlined description resolves to a
+  non-404 URL pattern (no ``](../`` survives the rewrite, no
+  ``](validation/...)`` lives at a relative path on the upload tree);
+* the Kaggle schema table lists every CSV / parquet column declared
+  in ``dataset-metadata.json::resources[].schema.fields``;
+* the renderer is byte-deterministic and the committed sample at
+  ``release/_preview_committed/kaggle.html`` matches a fresh
+  regeneration (audit-artefact-sync gate, mirrors PR 5.1 / 5.2 / 7.1);
+* the driver exits with rc=2 on missing artefacts (no live HTTP).
+
+No network. No live HTTP. Everything goes through the pure
+``render_kaggle_html()`` or the in-process ``run_preview()`` driver.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import json
+import re
+import sys
+from pathlib import Path
+
+import pytest
+
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_SCRIPT_PATH = _REPO_ROOT / "scripts" / "preview_kaggle_page.py"
+_spec = importlib.util.spec_from_file_location("preview_kaggle_page", _SCRIPT_PATH)
+assert _spec is not None
+assert _spec.loader is not None
+preview = importlib.util.module_from_spec(_spec)
+sys.modules["preview_kaggle_page"] = preview
+_spec.loader.exec_module(preview)
+
+
+_RELEASE_DIR = _REPO_ROOT / "release"
+_COMMITTED_METADATA = _RELEASE_DIR / "kaggle" / "dataset-metadata.json"
+_COMMITTED_COVER = _RELEASE_DIR / "dataset-cover-image.png"
+_COMMITTED_SAMPLE = _REPO_ROOT / "release" / "_preview_committed" / "kaggle.html"
+_RELEASE_PRESENT = _COMMITTED_METADATA.exists()
+
+# Allow-listed link prefixes the rendered-link test accepts.  Anything
+# else in the rendered HTML is a regression: either the source
+# README leaked a relative ``../`` link or the GitHub blob rewrite
+# stopped firing.  The allow-list is intentionally narrow.
+_LINK_OK_PREFIXES = (
+    "https://github.com/leadforge-dev/leadforge",
+    "https://huggingface.co/datasets/leadforge",
+    "https://example.com",  # used by unit tests only
+    "LICENSE",  # sibling-relative, resolves under the upload tree
+    "#",  # in-document anchor (footnotes, etc.)
+)
+
+
+# ---------------------------------------------------------------------------
+# Pure-renderer fixtures
+# ---------------------------------------------------------------------------
+
+
+def _minimal_metadata() -> dict[str, object]:
+    """A minimum-viable metadata payload exercising every renderer
+    branch (header pills, file tree, schema table, sources, footer)."""
+
+    return {
+        "title": "TestSet: Lead Scoring Mock",
+        "id": "testorg/testset-lead-scoring",
+        "subtitle": "A mock metadata payload exercising the renderer.",
+        "description": (
+            "# Mock dataset\n\n"
+            "This is a [test link](https://github.com/leadforge-dev/leadforge).\n\n"
+            "| Col | Notes |\n|---|---|\n| a | b |\n"
+        ),
+        "isPrivate": True,
+        "licenses": [{"name": "MIT"}],
+        "keywords": ["b2b", "tabular"],
+        "collaborators": [],
+        "expectedUpdateFrequency": "never",
+        "userSpecifiedSources": [
+            {"title": "source repo", "url": "https://github.com/leadforge-dev/leadforge"},
+        ],
+        "image": "dataset-cover-image.png",
+        "resources": [
+            {
+                "path": "intro/lead_scoring.csv",
+                "description": "Intro flat CSV.",
+                "schema": {
+                    "fields": [
+                        {"name": "lead_id", "type": "string", "description": "Opaque id."},
+                        {"name": "label", "type": "boolean", "description": "Outcome."},
+                    ]
+                },
+            },
+            {
+                "path": "intro/manifest.json",
+                "description": "Provenance manifest (no schema).",
+            },
+        ],
+    }
+
+
+# ---------------------------------------------------------------------------
+# Required field labels (one of the four roadmap-mandated Kaggle checks)
+# ---------------------------------------------------------------------------
+
+
+def test_render_includes_title_subtitle_id_and_license() -> None:
+    html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    assert "TestSet: Lead Scoring Mock" in html
+    assert "A mock metadata payload exercising the renderer." in html
+    assert "testorg/testset-lead-scoring" in html
+    assert "License: MIT" in html
+    assert "Updates: never" in html
+
+
+def test_render_does_not_include_visibility_pill() -> None:
+    """Kaggle's public page does NOT display ``isPrivate``; rendering
+    a ``Visibility:`` pill in the preview would misrepresent what
+    public viewers see (folded back in self-review pass 3)."""
+
+    private_html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    public_html = preview.render_kaggle_html(
+        {**_minimal_metadata(), "isPrivate": False},
+        "dataset-cover-image.png",
+    )
+    for html in (private_html, public_html):
+        assert "Visibility:" not in html
+        assert "pill--visibility" not in html
+
+
+def test_render_file_tree_lists_every_resource_path() -> None:
+    """File tree shows every resource path declared in metadata."""
+
+    html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    assert "intro/lead_scoring.csv" in html
+    assert "intro/manifest.json" in html
+    assert "(2 total)" in html  # file count appears in the heading
+
+
+def test_render_schema_table_lists_every_column() -> None:
+    """The schema table lists every column from every tabular resource."""
+
+    html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    assert "<code>lead_id</code>" in html
+    assert "<code>label</code>" in html
+    assert "Opaque id." in html
+    assert "(2 columns)" in html  # per-table column count
+    # Resources without a schema (manifest.json) do not appear in the table.
+    # Note singular "tabular file" — the plural() helper kicks in only when
+    # n != 1 (Copilot finding COPILOT-3).
+    assert "(2 columns across 1 tabular file)" in html
+
+
+def test_render_keywords_appear_as_chips_in_footer() -> None:
+    html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    assert '<span class="chip">b2b</span>' in html
+    assert '<span class="chip">tabular</span>' in html
+
+
+def test_render_sources_block_renders_when_present() -> None:
+    html = preview.render_kaggle_html(_minimal_metadata(), "dataset-cover-image.png")
+    assert "source repo" in html
+    assert 'href="https://github.com/leadforge-dev/leadforge"' in html
+
+
+def test_render_sources_block_omitted_when_empty() -> None:
+    metadata = {**_minimal_metadata(), "userSpecifiedSources": []}
+    html = preview.render_kaggle_html(metadata, "dataset-cover-image.png")
+    assert '<h2 class="section__heading">Sources</h2>' not in html
+
+
+def test_render_escapes_html_in_field_values() -> None:
+    """User-controlled strings are HTML-escaped — guards against XSS
+    if a recipe ever surfaces ``<script>`` in a metadata field."""
+
+    metadata = {**_minimal_metadata(), "title": "evil <script>alert(1)</script>"}
+    html = preview.render_kaggle_html(metadata, "dataset-cover-image.png")
+    assert "<script>alert(1)</script>" not in html
+    assert "&lt;script&gt;" in html
+
+
+# ---------------------------------------------------------------------------
+# Schema-fields exhaustiveness (audit-style, against committed metadata)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_PRESENT, reason="release bundles not present")
+def test_committed_metadata_schema_is_fully_listed() -> None:
+    """The roadmap-mandated check: the Kaggle schema table lists every
+    CSV / parquet column declared in dataset-metadata.json."""
+
+    metadata = json.loads(_COMMITTED_METADATA.read_text(encoding="utf-8"))
+    html = preview.render_kaggle_html(metadata, metadata["image"])
+    for resource in metadata["resources"]:
+        schema = resource.get("schema")
+        if not schema:
+            continue
+        for field in schema["fields"]:
+            name = field["name"]
+            # Every column name appears as a ``<code>`` cell in the table.
+            assert f"<code>{name}</code>" in html, (
+                f"schema column {name!r} from {resource['path']!r} not rendered"
+            )
+
+
+# ---------------------------------------------------------------------------
+# Markdown link resolution (the leakage / link-rewrite regression guard)
+# ---------------------------------------------------------------------------
+
+#: Match ``href="X"`` in the rendered HTML; markdown-it-py emits
+#: double-quoted hrefs.  Unrendered ``](X)`` syntax would slip past this
+#: regex and stay as literal text rather than a real link, so the
+#: source-form test below asserts against it separately.
+_HREF_RE = re.compile(r'href="([^"]+)"')
+
+
+@pytest.mark.skipif(not _RELEASE_PRESENT, reason="release bundles not present")
+def test_committed_metadata_description_has_no_unrewritten_relative_links() -> None:
+    """Source-side regression guard.
+
+    The Kaggle packager runs ``rewrite_release_links()`` on the
+    inlined README; if a future README adds a ``](../foo)`` link or a
+    ``](validation/...)`` link and the rewriter no longer catches
+    it, the rendered description would carry a 404-bound href.
+    Catch it here, before the publish runbook.
+    """
+
+    metadata = json.loads(_COMMITTED_METADATA.read_text(encoding="utf-8"))
+    description = metadata["description"]
+    # Source-form check: no parent-relative or validation-relative
+    # markdown links remain in the inlined description.
+    assert "](../" not in description, (
+        "unrewritten parent-relative markdown link in inlined description"
+    )
+    assert "](validation/" not in description, (
+        "unrewritten validation-relative markdown link in inlined description"
+    )
+
+
+@pytest.mark.skipif(not _RELEASE_PRESENT, reason="release bundles not present")
+def test_committed_metadata_rendered_links_point_at_known_targets() -> None:
+    """Every rendered href on the page points at one of:
+
+    * a GitHub blob URL (the rewriter's output);
+    * a known external service (huggingface.co/datasets/leadforge);
+    * a sibling-relative path that resolves under the upload tree
+      (LICENSE), or an in-document anchor (#footnote-1 etc.).
+
+    Anything else is a 404 risk on the live page.
+    """
+
+    metadata = json.loads(_COMMITTED_METADATA.read_text(encoding="utf-8"))
+    html = preview.render_kaggle_html(metadata, metadata["image"])
+    bad: list[str] = []
+    for href in _HREF_RE.findall(html):
+        if any(href.startswith(prefix) for prefix in _LINK_OK_PREFIXES):
+            continue
+        bad.append(href)
+    assert not bad, (
+        f"rendered HTML carries non-allowlisted hrefs that would 404 on Kaggle: {bad[:5]}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Determinism + audit-artefact-sync (against committed sample)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_PRESENT, reason="release bundles not present")
+def test_render_is_byte_deterministic() -> None:
+    """Two back-to-back renders against the same metadata produce
+    byte-identical HTML — the determinism contract this script relies
+    on for the sync test below."""
+
+    metadata = json.loads(_COMMITTED_METADATA.read_text(encoding="utf-8"))
+    a = preview.render_kaggle_html(metadata, metadata["image"])
+    b = preview.render_kaggle_html(metadata, metadata["image"])
+    assert a == b
+
+
+@pytest.mark.skipif(
+    not (_RELEASE_PRESENT and _COMMITTED_SAMPLE.exists()),
+    reason="release bundles or committed preview sample missing",
+)
+def test_committed_sample_matches_fresh_regeneration() -> None:
+    """The audit-artefact-sync gate.
+
+    A fresh render of the committed Kaggle metadata must equal
+    ``release/_preview_committed/kaggle.html`` byte-for-byte.  If
+    this fails, either the renderer changed or the upstream metadata
+    drifted without re-running the preview script.  Regenerate via::
+
+        python scripts/preview_kaggle_page.py --no-serve
+        cp release/_preview/kaggle/index.html release/_preview_committed/kaggle.html
+    """
+
+    metadata = json.loads(_COMMITTED_METADATA.read_text(encoding="utf-8"))
+    fresh = preview.render_kaggle_html(metadata, metadata["image"])
+    committed = _COMMITTED_SAMPLE.read_text(encoding="utf-8")
+    assert fresh == committed
+
+
+# ---------------------------------------------------------------------------
+# Driver — pre-flight error paths (no server start)
+# ---------------------------------------------------------------------------
+
+
+def test_run_preview_raises_on_missing_metadata(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    config = preview.PreviewConfig(
+        release_dir=fake_release,
+        out_dir=tmp_path / "preview",
+        port=8765,
+        open_browser=False,
+        serve=False,
+    )
+    with pytest.raises(FileNotFoundError, match="dataset metadata not found"):
+        preview.run_preview(config)
+
+
+def test_run_preview_raises_on_malformed_metadata(tmp_path: Path) -> None:
+    fake_release = tmp_path / "release"
+    (fake_release / "kaggle").mkdir(parents=True)
+    (fake_release / "kaggle" / "dataset-metadata.json").write_text(
+        '"not-an-object"', encoding="utf-8"
+    )
+    config = preview.PreviewConfig(
+        release_dir=fake_release,
+        out_dir=tmp_path / "preview",
+        port=8765,
+        open_browser=False,
+        serve=False,
+    )
+    with pytest.raises(ValueError, match="not a JSON object"):
+        preview.run_preview(config)
+
+
+def test_run_preview_raises_on_missing_required_metadata_keys(tmp_path: Path) -> None:
+    """Pre-flight required-key check (Copilot finding COPILOT-1).
+
+    The renderer's _render_header / _render_footer / _render_cover
+    index ``title`` / ``subtitle`` / ``id`` / ``image`` /
+    ``licenses[0].name`` / ``expectedUpdateFrequency`` directly; a
+    malformed metadata file would otherwise raise ``KeyError``
+    mid-render and bypass main()'s rc=2 translation.  The validator
+    surfaces every missing key in one message, not just the first.
+    """
+
+    fake_release = tmp_path / "release"
+    (fake_release / "kaggle").mkdir(parents=True)
+    # Drop several required keys at once.
+    (fake_release / "kaggle" / "dataset-metadata.json").write_text(
+        json.dumps(
+            {
+                "subtitle": "only the subtitle survives",
+                "licenses": [{"NOT_NAME": "MIT"}],  # malformed: no 'name' inside [0]
+                "image": "dataset-cover-image.png",
+            }
+        ),
+        encoding="utf-8",
+    )
+    config = preview.PreviewConfig(
+        release_dir=fake_release,
+        out_dir=tmp_path / "preview",
+        port=8765,
+        open_browser=False,
+        serve=False,
+    )
+    with pytest.raises(ValueError, match="missing required key") as exc_info:
+        preview.run_preview(config)
+    msg = str(exc_info.value)
+    # All four missing keys reported in one error, alphabetised (order not asserted here).
+    assert "expectedUpdateFrequency" in msg
+    assert re.search(r"\bid\b", msg)  # word-bounded: a bare "id" substring is too loose
+    assert "licenses[0].name" in msg
+    assert "title" in msg
+
+
+def test_validate_required_metadata_accepts_well_formed_payload(tmp_path: Path) -> None:
+    """Sanity gate: the validator does not over-fire on the canonical fixture."""
+
+    preview._validate_required_metadata(_minimal_metadata(), tmp_path / "any.json")
+
+
+def test_run_preview_raises_on_missing_cover_image(tmp_path: Path) -> None:
+    """A well-formed metadata payload that points at a missing cover
+    image surfaces FileNotFoundError, not a required-key ValueError.
+
+    The required-key validator (Copilot finding COPILOT-1) runs
+    BEFORE the cover-existence check, so the fixture must include
+    every required key for this assertion to test the cover-path
+    rather than the validator.
+    """
+
+    fake_release = tmp_path / "release"
+    (fake_release / "kaggle").mkdir(parents=True)
+    well_formed = {
+        **_minimal_metadata(),
+        "image": "missing.png",  # the file does not exist on disk
+    }
+    (fake_release / "kaggle" / "dataset-metadata.json").write_text(
+        json.dumps(well_formed), encoding="utf-8"
+    )
+    config = preview.PreviewConfig(
+        release_dir=fake_release,
+        out_dir=tmp_path / "preview",
+        port=8765,
+        open_browser=False,
+        serve=False,
+    )
+    with pytest.raises(FileNotFoundError, match="cover image"):
+        preview.run_preview(config)
+
+
+def test_run_preview_writes_html_and_copies_cover(tmp_path: Path) -> None:
+    """End-to-end no-serve path: HTML lands at ``out_dir/index.html``;
+    cover image is copied as a real file (not a symlink)."""
+
+    fake_release = tmp_path / "release"
+    (fake_release / "kaggle").mkdir(parents=True)
+    cover_src = fake_release / "kaggle" / "dataset-cover-image.png"
+    cover_src.write_bytes(b"\x89PNG\r\n\x1a\nfake")
+    (fake_release / "kaggle" / "dataset-metadata.json").write_text(
+        json.dumps(_minimal_metadata()), encoding="utf-8"
+    )
+    out_dir = tmp_path / "preview"
+    outcome = preview.run_preview(
+        preview.PreviewConfig(
+            release_dir=fake_release,
+            out_dir=out_dir,
+            port=8765,
+            open_browser=False,
+            serve=False,
+        )
+    )
+    assert outcome.html_path == out_dir / "index.html"
+    assert outcome.html_path.is_file()
+    assert outcome.cover_path.is_file()
+    assert not outcome.cover_path.is_symlink()
+    # The HTML references the cover image by sibling-relative name.
+    assert 'src="dataset-cover-image.png"' in outcome.html_path.read_text(encoding="utf-8")
+
+
+def test_main_returns_2_on_missing_release(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    rc = preview.main(
+        [
+            "--release-dir",
+            str(tmp_path / "missing"),
+            "--out-dir",
+            str(tmp_path / "preview"),
+            "--no-serve",
+        ]
+    )
+    assert rc == 2
+    captured = capsys.readouterr()
+    assert "dataset metadata not found" in captured.err
+
+
+def test_parse_args_defaults() -> None:
+    """``parse_args`` is a free function so tests can exercise the
+    flag wiring without invoking the full driver."""
+
+    args = preview.parse_args(["--no-serve"])
+    assert args.release_dir == preview.DEFAULT_RELEASE_DIR
+    assert args.out_dir == preview.DEFAULT_OUT_DIR
+    assert args.port == preview.DEFAULT_PORT
+    assert args.open_browser is False
+    assert args.no_serve is True
+
+
+def test_tier_of_extracts_leading_path_segment() -> None:
+    """``_tier_of`` is the load-bearing helper that buckets resources
+    by tier in the file tree — pin its contract."""
+
+    assert preview._tier_of("intro/lead_scoring.csv") == "intro"
+    assert preview._tier_of("intermediate/tasks/converted/train.parquet") == "intermediate"
+    assert preview._tier_of("toplevel.json") == ""
+
+
+def test_plural_helper_handles_singular_zero_and_n() -> None:
+    """``_preview_common.plural`` is the one helper behind every count
+    heading in both preview scripts.  Pin n=1 → singular, n=0/2/N →
+    plural (Copilot finding COPILOT-3 — instructor sample previously
+    rendered "(1 configs)" because the plural was always ``+ 's'``)."""
+
+    import _preview_common  # noqa: PLC0415 — local import for the helper test
+
+    assert _preview_common.plural(1, "config") == "1 config"
+    assert _preview_common.plural(2, "config") == "2 configs"
+    assert _preview_common.plural(0, "config") == "0 configs"  # zero is plural in English
+    assert _preview_common.plural(1, "tabular file") == "1 tabular file"
+    assert _preview_common.plural(5, "tabular file") == "5 tabular files"
+    # Irregular plural form is supported via explicit override (none today).
+    assert _preview_common.plural(1, "child", "children") == "1 child"
+    assert _preview_common.plural(3, "child", "children") == "3 children"
+
+
+# ---------------------------------------------------------------------------
+# Server smoke test — covers _preview_common.make_server / serve glue
+# (folded back from self-review pass 3 — _serve was previously untested)
+# ---------------------------------------------------------------------------
+
+
+def test_make_server_binds_and_serves_index(tmp_path: Path) -> None:
+    """Stand the server up on port 0 (kernel-picked), GET ``/``,
+    assert 200 + body shape, shut down cleanly.
+
+    Covers every path inside ``_preview_common.make_server`` and
+    ``_make_handler_factory`` (handler subclass with ``directory=``,
+    ``ThreadingHTTPServer`` instantiation, address-reuse posture,
+    static-file serving).  ``serve`` itself is the blocking caller
+    that wraps this and is exercised manually.
+    """
+
+    import threading
+    import urllib.request
+
+    import _preview_common  # noqa: PLC0415 — local import for the smoke test
+
+    (tmp_path / "index.html").write_text(
+        "<html><body><h1>preview-smoke-token</h1></body></html>", encoding="utf-8"
+    )
+    httpd = _preview_common.make_server(tmp_path, port=0)
+    bound_port = httpd.server_address[1]
+    assert bound_port > 0
+    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
+    thread.start()
+    try:
+        with urllib.request.urlopen(f"http://localhost:{bound_port}/", timeout=5) as resp:  # noqa: S310 — localhost smoke
+            assert resp.status == 200
+            body = resp.read().decode("utf-8")
+        assert "preview-smoke-token" in body
+    finally:
+        httpd.shutdown()
+        httpd.server_close()
+        thread.join(timeout=5)
+    assert not thread.is_alive()