Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .agent-plan.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -233,3 +233,9 @@ release/huggingface/*
!release/huggingface/README.md
release/huggingface-instructor/*
!release/huggingface-instructor/README.md

# Generated local preview-page output (PR 7.2) — runtime HTML rendered
# by scripts/preview_{kaggle,hf}_page.py. The committed sample HTML
# under release/_preview_committed/ is the audit-artefact-sync gate
# and is checked into git separately.
release/_preview/
68 changes: 68 additions & 0 deletions docs/release/preview_pages_design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# PR 7.2 — Local Kaggle / HF preview-page design notes

Working notes for `scripts/preview_kaggle_page.py`,
`scripts/preview_hf_page.py`, the shared `scripts/_preview_common.py`,
their tests, and the committed sample HTML used as the
regeneration-discipline gate. Captured before implementation; revised
after self-review pass 3 to be honest about scope. Kept short on
purpose.

The PR's pedagogical role is the *staging gate* before PR 7.3: the
maintainer renders both platforms locally from the same artefacts the
publish PR will upload, clicks through them in a browser, and catches
link / config / column-listing issues before they hit cached
previews on the live page.

This is a **publication-readiness preview**, not a Kaggle / HF
look-alike. Pixel fidelity is explicitly out of scope; the chrome
(CSS palette, layout) is approximate. The tool's job is structured
rendering of the upload artefacts so a maintainer can verify the
content; visual brand matching is not its job.

## Decisions

| # | Decision | Why |
|---|---|---|
| 1 | Two scripts, one per platform, sharing `scripts/_preview_common.py` for `escape` / `make_server` / `serve`. | Inputs are different (`dataset-metadata.json` vs YAML-frontmatter README) and page structures differ enough that one renderer per platform reads better. The 3-helper shared module replaces what was duplicated byte-for-byte. |
| 2 | Server: stdlib `http.server.ThreadingHTTPServer` via the shared `make_server(directory, port) -> ThreadingHTTPServer`. | `ThreadingHTTPServer` inherits `allow_reuse_address=True` from `HTTPServer` (bare `socketserver.ThreadingTCPServer` does not — Ctrl-C → re-run within ~60s would TIME_WAIT). The `make_server` seam is what lets the tests bind on port 0, GET `/`, and shut down cleanly without forking a subprocess. |
| 3 | Templates: f-string helpers, not Jinja2. | Layout is layout-stable; two pages don't justify a templating engine. |
| 4 | Markdown→HTML via `markdown-it-py` (in `[dev]` AND `[publish]`). `gfm-like` preset; `linkify` disabled to avoid the `linkify-it-py` transitive dep. | The dep is small and pure-Python; the renderer is the test surface (not a smoke), so gating it behind `[publish]` would mean CI's `[dev]`-only test job ImportErrors on every render. Listed in both extras so neither path breaks. |
| 5 | Output dir: `release/_preview/<platform>/` (gitignored). Committed regeneration samples at `release/_preview_committed/{kaggle,huggingface_public,huggingface_instructor}.html`. | Mirrors `release/_release_quality/` convention. The committed samples double as human-inspectable rendered output for code reviewers who don't want to install the dep and run the script. |
| 6 | Cover image copied into the preview tree (sibling-relative `<img src=>`). | Both platforms inline-display the cover image; serving it under the preview root means the rendered HTML works without absolute paths. |
| 7 | HF `--variant=public|instructor` reads either `release/huggingface/README.md` or `release/huggingface-instructor/README.md`. Kaggle has no instructor variant. | Matches the PR 5.2 publish reality. |
| 8 | CLI mirrors `validate_release_candidate.py` / `run_llm_critique.py`: free-function `parse_args`, frozen `Config`, `run_preview(config) -> Outcome`, `main(argv) -> int`. Exit codes 0 / 2. Flags: `--release-dir`, `--port` (8765 Kaggle / 8766 HF), `--out-dir`, `--variant` (HF only), `--open-browser`, `--no-serve`. | Maintainer muscle memory + small surface. `--no-serve` is the CI / inspection mode; `--open-browser` pops a tab on startup. |
| 9 | The byte-equality test against the committed sample is a **regeneration-discipline gate**, NOT a renderer audit. The renderer-audit work is done by the structural tests (schema-column exhaustiveness, link allow-list, configs round-trip), each of which compares rendered output against an independent source of truth. | A bug in the renderer propagates to both the committed sample (regenerated by the same code) and the test, so byte-equality alone catches "someone forgot to regenerate", not correctness. Calling it "audit-artefact-sync" oversells; the structural tests are the real audits. |
| 10 | Test posture: in-process. No live HTTP for the rendering tests; `_preview_common.make_server` is exercised by a single port-0 smoke test that GETs `/` and shuts down. The render functions are pure and tested via substring + regex on the rendered string. | No new test deps (no BeautifulSoup). Substring assertions on deterministic rendered HTML give the same coverage with less surface. The smoke test covers the previously-untested server glue. |
| 11 | Visibility pill (Kaggle) NOT rendered. HF "Files declared in YAML" section NOT rendered. | Both were fidelity bugs caught in self-review pass 3. Kaggle's public page does not display `isPrivate` — showing it would mislead the maintainer about what public viewers see. HF's "Files declared in YAML" surfaced an internal concept (the configs[].data_files paths) that the configs dropdown already lists, while omitting most of the actual upload tree (manifest.json, tables/, feature_dictionary.csv, …). |

## Link-resolution rule (test pin)

Every Markdown link `](URL)` in the README body the renderer ingests
must satisfy ONE of:

1. Absolute `https://github.com/leadforge-dev/leadforge/...` URL (the
rewrite output of `_release_common.py::rewrite_release_links()`).
2. External absolute URL on a known-OK domain (`https://huggingface.co`).
3. Relative path that resolves to a file under the upload tree
(e.g. `LICENSE` → `release/<platform>/LICENSE`).
4. In-document anchor (`#footnote-1` etc.).

A `](../foo)` or `](validation/...)` link in the rendered HTML is a
regression — those are exactly what the platform packagers' rewrite
is supposed to canonicalise away. The test fires the moment the
rewrite stops doing its job for the upstream artefact the preview
renders.

## What this PR does not touch

- `BUNDLE_SCHEMA_VERSION` stays at 5.
- `release/validation/validation_report.{json,md}` does not regenerate
(revert any timestamp drift before commit).
- PR 7.3 (publish + tag) is a separate PR; the runbook there will cite
the two preview commands as a required pre-flight step.
- No change to the platform packagers (`scripts/package_{kaggle,hf}_release.py`)
or `_release_common.py`. The preview reads what the packagers wrote.
- Live Kaggle / HF API calls — pure local rendering only.
- Pixel-perfect cloning of the live pages. The bar is structured
rendering of the upload artefacts; visual brand matching is out of
scope.
20 changes: 19 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,13 @@ dev = [
"types-pyyaml>=6.0",
"scikit-learn>=1.3",
"matplotlib>=3.7",
# PR 7.2: the preview-page renderers (scripts/preview_{kaggle,hf}_page.py)
# call into markdown-it-py at test time via render_*_html(). Keeping
# the dep here as well as in [publish] means CI's "test" job (which
# installs only [dev]) does not ImportError mid-test. pytest.importorskip
# would also work, but the rendering tests are the primary coverage of
# this PR — gating them off would defeat the purpose.
"markdown-it-py>=3.0",
]
scripts = [
"scikit-learn>=1.3",
Expand All @@ -49,10 +56,14 @@ scripts = [
# this extra (``pip install -e ".[publish]"``) enables the gated
# ``load_dataset()`` / Kaggle-CLI smoke tests that verify G11.3 (Kaggle
# package) and G12.3 / G12.4 (HF load_dataset round-trip) without
# pulling the heavy SDKs into the default dev install.
# pulling the heavy SDKs into the default dev install. PR 7.2 adds
# ``markdown-it-py`` for the local Kaggle / HF preview pages
# (``scripts/preview_{kaggle,hf}_page.py``) — same publish-extra
# posture, missing import raises a clean error pointing at this extra.
publish = [
"datasets>=2.14",
"kaggle>=1.6",
"markdown-it-py>=3.0",
]
# Optional dependencies for executing the public release notebooks.
# Installing this extra (``pip install -e ".[notebooks]"``) enables the
Expand Down Expand Up @@ -103,6 +114,13 @@ select = ["E", "F", "I", "N", "W", "UP", "B", "C4", "PT", "S"]
# Line length is a property of the rendered cell, not the .py source,
# so 100c is the wrong yardstick here.
"scripts/build_release_notebook_*.py" = ["E501"]
# Preview-page scripts (PR 7.2) carry inlined CSS + multi-attribute
# HTML strings inside f-string templates; the rendered HTML is the
# product, so wrapping the source CSS at 100c is line noise.
"scripts/preview_kaggle_page.py" = ["E501"]
"scripts/preview_hf_page.py" = ["E501"]
# _preview_common is plain Python (no inline HTML / CSS); leaving
# E501 enabled.

[tool.mypy]
python_version = "3.11"
Expand Down
Loading
Loading