Skip to content

PR 6.1: notebook 01 refresh + notebook 02 relational feature engineering + G13.1 CI gate#73

Merged
shaypal5 merged 6 commits into
mainfrom
feat/notebooks-baseline-and-relational
May 7, 2026
Merged

PR 6.1: notebook 01 refresh + notebook 02 relational feature engineering + G13.1 CI gate#73
shaypal5 merged 6 commits into
mainfrom
feat/notebooks-baseline-and-relational

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 6, first PR. Two notebooks land together with a new CI gate and helper test suite. Updated for the rev-2 + rev-3 self-review fixes (rev-3 = the Copilot pass; see revisions below).

  • Notebook 01 (refresh): release/notebooks/01_baseline_lead_scoring.ipynb actually reproduces the intermediate-tier metrics in release/validation/validation_report.json (G13.2) by using the same feature set as the report — drops only IDs and the label, mirrors release_quality._partition_columns. That means notebook 01 keeps total_touches_all (the documented leakage trap); a narrative cell calls this out explicitly and forward-points to notebook 03 (PR 6.2), which dissects what dropping the trap does to performance. Reports AUC / AP / P@K(50/100/200) / Brier / top_decile_rate (the metric the report tracks); assert_within_tolerance fails loud on drift before rendering the decile-lift bar chart and 10-bin reliability diagram. Hard-codes BUNDLE = Path("../intermediate") and asserts manifest.exposure_mode == "student_public" so the public-only path discipline (G13.3) is enforced inline. Targets are loaded at runtime from release/notebooks/_release_targets.json (audit-synced against validation_report.json); per-metric tolerances replace the original flat ±0.05 — AUC/Brier ±0.02, AP / top-decile ±0.05.
  • Notebook 02 (new): release/notebooks/02_relational_feature_engineering.ipynb loads the seven snapshot-safe public tables (customers / subscriptions are not in public bundles per BANNED_TABLES), asserts every timestamp <= lead_created_at + snapshot_day for all four event tables (with real min-headroom-under-cutoff readings, not a hardcoded literal), demonstrates four legal joins (touch-channel breakdown, account-level density fit on train leads only, sales-activity recency, train-only industry target encoding), trains LR + GBM on flat-baseline-only and flat+relational features, and prints a 4-row metric panel + delta panel. Drops the leakage trap so the relational lift attribution stays clean (different goal from notebook 01: this notebook is teaching feature engineering, not reproducing the report). Pinned via assert_within_tolerance on the four model AUCs and the headline GBM(eng) − GBM(flat) lift, plus a sign-aware assert lift > 0. Honest takeaway cell frames the +0.0147 AUC lift as suggestive, not conclusive — the cross-seed gbm_auc spread on this bundle is ~0.027.
  • Helpers + tests: release/notebooks/_notebook_utils.py ships inside the public bundle alongside the parquet tables — Kaggle/HF consumers download both together, so the notebooks can't depend on leadforge being installed. Provides precision_at_k and top_decile_rate (mirror release_quality byte-for-byte; locked in by mirror tests) and assert_within_tolerance (hardened against silent-pass paths: non-finite observed/target fail loudly; per-metric tolerance maps must declare every key in target — missing entries raise instead of defaulting to +inf). 18 unit tests under tests/release/notebooks/test_notebook_utils.py cover ties, invalid-k, per-metric vs scalar tolerances, missing keys, label propagation, multi-failure aggregation, extra-observed-key tolerance, NaN/inf failure modes, and the two release_quality mirror tests.
  • CI gate (G13.1): new [notebooks] extra in pyproject.toml (nbclient, nbformat, scikit-learn, matplotlib) keeps the dev install lean. New notebooks job in .github/workflows/ci.yml installs [dev,scripts,notebooks], regenerates the intermediate public bundle via python scripts/build_public_release.py release --tier intermediate (only tier the notebooks need; new flag), and runs tests/release/notebooks/test_execute_notebooks.py — a parametrised nbclient end-to-end execution gated on bundles-present (matches test_package_* skip pattern).
  • Builders: scripts/build_release_notebook_{01,02}.py (sharing scripts/_release_notebook_common.py) are nbformat-driven one-shot builders so notebook source stays reviewable in the diff (instead of hand-edited JSON). Cell IDs are deterministic (cell_NNN) so re-running the builder produces a byte-identical .ipynb. Each builder accepts --out PATH so the byte-stability test can build into tmp_path rather than mutate the committed file. Audit-artifact-sync is enforced by tests/scripts/test_release_notebook_builders.py, which builds twice into tmp_path and diffs against the committed file (no working-tree mutation, parallel-safe under pytest-xdist). Each builder shells out to ruff format on its emitted file so builder output and the project's pre-commit hook agree byte-for-byte.

Per-file ruff ignore for S101 on release/notebooks/**/*.ipynb (G13 contract asserts are deliberate) and for E501 on scripts/build_release_notebook_*.py (dedented heredocs render as notebook content; 100c is the wrong yardstick).

Revisions

  • rev-1 — initial drop (notebook 01 dropped the trap; flat ±0.05 tolerance; nondeterministic cell IDs; one builder per file).
  • rev-2 (self-review pass) — folded in the brutal self-review I posted on this PR before requesting human review. Fixes: deterministic cell IDs, real min-headroom receipt in notebook 02 cell 6, account_density fit on train leads only, notebook 02 metric gate, per-metric tolerances, JSON-loaded targets + top_decile_rate swap, audit-sync test, precision_at_k mirror test, builder de-duplication, industry redundancy fix, --tier filter on build_public_release.py, builder rename (drop _ prefix), .values.to_numpy(), broken cross-references rewritten as plain text, the agent-plan + PR description rewritten to actually match the code.
  • rev-3 (Copilot pass) — folded in Copilot's six review threads: builders accept --out PATH so the byte-stability test never touches the committed file (race-safe and worktree-safe under interrupted runs); assert_within_tolerance rejects non-finite values and incomplete per-metric tolerance maps instead of silently passing; new top_decile_rate mirror test (parallel to precision_at_k); 4 new tests covering NaN/inf observed, NaN target, and missing-tolerance-key failure modes; agent-plan + PR body brought back into sync (this rewrite).

Acceptance gate results on this branch (rev-3 head)

  • ruff check . / ruff format --check . — clean
  • mypy leadforge/ — clean (82 source files)
  • pytest1255 passed, 5 skipped (publish-extra-gated HF/Kaggle round-trips)
  • python scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.650/3 paths flag rate on every tier
  • python scripts/verify_hash_determinism.pyPASS 67/67
  • python scripts/validate_release_candidate.py --no-rebuildPASS, 0 leakage findings
  • BUNDLE_SCHEMA_VERSION — unchanged at 5 (no bundle-shape changes in this PR)

Test plan

  • CI lint/typecheck/tests jobs pass on Python 3.11 + 3.12.
  • CI notebooks job runs the new gate end-to-end (regenerate intermediate bundle via --tier intermediate → execute both notebooks → tolerance assertions hold).
  • pip install -e ".[notebooks]" works on a clean Python 3.11+ env.
  • Open both notebooks in Jupyter Lab from release/notebooks/ cwd; "Run All" succeeds; output cells match committed metrics within tolerance.
  • Self-review on the diff (followed by Copilot pass; see revisions above).

🤖 Generated with Claude Code

shaypal5 and others added 2 commits May 7, 2026 07:38
…ing + G13.1 CI gate

Refreshes ``release/notebooks/01_baseline_lead_scoring.ipynb`` and adds
``release/notebooks/02_relational_feature_engineering.ipynb``, the first
two notebooks of the Phase 6 sequence.  Both ship inside the public
bundle alongside the parquet tables (Kaggle/HF consumers download them
together), so they import a sibling ``_notebook_utils.py`` rather than
relying on the ``leadforge`` package being installed.

Notebook 01 (refresh) — Reproduces ``release/validation/validation_report.md``
intermediate-tier metrics within +/-0.05 (G13.2): trains LR + HistGBM on
the public ``intermediate`` bundle, reports AUC / AP / P@K(50/100/200) /
Brier, drops the documented ``total_touches_all`` leakage trap (still
mentions it for pedagogy + forward-points to notebook 03), then calls
``assert_within_tolerance`` to fail loud on drift before rendering the
decile-lift bar chart and 10-bin reliability diagram.  Hard-codes
``BUNDLE = Path("../intermediate")`` and asserts
``manifest.exposure_mode == "student_public"`` so the public-only path
discipline (G13.3) is enforced inline.

Notebook 02 (new) — Loads the seven snapshot-safe public tables
(``customers`` / ``subscriptions`` are not in public bundles per
``BANNED_TABLES``), asserts every ``timestamp <= lead_created_at +
snapshot_day`` for all four event tables, demonstrates four legal joins
(touch-channel breakdown, account-level density, sales-activity recency,
train-only industry target encoding), trains LR + GBM on
flat-baseline-only and flat+relational features, and prints a
4-row metric panel + delta panel.  The honest takeaway cell quotes
G7.4.4: GBM(eng)-GBM(flat) is the headline lift, GBM(eng)-LR(flat) is
still slightly negative, consistent with the v1 finding that flat-CSV
LR is hard to beat.

Helpers + tests:
- ``release/notebooks/_notebook_utils.py``: ``precision_at_k`` (stable
  argsort, mirrors ``release_quality._precision_at_k``), ``top_decile_rate``,
  ``assert_within_tolerance`` (per-metric or scalar tolerance, aggregates
  failures).  Imported by both notebooks via
  ``sys.path.insert(0, str(Path.cwd())); from _notebook_utils import ...``.
- ``tests/release/notebooks/test_notebook_utils.py``: 12 unit tests on
  the helpers (loaded via ``importlib.util`` since the module isn't on
  the package import path).
- ``tests/release/notebooks/test_execute_notebooks.py``: parametrised
  nbclient end-to-end execution of both notebooks, gated on the public
  release bundles being on disk (matches ``test_package_*`` skip pattern).

CI wiring (G13.1):
- ``[notebooks]`` extra in ``pyproject.toml`` (nbclient, nbformat,
  scikit-learn, matplotlib) -- keeps the dev install lean while letting
  the dedicated CI job run the gate.
- New ``notebooks`` job in ``.github/workflows/ci.yml`` that installs
  ``[dev,scripts,notebooks]``, regenerates the public release bundles
  via ``scripts/build_public_release.py`` (~6 s wall clock), then runs
  the nbclient execution test.

Builders:
- ``scripts/_build_release_notebook_01.py`` and ``_build_release_notebook_02.py``
  are nbformat-driven one-shot builders so notebook source stays
  reviewable in the PR diff (instead of hand-edited JSON).  Each
  builder shells out to ``ruff format`` on its emitted file, so the
  builder output and pre-commit hook agree byte-for-byte (preserves
  the audit-artifact-sync invariant PR 4.1 / 5.1 / 5.2 established).
  Per-file ruff ignore for E501 because dedented heredoc lines render
  as notebook content (markdown tables, print-statement output) where
  100c is the wrong yardstick.

Acceptance results on this branch:
- ruff check . / ruff format --check . -- clean
- mypy leadforge/ -- clean (82 source files)
- pytest -- 1246 passed, 5 skipped (publish-extra-gated)
- ``probe_relational_leakage`` on intro/intermediate/advanced -- 0/3 paths
  flag rate on every tier
- ``verify_hash_determinism`` -- PASS 67/67
- ``validate_release_candidate --no-rebuild`` -- PASS, 0 leakage findings
- ``BUNDLE_SCHEMA_VERSION`` unchanged at 5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marks notebook 01 refresh and notebook 02 (relational feature
engineering) done; leaves 03 (leakage + time windows) and 04 (lift /
calibration / value ranking) plus the adversarial-framing artefacts
(issue templates, break-me guide, v2 decision log) for later PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaypal5 shaypal5 added type: feature New capability layer: validation validation/ invariants and checks layer: recipes recipes/ recipe assets and registry labels May 7, 2026
@github-actions

This comment has been minimized.

…only joins, JSON-loaded targets

Folds in the brutal self-review I posted on this PR before it goes to
human review.  Highlights:

HIGH severity
- **Builders are now byte-stable.**  Cell IDs are assigned deterministically
  in ``scripts/_release_notebook_common.assemble_notebook`` (was
  ``nbformat.v4.new_*_cell``'s random UUID), and a new
  ``tests/scripts/test_release_notebook_builders.py`` runs each builder
  twice + diffs against the committed ``.ipynb``.  The audit-artifact-sync
  invariant the docstring already advertised is now actually enforced —
  the test would have caught the previous revision's per-run diff.
- **Snapshot-safe verification cell prints a real number.**  The previous
  cell computed ``max_offset_days`` and threw it away, then printed a
  hardcoded ``(max offset 0.0 days)`` literal in every row.  Replaced
  with ``min headroom under cutoff: {min_headroom_days:6.2f} days`` —
  formatted from ``(cutoff - timestamp).min()``.  On the as-shipped
  intermediate bundle the receipt now reveals 735 events that land
  exactly on the cutoff (headroom 0.00 days), which is a real and
  teachable property of the data.
- **``account_density`` is fit on train leads only.**  Cell 4.2 was
  rolling up touches across train+test pooled — a leak path two cells
  above the train-only target encoding that loudly avoided the same
  mistake.  Now filters ``tch_with_acct`` to ``train["lead_id"]`` before
  the groupby; test rows in train-only-empty accounts fall back to 0
  via fillna.
- **Notebook 02 has a tolerance gate.**  Pinned the four model AUCs
  (LR-flat / GBM-flat / LR-eng / GBM-eng) and the headline
  ``GBM(eng) - GBM(flat)`` lift to per-metric ±0.02 (±0.015 for the
  lift), plus a sign-aware ``assert lift > 0``.  Without it, a future
  regression that flipped the lift negative would have passed the
  nbclient gate silently.
- **Per-metric tolerances replace the flat ±0.05.**  Notebook 01 now uses
  ±0.02 on AUC/Brier (cross-seed std is well under that) and ±0.05 on AP
  / top-decile (which have higher seed variance).
- **Notebook 01 actually reproduces the validation report.**  The
  previous revision dropped ``total_touches_all`` (the leakage trap)
  while the report's panel keeps it — the metrics differed by ~0.03 on
  GBM AUC and the ±0.05 tolerance was hiding the discrepancy.  Notebook
  01 now keeps the trap (matches ``release_quality._partition_columns``)
  with explicit narrative + forward-pointer to notebook 03; notebook 02
  drops the trap and explains why (relational lift attribution stays
  clean).

MED severity
- **``release/notebooks/_release_targets.json``** is the new source of
  truth for cross-seed-median targets.  Notebook 01 loads it at runtime
  (no more hardcoded constants) and a new
  ``tests/release/notebooks/test_release_targets_match_report.py`` asserts
  byte equality against ``release/validation/validation_report.json``'s
  ``tiers.<tier>.medians``.  Drift now fails CI.  Discovered along the
  way: the previous ``lr_precision_at_100: 0.59`` constant wasn't even
  *in* the validation report — the report's ranking metric is
  ``top_decile_rate``.  Notebook 01 now computes and gates
  ``top_decile_rate`` (matches the report).
- **``precision_at_k`` mirror test.**  Asserts ``_notebook_utils.precision_at_k``
  agrees with ``release_quality._precision_at_k`` over random + tied-score
  fixtures, locking the docstring claim in code.
- **Builder duplication eliminated.**  ``scripts/_release_notebook_common.py``
  hosts ``md`` / ``code`` / ``assemble_notebook`` / ``write_notebook``
  (incl. the deterministic-ID assignment and the ``ruff format``
  subprocess); each per-notebook builder shrinks to imports + cell list
  + one ``main()``.
- **Industry redundancy fixed.**  The ``+rel`` pipelines drop the raw
  ``industry`` categorical so LR no longer sees one-hot industry *and*
  its train-only target encoding for the same column.
- **Honest takeaway softened.**  +0.0147 AUC framed as
  "suggestive, not conclusive" with a pointer to PR 6.2's seed-sweep
  notebook 04 + the cross-seed gbm_auc spread (0.027) for context.
- **ΔBrier sign convention** spelled out under the delta panel.

LOW severity
- **Builder rename.**  ``scripts/_build_release_notebook_*.py`` →
  ``scripts/build_release_notebook_*.py`` (drops the leading underscore
  that's inconsistent with every other script in ``scripts/``).
- **CI builds only ``intermediate``.**  ``build_public_release.py`` got
  a ``--tier`` filter; the notebooks job uses ``--tier intermediate``,
  saving 3 redundant bundle generations per CI run.
- **Whitespace + idiom cleanups.**  Fixed ``[[ "lead_id"`` extra space;
  replaced ``.values`` (pandas-deprecated) with ``.to_numpy()`` in both
  notebooks.
- **Broken cross-references** to PR 6.2 / 6.3 / break_me_guide.md
  rewritten as plain-text "(coming in PR 6.x)" caveats — no link
  syntax to non-existent files.

Verification on this branch:
- ``ruff check . / ruff format --check .`` — clean
- ``mypy leadforge/`` — clean (82 source files)
- ``pytest`` — 1250 passed, 5 skipped (publish-extra-gated)
- ``probe_relational_leakage`` on intro/intermediate/advanced — 0/3
  paths flag rate on every tier
- ``verify_hash_determinism`` — PASS 67/67
- ``validate_release_candidate --no-rebuild`` — PASS, 0 leakage findings
- ``BUNDLE_SCHEMA_VERSION`` unchanged at 5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 05:25
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a CI “notebook execution” gate and supporting utilities/tests to keep the public release notebooks reproducible, byte-stable, and aligned with the published validation metrics.

Changes:

  • Introduce notebook builders + shared builder scaffolding to generate deterministic, reviewable .ipynb artifacts.
  • Add release/notebooks/_notebook_utils.py plus unit tests and audit-sync tests for pinned targets.
  • Add a GitHub Actions notebooks job, plus a --tier option to build only the needed bundle for CI.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/scripts/test_release_notebook_builders.py Verifies notebook builders produce byte-identical notebooks and match committed artifacts.
tests/release/notebooks/test_release_targets_match_report.py Ensures _release_targets.json stays in sync with validation_report.json medians.
tests/release/notebooks/test_notebook_utils.py Unit tests for notebook helper utilities (precision@k, top-decile rate, tolerance gate).
tests/release/notebooks/test_execute_notebooks.py End-to-end nbclient execution test for release notebooks.
scripts/build_release_notebook_01.py Builder script emitting notebook 01 content deterministically.
scripts/build_release_notebook_02.py Builder script emitting notebook 02 content deterministically.
scripts/build_public_release.py Adds --tier flag to build only a selected bundle (used by CI notebooks job).
scripts/_release_notebook_common.py Shared notebook builder helpers, deterministic cell IDs, and ruff-format step.
release/notebooks/_release_targets.json Pins notebook 01 targets (audited against validation report JSON).
release/notebooks/_notebook_utils.py Notebook-local metrics + tolerance assertion helpers (no package dependency).
release/notebooks/01_baseline_lead_scoring.ipynb Refreshed committed notebook 01 artifact (repro gate + plots).
release/notebooks/02_relational_feature_engineering.ipynb New committed notebook 02 artifact (relational FE + lift panel + gate).
pyproject.toml Adds [notebooks] optional dependencies and per-file ruff ignores for notebooks/builders.
.github/workflows/ci.yml Adds notebooks job to build bundle and execute notebooks.
.agent-plan.md Updates project plan entry for PR 6.1 deliverables.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/scripts/test_release_notebook_builders.py Outdated
Comment thread release/notebooks/_notebook_utils.py Outdated
Comment thread release/notebooks/_notebook_utils.py
Comment thread scripts/build_release_notebook_01.py
Comment thread tests/release/notebooks/test_execute_notebooks.py Outdated
Comment thread tests/release/notebooks/test_notebook_utils.py
…rate mirror, doc/code alignment

Folds in all six unresolved Copilot review threads on this PR.

C1 — ``--out`` flag on the notebook builders (was: byte-stability test
mutated the committed file in place).  Both builders now expose
``--out PATH`` via a shared ``_release_notebook_common.builder_arg_parser``;
``tests/scripts/test_release_notebook_builders.py`` builds twice into
``tmp_path`` and diffs against the committed file.  Side effect: the
test is now safe under pytest-xdist parallel execution and an
interrupted run can no longer leave a corrupted ``release/notebooks/<name>.ipynb``.

C2 — ``assert_within_tolerance`` rejects silent-pass paths (was: NaN
observed/target slipped through because ``NaN > tol`` is ``False``;
per-metric tolerance maps missing keys defaulted to ``+inf``,
disabling the gate for those metrics).  Both paths now fail with
clear messages; the docstring describes the contract.

C3 — ``top_decile_rate`` gets the mirror test ``precision_at_k``
already had.  Both already use the same ``max(1, int(round(n * 0.1)))``
rule as ``release_quality._top_decile_rate``, so no impl change — the
test locks that in across small-``n`` (banker's rounding) and the
exact ``n_test = 750`` the intermediate tier ships.  Drift between
``_notebook_utils`` and ``release_quality`` would now break CI.

C4 — ``.agent-plan.md`` and the GitHub PR body both rewritten to
match reality after rev-2: notebook 01 *keeps* the trap (matches
``release_quality._partition_columns``, so the G13.2 reproduction
gate is meaningful); notebook 02 *drops* the trap (clean lift
attribution).  The PR body also documents the rev-1 → rev-2 → rev-3
revision history.

C5 — ``test_execute_notebooks.py`` docstring rewritten: the gate
flow loads ``_release_targets.json`` (audit-synced against
``validation_report.json``), not ``validation_report.md`` directly,
and tolerances are per-metric (not flat ±0.05).

C6 — four new tests in ``test_notebook_utils.py`` cover the failure
modes C2 plugs: NaN observed → fail, inf observed → fail, NaN
target → fail, missing-tolerance-key → fail.

Acceptance gate results on this branch:
- ruff check . / ruff format --check . — clean
- mypy leadforge/ — clean (82 source files)
- pytest — 1255 passed, 5 skipped (publish-extra-gated)
- probe_relational_leakage on intro/intermediate/advanced — 0/3 paths
  flag rate on every tier
- verify_hash_determinism — PASS 67/67
- validate_release_candidate --no-rebuild — PASS, 0 leakage findings
- BUNDLE_SCHEMA_VERSION unchanged at 5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Two real CI failures on the rev-3 commit, both fixed here:

1. **Lint job** failed with ``Would reformat:
   release/notebooks/02_relational_feature_engineering.ipynb``.  CI's
   ``pip install ruff`` resolved to 0.15.12; my local was 0.11.13.
   Newer ruff has slightly different ``.ipynb`` cell formatting rules,
   so the committed notebook (formatted by 0.11.13) didn't pass
   0.15.12's check.  Fix: pin ``ruff==0.15.12`` in the CI lint job and
   rebuild ``02_relational_feature_engineering.ipynb`` against 0.15.12
   so committed bytes match.  Pin lives only in CI (not in
   ``pyproject.toml`` ``[dev]``) to keep contributor laptops flexible;
   when CI bumps, re-run the builder locally with the matching ruff
   and commit the updated ``.ipynb``.

2. **Notebooks job** failed with ``jupyter_client.kernelspec.NoSuchKernel:
   No such kernel named python3``.  GitHub-hosted runners don't ship
   with any Jupyter kernelspecs; ``[notebooks]`` only pulled
   ``nbclient`` / ``nbformat``, neither of which provides a kernelspec.
   The execution test passes ``kernel_name="python3"`` so it needs an
   actual ``python3`` spec on disk.  Fix: add ``ipykernel>=6.0`` to
   the ``[notebooks]`` extra, plus a CI step
   ``python -m ipykernel install --user --name python3`` that
   registers the spec before nbclient boots a kernel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Tests (Python 3.11/3.12) failed on the previous push with
``ModuleNotFoundError: No module named 'nbformat'``.  The byte-
stability test in ``tests/scripts/test_release_notebook_builders.py``
shells out to the builders, which import ``nbformat`` — but the main
``test`` CI job installs ``[dev]`` only, not ``[notebooks]``.

Fix:
- ``pytest.importorskip("nbformat")`` at the top of the byte-stability
  test, so the main ``test`` job skips it cleanly.
- Have the dedicated ``notebooks`` CI job (which already installs
  ``[dev,scripts,notebooks]``) run both ``test_execute_notebooks.py``
  and ``test_release_notebook_builders.py``.  That keeps the byte-
  stability gate in CI without bloating the default dev install.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #73 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25478665327 attempt 1
Comment timestamp: 2026-05-07T06:01:01.688760+00:00
PR head commit: 10e791f42088d4eb053150907de5847af9c1e565

@shaypal5 shaypal5 merged commit 918fc49 into main May 7, 2026
9 checks passed
@shaypal5 shaypal5 deleted the feat/notebooks-baseline-and-relational branch May 7, 2026 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: recipes recipes/ recipe assets and registry layer: validation validation/ invariants and checks type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants