Skip to content

docs(notebooks): PR 8.3 teaching improvements — banner, stale refs, GroupKFold, advanced calibration#84

Merged
shaypal5 merged 4 commits into
mainfrom
docs/pr8.3-notebook-teaching-improvements
May 26, 2026
Merged

docs(notebooks): PR 8.3 teaching improvements — banner, stale refs, GroupKFold, advanced calibration#84
shaypal5 merged 4 commits into
mainfrom
docs/pr8.3-notebook-teaching-improvements

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 25, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR does two things, and both matter:

1. Teaching improvements to all four release notebooks

Phase 8 pre-publish teaching improvements for the four release notebooks. All changes are in the canonical builder scripts (scripts/build_release_notebook_0*.py); the .ipynb files are derived artifacts regenerated by the builders.

Notebook 01 — Baseline Lead Scoring

  • Warning banner (new cell after title): flags that the leakage trap total_touches_all is kept intentionally in this validation-panel notebook; directs students to Notebook 02 for the clean pipeline.
  • Widened top_decile_rate tolerance to ±0.10: the cross-seed spread on the intermediate tier is ≈ 0.13, making a tighter gate a false CI alarm rather than a real regression. Updated §2 narrative to explain the reasoning.

Notebook 02 — Relational Feature Engineering

  • §8: cleaned up the honest-takeaway section; removed dangling cross-reference.
  • §9 GroupKFold: switched from train+test pool (5,000 leads) to train-only (3,500 leads), so the GroupKFold mean AUC and the random-split AUC are an apples-to-apples comparison — both evaluations use the same number of training leads. Updated GKF_TARGET = 0.6148, tightened GKF_TOL = 0.02, clarified narrative.
  • "## Next" section moved to the end (was before the gate cell).

Notebook 04 — Lift, Calibration, Value-Aware Ranking

  • §4 advanced-tier calibration: removed the if/else guard for missing advanced bundle (guard became unnecessary once the CI workflow was fixed; see below). Advanced bundle load is now unconditional.
  • §8 cohort-shift narrative: removed an invented causal explanation ("time-varying signal in the simulator disfavours the late cohort"). The v1 DGP has no baked-in time drift (claim c14); the ~0.06 AUC drop is a seed-42-specific observation. Replaced with an honest single-seed caveat.
  • §11 summary: removed the same misleading "production scenario" language from the cohort-shift bullet.
  • Removed unused _mbe loop variable (Copilot COPILOT-3).

2. Metrics reset after PR 8.1's leakage fix

PR 8.1 removed has_open_opportunity from the snapshot features (a real post-snapshot leakage column). That changed model metrics substantially — LR AUC dropped from ~0.88 to ~0.67 on the intermediate tier. The validation report, claims register, and release targets had to be regenerated to match.

What changed:

  • release/validation/validation_report.json: regenerated with seeds 42–46 (full 5-seed sweep). Seeds 43–46 were stale (pre-PR-8.1, dated May 6); deleted and rebuilt so all seeds reflect the leakage-fixed codebase.
  • release/notebooks/_release_targets.json: updated to 5-seed cross-seed medians.
  • release/metrics.json + per-tier metrics.json: rebuilt from updated validation report.
  • release/claims_register_source.yaml + derived claims_register.{json,md}: claims c05–c12 now quote 5-seed medians matching the validation report.

CI fix (notebooks job was failing):

  • .github/workflows/ci.yml: added python scripts/build_public_release.py release --tier advanced to the notebooks job. Notebook 04 §4 loads the advanced bundle unconditionally; the previous workflow only built intermediate, causing the notebooks CI job to fail.

What to verify

  • release-artifacts-sync job: all four sync checks pass.
  • notebooks job: all four notebooks execute end-to-end; tolerance gates in nb01 §7, nb02 §10, nb04 §10 pass.
  • test_release_targets_match_report.py: targets file matches the 5-seed validation report.
  • verify_claims_register.py: all 26 claims resolve within 1e-3.

Files changed

  • .github/workflows/ci.yml — add advanced bundle build step to notebooks job
  • .gitignore — exclude node_modules/, .wrangler/, release/_shmuggingface/
  • scripts/build_release_notebook_01.py — widen tolerance, update narrative
  • scripts/build_release_notebook_02.py — train-only GKF, move Next section, honest takeaway
  • scripts/build_release_notebook_04.py — remove guard, honest cohort narrative, fix unused var
  • release/notebooks/01_baseline_lead_scoring.ipynb — rebuilt
  • release/notebooks/02_relational_feature_engineering.ipynb — rebuilt
  • release/notebooks/04_lift_calibration_value_ranking.ipynb — rebuilt
  • release/validation/validation_report.json — 5-seed regeneration
  • release/notebooks/_release_targets.json — 5-seed medians
  • release/metrics.json + per-tier metrics.json — rebuilt
  • release/claims_register_source.yaml — c05–c12 updated
  • release/claims_register.{json,md} — rebuilt from source

🤖 Generated with Claude Code

…roupKFold, advanced calibration

Notebook 01 (build_release_notebook_01.py):
- Insert warning callout after title: '⚠️ Validation-panel notebook
  — leakage trap retained intentionally. Start at Notebook 02 for
  clean modelling.' Makes the deliberate trap-inclusion explicit for
  beginners who open nb01 first.
- Remove '(coming in PR 6.2)' from §4 prose reference to Notebook 03
  and from §10 Next cell (nb03 and nb04 are shipped).

Notebook 02 (build_release_notebook_02.py):
- §8 Honest takeaway: 'coming in PR 6.2's notebook 04' → 'Notebook 04'.
- §8 Next: remove both '*(coming in PR 6.2)*' stale refs; add §9
  cross-reference.
- §9 new: Account-level split — GroupKFold(account_id) 5-fold CV
  with LR on the pooled train+test set; prints per-fold AUC, mean CV
  AUC, headline random-split AUC, and the optimism delta. Demonstrates
  the 93% account overlap limitation concretely.

Notebook 04 (build_release_notebook_04.py):
- §3a new: Advanced-tier calibration demo — loads ../advanced, runs
  the same LR pipeline, side-by-side reliability diagram (intermediate
  max-bin err ≈0.13 vs advanced ≈0.52). Confirms AUC is flat across
  tiers; calibration_max_bin_error is the discriminating metric.

Implementation notes:
- All edits are in the canonical builder scripts; .ipynb files are
  regenerated and verified byte-stable by
  tests/scripts/test_release_notebook_builders.py (4 passed).
- Tests: 1419 passed, 5 skipped (pre-existing notebook execution
  failure on main; unrelated to this PR).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 20:53
@shaypal5 shaypal5 added type: docs Documentation or narrative changes layer: render render/ bundle and artifact output labels May 25, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Teaching-focused updates to the canonical release-notebook builder scripts (and regenerated .ipynb artifacts) to reduce learner confusion around leakage traps, remove stale forward references, add an account-grouped evaluation example, and demonstrate calibration failure on the advanced tier.

Changes:

  • Notebook 01: adds a prominent warning banner about the intentionally-retained leakage trap; removes stale “coming in PR …” forward references.
  • Notebook 02: removes stale refs and adds a new GroupKFold(account_id) section to quantify optimism from account overlap.
  • Notebook 04: adds an “advanced tier calibration break” section that loads ../advanced inline and compares reliability diagrams.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/build_release_notebook_01.py Inserts the new leakage-trap warning banner and removes stale forward refs in generated notebook 01.
scripts/build_release_notebook_02.py Adds the new GroupKFold section and updates stale prose references in notebook 02 builder.
scripts/build_release_notebook_04.py Adds the advanced-tier calibration comparison section to notebook 04 builder.
release/notebooks/01_baseline_lead_scoring.ipynb Regenerated output reflecting notebook 01 builder changes.
release/notebooks/02_relational_feature_engineering.ipynb Regenerated output reflecting notebook 02 builder changes (new §9).
release/notebooks/04_lift_calibration_value_ranking.ipynb Regenerated output reflecting notebook 04 builder changes (new §3a).
.agent-plan.md Marks PR 8.3 items as completed and updates the plan summary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/build_release_notebook_04.py
Comment thread scripts/build_release_notebook_02.py Outdated
Comment thread scripts/build_release_notebook_04.py
PR 8.1 removed the has_open_opportunity post-snapshot leak, which was
the dominant signal (LR AUC dropped 0.88 → 0.67). This PR re-runs
the cross-seed validation sweep with fresh bundles and propagates the
new numbers throughout every release artifact that pins metrics.

Changes:
- Regenerate release/validation/validation_report.json (seeds=[42])
- Rebuild release/metrics.json from updated report
- Update release/notebooks/_release_targets.json
- Rebuild all 4 release notebooks from updated builder scripts:
  - nb01: minor banner wording fix
  - nb02: updated inline targets + GroupKFold gate + section fixes
  - nb03: updated inline targets + GBM lift comment
  - nb04: new calibration narrative for advanced tier (score compression
    at low prevalence), removed _sanitize_adv antipattern, renumbered
    sections 4-11, updated cohort-shift narrative, updated inline targets
- Update release/claims_register_source.yaml claims c05-c09, c12 with
  current metric values; rebuild claims_register.{json,md}
- Update validation report figures to match new metrics

All 1415 tests pass (0 failures).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

COPILOT-1 (build_release_notebook_04.py §4): Wrap the advanced-bundle
load/train/plot/assert block in an existence guard so CI — which builds
only the intermediate bundle — skips the cell cleanly instead of
raising FileNotFoundError. The else-branch still runs all assertions
and the side-by-side reliability diagram when the advanced bundle IS
present. Fix the root cause of the CI notebooks job failure on run
26433263795.

COPILOT-2 (build_release_notebook_02.py §9 GroupKFold): Use the
consistent nullable-boolean label conversion
  .astype('boolean').fillna(False).astype(int)
instead of the bare .astype(int) that raises on pd.NA values in the
task column.

COPILOT-3 (build_release_notebook_04.py §4 plot loop): Drop the unused
_mbe variable from the for-loop tuple (the max-bin-error value was
already embedded in the formatted label string).

Also fixes a subtle bug introduced while applying COPILOT-1: the print()
strings inside the triple-quoted cell body contained live \n escape
sequences which dedent() read as zero-indent lines, causing ruff to
reject the generated notebook with 'Unexpected indentation'. Escaped
them as \\n so they reach the executed cell as literal newline chars.

All 1419 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

…t GKF, nb04 cohort narrative

- Regenerate validation_report.json with seeds 42-46 (stale 43-46 were
  pre-PR-8.1; rebuild after deleting them ensures cross-seed medians
  reflect the leakage-fixed codebase)
- Update _release_targets.json to match 5-seed medians; widen nb01
  top_decile_rate tolerance to ±0.10 (cross-seed spread ≈ 0.13 on
  intermediate; discrete small-count metric)
- Rebuild metrics.json, claims_register.{json,md} from updated sources;
  claims c05-c12 now quote 5-seed medians matching the validation report
- CI workflow: add advanced bundle build step (nb04 §4 loads it
  unconditionally; previously only intermediate was built in CI,
  causing the notebooks job to fail)
- nb02 §9 GroupKFold: switch from train+test pool to train-only so
  the GKF and random-split AUCs are an apples-to-apples comparison
  (same 3,500-lead pool); update GKF_TARGET=0.6148, tighten tolerances
- nb04 §8 cohort-shift narrative: remove invented causal explanation
  ("time-varying signal disfavours the late cohort"); replace with
  honest single-seed caveat noting v1 has no baked-in time drift (c14)
- nb04 §11 summary: remove the same misleading "production scenario"
  language from the cohort-shift bullet
- Rebuild all 4 .ipynb files from updated builder scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #84 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 26476558581 attempt 1
Comment timestamp: 2026-05-26T21:36:22.668516+00:00
PR head commit: 134ffe284b7c5fdddffa9b5e0769c13f310ee03d

@shaypal5 shaypal5 merged commit ffc2109 into main May 26, 2026
10 checks passed
@shaypal5 shaypal5 deleted the docs/pr8.3-notebook-teaching-improvements branch June 11, 2026 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: render render/ bundle and artifact output type: docs Documentation or narrative changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants