docs(notebooks): PR 8.3 teaching improvements — banner, stale refs, GroupKFold, advanced calibration by shaypal5 · Pull Request #84 · leadforge-dev/leadforge

shaypal5 · 2026-05-25T20:53:23Z

Summary

This PR does two things, and both matter:

1. Teaching improvements to all four release notebooks

Phase 8 pre-publish teaching improvements for the four release notebooks. All changes are in the canonical builder scripts (scripts/build_release_notebook_0*.py); the .ipynb files are derived artifacts regenerated by the builders.

Notebook 01 — Baseline Lead Scoring

Warning banner (new cell after title): flags that the leakage trap total_touches_all is kept intentionally in this validation-panel notebook; directs students to Notebook 02 for the clean pipeline.
Widened top_decile_rate tolerance to ±0.10: the cross-seed spread on the intermediate tier is ≈ 0.13, so a tighter gate would raise false CI alarms on seed-to-seed noise rather than flag real regressions. Updated the §2 narrative to explain the reasoning.
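To show why the gate width matters, here is a rough sketch. The builder's real metric code is not shown in this PR, so it assumes top_decile_rate is the conversion rate among the 10% highest-scored leads and that a gate passes when abs(observed - target) <= tol. Both function names are hypothetical. If the test split is the same 1,500 leads implied by NB02 (5,000 pooled minus 3,500 train), the top decile holds about 150 leads, so each extra converter moves the rate by roughly 0.007. The metric therefore moves in coarse steps and is noisy across seeds.

```python
def top_decile_rate(scores, labels):
    """Conversion rate among the top 10% of leads by score (assumed definition)."""
    n_top = max(1, len(scores) // 10)  # keep at least one lead in tiny samples
    # Sort lead indices by score, highest first, and keep the top decile.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n_top]) / n_top


def gate_passes(observed, target, tol):
    """Hypothetical tolerance gate: pass when observed lies within +/- tol of target."""
    return abs(observed - target) <= tol


# Toy data: 20 leads, higher index = higher score; the two best-scored leads convert.
scores = [i / 20 for i in range(20)]
labels = [1 if i >= 18 else 0 for i in range(20)]
print(top_decile_rate(scores, labels))  # 1.0
print(gate_passes(0.30, 0.25, tol=0.10), gate_passes(0.40, 0.25, tol=0.10))  # True False
```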

Notebook 02 — Relational Feature Engineering

§8: cleaned up the honest-takeaway section; removed dangling cross-reference.
§9 GroupKFold: switched from the train+test pool (5,000 leads) to train-only (3,500 leads), so the GroupKFold mean AUC and the random-split AUC form an apples-to-apples comparison. Both now draw on the same 3,500-lead training pool, and GroupKFold no longer also sees the 1,500 test leads. Updated GKF_TARGET = 0.6148, tightened GKF_TOL = 0.02, and clarified the narrative.
"## Next" section moved to the end (was before the gate cell).
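The §9 comparison can be sketched as follows. The sketch assumes scikit-learn and NumPy arrays for the training-pool features, labels, and account IDs. It is not the notebook's code: the helper name grouped_cv_auc and the headline_auc argument (standing in for the random-split AUC the notebook reports) are hypothetical. GroupKFold places each account's leads in a single fold, so no account contributes leads to both the fitting and the scoring side of a split. The optimism delta is the headline AUC minus the mean grouped-CV AUC.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold


def grouped_cv_auc(X, y, account_id, headline_auc, n_splits=5):
    """Per-fold AUCs, their mean, and optimism (headline AUC minus grouped-CV mean)."""
    fold_aucs = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=account_id):
        # Sanity check: no account may appear on both sides of a split.
        assert not set(account_id[train_idx]) & set(account_id[test_idx])
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fold_aucs.append(roc_auc_score(y[test_idx], scores))
    mean_auc = float(np.mean(fold_aucs))
    return fold_aucs, mean_auc, headline_auc - mean_auc


# Toy stand-in for the training pool: 500 leads spread over 50 accounts.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
account_id = rng.integers(0, 50, size=500)

fold_aucs, mean_auc, optimism = grouped_cv_auc(X, y, account_id, headline_auc=0.70)
print([round(a, 3) for a in fold_aucs], round(mean_auc, 3), round(optimism, 3))
```

The toy accounts carry no account-level signal, so the printed delta only exercises the code path. On real data with heavy account overlap between splits, a positive delta is the optimism the section sets out to measure.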

Notebook 04 — Lift, Calibration, Value-Aware Ranking

§4 advanced-tier calibration: removed the if/else guard for a missing advanced bundle (the guard became unnecessary once the CI workflow was fixed; see below). The advanced bundle now loads unconditionally.
§8 cohort-shift narrative: removed an invented causal explanation ("time-varying signal in the simulator disfavours the late cohort"). The v1 DGP has no baked-in time drift (claim c14); the ~0.06 AUC drop is a seed-42-specific observation. Replaced with an honest single-seed caveat.
§11 summary: removed the same misleading "production scenario" language from the cohort-shift bullet.
Removed unused _mbe loop variable (Copilot COPILOT-3).
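This PR does not define the calibration_max_bin_error metric used in §4. The sketch below assumes the usual reliability-diagram construction: bucket predictions into equal-width probability bins, compare each non-empty bin's mean predicted probability with its observed conversion rate, and report the largest gap. The bin count and the lack of any minimum-bin-size filter are assumptions. A model can rank leads well (flat AUC) and still score badly on this metric if its probabilities are systematically off. That is why this metric, not AUC, separates the tiers.

```python
def calibration_max_bin_error(probs, labels, n_bins=10):
    """Largest |mean predicted - observed rate| over non-empty equal-width bins."""
    pred_sum = [0.0] * n_bins
    pos_sum = [0] * n_bins
    count = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        pred_sum[b] += p
        pos_sum[b] += y
        count[b] += 1
    return max(
        abs(pred_sum[b] / count[b] - pos_sum[b] / count[b])
        for b in range(n_bins)
        if count[b]
    )


# Toy example: four leads all scored 0.15, but half of them convert.
print(calibration_max_bin_error([0.15] * 4, [1, 0, 1, 0]))  # about 0.35
```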

2. Metrics reset after PR 8.1's leakage fix

PR 8.1 removed has_open_opportunity from the snapshot features (a real post-snapshot leakage column). That changed model metrics substantially — LR AUC dropped from ~0.88 to ~0.67 on the intermediate tier. The validation report, claims register, and release targets had to be regenerated to match.

What changed:

release/validation/validation_report.json: regenerated with seeds 42–46 (full 5-seed sweep). Seeds 43–46 were stale (pre-PR-8.1, dated May 6); they were deleted and rebuilt so that all seeds reflect the leakage-fixed codebase.
release/notebooks/_release_targets.json: updated to 5-seed cross-seed medians.
release/metrics.json + per-tier metrics.json: rebuilt from updated validation report.
release/claims_register_source.yaml + derived claims_register.{json,md}: claims c05–c12 now quote 5-seed medians matching the validation report.
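The standard library is enough to sketch how five per-seed runs collapse into release targets. The nested {seed: {metric: value}} layout, the function name, and the toy numbers are hypothetical; they are not the real validation_report.json schema or results. The median resists a single outlying seed. The min-to-max spread returned alongside it is the quantity that should set how wide each tolerance gate needs to be.

```python
from statistics import median


def cross_seed_targets(per_seed):
    """Map {seed: {metric: value}} to {metric: (median, spread)} across seeds."""
    metrics = next(iter(per_seed.values())).keys()
    targets = {}
    for metric in metrics:
        values = [run[metric] for run in per_seed.values()]
        targets[metric] = (median(values), max(values) - min(values))
    return targets


# Toy values for seeds 42-46 (illustrative only, not release numbers).
runs = {
    42: {"auc": 0.60},
    43: {"auc": 0.62},
    44: {"auc": 0.61},
    45: {"auc": 0.64},
    46: {"auc": 0.63},
}
print(cross_seed_targets(runs))
```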

CI fix (notebooks job was failing):

.github/workflows/ci.yml: added python scripts/build_public_release.py release --tier advanced to the notebooks job. Notebook 04 §4 loads the advanced bundle unconditionally; the previous workflow only built intermediate, causing the notebooks CI job to fail.

What to verify

release-artifacts-sync job: all four sync checks pass.
notebooks job: all four notebooks execute end-to-end; tolerance gates in nb01 §7, nb02 §10, nb04 §10 pass.
test_release_targets_match_report.py: targets file matches the 5-seed validation report.
verify_claims_register.py: all 26 claims resolve within 1e-3.
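The final check compares every quoted value against its source metric using an absolute tolerance. The sketch below assumes each claim records an id, a quoted value, and a (tier, metric) pointer into the validation report. Those field names, the helper, and the toy numbers are hypothetical; this is not verify_claims_register.py's actual interface.

```python
def unresolved_claims(claims, report, tol=1e-3):
    """Return ids of claims whose quoted value differs from the report by more than tol."""
    failures = []
    for claim in claims:
        tier, metric = claim["source"]
        measured = report[tier][metric]
        if abs(claim["value"] - measured) > tol:
            failures.append(claim["id"])
    return failures


# Toy data: claim_a matches within tolerance, claim_b still quotes a stale value.
report = {"intermediate": {"metric_x": 0.5000, "metric_y": 0.7000}}
claims = [
    {"id": "claim_a", "value": 0.5004, "source": ("intermediate", "metric_x")},
    {"id": "claim_b", "value": 0.7500, "source": ("intermediate", "metric_y")},
]
print(unresolved_claims(claims, report))  # ['claim_b']
```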

Files changed

.github/workflows/ci.yml — add advanced bundle build step to notebooks job
.gitignore — exclude node_modules/, .wrangler/, release/_shmuggingface/
scripts/build_release_notebook_01.py — widen tolerance, update narrative
scripts/build_release_notebook_02.py — train-only GKF, move Next section, honest takeaway
scripts/build_release_notebook_04.py — remove guard, honest cohort narrative, fix unused var
release/notebooks/01_baseline_lead_scoring.ipynb — rebuilt
release/notebooks/02_relational_feature_engineering.ipynb — rebuilt
release/notebooks/04_lift_calibration_value_ranking.ipynb — rebuilt
release/validation/validation_report.json — 5-seed regeneration
release/notebooks/_release_targets.json — 5-seed medians
release/metrics.json + per-tier metrics.json — rebuilt
release/claims_register_source.yaml — c05–c12 updated
release/claims_register.{json,md} — rebuilt from source

🤖 Generated with Claude Code

…roupKFold, advanced calibration

Notebook 01 (build_release_notebook_01.py):
- Insert warning callout after title: '⚠️ Validation-panel notebook — leakage trap retained intentionally. Start at Notebook 02 for clean modelling.' Makes the deliberate trap-inclusion explicit for beginners who open nb01 first.
- Remove '(coming in PR 6.2)' from §4 prose reference to Notebook 03 and from §10 Next cell (nb03 and nb04 are shipped).

Notebook 02 (build_release_notebook_02.py):
- §8 Honest takeaway: 'coming in PR 6.2's notebook 04' → 'Notebook 04'.
- §8 Next: remove both '*(coming in PR 6.2)*' stale refs; add §9 cross-reference.
- §9 new: Account-level split — GroupKFold(account_id) 5-fold CV with LR on the pooled train+test set; prints per-fold AUC, mean CV AUC, headline random-split AUC, and the optimism delta. Demonstrates the 93% account overlap limitation concretely.

Notebook 04 (build_release_notebook_04.py):
- §3a new: Advanced-tier calibration demo — loads ../advanced, runs the same LR pipeline, side-by-side reliability diagram (intermediate max-bin err ≈0.13 vs advanced ≈0.52). Confirms AUC is flat across tiers; calibration_max_bin_error is the discriminating metric.

Implementation notes:
- All edits are in the canonical builder scripts; .ipynb files are regenerated and verified byte-stable by tests/scripts/test_release_notebook_builders.py (4 passed).
- Tests: 1419 passed, 5 skipped (pre-existing notebook execution failure on main; unrelated to this PR).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Teaching-focused updates to the canonical release-notebook builder scripts (and regenerated .ipynb artifacts) to reduce learner confusion around leakage traps, remove stale forward references, add an account-grouped evaluation example, and demonstrate calibration failure on the advanced tier.

Changes:

Notebook 01: adds a prominent warning banner about the intentionally-retained leakage trap; removes stale “coming in PR …” forward references.
Notebook 02: removes stale refs and adds a new GroupKFold(account_id) section to quantify optimism from account overlap.
Notebook 04: adds an “advanced tier calibration break” section that loads ../advanced inline and compares reliability diagrams.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


File	Description
scripts/build_release_notebook_01.py	Inserts the new leakage-trap warning banner and removes stale forward refs in generated notebook 01.
scripts/build_release_notebook_02.py	Adds the new GroupKFold section and updates stale prose references in notebook 02 builder.
scripts/build_release_notebook_04.py	Adds the advanced-tier calibration comparison section to notebook 04 builder.
release/notebooks/01_baseline_lead_scoring.ipynb	Regenerated output reflecting notebook 01 builder changes.
release/notebooks/02_relational_feature_engineering.ipynb	Regenerated output reflecting notebook 02 builder changes (new §9).
release/notebooks/04_lift_calibration_value_ranking.ipynb	Regenerated output reflecting notebook 04 builder changes (new §3a).
.agent-plan.md	Marks PR 8.3 items as completed and updates the plan summary.


PR 8.1 removed the has_open_opportunity post-snapshot leak, which was the dominant signal (LR AUC dropped 0.88 → 0.67). This PR re-runs the cross-seed validation sweep with fresh bundles and propagates the new numbers throughout every release artifact that pins metrics.

Changes:
- Regenerate release/validation/validation_report.json (seeds=[42])
- Rebuild release/metrics.json from updated report
- Update release/notebooks/_release_targets.json
- Rebuild all 4 release notebooks from updated builder scripts:
  - nb01: minor banner wording fix
  - nb02: updated inline targets + GroupKFold gate + section fixes
  - nb03: updated inline targets + GBM lift comment
  - nb04: new calibration narrative for advanced tier (score compression at low prevalence), removed _sanitize_adv antipattern, renumbered sections 4-11, updated cohort-shift narrative, updated inline targets
- Update release/claims_register_source.yaml claims c05-c09, c12 with current metric values; rebuild claims_register.{json,md}
- Update validation report figures to match new metrics

All 1415 tests pass (0 failures).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

COPILOT-1 (build_release_notebook_04.py §4): Wrap the advanced-bundle load/train/plot/assert block in an existence guard so CI, which builds only the intermediate bundle, skips the cell cleanly instead of raising FileNotFoundError. The else-branch still runs all assertions and the side-by-side reliability diagram when the advanced bundle IS present. Fix the root cause of the CI notebooks job failure on run 26433263795.

COPILOT-2 (build_release_notebook_02.py §9 GroupKFold): Use the consistent nullable-boolean label conversion .astype('boolean').fillna(False).astype(int) instead of the bare .astype(int), which raises on pd.NA values in the task column.

COPILOT-3 (build_release_notebook_04.py §4 plot loop): Drop the unused _mbe variable from the for-loop tuple (the max-bin-error value was already embedded in the formatted label string).

Also fixes a subtle bug introduced while applying COPILOT-1: the print() strings inside the triple-quoted cell body contained live \n escape sequences. These became real newlines, so dedent() saw zero-indent lines, left the cell body indented, and ruff rejected the generated notebook with 'Unexpected indentation'. Escaped them as \\n so they reach the generated cell as \n escape sequences, which print() renders as newlines at run time.

All 1419 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
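The escaping bug can be reproduced with textwrap.dedent alone. The cell bodies below are invented stand-ins for a builder's triple-quoted string, not the real nb04 source. A live \n splits the print() literal and leaves a fragment at column 0. dedent then finds no common indentation to strip, and the cell stays indented. Doubling the backslash keeps the literal on one line, dedent strips the shared four spaces, and print() still renders the escape as a newline when the cell runs.

```python
import textwrap

# Builder-style cell body indented by four spaces. The "\n" here is a live
# escape, so Python inserts a real newline when this file is parsed and the
# second half of the print() literal starts at column 0.
BROKEN = """
    for tier in ["intermediate", "advanced"]:
        print("tier:\n", tier)
"""

# Same body with the backslash doubled: the source keeps the two characters
# backslash + n, which reach the generated cell as an ordinary escape.
FIXED = """
    for tier in ["intermediate", "advanced"]:
        print("tier:\\n", tier)
"""

broken_cell = textwrap.dedent(BROKEN)  # unchanged: no common indent to strip
fixed_cell = textwrap.dedent(FIXED)    # four-space margin removed

for name, cell in [("broken", broken_cell), ("fixed", fixed_cell)]:
    try:
        compile(cell, f"<{name} cell>", "exec")
        print(name, "parses")
    except SyntaxError as exc:  # IndentationError is a SyntaxError subclass
        print(name, "fails:", type(exc).__name__)
```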

…t GKF, nb04 cohort narrative

- Regenerate validation_report.json with seeds 42-46 (stale 43-46 were pre-PR-8.1; rebuild after deleting them ensures cross-seed medians reflect the leakage-fixed codebase)
- Update _release_targets.json to match 5-seed medians; widen nb01 top_decile_rate tolerance to ±0.10 (cross-seed spread ≈ 0.13 on intermediate; discrete small-count metric)
- Rebuild metrics.json, claims_register.{json,md} from updated sources; claims c05-c12 now quote 5-seed medians matching the validation report
- CI workflow: add advanced bundle build step (nb04 §4 loads it unconditionally; previously only intermediate was built in CI, causing the notebooks job to fail)
- nb02 §9 GroupKFold: switch from train+test pool to train-only so the GKF and random-split AUCs are an apples-to-apples comparison (same 3,500-lead pool); update GKF_TARGET=0.6148, tighten tolerances
- nb04 §8 cohort-shift narrative: remove invented causal explanation ("time-varying signal disfavours the late cohort"); replace with honest single-seed caveat noting v1 has no baked-in time drift (c14)
- nb04 §11 summary: remove the same misleading "production scenario" language from the cohort-shift bullet
- Rebuild all 4 .ipynb files from updated builder scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-26T21:37:11Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #84 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 26476558581 attempt 1
Comment timestamp: 2026-05-26T21:36:22.668516+00:00
PR head commit: 134ffe284b7c5fdddffa9b5e0769c13f310ee03d

Copilot AI review requested due to automatic review settings May 25, 2026 20:53

shaypal5 added this to the v1.0.0 — Polished OSS release milestone May 25, 2026

shaypal5 added the labels type: docs (Documentation or narrative changes) and layer: render (render/ bundle and artifact output) May 25, 2026

Copilot started reviewing on behalf of shaypal5 May 25, 2026 20:53


Copilot AI reviewed May 25, 2026


Comment thread scripts/build_release_notebook_04.py

Comment thread scripts/build_release_notebook_02.py Outdated

Comment thread scripts/build_release_notebook_04.py


shaypal5 merged commit ffc2109 into main May 26, 2026
10 checks passed

shaypal5 deleted the docs/pr8.3-notebook-teaching-improvements branch June 11, 2026 19:44

shaypal5 merged 4 commits into main from docs/pr8.3-notebook-teaching-improvements


2 participants
