docs(notebooks): PR 8.3 teaching improvements — banner, stale refs, GroupKFold, advanced calibration#84
Merged
Conversation
…roupKFold, advanced calibration Notebook 01 (build_release_notebook_01.py): - Insert warning callout after title: '⚠️ Validation-panel notebook — leakage trap retained intentionally. Start at Notebook 02 for clean modelling.' Makes the deliberate trap-inclusion explicit for beginners who open nb01 first. - Remove '(coming in PR 6.2)' from §4 prose reference to Notebook 03 and from §10 Next cell (nb03 and nb04 are shipped). Notebook 02 (build_release_notebook_02.py): - §8 Honest takeaway: 'coming in PR 6.2's notebook 04' → 'Notebook 04'. - §8 Next: remove both '*(coming in PR 6.2)*' stale refs; add §9 cross-reference. - §9 new: Account-level split — GroupKFold(account_id) 5-fold CV with LR on the pooled train+test set; prints per-fold AUC, mean CV AUC, headline random-split AUC, and the optimism delta. Demonstrates the 93% account overlap limitation concretely. Notebook 04 (build_release_notebook_04.py): - §3a new: Advanced-tier calibration demo — loads ../advanced, runs the same LR pipeline, side-by-side reliability diagram (intermediate max-bin err ≈0.13 vs advanced ≈0.52). Confirms AUC is flat across tiers; calibration_max_bin_error is the discriminating metric. Implementation notes: - All edits are in the canonical builder scripts; .ipynb files are regenerated and verified byte-stable by tests/scripts/test_release_notebook_builders.py (4 passed). - Tests: 1419 passed, 5 skipped (pre-existing notebook execution failure on main; unrelated to this PR). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Teaching-focused updates to the canonical release-notebook builder scripts (and regenerated .ipynb artifacts) to reduce learner confusion around leakage traps, remove stale forward references, add an account-grouped evaluation example, and demonstrate calibration failure on the advanced tier.
Changes:
- Notebook 01: adds a prominent warning banner about the intentionally-retained leakage trap; removes stale “coming in PR …” forward references.
- Notebook 02: removes stale refs and adds a new
GroupKFold(account_id)section to quantify optimism from account overlap. - Notebook 04: adds an “advanced tier calibration break” section that loads
../advancedinline and compares reliability diagrams.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/build_release_notebook_01.py | Inserts the new leakage-trap warning banner and removes stale forward refs in generated notebook 01. |
| scripts/build_release_notebook_02.py | Adds the new GroupKFold section and updates stale prose references in notebook 02 builder. |
| scripts/build_release_notebook_04.py | Adds the advanced-tier calibration comparison section to notebook 04 builder. |
| release/notebooks/01_baseline_lead_scoring.ipynb | Regenerated output reflecting notebook 01 builder changes. |
| release/notebooks/02_relational_feature_engineering.ipynb | Regenerated output reflecting notebook 02 builder changes (new §9). |
| release/notebooks/04_lift_calibration_value_ranking.ipynb | Regenerated output reflecting notebook 04 builder changes (new §3a). |
| .agent-plan.md | Marks PR 8.3 items as completed and updates the plan summary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR 8.1 removed the has_open_opportunity post-snapshot leak, which was
the dominant signal (LR AUC dropped 0.88 → 0.67). This PR re-runs
the cross-seed validation sweep with fresh bundles and propagates the
new numbers throughout every release artifact that pins metrics.
Changes:
- Regenerate release/validation/validation_report.json (seeds=[42])
- Rebuild release/metrics.json from updated report
- Update release/notebooks/_release_targets.json
- Rebuild all 4 release notebooks from updated builder scripts:
- nb01: minor banner wording fix
- nb02: updated inline targets + GroupKFold gate + section fixes
- nb03: updated inline targets + GBM lift comment
- nb04: new calibration narrative for advanced tier (score compression
at low prevalence), removed _sanitize_adv antipattern, renumbered
sections 4-11, updated cohort-shift narrative, updated inline targets
- Update release/claims_register_source.yaml claims c05-c09, c12 with
current metric values; rebuild claims_register.{json,md}
- Update validation report figures to match new metrics
All 1415 tests pass (0 failures).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
COPILOT-1 (build_release_notebook_04.py §4): Wrap the advanced-bundle
load/train/plot/assert block in an existence guard so CI — which builds
only the intermediate bundle — skips the cell cleanly instead of
raising FileNotFoundError. The else-branch still runs all assertions
and the side-by-side reliability diagram when the advanced bundle IS
present. Fix the root cause of the CI notebooks job failure on run
26433263795.
COPILOT-2 (build_release_notebook_02.py §9 GroupKFold): Use the
consistent nullable-boolean label conversion
.astype('boolean').fillna(False).astype(int)
instead of the bare .astype(int) that raises on pd.NA values in the
task column.
COPILOT-3 (build_release_notebook_04.py §4 plot loop): Drop the unused
_mbe variable from the for-loop tuple (the max-bin-error value was
already embedded in the formatted label string).
Also fixes a subtle bug introduced while applying COPILOT-1: the print()
strings inside the triple-quoted cell body contained live \n escape
sequences which dedent() read as zero-indent lines, causing ruff to
reject the generated notebook with 'Unexpected indentation'. Escaped
them as \\n so they reach the executed cell as literal newline chars.
All 1419 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
…t GKF, nb04 cohort narrative
- Regenerate validation_report.json with seeds 42-46 (stale 43-46 were
pre-PR-8.1; rebuild after deleting them ensures cross-seed medians
reflect the leakage-fixed codebase)
- Update _release_targets.json to match 5-seed medians; widen nb01
top_decile_rate tolerance to ±0.10 (cross-seed spread ≈ 0.13 on
intermediate; discrete small-count metric)
- Rebuild metrics.json, claims_register.{json,md} from updated sources;
claims c05-c12 now quote 5-seed medians matching the validation report
- CI workflow: add advanced bundle build step (nb04 §4 loads it
unconditionally; previously only intermediate was built in CI,
causing the notebooks job to fail)
- nb02 §9 GroupKFold: switch from train+test pool to train-only so
the GKF and random-split AUCs are an apples-to-apples comparison
(same 3,500-lead pool); update GKF_TARGET=0.6148, tighten tolerances
- nb04 §8 cohort-shift narrative: remove invented causal explanation
("time-varying signal disfavours the late cohort"); replace with
honest single-seed caveat noting v1 has no baked-in time drift (c14)
- nb04 §11 summary: remove the same misleading "production scenario"
language from the cohort-shift bullet
- Rebuild all 4 .ipynb files from updated builder scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #84 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR does two things, and both matter:
1. Teaching improvements to all four release notebooks
Phase 8 pre-publish teaching improvements for the four release notebooks. All changes are in the canonical builder scripts (
scripts/build_release_notebook_0*.py); the.ipynbfiles are derived artifacts regenerated by the builders.Notebook 01 — Baseline Lead Scoring
total_touches_allis kept intentionally in this validation-panel notebook; directs students to Notebook 02 for the clean pipeline.top_decile_ratetolerance to ±0.10: the cross-seed spread on the intermediate tier is ≈ 0.13, making a tighter gate a false CI alarm rather than a real regression. Updated §2 narrative to explain the reasoning.Notebook 02 — Relational Feature Engineering
GKF_TARGET = 0.6148, tightenedGKF_TOL = 0.02, clarified narrative.Notebook 04 — Lift, Calibration, Value-Aware Ranking
if/elseguard for missing advanced bundle (guard became unnecessary once the CI workflow was fixed; see below). Advanced bundle load is now unconditional._mbeloop variable (Copilot COPILOT-3).2. Metrics reset after PR 8.1's leakage fix
PR 8.1 removed
has_open_opportunityfrom the snapshot features (a real post-snapshot leakage column). That changed model metrics substantially — LR AUC dropped from ~0.88 to ~0.67 on the intermediate tier. The validation report, claims register, and release targets had to be regenerated to match.What changed:
release/validation/validation_report.json: regenerated with seeds 42–46 (full 5-seed sweep). Seeds 43–46 were stale (pre-PR-8.1, dated May 6); deleted and rebuilt so all seeds reflect the leakage-fixed codebase.release/notebooks/_release_targets.json: updated to 5-seed cross-seed medians.release/metrics.json+ per-tiermetrics.json: rebuilt from updated validation report.release/claims_register_source.yaml+ derivedclaims_register.{json,md}: claims c05–c12 now quote 5-seed medians matching the validation report.CI fix (notebooks job was failing):
.github/workflows/ci.yml: addedpython scripts/build_public_release.py release --tier advancedto the notebooks job. Notebook 04 §4 loads the advanced bundle unconditionally; the previous workflow only built intermediate, causing the notebooks CI job to fail.What to verify
release-artifacts-syncjob: all four sync checks pass.notebooksjob: all four notebooks execute end-to-end; tolerance gates in nb01 §7, nb02 §10, nb04 §10 pass.test_release_targets_match_report.py: targets file matches the 5-seed validation report.verify_claims_register.py: all 26 claims resolve within 1e-3.Files changed
.github/workflows/ci.yml— add advanced bundle build step to notebooks job.gitignore— excludenode_modules/,.wrangler/,release/_shmuggingface/scripts/build_release_notebook_01.py— widen tolerance, update narrativescripts/build_release_notebook_02.py— train-only GKF, move Next section, honest takeawayscripts/build_release_notebook_04.py— remove guard, honest cohort narrative, fix unused varrelease/notebooks/01_baseline_lead_scoring.ipynb— rebuiltrelease/notebooks/02_relational_feature_engineering.ipynb— rebuiltrelease/notebooks/04_lift_calibration_value_ranking.ipynb— rebuiltrelease/validation/validation_report.json— 5-seed regenerationrelease/notebooks/_release_targets.json— 5-seed mediansrelease/metrics.json+ per-tiermetrics.json— rebuiltrelease/claims_register_source.yaml— c05–c12 updatedrelease/claims_register.{json,md}— rebuilt from source🤖 Generated with Claude Code