feat: windowed snapshot for student_public bundles (3/3 of #57)#59
Conversation
Adds `snapshot_day` as a recipe-level config knob and threads it through the public API. When set, event-aggregate features (touch_count, session_count, expected_acv, days_since_last_touch, ...) only see events in [lead_created_at, lead_created_at + snapshot_day], so they cannot encode post-conversion data. total_touches_all keeps full-horizon counts as the deliberate pedagogical leakage trap, so the gap between the two columns now carries real signal instead of being structurally zero. The b2b_saas_procurement_v1 recipe pins snapshot_day=30, picked from measurements at seed=42, n_leads=5000. Conversion rates are invariant to snapshot_day (the label is event-derived from label_window_days), so intro/intermediate/advanced rates stay at 0.422/0.210/0.079 — well inside the declared difficulty_profiles ranges. Day 30 keeps LR AUC in the 0.85-0.86 band (challenging but modelable) while preserving a trap gap of ~3.0 touches with 54-77% of leads showing any gap. BUNDLE_SCHEMA_VERSION bumped 3 → 4. The published column SET is unchanged, but column VALUES are no longer full-horizon, which v3 consumers cannot detect from schema alone. manifest.snapshot_day is recorded so the contract is self-describing (null = legacy full-horizon). The schema contract test is renamed to test_bundle_schema_v4_contract.py and gains a new assertion that manifest.snapshot_day == 30. A new test_windowed_bundle_trap.py guards the pedagogical invariant: total_touches_all >= touch_count for every lead and > for at least some. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- release/README.md: refreshed leakage handling section with the v4 schema bump, the snapshot/redaction pair, and the new pedagogical framing of total_touches_all. Removed the open-caveat block — the underlying caveat (event aggregates over the label window) is fixed. - release/HF_DATASET_CARD.md: matched the README updates and bumped the schema-version reference. - CHANGELOG.md: documented the schema v4 entry (windowed snapshot, manifest.snapshot_day, contract test, trap invariant guard) and dropped the "open follow-up" line that previously flagged sub-item 1 of #57. - .agent-plan.md: marked all three sub-items of #57 resolved; added the windowed snapshot bullet to the Phase 5 release checklist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR finalizes the “windowed snapshot” contract for student_public release bundles by introducing a snapshot_day configuration (pinned to 30 for b2b_saas_procurement_v1), recording it in manifest.json, and bumping the bundle schema version to v4 to reflect the semantic shift in aggregate-feature values.
Changes:
- Add
snapshot_dayto the generation config/recipe resolution stack and thread it into bundle rendering; record it inmanifest.json. - Bump
BUNDLE_SCHEMA_VERSIONfrom"3"→"4"and update schema contract tests accordingly. - Update release documentation and add an end-to-end test to ensure the
total_touches_allleakage trap meaningfully diverges from windowedtouch_count.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/core/models.py |
Adds GenerationConfig.snapshot_day and validation logic. |
leadforge/api/recipes.py |
Adds Recipe.snapshot_day parsing and threads it through config precedence resolution. |
leadforge/api/generator.py |
Exposes snapshot_day override in Generator.from_recipe(). |
leadforge/api/bundle.py |
Passes snapshot_day into snapshot building during bundle writing. |
leadforge/render/manifests.py |
Bumps schema version to v4 and records snapshot_day in manifest.json. |
leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml |
Pins snapshot_day: 30 for the release recipe with rationale. |
tests/render/test_bundle_schema_v4_contract.py |
Renames/updates contract to v4 and asserts manifest.snapshot_day. |
tests/render/test_windowed_bundle_trap.py |
Adds end-to-end guard that the leakage trap still produces a non-zero gap for some leads. |
release/README.md |
Updates public release README to describe windowed snapshot + trap behavior. |
release/HF_DATASET_CARD.md |
Updates HF dataset card leakage-handling section for schema v4 + snapshot window. |
CHANGELOG.md |
Documents bundle schema v4 semantic change and test additions. |
.agent-plan.md |
Marks issue #57 sub-item as resolved and updates release checklist notes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Self-review of the windowed-snapshot PR surfaced seven gaps; this commit addresses all of them. - Add tests/test_snapshot_day_threading.py — 22 tests covering Recipe.from_dict parsing (positive int, absent → None, rejection of zero/negative/string/bool), resolve_config precedence across all four layers (package default, recipe, override dict, kwarg), and GenerationConfig validation paths. Models the existing test_primary_task_threading.py. - Validate snapshot_day <= label_window_days in GenerationConfig.__post_init__. A snapshot anchored after the label closes would let features observe events past the label-scoring window — the exact structural leakage v4 is here to prevent. - Strengthen test_windowed_bundle_trap.py: replace the bare ``> 0`` check with ``frac_with_gap >= 0.20``. Measurements at the recipe default show 54-77% of leads carry a gap; a regression that collapsed the trap to a single lead would no longer pass silently. - Pin snapshot_day, primary_task, label_window_days, redacted_columns in the required-keys set in tests/render/test_render.py. These were already in the manifest but not formally pinned anywhere outside the v4 contract test. - Surface snapshot_day in the bundle-internal dataset_card.md: add ``Label window`` and ``Feature snapshot window`` rows to the header table, and replace the generic "Features are anchored at the snapshot date" caveat with concrete prose naming the day-30 feature window, the day-90 label window, and the total_touches_all trap. Consumers reading the card now see the v4 contract without needing the changelog. - Reconcile snapshot_day validation messages: both layers now say "must be a positive int or None/null". - Trim the recipe.yaml comment to a one-liner pointing at CHANGELOG / release docs for the rationale. 937 tests pass (was 915; +22 from the new threading file). ruff + mypy clean; verify_hash_determinism.py PASS 73/73. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
The previous wording — "the label is then resolved over the next 60 days" — implied the label was scored over [day 30, day 90]. In fact ``converted_within_90_days`` is evaluated over the full 90 days from lead creation; only the *features* are restricted to the first 30 days. Same misleading phrasing existed in the auto-generated dataset_card.md caveat and is corrected too, so consumers reading the in-bundle card get the same accurate description. Resolves Copilot review comment on release/README.md:74. The matching review comment on leadforge/core/models.py:104 was already addressed in f19d58e (snapshot_day <= label_window_days validation block); resolved as already-treated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #59 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.Run metadata: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "- Features are anchored at the snapshot date. No post-anchor data is " | ||
| "included (leakage-free by construction)." |
Summary
Closes the final sub-item of issue #57: event-aggregate features in
student_publicbundles are no longer computed over the same 90-day window the label resolves in.The
b2b_saas_procurement_v1recipe now pinssnapshot_day: 30.build_snapshot()filters touches, sessions, sales activities, and opportunities to events with timestamp ≤lead_created_at + snapshot_dayfor every aggregate column.total_touches_allis the only feature that intentionally crosses the line — it stays at full-horizon counts as the deliberate pedagogical leakage trap, and now actually diverges fromtouch_countfor 54–77% of leads (it was structurally identical before).BUNDLE_SCHEMA_VERSIONis bumped 3 → 4 (confirmed with the user before bumping). The published column SET is unchanged from v3, but column VALUES are no longer full-horizon — a contract shift v3 consumers would not detect from schema alone.manifest.snapshot_dayrecords the contract so the bundle is self-describing.Snapshot day measurements
Measured at seed 42, n_leads=5000, with a synthetic LR baseline (one-hot + standardize + impute) on a held-out 30% split:
Day 30 was chosen as the trade-off point: LR AUC stays in [0.85, 0.86] across tiers — challenging but modelable — while a meaningful trap gap (~3 touches; > 50% of leads) is preserved. Day 20 was a viable runner-up but pushed AUC into the marginal 0.78–0.82 band. Day 45 collapsed the trap (only 46–58% of leads showing any gap).
Conversion rates: before/after
difficulty_profiles.yaml)Conversion rates are invariant. The label is event-derived from
label_window_daysin the simulator (leadforge/simulation/engine.py) and is independent ofsnapshot_day; only feature aggregation is windowed. Declared difficulty ranges still hold under the existing_RATE_TOLERANCE = 0.05invalidation/difficulty.py, so no profile recalibration was needed.What changed
Code
leadforge/core/models.py:GenerationConfig.snapshot_day: int | None = Nonewith positive-int +≤ horizon_daysvalidation.leadforge/api/recipes.py:Recipe.snapshot_dayfield; threaded through all 4 precedence layers inresolve_config().leadforge/api/generator.py:snapshot_daykwarg onGenerator.from_recipe().leadforge/api/bundle.py: passed tobuild_snapshot().leadforge/render/manifests.py:BUNDLE_SCHEMA_VERSION = "4"with history note;manifest.snapshot_dayrecorded.leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml:snapshot_day: 30with rationale comment.Tests
tests/render/test_bundle_schema_v3_contract.py→test_bundle_schema_v4_contract.py; updated all assertions to v4; addedtest_manifest_records_snapshot_daycheckingmanifest.snapshot_day == 30.tests/render/test_windowed_bundle_trap.py: assertstotal_touches_all >= touch_countfor every lead and>for at least some — guards against a future refactor that silently widenstouch_countback to the full horizon and collapses the trap.Docs
release/README.md,release/HF_DATASET_CARD.md: refreshed leakage handling section to describe the snapshot+redaction pair, with thetotal_touches_allframing updated.CHANGELOG.md: schema v4 entry..agent-plan.md: all three Structural leakage in student_public bundles (post-#56 follow-up) #57 sub-items marked resolved; release-checklist updated.Verification
pytest: 915 passed (was 911 baseline; +4 tests from the new contract assertion + trap guard).ruff checkandruff format --check: clean.mypy leadforge/: clean (78 source files).scripts/verify_hash_determinism.py: PASS, 73/73 files identical across runs.validate_bundle().manifest.jsonshowsbundle_schema_version: "4"andsnapshot_day: 30for every released bundle.Test plan
recipe.yaml.manifest.snapshot_dayfield appears in regenerated bundles.total_touches_all > touch_countfor some leads in a generated bundle.python scripts/verify_hash_determinism.pyto independently confirm the determinism claim.Out of scope
🤖 Generated with Claude Code