feat: windowed snapshot for student_public bundles (3/3 of #57) by shaypal5 · Pull Request #59 · leadforge-dev/leadforge

shaypal5 · 2026-05-05T04:45:44Z

Summary

Closes the final sub-item of issue #57: event-aggregate features in student_public bundles are no longer computed over the same 90-day window the label resolves in.

The b2b_saas_procurement_v1 recipe now pins snapshot_day: 30. build_snapshot() filters touches, sessions, sales activities, and opportunities to events with timestamp ≤ lead_created_at + snapshot_day for every aggregate column. total_touches_all is the only feature that intentionally crosses the line — it stays at full-horizon counts as the deliberate pedagogical leakage trap, and now actually diverges from touch_count for 54–77% of leads (it was structurally identical before).

BUNDLE_SCHEMA_VERSION is bumped 3 → 4 (confirmed with the user before bumping). The published column SET is unchanged from v3, but column VALUES are no longer full-horizon — a contract shift v3 consumers would not detect from schema alone. manifest.snapshot_day records the contract so the bundle is self-describing.

Snapshot day measurements

Measured at seed 42 and n_leads=5000, using a baseline logistic-regression (LR) pipeline (one-hot encoding + standardization + imputation) evaluated on a held-out 30% split:

| tier | snapshot day | conv. rate | LR AUC | mean(total_touches_all − touch_count) | frac. of leads with gap |
|---|---|---|---|---|---|
| intro | 14 | 0.422 | 0.780 | 5.94 | 0.89 |
| intro | 20 | 0.422 | 0.828 | 4.65 | 0.86 |
| intro | 30 | 0.422 | 0.854 | 2.96 | 0.77 |
| intro | 45 | 0.422 | 0.877 | 1.40 | 0.58 |
| intro | 90 | 0.422 | 0.887 | 0.00 | 0.00 |
| intermediate | 14 | 0.210 | 0.778 | 6.19 | 0.78 |
| intermediate | 20 | 0.210 | 0.820 | 4.91 | 0.76 |
| intermediate | 30 | 0.210 | 0.861 | 3.22 | 0.70 |
| intermediate | 45 | 0.210 | 0.889 | 1.64 | 0.56 |
| intermediate | 90 | 0.210 | 0.892 | 0.00 | 0.00 |
| advanced | 14 | 0.079 | 0.741 | 6.19 | 0.61 |
| advanced | 20 | 0.079 | 0.820 | 4.91 | 0.59 |
| advanced | 30 | 0.079 | 0.862 | 3.25 | 0.54 |
| advanced | 45 | 0.079 | 0.886 | 1.68 | 0.46 |
| advanced | 90 | 0.079 | 0.906 | 0.00 | 0.00 |

Day 30 was chosen as the trade-off point: LR AUC stays at 0.854–0.862 across tiers (challenging but modelable), while a meaningful trap gap is preserved (about 3 touches on average, with 54–77% of leads showing a gap). Day 20 was a viable runner-up but pushed AUC down to 0.820–0.828. Day 45 shrank the trap to a mean gap of 1.40–1.68 touches, with only 46–58% of leads showing any gap.
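The last two columns of the table can be recomputed from a bundle's feature columns. This is a small sketch, and `trap_gap_stats` is a hypothetical helper, not part of leadforge. It returns the mean gap and the fraction of leads whose gap is strictly positive.

```python
from statistics import fmean


def trap_gap_stats(
    total_touches_all: list[int], touch_count: list[int]
) -> tuple[float, float]:
    """Return (mean gap, fraction of leads whose gap is > 0)."""
    if not touch_count or len(total_touches_all) != len(touch_count):
        raise ValueError("columns must be non-empty and of equal length")
    gaps = [total - windowed for total, windowed in zip(total_touches_all, touch_count)]
    frac_with_gap = sum(gap > 0 for gap in gaps) / len(gaps)
    return fmean(gaps), frac_with_gap
```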

Conversion rates: before/after

| tier | before (v3, day 90) | after (v4, day 30) | declared range (`difficulty_profiles.yaml`) |
|---|---|---|---|
| intro | 41.5% | 41.5% | 30–45% |
| intermediate | 20.1% | 20.1% | 18–28% |
| advanced | 7.9% | 7.9% | 8–15% |

Conversion rates are invariant. The label is event-derived from label_window_days in the simulator (leadforge/simulation/engine.py) and is independent of snapshot_day. Only feature aggregation is windowed. Declared difficulty ranges still hold under the existing _RATE_TOLERANCE = 0.05 in validation/difficulty.py: advanced's 7.9% sits just below its 8% floor but is within tolerance. No profile recalibration was needed.
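The tolerance check can be sketched as follows. This assumes `_RATE_TOLERANCE` widens each declared bound by an absolute 0.05. That reading is an assumption, and `rate_within_profile` is a hypothetical name, not the function in `validation/difficulty.py`. Under this reading, all three v4 rates pass, and only advanced depends on the tolerance.

```python
RATE_TOLERANCE = 0.05  # mirrors _RATE_TOLERANCE; applying it as an absolute widening is an assumption


def rate_within_profile(
    rate: float, low: float, high: float, tol: float = RATE_TOLERANCE
) -> bool:
    """True if rate falls inside [low, high] widened by tol on both sides."""
    return low - tol <= rate <= high + tol


# (observed v4 rate, declared low, declared high) taken from the before/after table.
PROFILES = {
    "intro": (0.415, 0.30, 0.45),
    "intermediate": (0.201, 0.18, 0.28),
    "advanced": (0.079, 0.08, 0.15),
}
```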

What changed

Code

leadforge/core/models.py: GenerationConfig.snapshot_day: int | None = None with positive-int + ≤ horizon_days validation.
leadforge/api/recipes.py: Recipe.snapshot_day field; threaded through all 4 precedence layers in resolve_config().
leadforge/api/generator.py: snapshot_day kwarg on Generator.from_recipe().
leadforge/api/bundle.py: passed to build_snapshot().
leadforge/render/manifests.py: BUNDLE_SCHEMA_VERSION = "4" with history note; manifest.snapshot_day recorded.
leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml: snapshot_day: 30 with rationale comment.

Tests

Renamed tests/render/test_bundle_schema_v3_contract.py → test_bundle_schema_v4_contract.py; updated all assertions to v4; added test_manifest_records_snapshot_day checking manifest.snapshot_day == 30.
New tests/render/test_windowed_bundle_trap.py: asserts total_touches_all >= touch_count for every lead and > for at least some — guards against a future refactor that silently widens touch_count back to the full horizon and collapses the trap.
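The trap guard reduces to two checks on the paired columns. The sketch below is hypothetical and is not the test file's actual code. It uses the `frac_with_gap >= 0.20` threshold from the follow-up commit rather than the original bare `> 0`.

```python
def check_trap_invariant(
    total_touches_all: list[int],
    touch_count: list[int],
    min_frac_with_gap: float = 0.20,
) -> float:
    """Raise AssertionError if the trap is inverted or has collapsed; return frac_with_gap."""
    if not touch_count or len(total_touches_all) != len(touch_count):
        raise ValueError("columns must be non-empty and of equal length")
    gaps = [total - windowed for total, windowed in zip(total_touches_all, touch_count)]
    # Every lead: the full-horizon count can never fall below the windowed count.
    if any(gap < 0 for gap in gaps):
        raise AssertionError("touch_count exceeds total_touches_all for some lead")
    # Enough leads must carry a gap, or the window has silently widened back.
    frac_with_gap = sum(gap > 0 for gap in gaps) / len(gaps)
    if frac_with_gap < min_frac_with_gap:
        raise AssertionError(
            f"only {frac_with_gap:.0%} of leads carry a gap (< {min_frac_with_gap:.0%})"
        )
    return frac_with_gap
```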

Docs

release/README.md, release/HF_DATASET_CARD.md: refreshed the leakage-handling section to describe the snapshot + redaction pair and updated the total_touches_all framing.
CHANGELOG.md: schema v4 entry.
.agent-plan.md: all three sub-items of #57 (Structural leakage in student_public bundles, post-#56 follow-up) marked resolved; release checklist updated.

Verification

pytest: 915 passed (was 911 baseline; +4 tests from the new contract assertion + trap guard).
ruff check and ruff format --check: clean.
mypy leadforge/: clean (78 source files).
scripts/verify_hash_determinism.py: PASS, 73/73 files identical across runs.
All four release bundles (intro, intermediate, advanced, intermediate_instructor) regenerate and pass validate_bundle().
manifest.json shows bundle_schema_version: "4" and snapshot_day: 30 for every released bundle.
Trap invariant test passes; v4 contract test passes.

Test plan

Reviewer: spot-check that the documented snapshot-day choice matches recipe.yaml.
Reviewer: confirm manifest.snapshot_day field appears in regenerated bundles.
Reviewer: confirm total_touches_all > touch_count for some leads in a generated bundle.
Reviewer: pull the branch and re-run python scripts/verify_hash_determinism.py to independently confirm the determinism claim.
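For the manifest spot-checks, a reviewer could use something like the sketch below. It assumes `manifest.json` sits at the root of each bundle directory, which is an assumption about the layout. `load_manifest` and `check_v4_manifest` are hypothetical helpers, not leadforge functions.

```python
import json
from pathlib import Path


def load_manifest(bundle_dir: Path) -> dict:
    """Read manifest.json from a bundle directory (assumed to sit at the bundle root)."""
    return json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))


def check_v4_manifest(manifest: dict, expected_snapshot_day: int = 30) -> list[str]:
    """Return a list of problems; an empty list means the manifest matches the v4 contract."""
    problems = []
    version = manifest.get("bundle_schema_version")
    if version != "4":
        problems.append(f"bundle_schema_version={version!r}, expected '4'")
    if "snapshot_day" not in manifest:
        problems.append("snapshot_day missing from manifest")
    elif manifest["snapshot_day"] != expected_snapshot_day:
        problems.append(
            f"snapshot_day={manifest['snapshot_day']!r}, expected {expected_snapshot_day}"
        )
    return problems
```

Run `check_v4_manifest(load_manifest(bundle_dir))` against each regenerated bundle directory. An empty list means both fields match.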

Out of scope

Uploading to Kaggle/HuggingFace (manual user step, comes after this PR merges).
Adding new feature columns. The snapshot-day shift may make the current feature set differently informative; resisting the urge to expand here keeps the diff scoped.
Rewriting v4–v7 simplified-CSV pipelines. They have their own snapshot windows and don't track the release-bundle change.

🤖 Generated with Claude Code

Adds `snapshot_day` as a recipe-level config knob and threads it through the public API. When set, event-aggregate features (touch_count, session_count, expected_acv, days_since_last_touch, ...) only see events in [lead_created_at, lead_created_at + snapshot_day], so they cannot encode events from later in the label window. total_touches_all keeps full-horizon counts as the deliberate pedagogical leakage trap, so the gap between the two columns now carries real signal instead of being structurally zero.

The b2b_saas_procurement_v1 recipe pins snapshot_day=30, picked from measurements at seed=42, n_leads=5000. Conversion rates are invariant to snapshot_day (the label is event-derived from label_window_days), so intro/intermediate/advanced rates stay at 0.422/0.210/0.079. These are within the declared difficulty_profiles ranges; advanced's 0.079 is just under its 0.08 floor and passes via the 0.05 tolerance. Day 30 keeps LR AUC in the 0.85-0.86 band (challenging but modelable) while preserving a trap gap of ~3.0 touches, with 54-77% of leads showing any gap.

BUNDLE_SCHEMA_VERSION bumped 3 → 4. The published column SET is unchanged, but column VALUES are no longer full-horizon, which v3 consumers cannot detect from schema alone. manifest.snapshot_day is recorded so the contract is self-describing (null = legacy full-horizon).

The schema contract test is renamed to test_bundle_schema_v4_contract.py and gains a new assertion that manifest.snapshot_day == 30. A new test_windowed_bundle_trap.py guards the pedagogical invariant: total_touches_all >= touch_count for every lead and > for at least some.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
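On the consumer side, the "null = legacy full-horizon" convention can be handled as sketched below. This assumes v3 manifests simply lack the `snapshot_day` key, and that the caller knows the horizon length. `feature_window_days` is a hypothetical helper.

```python
def feature_window_days(manifest: dict, horizon_days: int) -> int:
    """Number of post-creation days the aggregate features were computed over."""
    snapshot_day = manifest.get("snapshot_day")  # assumed absent in v3 manifests
    if snapshot_day is None:
        # null (or missing) = legacy full-horizon aggregation
        return horizon_days
    return snapshot_day
```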

- release/README.md: refreshed the leakage-handling section with the v4 schema bump, the snapshot/redaction pair, and the new pedagogical framing of total_touches_all. Removed the open-caveat block, since the underlying caveat (event aggregates over the label window) is fixed.
- release/HF_DATASET_CARD.md: matched the README updates and bumped the schema-version reference.
- CHANGELOG.md: documented the schema v4 entry (windowed snapshot, manifest.snapshot_day, contract test, trap invariant guard) and dropped the "open follow-up" line that previously flagged sub-item 1 of #57.
- .agent-plan.md: marked all three sub-items of #57 resolved; added the windowed snapshot bullet to the Phase 5 release checklist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR finalizes the “windowed snapshot” contract for student_public release bundles by introducing a snapshot_day configuration (pinned to 30 for b2b_saas_procurement_v1), recording it in manifest.json, and bumping the bundle schema version to v4 to reflect the semantic shift in aggregate-feature values.

Changes:

Add snapshot_day to the generation config/recipe resolution stack and thread it into bundle rendering; record it in manifest.json.
Bump BUNDLE_SCHEMA_VERSION from "3" → "4" and update schema contract tests accordingly.
Update release documentation and add an end-to-end test to ensure the total_touches_all leakage trap meaningfully diverges from windowed touch_count.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


| File | Description |
|---|---|
| `leadforge/core/models.py` | Adds `GenerationConfig.snapshot_day` and validation logic. |
| `leadforge/api/recipes.py` | Adds `Recipe.snapshot_day` parsing and threads it through config precedence resolution. |
| `leadforge/api/generator.py` | Exposes `snapshot_day` override in `Generator.from_recipe()`. |
| `leadforge/api/bundle.py` | Passes `snapshot_day` into snapshot building during bundle writing. |
| `leadforge/render/manifests.py` | Bumps schema version to v4 and records `snapshot_day` in `manifest.json`. |
| `leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml` | Pins `snapshot_day: 30` for the release recipe with rationale. |
| `tests/render/test_bundle_schema_v4_contract.py` | Renames/updates contract to v4 and asserts `manifest.snapshot_day`. |
| `tests/render/test_windowed_bundle_trap.py` | Adds end-to-end guard that the leakage trap still produces a non-zero gap for some leads. |
| `release/README.md` | Updates public release README to describe windowed snapshot + trap behavior. |
| `release/HF_DATASET_CARD.md` | Updates HF dataset card leakage-handling section for schema v4 + snapshot window. |
| `CHANGELOG.md` | Documents bundle schema v4 semantic change and test additions. |
| `.agent-plan.md` | Marks issue #57 sub-item as resolved and updates release checklist notes. |


Self-review of the windowed-snapshot PR surfaced seven gaps; this commit addresses all of them.

- Add tests/test_snapshot_day_threading.py: 22 tests covering Recipe.from_dict parsing (positive int, absent → None, rejection of zero/negative/string/bool), resolve_config precedence across all four layers (package default, recipe, override dict, kwarg), and GenerationConfig validation paths. Models the existing test_primary_task_threading.py.
- Validate snapshot_day <= label_window_days in GenerationConfig.__post_init__. A snapshot anchored after the label closes would let features observe events past the label-scoring window, which is exactly the structural leakage v4 is meant to prevent.
- Strengthen test_windowed_bundle_trap.py: replace the bare ``> 0`` check with ``frac_with_gap >= 0.20``. Measurements at the recipe default show 54-77% of leads carry a gap; a regression that collapsed the trap to a single lead would no longer pass silently.
- Pin snapshot_day, primary_task, label_window_days, redacted_columns in the required-keys set in tests/render/test_render.py. These were already in the manifest but not formally pinned anywhere outside the v4 contract test.
- Surface snapshot_day in the bundle-internal dataset_card.md: add ``Label window`` and ``Feature snapshot window`` rows to the header table, and replace the generic "Features are anchored at the snapshot date" caveat with concrete prose naming the day-30 feature window, the day-90 label window, and the total_touches_all trap. Consumers reading the card now see the v4 contract without needing the changelog.
- Reconcile snapshot_day validation messages: both layers now say "must be a positive int or None/null".
- Trim the recipe.yaml comment to a one-liner pointing at CHANGELOG / release docs for the rationale.

937 tests pass (was 915; +22 from the new threading file). ruff + mypy clean; verify_hash_determinism.py PASS 73/73.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The previous wording, "the label is then resolved over the next 60 days", implied the label was scored over [day 30, day 90]. In fact ``converted_within_90_days`` is evaluated over the full 90 days from lead creation; only the *features* are restricted to the first 30 days. The same misleading phrasing in the auto-generated dataset_card.md caveat is corrected too, so consumers reading the in-bundle card get the same accurate description.

Resolves Copilot review comment on release/README.md:74. The matching review comment on leadforge/core/models.py:104 was already addressed in f19d58e (snapshot_day <= label_window_days validation block); resolved as already-treated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
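The corrected description says both windows start at lead creation and differ only in length. The sketch below classifies a single event under that description. Treating both window ends as inclusive is an assumption, and `event_visibility` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def event_visibility(
    created_at: datetime,
    event_at: datetime,
    snapshot_day: int = 30,
    label_window_days: int = 90,
) -> tuple[bool, bool]:
    """Return (visible_to_features, inside_label_window) for one event.

    Both windows start at lead creation; they differ only in length.
    """
    offset = event_at - created_at
    visible_to_features = timedelta(0) <= offset <= timedelta(days=snapshot_day)
    inside_label_window = timedelta(0) <= offset <= timedelta(days=label_window_days)
    return visible_to_features, inside_label_window
```

For example, an event on day 45 can still drive the label but is invisible to the features. The label window is not a separate [day 30, day 90] span.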

github-actions · 2026-05-05T05:16:08Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #59 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25359146347 attempt 1
Comment timestamp: 2026-05-05T05:15:18.766833+00:00
PR head commit: 650c3a7535e20ddda250bee96e5414e9ae7339b1

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.


+            "- Features are anchored at the snapshot date. No post-anchor data is "
+            "included (leakage-free by construction)."


shaypal5 and others added 2 commits May 5, 2026 07:34

Copilot AI review requested due to automatic review settings May 5, 2026 04:45

shaypal5 added the labels type: feature (New capability), layer: schema (schema/ entity/event contracts), layer: render (render/ bundle and artifact output), and layer: api (api/ public Python surface) May 5, 2026

Copilot started reviewing on behalf of shaypal5 May 5, 2026 04:46


Copilot AI reviewed May 5, 2026


Comment thread leadforge/core/models.py

Comment thread release/README.md Outdated


Copilot AI review requested due to automatic review settings May 5, 2026 05:14

Copilot started reviewing on behalf of shaypal5 May 5, 2026 05:15

Copilot AI reviewed May 5, 2026


Comment thread leadforge/narrative/dataset_card.py

Comment on lines +209 to +210

shaypal5 merged commit 05a7341 into main May 5, 2026
12 checks passed

shaypal5 deleted the post-v1-windowed-snapshot branch May 5, 2026 05:41

shaypal5 mentioned this pull request May 5, 2026

M12: CLI polish — surface v4 manifest fields, add --json, expose snapshot/task flags #60

Merged

7 tasks


shaypal5 merged 4 commits into main from post-v1-windowed-snapshot


2 participants