Skip to content

feat: windowed snapshot for student_public bundles (3/3 of #57)#59

Merged
shaypal5 merged 4 commits into
mainfrom
post-v1-windowed-snapshot
May 5, 2026
Merged

feat: windowed snapshot for student_public bundles (3/3 of #57)#59
shaypal5 merged 4 commits into
mainfrom
post-v1-windowed-snapshot

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the final sub-item of issue #57: event-aggregate features in student_public bundles are no longer computed over the same 90-day window the label resolves in.

The b2b_saas_procurement_v1 recipe now pins snapshot_day: 30. build_snapshot() filters touches, sessions, sales activities, and opportunities to events with timestamp ≤ lead_created_at + snapshot_day for every aggregate column. total_touches_all is the only feature that intentionally crosses the line — it stays at full-horizon counts as the deliberate pedagogical leakage trap, and now actually diverges from touch_count for 54–77% of leads (it was structurally identical before).

BUNDLE_SCHEMA_VERSION is bumped 3 → 4 (confirmed with the user before bumping). The published column SET is unchanged from v3, but column VALUES are no longer full-horizon — a contract shift v3 consumers would not detect from schema alone. manifest.snapshot_day records the contract so the bundle is self-describing.

Snapshot day measurements

Measured at seed 42, n_leads=5000, with a synthetic LR baseline (one-hot + standardize + impute) on a held-out 30% split:

tier day conv_rate LR AUC mean(total_touches_all − touch_count) frac with gap
intro 14 0.422 0.780 5.94 0.89
intro 20 0.422 0.828 4.65 0.86
intro 30 0.422 0.854 2.96 0.77
intro 45 0.422 0.877 1.40 0.58
intro 90 0.422 0.887 0.00 0.00
intermediate 14 0.210 0.778 6.19 0.78
intermediate 20 0.210 0.820 4.91 0.76
intermediate 30 0.210 0.861 3.22 0.70
intermediate 45 0.210 0.889 1.64 0.56
intermediate 90 0.210 0.892 0.00 0.00
advanced 14 0.079 0.741 6.19 0.61
advanced 20 0.079 0.820 4.91 0.59
advanced 30 0.079 0.862 3.25 0.54
advanced 45 0.079 0.886 1.68 0.46
advanced 90 0.079 0.906 0.00 0.00

Day 30 was chosen as the trade-off point: LR AUC stays in [0.85, 0.86] across tiers — challenging but modelable — while a meaningful trap gap (~3 touches; > 50% of leads) is preserved. Day 20 was a viable runner-up but pushed AUC into the marginal 0.78–0.82 band. Day 45 collapsed the trap (only 46–58% of leads showing any gap).

Conversion rates: before/after

tier before (v3, day 90) after (v4, day 30) declared range (difficulty_profiles.yaml)
intro 41.5% 41.5% 30–45%
intermediate 20.1% 20.1% 18–28%
advanced 7.9% 7.9% 8–15%

Conversion rates are invariant. The label is event-derived from label_window_days in the simulator (leadforge/simulation/engine.py) and is independent of snapshot_day; only feature aggregation is windowed. Declared difficulty ranges still hold under the existing _RATE_TOLERANCE = 0.05 in validation/difficulty.py, so no profile recalibration was needed.

What changed

Code

  • leadforge/core/models.py: GenerationConfig.snapshot_day: int | None = None with positive-int + ≤ horizon_days validation.
  • leadforge/api/recipes.py: Recipe.snapshot_day field; threaded through all 4 precedence layers in resolve_config().
  • leadforge/api/generator.py: snapshot_day kwarg on Generator.from_recipe().
  • leadforge/api/bundle.py: passed to build_snapshot().
  • leadforge/render/manifests.py: BUNDLE_SCHEMA_VERSION = "4" with history note; manifest.snapshot_day recorded.
  • leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml: snapshot_day: 30 with rationale comment.

Tests

  • Renamed tests/render/test_bundle_schema_v3_contract.pytest_bundle_schema_v4_contract.py; updated all assertions to v4; added test_manifest_records_snapshot_day checking manifest.snapshot_day == 30.
  • New tests/render/test_windowed_bundle_trap.py: asserts total_touches_all >= touch_count for every lead and > for at least some — guards against a future refactor that silently widens touch_count back to the full horizon and collapses the trap.

Docs

  • release/README.md, release/HF_DATASET_CARD.md: refreshed leakage handling section to describe the snapshot+redaction pair, with the total_touches_all framing updated.
  • CHANGELOG.md: schema v4 entry.
  • .agent-plan.md: all three Structural leakage in student_public bundles (post-#56 follow-up) #57 sub-items marked resolved; release-checklist updated.

Verification

  • pytest: 915 passed (was 911 baseline; +4 tests from the new contract assertion + trap guard).
  • ruff check and ruff format --check: clean.
  • mypy leadforge/: clean (78 source files).
  • scripts/verify_hash_determinism.py: PASS, 73/73 files identical across runs.
  • All four release bundles (intro, intermediate, advanced, intermediate_instructor) regenerate and pass validate_bundle().
  • manifest.json shows bundle_schema_version: "4" and snapshot_day: 30 for every released bundle.
  • Trap invariant test passes; v4 contract test passes.

Test plan

  • Reviewer: spot-check that the documented snapshot-day choice matches recipe.yaml.
  • Reviewer: confirm manifest.snapshot_day field appears in regenerated bundles.
  • Reviewer: confirm total_touches_all > touch_count for some leads in a generated bundle.
  • Reviewer: pull the branch and re-run python scripts/verify_hash_determinism.py to independently confirm the determinism claim.

Out of scope

  • Uploading to Kaggle/HuggingFace (manual user step, comes after this PR merges).
  • Adding new feature columns. The snapshot-day shift may make the current feature set differently informative; resisting the urge to expand here keeps the diff scoped.
  • Rewriting v4–v7 simplified-CSV pipelines. They have their own snapshot windows and don't track the release-bundle change.

🤖 Generated with Claude Code

shaypal5 and others added 2 commits May 5, 2026 07:34
Adds `snapshot_day` as a recipe-level config knob and threads it through
the public API.  When set, event-aggregate features (touch_count,
session_count, expected_acv, days_since_last_touch, ...) only see events
in [lead_created_at, lead_created_at + snapshot_day], so they cannot
encode post-conversion data.  total_touches_all keeps full-horizon counts
as the deliberate pedagogical leakage trap, so the gap between the two
columns now carries real signal instead of being structurally zero.

The b2b_saas_procurement_v1 recipe pins snapshot_day=30, picked from
measurements at seed=42, n_leads=5000.  Conversion rates are invariant
to snapshot_day (the label is event-derived from label_window_days), so
intro/intermediate/advanced rates stay at 0.422/0.210/0.079 — well
inside the declared difficulty_profiles ranges.  Day 30 keeps LR AUC
in the 0.85-0.86 band (challenging but modelable) while preserving a
trap gap of ~3.0 touches with 54-77% of leads showing any gap.

BUNDLE_SCHEMA_VERSION bumped 3 → 4.  The published column SET is
unchanged, but column VALUES are no longer full-horizon, which v3
consumers cannot detect from schema alone.  manifest.snapshot_day is
recorded so the contract is self-describing (null = legacy full-horizon).

The schema contract test is renamed to test_bundle_schema_v4_contract.py
and gains a new assertion that manifest.snapshot_day == 30.
A new test_windowed_bundle_trap.py guards the pedagogical invariant:
total_touches_all >= touch_count for every lead and > for at least some.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- release/README.md: refreshed leakage handling section with the v4 schema
  bump, the snapshot/redaction pair, and the new pedagogical framing of
  total_touches_all.  Removed the open-caveat block — the underlying
  caveat (event aggregates over the label window) is fixed.
- release/HF_DATASET_CARD.md: matched the README updates and bumped the
  schema-version reference.
- CHANGELOG.md: documented the schema v4 entry (windowed snapshot,
  manifest.snapshot_day, contract test, trap invariant guard) and dropped
  the "open follow-up" line that previously flagged sub-item 1 of #57.
- .agent-plan.md: marked all three sub-items of #57 resolved; added the
  windowed snapshot bullet to the Phase 5 release checklist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 04:45
@shaypal5 shaypal5 added type: feature New capability layer: schema schema/ entity/event contracts layer: render render/ bundle and artifact output layer: api api/ public Python surface labels May 5, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR finalizes the “windowed snapshot” contract for student_public release bundles by introducing a snapshot_day configuration (pinned to 30 for b2b_saas_procurement_v1), recording it in manifest.json, and bumping the bundle schema version to v4 to reflect the semantic shift in aggregate-feature values.

Changes:

  • Add snapshot_day to the generation config/recipe resolution stack and thread it into bundle rendering; record it in manifest.json.
  • Bump BUNDLE_SCHEMA_VERSION from "3""4" and update schema contract tests accordingly.
  • Update release documentation and add an end-to-end test to ensure the total_touches_all leakage trap meaningfully diverges from windowed touch_count.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
leadforge/core/models.py Adds GenerationConfig.snapshot_day and validation logic.
leadforge/api/recipes.py Adds Recipe.snapshot_day parsing and threads it through config precedence resolution.
leadforge/api/generator.py Exposes snapshot_day override in Generator.from_recipe().
leadforge/api/bundle.py Passes snapshot_day into snapshot building during bundle writing.
leadforge/render/manifests.py Bumps schema version to v4 and records snapshot_day in manifest.json.
leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml Pins snapshot_day: 30 for the release recipe with rationale.
tests/render/test_bundle_schema_v4_contract.py Renames/updates contract to v4 and asserts manifest.snapshot_day.
tests/render/test_windowed_bundle_trap.py Adds end-to-end guard that the leakage trap still produces a non-zero gap for some leads.
release/README.md Updates public release README to describe windowed snapshot + trap behavior.
release/HF_DATASET_CARD.md Updates HF dataset card leakage-handling section for schema v4 + snapshot window.
CHANGELOG.md Documents bundle schema v4 semantic change and test additions.
.agent-plan.md Marks issue #57 sub-item as resolved and updates release checklist notes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/core/models.py
Comment thread release/README.md Outdated
Self-review of the windowed-snapshot PR surfaced seven gaps; this commit
addresses all of them.

- Add tests/test_snapshot_day_threading.py — 22 tests covering
  Recipe.from_dict parsing (positive int, absent → None, rejection of
  zero/negative/string/bool), resolve_config precedence across all four
  layers (package default, recipe, override dict, kwarg), and
  GenerationConfig validation paths.  Models the existing
  test_primary_task_threading.py.
- Validate snapshot_day <= label_window_days in
  GenerationConfig.__post_init__.  A snapshot anchored after the label
  closes would let features observe events past the label-scoring
  window — the exact structural leakage v4 is here to prevent.
- Strengthen test_windowed_bundle_trap.py: replace the bare ``> 0`` check
  with ``frac_with_gap >= 0.20``.  Measurements at the recipe default
  show 54-77% of leads carry a gap; a regression that collapsed the trap
  to a single lead would no longer pass silently.
- Pin snapshot_day, primary_task, label_window_days, redacted_columns in
  the required-keys set in tests/render/test_render.py.  These were
  already in the manifest but not formally pinned anywhere outside the
  v4 contract test.
- Surface snapshot_day in the bundle-internal dataset_card.md: add
  ``Label window`` and ``Feature snapshot window`` rows to the header
  table, and replace the generic "Features are anchored at the snapshot
  date" caveat with concrete prose naming the day-30 feature window, the
  day-90 label window, and the total_touches_all trap.  Consumers
  reading the card now see the v4 contract without needing the
  changelog.
- Reconcile snapshot_day validation messages: both layers now say
  "must be a positive int or None/null".
- Trim the recipe.yaml comment to a one-liner pointing at CHANGELOG /
  release docs for the rationale.

937 tests pass (was 915; +22 from the new threading file).
ruff + mypy clean; verify_hash_determinism.py PASS 73/73.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

The previous wording — "the label is then resolved over the next 60
days" — implied the label was scored over [day 30, day 90].  In fact
``converted_within_90_days`` is evaluated over the full 90 days from
lead creation; only the *features* are restricted to the first 30 days.
Same misleading phrasing existed in the auto-generated dataset_card.md
caveat and is corrected too, so consumers reading the in-bundle card
get the same accurate description.

Resolves Copilot review comment on release/README.md:74.
The matching review comment on leadforge/core/models.py:104 was
already addressed in f19d58e (snapshot_day <= label_window_days
validation block); resolved as already-treated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 05:14
@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #59 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25359146347 attempt 1
Comment timestamp: 2026-05-05T05:15:18.766833+00:00
PR head commit: 650c3a7535e20ddda250bee96e5414e9ae7339b1

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +209 to +210
"- Features are anchored at the snapshot date. No post-anchor data is "
"included (leakage-free by construction)."
@shaypal5 shaypal5 merged commit 05a7341 into main May 5, 2026
12 checks passed
@shaypal5 shaypal5 deleted the post-v1-windowed-snapshot branch May 5, 2026 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: api api/ public Python surface layer: render render/ bundle and artifact output layer: schema schema/ entity/event contracts type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants