leadforge-dev · shaypal5 · May 5, 2026
diff --git a/.agent-plan.md b/.agent-plan.md
@@ -46,6 +46,7 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 - [x] Update release/HF_DATASET_CARD.md — add conversion rates to summary table
 - [x] Verify SHA-256 hash determinism (re-run build, compare hashes) — `scripts/verify_hash_determinism.py`; 73/73 files identical across two `build_public_release.py` runs (modulo `manifest.json`'s wall-clock `generation_timestamp`)
 - [x] Fix `current_stage` leakage in student_public bundles via exposure-layer redaction — `is_leakage_trap` flag distinguishes the pedagogical trap (`total_touches_all`) from true label leaks; `BundleFilter.redacted_columns` strips the latter; `validate_bundle()` enforces the invariant. 73/73 hash-determinism preserved.
+- [x] Windowed snapshot for student_public bundles: `snapshot_day=30` pinned in recipe; event-aggregate features now cover only the first 30 days of the 90-day label window instead of the whole window; `BUNDLE_SCHEMA_VERSION` bumped to 4; 73/73 hash-determinism preserved; conversion rates unchanged (41.5% / 20.1% / 7.9%).
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
@@ -66,11 +67,11 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 
 Deterministic leak fixed via exposure-layer redaction. `FeatureSpec` now carries an explicit `redact_in_modes: frozenset[ExposureMode]` field — *prescriptive* — alongside the descriptive `leakage_risk` flag. `current_stage` is marked `redact_in_modes={ExposureMode.student_public}`; the writer queries `redacted_columns_for(mode)` and strips matching columns from the snapshot, task splits, and feature dictionary before they hit disk. The pedagogical trap `total_touches_all` is preserved in all modes (no entry in `redact_in_modes`). The manifest records `redacted_columns: [...]` so the bundle is self-describing. `validate_bundle()` cross-checks parquet schemas, feature dictionary, and the manifest's declared redaction set against `redacted_columns_for(mode)` derived independently from the feature spec. Hash-determinism preserved (73/73 identical across builds).
 
-### Follow-up: structural leakage in `student_public` bundles (issue #57)
+### Follow-up: structural leakage in `student_public` bundles (issue #57) — fully resolved
 
 Tracked in [GitHub issue #57](https://github.com/leadforge-dev/leadforge/issues/57).
 
-1. **Event-aggregate features are computed over the label window.** `touch_count`, `session_count`, `pricing_page_views`, `expected_acv`, `days_since_last_touch`, etc. all aggregate events in `[lead_created_at, lead_created_at + 90d]`, the same window over which the label resolves. The structural fix is a windowed snapshot (`snapshot_day=N` with `N < label_window_days`), as v6/v7 datasets already do at day 14/20. **Open** — its own PR with documentation recalibration; will likely bump `BUNDLE_SCHEMA_VERSION` again.
+1. ~~**Event-aggregate features are computed over the label window.** `touch_count`, `session_count`, `pricing_page_views`, `expected_acv`, `days_since_last_touch`, etc. all aggregate events in `[lead_created_at, lead_created_at + 90d]`, the same window over which the label resolves.~~ **Resolved** — windowed snapshot at `snapshot_day=30` (recipe default). `BUNDLE_SCHEMA_VERSION` bumped 3 → 4; `manifest.snapshot_day` records the contract. Conversion rates invariant (label is event-derived from `label_window_days`); trap gap (`total_touches_all − touch_count`) ~3 touches with 54–77% of leads showing divergence. Guarded by `test_bundle_schema_v4_contract.py` and `test_windowed_bundle_trap.py`.
 2. ~~**`is_sql=False` is near-deterministic for non-conversion.** Measured on the regenerated bundle: P(converted | is_sql=False) = 0.038 (intro), 0.015 (intermediate), 0.006 (advanced).~~ **Resolved** — `is_sql` redacted in `student_public` mode by post-#57 PR (bundle schema v3).
 3. ~~**`is_mql` is a constant `True`.** Zero variance feature in all three tiers.~~ **Resolved** — `is_mql` removed from the canonical feature list by post-#57 PR (bundle schema v3). Guarded by a new `test_no_zero_variance_features` check.
 

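Under the windowing rule described above, features see only events up to `lead_created_at + snapshot_day`, while the label still resolves over the full `label_window_days`. The plain-Python sketch below illustrates that rule under stated assumptions. It is not leadforge's `build_snapshot()`: the `Touch` record and the `windowed_touch_counts` helper are hypothetical, only touches are modelled (per the CHANGELOG, the same cutoff also applies to sessions, sales activities, and opportunities), and the cutoff is inclusive to match the `≤` in the CHANGELOG entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Touch:
    # Hypothetical minimal event record (lead id + timestamp only).
    lead_id: str
    timestamp: datetime


def windowed_touch_counts(
    lead_created_at: dict[str, datetime],
    touches: list[Touch],
    snapshot_day: int | None,
) -> dict[str, int]:
    """Count touches per lead up to lead_created_at + snapshot_day (inclusive).

    snapshot_day=None keeps every event (legacy full-horizon aggregation).
    """
    counts = {lead_id: 0 for lead_id in lead_created_at}
    for touch in touches:
        created = lead_created_at.get(touch.lead_id)
        if created is None:
            continue  # event for an unknown lead: ignore it
        if snapshot_day is not None and touch.timestamp > created + timedelta(days=snapshot_day):
            continue  # event falls after the snapshot cutoff
        counts[touch.lead_id] += 1
    return counts


# Usage: lead "a" is touched on days 1, 10, 29, 30, 45, 80; lead "b" only on day 60.
t0 = datetime(2026, 1, 1)
leads = {"a": t0, "b": t0}
events = [Touch("a", t0 + timedelta(days=d)) for d in (1, 10, 29, 30, 45, 80)]
events.append(Touch("b", t0 + timedelta(days=60)))

windowed = windowed_touch_counts(leads, events, snapshot_day=30)  # {"a": 4, "b": 0}
full = windowed_touch_counts(leads, events, snapshot_day=None)     # {"a": 6, "b": 1}
```

Lead "b" shows the point of the change: a lead whose only activity falls after day 30 looks inactive in the windowed features, even though a full-horizon count would have seen it.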
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,45 @@ Format inspired by [Keep a Changelog](https://keepachangelog.com/).
 
 ## Unreleased
 
+### Bundle schema v4
+
+`bundle_schema_version` bumped from `"3"` to `"4"`.  Closes the final
+sub-item of issue #57: event-aggregate features are no longer computed
+over the same 90-day window the label resolves in.
+
+- **Windowed snapshot.**  `GenerationConfig.snapshot_day` (also exposed
+  as a recipe-level field and an explicit kwarg on
+  `Generator.from_recipe()`) now controls the feature aggregation
+  window.  When set, `build_snapshot()` filters touches, sessions,
+  sales activities, and opportunities to events with timestamp ≤
+  `lead_created_at + snapshot_day` days (full horizon if `None`).  The
+  `b2b_saas_procurement_v1` recipe pins `snapshot_day: 30` —
+  measurements at seed 42, n_leads=5000 across all three difficulty
+  tiers showed day 30 keeps LR AUC in [0.85, 0.86] (challenging but
+  modelable) while preserving a meaningful trap gap of ~3 touches
+  with 54–77% of leads showing any divergence between
+  `total_touches_all` (full-horizon) and `touch_count` (windowed).
+- **Conversion rates unchanged.**  The label is event-derived from
+  `label_window_days` in the simulator and is independent of
+  `snapshot_day`, so the published rates stay at 41.5% / 20.1% / 7.9%
+  (intro / intermediate / advanced) — well inside the declared
+  `difficulty_profiles.yaml` ranges.
+- **`manifest.snapshot_day` recorded.**  The published bundle
+  declares its windowing contract; consumers can distinguish
+  full-horizon (legacy v2/v3) bundles from windowed (v4) bundles
+  without inspecting package internals.  Column SET is unchanged
+  from v3, but column VALUES are no longer full-horizon — a contract
+  shift that v3 consumers would not detect from schema alone.
+- **Schema contract test.**  `tests/render/test_bundle_schema_v3_contract.py`
+  renamed to `test_bundle_schema_v4_contract.py` and gains a
+  `snapshot_day == 30` assertion alongside the existing column-set
+  pinning.
+- **Trap invariant guard.**  New `tests/render/test_windowed_bundle_trap.py`
+  asserts `total_touches_all >= touch_count` for every lead and
+  `>` for at least some — guarding against a future refactor that
+  silently widens `touch_count` back to the full horizon and
+  collapses the pedagogical gap.
+
 ### Bundle schema v3
 
 `bundle_schema_version` bumped from `"2"` to `"3"`.  Three structural
@@ -53,14 +92,6 @@ changes follow up on PR #56 (issue #57):
   columns (down from 35); **11** columns in `tables/leads.parquet`
   (down from 12 — `is_mql` removed).
 
-### Open follow-up
-
-Issue #57 sub-item 1 remains open: event-aggregate features
-(`touch_count`, `session_count`, `pricing_page_views`, ...) are still
-computed over the same 90-day window the label resolves in.  The
-structural fix is a windowed snapshot rebuild and is deferred to its
-own PR.
-
 ---
 
 ## v1.0.0 — 2026-05-02

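The trap guard described in the v4 CHANGELOG entry checks two properties: `total_touches_all >= touch_count` for every lead, and a strict `>` for at least one lead. Below is a plain-Python sketch of those checks that also computes the two summary numbers the entry quotes (gap size and share of divergent leads). `check_trap_invariant` and `TrapStats` are hypothetical names. The real `test_windowed_bundle_trap.py` presumably reads the columns from the bundle's parquet files rather than taking lists, and it may summarise the gap differently than a mean.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrapStats:
    mean_gap: float            # mean of total_touches_all - touch_count
    divergent_fraction: float  # share of leads where the two columns differ


def check_trap_invariant(total_touches_all: list[int], touch_count: list[int]) -> TrapStats:
    if len(total_touches_all) != len(touch_count):
        raise ValueError("columns must have equal length")
    if not total_touches_all:
        raise ValueError("no leads to check")
    gaps = [full - windowed for full, windowed in zip(total_touches_all, touch_count)]
    # Every lead: the full-horizon count can never fall below the windowed count.
    if any(gap < 0 for gap in gaps):
        raise AssertionError("total_touches_all < touch_count for some lead")
    # At least one lead must diverge; otherwise the pedagogical gap has collapsed,
    # e.g. because touch_count was silently widened back to the full horizon.
    divergent = sum(1 for gap in gaps if gap > 0)
    if divergent == 0:
        raise AssertionError("no lead shows a gap between total_touches_all and touch_count")
    return TrapStats(mean_gap=sum(gaps) / len(gaps), divergent_fraction=divergent / len(gaps))


# Gaps are 3, 0, 3, 0 -> mean 1.5, half of the leads divergent.
stats = check_trap_invariant([5, 3, 7, 2], [2, 3, 4, 2])
```

Raising explicit exceptions instead of using bare `assert` keeps the checks active under `python -O`.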
diff --git a/leadforge/api/bundle.py b/leadforge/api/bundle.py
@@ -94,6 +94,7 @@ def write_bundle(
         result,
         population,
         horizon_days=config.horizon_days,
+        snapshot_day=config.snapshot_day,
         difficulty_params=config.difficulty_params,
         seed=config.seed,
     )

diff --git a/leadforge/api/generator.py b/leadforge/api/generator.py
@@ -54,6 +54,7 @@ def from_recipe(
         horizon_days: int | None = None,
         primary_task: str | None = None,
         label_window_days: int | None = None,
+        snapshot_day: int | None = None,
         output_path: str = _MISSING,  # type: ignore[assignment]
         override: dict[str, Any] | None = None,
     ) -> Generator:
@@ -76,6 +77,10 @@ def from_recipe(
                 directory name and manifest key.
             label_window_days: Override recipe default label observation
                 window in days.
+            snapshot_day: Override recipe default snapshot day for windowed
+                feature aggregation.  ``None`` means full-horizon (legacy)
+                aggregation; an integer ``N`` means features aggregate only
+                events with ``timestamp <= lead_created_at + N days``.
             output_path: Directory where the bundle will be saved.
             override: Optional dict of overrides (mirrors a ``--override`` file).
                 Applied after recipe defaults but before explicit kwargs.
@@ -105,6 +110,7 @@ def from_recipe(
             horizon_days=horizon_days,
             primary_task=primary_task,
             label_window_days=label_window_days,
+            snapshot_day=snapshot_day,
             output_path=output_path,
             override=override,
         )

diff --git a/leadforge/api/recipes.py b/leadforge/api/recipes.py
@@ -41,6 +41,7 @@ class Recipe:
     default_population: dict[str, int]
     horizon_days: int
     label_window_days: int | None = None
+    snapshot_day: int | None = None
 
     # ------------------------------------------------------------------ #
     # Construction
@@ -105,6 +106,19 @@ def from_dict(cls, data: dict[str, Any]) -> Recipe:
                 raise InvalidRecipeError(f"'label_window_days' must be positive, got {raw_lwd}")
             label_window_days = raw_lwd
 
+        snapshot_day: int | None = None
+        raw_sd = data.get("snapshot_day")
+        if raw_sd is not None:
+            if isinstance(raw_sd, bool) or not isinstance(raw_sd, int):
+                raise InvalidRecipeError(
+                    f"'snapshot_day' must be a positive int or null, got {type(raw_sd).__name__!r}"
+                )
+            if raw_sd <= 0:
+                raise InvalidRecipeError(
+                    f"'snapshot_day' must be a positive int or null, got {raw_sd}"
+                )
+            snapshot_day = raw_sd
+
         return cls(
             id=data["id"],
             title=data["title"],
@@ -116,6 +130,7 @@ def from_dict(cls, data: dict[str, Any]) -> Recipe:
             default_population=dict(pop),
             horizon_days=horizon_days,
             label_window_days=label_window_days,
+            snapshot_day=snapshot_day,
         )
 
     # ------------------------------------------------------------------ #
@@ -134,6 +149,7 @@ def resolve_config(
         horizon_days: int | None = None,
         primary_task: str | None = None,
         label_window_days: int | None = None,
+        snapshot_day: int | None = None,
         output_path: str = _MISSING,  # type: ignore[assignment]
         override: dict[str, Any] | None = None,
     ) -> GenerationConfig:
@@ -165,6 +181,7 @@ def resolve_config(
             "horizon_days": pkg["horizon_days"],
             "primary_task": pkg["primary_task"],
             "label_window_days": pkg["label_window_days"],
+            "snapshot_day": pkg["snapshot_day"],
         }
 
         # Layer 3 — recipe defaults
@@ -176,6 +193,8 @@ def resolve_config(
         resolved["primary_task"] = self.primary_task
         if self.label_window_days is not None:
             resolved["label_window_days"] = self.label_window_days
+        if self.snapshot_day is not None:
+            resolved["snapshot_day"] = self.snapshot_day
 
         # Layer 2 — override dict (beats recipe/package defaults)
         if override:
@@ -186,6 +205,7 @@ def resolve_config(
                 "horizon_days",
                 "primary_task",
                 "label_window_days",
+                "snapshot_day",
                 "seed",
                 "output_path",
                 "exposure_mode",
@@ -216,6 +236,8 @@ def resolve_config(
             resolved["primary_task"] = primary_task
         if label_window_days is not None:
             resolved["label_window_days"] = label_window_days
+        if snapshot_day is not None:
+            resolved["snapshot_day"] = snapshot_day
 
         try:
             mode = ExposureMode(resolved["exposure_mode"])
@@ -254,6 +276,7 @@ def resolve_config(
             horizon_days=resolved["horizon_days"],
             primary_task=resolved["primary_task"],
             label_window_days=resolved["label_window_days"],
+            snapshot_day=resolved["snapshot_day"],
             output_path=resolved["output_path"],
         )
 

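In `resolve_config()`, `snapshot_day` follows the same layering as `label_window_days`: package defaults, then the recipe value, then the override dict, then the explicit kwarg. The following sketch restates that precedence for this one field as a standalone function. `resolve_snapshot_day` is hypothetical, and the override-dict merge is an assumption: the diff shows `snapshot_day` in the list of accepted override keys but not how the override values are applied.

```python
from __future__ import annotations

from typing import Any


def resolve_snapshot_day(
    package_default: int | None,
    recipe_value: int | None,
    override: dict[str, Any] | None,
    kwarg: int | None,
) -> int | None:
    # Lowest precedence: package defaults.
    resolved = package_default
    # Recipe defaults apply only when the recipe sets a value.
    if recipe_value is not None:
        resolved = recipe_value
    # Override dict beats recipe/package defaults.
    # Assumption: presence of the key wins, even when its value is None.
    if override and "snapshot_day" in override:
        resolved = override["snapshot_day"]
    # Explicit kwargs win, but None means "not supplied".
    if kwarg is not None:
        resolved = kwarg
    return resolved


from_recipe_only = resolve_snapshot_day(None, 30, None, None)              # 30
from_override = resolve_snapshot_day(None, 30, {"snapshot_day": 14}, None)  # 14
from_kwarg = resolve_snapshot_day(None, 30, {"snapshot_day": 14}, 20)      # 20
```

The `is not None` guard on the kwarg has one consequence: passing `snapshot_day=None` to `Generator.from_recipe()` cannot turn the recipe's `snapshot_day: 30` back into full-horizon aggregation. Under the assumption above, that would require the override dict, provided it accepts `None`.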
diff --git a/leadforge/core/models.py b/leadforge/core/models.py
@@ -64,6 +64,7 @@ class GenerationConfig:
     horizon_days: int = 90
     primary_task: str = "converted_within_90_days"
     label_window_days: int = 90
+    snapshot_day: int | None = None
     output_path: str = "./out"
     package_version: str = field(default_factory=lambda: __version__)
     difficulty_params: DifficultyParams | None = None
@@ -87,6 +88,32 @@ def __post_init__(self) -> None:
                 f"label_window_days ({self.label_window_days}) must not exceed "
                 f"horizon_days ({self.horizon_days})"
             )
+        if self.snapshot_day is not None:
+            if isinstance(self.snapshot_day, bool) or not isinstance(self.snapshot_day, int):
+                raise InvalidConfigError(
+                    f"snapshot_day must be a positive int or None, "
+                    f"got {type(self.snapshot_day).__name__!r}"
+                )
+            if self.snapshot_day <= 0:
+                raise InvalidConfigError(
+                    f"snapshot_day must be a positive int or None, got {self.snapshot_day}"
+                )
+            if self.snapshot_day > self.horizon_days:
+                raise InvalidConfigError(
+                    f"snapshot_day ({self.snapshot_day}) must not exceed "
+                    f"horizon_days ({self.horizon_days})"
+                )
+            # A snapshot anchored after the label closes would let features
+            # observe events that occur beyond the label-scoring window —
+            # exactly the structural leakage the windowed snapshot is here
+            # to prevent.  Reject at config time.
+            if self.snapshot_day > self.label_window_days:
+                raise InvalidConfigError(
+                    f"snapshot_day ({self.snapshot_day}) must not exceed "
+                    f"label_window_days ({self.label_window_days}); a snapshot "
+                    f"anchored after the label closes would re-introduce "
+                    f"structural leakage."
+                )
         # Coerce string enums supplied as plain strings
         if not isinstance(self.exposure_mode, ExposureMode):
             try:

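When `snapshot_day` is set, the new `__post_init__` checks combine with the existing `label_window_days <= horizon_days` check to enforce `0 < snapshot_day <= label_window_days <= horizon_days`. The helper below, `first_window_violation`, is hypothetical and not part of leadforge. It applies the window-related checks in the same order and returns the first rule that fails, or `None` if the values are valid.

```python
from __future__ import annotations


def first_window_violation(
    horizon_days: int,
    label_window_days: int,
    snapshot_day: object,
) -> str | None:
    if label_window_days > horizon_days:
        return "label_window_days > horizon_days"
    if snapshot_day is None:
        return None  # legacy full-horizon aggregation; nothing more to check
    # bool is a subclass of int, so reject it explicitly (as the config does).
    if isinstance(snapshot_day, bool) or not isinstance(snapshot_day, int):
        return "snapshot_day is not an int"
    if snapshot_day <= 0:
        return "snapshot_day <= 0"
    if snapshot_day > horizon_days:
        return "snapshot_day > horizon_days"
    if snapshot_day > label_window_days:
        return "snapshot_day > label_window_days"
    return None


recipe_default = first_window_violation(90, 90, 30)  # None: the shipped recipe is valid
past_label = first_window_violation(90, 60, 75)      # "snapshot_day > label_window_days"
```

The `label_window_days <= horizon_days` check runs first, so the `snapshot_day > horizon_days` branch only decides which message is reported. Any value it rejects would also fail the label-window check. Note that `snapshot_day == label_window_days` passes validation.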
diff --git a/leadforge/narrative/dataset_card.py b/leadforge/narrative/dataset_card.py
@@ -54,6 +54,11 @@ def render_dataset_card(
     # ------------------------------------------------------------------
     # Header
     # ------------------------------------------------------------------
+    snapshot_label = (
+        f"{cfg.snapshot_day} days (windowed)"
+        if cfg.snapshot_day is not None and cfg.snapshot_day < cfg.horizon_days
+        else f"{cfg.horizon_days} days (full horizon)"
+    )
     lines += [
         "# leadforge dataset card",
         "",
@@ -65,6 +70,8 @@ def render_dataset_card(
         f"| Exposure mode | `{cfg.exposure_mode}` |",
         f"| Difficulty | `{cfg.difficulty}` |",
         f"| Horizon | {cfg.horizon_days} days |",
+        f"| Label window | {cfg.label_window_days} days |",
+        f"| Feature snapshot window | {snapshot_label} |",
         "",
     ]
 
@@ -188,14 +195,27 @@ def render_dataset_card(
     # ------------------------------------------------------------------
     # Caveats
     # ------------------------------------------------------------------
+    if cfg.snapshot_day is not None and cfg.snapshot_day < cfg.horizon_days:
+        feature_window_caveat = (
+            f"- The label is evaluated over the full {cfg.label_window_days}-day "
+            f"window from lead creation; event-aggregate features (e.g. "
+            f"`touch_count`, `session_count`, `expected_acv`) observe only the "
+            f"first {cfg.snapshot_day} days of that window. The deliberate "
+            f"exception is `total_touches_all`, which counts touches over the "
+            f"full {cfg.horizon_days}-day horizon as a pedagogical leakage trap."
+        )
+    else:
+        feature_window_caveat = (
+            "- Features are anchored at the snapshot date. No post-anchor data is "
+            "included (leakage-free by construction)."
+        )
     lines += [
         "## Caveats",
         "",
         "- This is **synthetic** data. It does not represent any real company, product, or market.",
         "- The hidden world structure varies by motif family and stochastic rewiring; "
         "no two seeds produce the same DGP.",
-        "- Features are anchored at the snapshot date. No post-anchor data is "
-        "included (leakage-free by construction).",
+        feature_window_caveat,
         "- In `student_public` mode, the latent world graph, mechanism summary, "
         "and full world spec are withheld.",
         "",

diff --git a/leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml b/leadforge/recipes/b2b_saas_procurement_v1/recipe.yaml
@@ -18,3 +18,6 @@ default_population:
   n_contacts: 4200
   n_leads: 5000
 horizon_days: 90
+# Feature aggregation window in days; see CHANGELOG (bundle schema v4) and
+# release/README.md for the rationale.  Must satisfy snapshot_day <= label_window_days.
+snapshot_day: 30
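The CHANGELOG notes that v4 keeps the v3 column set, so consumers cannot distinguish windowed features from full-horizon features by schema alone. `manifest.snapshot_day` is the signal. The consumer-side sketch below assumes the manifest has already been parsed into a dict, for example with `json.load` on the bundle's `manifest.json`. `describe_feature_window` is hypothetical. It treats a missing key, as in manifests written before v4, the same as an explicit `null`, and it follows the dataset card's rule that only `snapshot_day < horizon_days` counts as windowed.

```python
from __future__ import annotations

from typing import Any


def describe_feature_window(manifest: dict[str, Any]) -> str:
    horizon = manifest["horizon_days"]
    label_window = manifest["label_window_days"]
    # Pre-v4 manifests have no "snapshot_day" key; .get() maps that to None,
    # the same value a v4 manifest uses for full-horizon aggregation.
    snapshot_day = manifest.get("snapshot_day")
    if snapshot_day is None or snapshot_day >= horizon:
        return f"full horizon: features span all {horizon} days; label window {label_window} days"
    return f"windowed: features span the first {snapshot_day} days; label window {label_window} days"


v4_manifest = {"horizon_days": 90, "label_window_days": 90, "snapshot_day": 30}
v3_manifest = {"horizon_days": 90, "label_window_days": 90}
v4_summary = describe_feature_window(v4_manifest)
v3_summary = describe_feature_window(v3_manifest)
```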
diff --git a/leadforge/render/manifests.py b/leadforge/render/manifests.py
@@ -28,7 +28,15 @@
 #         feature list (zero-variance); ``is_sql`` redacted in
 #         ``student_public`` mode (near-deterministic for non-conversion).
 #         ``manifest.redacted_columns`` was already added in PR #56.
-BUNDLE_SCHEMA_VERSION = "3"
+#   "4" — issue #57 sub-item 1: windowed snapshot.  Event-aggregate
+#         features (touch_count, session_count, expected_acv, ...) now
+#         aggregate only events within ``[lead_created_at, lead_created_at
+#         + snapshot_day]``.  Column SET unchanged from v3, but column
+#         VALUES are no longer full-horizon — consumers pinning v3 and
+#         assuming "features computed over full horizon" must update.
+#         ``manifest.snapshot_day`` recorded so the contract is
+#         self-describing (``null`` means full-horizon, legacy behaviour).
+BUNDLE_SCHEMA_VERSION = "4"
 
 # Manifest fields whose value is non-deterministic by design (wall-clock,
 # host metadata, etc.).  Determinism checks must ignore these fields when
@@ -106,6 +114,7 @@ def build_manifest(
         "horizon_days": config.horizon_days,
         "primary_task": config.primary_task,
         "label_window_days": config.label_window_days,
+        "snapshot_day": config.snapshot_day,
         "motif_family": world_graph.motif_family,
         "redacted_columns": redacted_columns_list,
         "tables": tables,