leadforge-dev
diff --git a/.agent-plan.md b/.agent-plan.md
Lines changed: 8 additions & 4 deletions
diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md
Lines changed: 17 additions & 10 deletions
diff --git a/leadforge/schemes/lifecycle/__init__.py b/leadforge/schemes/lifecycle/__init__.py
Lines changed: 173 additions & 10 deletions
diff --git a/leadforge/schemes/lifecycle/render/dataset_card.py b/leadforge/schemes/lifecycle/render/dataset_card.py
Lines changed: 89 additions & 0 deletions
@@ -82,10 +82,14 @@ split writer + `schemes/lifecycle/tasks.py` task families; discharges the
 `LTV-Pc` regression-task-spec leftover) opened as **#124** (merged). `LTV-Pn.4` split into four (build → write → public-safety
 → orchestrator): `LTV-Pn.4a` (`LifecycleScheme.build_world` — deterministic
 motif sampling + population + sim + `LifecycleArtifacts`; lifecycle relational
-`to_dataframes`; consumes the Pn.3 config fields) opened as **#125**. Next:
-`Pn.4b` (instructor `write_bundle` + tasks), `Pn.4c` (student_public
-snapshot-safety + CLAUDE.md), `Pn.4d` (shared bundle orchestrator), `LTV-Po`
-(recipe).
+`to_dataframes`; consumes the Pn.3 config fields) opened as **#125** (merged). `LTV-Pn.4b` (instructor-mode `write_bundle` —
+first on-disk lifecycle bundle: 6 relational tables + 8 task dirs (both
+regimes) + lifecycle dataset card + manifest extra_fields + hidden-truth
+metadata; difficulty_params threaded; student_public refused until 4c) opened
+as **#126**. Next: `Pn.4c` (student_public snapshot-safety + CLAUDE.md +
+recipe-driven difficulty resolution), `Pn.4d` (shared bundle orchestrator),
+`LTV-Po` (recipe).  Note: `validate_bundle` is lead-scoring-coupled — scheme-
+aware validation is `LTV-Pp`.
 
 ---
 
 
@@ -46,7 +46,7 @@ protocol + registry, with the package physically reorganized into
 | `LTV-M3` | Customer population + lifecycle world | `LTV-Ph`, `LTV-Pi` | #113 (Ph) |
 | `LTV-M4` | Lifecycle simulation engine | `LTV-Pj`, `LTV-Pk` | #117 (Pj), #118 (Pk) |
 | `LTV-M5` | Customer snapshots + pLTV targets (both regimes) | `LTV-Pl`, `LTV-Pm` | #119 (Pl), #120 (Pm) |
-| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a) |
+| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a), #126 (Pn.4b) |
 | `LTV-M7` | Validation + regression-metric calibration | `LTV-Pp` | |
 | `LTV-M8` | CLI, notebooks, publish | `LTV-Pq`, `LTV-Pr`, `LTV-Ps` | |
 
@@ -325,15 +325,22 @@ methods, then public-safety, then the carried orchestrator cleanup:
   `write_bundle` still stubbed.
   - Tests: determinism, cross-seed motif variability, FK integrity, table shapes.
   - Labels: `type: feature`, `layer: api`, `layer: render`
-- [ ] **`LTV-Pn.4b`** — `feat(lifecycle): write_bundle (instructor) + tasks`.
-  Instructor-mode `write_bundle`: relational tables; both regime snapshots →
-  8 task dirs (3 pLTV regression + churn, × 2 regimes) via the shared writer;
-  dataset card; feature dictionary; manifest with `generation_scheme` +
-  `observation_date` + windows (`extra_fields`); lifecycle `write_metadata`
-  hidden-truth hook (latent registry + mechanism summary).  First on-disk
-  lifecycle bundle.  **Must resolve `difficulty_params` from the active profile
-  and thread it into `build_customer_snapshot` (Pn.4a's `build_world` does not —
-  without this the snapshot distortions never fire and every tier is identical).**
+- [x] **`LTV-Pn.4b`** — `feat(lifecycle): write_bundle (instructor) + tasks`
+  (**PR #126**).  Instructor-mode `write_bundle` produces the first on-disk
+  lifecycle bundle: six relational tables; both regime snapshots → 8 task dirs
+  (3 pLTV regression + churn, × 2 regimes) via the shared writer; a lifecycle
+  dataset card (`render/dataset_card.py` — the lead-scoring card is too
+  coupled to reuse); feature dictionary; manifest with `generation_scheme` +
+  `observation_date` + `forward_windows_days` (`extra_fields`); lifecycle
+  `write_metadata` hidden-truth hook (latent registry + mechanism summary;
+  no graph).  `config.difficulty_params` is **threaded** into both snapshot
+  builders (tested), so recipe-resolved difficulty will drive distortions;
+  recipe-driven *resolution* of `difficulty_params` lands in `LTV-Po`.
+  `student_public` is **refused** (raises) until `LTV-Pn.4c` adds the
+  snapshot-safe export — never emit an unsafe public bundle.
+  - **Flagged:** `validation.bundle_checks.validate_bundle` is lead-scoring-
+    coupled (applies lead-scoring FK/table/task checks) and errors on a
+    lifecycle bundle; scheme-aware validation is `LTV-Pp`.
   - Labels: `type: feature`, `layer: api`, `layer: render`
 - [ ] **`LTV-Pn.4c`** — `feat(lifecycle): student_public snapshot-safety`.
   Public relational filtering (event tables ≤ cutoff; drop terminal
 
@@ -2,9 +2,9 @@
 
 The second peer scheme alongside ``lead_scoring``.  Its entity rows and FK
 constraints live here (``entities`` / ``relationships``); the snapshot, feature,
-and task definitions live in sibling modules.  :meth:`LifecycleScheme.build_world`
-is implemented (LTV-Pn.4a); :meth:`write_bundle` / :meth:`write_metadata` are
-built out in LTV-Pn.4b–c and currently raise :class:`NotImplementedError`.
+and task definitions live in sibling modules.  ``build_world`` (LTV-Pn.4a) and
+the instructor-mode ``write_bundle`` / ``write_metadata`` (LTV-Pn.4b) are
+implemented; the ``student_public`` snapshot-safe export lands in LTV-Pn.4c.
 """
 
 from __future__ import annotations
@@ -20,11 +20,6 @@
     from leadforge.core.models import GenerationConfig, WorldBundle
     from leadforge.narrative.spec import NarrativeSpec
 
-_NOT_IMPLEMENTED = (
-    "the lifecycle (b2b_saas_ltv_v1) write path is not implemented yet; "
-    "it is built across LTV-Pn.4b–c"
-)
-
 
 def _sample_motif_family(rng: random.Random) -> str:
     """Deterministically pick a retention motif family for this world.
@@ -74,11 +69,26 @@ def build_world(
           ``narrative.yaml`` will not drive them until ``LTV-Po`` decides
           whether the lifecycle scheme should consume the narrative spec.
         """
+        from leadforge.core.exceptions import InvalidConfigError
         from leadforge.core.models import WorldBundle, WorldSpec
         from leadforge.core.rng import RNGRoot
         from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts
         from leadforge.schemes.lifecycle.engine import simulate_lifecycle
         from leadforge.schemes.lifecycle.population import build_customer_population
+        from leadforge.schemes.lifecycle.snapshots import FORWARD_WINDOWS_DAYS
+
+        # config.forward_windows_days is not yet threaded into the snapshot
+        # builder, which exports the fixed FORWARD_WINDOWS_DAYS targets.  Reject
+        # an override now (clear, early) rather than emit a bundle whose manifest
+        # disagrees with its task dirs, or under-simulate and fail opaquely later.
+        # Threading config-driven windows through is tracked for a later step.
+        if tuple(config.forward_windows_days) != tuple(FORWARD_WINDOWS_DAYS):
+            raise InvalidConfigError(
+                f"config.forward_windows_days={tuple(config.forward_windows_days)} differs "
+                f"from the lifecycle scheme's exported windows {tuple(FORWARD_WINDOWS_DAYS)}; "
+                "config-driven forward windows are not yet supported (the snapshot builder "
+                "exports the fixed set).  Use the default until that wiring lands."
+            )
 
         motif_rng = RNGRoot(config.seed).child("lifecycle_motif")
         motif_family = _sample_motif_family(motif_rng)
@@ -112,10 +122,163 @@ def write_bundle(
         path: str,
         generation_timestamp: str | None = None,
     ) -> None:
-        raise NotImplementedError(_NOT_IMPLEMENTED)
+        """Serialise a lifecycle *bundle* to *path* (instructor mode).
+
+        Writes the six relational tables, both observation regimes' snapshots
+        split into 8 task directories (3 pLTV regression + 1 churn
+        classification per regime, the early regime prefixed ``early_``), a
+        dataset card, the feature dictionary, the hidden-truth ``metadata/``
+        (via :meth:`write_metadata`), and the manifest (recording
+        ``generation_scheme`` + ``observation_date`` + the forward windows).
+
+        ``config.difficulty_params`` is threaded into both snapshot builders —
+        when set (LTV-Po resolves it from the recipe profile), it drives the
+        snapshot distortions.
+
+        Only ``research_instructor`` mode is supported here.  The
+        ``student_public`` snapshot-safety projection (event-table cutoff
+        filtering, terminal-column drops, per-task target projection) lands in
+        LTV-Pn.4c; until then this refuses to write a public bundle rather than
+        emit one that is not snapshot-safe.
+        """
+        from pathlib import Path
+
+        from leadforge.core.enums import ExposureMode
+        from leadforge.exposure.modes import apply_exposure
+        from leadforge.render.manifests import build_manifest, write_manifest
+        from leadforge.render.relational_io import write_relational_tables
+        from leadforge.render.tasks import write_task_splits
+        from leadforge.schema.dictionaries import write_feature_dictionary
+        from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts
+        from leadforge.schemes.lifecycle.features import CUSTOMER_SNAPSHOT_FEATURES
+        from leadforge.schemes.lifecycle.render.dataset_card import render_lifecycle_dataset_card
+        from leadforge.schemes.lifecycle.render.relational import to_dataframes
+        from leadforge.schemes.lifecycle.snapshots import (
+            FORWARD_WINDOWS_DAYS,
+            build_customer_snapshot,
+            build_early_pltv_snapshot,
+        )
+        from leadforge.schemes.lifecycle.tasks import (
+            CALENDAR_REGIME,
+            EARLY_REGIME,
+            lifecycle_task_manifests,
+        )
+
+        artifacts = bundle.artifacts
+        if not isinstance(artifacts, LifecycleArtifacts):
+            raise RuntimeError(
+                "WorldBundle is not populated with lifecycle artifacts. "
+                "Call Generator.generate() / build_world() first."
+            )
+        config = bundle.spec.config
+        if config.exposure_mode is not ExposureMode.research_instructor:
+            raise NotImplementedError(
+                "lifecycle write_bundle currently supports only "
+                f"research_instructor; {config.exposure_mode.value!r} (snapshot-safe "
+                "public export) lands in LTV-Pn.4c"
+            )
+
+        population = artifacts.population
+        sim = artifacts.simulation_result
+        root = Path(path)
+        root.mkdir(parents=True, exist_ok=True)
+
+        # 1. Relational tables → tables/
+        dfs = to_dataframes(sim, population)
+        table_row_counts = write_relational_tables(dfs, root / "tables")
+
+        # 2. Both regime snapshots → 8 task directories.
+        #    difficulty_params (None until LTV-Po resolves it) drives distortions when set.
+        snapshots = {
+            CALENDAR_REGIME: build_customer_snapshot(
+                population, sim, difficulty_params=config.difficulty_params, seed=config.seed
+            ),
+            EARLY_REGIME: build_early_pltv_snapshot(
+                population,
+                sim,
+                early_tenure_weeks=config.early_tenure_weeks,
+                difficulty_params=config.difficulty_params,
+                seed=config.seed,
+            ),
+        }
+        # Each task is a standalone single-target split: drop every OTHER
+        # target column so a task's parquet cannot leak the answer's siblings
+        # (e.g. ltv_revenue_730d ⊇ ltv_revenue_90d).  The deliberate
+        # mrr_change_full_period trap (leakage_risk but not a target) is kept.
+        all_target_cols = {f.name for f in CUSTOMER_SNAPSHOT_FEATURES if f.is_target}
+        task_row_counts: dict[str, dict[str, int]] = {}
+        all_tasks = []
+        for regime, snapshot in snapshots.items():
+            for task in lifecycle_task_manifests(regime):
+                other_targets = [
+                    c for c in all_target_cols - {task.label_column} if c in snapshot.columns
+                ]
+                task_df = snapshot.drop(columns=other_targets)
+                counts = write_task_splits(task_df, root / "tasks", seed=config.seed, task=task)
+                task_row_counts[task.task_id] = counts
+                all_tasks.append(task)
+
+        # 3. Dataset card + feature dictionary
+        (root / "dataset_card.md").write_text(
+            render_lifecycle_dataset_card(
+                bundle.spec,
+                table_counts=table_row_counts,
+                tasks=tuple(all_tasks),
+                observation_date=population.observation_date,
+            )
+        )
+        write_feature_dictionary(
+            root / "feature_dictionary.csv", features=tuple(CUSTOMER_SNAPSHOT_FEATURES)
+        )
+
+        # 4. Exposure metadata (delegates hidden truth to write_metadata)
+        apply_exposure(bundle, root, config.exposure_mode)
+
+        # 5. Manifest
+        manifest = build_manifest(
+            config=config,
+            generation_scheme=self.name,
+            motif_family=artifacts.motif_family,
+            table_row_counts=table_row_counts,
+            task_row_counts=task_row_counts,
+            bundle_root=root,
+            generation_timestamp=generation_timestamp,
+            extra_fields={
+                "observation_date": population.observation_date,
+                # The actual exported target windows (source of truth), not
+                # config.forward_windows_days — build_world rejects any mismatch.
+                "forward_windows_days": list(FORWARD_WINDOWS_DAYS),
+                "early_tenure_weeks": config.early_tenure_weeks,
+            },
+        )
+        write_manifest(manifest, root)
 
     def write_metadata(self, bundle: WorldBundle, meta_dir: Path) -> None:
-        raise NotImplementedError(_NOT_IMPLEMENTED)
+        """Write the lifecycle hidden-truth files into *meta_dir*.
+
+        Called by :func:`leadforge.exposure.modes.apply_exposure` after the
+        shared ``world_spec.json``.  The lifecycle scheme has no hidden graph;
+        its latent truth is the per-entity latent registry and the
+        motif-derived mechanism parameters.
+        """
+        import json
+
+        from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts
+        from leadforge.schemes.lifecycle.render.metadata import (
+            latent_registry_dict,
+            mechanism_summary_dict,
+        )
+
+        artifacts = bundle.artifacts
+        if not isinstance(artifacts, LifecycleArtifacts):
+            raise RuntimeError("WorldBundle is not populated with lifecycle artifacts.")
+
+        (meta_dir / "latent_registry.json").write_text(
+            json.dumps(latent_registry_dict(artifacts.population.latent_state), indent=2)
+        )
+        (meta_dir / "mechanism_summary.json").write_text(
+            json.dumps(mechanism_summary_dict(artifacts.motif_family), indent=2)
+        )
 
 
 LIFECYCLE_SCHEME = LifecycleScheme()
 
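Reviewer sketch (not part of the patch): the single-target isolation in step 2 of `write_bundle` can be illustrated without pandas. `isolate_task_columns` is a hypothetical helper, and the column list is a made-up sample built from names that appear in this diff; the real code drops columns from a DataFrame instead.

```python
def isolate_task_columns(columns, all_target_cols, label_column):
    """Keep every non-target column plus the single label this task predicts."""
    # Every target except this task's own label counts as a sibling to drop.
    other_targets = set(all_target_cols) - {label_column}
    # Preserve the original column order for the columns that survive.
    return [c for c in columns if c not in other_targets]


# Hypothetical snapshot columns; mrr_change_full_period is the deliberate
# leakage trap, which is not a target and is therefore kept.
columns = [
    "customer_id",
    "mrr_change_full_period",
    "ltv_revenue_90d",
    "ltv_revenue_730d",
    "churned_within_180d",
]
targets = {"ltv_revenue_90d", "ltv_revenue_730d", "churned_within_180d"}

kept = isolate_task_columns(columns, targets, "ltv_revenue_90d")
print(kept)
```

With these inputs the 90-day task keeps its own label and the trap column, while the 730-day revenue and churn labels are removed, so the longer window cannot leak the shorter one.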
@@ -0,0 +1,89 @@
+"""Dataset-card renderer for the lifecycle (pLTV) scheme.
+
+The lead-scoring card (:func:`leadforge.narrative.dataset_card.render_dataset_card`)
+is hard-coupled to the lead-scoring framing (binary conversion label, single
+task, narrative-driven firmographics), so the lifecycle scheme renders its own.
+Kept deliberately concise for LTV-Pn.4b; richer prose can follow.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from leadforge.core.models import WorldSpec
+    from leadforge.schema.tasks import TaskManifest
+
+__all__ = ["render_lifecycle_dataset_card"]
+
+
+def render_lifecycle_dataset_card(
+    world_spec: WorldSpec,
+    *,
+    table_counts: dict[str, int],
+    tasks: tuple[TaskManifest, ...],
+    observation_date: str,
+) -> str:
+    """Return a Markdown dataset card for a lifecycle (pLTV) bundle."""
+    cfg = world_spec.config
+    tier = (str(cfg.difficulty) if cfg.difficulty else "unknown").capitalize()
+
+    lines: list[str] = [
+        f"# B2B SaaS pLTV Dataset — {tier} Tier",
+        "",
+        "## What this is",
+        "",
+        "A synthetic B2B SaaS customer base simulated week by week from "
+        "acquisition through retention, expansion, and churn.  The prediction "
+        "task is **predicted lifetime value (pLTV)**: a continuous, "
+        "zero-inflated, right-skewed regression target — forecast each "
+        "customer's future gross revenue over a fixed forward window.  Customer "
+        "churn is provided as a secondary classification label.",
+        "",
+        "## Two observation regimes",
+        "",
+        "- **Calendar-anchored (standard)** — every customer observed at the "
+        f"fixed observation date (`{observation_date}`); tenure varies from "
+        "cold to mature.  Task ids: `pltv_revenue_*`, `churned_within_180d`.",
+        "- **Tenure-anchored (early-pLTV)** — every customer observed at a "
+        f"fixed short tenure (`customer_start + {cfg.early_tenure_weeks}w`); the "
+        "genuine cold-start case.  Task ids prefixed `early_`.",
+        "",
+        "## Tasks",
+        "",
+        "| task_id | type | target | window (days) |",
+        "|---|---|---|---|",
+    ]
+    for t in tasks:
+        lines.append(
+            f"| `{t.task_id}` | {t.task_type} | `{t.label_column}` | {t.label_window_days} |"
+        )
+
+    lines += [
+        "",
+        "## Relational tables",
+        "",
+        "| table | rows |",
+        "|---|---|",
+    ]
+    for name, count in table_counts.items():
+        lines.append(f"| `{name}` | {count} |")
+
+    lines += [
+        "",
+        "## Leakage trap",
+        "",
+        "`mrr_change_full_period` is a deliberate trap: it is computed through "
+        "the end of simulation, so post-cutoff expansions inflate it.  Use "
+        "`mrr_change_at_snapshot` (computed strictly at the cutoff) instead.",
+        "",
+        "## Reproducibility",
+        "",
+        f"- Recipe: `{cfg.recipe_id}`",
+        f"- Seed: `{cfg.seed}`",
+        f"- Scheme: `{world_spec.scheme}`",
+        "",
+        "Deterministic given (recipe, config, seed, package version).",
+        "",
+    ]
+    return "\n".join(lines)
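
Reviewer sketch (not part of the patch): the task-table loop in `render_lifecycle_dataset_card` only reads four attributes from each task, so it can be tried with a stand-in record. `TaskRow` is a hypothetical substitute for `leadforge.schema.tasks.TaskManifest`, and the `task_type` strings are assumed values.

```python
from typing import NamedTuple


class TaskRow(NamedTuple):
    """Hypothetical stand-in exposing the attributes the card renderer reads."""

    task_id: str
    task_type: str
    label_column: str
    label_window_days: int


def render_task_table(tasks):
    """Render the Markdown task table the same way the card's loop does."""
    lines = [
        "| task_id | type | target | window (days) |",
        "|---|---|---|---|",
    ]
    for t in tasks:
        lines.append(
            f"| `{t.task_id}` | {t.task_type} | `{t.label_column}` | {t.label_window_days} |"
        )
    return "\n".join(lines)


table = render_task_table(
    [
        TaskRow("pltv_revenue_90d", "regression", "ltv_revenue_90d", 90),
        TaskRow("churned_within_180d", "classification", "churned_within_180d", 180),
    ]
)
print(table)
```

Building the card as a list of lines joined once at the end, as the patch does, keeps each section easy to test on its own.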