From 636b9226d55deeaf580a28919e8e10d53b6296ac Mon Sep 17 00:00:00 2001
From: Shay Palachy <shaypal5@users.noreply.github.com>
Date: Wed, 10 Jun 2026 09:57:16 +0300
Subject: [PATCH] docs(ltv): plan the b2b_saas_ltv_v1 LTV dataset workstream
 [LTV-Pa]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Solidify the customer-lifetime-value workstream design and roadmap before
any package code lands. New recipe on the existing framework via a
recipe_type: lifecycle dispatch hook — not a fork.

- docs/ltv/design.md — single source of truth: locked decisions (D1-D5),
  new entities (subscription_events, health_signals, invoices), lifecycle
  mechanisms (churn/expansion/payment), motif families, leakage trap
  (mrr_change_full_period), absolute observation_date anchor, snapshot-safety
  contract, framework change inventory, difficulty profiles.
- docs/ltv/roadmap.md — milestones LTV-M0..M7 and PRs LTV-Pa..Pn, with a
  planning notation deliberately distinct from GitHub #NNN numbering.
- .agent-plan.md — add the LTV workstream as the next workstream pointer.
- CLAUDE.md — link the LTV design + roadmap in Reference Docs.

Locked decisions (2026-06-10): primary task churn_within_180_days; weekly
simulation steps; independent generation (chaining later); staggered start
dates + fixed absolute observation_date; bundle schema version 5 -> 6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .agent-plan.md      |  23 ++-
 CLAUDE.md           |   1 +
 docs/ltv/design.md  | 351 ++++++++++++++++++++++++++++++++++++++++++++
 docs/ltv/roadmap.md | 198 +++++++++++++++++++++++++
 4 files changed, 572 insertions(+), 1 deletion(-)
 create mode 100644 docs/ltv/design.md
 create mode 100644 docs/ltv/roadmap.md

diff --git a/.agent-plan.md b/.agent-plan.md
index 71d0b85..4bb553b 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -12,7 +12,28 @@
 
 ---
 
-## Next Up — `leadforge-lead-scoring-v1` Curated Dataset Release
+## Next Workstream — `b2b_saas_ltv_v1` LTV Dataset (planning)
+
+Goal: add a customer-lifetime-value dataset family — post-conversion
+subscription lifecycle (renewals, expansion, churn, payment failure) on the
+same Veridian vertical. New recipe on the existing framework via a
+`recipe_type: lifecycle` dispatch hook, **not** a fork.
+
+**Source of truth:** `docs/ltv/design.md`
+**Roadmap (milestones/PRs):** `docs/ltv/roadmap.md` — planning notation
+`LTV-M0…M7` (milestones) / `LTV-Pa…Pn` (PRs), kept distinct from GitHub `#NNN`.
+**GitHub milestone:** `dataset: leadforge-ltv-v1`
+
+Locked decisions (2026-06-10): primary task `churn_within_180_days`; weekly
+simulation steps; independent generation (chaining later); staggered start
+dates + fixed absolute `observation_date`; bundle schema version 5 → 6.
+
+Status: `LTV-M0` (planning) — design + roadmap landed. `LTV-M1` (schema
+foundation) is next.
+
+---
+
+## Prior Workstream — `leadforge-lead-scoring-v1` Curated Dataset Release
 
 Goal: ship a best-in-class educational synthetic CRM lead-scoring dataset family to Kaggle and Hugging Face. Dataset version is decoupled from the leadforge package version (package stays at `1.x`).
 
diff --git a/CLAUDE.md b/CLAUDE.md
index e93486c..a8e6c06 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -357,6 +357,7 @@ The current focus is producing a v4 lead scoring intro dataset. See `docs/v4/` f
 ---
 
 ## Reference Docs
+- **LTV workstream (next, active planning): `docs/ltv/design.md` + `docs/ltv/roadmap.md`**
 - Design decisions: `docs/leadforge_design_doc.md`
 - Architecture/spec: `docs/leadforge_architecture_spec.md`
 - Implementation roadmap: `docs/leadforge_implementation_plan.md`
diff --git a/docs/ltv/design.md b/docs/ltv/design.md
new file mode 100644
index 0000000..de75a4c
--- /dev/null
+++ b/docs/ltv/design.md
@@ -0,0 +1,351 @@
+# LTV Dataset Design — `b2b_saas_ltv_v1`
+
+> **Single source of truth** for the customer-lifetime-value workstream. The
+> companion `roadmap.md` breaks this design into milestones and PRs. Update
+> both as the workstream progresses.
+
+**Status:** planning (pre-implementation)
+**Owner:** @shaypalachy
+**Vertical:** same Veridian Technologies procurement/AP SaaS world as
+`b2b_saas_procurement_v1`, but **post-conversion**: subscriptions, renewals,
+expansion, churn, and lifetime value.
+
+---
+
+## 1. Goal
+
+Add a customer-lifecycle dataset family to leadforge that picks up where the
+lead-scoring dataset leaves off. The lead-scoring recipe ends at `closed_won`;
+the LTV recipe begins there and simulates the subscription lifecycle.
+
+**Teaching goals (analogous to lead scoring):**
+
+- Predict churn / lifetime value for active customers from a point-in-time
+  snapshot.
+- Teach **survival analysis** (right-censoring discipline), **cohort
+  analysis**, and **expansion-revenue modelling**.
+- Reproduce realistic confusions: leakage traps around future expansion
+  events, right-censoring of still-active customers, and time-window
+  discipline on health-signal features.
+
+This is a **new recipe on the existing framework**, not a fork. The CLI,
+bundle format, RNG system, exposure modes, manifest schema, and validation
+harness are reused. A single dispatch hook in `Generator.generate()` selects
+the lifecycle simulation path when the recipe declares `recipe_type:
+lifecycle`.
+
+---
+
+## 2. Locked design decisions
+
+These five questions were resolved by the maintainer on 2026-06-10. They are
+load-bearing for everything below.
+
+| # | Question | Decision |
+|---|----------|----------|
+| D1 | Primary task for the first shipped recipe | **`churn_within_180_days`** (binary). `ltv_bucket_6m` ships as a secondary task in the same bundle. |
+| D2 | Simulation time resolution | **Weekly steps** — gives granular health-signal trend curves for teaching trend features. Billing/renewal events resolve to the enclosing week. |
+| D3 | Independent vs chained generation | **Independent for v1.** The customer population is generated self-contained, not chained from a lead-scoring bundle. Optional chaining is designed-for-later but not built now. |
+| D4 | Observation cohort | **Staggered start dates, fixed observation date.** Customers are acquired across an acquisition window; all are observed at one absolute calendar date, so tenure-at-observation varies (cold-start customers vs mature customers). |
+| D5 | Bundle schema version | **Bump** `BUNDLE_SCHEMA_VERSION` from `5` to `6`. The table inventory and the customers/subscriptions semantics change substantially. |
+
+### 2.1 Consequence of D4 — absolute observation anchor
+
+This is the most important framework divergence. The lead-scoring path filters
+events by a **per-entity relative** cutoff (`lead_created_at + snapshot_day`).
+With staggered starts + a single fixed observation date (D4), the lifecycle
+path filters by an **absolute calendar** cutoff (`observation_date`, the same
+for every customer).
+
+Therefore the lifecycle path does **not** reuse `snapshot_day`. It introduces
+an `observation_date` concept derived deterministically from the world
+calendar (acquisition-window end) and recorded in the manifest. The customer
+snapshot builder is a **separate function** (`build_customer_snapshot`) so it
+can apply absolute-cutoff filtering without touching the lead-scoring
+`build_snapshot` path.
+
+### 2.2 Consequence of D1 — right-censoring is a property, not a label problem
+
+Because the primary task is `churn_within_180_days`, every customer gets a
+**definite** binary label as long as the simulation runs through
+`observation_date + 180 days` for all customers. A customer still active at
+`observation_date + 180d` is a clean negative, not a censored row.
+
+Right-censoring is still taught — but as a property of the **relational
+tables** and the **secondary LTV task**: a customer still active at the end of
+the simulation horizon is right-censored for *total* lifetime value. The
+notebooks teach censoring discipline on the relational data and on
+`ltv_bucket_6m`, while the headline binary task stays clean. This is a cleaner
+split than making censoring a label-derivation hazard.
+
+---
+
+## 3. Entities and tables
+
+### 3.1 New entity rows (in `leadforge/schema/entities.py`)
+
+**`SubscriptionEventRow`** — the lifecycle backbone; one row per state change.
+
+| column | dtype | notes |
+|--------|-------|-------|
+| `event_id` | string | opaque ID, `subev_000001` |
+| `subscription_id` | string | FK → subscriptions |
+| `customer_id` | string | FK → customers |
+| `event_timestamp` | string | ISO-8601 (week-resolved) |
+| `event_type` | string | `renewal` / `expansion` / `downgrade` / `churn` / `payment_failure` / `payment_recovered` |
+| `mrr_before` | Int64 | MRR (USD) before the event |
+| `mrr_after` | Int64 | MRR (USD) after; `0` on churn |
+| `contract_term_months_new` | Int64 \| null | set on `renewal` only |
+
+**`HealthSignalRow`** — weekly product-usage telemetry; the core predictive signal.
+
+| column | dtype | notes |
+|--------|-------|-------|
+| `signal_id` | string | opaque ID |
+| `customer_id` | string | FK → customers |
+| `period_start` | string | ISO-8601 first day of the signal week |
+| `active_users` | Int64 | weekly active users |
+| `feature_depth_score` | Float64 | 0–1, latent-derived breadth-of-use |
+| `support_tickets` | Int64 | tickets opened that week |
+| `nps_score` | Int64 \| null | quarterly survey; null most weeks |
+
+**`InvoiceRow`** — monthly billing; drives the payment-failure mechanism.
+
+| column | dtype | notes |
+|--------|-------|-------|
+| `invoice_id` | string | opaque ID |
+| `customer_id` | string | FK → customers |
+| `invoice_date` | string | ISO-8601 |
+| `amount_usd` | Int64 | |
+| `payment_status` | string | `paid` / `failed` / `recovered` / `written_off` |
+
+### 3.2 Extended existing entity rows
+
+The current `CustomerRow` (4 fields) and `SubscriptionRow` (5 fields,
+`subscription_status` hardcoded `"active"`) are shells. The lifecycle recipe
+fills them out. **The lead-scoring recipe keeps the thin versions** — the new
+fields are nullable/optional so the procurement recipe's output is unchanged.
+
+`CustomerRow` gains: `initial_mrr`, `initial_plan`, `contract_term_months`,
+`csm_rep_id`.
+
+`SubscriptionRow` gains: `current_mrr`, `subscription_end_at`, `churn_at`,
+`churn_reason`, `renewal_count`, `expansion_count`.
+
+### 3.3 Lifecycle table inventory by exposure mode
+
+| table | public (`student_public`) | instructor (`research_instructor`) |
+|-------|---------------------------|-------------------------------------|
+| accounts | ✓ | ✓ |
+| customers | ✓ (terminal fields redacted) | ✓ full |
+| subscriptions | ✓ (terminal fields redacted) | ✓ full |
+| subscription_events | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
+| health_signals | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
+| invoices | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
+
+Contacts/leads/touches/etc. from the lead-scoring world are **not** part of
+the LTV bundle (independent generation, D3).
+
+---
+
+## 4. Snapshot-safety contract (lifecycle)
+
+Analogous to the lead-scoring hard constraint in `CLAUDE.md`, but anchored on
+the **absolute** `observation_date`:
+
+- Every timestamp column in public event tables
+  (`subscription_events.event_timestamp`, `health_signals.period_start`,
+  `invoices.invoice_date`) must satisfy `<= observation_date`.
+- No terminal/outcome fields in public `customers` / `subscriptions`:
+  `churn_at`, `churn_reason`, `subscription_end_at`, and the derived
+  `churned_within_180_days` label are banned from the public relational
+  tables.
+- `subscription_events` rows with `event_type == "churn"` after
+  `observation_date` are excluded from public bundles (they encode the label).
+- No flat snapshot feature may use events after `observation_date`, **except**
+  the deliberately-retained leakage trap (§6).
+
+This contract is enforced by lifecycle-specific leakage probes (see roadmap
+`LTV-M6`) and recorded in the manifest. `CLAUDE.md`'s hard-constraints section
+gains a lifecycle clause when the recipe wiring lands (`LTV-M5`).
+
+---
+
+## 5. Simulation mechanisms
+
+Three new mechanism types, none of which exist today. All reuse the existing
+`LatentScore` + per-step Bernoulli pattern from
+`leadforge/mechanisms/hazards.py`.
+
+**Churn hazard (post-conversion).** Weekly probability driven by health
+signals + latent traits, structurally different from the flat pre-conversion
+`_DAILY_CHURN_RATE`:
+
+- Weibull-shaped over tenure with a decreasing hazard (shape < 1): elevated in
+  weeks 1–12 (onboarding instability), low in the steady middle.
+- **Renewal-date spike**: at each contract anniversary the churn hazard is
+  ~10× the background rate (discrete renewal decision).
+- Drivers: `latent_product_fit` (background), `latent_champion_strength`
+  (renewal-date), `feature_depth_score` (leading indicator),
+  unrecovered payment failures (financial trigger).
+
+**Expansion propensity.** Weekly probability of a plan upgrade / seat add:
+
+- Drivers: `latent_adoption_velocity`, `feature_depth_score`, active-user
+  growth, `employee_band` (expansion ceiling).
+- Expansion MRR delta: `randint(0.25·mrr, 1.0·mrr)`, a uniform integer draw of 25–100% of pre-expansion MRR.
+
+**Payment failure.** Monthly billing event with a failure probability:
+
+- Driver: `latent_budget_stability`.
+- Dunning: 3 months of `failed` before escalation to `recovered` or
+  `written_off`; unrecovered → forced churn.
+
+### 5.1 Lifecycle motif families (new)
+
+The 5 lead-scoring motifs describe *buying*; the LTV world needs *retention*
+motifs:
+
+| family | retention driver |
+|--------|------------------|
+| `product_led_retention` | `latent_product_fit` dominant; health signals strongly predictive |
+| `relationship_led_retention` | `latent_champion_strength` dominant; health weaker |
+| `expansion_led_growth` | low churn, high upsell; LTV variance from expansion |
+| `payment_fragile` | financially-triggered churn; `latent_budget_stability` dominant |
+| `churner_dominated` | high background churn; strong early-warning signals (teaching tier) |
+
+### 5.2 New customer latent traits
+
+`latent_product_fit`, `latent_adoption_velocity`, `latent_budget_stability`,
+`latent_champion_strength`, `latent_organizational_stability`. Sampled from the
+same clipped-Gaussian `_sample_latent` helper, with motif-family mean biases
+mirroring `_MOTIF_LATENT_BIAS`.
+
+---
+
+## 6. Leakage trap
+
+**Primary trap: `mrr_change_full_period` vs `mrr_change_at_snapshot`.**
+
+- `mrr_change_at_snapshot` = `current_mrr − initial_mrr` measured at
+  `observation_date`. **Valid.**
+- `mrr_change_full_period` = MRR delta from start to **end of simulation**.
+  **Leaks** — future expansions (which correlate with high LTV / low churn)
+  inflate it. Retained in all modes (`leakage_risk=True,
+  redact_in_modes=frozenset()`), documented in the feature dictionary and
+  release notes, exactly like `total_touches_all`.
+
+Why it's a good trap: both columns are "MRR delta", differing only by window;
+standalone AUC is moderate (looks useful, not obviously inflated); tree models
+extract more from it than LR (reproduces the NB03 lesson); removing it causes a
+measurable-but-not-catastrophic drop.
+
+**Secondary trap (advanced tier): `last_health_signal_post_obs`** — a
+health-signal reading from *after* `observation_date`, named to look like a
+current-state feature. More subtle because the column name doesn't reveal the
+time shift.
+
+A trap-invariant test (analogous to `test_windowed_bundle_trap.py`) asserts
+`mrr_change_full_period` diverges from `mrr_change_at_snapshot` for a
+non-trivial fraction of customers and never contradicts it in sign for
+expansion-only customers.
+
+---
+
+## 7. Customer snapshot features (at `observation_date`)
+
+Grouped like `LEAD_SNAPSHOT_FEATURES`. New constant
+`CUSTOMER_SNAPSHOT_FEATURES` in `leadforge/schema/features.py`.
+
+**Account** (from `AccountRow`): `industry`, `region`, `employee_band`,
+`estimated_revenue_band`.
+
+**Customer/subscription:** `tenure_weeks`, `initial_plan`, `current_plan`,
+`initial_mrr`, `current_mrr`, `mrr_change_at_snapshot`, `renewal_count`,
+`expansion_count`, `downgrade_count`, `contract_term_months`,
+`weeks_to_next_renewal`.
+
+**Health (aggregated over last 12 weeks before `observation_date`):**
+`avg_active_users_l12w`, `active_user_trend_l12w` (slope),
+`avg_feature_depth_l12w`, `support_ticket_count_l12w`, `last_nps_score`
+(nullable).
+
+**Financial:** `payment_failure_count`, `weeks_since_last_payment_failure`
+(nullable).
+
+**Leakage trap (all modes):** `mrr_change_full_period`.
+
+**Target (primary, `churn_within_180_days`):** `churned_within_180_days`
+(boolean) — True iff a `churn` event falls in
+`(observation_date, observation_date + 180d]`.
+
+**Target (secondary, `ltv_bucket_6m`):** quartile (`low`/`medium`/`high`/`top`)
+of revenue collected (paid invoices) in the 6 months after `observation_date`.
+
+---
+
+## 8. Framework changes inventory
+
+### New files
+
+| file | purpose |
+|------|---------|
+| `leadforge/simulation/lifecycle.py` | `simulate_lifecycle()` — weekly-step subscription simulator |
+| `leadforge/simulation/customer_population.py` | `build_customer_population()` — customer entities + latents + staggered starts |
+| `leadforge/render/customer_snapshots.py` | `build_customer_snapshot()` — per-customer row at absolute `observation_date` |
+| `leadforge/mechanisms/lifecycle_hazards.py` | churn hazard, expansion propensity, payment failure |
+| `leadforge/recipes/b2b_saas_ltv_v1/{recipe,narrative,difficulty_profiles}.yaml` | new recipe |
+
+### Modified files
+
+| file | change |
+|------|--------|
+| `leadforge/schema/entities.py` | add 3 rows; extend `CustomerRow`/`SubscriptionRow` |
+| `leadforge/schema/features.py` | add `CUSTOMER_SNAPSHOT_FEATURES` |
+| `leadforge/schema/tasks.py` | add `CHURN_WITHIN_180_DAYS`, `LTV_BUCKET_6M` |
+| `leadforge/schema/relationships.py` | FK constraints for new tables |
+| `leadforge/core/models.py` | add `n_customers: int \| None`; lifecycle config fields |
+| `leadforge/api/recipes.py` | parse `recipe_type` + `lifecycle:` section |
+| `leadforge/api/generator.py` | dispatch on `recipe.recipe_type` |
+| `leadforge/render/manifests.py` | record `observation_date`, bump schema version to 6 |
+| `leadforge/validation/*` | lifecycle leakage probes + realism bands |
+| `CLAUDE.md` | lifecycle snapshot-safety hard-constraint clause; reference docs |
+
+### Unchanged (reused as-is)
+
+CLI commands/flags, `WorldBundle`/`WorldSpec`, RNG architecture, exposure-mode
+dispatch, `manifest.json` envelope (additive), determinism/monotonicity
+invariants.
+
+---
+
+## 9. Difficulty profiles
+
+| dimension | intro | intermediate | advanced |
+|-----------|-------|--------------|----------|
+| `signal_strength` | 0.90 | 0.70 | 0.50 |
+| `noise_scale` | 0.10 | 0.30 | 0.55 |
+| `missing_rate` | 0.02 | 0.08 | 0.18 |
+| `annual_churn_rate_range` | [0.10, 0.20] | [0.20, 0.35] | [0.30, 0.50] |
+| `expansion_rate_range` | [0.15, 0.30] | [0.10, 0.20] | [0.05, 0.15] |
+| `still_active_fraction` (≈ right-censored for total LTV) | ~0.40 | ~0.60 | ~0.75 |
+| secondary trap `last_health_signal_post_obs` | off | off | on |
+
+Primary task is `churn_within_180_days` on all three tiers; difficulty is a
+prevalence + noise + calibration axis (matching the v1 reframe — *not* a flat
+AUC-vs-tier promise).
+
+---
+
+## 10. Open items deferred past planning
+
+- **Chained generation** (D3 later): an interface to seed the customer
+  population from a lead-scoring bundle's converted leads. Designed-for but not
+  built; the customer-population builder will keep its acquisition logic behind
+  a seam so a "from converted leads" source can be slotted in.
+- **Continuous-time engine**: weekly steps are sufficient for v1; not coupled
+  to this dataset.
+- **LTV regression label** (vs bucket): bucket is the secondary v1 task; a
+  continuous LTV regression target is a later addition.
+- **CLAUDE.md hard-constraint edit**: the lifecycle snapshot-safety clause is
+  added when `LTV-M5` wiring lands, not in the planning PR.
diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md
new file mode 100644
index 0000000..4fe7be2
--- /dev/null
+++ b/docs/ltv/roadmap.md
@@ -0,0 +1,198 @@
+# LTV Workstream Roadmap — `b2b_saas_ltv_v1`
+
+> Milestone/PR breakdown for the customer-lifetime-value workstream. Design
+> rationale lives in `design.md` (single source of truth). Update the
+> checkboxes as work lands.
+
+## Planning notation
+
+Work items use a deliberate **`LTV-` prefix** scheme so they never collide
+with GitHub PR numbers (`#NNN`):
+
+- **Milestones**: `LTV-M0` … `LTV-M7`. A milestone is a coherent capability.
+- **PRs**: `LTV-Pa`, `LTV-Pb`, … (sequential letters, globally unique across
+  the whole roadmap). Each PR maps to exactly one milestone.
+
+When a PR is opened on GitHub, reference its planning code in the title, e.g.
+`feat(schema): lifecycle entity rows [LTV-Pb]`, and record the GitHub number
+(`#NNN`) back here on merge. The two namespaces stay distinct: `LTV-Pb`
+(plan) ↔ `#123` (GitHub).
+
+GitHub milestone: **`dataset: leadforge-ltv-v1`** — all LTV PRs assign to it.
+Default labels per PR: a `type:` label, relevant `layer:` labels, and
+`dataset: leadforge-ltv-v1`.
+
+---
+
+## Milestone overview
+
+| Milestone | Capability | PRs | GitHub PRs |
+|-----------|------------|-----|------------|
+| `LTV-M0` | Planning + design lock | `LTV-Pa` | _this PR_ |
+| `LTV-M1` | Schema foundation | `LTV-Pb`, `LTV-Pc` | |
+| `LTV-M2` | Customer population + lifecycle world | `LTV-Pd`, `LTV-Pe` | |
+| `LTV-M3` | Lifecycle simulation engine | `LTV-Pf`, `LTV-Pg` | |
+| `LTV-M4` | Customer snapshot + leakage trap | `LTV-Ph` | |
+| `LTV-M5` | Recipe wiring + framework dispatch | `LTV-Pi`, `LTV-Pj` | |
+| `LTV-M6` | Validation + difficulty calibration | `LTV-Pk` | |
+| `LTV-M7` | CLI, notebooks, publish | `LTV-Pl`, `LTV-Pm`, `LTV-Pn` | |
+
+Total: 14 PRs across 8 milestones (LTV-M0 = planning). Comparable in scope to
+the original M4–M9 framework build.
+
+---
+
+## `LTV-M0` — Planning + design lock
+
+- [ ] **`LTV-Pa`** — _this PR._ Land `docs/ltv/design.md` + `docs/ltv/roadmap.md`;
+  create the `dataset: leadforge-ltv-v1` GitHub milestone + label; record the
+  five locked design decisions (D1–D5). No package code.
+  - Labels: `type: docs`
+  - Deliverable: design doc, roadmap, milestone/label scaffolding.
+
+---
+
+## `LTV-M1` — Schema foundation
+
+- [ ] **`LTV-Pb`** — `feat(schema): lifecycle entity rows`. Add
+  `SubscriptionEventRow`, `HealthSignalRow`, `InvoiceRow` to `entities.py`;
+  extend `CustomerRow` / `SubscriptionRow` with nullable lifecycle fields
+  (lead-scoring output unchanged). Register in `ALL_ROW_TYPES`. Add FK
+  constraints to `relationships.py`. Add ID prefixes (`subev_`, `hsig_`,
+  `inv_`).
+  - Tests: row round-trips, empty-dataframe dtypes, FK constraint registration,
+    lead-scoring schema unaffected.
+  - Labels: `type: feature`, `layer: schema`
+- [ ] **`LTV-Pc`** — `feat(schema): customer snapshot feature spec + tasks`. Add
+  `CUSTOMER_SNAPSHOT_FEATURES` to `features.py` (incl. the `mrr_change_full_period`
+  trap with `leakage_risk=True`); add `CHURN_WITHIN_180_DAYS` and
+  `LTV_BUCKET_6M` task specs to `tasks.py`.
+  - Tests: feature-spec invariants (single target, trap flagged, no
+    zero-variance by construction), task-spec shape.
+  - Labels: `type: feature`, `layer: schema`
+
+---
+
+## `LTV-M2` — Customer population + lifecycle world
+
+- [ ] **`LTV-Pd`** — `feat(simulation): customer population builder`.
+  `build_customer_population()` in `customer_population.py`: customer entities,
+  5 new latent traits, **staggered start dates** within an acquisition window
+  ending at the absolute `observation_date` (D4). Keep a seam for future
+  chained generation (D3). Reuse `RNGRoot` named-substream convention.
+  - Tests: determinism under seed, latent distributions, staggered-start
+    spread, FK integrity, acquisition-window boundary.
+  - Labels: `type: feature`, `layer: simulation`
+- [ ] **`LTV-Pe`** — `feat(mechanisms): lifecycle motif families + policies`. 5
+  retention motif families with latent-mean biases; `assign_lifecycle_mechanisms()`
+  policy mapping motif → churn/expansion/payment mechanism params.
+  - Tests: per-motif param tables, policy dispatch, determinism.
+  - Labels: `type: feature`, `layer: mechanisms`
+
+---
+
+## `LTV-M3` — Lifecycle simulation engine
+
+- [ ] **`LTV-Pf`** — `feat(mechanisms): churn / expansion / payment hazards`.
+  `lifecycle_hazards.py`: Weibull-shaped churn hazard with renewal-date spike,
+  expansion propensity, payment-failure + dunning. Built on `LatentScore` +
+  per-step Bernoulli.
+  - Tests: hazard shape over tenure, renewal spike, dunning escalation,
+    expansion MRR delta bounds.
+  - Labels: `type: feature`, `layer: mechanisms`
+- [ ] **`LTV-Pg`** — `feat(simulation): weekly lifecycle engine`.
+  `simulate_lifecycle()` in `lifecycle.py`: weekly loop (D2) per customer from
+  staggered start through `observation_date + 180d + buffer`; emits
+  `subscription_events`, `health_signals`, `invoices`; updates
+  customer/subscription terminal state. RNG substreams
+  `lifecycle_transitions` / `lifecycle_events` / `lifecycle_post_sim`.
+  - Tests: determinism, churn-rate bounds per difficulty, still-active
+    fraction, weekly health-signal cadence, monthly invoice cadence,
+    every customer simulated through the full label window.
+  - Labels: `type: feature`, `layer: simulation`
+
+---
+
+## `LTV-M4` — Customer snapshot + leakage trap
+
+- [ ] **`LTV-Ph`** — `feat(render): customer snapshot builder`.
+  `build_customer_snapshot()` in `customer_snapshots.py`: **absolute
+  `observation_date` cutoff** (not relative `snapshot_day`); aggregate health
+  / events / invoices over last-12-weeks windows; compute
+  `mrr_change_at_snapshot` (valid) and `mrr_change_full_period` (trap); derive
+  `churned_within_180_days` and `ltv_bucket_6m`; difficulty distortions.
+  - Tests: no post-`observation_date` data in windowed columns; trap-invariant
+    (full ≠ snapshot for non-trivial fraction); label derivation; difficulty
+    distortion exemption for the trap.
+  - Labels: `type: feature`, `layer: render`
+
+---
+
+## `LTV-M5` — Recipe wiring + framework dispatch
+
+- [ ] **`LTV-Pi`** — `feat(api,core): recipe_type dispatch + lifecycle config`.
+  Add `n_customers` + lifecycle fields to `GenerationConfig`; parse
+  `recipe_type` + `lifecycle:` section in `recipes.py`; dispatch the lifecycle
+  path in `Generator.generate()`; bump `BUNDLE_SCHEMA_VERSION` 5 → 6 (D5);
+  record `observation_date` in the manifest. Extend `CLAUDE.md` hard
+  constraints with the lifecycle snapshot-safety clause.
+  - Tests: config precedence, dispatch on recipe_type, lead-scoring path
+    unaffected, manifest schema-version + observation_date, exposure filtering
+    for new tables.
+  - Labels: `type: feature`, `layer: api`, `layer: core`, `layer: render`
+- [ ] **`LTV-Pj`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three
+  recipe YAMLs (recipe/narrative/difficulty_profiles); register in the recipe
+  registry; end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()`
+  smoke test producing a saved bundle.
+  - Tests: recipe loads, full generation round-trip, determinism under same
+    seed, both task splits written, public/instructor exposure split.
+  - Labels: `type: feature`, `layer: recipes`
+
+---
+
+## `LTV-M6` — Validation + difficulty calibration
+
+- [ ] **`LTV-Pk`** — `feat(validation): lifecycle leakage probes + realism bands`.
+  Lifecycle leakage probes (absolute-cutoff window check; banned terminal
+  columns/tables; deterministic reconstruction); difficulty calibration
+  (churn-rate / expansion-rate / still-active bands); trap-invariant guard;
+  cross-seed drift. Dataset-card renderer for the lifecycle narrative.
+  - Tests: probe coverage, band checks per tier, cross-seed stability.
+  - Labels: `type: feature`, `layer: validation`
+
+---
+
+## `LTV-M7` — CLI, notebooks, publish
+
+- [ ] **`LTV-Pl`** — `feat(cli): lifecycle generate flags + inspect surfacing`.
+  `--n-customers`, `--observation-date` (or derived) flags; `inspect` surfaces
+  lifecycle manifest fields (observation_date, task inventory).
+  - Labels: `type: feature`, `layer: cli`
+- [ ] **`LTV-Pm`** — `docs(notebooks): survival + churn teaching sequence`.
+  Notebooks: churn baseline, survival-analysis intro (right-censoring on the
+  relational tables), the `mrr_change_full_period` leakage demo, value-aware
+  LTV ranking.
+  - Labels: `type: docs`, `layer: render`
+- [ ] **`LTV-Pn`** — `feat(release): package + publish b2b_saas_ltv_v1`. Kaggle
+  + HF packaging (reuse Phase-5 packagers), LLM critique run, dataset card,
+  release notes, tag. Publishes under the now-live `leadforge` Kaggle org.
+  - Labels: `type: feature`, `layer: validation`
+
+---
+
+## Dependencies
+
+```
+LTV-M0 (plan)
+  └─ LTV-M1 (schema)
+       └─ LTV-M2 (population + motifs)
+            └─ LTV-M3 (engine)
+                 └─ LTV-M4 (snapshot)
+                      └─ LTV-M5 (wiring + recipe)  ← first end-to-end bundle
+                           └─ LTV-M6 (validation)
+                                └─ LTV-M7 (publish)
+```
+
+`LTV-M5` is the first point where `leadforge generate --recipe
+b2b_saas_ltv_v1` produces a bundle end-to-end. Everything before it is
+bottom-up framework construction; everything after is quality + delivery.
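
Illustrative note (not part of the patch): the absolute `observation_date` cutoff from design.md §2.1 and §4, and the `churned_within_180_days` label from §7, can be sketched as below. This is a minimal sketch under stated assumptions, not the planned implementation: the `Event` dataclass and the helper names `public_events` and `label_churned_within_180_days` are hypothetical, and the label window is assumed to be open on the left, so an event dated exactly `observation_date` counts as visible history rather than as part of the label.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LABEL_WINDOW = timedelta(days=180)


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a subscription_events row."""
    customer_id: str
    event_date: date
    event_type: str


def public_events(events, observation_date):
    """Keep only events visible at the absolute cutoff (<= observation_date)."""
    return [e for e in events if e.event_date <= observation_date]


def label_churned_within_180_days(events, customer_id, observation_date):
    """True iff the customer has a churn event in (obs, obs + 180d]."""
    window_end = observation_date + LABEL_WINDOW
    return any(
        e.customer_id == customer_id
        and e.event_type == "churn"
        and observation_date < e.event_date <= window_end
        for e in events
    )


# One shared calendar cutoff for every customer, regardless of start date.
OBS = date(2026, 1, 5)
EVENTS = [
    Event("cust_1", date(2025, 11, 3), "expansion"),  # before the cutoff
    Event("cust_1", date(2026, 3, 2), "churn"),       # 56 days after the cutoff
    Event("cust_2", date(2026, 9, 7), "churn"),       # 245 days after the cutoff
]

VISIBLE = public_events(EVENTS, OBS)
LABELS = {
    cid: label_churned_within_180_days(EVENTS, cid, OBS)
    for cid in ("cust_1", "cust_2")
}
```

The label is derived from the full-horizon event list, while the public table is built from the filtered one. That is why the simulation has to run at least through `observation_date + 180d` for every customer.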