From 146280c65cce8b9f2a4a8575744543a8e7f21a47 Mon Sep 17 00:00:00 2001
From: Shay Palachy
Date: Wed, 10 Jun 2026 10:25:35 +0300
Subject: [PATCH] docs(ltv): reframe target from churn classification to pLTV regression [LTV-Pa]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The first planning pass framed the primary task as churn classification
(binary/multiclass), which pattern-matched onto the lead-scoring binary
task instead of the actual predictive-LTV literature. Corrected to
continuous pLTV regression, following Google lifetime_value / ZILN
(arXiv:1912.07753) and Voyantis pLTV framing.

Decisions added (design.md §2.2):
- D1 (corrected): primary task = continuous pLTV regression; ZILN-shaped
  target (zero mass + lognormal tail); LTV-bucket multiclass dropped.
- D6: multiple forward windows 90/365/730d (zero-mass + tail grow with
  window → built-in difficulty gradient).
- D7: value basis = gross revenue (sum of paid invoices in window).
- D8: first-class early-pLTV variant (tenure-anchored cold-start cutoff)
  alongside the calendar-anchored standard regime.
- D9: churn kept as a secondary task = the ZILN zero-inflation indicator.

Knock-on doc changes:
- New §3 pLTV target (ZILN) + §3.1 two observation regimes.
- §8 targets: three ltv_revenue_{90,365,730}d regression columns +
  secondary churned_within_180d.
- §9 evaluation: Spearman / normalized Gini / decile calibration / value
  capture (not AUC); MSE shown as anti-pattern.
- roadmap.md: LTV-Pc now regression task specs; LTV-Ph calendar-anchored
  snapshot; new LTV-Pi early-pLTV task family; LTV-Pj regression
  task-split writer; LTV-Pl regression metric bands; LTV-Pn
  ZILN/cold-start notebooks. PR sequence now LTV-Pa..Po.
- .agent-plan.md LTV section synced.

Co-Authored-By: Claude Opus 4.8
---
 .agent-plan.md      |  30 ++--
 docs/ltv/design.md  | 399 +++++++++++++++++++++++++++-----------------
 docs/ltv/roadmap.md | 167 ++++++++++--------
 3 files changed, 354 insertions(+), 242 deletions(-)

diff --git a/.agent-plan.md b/.agent-plan.md
index 4bb553b..77a7027 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -14,22 +14,26 @@

 ## Next Workstream — `b2b_saas_ltv_v1` LTV Dataset (planning)

-Goal: add a customer-lifetime-value dataset family — post-conversion
-subscription lifecycle (renewals, expansion, churn, payment failure) on the
-same Veridian vertical. New recipe on the existing framework via a
-`recipe_type: lifecycle` dispatch hook, **not** a fork.
+Goal: add a **predictive-lifetime-value (pLTV) regression** dataset family —
+post-conversion subscription lifecycle (renewals, expansion, churn, payment
+failure) on the same Veridian vertical. New recipe on the existing framework
+via a `recipe_type: lifecycle` dispatch hook, **not** a fork.

 **Source of truth:** `docs/ltv/design.md`
 **Roadmap (milestones/PRs):** `docs/ltv/roadmap.md` — planning notation
-`LTV-M1…M7` (milestones) / `LTV-Pa…Pn` (PRs), kept distinct from GitHub `#NNN`.
-**GitHub milestone:** `dataset: leadforge-ltv-v1`
-
-Locked decisions (2026-06-10): primary task `churn_within_180_days`; weekly
-simulation steps; independent generation (chaining later); staggered start
-dates + fixed absolute `observation_date`; bundle schema version 5 → 6.
-
-Status: `LTV-M0` (planning) — design + roadmap landed. `LTV-M1` (schema
-foundation) is next.
+`LTV-M0…M7` (milestones) / `LTV-Pa…Po` (PRs), kept distinct from GitHub `#NNN`.
+**GitHub milestone:** `dataset: leadforge-ltv-v1` (#8)
+
+Locked decisions (2026-06-10): **continuous pLTV regression target** (ZILN —
+zero-inflated lognormal; gross-revenue value), **multiple forward windows
+90/365/730d**, **first-class early/cold-start (tenure-anchored) task variant**,
+churn kept as a secondary task; weekly simulation steps; independent generation
+(chaining later); staggered start dates + fixed absolute `observation_date`;
+bundle schema version 5 → 6. (Framing follows Google `lifetime_value`/ZILN and
+Voyantis pLTV; the earlier churn-classification framing was corrected.)
+
+Status: `LTV-M0` (planning) — design + roadmap landed (#102) and reframed to
+pLTV regression. `LTV-M1` (schema foundation) is next.

 ---

diff --git a/docs/ltv/design.md b/docs/ltv/design.md
index de75a4c..c472f56 100644
--- a/docs/ltv/design.md
+++ b/docs/ltv/design.md
@@ -14,19 +14,36 @@ expansion, churn, and lifetime value.

 ## 1. Goal

-Add a customer-lifecycle dataset family to leadforge that picks up where the
-lead-scoring dataset leaves off. The lead-scoring recipe ends at `closed_won`;
-the LTV recipe begins there and simulates the subscription lifecycle.
-
-**Teaching goals (analogous to lead scoring):**
-
-- Predict churn / lifetime value for active customers from a point-in-time
-  snapshot.
-- Teach **survival analysis** (right-censoring discipline), **cohort
-  analysis**, and **expansion-revenue modelling**.
+Add a **predictive-lifetime-value (pLTV) regression** dataset family to
+leadforge that picks up where the lead-scoring dataset leaves off. The
+lead-scoring recipe ends at `closed_won`; the LTV recipe begins there and
+simulates the subscription lifecycle, then asks the canonical pLTV question:
+
+> **Given what we know about a customer at an observation point, how much
+> monetary value will they generate over the next N days?**
+
+This is a **regression** task, not classification. It follows the framing in
+Google's [`lifetime_value`](https://github.com/google/lifetime_value) (the
+ZILN paper, [arXiv:1912.07753](https://arxiv.org/abs/1912.07753)) and the
+commercial pLTV framing used by [Voyantis](https://www.voyantis.ai/):
+continuous future-value prediction, optimised for value-based decisions
+(acquisition, bidding, retention prioritisation), often under **early /
+cold-start** conditions where a customer's true value isn't yet clear.
+
+**Teaching goals:**
+
+- Predict continuous future value for customers from a point-in-time snapshot —
+  a target that is **zero-inflated and heavy-tailed** by construction.
+- Teach the **ZILN** decomposition (`P(value>0) × E[value | value>0]`) and why
+  MSE is the wrong loss for heavy-tailed value.
+- Teach pLTV evaluation: **discrimination** (Spearman / normalized Gini /
+  decile lift) and **calibration** (decile charts, total-predicted-vs-actual) —
+  not AUC.
+- Teach **early / cold-start prediction**: predict long-horizon value from a
+  short observation window.
 - Reproduce realistic confusions: leakage traps around future expansion
-  events, right-censoring of still-active customers, and time-window
-  discipline on health-signal features.
+  events, right-censoring of still-active customers, and time-window discipline
+  on health-signal features.

 This is a **new recipe on the existing framework**, not a fork. The CLI,
 bundle format, RNG system, exposure modes, manifest schema, and validation
@@ -38,51 +55,107 @@ lifecycle`.

 ## 2. Locked design decisions

-These five questions were resolved by the maintainer on 2026-06-10. They are
-load-bearing for everything below.
+### 2.1 First pass — 2026-06-10 (workstream shape)
+
+| # | Question | Decision |
+|---|----------|----------|
+| D2 | Simulation time resolution | **Weekly steps** — granular health-signal trend curves. Billing/renewal events resolve to the enclosing week. |
+| D3 | Independent vs chained generation | **Independent for v1.** Customer population is self-contained, not chained from a lead-scoring bundle. Chaining is designed-for-later. |
+| D4 | Observation cohort | **Staggered start dates, fixed observation date.** Customers acquired across an acquisition window; all observed at one absolute calendar date, so tenure-at-observation varies. |
+| D5 | Bundle schema version | **Bump** `BUNDLE_SCHEMA_VERSION` 5 → 6. |
+
+### 2.2 Second pass — 2026-06-10 (target reframe to pLTV regression)
+
+The original D1 ("primary task = `churn_within_180_days`") was a
+mis-framing — it pattern-matched onto the lead-scoring binary classifier
+instead of the actual pLTV literature. Corrected:

 | # | Question | Decision |
 |---|----------|----------|
-| D1 | Primary task for the first shipped recipe | **`churn_within_180_days`** (binary). `ltv_bucket_6m` ships as a secondary task in the same bundle. |
-| D2 | Simulation time resolution | **Weekly steps** — gives granular health-signal trend curves for teaching trend features. Billing/renewal events resolve to the enclosing week. |
-| D3 | Independent vs chained generation | **Independent for v1.** The customer population is generated self-contained, not chained from a lead-scoring bundle. Optional chaining is designed-for-later but not built now. |
-| D4 | Observation cohort | **Staggered start dates, fixed observation date.** Customers are acquired across an acquisition window; all are observed at one absolute calendar date, so tenure-at-observation varies (cold-start customers vs mature customers). |
-| D5 | Bundle schema version | **Bump** `BUNDLE_SCHEMA_VERSION` (currently `5` → `6`). The table inventory and the customers/subscriptions semantics change substantially. |
-
-### 2.1 Consequence of D4 — absolute observation anchor
-
-This is the most important framework divergence. The lead-scoring path filters
-events by a **per-entity relative** cutoff (`lead_created_at + snapshot_day`).
-With staggered starts + a single fixed observation date (D4), the lifecycle
-path filters by an **absolute calendar** cutoff (`observation_date`, the same
-for every customer).
-
-Therefore the lifecycle path does **not** reuse `snapshot_day`. It introduces
-an `observation_date` concept derived deterministically from the world
-calendar (acquisition-window end) and recorded in the manifest. The customer
-snapshot builder is a **separate function** (`build_customer_snapshot`) so it
-can apply absolute-cutoff filtering without touching the lead-scoring
-`build_snapshot` path.
-
-### 2.2 Consequence of D1 — right-censoring is a property, not a label problem
-
-Because the primary task is `churn_within_180_days`, every customer gets a
-**definite** binary label as long as the simulation runs through
-`observation_date + 180 days` for all customers. A customer still active at
-`observation_date + 180d` is a clean negative, not a censored row.
-
-Right-censoring is still taught — but as a property of the **relational
-tables** and the **secondary LTV task**: a customer still active at the end of
-the simulation horizon is right-censored for *total* lifetime value. The
-notebooks teach censoring discipline on the relational data and on
-`ltv_bucket_6m`, while the headline binary task stays clean. This is a cleaner
-split than making censoring a label-derivation hazard.
+| D1 | Primary task type | **Continuous pLTV regression.** Target = future gross revenue over a forward window. ZILN-shaped (zero mass + lognormal tail). The LTV-bucket multiclass idea is dropped. |
+| D6 | Target horizon(s) | **Multiple windows: 90 / 365 / 730 days.** Three regression targets per customer. Zero-inflation and tail-heaviness grow with the window, giving a built-in difficulty gradient. |
+| D7 | Value basis | **Gross revenue** = sum of paid invoice amounts (`payment_status ∈ {paid, recovered}`) inside the window. Matches the MRR mechanics directly. |
+| D8 | Early/cold-start emphasis | **First-class early-pLTV task variant.** A tenure-anchored observation regime (observe each customer at a fixed short tenure, predict long-horizon value) ships alongside the calendar-anchored standard regime. |
+| D9 | Churn task | **Kept as a secondary/auxiliary task** (`churn_within_180_days`), exposing the ZILN zero-inflation indicator. Not the headline. |
+
+### 2.3 Consequence of D4 — absolute observation anchor
+
+The lead-scoring path filters events by a **per-entity relative** cutoff
+(`lead_created_at + snapshot_day`). With staggered starts + a single fixed
+observation date (D4), the **calendar-anchored** lifecycle path filters by an
+**absolute calendar** cutoff (`observation_date`, the same for every customer),
+derived deterministically from the world calendar and recorded in the manifest.
+
+The **tenure-anchored** early-pLTV regime (D8) does the opposite — a per-customer
+relative cutoff at `customer_start + early_tenure_weeks` — which is structurally
+the same relative-window logic the lead-scoring snapshot already uses. So both
+window types have precedent in the codebase; the customer snapshot builder
+supports both via an explicit cutoff argument.
+
+### 2.4 Consequence of D6 — fully-simulated windows, clean targets
+
+Because every target window is **fully simulated** (the engine runs each
+customer through `observation_date + 730d` for the longest standard window, and
+through `customer_start + early_tenure_weeks + 730d` for the early variant), all
+three regression targets are **complete, not right-censored**. A customer who
+churns mid-window simply has low/zero forward revenue — that *is* the
+zero-inflation, not censoring.
+
+Right-censoring is still taught, but as a property of *total* lifetime value on
+the **relational tables** (a customer still active at sim-end has censored total
+LTV) and in the notebooks — never as a hazard in the headline fixed-window
+targets.
+
+---
+
+## 3. The pLTV target (ZILN)
+
+For each customer and each window `W ∈ {90, 365, 730}` days:
+
+```
+ltv_revenue_{W}d = Σ amount_usd for invoices with
+                   payment_status ∈ {paid, recovered}
+                   AND cutoff < invoice_date <= cutoff + W days
+```
+
+where `cutoff` is the observation anchor (absolute `observation_date` for the
+standard regime; `customer_start + early_tenure_weeks` for the early-pLTV
+regime).
+
+**Distribution shape (the whole point):**
+
+- **Zero / near-zero mass** — customers who churn early in the window generate
+  little/no forward revenue. Mass grows with `W` (more time to churn) and with
+  difficulty tier (higher churn rate).
+- **Lognormal tail** — retained + expanding customers compound MRR; top
+  accounts dominate total value.
+
+This is exactly the ZILN setting: model `P(value>0)` and the lognormal `(μ, σ)`
+of the positive part. The dataset is built so a ZILN/two-part model beats a
+plain MSE regressor — a live lesson, not a claim.
+
+Why three windows: `90d` is near-deterministic (almost everyone pays ~3
+invoices, low zero mass) → an easy warm-up. `365d` introduces a renewal
+decision and meaningful zero mass. `730d` is strongly ZILN (heavy zero mass +
+compounded expansion tail) → the hard target. Students can watch a model's
+discrimination/calibration degrade as the horizon lengthens.
+
+### 3.1 Two observation regimes
+
+| regime | cutoff | tenure at cutoff | use case |
+|--------|--------|------------------|----------|
+| **calendar-anchored** (standard) | absolute `observation_date` | varies (cold→mature) | "score my current book of business" |
+| **tenure-anchored** (early-pLTV) | `customer_start + early_tenure_weeks` (e.g. 4w) | fixed & short | "predict new-customer value early" (Voyantis acquisition framing) |
+
+Both regimes are emitted from the **same simulated world** — two snapshot
+tables + two task families. The early-pLTV regime is the genuine cold-start
+hard case: only a few weeks of health signal exist at the cutoff.

 ---

-## 3. Entities and tables
+## 4. Entities and tables

-### 3.1 New entity rows (in `leadforge/schema/entities.py`)
+### 4.1 New entity rows (in `leadforge/schema/entities.py`)

 **`SubscriptionEventRow`** — the lifecycle backbone; one row per state change.

@@ -109,22 +182,22 @@ split than making censoring a label-derivation hazard.
 | `support_tickets` | Int64 | tickets opened that week |
 | `nps_score` | Int64 \| null | quarterly survey; null most weeks |

-**`InvoiceRow`** — monthly billing; drives the payment-failure mechanism.
+**`InvoiceRow`** — monthly billing; drives payment failure **and the pLTV target**.

 | column | dtype | notes |
 |--------|-------|-------|
 | `invoice_id` | string | opaque ID |
 | `customer_id` | string | FK → customers |
 | `invoice_date` | string | ISO-8601 |
-| `amount_usd` | Int64 | |
+| `amount_usd` | Int64 | the unit of pLTV value (§3) |
 | `payment_status` | string | `paid` / `failed` / `recovered` / `written_off` |

-### 3.2 Extended existing entity rows
+### 4.2 Extended existing entity rows

 The current `CustomerRow` (4 fields) and `SubscriptionRow` (5 fields,
 `subscription_status` hardcoded `"active"`) are shells. The lifecycle recipe
-fills them out. **The lead-scoring recipe keeps the thin versions** — the new
-fields are nullable/optional so the procurement recipe's output is unchanged.
+fills them out with **nullable** fields so the procurement recipe's output is
+unchanged.

 `CustomerRow` gains: `initial_mrr`, `initial_plan`, `contract_term_months`,
 `csm_rep_id`.
@@ -132,130 +205,121 @@ fields are nullable/optional so the procurement recipe's output is unchanged.
 `SubscriptionRow` gains: `current_mrr`, `subscription_end_at`, `churn_at`,
 `churn_reason`, `renewal_count`, `expansion_count`.
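Because `InvoiceRow` is the unit of pLTV value, the §3 target definition can be pinned down directly against it. The following is a minimal Python sketch under stated assumptions, not the `leadforge` implementation: `Invoice`, `ltv_revenue`, `ltv_targets`, `tenure_cutoff` and `ziln_expected_value` are hypothetical names, invoices are assumed to be in-memory records with `datetime.date` values (the real `invoice_date` column is an ISO-8601 string), and the ZILN helper only shows how the two-part decomposition combines `P(value>0)` with the lognormal mean `exp(μ + σ²/2)` of the positive part.

```python
from dataclasses import dataclass
from datetime import date, timedelta
import math


# Hypothetical stand-in for InvoiceRow (§4.1). Dates are already parsed into
# datetime.date here; the real row stores ISO-8601 strings.
@dataclass(frozen=True)
class Invoice:
    customer_id: str
    invoice_date: date
    amount_usd: int
    payment_status: str  # "paid" / "failed" / "recovered" / "written_off"


COUNTED_STATUSES = frozenset({"paid", "recovered"})  # gross-revenue basis (D7)
WINDOWS_DAYS = (90, 365, 730)  # forward windows (D6)


def tenure_cutoff(customer_start: date, early_tenure_weeks: int) -> date:
    """Cutoff for the tenure-anchored (early-pLTV) regime."""
    return customer_start + timedelta(weeks=early_tenure_weeks)


def ltv_revenue(invoices, customer_id: str, cutoff: date, window_days: int) -> float:
    """Sum counted invoice amounts with cutoff < invoice_date <= cutoff + W days."""
    end = cutoff + timedelta(days=window_days)
    return float(
        sum(
            inv.amount_usd
            for inv in invoices
            if inv.customer_id == customer_id
            and inv.payment_status in COUNTED_STATUSES
            and cutoff < inv.invoice_date <= end
        )
    )


def ltv_targets(invoices, customer_id: str, cutoff: date) -> dict:
    """All three ltv_revenue_{W}d targets for one customer at one cutoff."""
    return {
        f"ltv_revenue_{w}d": ltv_revenue(invoices, customer_id, cutoff, w)
        for w in WINDOWS_DAYS
    }


def ziln_expected_value(p_positive: float, mu: float, sigma: float) -> float:
    """ZILN point prediction: P(value > 0) * E[value | value > 0], where the
    positive part is lognormal(mu, sigma) with mean exp(mu + sigma**2 / 2)."""
    return p_positive * math.exp(mu + sigma**2 / 2)


if __name__ == "__main__":
    invoices = [
        Invoice("c1", date(2024, 1, 15), 100, "paid"),       # on the cutoff: excluded
        Invoice("c1", date(2024, 2, 15), 100, "failed"),     # not counted (D7)
        Invoice("c1", date(2024, 3, 15), 100, "recovered"),  # counted
        Invoice("c1", date(2024, 6, 15), 150, "paid"),       # inside the 365d/730d windows
        Invoice("c2", date(2024, 2, 15), 999, "paid"),       # another customer
    ]
    print(ltv_targets(invoices, "c1", cutoff=date(2024, 1, 15)))
    # {'ltv_revenue_90d': 100.0, 'ltv_revenue_365d': 250.0, 'ltv_revenue_730d': 250.0}
```

The window is half-open at the cutoff (`cutoff < invoice_date`), so an invoice dated exactly on the cutoff is visible in the public `invoices` table (`<= cutoff`) but never counted in the target; features and labels cannot overlap. For the early-pLTV regime, the same functions apply with `cutoff=tenure_cutoff(customer_start, early_tenure_weeks)`.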
-### 3.3 Public lifecycle table inventory
+### 4.3 Public lifecycle table inventory

 | table | public (`student_public`) | instructor (`research_instructor`) |
 |-------|---------------------------|-------------------------------------|
 | accounts | ✓ | ✓ |
 | customers | ✓ (terminal fields redacted) | ✓ full |
 | subscriptions | ✓ (terminal fields redacted) | ✓ full |
-| subscription_events | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
-| health_signals | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
-| invoices | ✓ (filtered to `<= observation_date`) | ✓ full horizon |
+| subscription_events | ✓ (filtered to `<= cutoff`) | ✓ full horizon |
+| health_signals | ✓ (filtered to `<= cutoff`) | ✓ full horizon |
+| invoices | ✓ (filtered to `<= cutoff`) | ✓ full horizon |

-Contacts/leads/touches/etc. from the lead-scoring world are **not** part of
-the LTV bundle (independent generation, D3).
+Contacts/leads/touches/etc. from the lead-scoring world are **not** part of the
+LTV bundle (independent generation, D3).

 ---

-## 4. Snapshot-safety contract (lifecycle)
+## 5. Snapshot-safety contract (lifecycle)

-Analogous to the lead-scoring hard constraint in `CLAUDE.md`, but anchored on
-the **absolute** `observation_date`:
+Analogous to the lead-scoring hard constraint in `CLAUDE.md`, anchored on the
+relevant `cutoff` (absolute `observation_date`, or per-customer tenure cutoff
+for the early regime):

 - Every timestamp column in public event tables
   (`subscription_events.event_timestamp`, `health_signals.period_start`,
-  `invoices.invoice_date`) must satisfy `<= observation_date`.
+  `invoices.invoice_date`) must satisfy `<= cutoff`.
 - No terminal/outcome fields in public `customers` / `subscriptions`:
-  `churn_at`, `churn_reason`, `subscription_end_at`, and the derived
-  `churned_within_180_days` label are banned from the public relational
-  tables.
-- `subscription_events` rows with `event_type == "churn"` after
-  `observation_date` are excluded from public bundles (they encode the label).
-- No flat snapshot feature may use events after `observation_date`, **except**
-  the deliberately-retained leakage trap (§6).
-
-This contract is enforced by lifecycle-specific leakage probes (see roadmap
-`LTV-M6`) and recorded in the manifest. `CLAUDE.md`'s hard-constraints section
-gains a lifecycle clause when the recipe wiring lands (`LTV-M5`).
+  `churn_at`, `churn_reason`, `subscription_end_at`.
+- No pLTV target column (`ltv_revenue_*`) or the secondary churn label may
+  appear in the public relational tables — they are forward-window aggregates
+  by construction.
+- No flat snapshot feature may use events after `cutoff`, **except** the
+  deliberately-retained leakage trap (§7).
+
+This contract is enforced by lifecycle-specific leakage probes (roadmap
+`LTV-M6`) and recorded in the manifest. `CLAUDE.md`'s hard-constraints section
+gains a lifecycle clause when the recipe wiring lands (`LTV-M5`).

 ---

-## 5. Simulation mechanisms
+## 6. Simulation mechanisms

 Three new mechanism types, none of which exist today. All reuse the existing
 `LatentScore` + per-step Bernoulli pattern from `leadforge/mechanisms/hazards.py`.

 **Churn hazard (post-conversion).** Weekly probability driven by health
-signals + latent traits, structurally different from the flat pre-conversion
-`_DAILY_CHURN_RATE`:
+signals + latent traits:

 - Weibull-shaped over tenure: elevated in weeks 1–12 (onboarding instability),
   low in the steady middle.
-- **Renewal-date spike**: at each contract anniversary the churn hazard is
-  ~10× the background rate (discrete renewal decision).
+- **Renewal-date spike**: ~10× the background hazard at each contract anniversary.
 - Drivers: `latent_product_fit` (background), `latent_champion_strength`
-  (renewal-date), `feature_depth_score` (leading indicator),
-  unrecovered payment failures (financial trigger).
+  (renewal-date), `feature_depth_score` (leading indicator), unrecovered
+  payment failures (financial trigger).

 **Expansion propensity.** Weekly probability of a plan upgrade / seat add:

 - Drivers: `latent_adoption_velocity`, `feature_depth_score`, active-user
   growth, `employee_band` (expansion ceiling).
-- Expansion MRR delta: `randint(0.25·mrr, 1.0·mrr)`.
+- Expansion MRR delta: `randint(0.25·mrr, 1.0·mrr)`. This is the heavy-tail
+  generator for the pLTV target.

 **Payment failure.** Monthly billing event with a failure probability:

 - Driver: `latent_budget_stability`.
-- Dunning: 3 months of `failed` before escalation to `recovered` or
-  `written_off`; unrecovered → forced churn.
+- Dunning: 3 months `failed` before escalation to `recovered` / `written_off`;
+  unrecovered → forced churn. Failed/written-off invoices do **not** count
+  toward gross-revenue pLTV (D7).

-### 5.1 Lifecycle motif families (new)
-
-The 5 lead-scoring motifs describe *buying*; the LTV world needs *retention*
-motifs:
+### 6.1 Lifecycle motif families (new)

 | family | retention driver |
 |--------|------------------|
 | `product_led_retention` | `latent_product_fit` dominant; health signals strongly predictive |
 | `relationship_led_retention` | `latent_champion_strength` dominant; health weaker |
-| `expansion_led_growth` | low churn, high upsell; LTV variance from expansion |
+| `expansion_led_growth` | low churn, high upsell; pLTV variance from the expansion tail |
 | `payment_fragile` | financially-triggered churn; `latent_budget_stability` dominant |
 | `churner_dominated` | high background churn; strong early-warning signals (teaching tier) |

-### 5.2 New customer latent traits
+### 6.2 New customer latent traits

 `latent_product_fit`, `latent_adoption_velocity`, `latent_budget_stability`,
 `latent_champion_strength`, `latent_organizational_stability`. Sampled from the
-same clipped-Gaussian `_sample_latent` helper, with motif-family mean biases
+clipped-Gaussian `_sample_latent` helper, with motif-family mean biases
 mirroring `_MOTIF_LATENT_BIAS`.

 ---

-## 6. Leakage trap
+## 7. Leakage trap

 **Primary trap: `mrr_change_full_period` vs `mrr_change_at_snapshot`.**

-- `mrr_change_at_snapshot` = `current_mrr − initial_mrr` measured at
-  `observation_date`. **Valid.**
+- `mrr_change_at_snapshot` = `current_mrr − initial_mrr` at the cutoff. **Valid.**
 - `mrr_change_full_period` = MRR delta from start to **end of simulation**.
-  **Leaks** — future expansions (which correlate with high LTV / low churn)
-  inflate it. Retained in all modes (`leakage_risk=True,
+  **Leaks** — future expansions (which directly drive the pLTV target) inflate
+  it. The trap is even more natural against a value target than it was against
+  the first-pass churn label. Retained in all modes (`leakage_risk=True,
   redact_in_modes=frozenset()`), documented in the feature dictionary and
-  release notes, exactly like `total_touches_all`.
-
-Why it's a good trap: both columns are "MRR delta", differing only by window;
-standalone AUC is moderate (looks useful, not obviously inflated); tree models
-extract more from it than LR (reproduces the NB03 lesson); removing it causes a
-measurable-but-not-catastrophic drop.
+  release notes.

 **Secondary trap (advanced tier): `last_health_signal_post_obs`** — a
-health-signal reading from *after* `observation_date`, named to look like a
-current-state feature. More subtle because the column name doesn't reveal the
-time shift.
+health-signal reading from *after* the cutoff, named to look like a
+current-state feature.

 A trap-invariant test (analogous to `test_windowed_bundle_trap.py`) asserts
 `mrr_change_full_period` diverges from `mrr_change_at_snapshot` for a
-non-trivial fraction of customers and never contradicts it in sign for
-expansion-only customers.
+non-trivial fraction of customers.

 ---

-## 7. Customer snapshot features (at `observation_date`)
+## 8. Customer snapshot features (at `cutoff`)

 Grouped like `LEAD_SNAPSHOT_FEATURES`. New constant
-`CUSTOMER_SNAPSHOT_FEATURES` in `leadforge/schema/features.py`.
+`CUSTOMER_SNAPSHOT_FEATURES` in `leadforge/schema/features.py`. The same
+feature set serves both observation regimes (only the cutoff differs).

 **Account** (from `AccountRow`): `industry`, `region`, `employee_band`,
 `estimated_revenue_band`.
@@ -265,7 +329,7 @@ Grouped like `LEAD_SNAPSHOT_FEATURES`. New constant
 `expansion_count`, `downgrade_count`, `contract_term_months`,
 `weeks_to_next_renewal`.

-**Health (aggregated over last 12 weeks before `observation_date`):**
+**Health (aggregated over the last 12 weeks before `cutoff`):**
 `avg_active_users_l12w`, `active_user_trend_l12w` (slope),
 `avg_feature_depth_l12w`, `support_ticket_count_l12w`, `last_nps_score` (nullable).
@@ -275,16 +339,54 @@ Grouped like `LEAD_SNAPSHOT_FEATURES`. New constant

 **Leakage trap (all modes):** `mrr_change_full_period`.

-**Target (primary, `churn_within_180_days`):** `churned_within_180_days`
-(boolean) — True iff a `churn` event falls in
-`[observation_date, observation_date + 180d]`.
+**Targets:**

-**Target (secondary, `ltv_bucket_6m`):** quartile (`low`/`medium`/`high`/`top`)
-of revenue collected (paid invoices) in the 6 months after `observation_date`.
+| column | type | task |
+|--------|------|------|
+| `ltv_revenue_90d` | Float64 | primary regression (warm-up horizon) |
+| `ltv_revenue_365d` | Float64 | primary regression (standard horizon) |
+| `ltv_revenue_730d` | Float64 | primary regression (hard horizon) |
+| `churned_within_180d` | boolean | secondary / ZILN zero-inflation indicator |
+
+For the **early-pLTV** task family the same target columns are recomputed
+relative to the tenure-anchored cutoff and exported under a separate task directory.

 ---

-## 8. Framework changes inventory
+## 9. Evaluation & difficulty
+
+### 9.1 pLTV metrics (not AUC)
+
+Regression + ranking + calibration, following the ZILN paper:
+
+- **Discrimination:** Spearman rank correlation; **normalized Gini** (Gini of the
+  cumulative-value Lorenz curve, divided by that of a perfect ranking); decile lift.
+- **Calibration:** decile chart (predicted vs actual mean per decile);
+  total-predicted-vs-total-actual ratio.
+- **Value capture:** top-K / top-decile share of realised value captured (reuses
+  the lead-scoring `expected_acv`-capture machinery in `release_quality.py`).
+- **Point error (reported, not optimised):** MAE / RMSE on `log1p(value)` —
+  raw-scale MSE is shown as the *anti-pattern*.
+
+### 9.2 Difficulty profiles
+
+| dimension | intro | intermediate | advanced |
+|-----------|-------|--------------|----------|
+| `signal_strength` | 0.90 | 0.70 | 0.50 |
+| `noise_scale` | 0.10 | 0.30 | 0.55 |
+| `missing_rate` | 0.02 | 0.08 | 0.18 |
+| `annual_churn_rate_range` | [0.10, 0.20] | [0.20, 0.35] | [0.30, 0.50] |
+| `expansion_rate_range` | [0.15, 0.30] | [0.10, 0.20] | [0.05, 0.15] |
+| `still_active_fraction` (≈ censored total LTV) | ~0.40 | ~0.60 | ~0.75 |
+| secondary trap `last_health_signal_post_obs` | off | off | on |
+
+Two orthogonal difficulty axes: the **tier** (signal/noise/prevalence) and the
+**horizon** (90 < 365 < 730 → growing zero-mass + tail). Per-tier calibration
+bands are fit on the regression metrics, not AUC.
+
+---
+
+## 10. Framework changes inventory

 ### New files

@@ -292,7 +394,7 @@ of revenue collected (paid invoices) in the 6 months after `observation_date`.
 |------|---------|
 | `leadforge/simulation/lifecycle.py` | `simulate_lifecycle()` — weekly-step subscription simulator |
 | `leadforge/simulation/customer_population.py` | `build_customer_population()` — customer entities + latents + staggered starts |
-| `leadforge/render/customer_snapshots.py` | `build_customer_snapshot()` — per-customer row at absolute `observation_date` |
+| `leadforge/render/customer_snapshots.py` | `build_customer_snapshot(cutoff=…)` — per-customer row at a cutoff; serves both regimes |
 | `leadforge/mechanisms/lifecycle_hazards.py` | churn hazard, expansion propensity, payment failure |
 | `leadforge/recipes/b2b_saas_ltv_v1/{recipe,narrative,difficulty_profiles}.yaml` | new recipe |

@@ -301,51 +403,36 @@ of revenue collected (paid invoices) in the 6 months after `observation_date`.
 | file | change |
 |------|--------|
 | `leadforge/schema/entities.py` | add 3 rows; extend `CustomerRow`/`SubscriptionRow` |
-| `leadforge/schema/features.py` | add `CUSTOMER_SNAPSHOT_FEATURES` |
-| `leadforge/schema/tasks.py` | add `CHURN_WITHIN_180_DAYS`, `LTV_BUCKET_6M` |
+| `leadforge/schema/features.py` | add `CUSTOMER_SNAPSHOT_FEATURES` (3 regression targets + secondary churn) |
+| `leadforge/schema/tasks.py` | add `LTV_REVENUE_{90,365,730}D` regression task specs + `CHURN_WITHIN_180D` |
 | `leadforge/schema/relationships.py` | FK constraints for new tables |
-| `leadforge/core/models.py` | add `n_customers: int \| None`; lifecycle config fields |
+| `leadforge/core/models.py` | add `n_customers`; lifecycle config (windows, early-tenure, observation anchor) |
 | `leadforge/api/recipes.py` | parse `recipe_type` + `lifecycle:` section |
 | `leadforge/api/generator.py` | dispatch on `recipe.recipe_type` |
-| `leadforge/render/manifests.py` | record `observation_date`, bump schema version to 6 |
-| `leadforge/validation/*` | lifecycle leakage probes + realism bands |
+| `leadforge/render/manifests.py` | record `observation_date` + windows; bump schema version to 6 |
+| `leadforge/render/tasks.py` | regression task splits (continuous target) + early-pLTV task dir |
+| `leadforge/validation/*` | lifecycle leakage probes + **regression** realism/metric bands |
 | `CLAUDE.md` | lifecycle snapshot-safety hard-constraint clause; reference docs |

 ### Unchanged (reused as-is)

-CLI commands/flags, `WorldBundle`/`WorldSpec`, RNG architecture, exposure-mode
-dispatch, `manifest.json` envelope (additive), determinism/monotonicity
-invariants.
+CLI commands/flags surface, `WorldBundle`/`WorldSpec`, RNG architecture,
+exposure-mode dispatch, `manifest.json` envelope (additive),
+determinism/monotonicity invariants. The task-split writer is the exception: it
+needs a continuous-target path (today it assumes a classification label).

 ---

-## 9. Difficulty profiles
-
-| dimension | intro | intermediate | advanced |
-|-----------|-------|--------------|----------|
-| `signal_strength` | 0.90 | 0.70 | 0.50 |
-| `noise_scale` | 0.10 | 0.30 | 0.55 |
-| `missing_rate` | 0.02 | 0.08 | 0.18 |
-| `annual_churn_rate_range` | [0.10, 0.20] | [0.20, 0.35] | [0.30, 0.50] |
-| `expansion_rate_range` | [0.15, 0.30] | [0.10, 0.20] | [0.05, 0.15] |
-| `still_active_fraction` (≈ right-censored for total LTV) | ~0.40 | ~0.60 | ~0.75 |
-| secondary trap `last_health_signal_post_obs` | off | off | on |
-
-Primary task is `churn_within_180_days` on all three tiers; difficulty is a
-prevalence + noise + calibration axis (matching the v1 reframe — *not* a flat
-AUC-vs-tier promise).
-
----
-
-## 10. Open items deferred past planning
-
-- **Chained generation** (D3 later): an interface to seed the customer
-  population from a lead-scoring bundle's converted leads. Designed-for but not
-  built; the customer-population builder will keep its acquisition logic behind
-  a seam so a "from converted leads" source can be slotted in.
-- **Continuous-time engine**: weekly steps are sufficient for v1; not coupled
-  to this dataset.
-- **LTV regression label** (vs bucket): bucket is the secondary v1 task; a
-  continuous LTV regression target is a later addition.
+## 11. Open items deferred past planning
+
+- **Chained generation** (D3 later): seed the customer population from a
+  lead-scoring bundle's converted leads. The population builder keeps a seam
+  for a "from converted leads" source.
+- **Continuous-time engine**: weekly steps suffice for v1.
+- **Explicit ZILN baseline model** shipped in the package: notebooks
+  demonstrate ZILN; a first-class `leadforge`-side ZILN baseline is a later
+  addition.
+- **Contribution-margin value basis** (vs gross revenue, D7): would add a cost
+  model; deferred.
 - **CLAUDE.md hard-constraint edit**: the lifecycle snapshot-safety clause is
-  added when `LTV-M5` wiring lands, not in the planning PR.
+  added when `LTV-M5` wiring lands, not in a planning PR.
diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md
index 4fe7be2..3a1614a 100644
--- a/docs/ltv/roadmap.md
+++ b/docs/ltv/roadmap.md
@@ -1,7 +1,7 @@
 # LTV Workstream Roadmap — `b2b_saas_ltv_v1`

-> Milestone/PR breakdown for the customer-lifetime-value workstream. Design
-> rationale lives in `design.md` (single source of truth). Update the
+> Milestone/PR breakdown for the predictive-lifetime-value (pLTV) workstream.
+> Design rationale lives in `design.md` (single source of truth). Update the
 > checkboxes as work lands.

 ## Planning notation
@@ -9,17 +9,17 @@
 Work items use a deliberate **`LTV-` prefix** scheme so they never collide
 with GitHub PR numbers (`#NNN`):

-- **Milestones** — `LTV-M1` … `LTV-M7`. A milestone is a coherent capability.
+- **Milestones** — `LTV-M0` … `LTV-M7`. A milestone is a coherent capability.
 - **PRs** — `LTV-Pa`, `LTV-Pb`, … (sequential letters, globally unique across
   the whole roadmap). Each PR maps to exactly one milestone.

 When a PR is opened on GitHub, reference its planning code in the title, e.g.
-`feat(schema): lifecycle entity rows [LTV-Pa]`, and the GitHub number (`#NNN`)
-is recorded back here on merge. The two namespaces stay distinct: `LTV-Pa`
+`feat(schema): lifecycle entity rows [LTV-Pb]`, and the GitHub number (`#NNN`)
+is recorded back here on merge. The two namespaces stay distinct: `LTV-Pb`
 (plan) ↔ `#123` (GitHub).

-GitHub milestone: **`dataset: leadforge-ltv-v1`** — all LTV PRs assign to it.
-Default labels per PR: a `type:` label, relevant `layer:` labels, and
+GitHub milestone: **`dataset: leadforge-ltv-v1`** (#8) — all LTV PRs assign to
+it. Default labels per PR: a `type:` label, relevant `layer:` labels, and
 `dataset: leadforge-ltv-v1`.

 ---
@@ -28,27 +28,28 @@ Default labels per PR: a `type:` label, relevant `layer:` labels, and

 | Milestone | Capability | PRs | GitHub PRs |
 |-----------|------------|-----|------------|
-| `LTV-M0` | Planning + design lock | `LTV-Pa` | _this PR_ |
+| `LTV-M0` | Planning + design lock | `LTV-Pa` | #102 (+ pLTV reframe) |
 | `LTV-M1` | Schema foundation | `LTV-Pb`, `LTV-Pc` | |
 | `LTV-M2` | Customer population + lifecycle world | `LTV-Pd`, `LTV-Pe` | |
 | `LTV-M3` | Lifecycle simulation engine | `LTV-Pf`, `LTV-Pg` | |
-| `LTV-M4` | Customer snapshot + leakage trap | `LTV-Ph` | |
-| `LTV-M5` | Recipe wiring + framework dispatch | `LTV-Pi`, `LTV-Pj` | |
-| `LTV-M6` | Validation + difficulty calibration | `LTV-Pk` | |
-| `LTV-M7` | CLI, notebooks, publish | `LTV-Pl`, `LTV-Pm`, `LTV-Pn` | |
+| `LTV-M4` | Customer snapshots + pLTV targets (both regimes) | `LTV-Ph`, `LTV-Pi` | |
+| `LTV-M5` | Recipe wiring + framework dispatch | `LTV-Pj`, `LTV-Pk` | |
+| `LTV-M6` | Validation + regression-metric calibration | `LTV-Pl` | |
+| `LTV-M7` | CLI, notebooks, publish | `LTV-Pm`, `LTV-Pn`, `LTV-Po` | |

-Total: ~13 PRs across 8 milestones (LTV-M0 = planning). Comparable in scope to
-the original M4–M9 framework build.
+Total: ~15 PRs across 8 milestones (LTV-M0 = planning).
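The `LTV-M4` row above pairs the calendar-anchored snapshot with the tenure-anchored early-pLTV family. Under D6 to D8, both regimes reduce to summing paid invoices over 90/365/730-day forward windows. The cutoff is either the shared `observation_date` or `customer_start + early_tenure_weeks`. This is a minimal sketch over a plain in-memory invoice list. `Invoice`, `calendar_cutoffs`, `tenure_cutoffs` and `forward_revenue` are hypothetical names, not `leadforge` APIs. The half-open `(cutoff, cutoff + window]` convention and the exclusion of unpaid invoices are also assumptions.

```python
# Hypothetical sketch, not the leadforge API: forward-window gross-revenue
# pLTV targets (D6/D7) computed off a calendar cutoff or a tenure cutoff (D8).
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

WINDOWS_DAYS = (90, 365, 730)  # forward windows named in the design (D6)


@dataclass(frozen=True)
class Invoice:  # hypothetical stand-in for a row of the `invoices` table
    customer_id: str
    paid_on: date | None  # None = never paid; excluded from gross revenue
    amount: float


def calendar_cutoffs(customer_ids: list[str], observation_date: date) -> dict[str, date]:
    """Standard regime: every customer shares one absolute observation_date."""
    return {cid: observation_date for cid in customer_ids}


def tenure_cutoffs(starts: dict[str, date], early_tenure_weeks: int = 4) -> dict[str, date]:
    """Early-pLTV regime: per-customer cutoff at customer_start + N weeks."""
    return {cid: s + timedelta(weeks=early_tenure_weeks) for cid, s in starts.items()}


def forward_revenue(
    invoices: list[Invoice],
    cutoffs: dict[str, date],
    windows: tuple[int, ...] = WINDOWS_DAYS,
) -> dict[str, dict[int, float]]:
    """Sum paid invoices in the half-open window (cutoff, cutoff + w days].

    Windows are nested, so a customer's target never decreases as the window
    grows; customers with nothing paid after the cutoff keep 0.0.
    """
    targets = {cid: {w: 0.0 for w in windows} for cid in cutoffs}
    for inv in invoices:
        cutoff = cutoffs.get(inv.customer_id)
        if cutoff is None or inv.paid_on is None or inv.paid_on <= cutoff:
            continue  # unknown customer, unpaid, or already visible at cutoff
        days_after = (inv.paid_on - cutoff).days
        for w in windows:
            if days_after <= w:
                targets[inv.customer_id][w] += inv.amount
    return targets


# Tiny made-up example (illustrative values, not generated data).
starts = {"a": date(2025, 1, 6), "b": date(2025, 3, 3)}
invoices = [
    Invoice("a", date(2025, 2, 1), 100.0),
    Invoice("a", date(2025, 8, 1), 100.0),
    Invoice("b", date(2025, 3, 10), 50.0),
    Invoice("b", None, 50.0),
]
calendar = forward_revenue(invoices, calendar_cutoffs(list(starts), date(2025, 6, 1)))
early = forward_revenue(invoices, tenure_cutoffs(starts, early_tenure_weeks=4))
```

In the example, customer `a` has revenue in every calendar window. Its four-week tenure cutoff (2025-02-03) pushes the August invoice past the 90-day window. The zeros that remain are the zero-inflated part of the ZILN-shaped target.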

 ---

 ## `LTV-M0` — Planning + design lock

-- [ ] **`LTV-Pa`** — _this PR._ Land `docs/ltv/design.md` + `docs/ltv/roadmap.md`;
+- [x] **`LTV-Pa`** — planning. Land `docs/ltv/design.md` + `docs/ltv/roadmap.md`;
   create the `dataset: leadforge-ltv-v1` GitHub milestone + label; record the
-  five locked design decisions (D1–D5). No package code.
+  locked design decisions. **Merged as #102.** A follow-up docs PR corrected
+  the target framing from churn classification to **pLTV regression** (ZILN;
+  multiple windows; gross revenue; first-class early-pLTV variant; churn kept
+  secondary) — see `design.md` §2.2. No package code.
   - Labels: `type: docs`
-  - Deliverable: design doc, roadmap, milestone/label scaffolding.

 ---

@@ -63,12 +64,16 @@ the original M4–M9 framework build.
   - Tests: row round-trips, empty-dataframe dtypes, FK constraint registration,
     lead-scoring schema unaffected.
   - Labels: `type: feature`, `layer: schema`
-- [ ] **`LTV-Pc`** — `feat(schema): customer snapshot feature spec + tasks`. Add
-  `CUSTOMER_SNAPSHOT_FEATURES` to `features.py` (incl. the `mrr_change_full_period`
-  trap with `leakage_risk=True`); add `CHURN_WITHIN_180_DAYS` and
-  `LTV_BUCKET_6M` task specs to `tasks.py`.
-  - Tests: feature-spec invariants (single target, trap flagged, no
-    zero-variance by construction), task-spec shape.
+- [ ] **`LTV-Pc`** — `feat(schema): pLTV feature spec + regression task specs`.
+  Add `CUSTOMER_SNAPSHOT_FEATURES` to `features.py` — including the three
+  continuous targets (`ltv_revenue_{90,365,730}d`), the secondary
+  `churned_within_180d`, and the `mrr_change_full_period` trap
+  (`leakage_risk=True`). Add **regression** task specs
+  (`LTV_REVENUE_{90,365,730}D`) + the secondary `CHURN_WITHIN_180D` to
+  `tasks.py`; extend the task-spec model to carry `task_type`
+  (`regression` | `classification`).
+  - Tests: feature-spec invariants (multiple targets allowed, trap flagged,
+    no zero-variance by construction), regression task-spec shape.
   - Labels: `type: feature`, `layer: schema`

 ---
@@ -79,13 +84,13 @@ the original M4–M9 framework build.
   `build_customer_population()` in `customer_population.py`: customer entities,
   5 new latent traits, **staggered start dates** within an acquisition window
   ending at the absolute `observation_date` (D4). Keep a seam for future
-  chained generation (D3). Reuse `RNGRoot` named-substream convention.
+  chained generation (D3). Reuse the `RNGRoot` named-substream convention.
   - Tests: determinism under seed, latent distributions, staggered-start
     spread, FK integrity, acquisition-window boundary.
   - Labels: `type: feature`, `layer: simulation`
 - [ ] **`LTV-Pe`** — `feat(mechanisms): lifecycle motif families + policies`.
   5 retention motif families with latent-mean biases; `assign_lifecycle_mechanisms()`
-  policy mapping motif → churn/expansion/payment mechanism params.
+  mapping motif → churn/expansion/payment params.
   - Tests: per-motif param tables, policy dispatch, determinism.
   - Labels: `type: feature`, `layer: mechanisms`

@@ -95,85 +100,101 @@ the original M4–M9 framework build.

 - [ ] **`LTV-Pf`** — `feat(mechanisms): churn / expansion / payment hazards`.
   `lifecycle_hazards.py`: Weibull-shaped churn hazard with renewal-date spike,
-  expansion propensity, payment-failure + dunning. Built on `LatentScore` +
-  per-step Bernoulli.
+  expansion propensity (the heavy-tail generator for pLTV), payment failure +
+  dunning. Built on `LatentScore` + per-step Bernoulli.
   - Tests: hazard shape over tenure, renewal spike, dunning escalation,
-    expansion MRR delta bounds.
+    expansion MRR-delta bounds.
   - Labels: `type: feature`, `layer: mechanisms`
 - [ ] **`LTV-Pg`** — `feat(simulation): weekly lifecycle engine`.
   `simulate_lifecycle()` in `lifecycle.py`: weekly loop (D2) per customer from
-  staggered start through `observation_date + 180d + buffer`; emits
+  staggered start through `observation_date + 730d (+ buffer for the early
+  regime)` so **all three target windows are fully simulated** (D6); emits
   `subscription_events`, `health_signals`, `invoices`; updates
   customer/subscription terminal state. RNG substreams
   `lifecycle_transitions` / `lifecycle_events` / `lifecycle_post_sim`.
   - Tests: determinism, churn-rate bounds per difficulty, still-active
-    fraction, weekly health-signal cadence, monthly invoice cadence,
-    every customer simulated through the full label window.
+    fraction, weekly health cadence, monthly invoice cadence, every customer
+    simulated through the longest forward window.
   - Labels: `type: feature`, `layer: simulation`

 ---

-## `LTV-M4` — Customer snapshot + leakage trap
-
-- [ ] **`LTV-Ph`** — `feat(render): customer snapshot builder`.
-  `build_customer_snapshot()` in `customer_snapshots.py`: **absolute
-  `observation_date` cutoff** (not relative `snapshot_day`); aggregate health
-  / events / invoices over last-12-weeks windows; compute
-  `mrr_change_at_snapshot` (valid) and `mrr_change_full_period` (trap); derive
-  `churned_within_180_days` and `ltv_bucket_6m`; difficulty distortions.
-  - Tests: no post-`observation_date` data in windowed columns; trap-invariant
-    (full ≠ snapshot for non-trivial fraction); label derivation; difficulty
-    distortion exemption for the trap.
+## `LTV-M4` — Customer snapshots + pLTV targets (both regimes)
+
+- [ ] **`LTV-Ph`** — `feat(render): calendar-anchored customer snapshot`.
+  `build_customer_snapshot(cutoff=observation_date)` in `customer_snapshots.py`:
+  **absolute `observation_date` cutoff**; aggregate health / events / invoices
+  over last-12-weeks windows; compute `mrr_change_at_snapshot` (valid) and
+  `mrr_change_full_period` (trap); compute the three forward-window gross-revenue
+  targets `ltv_revenue_{90,365,730}d` (D6/D7) and the secondary
+  `churned_within_180d`; difficulty distortions.
+  - Tests: no post-cutoff data in windowed feature columns; ZILN target shape
+    (positive zero-mass + heavy tail; mass grows with window); trap-invariant;
+    label/target derivation; trap exempt from distortion.
   - Labels: `type: feature`, `layer: render`
+- [ ] **`LTV-Pi`** — `feat(render): early-pLTV (tenure-anchored) task family`.
+  Reuse `build_customer_snapshot` with a **per-customer relative cutoff**
+  (`customer_start + early_tenure_weeks`, e.g. 4w) to emit the cold-start
+  snapshot + the same three forward-window targets recomputed off that cutoff
+  (D8). Exported under a separate task directory.
+  - Tests: per-customer cutoff correctness, short-tenure feature sparsity,
+    target recomputation parity, no post-cutoff leakage.
+  - Labels: `type: feature`, `layer: render`

 ---

 ## `LTV-M5` — Recipe wiring + framework dispatch

-- [ ] **`LTV-Pi`** — `feat(api,core): recipe_type dispatch + lifecycle config`.
-  Add `n_customers` + lifecycle fields to `GenerationConfig`; parse
-  `recipe_type` + `lifecycle:` section in `recipes.py`; dispatch the lifecycle
-  path in `Generator.generate()`; bump `BUNDLE_SCHEMA_VERSION` 5 → 6 (D5);
-  record `observation_date` in the manifest. Extend `CLAUDE.md` hard
-  constraints with the lifecycle snapshot-safety clause.
+- [ ] **`LTV-Pj`** — `feat(api,core,render): recipe_type dispatch + regression
+  task splits`. Add `n_customers` + lifecycle config (windows, early-tenure,
+  observation anchor) to `GenerationConfig`; parse `recipe_type` + `lifecycle:`
+  in `recipes.py`; dispatch the lifecycle path in `Generator.generate()`; bump
+  `BUNDLE_SCHEMA_VERSION` 5 → 6 (D5); record `observation_date` + windows in the
+  manifest; teach the task-split writer a **continuous-target** path. Extend
+  `CLAUDE.md` hard constraints with the lifecycle snapshot-safety clause.
   - Tests: config precedence, dispatch on recipe_type, lead-scoring path
-    unaffected, manifest schema-version + observation_date, exposure filtering
-    for new tables.
+    unaffected, manifest schema-version + observation_date, regression split
+    writer, exposure filtering for new tables.
   - Labels: `type: feature`, `layer: api`, `layer: core`, `layer: render`
-- [ ] **`LTV-Pj`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three
-  recipe YAMLs (recipe/narrative/difficulty_profiles); register in the recipe
-  registry; end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()`
-  smoke test producing a saved bundle.
-  - Tests: recipe loads, full generation round-trip, determinism under same
-    seed, both task splits written, public/instructor exposure split.
+- [ ] **`LTV-Pk`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three
+  recipe YAMLs; register in the recipe registry; end-to-end
+  `Generator.from_recipe("b2b_saas_ltv_v1").generate()` smoke test producing a
+  saved bundle with both task families.
+  - Tests: recipe loads, full generation round-trip, determinism under seed,
+    all task splits written (3 windows × 2 regimes + secondary churn),
+    public/instructor exposure split.
   - Labels: `type: feature`, `layer: recipes`

 ---

-## `LTV-M6` — Validation + difficulty calibration
-
-- [ ] **`LTV-Pk`** — `feat(validation): lifecycle leakage probes + realism bands`.
-  Lifecycle leakage probes (absolute-cutoff window check; banned terminal
-  columns/tables; deterministic reconstruction); difficulty calibration
-  (churn-rate / expansion-rate / still-active bands); trap-invariant guard;
-  cross-seed drift. Dataset-card renderer for the lifecycle narrative.
-  - Tests: probe coverage, band checks per tier, cross-seed stability.
+## `LTV-M6` — Validation + regression-metric calibration
+
+- [ ] **`LTV-Pl`** — `feat(validation): lifecycle leakage probes + pLTV metric
+  bands`. Lifecycle leakage probes (cutoff window check; banned terminal
+  columns/tables; banned forward-window target columns in relational tables);
+  **regression** evaluation (Spearman, normalized Gini, decile calibration,
+  total-pred-vs-actual, value capture) and per-tier/per-window bands;
+  trap-invariant guard; cross-seed drift. Dataset-card renderer for the
+  lifecycle narrative.
+  - Tests: probe coverage, regression metric bands per tier × window,
+    cross-seed stability.
   - Labels: `type: feature`, `layer: validation`

 ---

 ## `LTV-M7` — CLI, notebooks, publish

-- [ ] **`LTV-Pl`** — `feat(cli): lifecycle generate flags + inspect surfacing`.
-  `--n-customers`, `--observation-date` (or derived) flags; `inspect` surfaces
-  lifecycle manifest fields (observation_date, task inventory).
+- [ ] **`LTV-Pm`** — `feat(cli): lifecycle generate flags + inspect surfacing`.
+  `--n-customers`, observation/early-tenure flags; `inspect` surfaces lifecycle
+  manifest fields (observation_date, windows, task inventory).
   - Labels: `type: feature`, `layer: cli`
-- [ ] **`LTV-Pm`** — `docs(notebooks): survival + churn teaching sequence`.
-  Notebooks: churn baseline, survival-analysis intro (right-censoring on the
-  relational tables), the `mrr_change_full_period` leakage demo, value-aware
-  LTV ranking.
+- [ ] **`LTV-Pn`** — `docs(notebooks): pLTV teaching sequence`. Notebooks:
+  ZILN vs MSE regression baseline, discrimination/calibration metrics
+  (Spearman / normalized Gini / decile charts), the `mrr_change_full_period`
+  leakage demo, **early/cold-start pLTV** (predict long-horizon value from a
+  short window), value-aware ranking, and a right-censoring note on total LTV.
   - Labels: `type: docs`, `layer: render`
-- [ ] **`LTV-Pn`** — `feat(release): package + publish b2b_saas_ltv_v1`. Kaggle
+- [ ] **`LTV-Po`** — `feat(release): package + publish b2b_saas_ltv_v1`. Kaggle
   + HF packaging (reuse Phase-5 packagers), LLM critique run, dataset card,
   release notes, tag. Publishes under the now-live `leadforge` Kaggle org.
   - Labels: `type: feature`, `layer: validation`
@@ -187,7 +208,7 @@ LTV-M0 (plan)
   └─ LTV-M1 (schema)
      └─ LTV-M2 (population + motifs)
         └─ LTV-M3 (engine)
-           └─ LTV-M4 (snapshot)
+           └─ LTV-M4 (snapshots + pLTV targets, both regimes)
               └─ LTV-M5 (wiring + recipe) ← first end-to-end bundle
                  └─ LTV-M6 (validation)
                     └─ LTV-M7 (publish)
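`LTV-Pl` and `LTV-Pn` evaluate pLTV with ranking and calibration metrics rather than AUC. This stdlib-only sketch covers Spearman, normalized Gini, decile calibration and value capture over plain lists of actual and predicted values. The function names are hypothetical. Normalized Gini follows the common convention of dividing the Gini of the model's ordering by the Gini of the perfect ordering. Tied predictions are broken by sort order, which is a simplification.

```python
# Hypothetical sketch, not the leadforge API: pLTV evaluation metrics
# (Spearman, normalized Gini, decile calibration, value capture), stdlib only.
from __future__ import annotations

from collections.abc import Sequence


def _average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; ties (e.g. the zero mass) share their mean position."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation of average ranks (undefined if either side is constant)."""
    rt, rp = _average_ranks(y_true), _average_ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5


def _gini(y_true: Sequence[float], order_by: Sequence[float]) -> float:
    """Cumulative-value-share area (customers sorted by order_by, desc) minus random."""
    n, total = len(y_true), sum(y_true)
    idx = sorted(range(n), key=lambda i: order_by[i], reverse=True)
    cum, area = 0.0, 0.0
    for i in idx:
        cum += y_true[i]
        area += cum / total
    return area / n - (n + 1) / (2 * n)


def normalized_gini(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1.0 = perfect value ordering, about 0.0 = random, -1.0 = fully reversed."""
    return _gini(y_true, y_pred) / _gini(y_true, y_true)


def decile_calibration(
    y_true: Sequence[float], y_pred: Sequence[float], n_bins: int = 10
) -> list[tuple[float, float]]:
    """(mean predicted, mean actual) per predicted-value bin, lowest bin first."""
    idx = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    out = []
    for b in range(n_bins):
        chunk = idx[b * len(idx) // n_bins : (b + 1) * len(idx) // n_bins]
        if chunk:
            out.append((sum(y_pred[i] for i in chunk) / len(chunk),
                        sum(y_true[i] for i in chunk) / len(chunk)))
    return out


def value_capture(y_true: Sequence[float], y_pred: Sequence[float], top_frac: float = 0.1) -> float:
    """Share of total actual value held by the top `top_frac` customers by prediction."""
    k = max(1, round(len(y_true) * top_frac))
    top = sorted(range(len(y_true)), key=lambda i: y_pred[i], reverse=True)[:k]
    return sum(y_true[i] for i in top) / sum(y_true)


# Tiny made-up example: two zero-value customers and a heavy tail.
y_true = [0.0, 0.0, 10.0, 50.0, 200.0]
y_pred = [1.0, 2.0, 5.0, 40.0, 150.0]
```

On ZILN-shaped targets the zero mass produces many tied actual values, which is why the rank helper averages ties. `_gini` divides by the total actual value, so an all-zero window would need a guard in anything beyond a sketch.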