feat(lifecycle): calendar-anchored customer snapshot [LTV-Pl] by shaypal5 · Pull Request #119 · leadforge-dev/leadforge

shaypal5 · 2026-06-12T10:26:07Z

Summary

LTV-Pl (roadmap LTV-M5): the calendar-anchored customer snapshot — the pLTV scheme's ML-ready table, one row per customer active at the observation_date cutoff, with the three continuous ltv_revenue_{90,365,730}d targets.

New: `schemes/lifecycle/features.py`

CUSTOMER_SNAPSHOT_FEATURES — 27-column catalog (firmographics, at-cutoff subscription state, last-12-week health aggregates, financial signals, leakage trap, 4 targets). This discharges the feature-catalog half of LTV-Pc; the regression task specs + task_type move to LTV-Pn with the task-split writer.

Two design.md §8 columns are deliberately omitted (documented in the module docstring + pinned by test):

current_plan — the engine has no plan-change mechanism, so it would duplicate initial_plan exactly;
downgrade_count — no downgrade mechanism, so it would be zero-variance (violates the published-bundle invariant).

New: `schemes/lifecycle/snapshots.py`

build_customer_snapshot(population, sim, *, cutoff=…, difficulty_params=…, seed=…):

At-cutoff state from the event chain, not the terminal row — current_mrr, renewal_count, expansion_count, mrr_change_at_snapshot are reconstructed from subscription_events ≤ cutoff (the subscription row holds end-of-simulation state and would leak).
Health aggregates over (cutoff − 12w, cutoff]; active_user_trend_l12w is an OLS slope; last_nps_score scans the whole history (NPS is quarterly — a 12-week window would miss most customers' latest response purely by phase).
Targets (D6/D7): gross revenue = paid + recovered invoices, attributed by issuance date, summed over (cutoff, cutoff + window]; churned_within_180d from churn_at. cutoff > observation_date is rejected — the engine only guarantees complete forward windows up to the anchor, so a later cutoff would silently censor targets.
Trap (design.md §7): mrr_change_full_period = terminal − initial MRR; diverges from the valid counterpart for ~67% of rows on an expansion-led world.
weeks_to_next_renewal uses the same round(k·term·52/12) boundary as is_renewal_week, so the feature and the hazard spike agree exactly.

Refactor: `render/distortions.py`

The difficulty-distortion algorithm (noise / MCAR missingness / outliers) extracted verbatim from the lead-scoring snapshot builder into scheme-agnostic apply_difficulty_distortions(feature_specs=…, exempt_cols=…); lead-scoring delegates with its catalog + total_touches_all exemption. Verified byte-identical: sha256 of a distorted lead-scoring snapshot is 196bc45f… before and after the extraction. The lifecycle trap is distortion-exempt for the same pedagogical reason.

Tests (39 new; full suite 1754 passed / 51 skipped)

Censoring leakage probe: rebuild the snapshot from a sim result with all post-cutoff events deleted — every non-target, non-trap column must be identical. Any feature that changes, leaks.
Target derivation re-computed row-by-row from the invoice table; failed/written_off exclusion proven on a payment-fragile world (uncollected sum > 0 and totals reconcile).
ZILN target shape (non-negative, right-skewed, heavy tail), window monotonicity (90 ≤ 365 ≤ 730).
Trap-divergence invariant (>10% of rows); trap + targets exempt from distortion; distortion determinism per seed.
Catalog invariants incl. a regression guard pinning the current_plan/downgrade_count omission decision.

Roadmap

docs/ltv/roadmap.md: LTV-Pl ✓; LTV-Pc annotated (catalog half discharged here, task-spec half folds into LTV-Pn).
Next: LTV-Pm — early-pLTV tenure-anchored task family reusing this builder with a per-customer cutoff.

🤖 Generated with Claude Code

Build the pLTV snapshot table: one row per customer active at the observation-date cutoff, flattened from the weekly lifecycle simulation. - schemes/lifecycle/features.py — CUSTOMER_SNAPSHOT_FEATURES catalog: firmographics, at-cutoff subscription state, last-12-week health aggregates, financial signals, the mrr_change_full_period leakage trap (leakage_risk=True, never redacted), and four targets: ltv_revenue_{90,365,730}d (Float64, gross revenue D7) + churned_within_180d (secondary / ZILN zero-inflation indicator D9). Discharges the feature-catalog half of LTV-Pc; the regression task specs move to LTV-Pn with the task-split writer. Deliberately omitted: current_plan (no plan-change mechanism — exact duplicate of initial_plan) and downgrade_count (no downgrade mechanism — zero-variance); re-add only together with the mechanism. - schemes/lifecycle/snapshots.py — build_customer_snapshot(cutoff=…): at-cutoff MRR/renewal/expansion state reconstructed from the event chain (the subscription row holds terminal state and would leak); health *_l12w window (cutoff-12w, cutoff] with OLS active-user trend; last_nps_score looks over the whole history (quarterly cadence would alias a 12-week window); ltv_revenue_* sums paid+recovered invoices attributed by issuance date in (cutoff, cutoff+window]; cutoff > observation_date rejected (forward windows would be silently censored). weeks_to_next_renewal uses the same anniversary boundary as is_renewal_week, so the feature and the hazard spike agree. - render/distortions.py — difficulty distortions extracted verbatim from the lead-scoring snapshot builder into a scheme-agnostic apply_difficulty_distortions(feature_specs=…, exempt_cols=…); lead-scoring delegates with its catalog + total_touches_all exemption. Verified BYTE-IDENTICAL on a distorted lead-scoring snapshot (sha256 196bc45f… before and after). The lifecycle trap mrr_change_full_period is distortion-exempt for the same pedagogical reason as total_touches_all. Tests (39 new): censoring leakage probe (every feature column identical after deleting all post-cutoff events); target derivation against the invoice table incl. failed/written-off exclusion; ZILN target shape (right-skew + heavy tail); trap-divergence invariant (>10% of rows); trap and targets exempt from distortion; renewal-boundary agreement; catalog invariants. Full suite 1754 passed / 51 skipped; ruff + mypy clean. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot

Pull request overview

Adds the lifecycle (pLTV) calendar-anchored customer snapshot builder and its feature catalog, plus extracts the snapshot difficulty-distortion logic into a shared scheme-agnostic utility so lifecycle and lead-scoring can reuse identical behavior.

Changes:

Introduces CUSTOMER_SNAPSHOT_FEATURES and build_customer_snapshot(...) to produce an ML-ready, cutoff-safe customer snapshot with forward-window revenue targets and a deliberate leakage trap.
Extracts lead-scoring’s snapshot distortion algorithm into leadforge/render/distortions.py and delegates from lead-scoring + lifecycle.
Adds comprehensive lifecycle snapshot/catalog tests and updates LTV roadmap/agent plan docs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/schemes/lifecycle/test_snapshots.py	Adds end-to-end tests for snapshot safety (cutoff censoring), feature derivations, targets, trap, and distortions.
tests/schemes/lifecycle/test_features.py	Adds invariant tests for the lifecycle snapshot feature catalog (targets, trap, omissions).
leadforge/schemes/lifecycle/snapshots.py	Implements the lifecycle customer snapshot builder and helper aggregations.
leadforge/schemes/lifecycle/features.py	Defines the lifecycle snapshot feature catalog (including targets + leakage trap).
leadforge/schemes/lead_scoring/render/snapshots.py	Refactors lead-scoring snapshot distortions to call the shared distortion utility.
leadforge/render/distortions.py	Adds shared distortion implementation (noise/missingness/outliers) driven by `FeatureSpec`.
docs/ltv/roadmap.md	Updates roadmap status/notes to reflect `LTV-Pl` completion and scope split.
.agent-plan.md	Updates agent plan status notes for the newly opened/merged roadmap items.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…[LTV-Pl] Five findings from hostile self-review of the snapshot PR: 1. SILENT TARGET CENSORING (the serious one). The builder validated cutoff <= observation_date but had no way to verify the sim was run with a horizon covering the target windows: simulate_lifecycle(..., forward_window_days=90) followed by build_customer_snapshot() would emit plausible-looking, deterministic, silently truncated ltv_revenue_365d/730d values — exactly the censoring failure the cutoff check exists to prevent. Fix: LifecycleSimulationResult now records forward_window_days/early_tenure_weeks, and the builder raises unless the recorded horizon covers max(730, 180) days. Regression tests on both sides. 2. The contract-anniversary boundary was COPIED, not shared — and the docstring bragged it was "the same boundary is_renewal_week uses". A claim, not a mechanism: a second copy of round(k*term*52/12) and a third copy of _WEEKS_PER_MONTH, where drift would desynchronize the published weeks_to_next_renewal feature from the hazard spike (the data<->causality agreement this dataset teaches). Fix: public hazards.next_renewal_week() beside is_renewal_week; the snapshot consumes it; duplicates deleted. Property test: the returned week is the FIRST week after the input satisfying is_renewal_week, for every week in [0, 160) across 12/13/24-month terms. 3. A mismatched (population, sim) pair died with a bare KeyError deep in the row loop — the one precondition the function depends on was the one it never checked. Fix: explicit validation with a population/sim-mismatch diagnostic naming the first missing customer. 4. The module-level _EMPTY_*_AGG fallback dicts were handed out SHARED via .get(cid, _EMPTY_...) — read-only today; the first accidental write corrupts every subsequent row. Fix: MappingProxyType (the mechanisms.py precedent), so mutation raises instead. 5. Documentation debts: the design.md §7 secondary advanced-tier trap (last_health_signal_post_obs) is now EXPLICITLY deferred to LTV-Pn in the roadmap (tier-conditional — belongs with difficulty wiring); features.py documents that MCAR missingness stacks on semantic nulls (last_nps_score, weeks_since_last_payment_failure); distortions.py documents the inherited byte-identity-locked Int64->Float64 dtype flip under missingness. Full suite 1768 passed / 51 skipped; ruff + mypy clean; lead-scoring distorted-snapshot hash still byte-identical (sha256 196bc45f...). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Address the two Copilot review threads on #119: 1. APPLIED — the post-noise non-negative clamp lists filtered only on dtype + non_negative, so target columns (and any future non-negative exempt trap) were clip(lower=0)-ed. Behaviorally a no-op today (those columns never receive noise, and non_negative raw values are >= 0 by definition), but the "targets are never distorted" contract must hold by construction, not coincidence. Clamp lists now exclude is_target and exempt_cols exactly like the distortion-eligible lists. Verified byte-identical for lead-scoring (sha256 196bc45f… unchanged). New tests prove a deliberately out-of-contract negative value in a target / exempt column survives distortion untouched. 2. DECLINED with rationale — pd.NA into nullable Int64 instead of the Float64 conversion under missingness: the current behavior is byte-identity-locked with the shipped lead-scoring bundles (v4/v6/v7 and their validation scripts assume Float64). Documented as a Known wart in the module docstring; an opt-in dtype-preserving mode for the lifecycle scheme (no shipped bundles yet) is recorded in the roadmap as deferred to LTV-Pn, where the bundle writer fixes the lifecycle parquet schemas. Full suite 1771 passed / 51 skipped; ruff + mypy clean. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-12T11:32:44Z

pr-agent-context report:

This run includes unresolved review comments on PR #119 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/render/distortions.py:105
URL: https://github.com/leadforge-dev/leadforge/pull/119#discussion_r3402599015
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The post-noise non-negative clamps currently include any `non_negative` column in the feature catalog, even if it’s a target or explicitly exempt. This can violate the contract that targets are never distorted (and could also accidentally clamp an exempt leakage-trap column if it were marked non_negative in the future). Filter these clamp lists the same way as the distortion-eligible lists: exclude targets and `exempt_cols`.

## COPILOT-2
Location: leadforge/render/distortions.py:149
URL: https://github.com/leadforge-dev/leadforge/pull/119#discussion_r3402599067
Root author: copilot-pull-request-reviewer

Comment:
    Missingness injection converts `Int64` columns to `Float64` and writes `np.nan`, which changes schema dtypes and loses integer-ness even though pandas nullable `Int64` supports missing values. Consider assigning `pd.NA` directly (and avoid dtype conversion) so distorted snapshots can retain the catalog’s nullable dtypes.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27412981060 attempt 1
Comment timestamp: 2026-06-12T11:31:54.454632+00:00
PR head commit: 93c720d327f6f85d99133883e82a4f1b11e50ed6

Copilot AI review requested due to automatic review settings June 12, 2026 10:26

shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 12, 2026

shaypal5 added type: feature New capability layer: schema schema/ entity/event contracts layer: render render/ bundle and artifact output status: needs review Ready for review dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream labels Jun 12, 2026

Copilot started reviewing on behalf of shaypal5 June 12, 2026 10:26 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed Jun 12, 2026

View reviewed changes

Comment thread leadforge/render/distortions.py

Comment thread leadforge/render/distortions.py

This comment has been minimized.

Sign in to view

shaypal5 merged commit c0abf4a into main Jun 12, 2026
11 of 13 checks passed

shaypal5 deleted the feat/lifecycle-snapshot branch June 12, 2026 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(lifecycle): calendar-anchored customer snapshot [LTV-Pl]#119

feat(lifecycle): calendar-anchored customer snapshot [LTV-Pl]#119
shaypal5 merged 3 commits into
mainfrom
feat/lifecycle-snapshot

shaypal5 commented Jun 12, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented Jun 12, 2026

Summary

New: schemes/lifecycle/features.py

New: schemes/lifecycle/snapshots.py

Refactor: render/distortions.py

Tests (39 new; full suite 1754 passed / 51 skipped)

Roadmap

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

New: `schemes/lifecycle/features.py`

New: `schemes/lifecycle/snapshots.py`

Refactor: `render/distortions.py`