Skip to content

feat(lifecycle): calendar-anchored customer snapshot [LTV-Pl]#119

Merged
shaypal5 merged 3 commits into
mainfrom
feat/lifecycle-snapshot
Jun 12, 2026
Merged

feat(lifecycle): calendar-anchored customer snapshot [LTV-Pl]#119
shaypal5 merged 3 commits into
mainfrom
feat/lifecycle-snapshot

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

LTV-Pl (roadmap LTV-M5): the calendar-anchored customer snapshot — the pLTV scheme's ML-ready table, one row per customer active at the observation_date cutoff, with the three continuous ltv_revenue_{90,365,730}d targets.

New: schemes/lifecycle/features.py

CUSTOMER_SNAPSHOT_FEATURES — 27-column catalog (firmographics, at-cutoff subscription state, last-12-week health aggregates, financial signals, leakage trap, 4 targets). This discharges the feature-catalog half of LTV-Pc; the regression task specs + task_type move to LTV-Pn with the task-split writer.

Two design.md §8 columns are deliberately omitted (documented in the module docstring + pinned by test):

  • current_plan — the engine has no plan-change mechanism, so it would duplicate initial_plan exactly;
  • downgrade_count — no downgrade mechanism, so it would be zero-variance (violates the published-bundle invariant).

New: schemes/lifecycle/snapshots.py

build_customer_snapshot(population, sim, *, cutoff=…, difficulty_params=…, seed=…):

  • At-cutoff state from the event chain, not the terminal rowcurrent_mrr, renewal_count, expansion_count, mrr_change_at_snapshot are reconstructed from subscription_events ≤ cutoff (the subscription row holds end-of-simulation state and would leak).
  • Health aggregates over (cutoff − 12w, cutoff]; active_user_trend_l12w is an OLS slope; last_nps_score scans the whole history (NPS is quarterly — a 12-week window would miss most customers' latest response purely by phase).
  • Targets (D6/D7): gross revenue = paid + recovered invoices, attributed by issuance date, summed over (cutoff, cutoff + window]; churned_within_180d from churn_at. cutoff > observation_date is rejected — the engine only guarantees complete forward windows up to the anchor, so a later cutoff would silently censor targets.
  • Trap (design.md §7): mrr_change_full_period = terminal − initial MRR; diverges from the valid counterpart for ~67% of rows on an expansion-led world.
  • weeks_to_next_renewal uses the same round(k·term·52/12) boundary as is_renewal_week, so the feature and the hazard spike agree exactly.

Refactor: render/distortions.py

The difficulty-distortion algorithm (noise / MCAR missingness / outliers) extracted verbatim from the lead-scoring snapshot builder into scheme-agnostic apply_difficulty_distortions(feature_specs=…, exempt_cols=…); lead-scoring delegates with its catalog + total_touches_all exemption. Verified byte-identical: sha256 of a distorted lead-scoring snapshot is 196bc45f… before and after the extraction. The lifecycle trap is distortion-exempt for the same pedagogical reason.

Tests (39 new; full suite 1754 passed / 51 skipped)

  • Censoring leakage probe: rebuild the snapshot from a sim result with all post-cutoff events deleted — every non-target, non-trap column must be identical. Any feature that changes, leaks.
  • Target derivation re-computed row-by-row from the invoice table; failed/written_off exclusion proven on a payment-fragile world (uncollected sum > 0 and totals reconcile).
  • ZILN target shape (non-negative, right-skewed, heavy tail), window monotonicity (90 ≤ 365 ≤ 730).
  • Trap-divergence invariant (>10% of rows); trap + targets exempt from distortion; distortion determinism per seed.
  • Catalog invariants incl. a regression guard pinning the current_plan/downgrade_count omission decision.

Roadmap

  • docs/ltv/roadmap.md: LTV-Pl ✓; LTV-Pc annotated (catalog half discharged here, task-spec half folds into LTV-Pn).
  • Next: LTV-Pm — early-pLTV tenure-anchored task family reusing this builder with a per-customer cutoff.

🤖 Generated with Claude Code

Build the pLTV snapshot table: one row per customer active at the
observation-date cutoff, flattened from the weekly lifecycle simulation.

- schemes/lifecycle/features.py — CUSTOMER_SNAPSHOT_FEATURES catalog:
  firmographics, at-cutoff subscription state, last-12-week health
  aggregates, financial signals, the mrr_change_full_period leakage trap
  (leakage_risk=True, never redacted), and four targets:
  ltv_revenue_{90,365,730}d (Float64, gross revenue D7) +
  churned_within_180d (secondary / ZILN zero-inflation indicator D9).
  Discharges the feature-catalog half of LTV-Pc; the regression task
  specs move to LTV-Pn with the task-split writer.
  Deliberately omitted: current_plan (no plan-change mechanism — exact
  duplicate of initial_plan) and downgrade_count (no downgrade
  mechanism — zero-variance); re-add only together with the mechanism.

- schemes/lifecycle/snapshots.py — build_customer_snapshot(cutoff=…):
  at-cutoff MRR/renewal/expansion state reconstructed from the event
  chain (the subscription row holds terminal state and would leak);
  health *_l12w window (cutoff-12w, cutoff] with OLS active-user trend;
  last_nps_score looks over the whole history (quarterly cadence would
  alias a 12-week window); ltv_revenue_* sums paid+recovered invoices
  attributed by issuance date in (cutoff, cutoff+window]; cutoff >
  observation_date rejected (forward windows would be silently
  censored). weeks_to_next_renewal uses the same anniversary boundary
  as is_renewal_week, so the feature and the hazard spike agree.

- render/distortions.py — difficulty distortions extracted verbatim
  from the lead-scoring snapshot builder into a scheme-agnostic
  apply_difficulty_distortions(feature_specs=…, exempt_cols=…);
  lead-scoring delegates with its catalog + total_touches_all
  exemption. Verified BYTE-IDENTICAL on a distorted lead-scoring
  snapshot (sha256 196bc45f… before and after). The lifecycle trap
  mrr_change_full_period is distortion-exempt for the same pedagogical
  reason as total_touches_all.

Tests (39 new): censoring leakage probe (every feature column identical
after deleting all post-cutoff events); target derivation against the
invoice table incl. failed/written-off exclusion; ZILN target shape
(right-skew + heavy tail); trap-divergence invariant (>10% of rows);
trap and targets exempt from distortion; renewal-boundary agreement;
catalog invariants.

Full suite 1754 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 12, 2026 10:26
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 12, 2026
@shaypal5 shaypal5 added type: feature New capability layer: schema schema/ entity/event contracts layer: render render/ bundle and artifact output status: needs review Ready for review dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream labels Jun 12, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the lifecycle (pLTV) calendar-anchored customer snapshot builder and its feature catalog, plus extracts the snapshot difficulty-distortion logic into a shared scheme-agnostic utility so lifecycle and lead-scoring can reuse identical behavior.

Changes:

  • Introduces CUSTOMER_SNAPSHOT_FEATURES and build_customer_snapshot(...) to produce an ML-ready, cutoff-safe customer snapshot with forward-window revenue targets and a deliberate leakage trap.
  • Extracts lead-scoring’s snapshot distortion algorithm into leadforge/render/distortions.py and delegates from lead-scoring + lifecycle.
  • Adds comprehensive lifecycle snapshot/catalog tests and updates LTV roadmap/agent plan docs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/schemes/lifecycle/test_snapshots.py Adds end-to-end tests for snapshot safety (cutoff censoring), feature derivations, targets, trap, and distortions.
tests/schemes/lifecycle/test_features.py Adds invariant tests for the lifecycle snapshot feature catalog (targets, trap, omissions).
leadforge/schemes/lifecycle/snapshots.py Implements the lifecycle customer snapshot builder and helper aggregations.
leadforge/schemes/lifecycle/features.py Defines the lifecycle snapshot feature catalog (including targets + leakage trap).
leadforge/schemes/lead_scoring/render/snapshots.py Refactors lead-scoring snapshot distortions to call the shared distortion utility.
leadforge/render/distortions.py Adds shared distortion implementation (noise/missingness/outliers) driven by FeatureSpec.
docs/ltv/roadmap.md Updates roadmap status/notes to reflect LTV-Pl completion and scope split.
.agent-plan.md Updates agent plan status notes for the newly opened/merged roadmap items.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/render/distortions.py
Comment thread leadforge/render/distortions.py
…[LTV-Pl]

Five findings from hostile self-review of the snapshot PR:

1. SILENT TARGET CENSORING (the serious one). The builder validated
   cutoff <= observation_date but had no way to verify the sim was run
   with a horizon covering the target windows: simulate_lifecycle(...,
   forward_window_days=90) followed by build_customer_snapshot() would
   emit plausible-looking, deterministic, silently truncated
   ltv_revenue_365d/730d values — exactly the censoring failure the
   cutoff check exists to prevent. Fix: LifecycleSimulationResult now
   records forward_window_days/early_tenure_weeks, and the builder
   raises unless the recorded horizon covers max(730, 180) days.
   Regression tests on both sides.

2. The contract-anniversary boundary was COPIED, not shared — and the
   docstring bragged it was "the same boundary is_renewal_week uses".
   A claim, not a mechanism: a second copy of round(k*term*52/12) and a
   third copy of _WEEKS_PER_MONTH, where drift would desynchronize the
   published weeks_to_next_renewal feature from the hazard spike (the
   data<->causality agreement this dataset teaches). Fix: public
   hazards.next_renewal_week() beside is_renewal_week; the snapshot
   consumes it; duplicates deleted. Property test: the returned week is
   the FIRST week after the input satisfying is_renewal_week, for every
   week in [0, 160) across 12/13/24-month terms.

3. A mismatched (population, sim) pair died with a bare KeyError deep
   in the row loop — the one precondition the function depends on was
   the one it never checked. Fix: explicit validation with a
   population/sim-mismatch diagnostic naming the first missing customer.

4. The module-level _EMPTY_*_AGG fallback dicts were handed out SHARED
   via .get(cid, _EMPTY_...) — read-only today; the first accidental
   write corrupts every subsequent row. Fix: MappingProxyType (the
   mechanisms.py precedent), so mutation raises instead.

5. Documentation debts: the design.md §7 secondary advanced-tier trap
   (last_health_signal_post_obs) is now EXPLICITLY deferred to LTV-Pn
   in the roadmap (tier-conditional — belongs with difficulty wiring);
   features.py documents that MCAR missingness stacks on semantic nulls
   (last_nps_score, weeks_since_last_payment_failure); distortions.py
   documents the inherited byte-identity-locked Int64->Float64 dtype
   flip under missingness.

Full suite 1768 passed / 51 skipped; ruff + mypy clean; lead-scoring
distorted-snapshot hash still byte-identical (sha256 196bc45f...).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Address the two Copilot review threads on #119:

1. APPLIED — the post-noise non-negative clamp lists filtered only on
   dtype + non_negative, so target columns (and any future non-negative
   exempt trap) were clip(lower=0)-ed.  Behaviorally a no-op today
   (those columns never receive noise, and non_negative raw values are
   >= 0 by definition), but the "targets are never distorted" contract
   must hold by construction, not coincidence.  Clamp lists now exclude
   is_target and exempt_cols exactly like the distortion-eligible
   lists.  Verified byte-identical for lead-scoring (sha256 196bc45f…
   unchanged).  New tests prove a deliberately out-of-contract negative
   value in a target / exempt column survives distortion untouched.

2. DECLINED with rationale — pd.NA into nullable Int64 instead of the
   Float64 conversion under missingness: the current behavior is
   byte-identity-locked with the shipped lead-scoring bundles (v4/v6/v7
   and their validation scripts assume Float64).  Documented as a Known
   wart in the module docstring; an opt-in dtype-preserving mode for
   the lifecycle scheme (no shipped bundles yet) is recorded in the
   roadmap as deferred to LTV-Pn, where the bundle writer fixes the
   lifecycle parquet schemas.

Full suite 1771 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #119 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/render/distortions.py:105
URL: https://github.com/leadforge-dev/leadforge/pull/119#discussion_r3402599015
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The post-noise non-negative clamps currently include any `non_negative` column in the feature catalog, even if it’s a target or explicitly exempt. This can violate the contract that targets are never distorted (and could also accidentally clamp an exempt leakage-trap column if it were marked non_negative in the future). Filter these clamp lists the same way as the distortion-eligible lists: exclude targets and `exempt_cols`.

## COPILOT-2
Location: leadforge/render/distortions.py:149
URL: https://github.com/leadforge-dev/leadforge/pull/119#discussion_r3402599067
Root author: copilot-pull-request-reviewer

Comment:
    Missingness injection converts `Int64` columns to `Float64` and writes `np.nan`, which changes schema dtypes and loses integer-ness even though pandas nullable `Int64` supports missing values. Consider assigning `pd.NA` directly (and avoid dtype conversion) so distorted snapshots can retain the catalog’s nullable dtypes.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27412981060 attempt 1
Comment timestamp: 2026-06-12T11:31:54.454632+00:00
PR head commit: 93c720d327f6f85d99133883e82a4f1b11e50ed6

@shaypal5 shaypal5 merged commit c0abf4a into main Jun 12, 2026
11 of 13 checks passed
@shaypal5 shaypal5 deleted the feat/lifecycle-snapshot branch June 12, 2026 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream layer: render render/ bundle and artifact output layer: schema schema/ entity/event contracts status: needs review Ready for review type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants