
feat(lifecycle): customer population builder [LTV-Ph] #113

Merged
shaypal5 merged 5 commits into main from feat/lifecycle-customer-population
Jun 11, 2026
Conversation

@shaypal5

Contributor

Summary

First implementation milestone of the pLTV workstream (LTV-Ph, milestone
LTV-M3). Adds the lifecycle customer population builder — the starting
point for the post-conversion subscription simulation.

What's added

leadforge/schemes/lifecycle/population.py:

  • build_customer_population(n_customers, seed, motif_family, *, n_accounts, observation_date, acquisition_window_weeks) → CustomerPopulationResult — single public entry point; fully deterministic via two named RNG substreams.
  • CustomerPopulationResult / CustomerLatentState — output dataclasses.
  • 5 retention motif families (LIFECYCLE_MOTIF_FAMILIES), each with distinct latent-mean biases:
| family | primary driver |
| --- | --- |
| product_led_retention | latent_product_fit |
| relationship_led_retention | latent_champion_strength |
| expansion_led_growth | latent_adoption_velocity |
| payment_fragile | latent_budget_stability |
| churner_dominated | latent_product_fit ↓, latent_champion_strength |
  • 5 customer latent traits: latent_product_fit, latent_adoption_velocity, latent_budget_stability, latent_champion_strength, latent_organizational_stability.
  • D3 seam (independent generation): opportunity_id=None on every customer; reserved for future chaining from a lead-scoring bundle.
  • D4 staggered starts: customer_start_at sampled uniformly in [obs_date − acquisition_window_weeks, obs_date), so tenure at the observation date naturally varies from near-zero (cold-start) to the full window.
  • Plan + MRR: employee-band-conditional plan tier selection; MRR ranges $1k–$25k/month.
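The D4 staggered-start rule can be illustrated with a minimal sketch. It assumes day-level granularity and uses Python's standard `random` module; the helper name `sample_start_dates` and the example dates are hypothetical and are not taken from population.py.

```python
import random
from datetime import date, timedelta


def sample_start_dates(
    n_customers: int,
    observation_date: date,
    acquisition_window_weeks: int,
    seed: int,
) -> list[date]:
    """Draw start dates uniformly from [observation_date - window, observation_date)."""
    rng = random.Random(seed)
    window_days = acquisition_window_weeks * 7
    window_start = observation_date - timedelta(days=window_days)
    # randrange(window_days) yields 0..window_days-1, so the latest possible
    # start is one day before the observation date (half-open interval).
    return [
        window_start + timedelta(days=rng.randrange(window_days))
        for _ in range(n_customers)
    ]


obs = date(2025, 1, 27)  # arbitrary illustrative date, not the project's default
starts = sample_start_dates(1000, obs, acquisition_window_weeks=52, seed=7)
tenure_days = [(obs - s).days for s in starts]
```

Because the interval is half-open, tenure at the observation date is at least one day and at most the full window in this sketch.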

Tests (28)

tests/schemes/lifecycle/test_population.py: shape, determinism, FK integrity (customer→account; latent-state coverage), staggered-start boundary and spread, D3 seam assertion, latent [0,1] bounds + 5-trait completeness, all 5 motif families, motif-bias direction test, field-value assertions.

  • Full suite 1565 passed / 51 skipped (+28); ruff + mypy clean.

Next

LTV-Pi — lifecycle motif families + mechanism policies (assign_lifecycle_mechanisms()): churn hazard, expansion propensity, payment-failure params, keyed to the 5 retention motif families built here.

🤖 Generated with Claude Code

First implementation milestone of the pLTV workstream (LTV-M3). Adds the
lifecycle customer population builder — the starting point for the post-
conversion subscription simulation.

leadforge/schemes/lifecycle/population.py:
- build_customer_population(n_customers, seed, motif_family, *, n_accounts,
  observation_date, acquisition_window_weeks) → CustomerPopulationResult.
  Two named RNG substreams (lifecycle_population_{accounts,customers}) keep
  each generation aspect independently stable.
- CustomerPopulationResult: accounts, customers, latent_state, observation_date.
- CustomerLatentState: account_latents + customer_latents dicts.
- LIFECYCLE_MOTIF_FAMILIES tuple: 5 retention motif families
  (product_led_retention, relationship_led_retention, expansion_led_growth,
  payment_fragile, churner_dominated) each with distinct latent-mean biases.
- 5 customer latent traits: latent_product_fit, latent_adoption_velocity,
  latent_budget_stability, latent_champion_strength,
  latent_organizational_stability.
- D3 seam: opportunity_id=None (independent generation); reserved for future
  chaining from a lead-scoring bundle's converted leads.
- D4 staggered starts: customer_start_at sampled uniformly in
  [obs_date - acquisition_window_weeks, obs_date), varying tenure at snapshot.
- Plan + MRR: employee-band-conditional plan tier (starter/growth/enterprise)
  with MRR ranges $1k-$3.5k / $3.5k-$9k / $9k-$25k per month.
- Contract terms: 12mo (65%) / 24mo (35%).
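
A rough sketch of the plan, MRR, and contract-term sampling described above. The MRR ranges and the 65% / 35% term split come from this commit message; the employee-band names and per-band tier weights are invented for illustration, as is the helper name `sample_plan_and_mrr`.

```python
import random

# MRR ranges (USD/month) and contract-term weights as stated in the commit message.
MRR_RANGES = {
    "starter": (1_000, 3_500),
    "growth": (3_500, 9_000),
    "enterprise": (9_000, 25_000),
}
CONTRACT_TERMS = (12, 24)
CONTRACT_WEIGHTS = (0.65, 0.35)

# ASSUMPTION: band labels and tier weights are hypothetical placeholders.
PLAN_WEIGHTS_BY_BAND = {
    "1-50": {"starter": 0.70, "growth": 0.25, "enterprise": 0.05},
    "51-500": {"starter": 0.30, "growth": 0.50, "enterprise": 0.20},
    "501+": {"starter": 0.05, "growth": 0.35, "enterprise": 0.60},
}


def sample_plan_and_mrr(employee_band: str, rng: random.Random) -> tuple[str, int, int]:
    """Return (plan, mrr, contract_term_months) for one customer."""
    weights = PLAN_WEIGHTS_BY_BAND[employee_band]
    # The plan tier is conditional on the account's employee band.
    plan = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    low, high = MRR_RANGES[plan]
    mrr = rng.randint(low, high)  # inclusive on both ends
    term = rng.choices(CONTRACT_TERMS, weights=CONTRACT_WEIGHTS, k=1)[0]
    return plan, mrr, term


rng = random.Random(11)
samples = [sample_plan_and_mrr("51-500", rng) for _ in range(200)]
```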

tests/schemes/lifecycle/test_population.py (28 tests):
- Shape: counts, type assertions, observation_date format.
- Determinism: same seed → identical output; different seeds → different.
- FK integrity: every customer.account_id in the account set; latent state
  covers exactly the customer and account populations.
- Staggered starts: all starts < observation_date; all within acquisition
  window; distribution spans both halves of the window.
- D3 seam: opportunity_id is None for all customers.
- Latent distributions: all values in [0,1]; exactly 5 traits per customer.
- Motif families: all 5 registered; each produces valid output; product_led
  has higher mean latent_product_fit than churner_dominated.
- Entity fields: plan, MRR, contract_term, CSM rep, ID prefixes all valid.

Full suite 1565 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 11, 2026 17:19
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 11, 2026
@shaypal5 shaypal5 added the type: feature, layer: simulation, status: needs review, and dataset: leadforge-ltv-v1 labels Jun 11, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Implements the first lifecycle (pLTV) “customer population builder” for the lifecycle scheme, generating accounts + lifecycle customers + latent traits with deterministic RNG substreams, and adds a comprehensive test suite plus roadmap/plan references.

Changes:

  • Added leadforge/schemes/lifecycle/population.py with build_customer_population(...), output dataclasses, motif families, and generation logic.
  • Added tests/schemes/lifecycle/test_population.py covering determinism, FK integrity, staggered start dates, latent bounds/shape, motif families, and key field assertions.
  • Updated LTV roadmap and agent plan docs to reference the milestone/PR.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| leadforge/schemes/lifecycle/population.py | New lifecycle population generator (accounts/customers/latents) with motif-family latent biases and staggered acquisition window. |
| tests/schemes/lifecycle/test_population.py | New tests validating shape, determinism, constraints, motif behavior, and field population. |
| docs/ltv/roadmap.md | Roadmap updated to link milestone item to PR #113. |
| .agent-plan.md | Agent plan updated to reflect M3 start / PR tracking. |


Comment thread leadforge/schemes/lifecycle/population.py Outdated
Comment thread leadforge/schemes/lifecycle/population.py Outdated
Comment thread leadforge/schemes/lifecycle/population.py
Comment thread leadforge/schemes/lifecycle/population.py Outdated
Four issues found by hostile self-review of the initial commit:

1. Account latents used lead-scoring trait names (latent_account_fit,
   latent_budget_readiness, latent_process_maturity) that the lifecycle
   simulation engine will never query. The engine queries
   latent_budget_stability and latent_organizational_stability at the account
   level. All three lead-scoring keys replaced; motif-family bias now wires
   correctly through the account latents (previously bias.get calls on
   account latents always returned 0.0 because no lifecycle motif key matched
   any account latent key — the motif had zero effect on account generation).
   Regression test added.

2. CSM rep IDs used make_id("rep", ...), bypassing ID_PREFIXES["rep"]. The
   rule in core/ids.py is to always go through the registry. Fixed.

3. motif_family was a positional parameter — silently passing a wrong third
   argument (e.g. n_accounts as an int) produces a confusing ValueError at
   runtime rather than a TypeError at the call site. Made keyword-only.
   Regression test added (inspect.signature check).

4. The default observation_date formula used a bare `+ 4` weeks buffer with
   no explanation. Extracted to _OBS_DATE_BUFFER_WEEKS constant with a
   comment explaining its purpose (gives earliest-acquired customers a small
   subscription history before the snapshot).
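
The keyword-only regression check from item 3 might look roughly like this. `build_population_stub` is a hypothetical stand-in with a simplified signature, not the real function, and `keyword_only_names` is an illustrative helper.

```python
import inspect
from typing import Optional


def build_population_stub(
    n_customers: int,
    seed: int,
    *,
    motif_family: str,
    n_accounts: Optional[int] = None,
) -> None:
    """Hypothetical stand-in; only the signature matters for this check."""


def keyword_only_names(func) -> set[str]:
    """Names of the parameters that can only be passed by keyword."""
    params = inspect.signature(func).parameters.values()
    return {p.name for p in params if p.kind is inspect.Parameter.KEYWORD_ONLY}


# Passing motif_family positionally now fails at the call site with TypeError.
try:
    build_population_stub(10, 0, "product_led_retention")
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```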

Comment accuracy fix: "shared with the lead-scoring account generator" →
"mirrors the distribution of" (the code is parallel, not shared; cross-scheme
import would create an awkward dependency).

Full suite 1567 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… [LTV-Ph]

Implements three accepted recommendations from the Copilot PR review:

COPILOT-1 (module docstring misleading about RNG independence):
  Reworded to clarify that each substream handles both entity creation AND
  latent draws for its entity type (accounts-substream = account rows + account
  latents; customers-substream = customer rows + customer latents). The
  independence is between account generation and customer generation, not
  between "population" and "latent" draws.
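
One way to picture the substream layout described here, assuming substreams are derived by hashing the root seed together with a stream name. The stream names appear in the initial commit message; the hashing scheme and helper names are assumptions and may differ from the project's RNG utilities.

```python
import hashlib
import random


def substream(seed: int, name: str) -> random.Random:
    """Derive a separate generator from a root seed and a stream name."""
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def account_draws(seed: int, n_accounts: int) -> list[float]:
    # Account rows and account latents would both consume this stream.
    rng = substream(seed, "lifecycle_population_accounts")
    return [rng.random() for _ in range(n_accounts)]


def customer_draws(seed: int, n_customers: int) -> list[float]:
    # Customer rows and customer latents would both consume this stream.
    rng = substream(seed, "lifecycle_population_customers")
    return [rng.random() for _ in range(n_customers)]
```

With this layout, changing how many customer draws are made cannot shift any account draw, which is the independence the reworded docstring describes.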

COPILOT-2 (docstring says "1 year" but default is 56 weeks):
  Updated observation_date docstring to show the actual formula
  (_WORLD_BASE_DATE + (acquisition_window_weeks + _OBS_DATE_BUFFER_WEEKS) weeks)
  and note the concrete value with built-in defaults (56 weeks ≈ 13 months).

COPILOT-3 (missing input validation):
  Added ValueError guards for n_customers < 1, n_accounts < 1 (when explicit),
  and acquisition_window_weeks < 1 (0 would make every start == obs_date,
  violating the < obs_date boundary invariant). Pattern mirrors
  core/models.py::_require_positive_int. Four new validation tests added.
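
A minimal sketch of the guards, assuming a helper in the spirit of `_require_positive_int`. The real helper's signature and error message are not shown in this PR, so `require_positive_int` and `validate_population_args` below are hypothetical.

```python
from typing import Optional


def require_positive_int(name: str, value: int) -> int:
    """Raise ValueError unless value is at least 1."""
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value!r}")
    return value


def validate_population_args(
    n_customers: int,
    n_accounts: Optional[int],
    acquisition_window_weeks: int,
) -> None:
    require_positive_int("n_customers", n_customers)
    # n_accounts is only checked when the caller passes it explicitly.
    if n_accounts is not None:
        require_positive_int("n_accounts", n_accounts)
    # A zero-week window would make every start equal to the observation date.
    require_positive_int("acquisition_window_weeks", acquisition_window_weeks)


def raises_value_error(*args) -> bool:
    try:
        validate_population_args(*args)
    except ValueError:
        return True
    return False
```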

COPILOT-4 (ID_PREFIXES["rep"] vs hardcoded string):
  Already fixed in the self-review commit (67f22df). Resolved as already treated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions


pr-agent-context report:

This run includes unresolved review comments on PR #113 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/schemes/lifecycle/population.py
URL: https://github.com/leadforge-dev/leadforge/pull/113#discussion_r3397854263
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The module docstring says the two RNG substreams make “population and latent draws … independently stable”, but the code only separates account vs customer generation (entities + latents share the same stream). This wording is misleading; either split entity vs latent substreams or update the docstring to reflect what is actually independent.

## COPILOT-2
Location: leadforge/schemes/lifecycle/population.py
URL: https://github.com/leadforge-dev/leadforge/pull/113#discussion_r3397854322
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The docstring claims the default observation_date is `_WORLD_BASE_DATE + 1 year`, but the implementation derives it from `acquisition_window_weeks` (and currently adds an extra 4 weeks). Please align the docstring with the implemented default (or vice versa) to avoid surprising callers.

## COPILOT-3
Location: leadforge/schemes/lifecycle/population.py:241
URL: https://github.com/leadforge-dev/leadforge/pull/113#discussion_r3397854353
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `build_customer_population` doesn’t validate `n_customers`, `n_accounts`, or `acquisition_window_weeks`. Some invalid-but-plausible inputs currently lead to confusing runtime errors (e.g., `n_accounts=0` makes `rng.choice(accounts)` crash), and `acquisition_window_weeks=0` violates the documented `[acq_start, obs_date)` invariant by producing `customer_start_at == obs_date`. Add basic positive-int validation and make the default `observation_date` consistent with the acquisition window without the extra 4-week offset.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27368792404 attempt 1
Comment timestamp: 2026-06-11T18:30:24.054928+00:00
PR head commit: 86548f1f4bb3330155d1f2f72177837791819a94

@shaypal5 shaypal5 merged commit 1321607 into main Jun 11, 2026
10 of 16 checks passed
@shaypal5 shaypal5 deleted the feat/lifecycle-customer-population branch June 11, 2026 19:03

Labels

  • dataset: leadforge-ltv-v1 (Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream)
  • layer: simulation (simulation/ discrete-time engine)
  • status: needs review (Ready for review)
  • type: feature (New capability)


2 participants