Merged
10 changes: 7 additions & 3 deletions .agent-plan.md
@@ -79,9 +79,13 @@ Any`; `apply_exposure` dispatches hidden truth to a
 lead-scoring byte-identical both modes) opened as **#122** (merged). `LTV-Pn.3` (lifecycle `GenerationConfig` fields +
 validated regression `TaskManifest` `task_type` + shared `render/tasks.py`
 split writer + `schemes/lifecycle/tasks.py` task families; discharges the
-`LTV-Pc` regression-task-spec leftover) opened as **#124**. Next: `Pn.4`
-(complete `LifecycleScheme.build_world`/`write_bundle` + shared bundle
-orchestrator + e2e bundle), `LTV-Po` (recipe).
+`LTV-Pc` regression-task-spec leftover) opened as **#124** (merged). `LTV-Pn.4` split into four (build → write → public-safety
+→ orchestrator): `LTV-Pn.4a` (`LifecycleScheme.build_world` — deterministic
+motif sampling + population + sim + `LifecycleArtifacts`; lifecycle relational
+`to_dataframes`; consumes the Pn.3 config fields) opened as **#125**. Next:
+`Pn.4b` (instructor `write_bundle` + tasks), `Pn.4c` (student_public
+snapshot-safety + CLAUDE.md), `Pn.4d` (shared bundle orchestrator), `LTV-Po`
+(recipe).

 ---

57 changes: 44 additions & 13 deletions docs/ltv/roadmap.md
@@ -46,7 +46,7 @@ protocol + registry, with the package physically reorganized into
 | `LTV-M3` | Customer population + lifecycle world | `LTV-Ph`, `LTV-Pi` | #113 (Ph) |
 | `LTV-M4` | Lifecycle simulation engine | `LTV-Pj`, `LTV-Pk` | #117 (Pj), #118 (Pk) |
 | `LTV-M5` | Customer snapshots + pLTV targets (both regimes) | `LTV-Pl`, `LTV-Pm` | #119 (Pl), #120 (Pm) |
-| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3) |
+| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a) |
 | `LTV-M7` | Validation + regression-metric calibration | `LTV-Pp` | |
 | `LTV-M8` | CLI, notebooks, publish | `LTV-Pq`, `LTV-Pr`, `LTV-Ps` | |

@@ -312,25 +312,56 @@ pipeline + schema bump). Split into four sub-PRs in dependency order:
 Pn.4. Lead-scoring data byte-identical (only `world_spec.json` gains the new
 config fields, by design).
 - Labels: `type: feature`, `layer: api`, `layer: schema`, `layer: render`
-- [ ] **`LTV-Pn.4`** — `feat(lifecycle): complete LifecycleScheme + e2e bundle`.
-Implement `LifecycleScheme.build_world` (population → sim) and `write_bundle`
-(lifecycle relational tables; both regime snapshots → two task families ×
-3 windows + secondary churn; dataset card; manifest `observation_date` +
-windows via `extra_fields`; lifecycle `write_metadata` hidden-truth hook).
-With both schemes' `write_bundle` in hand, **lift the shared bundle
-orchestrator with scheme render hooks** out of the two implementations
-(carried cleanup #1). First end-to-end lifecycle bundle (programmatic;
-recipe wiring is `LTV-Po`). Extend `CLAUDE.md` hard constraints with the
-lifecycle snapshot-safety clause + the `schemes/` layout. Carries the
-LTV-Pp validation flags: early-regime degenerate-column exemptions; the
-dtype-preserving missingness opt-in.
+`LTV-Pn.4` is the largest sub-PR — split along the two `GenerationScheme`
+methods, then public-safety, then the carried orchestrator cleanup:

+- [ ] **`LTV-Pn.4a`** — `feat(lifecycle): build_world + relational tables`.
+`LifecycleScheme.build_world`: deterministically sample a motif family from
+the seed (honouring the "vary by motif family" invariant), build the customer
+population, simulate, and wrap a new `LifecycleArtifacts` on the bundle. New
+`schemes/lifecycle/render/relational.py` `to_dataframes` (accounts + the 5
+lifecycle tables). Consumes the Pn.3 config fields (`n_customers`,
+`forward_windows_days`, `early_tenure_weeks`, `observation_date`).
+`write_bundle` still stubbed.
+- Tests: determinism, cross-seed motif variability, FK integrity, table shapes.
+- Labels: `type: feature`, `layer: api`, `layer: render`
+- [ ] **`LTV-Pn.4b`** — `feat(lifecycle): write_bundle (instructor) + tasks`.
+Instructor-mode `write_bundle`: relational tables; both regime snapshots →
+8 task dirs (3 pLTV regression + churn, × 2 regimes) via the shared writer;
+dataset card; feature dictionary; manifest with `generation_scheme` +
+`observation_date` + windows (`extra_fields`); lifecycle `write_metadata`
+hidden-truth hook (latent registry + mechanism summary). First on-disk
+lifecycle bundle. **Must resolve `difficulty_params` from the active profile
+and thread it into `build_customer_snapshot` (Pn.4a's `build_world` does not —
+without this the snapshot distortions never fire and every tier is identical).**
+- Labels: `type: feature`, `layer: api`, `layer: render`
+- [ ] **`LTV-Pn.4c`** — `feat(lifecycle): student_public snapshot-safety`.
+Public relational filtering (event tables ≤ cutoff; drop terminal
+`churn_at`/`churn_reason`/`subscription_end_at`; no target columns); the
+early-regime degenerate-column + dtype-preserving-missingness flags from
+LTV-Pm. Extend `CLAUDE.md` hard constraints with the lifecycle
+snapshot-safety clause + the `schemes/` layout.
+- Labels: `type: feature`, `layer: exposure`, `layer: render`, `layer: docs`
+- [ ] **`LTV-Pn.4d`** — `refactor: shared bundle orchestrator`. With both
+schemes' `write_bundle` in hand, lift the shared orchestrator (mkdir →
+relational → tasks → card → dict → exposure → manifest) with scheme render
+hooks out of the two implementations (carried cleanup #1). Both bundles
+byte-identical.
+- Labels: `type: refactor`, `layer: render`, `layer: api`
 - [ ] **`LTV-Po`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three
 recipe YAMLs (`scheme: lifecycle`); register in the recipe registry;
 end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()` smoke test.
+**Decide narrative consumption:** the lifecycle population hardcodes its
+firmographics and `build_world` ignores `narrative` (Pn.4a) — either wire the
+recipe's `narrative.yaml` into the population builder or document the
+firmographics as scheme-internal.
 - Tests: recipe loads, full round-trip, determinism, all task splits (3
 windows × 2 regimes + secondary churn), public/instructor split.
 - Labels: `type: feature`, `layer: recipes`
+- **Deferred (flagged in Pn.4a):** simulation-level difficulty scaling for the
+lifecycle engine — making `advanced` a genuinely harder world (not just
+noisier snapshots). Currently the motif-calibrated rates are difficulty-
+agnostic; revisit alongside `LTV-Pp` difficulty-band validation.

 ---
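
The `LTV-Pn.4c` item above states the public filtering rule only in outline. The following is a minimal row-level sketch of that rule, assuming rows are plain dicts. The function name `public_rows`, the `occurred_at` timestamp key, and the `ltv_90d` target column are hypothetical and do not come from this PR; only the three terminal column names are taken from the roadmap text.

```python
from datetime import date

# Terminal (future-revealing) columns named in the roadmap item.
TERMINAL_COLUMNS = {"churn_at", "churn_reason", "subscription_end_at"}


def public_rows(rows, cutoff, *, time_key=None, target_columns=frozenset()):
    """Return copies of `rows` that are safe for a public view.

    - If `time_key` is given, rows dated after `cutoff` are dropped.
    - Terminal columns and any `target_columns` are removed from every row.
    """
    drop = TERMINAL_COLUMNS | set(target_columns)
    kept = []
    for row in rows:
        if time_key is not None and row[time_key] > cutoff:
            continue  # event happened after the observation cutoff
        kept.append({k: v for k, v in row.items() if k not in drop})
    return kept


# Usage with made-up data (column names are assumptions, not the real schema).
cutoff = date(2024, 6, 30)
events = [
    {"occurred_at": date(2024, 6, 1), "kind": "upgrade"},
    {"occurred_at": date(2024, 7, 15), "kind": "churn"},
]
visible_events = public_rows(events, cutoff, time_key="occurred_at")
```

Keeping the cutoff filter and the column drop in one pass means a table cannot be exported with one rule applied and the other forgotten, which is the failure mode a snapshot-safety clause is presumably meant to prevent.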

91 changes: 79 additions & 12 deletions leadforge/schemes/lifecycle/__init__.py
@@ -1,15 +1,15 @@
-"""The ``lifecycle`` generation scheme (``b2b_saas_ltv_v1``) — scaffold.
-
-Registers the second peer scheme alongside ``lead_scoring``. Its entity rows
-and FK constraints live here (``entities`` / ``relationships``); the pipeline
-itself (``build_world`` / ``write_bundle``) is built out across LTV-M3…M6 and
-currently raises :class:`NotImplementedError`. Registering the stub now lets
-the registry, recipe ``scheme:`` resolution, and tests treat lifecycle as a
-first-class peer before its internals exist.
+"""The ``lifecycle`` generation scheme (``b2b_saas_ltv_v1``).
+
+The second peer scheme alongside ``lead_scoring``. Its entity rows and FK
+constraints live here (``entities`` / ``relationships``); the snapshot, feature,
+and task definitions live in sibling modules. :meth:`LifecycleScheme.build_world`
+is implemented (LTV-Pn.4a); :meth:`write_bundle` / :meth:`write_metadata` are
+built out in LTV-Pn.4b–c and currently raise :class:`NotImplementedError`.
 """

 from __future__ import annotations

+import random
 from typing import TYPE_CHECKING, Any

 from leadforge.schemes.base import register_scheme
@@ -21,13 +21,26 @@
     from leadforge.narrative.spec import NarrativeSpec

 _NOT_IMPLEMENTED = (
-    "the lifecycle (b2b_saas_ltv_v1) scheme is not implemented yet; "
-    "its pipeline is built across LTV-M3…M6"
+    "the lifecycle (b2b_saas_ltv_v1) write path is not implemented yet; "
+    "it is built across LTV-Pn.4b–c"
 )


+def _sample_motif_family(rng: random.Random) -> str:
+    """Deterministically pick a retention motif family for this world.
+
+    Sampling per-seed (rather than hard-coding one family) honours the
+    "world structure varies via named motif families" invariant — different
+    seeds yield structurally different worlds.
+    """
+    from leadforge.schemes.lifecycle.population import LIFECYCLE_MOTIF_FAMILIES
+
+    # Sort for a stable, order-independent candidate list before sampling.
+    return rng.choice(sorted(LIFECYCLE_MOTIF_FAMILIES))
+
+
 class LifecycleScheme:
-    """Stub for the customer-lifetime-value (pLTV) generation pipeline."""
+    """The customer-lifetime-value (pLTV) generation pipeline."""

     name = "lifecycle"

@@ -37,7 +50,61 @@ def build_world(
         narrative: NarrativeSpec,
         **options: Any,
     ) -> WorldBundle:
-        raise NotImplementedError(_NOT_IMPLEMENTED)
+        """Sample a motif family, build the customer population, and simulate.
+
+        Deterministic given ``config`` (the population and the per-customer
+        weekly simulation derive from ``config.seed`` via distinct RNG
+        substreams). Consumes the lifecycle config fields: ``n_customers``,
+        ``observation_date``, ``early_tenure_weeks``, and
+        ``forward_windows_days`` (the engine simulates through the longest
+        window so every pLTV target is fully covered).
+
+        Not yet applied (tracked, not silent):
+
+        - **Difficulty.** ``config.difficulty`` / ``difficulty_params`` are
+          NOT consumed here, so every difficulty tier currently yields the same
+          world. Two distinct pieces remain: resolving ``difficulty_params``
+          from the active profile and threading it into the snapshot
+          distortions (``LTV-Pn.4b``, where snapshots are built), and
+          simulation-level difficulty scaling that actually makes harder tiers
+          harder worlds (deferred — see ``mechanisms.py`` and the roadmap).
+        - **Narrative.** ``narrative`` is accepted for protocol parity but
+          unused: the lifecycle population builder generates its own
+          firmographics from internal distributions, so the recipe's
+          ``narrative.yaml`` will not drive them until ``LTV-Po`` decides
+          whether the lifecycle scheme should consume the narrative spec.
+        """
+        from leadforge.core.models import WorldBundle, WorldSpec
+        from leadforge.core.rng import RNGRoot
+        from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts
+        from leadforge.schemes.lifecycle.engine import simulate_lifecycle
+        from leadforge.schemes.lifecycle.population import build_customer_population
+
+        motif_rng = RNGRoot(config.seed).child("lifecycle_motif")
+        motif_family = _sample_motif_family(motif_rng)
+
+        population = build_customer_population(
+            config.n_customers,
+            config.seed,
+            motif_family=motif_family,
+            observation_date=config.observation_date,
+        )
+        simulation_result = simulate_lifecycle(
+            population,
+            config.seed,
+            forward_window_days=max(config.forward_windows_days),
+            early_tenure_weeks=config.early_tenure_weeks,
+        )
+
+        spec = WorldSpec(config=config, narrative=narrative, scheme=self.name)
+        return WorldBundle(
+            spec=spec,
+            artifacts=LifecycleArtifacts(
+                population=population,
+                simulation_result=simulation_result,
+                motif_family=motif_family,
+            ),
+        )

     def write_bundle(
         self,
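
The seeding pattern in `_sample_motif_family` and `build_world` above (a named RNG substream, then a choice over a sorted candidate list) can be illustrated with the standard library alone. This is a sketch under stated assumptions: `child_rng` is a hypothetical stand-in for `RNGRoot(seed).child(name)`, whose real derivation is not shown in this diff, and the family names are placeholders for `LIFECYCLE_MOTIF_FAMILIES`.

```python
import hashlib
import random

# Placeholder family names; the real LIFECYCLE_MOTIF_FAMILIES are not shown here.
MOTIF_FAMILIES = {"family_a", "family_b", "family_c"}


def child_rng(seed: int, name: str) -> random.Random:
    """Derive a reproducible substream from (seed, name).

    Hypothetical stand-in for RNGRoot(seed).child(name): hashing the pair gives
    each named consumer its own stream, so drawing from one stream does not
    shift the values another consumer sees.
    """
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_motif_family(seed: int) -> str:
    # Sorting first removes any dependence on set iteration order, which for
    # strings can differ between interpreter runs (hash randomization).
    return child_rng(seed, "lifecycle_motif").choice(sorted(MOTIF_FAMILIES))


# Same seed, same family; different seeds may land on different families.
families_by_seed = {seed: sample_motif_family(seed) for seed in range(50)}
```

The sort is what makes the result reproducible across processes: without it, two runs with the same seed could index into differently ordered candidate lists.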
38 changes: 38 additions & 0 deletions leadforge/schemes/lifecycle/artifacts.py
@@ -0,0 +1,38 @@
+"""In-memory artifacts produced by the lifecycle (pLTV) pipeline.
+
+:class:`LifecycleArtifacts` is the scheme-owned payload carried by a
+:class:`~leadforge.core.models.WorldBundle` for the lifecycle scheme — the
+lifecycle analogue of
+:class:`~leadforge.schemes.lead_scoring.artifacts.LeadScoringArtifacts`. The
+bundle's ``artifacts`` field is typed ``Any`` in the shared core layer (it must
+not reference a scheme); this scheme defines and unwraps its own container here.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from leadforge.schemes.lifecycle.engine import LifecycleSimulationResult
+    from leadforge.schemes.lifecycle.population import CustomerPopulationResult
+
+__all__ = ["LifecycleArtifacts"]
+
+
+@dataclass
+class LifecycleArtifacts:
+    """The in-memory result of one lifecycle generation run.
+
+    Attributes:
+        population: Accounts, customers, and latent state from
+            :func:`~leadforge.schemes.lifecycle.population.build_customer_population`.
+        simulation_result: Subscriptions and the three event tables from
+            :func:`~leadforge.schemes.lifecycle.engine.simulate_lifecycle`.
+        motif_family: The retention motif family sampled for this world (also
+            recorded on the population; carried here for convenience).
+    """
+
+    population: CustomerPopulationResult
+    simulation_result: LifecycleSimulationResult
+    motif_family: str
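
The module docstring above notes that the bundle's `artifacts` field is typed `Any` in the core layer and that the scheme unwraps its own container. One way a later write path might narrow that payload is sketched below. `DemoArtifacts`, `DemoBundle`, and `unwrap` are hypothetical stand-ins, not names from this PR, and the PR does not show how the real unwrap is done.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DemoArtifacts:  # hypothetical stand-in for LifecycleArtifacts
    motif_family: str


@dataclass
class DemoBundle:  # hypothetical stand-in for WorldBundle
    artifacts: Any  # core layer stays scheme-agnostic, so no scheme type here


def unwrap(bundle: DemoBundle) -> DemoArtifacts:
    """Narrow the Any-typed payload back to this scheme's container.

    Failing loudly here catches a bundle built by a different scheme before
    any rendering code touches attributes that do not exist on it.
    """
    artifacts = bundle.artifacts
    if not isinstance(artifacts, DemoArtifacts):
        raise TypeError(f"expected DemoArtifacts, got {type(artifacts).__name__}")
    return artifacts


family = unwrap(DemoBundle(artifacts=DemoArtifacts(motif_family="family_a"))).motif_family
```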
2 changes: 2 additions & 0 deletions leadforge/schemes/lifecycle/render/__init__.py
@@ -0,0 +1,2 @@
+"""Lifecycle-scheme render modules (relational tables; snapshots live one level
+up in ``schemes.lifecycle.snapshots``)."""
85 changes: 85 additions & 0 deletions leadforge/schemes/lifecycle/render/relational.py
@@ -0,0 +1,85 @@
+"""Lifecycle relational export — one typed DataFrame per relational table.
+
+:func:`to_dataframes` flattens a lifecycle population + simulation result into
+the six relational tables: ``accounts`` (shared firmographics) plus the five
+lifecycle entity tables (``customers``, ``subscriptions``,
+``subscription_events``, ``health_signals``, ``invoices``). Mirrors the
+lead-scoring :func:`~leadforge.schemes.lead_scoring.render.relational.to_dataframes`
+pattern: a table-source registry drives a uniform row → DataFrame conversion
+with dtypes from each entity's ``DTYPE_MAP``.
+
+The conversion loop is intentionally parallel to the lead-scoring one (only the
+table-source registry differs). Unifying the two behind a shared
+``render``-layer helper is a candidate for ``LTV-Pn.4d`` (the shared bundle
+orchestrator), where the byte-identity surface is already under test.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING
+
+import pandas as pd
+
+from leadforge.schema.entities import AccountRow, EntityRowProtocol
+from leadforge.schemes.lifecycle.entities import (
+    CustomerLifecycleRow,
+    HealthSignalRow,
+    InvoiceRow,
+    SubscriptionEventRow,
+    SubscriptionLifecycleRow,
+)
+
+if TYPE_CHECKING:
+    from leadforge.schemes.lifecycle.engine import LifecycleSimulationResult
+    from leadforge.schemes.lifecycle.population import CustomerPopulationResult
+
+__all__ = ["to_dataframes"]
+
+
+@dataclass(frozen=True)
+class _TableSource:
+    cls: type[EntityRowProtocol]
+    origin: str  # "population" or "simulation"
+    attr: str
+
+
+# Table name → where its rows come from. Insertion order is the table order.
+_TABLE_SOURCES: dict[str, _TableSource] = {
+    AccountRow.TABLE_NAME: _TableSource(AccountRow, "population", "accounts"),
+    CustomerLifecycleRow.TABLE_NAME: _TableSource(CustomerLifecycleRow, "population", "customers"),
+    SubscriptionLifecycleRow.TABLE_NAME: _TableSource(
+        SubscriptionLifecycleRow, "simulation", "subscriptions"
+    ),
+    SubscriptionEventRow.TABLE_NAME: _TableSource(
+        SubscriptionEventRow, "simulation", "subscription_events"
+    ),
+    HealthSignalRow.TABLE_NAME: _TableSource(HealthSignalRow, "simulation", "health_signals"),
+    InvoiceRow.TABLE_NAME: _TableSource(InvoiceRow, "simulation", "invoices"),
+}
+
+
+def to_dataframes(
+    result: LifecycleSimulationResult,
+    population: CustomerPopulationResult,
+) -> dict[str, pd.DataFrame]:
+    """Convert lifecycle output to one typed DataFrame per relational table.
+
+    Returns:
+        Dict mapping table name → ``pd.DataFrame`` with dtypes from each entity
+        class's ``DTYPE_MAP``. Empty tables are zero-row DataFrames with the
+        correct schema.
+    """
+    dfs: dict[str, pd.DataFrame] = {}
+    for table_name, src in _TABLE_SOURCES.items():
+        obj = population if src.origin == "population" else result
+        rows = getattr(obj, src.attr)
+        if rows:
+            df = pd.DataFrame([row.to_dict() for row in rows])
+            for col, dtype in src.cls.DTYPE_MAP.items():
+                if col in df.columns:
+                    df[col] = df[col].astype(dtype)
+        else:
+            df = src.cls.empty_dataframe()
+        dfs[table_name] = df
+    return dfs
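
The conversion loop in `to_dataframes` above relies on each entity class exposing `to_dict`, `DTYPE_MAP`, and `empty_dataframe`. Below is a self-contained sketch of that contract using a hypothetical `DemoInvoiceRow`. The real `InvoiceRow` fields and dtypes are not shown in this diff, so the columns here are assumptions, and `rows_to_frame` is an illustrative helper rather than a function from the PR.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar

import pandas as pd


@dataclass
class DemoInvoiceRow:  # hypothetical stand-in for an entity row class
    # ClassVar keeps DTYPE_MAP out of the dataclass fields (and out of asdict).
    DTYPE_MAP: ClassVar[dict[str, str]] = {"n_seats": "Int64", "amount": "float64"}

    n_seats: int
    amount: float

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def empty_dataframe(cls) -> pd.DataFrame:
        # Zero rows, but every column present with its declared dtype.
        return pd.DataFrame({c: pd.Series(dtype=t) for c, t in cls.DTYPE_MAP.items()})


def rows_to_frame(rows: list[DemoInvoiceRow]) -> pd.DataFrame:
    """Convert rows to a typed frame; fall back to the typed empty schema."""
    if not rows:
        return DemoInvoiceRow.empty_dataframe()
    df = pd.DataFrame([row.to_dict() for row in rows])
    # Only cast columns that are actually present, as the loop above does.
    casts = {c: t for c, t in DemoInvoiceRow.DTYPE_MAP.items() if c in df.columns}
    return df.astype(casts)


typed = rows_to_frame([DemoInvoiceRow(3, 19.5), DemoInvoiceRow(1, 5.0)])
empty = rows_to_frame([])
```

The explicit empty branch matters because `pd.DataFrame([])` would yield a frame with no columns at all, so downstream consumers would see a different schema whenever a table happened to have zero rows.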