From b77d8650a46eebd46740fcbbd94bc01efdeddbbb Mon Sep 17 00:00:00 2001
From: Shay Palachy
Date: Sun, 14 Jun 2026 11:30:32 +0300
Subject: [PATCH 1/2] feat(lifecycle): build_world + relational tables [LTV-Pn.4a]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First sub-PR of the split LTV-Pn.4. Implements the lifecycle scheme's
build_world half (the write half is Pn.4b-c).

build_world (layer: api):
- Deterministically samples a retention motif family from config.seed (a
  named RNG substream), honouring the "world structure varies via named
  motif families" hard constraint: different seeds yield structurally
  different worlds (verified: all 5 families appear across seeds).
- Builds the customer population, runs the weekly simulation, and wraps the
  result in a new LifecycleArtifacts container on the WorldBundle (the
  scheme-owned payload behind the opaque WorldBundle.artifacts from Pn.2).
- Consumes the Pn.3 lifecycle config fields: n_customers, observation_date,
  early_tenure_weeks, and forward_windows_days (the engine simulates through
  max(windows) = 730d so every pLTV target is fully covered). This makes the
  config authoritative, as the Pn.3 doc note promised.
- narrative is accepted for protocol parity but unused; the lifecycle
  population builder generates its own firmographics.

Lifecycle relational export (layer: render):
- New schemes/lifecycle/render/relational.py to_dataframes: six tables
  (accounts + customers + subscriptions + subscription_events +
  health_signals + invoices), mirroring the lead-scoring table-source
  registry pattern with dtypes from each entity's DTYPE_MAP. write_bundle /
  write_metadata remain stubbed (Pn.4b-c).

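For readers unfamiliar with the pattern, the seed-to-motif step described
above can be sketched roughly as follows. This is a minimal illustration
under stated assumptions, not the leadforge implementation: `child_rng` is a
hypothetical stand-in for the named-substream mechanism (the real
`RNGRoot.child` may derive its streams differently), and the family names are
placeholders.

```python
import hashlib
import random

# Placeholder names (assumption); the real families live in
# leadforge.schemes.lifecycle.population.LIFECYCLE_MOTIF_FAMILIES.
MOTIF_FAMILIES = frozenset({"motif_a", "motif_b", "motif_c", "motif_d", "motif_e"})


def child_rng(seed: int, name: str) -> random.Random:
    """Derive a reproducible RNG for a named substream (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_motif_family(seed: int) -> str:
    """Pick one motif family; the result depends only on the seed."""
    rng = child_rng(seed, "lifecycle_motif")
    # Sort first so the candidate order does not depend on set iteration order.
    return rng.choice(sorted(MOTIF_FAMILIES))
```

Giving the motif its own named substream means that adding or reordering
other random draws (population, simulation) should not change which family a
given seed selects.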
Tests (10 new): build_world returns a LifecycleArtifacts bundle; consumes config fields (n_customers, early_tenure, full-window coverage); determinism; cross-seed motif variability; relational table set/shape/typing; FK integrity across LIFECYCLE_CONSTRAINTS; relational determinism; tiny-world empties. Registry stub test updated (build_world live; write path still raises). Full suite 1848 passed / 51 skipped; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 --- .agent-plan.md | 10 +- docs/ltv/roadmap.md | 47 ++++-- leadforge/schemes/lifecycle/__init__.py | 78 ++++++++-- leadforge/schemes/lifecycle/artifacts.py | 38 +++++ .../schemes/lifecycle/render/__init__.py | 2 + .../schemes/lifecycle/render/relational.py | 80 +++++++++++ tests/schemes/lifecycle/test_build_world.py | 134 ++++++++++++++++++ tests/schemes/test_registry.py | 10 +- 8 files changed, 366 insertions(+), 33 deletions(-) create mode 100644 leadforge/schemes/lifecycle/artifacts.py create mode 100644 leadforge/schemes/lifecycle/render/__init__.py create mode 100644 leadforge/schemes/lifecycle/render/relational.py create mode 100644 tests/schemes/lifecycle/test_build_world.py diff --git a/.agent-plan.md b/.agent-plan.md index 3fc7c97..cc610a6 100644 --- a/.agent-plan.md +++ b/.agent-plan.md @@ -79,9 +79,13 @@ Any`; `apply_exposure` dispatches hidden truth to a lead-scoring byte-identical both modes) opened as **#122** (merged). `LTV-Pn.3` (lifecycle `GenerationConfig` fields + validated regression `TaskManifest` `task_type` + shared `render/tasks.py` split writer + `schemes/lifecycle/tasks.py` task families; discharges the -`LTV-Pc` regression-task-spec leftover) opened as **#124**. Next: `Pn.4` -(complete `LifecycleScheme.build_world`/`write_bundle` + shared bundle -orchestrator + e2e bundle), `LTV-Po` (recipe). +`LTV-Pc` regression-task-spec leftover) opened as **#124** (merged). 
`LTV-Pn.4` split into four (build → write → public-safety +→ orchestrator): `LTV-Pn.4a` (`LifecycleScheme.build_world` — deterministic +motif sampling + population + sim + `LifecycleArtifacts`; lifecycle relational +`to_dataframes`; consumes the Pn.3 config fields) opened as **#125**. Next: +`Pn.4b` (instructor `write_bundle` + tasks), `Pn.4c` (student_public +snapshot-safety + CLAUDE.md), `Pn.4d` (shared bundle orchestrator), `LTV-Po` +(recipe). --- diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md index 56356ed..acb2d87 100644 --- a/docs/ltv/roadmap.md +++ b/docs/ltv/roadmap.md @@ -46,7 +46,7 @@ protocol + registry, with the package physically reorganized into | `LTV-M3` | Customer population + lifecycle world | `LTV-Ph`, `LTV-Pi` | #113 (Ph) | | `LTV-M4` | Lifecycle simulation engine | `LTV-Pj`, `LTV-Pk` | #117 (Pj), #118 (Pk) | | `LTV-M5` | Customer snapshots + pLTV targets (both regimes) | `LTV-Pl`, `LTV-Pm` | #119 (Pl), #120 (Pm) | -| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3) | +| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a) | | `LTV-M7` | Validation + regression-metric calibration | `LTV-Pp` | | | `LTV-M8` | CLI, notebooks, publish | `LTV-Pq`, `LTV-Pr`, `LTV-Ps` | | @@ -312,19 +312,40 @@ pipeline + schema bump). Split into four sub-PRs in dependency order: Pn.4. Lead-scoring data byte-identical (only `world_spec.json` gains the new config fields, by design). - Labels: `type: feature`, `layer: api`, `layer: schema`, `layer: render` -- [ ] **`LTV-Pn.4`** — `feat(lifecycle): complete LifecycleScheme + e2e bundle`. 
- Implement `LifecycleScheme.build_world` (population → sim) and `write_bundle` - (lifecycle relational tables; both regime snapshots → two task families × - 3 windows + secondary churn; dataset card; manifest `observation_date` + - windows via `extra_fields`; lifecycle `write_metadata` hidden-truth hook). - With both schemes' `write_bundle` in hand, **lift the shared bundle - orchestrator with scheme render hooks** out of the two implementations - (carried cleanup #1). First end-to-end lifecycle bundle (programmatic; - recipe wiring is `LTV-Po`). Extend `CLAUDE.md` hard constraints with the - lifecycle snapshot-safety clause + the `schemes/` layout. Carries the - LTV-Pp validation flags: early-regime degenerate-column exemptions; the - dtype-preserving missingness opt-in. +`LTV-Pn.4` is the largest sub-PR — split along the two `GenerationScheme` +methods, then public-safety, then the carried orchestrator cleanup: + +- [ ] **`LTV-Pn.4a`** — `feat(lifecycle): build_world + relational tables`. + `LifecycleScheme.build_world`: deterministically sample a motif family from + the seed (honouring the "vary by motif family" invariant), build the customer + population, simulate, and wrap a new `LifecycleArtifacts` on the bundle. New + `schemes/lifecycle/render/relational.py` `to_dataframes` (accounts + the 5 + lifecycle tables). Consumes the Pn.3 config fields (`n_customers`, + `forward_windows_days`, `early_tenure_weeks`, `observation_date`). + `write_bundle` still stubbed. + - Tests: determinism, cross-seed motif variability, FK integrity, table shapes. - Labels: `type: feature`, `layer: api`, `layer: render` +- [ ] **`LTV-Pn.4b`** — `feat(lifecycle): write_bundle (instructor) + tasks`. 
+ Instructor-mode `write_bundle`: relational tables; both regime snapshots → + 8 task dirs (3 pLTV regression + churn, × 2 regimes) via the shared writer; + dataset card; feature dictionary; manifest with `generation_scheme` + + `observation_date` + windows (`extra_fields`); lifecycle `write_metadata` + hidden-truth hook (latent registry + mechanism summary). First on-disk + lifecycle bundle. + - Labels: `type: feature`, `layer: api`, `layer: render` +- [ ] **`LTV-Pn.4c`** — `feat(lifecycle): student_public snapshot-safety`. + Public relational filtering (event tables ≤ cutoff; drop terminal + `churn_at`/`churn_reason`/`subscription_end_at`; no target columns); the + early-regime degenerate-column + dtype-preserving-missingness flags from + LTV-Pm. Extend `CLAUDE.md` hard constraints with the lifecycle + snapshot-safety clause + the `schemes/` layout. + - Labels: `type: feature`, `layer: exposure`, `layer: render`, `layer: docs` +- [ ] **`LTV-Pn.4d`** — `refactor: shared bundle orchestrator`. With both + schemes' `write_bundle` in hand, lift the shared orchestrator (mkdir → + relational → tasks → card → dict → exposure → manifest) with scheme render + hooks out of the two implementations (carried cleanup #1). Both bundles + byte-identical. + - Labels: `type: refactor`, `layer: render`, `layer: api` - [ ] **`LTV-Po`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three recipe YAMLs (`scheme: lifecycle`); register in the recipe registry; end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()` smoke test. diff --git a/leadforge/schemes/lifecycle/__init__.py b/leadforge/schemes/lifecycle/__init__.py index 5caea51..b914f88 100644 --- a/leadforge/schemes/lifecycle/__init__.py +++ b/leadforge/schemes/lifecycle/__init__.py @@ -1,15 +1,15 @@ -"""The ``lifecycle`` generation scheme (``b2b_saas_ltv_v1``) — scaffold. - -Registers the second peer scheme alongside ``lead_scoring``. 
Its entity rows -and FK constraints live here (``entities`` / ``relationships``); the pipeline -itself (``build_world`` / ``write_bundle``) is built out across LTV-M3…M6 and -currently raises :class:`NotImplementedError`. Registering the stub now lets -the registry, recipe ``scheme:`` resolution, and tests treat lifecycle as a -first-class peer before its internals exist. +"""The ``lifecycle`` generation scheme (``b2b_saas_ltv_v1``). + +The second peer scheme alongside ``lead_scoring``. Its entity rows and FK +constraints live here (``entities`` / ``relationships``); the snapshot, feature, +and task definitions live in sibling modules. :meth:`LifecycleScheme.build_world` +is implemented (LTV-Pn.4a); :meth:`write_bundle` / :meth:`write_metadata` are +built out in LTV-Pn.4b–c and currently raise :class:`NotImplementedError`. """ from __future__ import annotations +import random from typing import TYPE_CHECKING, Any from leadforge.schemes.base import register_scheme @@ -21,13 +21,26 @@ from leadforge.narrative.spec import NarrativeSpec _NOT_IMPLEMENTED = ( - "the lifecycle (b2b_saas_ltv_v1) scheme is not implemented yet; " - "its pipeline is built across LTV-M3…M6" + "the lifecycle (b2b_saas_ltv_v1) write path is not implemented yet; " + "it is built across LTV-Pn.4b–c" ) +def _sample_motif_family(rng: random.Random) -> str: + """Deterministically pick a retention motif family for this world. + + Sampling per-seed (rather than hard-coding one family) honours the + "world structure varies via named motif families" invariant — different + seeds yield structurally different worlds. + """ + from leadforge.schemes.lifecycle.population import LIFECYCLE_MOTIF_FAMILIES + + # Sort for a stable, order-independent candidate list before sampling. 
+ return rng.choice(sorted(LIFECYCLE_MOTIF_FAMILIES)) + + class LifecycleScheme: - """Stub for the customer-lifetime-value (pLTV) generation pipeline.""" + """The customer-lifetime-value (pLTV) generation pipeline.""" name = "lifecycle" @@ -37,7 +50,48 @@ def build_world( narrative: NarrativeSpec, **options: Any, ) -> WorldBundle: - raise NotImplementedError(_NOT_IMPLEMENTED) + """Sample a motif family, build the customer population, and simulate. + + Deterministic given ``config`` (the population and the per-customer + weekly simulation derive from ``config.seed`` via distinct RNG + substreams). Consumes the lifecycle config fields: ``n_customers``, + ``observation_date``, ``early_tenure_weeks``, and + ``forward_windows_days`` (the engine simulates through the longest + window so every pLTV target is fully covered). ``narrative`` is + accepted for protocol parity but unused — the lifecycle population + builder generates its own firmographics. + """ + from leadforge.core.models import WorldBundle, WorldSpec + from leadforge.core.rng import RNGRoot + from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts + from leadforge.schemes.lifecycle.engine import simulate_lifecycle + from leadforge.schemes.lifecycle.population import build_customer_population + + motif_rng = RNGRoot(config.seed).child("lifecycle_motif") + motif_family = _sample_motif_family(motif_rng) + + population = build_customer_population( + config.n_customers, + config.seed, + motif_family=motif_family, + observation_date=config.observation_date, + ) + simulation_result = simulate_lifecycle( + population, + config.seed, + forward_window_days=max(config.forward_windows_days), + early_tenure_weeks=config.early_tenure_weeks, + ) + + spec = WorldSpec(config=config, narrative=narrative, scheme=self.name) + return WorldBundle( + spec=spec, + artifacts=LifecycleArtifacts( + population=population, + simulation_result=simulation_result, + motif_family=motif_family, + ), + ) def write_bundle( self, 
diff --git a/leadforge/schemes/lifecycle/artifacts.py b/leadforge/schemes/lifecycle/artifacts.py new file mode 100644 index 0000000..036425b --- /dev/null +++ b/leadforge/schemes/lifecycle/artifacts.py @@ -0,0 +1,38 @@ +"""In-memory artifacts produced by the lifecycle (pLTV) pipeline. + +:class:`LifecycleArtifacts` is the scheme-owned payload carried by a +:class:`~leadforge.core.models.WorldBundle` for the lifecycle scheme — the +lifecycle analogue of +:class:`~leadforge.schemes.lead_scoring.artifacts.LeadScoringArtifacts`. The +bundle's ``artifacts`` field is typed ``Any`` in the shared core layer (it must +not reference a scheme); this scheme defines and unwraps its own container here. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from leadforge.schemes.lifecycle.engine import LifecycleSimulationResult + from leadforge.schemes.lifecycle.population import CustomerPopulationResult + +__all__ = ["LifecycleArtifacts"] + + +@dataclass +class LifecycleArtifacts: + """The in-memory result of one lifecycle generation run. + + Attributes: + population: Accounts, customers, and latent state from + :func:`~leadforge.schemes.lifecycle.population.build_customer_population`. + simulation_result: Subscriptions and the three event tables from + :func:`~leadforge.schemes.lifecycle.engine.simulate_lifecycle`. + motif_family: The retention motif family sampled for this world (also + recorded on the population; carried here for convenience). 
+ """ + + population: CustomerPopulationResult + simulation_result: LifecycleSimulationResult + motif_family: str diff --git a/leadforge/schemes/lifecycle/render/__init__.py b/leadforge/schemes/lifecycle/render/__init__.py new file mode 100644 index 0000000..b6d056e --- /dev/null +++ b/leadforge/schemes/lifecycle/render/__init__.py @@ -0,0 +1,2 @@ +"""Lifecycle-scheme render modules (relational tables; snapshots live one level +up in ``schemes.lifecycle.snapshots``).""" diff --git a/leadforge/schemes/lifecycle/render/relational.py b/leadforge/schemes/lifecycle/render/relational.py new file mode 100644 index 0000000..95a5f76 --- /dev/null +++ b/leadforge/schemes/lifecycle/render/relational.py @@ -0,0 +1,80 @@ +"""Lifecycle relational export — one typed DataFrame per relational table. + +:func:`to_dataframes` flattens a lifecycle population + simulation result into +the six relational tables: ``accounts`` (shared firmographics) plus the five +lifecycle entity tables (``customers``, ``subscriptions``, +``subscription_events``, ``health_signals``, ``invoices``). Mirrors the +lead-scoring :func:`~leadforge.schemes.lead_scoring.render.relational.to_dataframes` +pattern: a table-source registry drives a uniform row → DataFrame conversion +with dtypes from each entity's ``DTYPE_MAP``. 
+""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING + +import pandas as pd + +from leadforge.schema.entities import AccountRow, EntityRowProtocol +from leadforge.schemes.lifecycle.entities import ( + CustomerLifecycleRow, + HealthSignalRow, + InvoiceRow, + SubscriptionEventRow, + SubscriptionLifecycleRow, +) + +if TYPE_CHECKING: + from leadforge.schemes.lifecycle.engine import LifecycleSimulationResult + from leadforge.schemes.lifecycle.population import CustomerPopulationResult + +__all__ = ["to_dataframes"] + + +@dataclass(frozen=True) +class _TableSource: + cls: type[EntityRowProtocol] + origin: str # "population" or "simulation" + attr: str + + +# Table name → where its rows come from. Insertion order is the table order. +_TABLE_SOURCES: dict[str, _TableSource] = { + AccountRow.TABLE_NAME: _TableSource(AccountRow, "population", "accounts"), + CustomerLifecycleRow.TABLE_NAME: _TableSource(CustomerLifecycleRow, "population", "customers"), + SubscriptionLifecycleRow.TABLE_NAME: _TableSource( + SubscriptionLifecycleRow, "simulation", "subscriptions" + ), + SubscriptionEventRow.TABLE_NAME: _TableSource( + SubscriptionEventRow, "simulation", "subscription_events" + ), + HealthSignalRow.TABLE_NAME: _TableSource(HealthSignalRow, "simulation", "health_signals"), + InvoiceRow.TABLE_NAME: _TableSource(InvoiceRow, "simulation", "invoices"), +} + + +def to_dataframes( + result: LifecycleSimulationResult, + population: CustomerPopulationResult, +) -> dict[str, pd.DataFrame]: + """Convert lifecycle output to one typed DataFrame per relational table. + + Returns: + Dict mapping table name → ``pd.DataFrame`` with dtypes from each entity + class's ``DTYPE_MAP``. Empty tables are zero-row DataFrames with the + correct schema. 
+ """ + dfs: dict[str, pd.DataFrame] = {} + for table_name, src in _TABLE_SOURCES.items(): + obj = population if src.origin == "population" else result + rows = getattr(obj, src.attr) + if rows: + df = pd.DataFrame([row.to_dict() for row in rows]) + for col, dtype in src.cls.DTYPE_MAP.items(): + if col in df.columns: + df[col] = df[col].astype(dtype) + else: + df = src.cls.empty_dataframe() + dfs[table_name] = df + return dfs diff --git a/tests/schemes/lifecycle/test_build_world.py b/tests/schemes/lifecycle/test_build_world.py new file mode 100644 index 0000000..f74db5f --- /dev/null +++ b/tests/schemes/lifecycle/test_build_world.py @@ -0,0 +1,134 @@ +"""Tests for LifecycleScheme.build_world + relational export (LTV-Pn.4a).""" + +from __future__ import annotations + +import pytest + +from leadforge.core.models import GenerationConfig, WorldBundle +from leadforge.schemes import get_scheme +from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts +from leadforge.schemes.lifecycle.population import LIFECYCLE_MOTIF_FAMILIES +from leadforge.schemes.lifecycle.relationships import LIFECYCLE_CONSTRAINTS +from leadforge.schemes.lifecycle.render.relational import to_dataframes + +_N = 120 + + +def _build(seed: int = 42, n_customers: int = _N) -> WorldBundle: + return get_scheme("lifecycle").build_world( + GenerationConfig(seed=seed, n_customers=n_customers), narrative=None + ) + + +# --------------------------------------------------------------------------- +# build_world +# --------------------------------------------------------------------------- + + +def test_returns_bundle_with_lifecycle_artifacts() -> None: + bundle = _build() + assert isinstance(bundle, WorldBundle) + assert bundle.spec.scheme == "lifecycle" + assert isinstance(bundle.artifacts, LifecycleArtifacts) + arts = bundle.artifacts + assert len(arts.population.customers) == _N + assert len(arts.simulation_result.subscriptions) == _N + assert arts.motif_family in LIFECYCLE_MOTIF_FAMILIES + # The 
sampled motif is recorded on the population too (engine reads it there). + assert arts.population.motif_family == arts.motif_family + + +def test_consumes_config_lifecycle_fields() -> None: + bundle = get_scheme("lifecycle").build_world( + GenerationConfig(seed=1, n_customers=50, early_tenure_weeks=6), narrative=None + ) + arts = bundle.artifacts + assert len(arts.population.customers) == 50 + # forward_window_days = max(forward_windows_days) = 730 → full coverage. + assert arts.simulation_result.forward_window_days == 730 + assert arts.simulation_result.early_tenure_weeks == 6 + + +def test_deterministic_given_seed() -> None: + a = _build(seed=7) + b = _build(seed=7) + assert a.artifacts.motif_family == b.artifacts.motif_family + assert [s.to_dict() for s in a.artifacts.simulation_result.subscriptions] == [ + s.to_dict() for s in b.artifacts.simulation_result.subscriptions + ] + + +def test_motif_varies_across_seeds() -> None: + motifs = {_build(seed=s, n_customers=60).artifacts.motif_family for s in range(15)} + # The invariant we care about: not a single fixed motif across seeds. + assert len(motifs) >= 3 + assert motifs <= set(LIFECYCLE_MOTIF_FAMILIES) + + +def test_narrative_is_optional() -> None: + # The lifecycle population builder generates its own firmographics; build_world + # accepts narrative for protocol parity but must not require it. 
+ bundle = get_scheme("lifecycle").build_world(GenerationConfig(seed=3), narrative=None) + assert isinstance(bundle.artifacts, LifecycleArtifacts) + + +# --------------------------------------------------------------------------- +# relational export +# --------------------------------------------------------------------------- + + +def test_relational_tables_present_and_typed() -> None: + arts = _build().artifacts + dfs = to_dataframes(arts.simulation_result, arts.population) + assert set(dfs) == { + "accounts", + "customers", + "subscriptions", + "subscription_events", + "health_signals", + "invoices", + } + assert len(dfs["customers"]) == _N + assert len(dfs["subscriptions"]) == _N + assert len(dfs["accounts"]) == len(arts.population.accounts) + # Non-empty event tables for a 120-customer world. + assert len(dfs["health_signals"]) > 0 + assert len(dfs["invoices"]) > 0 + + +def test_relational_fk_integrity() -> None: + arts = _build().artifacts + dfs = to_dataframes(arts.simulation_result, arts.population) + for fk in LIFECYCLE_CONSTRAINTS: + child = dfs[fk.child_table] + parent = dfs[fk.parent_table] + if child.empty: + continue + parent_keys = set(parent[fk.parent_column]) + orphans = set(child[fk.child_column]) - parent_keys + assert not orphans, f"{fk.child_table}.{fk.child_column} has orphans: {list(orphans)[:3]}" + + +def test_relational_deterministic() -> None: + a_arts = _build(seed=9).artifacts + a = to_dataframes(a_arts.simulation_result, a_arts.population) + b_arts = _build(seed=9).artifacts + b = to_dataframes(b_arts.simulation_result, b_arts.population) + for name in a: + assert a[name].equals(b[name]), name + + +def test_empty_population_yields_typed_empty_tables() -> None: + # A degenerate-but-valid tiny world still produces correctly-typed tables. 
+    arts = _build(seed=2, n_customers=1).artifacts
+    dfs = to_dataframes(arts.simulation_result, arts.population)
+    assert len(dfs["customers"]) == 1
+    for name, df in dfs.items():
+        assert list(df.columns), f"{name} has no columns"
+
+
+def test_write_bundle_still_stubbed(tmp_path) -> None:
+    # build_world works; the on-disk write path lands in Pn.4b.
+    bundle = _build()
+    with pytest.raises(NotImplementedError):
+        get_scheme("lifecycle").write_bundle(bundle, str(tmp_path))
diff --git a/tests/schemes/test_registry.py b/tests/schemes/test_registry.py
index 5664fae..eb2d869 100644
--- a/tests/schemes/test_registry.py
+++ b/tests/schemes/test_registry.py
@@ -35,14 +35,14 @@ def test_lifecycle_scheme_registered() -> None:
     assert LIFECYCLE_SCHEME.name == "lifecycle"
 
 
-def test_lifecycle_scheme_is_a_stub() -> None:
-    # Pipeline not implemented yet (built across LTV-M3…M6); calling it must
-    # fail loudly rather than silently doing nothing.
+def test_lifecycle_write_path_is_stubbed() -> None:
+    # build_world is implemented (LTV-Pn.4a); the on-disk write path lands in
+    # Pn.4b–c and must fail loudly until then rather than silently no-op.
     sch = get_scheme("lifecycle")
-    with pytest.raises(NotImplementedError):
-        sch.build_world(None, None)  # type: ignore[arg-type]
     with pytest.raises(NotImplementedError):
         sch.write_bundle(None, "out")  # type: ignore[arg-type]
+    with pytest.raises(NotImplementedError):
+        sch.write_metadata(None, None)  # type: ignore[arg-type]
 
 
 def test_lead_scoring_scheme_name() -> None:
From 56d302b2f964b212ef92e3a39896f596ee1f0dc9 Mon Sep 17 00:00:00 2001
From: Shay Palachy
Date: Sun, 14 Jun 2026 12:13:16 +0300
Subject: [PATCH 2/2] docs(lifecycle): disclose build_world's deferred inputs [LTV-Pn.4a]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Self-review of build_world: it silently dropped several config inputs.
The deferrals are legitimate, but silence is the footgun: surface them at the
call site and pin them to concrete PRs.

1. Difficulty (the real one). build_world does not consume
   config.difficulty / difficulty_params, so every difficulty tier currently
   yields the SAME lifecycle world, and when Pn.4b builds snapshots it would
   pass difficulty_params=None (no distortions ever fire). Documented in the
   build_world docstring; roadmap Pn.4b now explicitly owns resolving
   difficulty_params and threading it into build_customer_snapshot;
   simulation-level difficulty scaling is recorded as a flagged deferral near
   Po/Pp. New test_difficulty_not_yet_differentiating pins the gap as a
   tested fact (intro == advanced today); it must be flipped when Pn.4b
   closes it.

2. Narrative ignored. The lifecycle population hardcodes firmographics and
   build_world ignores `narrative`, so the recipe's narrative.yaml will not
   drive them. Documented; roadmap Po now carries the
   decide-narrative-consumption task.

3. to_dataframes near-duplicates the lead-scoring conversion loop. Noted the
   deliberate parallel + earmarked unification behind a shared render helper
   for Pn.4d (the orchestrator PR, where the byte-identity surface is
   tested).

Docs/tests only; build_world behavior unchanged.

Full suite 1849 passed / 51 skipped; ruff + mypy clean.

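The "pin the gap as a tested fact" idea from item 1 can be illustrated with a
toy example. This is only a sketch: `Config` and this `build_world` are
simplified, hypothetical stand-ins (not leadforge's `GenerationConfig` or
`LifecycleScheme.build_world`), chosen so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:  # hypothetical, simplified stand-in for GenerationConfig
    seed: int
    difficulty: str = "intro"


def build_world(config: Config) -> dict:
    # Toy builder that, like the Pn.4a build_world, ignores config.difficulty.
    return {"seed": config.seed, "rows": [config.seed * i for i in range(3)]}


def test_difficulty_not_yet_differentiating() -> None:
    intro = build_world(Config(seed=5, difficulty="intro"))
    advanced = build_world(Config(seed=5, difficulty="advanced"))
    # Pins today's behavior. Once difficulty is wired in, this assertion
    # starts failing and has to be flipped to `!=`, which is the reminder.
    assert intro == advanced
```

The design choice is that the gap is encoded as a passing assertion, so
closing it cannot happen without someone touching this test.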
Co-Authored-By: Claude Opus 4.8 --- docs/ltv/roadmap.md | 12 +++++++++++- leadforge/schemes/lifecycle/__init__.py | 19 ++++++++++++++++--- .../schemes/lifecycle/render/relational.py | 5 +++++ tests/schemes/lifecycle/test_build_world.py | 18 ++++++++++++++++++ 4 files changed, 50 insertions(+), 4 deletions(-) diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md index acb2d87..085b28c 100644 --- a/docs/ltv/roadmap.md +++ b/docs/ltv/roadmap.md @@ -331,7 +331,9 @@ methods, then public-safety, then the carried orchestrator cleanup: dataset card; feature dictionary; manifest with `generation_scheme` + `observation_date` + windows (`extra_fields`); lifecycle `write_metadata` hidden-truth hook (latent registry + mechanism summary). First on-disk - lifecycle bundle. + lifecycle bundle. **Must resolve `difficulty_params` from the active profile + and thread it into `build_customer_snapshot` (Pn.4a's `build_world` does not — + without this the snapshot distortions never fire and every tier is identical).** - Labels: `type: feature`, `layer: api`, `layer: render` - [ ] **`LTV-Pn.4c`** — `feat(lifecycle): student_public snapshot-safety`. Public relational filtering (event tables ≤ cutoff; drop terminal @@ -349,9 +351,17 @@ methods, then public-safety, then the carried orchestrator cleanup: - [ ] **`LTV-Po`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets`. The three recipe YAMLs (`scheme: lifecycle`); register in the recipe registry; end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()` smoke test. + **Decide narrative consumption:** the lifecycle population hardcodes its + firmographics and `build_world` ignores `narrative` (Pn.4a) — either wire the + recipe's `narrative.yaml` into the population builder or document the + firmographics as scheme-internal. - Tests: recipe loads, full round-trip, determinism, all task splits (3 windows × 2 regimes + secondary churn), public/instructor split. 
- Labels: `type: feature`, `layer: recipes` +- **Deferred (flagged in Pn.4a):** simulation-level difficulty scaling for the + lifecycle engine — making `advanced` a genuinely harder world (not just + noisier snapshots). Currently the motif-calibrated rates are difficulty- + agnostic; revisit alongside `LTV-Pp` difficulty-band validation. --- diff --git a/leadforge/schemes/lifecycle/__init__.py b/leadforge/schemes/lifecycle/__init__.py index b914f88..ebbbaf8 100644 --- a/leadforge/schemes/lifecycle/__init__.py +++ b/leadforge/schemes/lifecycle/__init__.py @@ -57,9 +57,22 @@ def build_world( substreams). Consumes the lifecycle config fields: ``n_customers``, ``observation_date``, ``early_tenure_weeks``, and ``forward_windows_days`` (the engine simulates through the longest - window so every pLTV target is fully covered). ``narrative`` is - accepted for protocol parity but unused — the lifecycle population - builder generates its own firmographics. + window so every pLTV target is fully covered). + + Not yet applied (tracked, not silent): + + - **Difficulty.** ``config.difficulty`` / ``difficulty_params`` are + NOT consumed here, so every difficulty tier currently yields the same + world. Two distinct pieces remain: resolving ``difficulty_params`` + from the active profile and threading it into the snapshot + distortions (``LTV-Pn.4b``, where snapshots are built), and + simulation-level difficulty scaling that actually makes harder tiers + harder worlds (deferred — see ``mechanisms.py`` and the roadmap). + - **Narrative.** ``narrative`` is accepted for protocol parity but + unused: the lifecycle population builder generates its own + firmographics from internal distributions, so the recipe's + ``narrative.yaml`` will not drive them until ``LTV-Po`` decides + whether the lifecycle scheme should consume the narrative spec. 
""" from leadforge.core.models import WorldBundle, WorldSpec from leadforge.core.rng import RNGRoot diff --git a/leadforge/schemes/lifecycle/render/relational.py b/leadforge/schemes/lifecycle/render/relational.py index 95a5f76..3ce2d69 100644 --- a/leadforge/schemes/lifecycle/render/relational.py +++ b/leadforge/schemes/lifecycle/render/relational.py @@ -7,6 +7,11 @@ lead-scoring :func:`~leadforge.schemes.lead_scoring.render.relational.to_dataframes` pattern: a table-source registry drives a uniform row → DataFrame conversion with dtypes from each entity's ``DTYPE_MAP``. + +The conversion loop is intentionally parallel to the lead-scoring one (only the +table-source registry differs). Unifying the two behind a shared +``render``-layer helper is a candidate for ``LTV-Pn.4d`` (the shared bundle +orchestrator), where the byte-identity surface is already under test. """ from __future__ import annotations diff --git a/tests/schemes/lifecycle/test_build_world.py b/tests/schemes/lifecycle/test_build_world.py index f74db5f..bd1b754 100644 --- a/tests/schemes/lifecycle/test_build_world.py +++ b/tests/schemes/lifecycle/test_build_world.py @@ -65,6 +65,24 @@ def test_motif_varies_across_seeds() -> None: assert motifs <= set(LIFECYCLE_MOTIF_FAMILIES) +def test_difficulty_not_yet_differentiating() -> None: + """Tracked-gap guard (LTV-Pn.4a): build_world does not yet consume + config.difficulty, so every tier yields the same world. When Pn.4b wires + difficulty in, this test must be updated to assert the tiers DIFFER — + flipping it is the reminder that the gap is closed. 
+    """
+    intro = get_scheme("lifecycle").build_world(
+        GenerationConfig(seed=5, n_customers=60, difficulty="intro"), narrative=None
+    )
+    advanced = get_scheme("lifecycle").build_world(
+        GenerationConfig(seed=5, n_customers=60, difficulty="advanced"), narrative=None
+    )
+    assert intro.artifacts.motif_family == advanced.artifacts.motif_family
+    assert [s.to_dict() for s in intro.artifacts.simulation_result.subscriptions] == [
+        s.to_dict() for s in advanced.artifacts.simulation_result.subscriptions
+    ]
+
+
 def test_narrative_is_optional() -> None:
     # The lifecycle population builder generates its own firmographics; build_world
     # accepts narrative for protocol parity but must not require it.
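
Reviewer note: the FK-integrity test in the first patch boils down to a set
difference between child keys and parent keys. A dependency-free sketch of
that check is below. The `ForeignKey` shape mirrors the four attributes the
test reads (`child_table`, `child_column`, `parent_table`, `parent_column`),
but the class itself, the table contents, and the column names are
illustrative assumptions rather than the actual `LIFECYCLE_CONSTRAINTS`
entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:  # hypothetical shape, modelled on the fields the test reads
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


def find_orphans(tables: dict[str, list[dict]], fk: ForeignKey) -> set:
    """Return child key values that have no matching parent key."""
    parent_keys = {row[fk.parent_column] for row in tables[fk.parent_table]}
    child_keys = {row[fk.child_column] for row in tables[fk.child_table]}
    return child_keys - parent_keys


# Made-up rows: the second invoice points at a customer that does not exist.
tables = {
    "customers": [{"customer_id": "c1"}, {"customer_id": "c2"}],
    "invoices": [
        {"invoice_id": "i1", "customer_id": "c1"},
        {"invoice_id": "i2", "customer_id": "c9"},
    ],
}
fk = ForeignKey("invoices", "customer_id", "customers", "customer_id")
orphans = find_orphans(tables, fk)
```

An empty child table yields an empty orphan set here, which matches the
test's choice to skip empty child tables rather than fail on them.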