Skip to content

feat: lifecycle config + regression task model [LTV-Pn.3]#124

Merged
shaypal5 merged 2 commits into
mainfrom
feat/lifecycle-config-regression-tasks
Jun 14, 2026
Merged

feat: lifecycle config + regression task model [LTV-Pn.3]#124
shaypal5 merged 2 commits into
mainfrom
feat/lifecycle-config-regression-tasks

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

Third sub-PR of the split LTV-Pn (M6): the config + task-model plumbing the lifecycle pipeline needs, with no end-to-end wiring yet (that's Pn.4). Also discharges the long-deferred LTV-Pc regression-task-spec leftover.

GenerationConfig — lifecycle fields (layer: api)

New validated fields, consumed only by the lifecycle scheme (lead-scoring ignores them, just as the lifecycle side will ignore n_leads): n_customers, forward_windows_days (strictly-increasing tuple), early_tenure_weeks, observation_date (ISO string or None). Validation is isolated in _validate_lifecycle_fields. Kept flat on the shared config to match existing precedent (n_leads/snapshot_day); a nested per-scheme config is noted as a possible future refactor.

Lead-scoring data (tables/, tasks/) is byte-identical in both modes; only metadata/world_spec.json changes — by design, since it dumps the full config.

TaskManifest — regression support (layer: schema)

VALID_TASK_TYPES = {binary_classification, regression} with __post_init__ validation; docstrings broadened so label_column / label_window_days read target-agnostically.

Shared split writer (layer: render)

The deterministic shuffle/split/write logic is lifted to scheme-agnostic leadforge/render/tasks.py — target-agnostic (it never inspects the label), so it serves continuous pLTV targets. schemes.lead_scoring.render.tasks is now a thin wrapper defaulting the task — byte-identical tasks/ output. This repopulates the render.tasks path that LTV-Pf.2 vacated, now as shared envelope code (peer to render.manifests / render.relational_io); the module-layout lock test is updated to that intended end state.

Lifecycle task families (schemes/lifecycle/tasks.py)

lifecycle_task_manifests(regime) builds, per regime: three pltv_revenue_{90,365,730}d regression tasks + a churned_within_180d classification task. Calendar regime unprefixed; early-pLTV regime early_-prefixed so both families get distinct task directories. Targets/windows mirror the snapshot catalog so specs and columns can't drift. Completes LTV-Pc.

Tests (53 new; full suite 1835 passed / 51 skipped)

Config validation (bounds, sorted windows, ISO date); TaskManifest regression-type accept + invalid-type reject; shared writer on a continuous target (values preserved, deterministic, manifest emits regression); lifecycle task-family shape / cross-regime id uniqueness / prefixing / target-column match.

Next: Pn.4 — complete LifecycleScheme.build_world/write_bundle, lift the shared bundle orchestrator, first e2e lifecycle bundle.

🤖 Generated with Claude Code

Third sub-PR of the split LTV-Pn. Adds the config + task-model plumbing the
lifecycle pipeline needs, with no end-to-end wiring yet (that is Pn.4). Also
discharges the long-deferred LTV-Pc regression-task-spec leftover.

GenerationConfig (layer: api):
- New validated lifecycle fields, consumed only by the lifecycle scheme (the
  lead-scoring path ignores them, like it ignores n_leads on the lifecycle
  side): n_customers, forward_windows_days (strictly-increasing tuple),
  early_tenure_weeks, observation_date (ISO string or None). Validation lives
  in a focused _validate_lifecycle_fields helper. Kept flat on the shared
  config to match existing precedent (n_leads / snapshot_day); a nested
  per-scheme config is noted as a possible future refactor.
- Lead-scoring DATA (tables/, tasks/) byte-identical both modes; only
  metadata/world_spec.json changes, by design (it dumps the full config).

TaskManifest (layer: schema):
- Added VALID_TASK_TYPES = {binary_classification, regression} with
  __post_init__ validation; broadened docstrings so label_column /
  label_window_days read target-agnostically (continuous targets included).

Shared split writer (layer: render):
- Lifted the deterministic shuffle/split/write logic to scheme-agnostic
  leadforge/render/tasks.py (target-agnostic: never inspects the label, so it
  serves continuous pLTV targets). leadforge.schemes.lead_scoring.render.tasks
  is now a thin wrapper that defaults the task — byte-identical tasks/ output.
- This repopulates the leadforge.render.tasks path that LTV-Pf.2 vacated, now
  as shared envelope code (peer to render.manifests / render.relational_io);
  the module-layout lock test is updated to reflect that intended end state.

Lifecycle task families (schemes/lifecycle/tasks.py):
- lifecycle_task_manifests(regime) builds, per regime, three pltv_revenue_{90,
  365,730}d regression tasks + a churned_within_180d classification task.
  Calendar regime unprefixed; early-pLTV regime early_-prefixed so both
  families occupy distinct task directories. Targets/windows mirror the
  snapshot catalog so specs and columns cannot drift. Completes LTV-Pc.

Tests (53 new): config field validation; TaskManifest regression type +
rejection; shared writer on a continuous target (values preserved,
deterministic); lifecycle task-family shape/uniqueness/prefix/targets. Full
suite 1835 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 13, 2026 07:17
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 13, 2026
@shaypal5 shaypal5 added type: feature New capability layer: schema schema/ entity/event contracts layer: render render/ bundle and artifact output layer: api api/ public Python surface dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream labels Jun 13, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds lifecycle-scheme configuration fields and regression-capable task manifests, plus a shared deterministic task-split writer that both lead-scoring and lifecycle can use (with lifecycle wiring deferred to the next sub-PR).

Changes:

  • Extend GenerationConfig with validated lifecycle fields (n_customers, forward_windows_days, early_tenure_weeks, observation_date).
  • Add task_type validation to TaskManifest and introduce VALID_TASK_TYPES including regression.
  • Lift the deterministic shuffle/split/write logic into leadforge/render/tasks.py and make lead-scoring delegate; add lifecycle task-family definitions and targeted tests/docs updates.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/schemes/test_module_layout.py Updates module-layout lock tests to reflect shared leadforge.render.tasks repopulation.
tests/schemes/lifecycle/test_tasks.py Adds tests for regression task types and lifecycle task-family invariants.
tests/render/test_shared_task_writer.py Adds tests for scheme-agnostic split writer, including continuous target preservation and determinism.
tests/core/test_config_lifecycle_fields.py Adds coverage for new lifecycle config defaults and validation.
leadforge/schemes/lifecycle/tasks.py Defines per-regime lifecycle task manifests (pLTV regression + churn classification).
leadforge/schemes/lead_scoring/render/tasks.py Replaces lead-scoring split writer with a thin wrapper over the shared writer.
leadforge/schema/tasks.py Introduces VALID_TASK_TYPES and validates TaskManifest.task_type; updates docs to be target-agnostic.
leadforge/render/tasks.py New shared deterministic shuffle/split/write implementation for task exports.
leadforge/core/models.py Adds lifecycle config fields and isolated validation in GenerationConfig.
docs/ltv/roadmap.md Marks roadmap items as completed and documents the new lifecycle/task plumbing.
.agent-plan.md Updates the running plan/status text for the LTV workstream.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/ltv/roadmap.md
@@ -81,7 +81,8 @@ Total: ~19 PRs across 9 milestones.
scope (folds into `LTV-Pn`):** regression task specs + a `task_type`
(`regression` | `classification`) on the task model — they belong with the
…Pn.3]

Self-review finding: the new lifecycle GenerationConfig fields
(forward_windows_days, n_customers, early_tenure_weeks, observation_date) are
not consumed by any pipeline yet — wiring is Pn.4 — and the window/tenure
defaults *duplicate* the scheme's canonical constants
(schemes.lifecycle.snapshots.FORWARD_WINDOWS_DAYS / DEFAULT_EARLY_TENURE_WEEKS)
plus the engine's early_tenure default.

The duplication is forced: core.models must not import a scheme (the Pn.2
layering cleanup), so the config copy can't reference the scheme constant. The
risk is silent drift — a default-config bundle would carry windows/tenure that
disagree with the columns the snapshot builder actually produces.

Fix (no behavior change):
- tests/schemes/lifecycle/test_config_consistency.py pins the three copies
  equal (config forward windows == snapshot constant; config early tenure ==
  snapshot constant; engine early-tenure default == snapshot constant). This
  test layer may import both core and the scheme, so it can enforce what the
  layering forbids core from importing.
- Documented on GenerationConfig: the fields are not yet consumed, Pn.4 makes
  the config authoritative, and the duplication is guarded by the test above.

Full suite 1838 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes an unresolved review comment on PR #124 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: docs/ltv/roadmap.md:82
URL: https://github.com/leadforge-dev/leadforge/pull/124#discussion_r3407595317
Root author: copilot-pull-request-reviewer

Comment:
    The roadmap still documents the `task_type` enum as `regression | classification`, but the implemented/validated values are `binary_classification` and `regression` (see `VALID_TASK_TYPES`). Using `classification` here is misleading for contributors and readers.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27463528933 attempt 1
Comment timestamp: 2026-06-13T09:58:01.938335+00:00
PR head commit: 628ced51c20e9f204a7227d9c0f4b063c184c3e1

@shaypal5 shaypal5 merged commit 8296f4e into main Jun 14, 2026
10 checks passed
@shaypal5 shaypal5 deleted the feat/lifecycle-config-regression-tasks branch June 14, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream layer: api api/ public Python surface layer: render render/ bundle and artifact output layer: schema schema/ entity/event contracts type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants