feat: lifecycle config + regression task model [LTV-Pn.3]#124
Merged
Conversation
Third sub-PR of the split LTV-Pn. Adds the config + task-model plumbing the
lifecycle pipeline needs, with no end-to-end wiring yet (that is Pn.4). Also
discharges the long-deferred LTV-Pc regression-task-spec leftover.
GenerationConfig (layer: api):
- New validated lifecycle fields, consumed only by the lifecycle scheme (the
lead-scoring path ignores them, like it ignores n_leads on the lifecycle
side): n_customers, forward_windows_days (strictly-increasing tuple),
early_tenure_weeks, observation_date (ISO string or None). Validation lives
in a focused _validate_lifecycle_fields helper. Kept flat on the shared
config to match existing precedent (n_leads / snapshot_day); a nested
per-scheme config is noted as a possible future refactor.
- Lead-scoring DATA (tables/, tasks/) byte-identical both modes; only
metadata/world_spec.json changes, by design (it dumps the full config).
TaskManifest (layer: schema):
- Added VALID_TASK_TYPES = {binary_classification, regression} with
__post_init__ validation; broadened docstrings so label_column /
label_window_days read target-agnostically (continuous targets included).
Shared split writer (layer: render):
- Lifted the deterministic shuffle/split/write logic to scheme-agnostic
leadforge/render/tasks.py (target-agnostic: never inspects the label, so it
serves continuous pLTV targets). leadforge.schemes.lead_scoring.render.tasks
is now a thin wrapper that defaults the task — byte-identical tasks/ output.
- This repopulates the leadforge.render.tasks path that LTV-Pf.2 vacated, now
as shared envelope code (peer to render.manifests / render.relational_io);
the module-layout lock test is updated to reflect that intended end state.
Lifecycle task families (schemes/lifecycle/tasks.py):
- lifecycle_task_manifests(regime) builds, per regime, three pltv_revenue_{90,
365,730}d regression tasks + a churned_within_180d classification task.
Calendar regime unprefixed; early-pLTV regime early_-prefixed so both
families occupy distinct task directories. Targets/windows mirror the
snapshot catalog so specs and columns cannot drift. Completes LTV-Pc.
Tests (53 new): config field validation; TaskManifest regression type +
rejection; shared writer on a continuous target (values preserved,
deterministic); lifecycle task-family shape/uniqueness/prefix/targets. Full
suite 1835 passed / 51 skipped; ruff + mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Adds lifecycle-scheme configuration fields and regression-capable task manifests, plus a shared deterministic task-split writer that both lead-scoring and lifecycle can use (with lifecycle wiring deferred to the next sub-PR).
Changes:
- Extend
GenerationConfigwith validated lifecycle fields (n_customers,forward_windows_days,early_tenure_weeks,observation_date). - Add
task_typevalidation toTaskManifestand introduceVALID_TASK_TYPESincludingregression. - Lift the deterministic shuffle/split/write logic into
leadforge/render/tasks.pyand make lead-scoring delegate; add lifecycle task-family definitions and targeted tests/docs updates.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/schemes/test_module_layout.py | Updates module-layout lock tests to reflect shared leadforge.render.tasks repopulation. |
| tests/schemes/lifecycle/test_tasks.py | Adds tests for regression task types and lifecycle task-family invariants. |
| tests/render/test_shared_task_writer.py | Adds tests for scheme-agnostic split writer, including continuous target preservation and determinism. |
| tests/core/test_config_lifecycle_fields.py | Adds coverage for new lifecycle config defaults and validation. |
| leadforge/schemes/lifecycle/tasks.py | Defines per-regime lifecycle task manifests (pLTV regression + churn classification). |
| leadforge/schemes/lead_scoring/render/tasks.py | Replaces lead-scoring split writer with a thin wrapper over the shared writer. |
| leadforge/schema/tasks.py | Introduces VALID_TASK_TYPES and validates TaskManifest.task_type; updates docs to be target-agnostic. |
| leadforge/render/tasks.py | New shared deterministic shuffle/split/write implementation for task exports. |
| leadforge/core/models.py | Adds lifecycle config fields and isolated validation in GenerationConfig. |
| docs/ltv/roadmap.md | Marks roadmap items as completed and documents the new lifecycle/task plumbing. |
| .agent-plan.md | Updates the running plan/status text for the LTV workstream. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -81,7 +81,8 @@ Total: ~19 PRs across 9 milestones. | |||
| scope (folds into `LTV-Pn`):** regression task specs + a `task_type` | |||
| (`regression` | `classification`) on the task model — they belong with the | |||
…Pn.3] Self-review finding: the new lifecycle GenerationConfig fields (forward_windows_days, n_customers, early_tenure_weeks, observation_date) are not consumed by any pipeline yet — wiring is Pn.4 — and the window/tenure defaults *duplicate* the scheme's canonical constants (schemes.lifecycle.snapshots.FORWARD_WINDOWS_DAYS / DEFAULT_EARLY_TENURE_WEEKS) plus the engine's early_tenure default. The duplication is forced: core.models must not import a scheme (the Pn.2 layering cleanup), so the config copy can't reference the scheme constant. The risk is silent drift — a default-config bundle would carry windows/tenure that disagree with the columns the snapshot builder actually produces. Fix (no behavior change): - tests/schemes/lifecycle/test_config_consistency.py pins the three copies equal (config forward windows == snapshot constant; config early tenure == snapshot constant; engine early-tenure default == snapshot constant). This test layer may import both core and the scheme, so it can enforce what the layering forbids core from importing. - Documented on GenerationConfig: the fields are not yet consumed, Pn.4 makes the config authoritative, and the duplication is guarded by the test above. Full suite 1838 passed / 51 skipped; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
pr-agent-context report: This run includes an unresolved review comment on PR #124 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: docs/ltv/roadmap.md:82
URL: https://github.com/leadforge-dev/leadforge/pull/124#discussion_r3407595317
Root author: copilot-pull-request-reviewer
Comment:
The roadmap still documents the `task_type` enum as `regression | classification`, but the implemented/validated values are `binary_classification` and `regression` (see `VALID_TASK_TYPES`). Using `classification` here is misleading for contributors and readers.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Third sub-PR of the split
LTV-Pn(M6): the config + task-model plumbing the lifecycle pipeline needs, with no end-to-end wiring yet (that's Pn.4). Also discharges the long-deferredLTV-Pcregression-task-spec leftover.GenerationConfig— lifecycle fields (layer: api)New validated fields, consumed only by the lifecycle scheme (lead-scoring ignores them, just as the lifecycle side will ignore
n_leads):n_customers,forward_windows_days(strictly-increasing tuple),early_tenure_weeks,observation_date(ISO string orNone). Validation is isolated in_validate_lifecycle_fields. Kept flat on the shared config to match existing precedent (n_leads/snapshot_day); a nested per-scheme config is noted as a possible future refactor.Lead-scoring data (
tables/,tasks/) is byte-identical in both modes; onlymetadata/world_spec.jsonchanges — by design, since it dumps the full config.TaskManifest— regression support (layer: schema)VALID_TASK_TYPES = {binary_classification, regression}with__post_init__validation; docstrings broadened solabel_column/label_window_daysread target-agnostically.Shared split writer (
layer: render)The deterministic shuffle/split/write logic is lifted to scheme-agnostic
leadforge/render/tasks.py— target-agnostic (it never inspects the label), so it serves continuous pLTV targets.schemes.lead_scoring.render.tasksis now a thin wrapper defaulting the task — byte-identicaltasks/output. This repopulates therender.taskspath that LTV-Pf.2 vacated, now as shared envelope code (peer torender.manifests/render.relational_io); the module-layout lock test is updated to that intended end state.Lifecycle task families (
schemes/lifecycle/tasks.py)lifecycle_task_manifests(regime)builds, per regime: threepltv_revenue_{90,365,730}dregression tasks + achurned_within_180dclassification task. Calendar regime unprefixed; early-pLTV regimeearly_-prefixed so both families get distinct task directories. Targets/windows mirror the snapshot catalog so specs and columns can't drift. CompletesLTV-Pc.Tests (53 new; full suite 1835 passed / 51 skipped)
Config validation (bounds, sorted windows, ISO date);
TaskManifestregression-type accept + invalid-type reject; shared writer on a continuous target (values preserved, deterministic, manifest emitsregression); lifecycle task-family shape / cross-regime id uniqueness / prefixing / target-column match.Next: Pn.4 — complete
LifecycleScheme.build_world/write_bundle, lift the shared bundle orchestrator, first e2e lifecycle bundle.🤖 Generated with Claude Code