feat(lifecycle): instructor write_bundle + tasks [LTV-Pn.4b] by shaypal5 · Pull Request #126 · leadforge-dev/leadforge

shaypal5 · 2026-06-14T09:41:45Z

Summary

Second sub-PR of the split LTV-Pn.4 — the lifecycle scheme's instructor-mode write path, producing the first end-to-end lifecycle bundle on disk.

gen = get_scheme("lifecycle")
bundle = gen.build_world(config, narrative=None)   # Pn.4a
gen.write_bundle(bundle, "./out/ltv_bundle")        # this PR

`write_bundle` (research_instructor)

Relational — to_dataframes → write_relational_tables (6 tables).
Tasks — both regime snapshots (calendar + early-pLTV) split into 8 task dirs via the shared render.tasks writer: per regime, 3 pltv_revenue_{90,365,730}d regression tasks + 1 churned_within_180d classification; early regime early_-prefixed.
Dataset card — new schemes/lifecycle/render/dataset_card.py. The lead-scoring card is hard-coupled to the binary-conversion framing, so lifecycle renders its own (pLTV regression framing, the two regimes, the 8-task table, the mrr_change_full_period trap note).
Feature dictionary over CUSTOMER_SNAPSHOT_FEATURES.
Manifest — generation_scheme=lifecycle, motif_family, and extra_fields = {observation_date, forward_windows_days, early_tenure_weeks}.
write_metadata hook — new render/metadata.py serialises the per-entity latent registry + the motif's mechanism parameters (unwrapping the MappingProxyType weight maps); no hidden graph (lifecycle has none).
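For readers unfamiliar with the unwrapping step: `json.dumps` rejects a `MappingProxyType`, so read-only weight maps have to be converted to plain dicts before serialisation. A minimal sketch of that idea follows. It is not the code in render/metadata.py; the helper name `to_jsonable` and the example parameter names are hypothetical.

```python
import json
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert read-only mappings and tuples into plain JSON types."""
    if isinstance(value, Mapping):  # covers dict and MappingProxyType
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    return value


# Hypothetical mechanism parameters; the real motif's fields are not shown in this PR.
params = {
    "motif_family": "example_motif",
    "weights": MappingProxyType({"usage_depth": 0.6, "support_load": -0.2}),
}

# sort_keys keeps the output byte-stable, which helps bundle-level determinism checks.
payload = json.dumps(to_jsonable(params), sort_keys=True, indent=2)
```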

Difficulty (the Pn.4a pin)

config.difficulty_params is threaded into both snapshot builders — proven by a test where strong distortion knobs perturb the task features while targets stay untouched. This discharges the threading half of the Pn.4a obligation; recipe-driven resolution of difficulty_params lands in LTV-Po.
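The shape of that threading test can be sketched with a toy builder. Everything here is an assumption made for illustration: `build_snapshot`, the `feature_noise_sd` knob, and the column names are hypothetical and do not reflect the real snapshot builders.

```python
import random


def build_snapshot(rows, difficulty_params, seed=0):
    """Toy builder: distort the feature column only; copy the target through unchanged."""
    rng = random.Random(seed)
    noise_sd = difficulty_params.get("feature_noise_sd", 0.0)  # hypothetical knob
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["feature"] = row["feature"] + rng.gauss(0.0, noise_sd)
        out.append(new_row)
    return out


rows = [{"feature": float(i), "target": 2.0 * i} for i in range(5)]
clean = build_snapshot(rows, {})
hard = build_snapshot(rows, {"feature_noise_sd": 5.0})

# Strong knobs should move the features while leaving every target untouched.
features_changed = any(c["feature"] != h["feature"] for c, h in zip(clean, hard))
targets_equal = all(c["target"] == h["target"] for c, h in zip(clean, hard))
```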

Safety

student_public is refused (raises, pointing at LTV-Pn.4c) rather than emit a bundle that isn't yet snapshot-safe — a public lifecycle bundle can never accidentally ship before 4c adds the event-cutoff filtering, terminal-column drops, and per-task target projection.
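A minimal sketch of the guard, assuming the exposure mode arrives as a string argument. The real `write_bundle` signature is not shown in this PR, so the `exposure` parameter and the message text are hypothetical.

```python
def write_bundle(bundle, out_dir, exposure="research_instructor"):
    """Toy write path: refuse public exposure instead of writing an unsafe bundle."""
    if exposure == "student_public":
        # Fail loudly before anything touches the disk.
        raise NotImplementedError(
            "student_public lifecycle bundles are not snapshot-safe yet; see LTV-Pn.4c"
        )
    return f"would write {exposure} bundle to {out_dir}"


try:
    write_bundle(object(), "./out/ltv_bundle", exposure="student_public")
    refused = False
except NotImplementedError as exc:
    refused = True
    message = str(exc)
```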

Tests (11 new; full suite 1860 passed / 51 skipped)

Required files; 6 tables; 8 task dirs (splits + manifest); task-type correctness; manifest scheme + lifecycle fields; metadata files (no graph); full-bundle SHA-256 determinism; difficulty threading; student_public refusal; unpopulated / wrong-scheme bundle rejection. Obsolete stub tests updated.
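One way to check full-bundle determinism is to hash every file in a stable order and compare digests across two writes. The sketch below is not the test in this PR; `bundle_digest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def bundle_digest(root):
    """Return one SHA-256 over all files under root, visited in sorted path order."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so renames change the digest too.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Writing the same config to two directories and asserting equal digests then catches any nondeterministic byte, including unsorted JSON keys or embedded timestamps.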

Flagged (out of scope)

validation.bundle_checks.validate_bundle is lead-scoring-coupled (applies lead-scoring FK/table/task checks) and errors on a lifecycle bundle. Scheme-aware validation is LTV-Pp; this PR validates the bundle via the build/write tests, not the CLI validator.

Next: Pn.4c — student_public snapshot-safety + CLAUDE.md hard-constraints clause.

🤖 Generated with Claude Code

Second sub-PR of the split LTV-Pn.4. Implements the lifecycle scheme's instructor-mode write path: the first end-to-end lifecycle bundle on disk.

write_bundle (research_instructor):
- Relational: to_dataframes → write_relational_tables (6 tables).
- Tasks: both observation-regime snapshots (calendar + early-pLTV) split into 8 task dirs via the shared render.tasks writer. Per regime: 3 pLTV regression tasks (pltv_revenue_{90,365,730}d) + 1 churn classification (churned_within_180d); the early regime is prefixed early_.
- Dataset card: new schemes/lifecycle/render/dataset_card.py. The lead-scoring card is hard-coupled to the conversion framing, so lifecycle renders its own (pLTV regression framing, two regimes, the 8-task table, the mrr_change_full_period trap note).
- Feature dictionary over CUSTOMER_SNAPSHOT_FEATURES.
- Manifest: build_manifest with generation_scheme=lifecycle, motif_family, and extra_fields = {observation_date, forward_windows_days, early_tenure_weeks}.
- write_metadata hook: schemes/lifecycle/render/metadata.py serialises the per-entity latent registry + the motif's mechanism parameters (unwrapping the MappingProxyType weight maps); no hidden graph (lifecycle has none).

Difficulty: config.difficulty_params is threaded into both snapshot builders (tested: strong knobs perturb task features, targets untouched). This discharges the threading half of the LTV-Pn.4a pin; recipe-driven *resolution* of difficulty_params is LTV-Po.

Safety: student_public is refused (raises NotImplementedError pointing at LTV-Pn.4c) rather than emit a bundle that is not yet snapshot-safe.

Tests (11 new): required bundle files; 6 tables; 8 task dirs with train/valid/test + manifest; task-type correctness; manifest scheme + lifecycle fields; metadata files (no graph); full-bundle determinism (SHA-256); difficulty threading; student_public refusal; unpopulated / wrong-scheme bundle rejection. Obsolete stub tests updated. Full suite 1860 passed / 51 skipped; ruff + mypy clean.

Known (flagged): validate_bundle is lead-scoring-coupled and errors on a lifecycle bundle. Scheme-aware validation is LTV-Pp.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Implements the lifecycle scheme’s instructor-mode on-disk publication path (write_bundle + write_metadata) so lifecycle worlds can be serialized as complete bundles (tables, tasks, manifest, dataset card, feature dictionary, and instructor-only hidden-truth metadata).

Changes:

Add LifecycleScheme.write_bundle / write_metadata implementation for research_instructor, including 6 relational tables and 8 task directories across the two regimes.
Introduce lifecycle-specific renderers for dataset card + hidden-truth metadata serialization.
Add/adjust tests to validate bundle shape, determinism, metadata outputs, exposure guardrails, and rejection of invalid bundles; update roadmap/agent plan docs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

File	Description
tests/schemes/test_scheme_metadata_hook.py	Updates lifecycle metadata-hook test to assert proper rejection behavior.
tests/schemes/test_registry.py	Updates lifecycle registry test to assert write-path rejects bad/unpopulated bundles.
tests/schemes/lifecycle/test_write_bundle.py	New end-to-end lifecycle bundle write tests (shape, manifest, metadata, determinism, guards).
leadforge/schemes/lifecycle/render/metadata.py	New lifecycle hidden-truth JSON serialization helpers.
leadforge/schemes/lifecycle/render/dataset_card.py	New lifecycle-specific dataset card renderer.
leadforge/schemes/lifecycle/__init__.py	Implements lifecycle `write_bundle` and `write_metadata`.
docs/ltv/roadmap.md	Marks Pn.4b complete and documents what landed.
.agent-plan.md	Updates agent plan status/context for the Pn.4 split.

…LTV-Pn.4b]

Self-review of the write path found a real task-construction bug: every one of the 8 task dirs received the FULL snapshot, so each task's parquet carried all four targets (ltv_revenue_{90,365,730}d + churned_within_180d). To predict ltv_revenue_90d a consumer could read ltv_revenue_365d (corr ~0.88) and 730d, its own supersets, straight off the row. X = df.drop(columns=[label]) trains on the answer.

This is not the intended mrr_change_full_period trap; it is wrong in instructor mode too (these splits are meant to be standalone ML-ready), so it belongs here, not deferred to the public-mode PR.

Fix: before splitting, project each task's snapshot to drop every OTHER target column, keeping features + the task's own target + the deliberate mrr_change_full_period trap (leakage_risk but not a target). Each task dir is now a true single-target dataset.

Test: test_each_task_split_has_only_its_own_target asserts each task parquet contains exactly its own target among the target columns, and that the trap survives.

Full suite 1861 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
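The per-task projection described in this commit can be sketched on column names alone. This is an illustration, not the committed code: `project_columns` and the two feature names are hypothetical, while the target and trap column names are the ones quoted in the commit message.

```python
TARGET_COLUMNS = (
    "ltv_revenue_90d",
    "ltv_revenue_365d",
    "ltv_revenue_730d",
    "churned_within_180d",
)
TRAP_COLUMN = "mrr_change_full_period"  # deliberate leakage trap, not a target


def project_columns(columns, own_target):
    """Keep features, the trap, and this task's target; drop every other target."""
    if own_target not in TARGET_COLUMNS:
        raise ValueError(f"unknown target: {own_target}")
    other_targets = set(TARGET_COLUMNS) - {own_target}
    return [c for c in columns if c not in other_targets]


# Hypothetical feature names, for illustration only.
snapshot_columns = ["tenure_weeks", "seat_count", TRAP_COLUMN, *TARGET_COLUMNS]
task_columns = project_columns(snapshot_columns, "ltv_revenue_90d")
```

With the projection in place, dropping the label column from a task split no longer leaves a longer-window target behind as a feature.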

…V-Pn.4b]

Addresses the two Copilot review threads on #126 and the CI lint failure.

COPILOT-1 (manifest forward_windows_days vs. actual exported windows): The manifest recorded config.forward_windows_days, but the snapshots/tasks export the fixed snapshots.FORWARD_WINDOWS_DAYS (config is not threaded into the snapshot builder yet). An override would make the manifest disagree with the task dirs, and a shorter override would under-simulate and fail opaquely in the snapshot builder. Fixes both:
- The manifest now records FORWARD_WINDOWS_DAYS (the source of truth for what is actually exported), so it is always accurate.
- build_world validates config.forward_windows_days == FORWARD_WINDOWS_DAYS and raises InvalidConfigError early with a clear message, converting the opaque late failure into an explicit one. Default configs are unaffected (the Pn.3 consistency test already pins them equal).
- New test_rejects_unsupported_forward_windows_override.

COPILOT-2: removed the duplicate "Labels" bullet in the LTV-Pn.4b roadmap entry (copy/paste artifact).

CI Lint & format: ruff-format leadforge/schemes/lifecycle/__init__.py (the leak-fix edit had dropped a blank line; ruff check passed but format did not).

Full suite 1862 passed / 51 skipped; ruff check + format + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
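The early validation can be sketched as follows. Several things here are assumptions: the window values (90, 365, 730) are inferred from the task names rather than read from snapshots.py, `InvalidConfigError` is redefined locally as a stand-in for the project's exception, and `validate_forward_windows` is a hypothetical helper name.

```python
FORWARD_WINDOWS_DAYS = (90, 365, 730)  # assumed from the task names, not from the source


class InvalidConfigError(ValueError):
    """Local stand-in for the project's exception of the same name."""


def validate_forward_windows(configured):
    """Reject any override the snapshot builder would not actually honour."""
    if tuple(configured) != FORWARD_WINDOWS_DAYS:
        raise InvalidConfigError(
            f"forward_windows_days={tuple(configured)} is not supported; "
            f"the lifecycle snapshots export {FORWARD_WINDOWS_DAYS}"
        )


validate_forward_windows([90, 365, 730])  # the default passes silently

try:
    validate_forward_windows((30, 90))
    rejected = False
except InvalidConfigError:
    rejected = True
```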

github-actions · 2026-06-14T11:24:06Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #126 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27497270635 attempt 1
Comment timestamp: 2026-06-14T11:23:18.263840+00:00
PR head commit: 0a591abe9c0a06637ffaaca7c9314322c71ed016

Copilot AI review requested due to automatic review settings June 14, 2026 09:41

shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 14, 2026

shaypal5 added the labels type: feature (New capability), layer: render (render/ bundle and artifact output), layer: api (api/ public Python surface), and dataset: leadforge-ltv-v1 (Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream) Jun 14, 2026

Copilot started reviewing on behalf of shaypal5 June 14, 2026 09:42

Copilot AI reviewed Jun 14, 2026

Comment thread leadforge/schemes/lifecycle/__init__.py

Comment thread docs/ltv/roadmap.md Outdated

shaypal5 merged commit ca42ecc into main Jun 14, 2026
11 of 12 checks passed

shaypal5 deleted the feat/lifecycle-write-bundle branch June 14, 2026 11:53

shaypal5 merged 3 commits into main from feat/lifecycle-write-bundle
