Skip to content

feat(lifecycle): student_public snapshot-safety [LTV-Pn.4c]#127

Merged
shaypal5 merged 2 commits into
mainfrom
feat/lifecycle-public-snapshot-safety
Jun 15, 2026
Merged

feat(lifecycle): student_public snapshot-safety [LTV-Pn.4c]#127
shaypal5 merged 2 commits into
mainfrom
feat/lifecycle-public-snapshot-safety

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

Third sub-PR of the split LTV-Pn.4 — the snapshot-safe public export for the lifecycle scheme. student_public lifecycle bundles can no longer leak the pLTV / churn targets through the relational tables. Removes the student_public guard added in Pn.4b.

schemes/lifecycle/render/relational_snapshot_safe.py

to_dataframes_snapshot_safe(dfs, cutoff):

  • event tables (subscription_events / health_signals / invoices) row-filtered to timestamp <= observation_date;
  • subscriptions drops its stateful/terminal columns, keeping only the at-signing identity (subscription_id, customer_id, plan_name, subscription_start_at, contract_term_months);
  • accounts / customers pass through.

Spec correction worth flagging: design.md §5 named only the three terminal fields (churn_at / churn_reason / subscription_end_at), but the four stateful columns (subscription_status, current_mrr, renewal_count, expansion_count) also hold end-of-simulation values that leak the targets — status reveals churn, current_mrr reflects post-cutoff expansion, the counts reveal future renewals/expansions. So LIFECYCLE_BANNED_SUBSCRIPTION_COLUMNS (in validation.leakage_probes) extends the spec to all seven.

Wiring

  • write_bundle drops the public guard and projects the relational tables snapshot-safe for modes whose BundleFilter.relational_snapshot_safe is set; records relational_snapshot_safe + structural_redactions in the manifest.
  • build_manifest gains a pass-through structural_redactions param — the last lead-scoring coupling in the manifest builder (it previously hardcoded the lead-scoring banned constants). Default None preserves lead-scoring behaviour; lifecycle passes its own. Lead-scoring bundles verified byte-identical in both modes, and the v6 contract test still passes.
  • CLAUDE.md gains the lifecycle snapshot-safety hard-constraint clause.

Public task safety already holds from Pn.4b (per-task single-target splits + cutoff-bounded features); the mrr_change_full_period trap is retained in all modes.

Tests (23 new; full suite 1872 passed / 51 skipped)

Event tables <= cutoff; subscriptions column set; no target column in any public relational table; no metadata/; manifest flags + structural_redactions; public-task single-target + trap; public-bundle SHA-256 determinism; instructor mode unaffected; to_dataframes_snapshot_safe unit behaviour (passthrough, empty-cutoff reject, no input mutation). Obsolete Pn.4a/4b stub tests updated/removed.

Caveat (documented)

The relational cutoff is the calendar regime's observation_date. The early-pLTV (tenure-anchored) task's snapshot-safe data is its own task parquet; relational-table feature engineering aligns with the calendar regime.

Next: Pn.4d — lift the shared bundle orchestrator from both schemes (carried cleanup #1).

🤖 Generated with Claude Code

Third sub-PR of the split LTV-Pn.4. Implements the snapshot-safe public export
for the lifecycle scheme, so student_public bundles can no longer leak the
pLTV / churn targets via the relational tables.

New schemes/lifecycle/render/relational_snapshot_safe.py:
- to_dataframes_snapshot_safe(dfs, cutoff): event tables
  (subscription_events / health_signals / invoices) row-filtered to
  timestamp <= observation_date; subscriptions drops its stateful/terminal
  columns; accounts/customers pass through.
- design §5 named only the THREE terminal fields (churn_at / churn_reason /
  subscription_end_at), but the FOUR stateful columns (subscription_status,
  current_mrr, renewal_count, expansion_count) also hold end-of-sim values
  that leak the targets (status reveals churn; current_mrr reflects post-cutoff
  expansion; the counts reveal future renewals/expansions).  The banned set
  (validation.leakage_probes.LIFECYCLE_BANNED_SUBSCRIPTION_COLUMNS) therefore
  extends the spec; only the at-signing identity is retained.

write_bundle: drops the student_public guard and wires the projection for
snapshot-safe modes; records relational_snapshot_safe + structural_redactions
in the manifest.

build_manifest: gains a pass-through structural_redactions param (the last
lead-scoring coupling in the manifest builder — it previously hardcoded the
lead-scoring banned constants).  Default None keeps lead-scoring behaviour;
lifecycle passes its own.  Lead-scoring bundles verified byte-identical (both
modes) and the v6 contract test still passes.

CLAUDE.md: new lifecycle snapshot-safety hard-constraint clause.

Public task safety: the per-task single-target splits + cutoff-bounded
features from LTV-Pn.4b already satisfy it; the trap is retained in all modes.

Tests (23 new): event tables <= cutoff; subscriptions column set; no target
columns in any public relational table; no metadata/; manifest flags +
structural_redactions; public-task single-target + trap; public determinism;
instructor unaffected; snapshot-safe unit behaviour (passthrough, empty-cutoff
reject, no input mutation).  Obsolete Pn.4a/4b stub tests updated/removed.
Full suite 1872 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 14, 2026 13:12
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 14, 2026
@shaypal5 shaypal5 added type: feature New capability layer: render render/ bundle and artifact output layer: exposure exposure/ truth-mode filtering dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream labels Jun 14, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements snapshot-safe student_public bundle export for the lifecycle scheme by projecting relational tables to a cutoff-bounded, non-leaky shape and recording the applied redactions in the manifest.

Changes:

  • Add lifecycle-specific to_dataframes_snapshot_safe(...) to filter event tables to <= observation_date and drop stateful/terminal subscriptions columns.
  • Wire snapshot-safe relational projection + structural_redactions passthrough into LifecycleScheme.write_bundle and build_manifest.
  • Add/adjust lifecycle tests to assert public relational snapshot-safety, determinism, and that instructor mode remains unaffected.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/schemes/lifecycle/test_write_bundle.py Updates public-mode behavior test to expect a successful write with no metadata/.
tests/schemes/lifecycle/test_public_snapshot_safety.py New tests asserting lifecycle public relational snapshot-safety, manifest flags, and determinism.
tests/schemes/lifecycle/test_build_world.py Removes obsolete write-bundle stubbing test.
leadforge/validation/leakage_probes.py Adds lifecycle snapshot-safety contract constants (banned subscription columns + filtered tables).
leadforge/schemes/lifecycle/render/relational_snapshot_safe.py New lifecycle snapshot-safe relational projection implementation.
leadforge/schemes/lifecycle/__init__.py Wires snapshot-safe relational projection + manifest recording into lifecycle write_bundle.
leadforge/render/manifests.py Adds structural_redactions passthrough parameter (scheme-supplied override).
docs/ltv/roadmap.md Marks Pn.4c complete and documents the implemented snapshot-safety scope.
CLAUDE.md Adds lifecycle-specific hard constraint for public snapshot-safe relational exports.
.agent-plan.md Updates plan status to reflect Pn.4c PR opened and what it delivers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


def test_public_tasks_still_single_target_and_keep_trap(tmp_path) -> None:
out, _ = _public_bundle(tmp_path)
targets = {"ltv_revenue_90d", "ltv_revenue_365d", "ltv_revenue_730d", "churned_within_180d"}
….4c]

Self-review found a severe leak in the public export: the early-pLTV
(tenure-anchored) task target is reconstructible from the public relational
tables.  Verified empirically — early ltv_revenue_90d was reconstructed EXACTLY
for 52/60 customers by joining the public invoices (<= observation_date) onto
customer_start_at and summing the (early_cutoff, early_cutoff+90d] window.

Root cause: the public relational tables are cut at observation_date (calendar
regime), but the early regime's forward window precedes observation_date — the
invoices between the early cutoff and observation_date ARE the early target
window.  A single observation_date-anchored relational export cannot serve both
regimes.

Fix: omit the early-pLTV task family from snapshot-safe (student_public)
bundles; it ships in research_instructor (full truth) only.  The calendar
family is published — its targets fall after observation_date and so are absent
from the public relational tables (verified by a reconstruction probe: 0
calendar targets recoverable).  Public bundles now carry the 4 calendar task
dirs; instructor keeps all 8.

Docs: relational_snapshot_safe docstring + CLAUDE.md clause now state the
early-omitted-in-public rule and its reason; roadmap records the design
decision and the tension with D8 (first-class early-pLTV), flagged for
LTV-Po / a design-doc update if public early-pLTV is wanted (would need
per-regime relational exports).

Tests: test_public_omits_early_pltv_family; a reconstruction-probe leakage
test (published calendar target NOT recoverable from public relational);
instructor-keeps-both-regimes (8 tasks). Full suite 1874 passed / 51 skipped;
ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes an unresolved review comment on PR #127 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: tests/schemes/lifecycle/test_public_snapshot_safety.py:99
URL: https://github.com/leadforge-dev/leadforge/pull/127#discussion_r3409590208
Root author: copilot-pull-request-reviewer

Comment:
    This line will violate the repo’s Ruff line-length=100 setting (pyproject.toml [tool.ruff].line-length=100), causing an E501 failure in CI. Please wrap the set literal across multiple lines (similar to banned_targets above).

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27541170253 attempt 1
Comment timestamp: 2026-06-15T10:50:29.523488+00:00
PR head commit: 3fbced2c677a90d76df020896e5009da1c642d03

@shaypal5 shaypal5 merged commit e61589a into main Jun 15, 2026
10 checks passed
@shaypal5 shaypal5 deleted the feat/lifecycle-public-snapshot-safety branch June 15, 2026 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream layer: exposure exposure/ truth-mode filtering layer: render render/ bundle and artifact output type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants