Skip to content

docs: add mandatory branch/PR workflow rules#1

Merged
shaypal5 merged 4 commits into
mainfrom
docs/add-branch-pr-workflow
Apr 18, 2026
Merged

docs: add mandatory branch/PR workflow rules#1
shaypal5 merged 4 commits into
mainfrom
docs/add-branch-pr-workflow

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

  • Adds a Branch & PR Workflow section to CLAUDE.md as the first section (before all other rules), locking the rule that no work goes directly to main
  • Installs .git/hooks/pre-push to mechanically block any push targeting main, regardless of who or what triggers it
  • Updates .agent-plan.md to reflect that these workflow rules are now locked

Workflow rules locked

  1. git checkout main && git pull before starting any work
  2. git checkout -b <descriptive-branch-name> — always branch from latest main
  3. Commit work to the branch
  4. Update .agent-plan.md to reflect state after the PR merges; commit to same branch
  5. Open a PR on GitHub with a detailed description — never push to main directly

Enforcement layers

Layer Mechanism
Primary CLAUDE.md — read by Claude at session start
Secondary Agent memory (feedback entry)
Mechanical .git/hooks/pre-push — blocks push to main at git level

🤖 Generated with Claude Code

Locks the rule that no work goes directly to main. Every unit of work
must branch from latest main, commit there, update .agent-plan.md in
the same PR, and open a GitHub PR. Also installs a git pre-push hook
(.git/hooks/pre-push) that mechanically blocks any push targeting main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds explicit “Branch & PR Workflow” rules to the repo’s agent-facing documentation and updates the agent plan to reflect the new workflow constraints.

Changes:

  • Adds a mandatory Branch & PR Workflow section at the top of CLAUDE.md.
  • Updates .agent-plan.md current-state text to mention the workflow rules and their enforcement.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
CLAUDE.md Documents mandatory branching/PR workflow rules, including “never push to main”.
.agent-plan.md Updates current system state to reference the new workflow rules and claimed enforcement mechanism.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .agent-plan.md Outdated
Comment thread CLAUDE.md Outdated
shaypal5 and others added 2 commits April 18, 2026 09:30
Extends the branch/PR workflow in CLAUDE.md with two required steps:
apply labels (type + layer taxonomy) and assign to a milestone. Adds
the full label taxonomy and a milestone map table to CLAUDE.md.

GitHub side: creates 22 labels across type/layer/status groups and
6 milestones (v0.1.0 through v1.0.0) matching the roadmap release gates.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Integrates shaypal5/pr-agent-context@v4 with two workflows:

- .github/workflows/pr-agent-context.yml
  Runs on pull_request (opened/synchronize/reopened). Calls the
  reusable workflow in ci mode with publish_mode=append and
  include_outdated_review_threads=true so stale-but-unresolved
  threads always appear in the generated agent context comment.
  Coverage artifacts are collected under the pr-agent-context-coverage
  prefix for use by the refresh flow via cross-run lookup.

- .github/workflows/pr-agent-context-refresh.yml
  Triggered by pull_request_review, pull_request_review_comment, and
  completed non-Actions check_run events. Runs in refresh mode with
  publish_mode=append (new comment per refresh, previous managed
  comments hidden), wait_for_reviews_to_settle=true, and
  enable_cross_run_coverage_lookup=true to reuse coverage artifacts
  from the CI run on the same head SHA.

Both workflows explicitly set include_outdated_review_threads=true
per project configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 06:41
@shaypal5 shaypal5 added the type: ci CI/CD pipeline changes label Apr 18, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/pr-agent-context.yml
Comment thread .github/workflows/pr-agent-context-refresh.yml
@github-actions

This comment has been minimized.

- CLAUDE.md: scope the force-push prohibition to `main` explicitly
  (`git push --force origin main`) and add a note clarifying that the
  local pre-push hook is a personal convenience, not a versioned
  repo-wide enforcement; GitHub branch protection is the team-level guard
- .agent-plan.md: correct state description to say "local convenience hook
  (not versioned) and GitHub branch protection" instead of implying the
  hook is a reliable repo-wide mechanism

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #1.

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: .agent-plan.md
URL: https://github.com/leadforge-dev/leadforge/pull/1#discussion_r3104702550
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The statement that the workflow is “enforced via `.git/hooks/pre-push`” is misleading/inaccurate in a repo context: `.git/hooks` isn’t versioned/cloned for other developers, and this PR doesn’t add any tracked hook script that would be installed automatically. Consider replacing this with a versioned hooks directory (and documented `core.hooksPath` setup), or describing enforcement via GitHub branch protection/CI checks instead of `.git/hooks`.

## COPILOT-2
Location: CLAUDE.md
URL: https://github.com/leadforge-dev/leadforge/pull/1#discussion_r3104702559
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The examples list `git push --force` without specifying `main`, but the sentence is about commands that target `main` directly. As written, it can be read as banning force-pushes to any branch; if the intent is only to block pushes to `main`, consider tightening the wording/examples to explicitly include `main` (or split into two rules: no pushes to `main`, and separate guidance on force-push usage).
    ~~~suggestion
    Never use `git push origin main`, `git push --force origin main`, or any variant that targets `main` directly.
    ~~~

## COPILOT-3
Location: .github/workflows/pr-agent-context.yml:14
URL: https://github.com/leadforge-dev/leadforge/pull/1#discussion_r3104738699
Root author: copilot-pull-request-reviewer

Comment:
    This workflow calls a third-party reusable workflow pinned only to the `v4` tag. For supply-chain security and reproducibility, pin `uses:` to an immutable commit SHA (and optionally keep a comment noting the corresponding release tag).
    ~~~suggestion
        uses: shaypal5/pr-agent-context/.github/workflows/pr-agent-context.yml@<FULL_40_CHAR_COMMIT_SHA_FOR_V4> # v4
    ~~~

## COPILOT-4
Location: .github/workflows/pr-agent-context-refresh.yml:39
URL: https://github.com/leadforge-dev/leadforge/pull/1#discussion_r3104738714
Root author: copilot-pull-request-reviewer

Comment:
    This workflow calls a third-party reusable workflow pinned only to the `v4` tag. For supply-chain security and reproducibility, pin `uses:` to an immutable commit SHA (and optionally keep a comment noting the corresponding release tag).
    ~~~suggestion
        uses: shaypal5/pr-agent-context/.github/workflows/pr-agent-context.yml@<FULL_40_CHAR_COMMIT_SHA_FOR_V4> # v4
    ~~~

Run metadata:

Tool ref: v4
Tool version: 4.0.18
Trigger: commit pushed
Workflow run: 24599246501 attempt 1
Comment timestamp: 2026-04-18T06:52:54.524480+00:00
PR head commit: 270394b4d26bf28533e751aa2aa093c1991ff912

@shaypal5 shaypal5 merged commit 878dd58 into main Apr 18, 2026
1 check passed
@shaypal5 shaypal5 deleted the docs/add-branch-pr-workflow branch April 18, 2026 06:56
shaypal5 added a commit that referenced this pull request May 1, 2026
Review feedback addressed:

- Remove primary_task/label_window_days as explicit kwargs from
  resolve_config() and Generator.from_recipe() — these fields are
  resolved from recipe YAML and override dict only, not casually
  overridable, since the generation pipeline doesn't yet support
  arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2)
- Add label_window_days <= horizon_days validation in
  GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3)
- Add tests for invalid primary_task on GenerationConfig: empty
  string, non-string type (shaypal5 #6, pr-agent-context)
- Add tests for invalid label_window_days on Recipe.from_dict: bool,
  non-positive, float (shaypal5 #7, pr-agent-context)
- Add test for label_window_days > horizon_days rejection
- Fix existing test using horizon_days=30 (now conflicts with default
  label_window_days=90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 1, 2026
* feat: carry primary_task and label_window_days into WorldSpec for dataset card

Add `primary_task` and `label_window_days` fields to `GenerationConfig`
(with defaults preserving current behavior). Propagate through
`Recipe.from_dict()`, `resolve_config()`, and `Generator.from_recipe()`
so recipe YAML can override them. Update `render_dataset_card()` to read
from `world_spec.config` instead of hard-coded string literals.

Closes #6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update .agent-plan.md for WorldSpec task fields (PR #36)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback — tighten scope, add validation + tests

Review feedback addressed:

- Remove primary_task/label_window_days as explicit kwargs from
  resolve_config() and Generator.from_recipe() — these fields are
  resolved from recipe YAML and override dict only, not casually
  overridable, since the generation pipeline doesn't yet support
  arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2)
- Add label_window_days <= horizon_days validation in
  GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3)
- Add tests for invalid primary_task on GenerationConfig: empty
  string, non-string type (shaypal5 #6, pr-agent-context)
- Add tests for invalid label_window_days on Recipe.from_dict: bool,
  non-positive, float (shaypal5 #7, pr-agent-context)
- Add test for label_window_days > horizon_days rejection
- Fix existing test using horizon_days=30 (now conflicts with default
  label_window_days=90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 4, 2026
* fix: redact current_stage from student_public bundles

Add an `is_leakage_trap` flag to FeatureSpec and an exposure-layer
column-redaction pass that drops `leakage_risk and not is_leakage_trap`
columns from student_public snapshots, task splits, feature dictionary,
and dataset card. `current_stage` (true label leak) is now removed in
student_public; `total_touches_all` (intentional pedagogical trap) is
preserved via the new flag.

- `schema/features.py`: add `is_leakage_trap` field; mark
  `total_touches_all`; export `STUDENT_PUBLIC_REDACTED_COLUMNS`.
- `schema/dictionaries.py`: emit `is_leakage_trap` column in
  `feature_dictionary.csv`.
- `exposure/filters.py`: `BundleFilter.redacted_columns`; populated for
  student_public.
- `api/bundle.py`: drop redacted columns from snapshot before splits;
  thread the visible feature tuple to dataset card and dictionary writers.
- `narrative/dataset_card.py`: accept a `features` arg so categories /
  leakage section reflect what is actually published.
- `validation/bundle_checks.py`: new `_check_exposure_redaction()`
  enforces CLAUDE.md invariant #1 — redacted columns must not appear in
  the published task splits or dictionary.
- `validation/invariants.py`: relax exposure-monotonicity from byte
  identity to subset-with-shared-content for `feature_dictionary.csv`
  and task splits, so redaction doesn't trip the check.

Tests: extend `tests/schema/test_features.py` with five new assertions
covering the trap flag, the redaction set, and `current_stage`'s role.

All 880 existing tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update release docs and add end-to-end redaction tests

- `tests/exposure/test_redaction.py` (19 tests): bundle-level proof that
  `current_stage` is absent from every student_public task split, that
  `total_touches_all` (the pedagogical trap) is preserved, that the
  research_instructor bundle still contains both, that the student
  feature dictionary differs from the instructor's by exactly the
  redacted set, that shared columns hold identical values across modes,
  and that `validate_bundle()` flags a tampered student_public bundle
  whose task split still carries `current_stage`.
- `scripts/build_public_release.py`: drop `_FLAT_CSV_DROP_COLS` — the
  bundle writer now redacts at source, so the flat CSV inherits it.
- `release/README.md`, `release/HF_DATASET_CARD.md`: update column
  counts (33 features + 1 trap + 1 target in student_public; 34 + 1 + 1
  in the instructor companion) and rewrite the leakage section to
  distinguish stripped columns from the deliberate trap.
- `.agent-plan.md`: flip "Known issue: current_stage leakage" to
  resolved with a link to the redaction mechanism.

Bundles regenerated; all four pass `leadforge validate`;
`scripts/verify_hash_determinism.py` confirms 73/73 files identical
across two consecutive builds. Full test suite: 899 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor: collapse two-flag redaction into prescriptive redact_in_modes

Addresses senior-dev review of the previous two commits:

1. **Single source of truth.** Replaced the awkward
   `leakage_risk + is_leakage_trap` flag pair with a single prescriptive
   field `FeatureSpec.redact_in_modes: frozenset[ExposureMode]`.
   `leakage_risk` reverts to purely descriptive ("post-snapshot
   correlated"). `current_stage` carries `redact_in_modes={student_public}`;
   `total_touches_all` keeps `leakage_risk=True` with empty `redact_in_modes`
   (deliberate trap). New `redacted_columns_for(mode, features=...)`
   parameterizable function replaces the frozen module-level set.

2. **Validation independent of the writer.** `_check_exposure_redaction`
   now derives the expected redaction set directly from `LEAD_SNAPSHOT_FEATURES`
   via `redacted_columns_for`, *not* from `BundleFilter`. A bug in the
   filter no longer agrees-with-itself past the validator. Added a real
   regression test that writes a real student_public bundle, mutates a
   parquet to reinsert `current_stage`, and asserts validation fails.

3. **Self-describing manifest.** `manifest.json` now records
   `redacted_columns: [...]`. The validator cross-checks the manifest's
   declared set against the feature-spec-derived expected set; a
   second new test mutates the manifest to claim nothing was redacted
   and asserts validation flags the disagreement.

4. **Stricter exposure monotonicity.** `check_exposure_monotonicity`
   now asserts that `instructor_columns - student_columns` *equals*
   `redacted_columns_for(student_public)` — not just "is a superset".
   A future column drop in student_public outside the redaction set is
   now caught.

5. **Reverted feature_dictionary.csv schema.** The previous commit added
   an `is_leakage_trap` column to the published CSV without flagging the
   schema change. Removed: the redaction policy is package-internal and
   the bundle's actual published schema is observable from the parquet
   files and `manifest.redacted_columns`. CSV stays back-compat with
   prior releases (6 columns: name, dtype, description, category,
   is_target, leakage_risk).

6. **De-duplicated documentation.** `release/README.md` no longer
   maintains a per-category feature count table by hand — that data is
   in the auto-generated `dataset_card.md`. Removed the duplicated
   table; added explicit caveats covering known structural-leakage
   issues (event aggregates over the label window, `is_mql` zero
   variance, `is_sql=False` near-deterministic for non-conversion).

7. **Documented the structural follow-up.** `.agent-plan.md` now
   includes a clearly-scoped "Follow-up: structural leakage" section
   covering: (a) windowed-snapshot fix for event aggregates;
   (b) measured `is_sql` leakage P(conv|is_sql=False) = 0.038/0.015/0.006
   across tiers; (c) `is_mql` is constant True (zero variance). Suggests
   filing a tracked issue (deferred — needs user confirmation per
   shared-state-action rules).

Removed: `BundleFilter.redacted_columns` (replaced by direct calls to
`redacted_columns_for(mode)` from the bundle writer).

Tests: schema tests rewritten for `redact_in_modes` semantics; redaction
tests updated for the new function-based API; +2 manifest-redaction
tests; +1 real regression test (mutated parquet) replacing the previous
manifest-string-edit test. 902 tests passing (was 899).

Hash determinism verified: 73/73 files identical across two consecutive
builds. All four release bundles regenerated; `feature_dictionary.csv`
is now back to its pre-PR 6-column schema.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote
  README/metadata on validation failure, leaving corrupt artifacts on
  disk that would silently get committed.  Gated on `errors == ()`;
  added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
  1-tier dataset card.  Replaced with a dedicated `INSTRUCTOR_BODY`
  constant that links to the public dataset and describes only the
  instructor-specific additions (full-horizon tables, hidden DAG,
  latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
  release_dir; `--huggingface-dir release/intro` would otherwise
  rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
  validate_readme_substitution, replace_file, replace_dir,
  load_manifest now live in scripts/_release_common.py.  Both
  packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
  + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
  preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
  gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata
.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
* PR 5.2: HuggingFace release packager + load_dataset smoke test

Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md`
with G12.1-compliant YAML frontmatter (pretty_name, license, language,
task_categories, size_categories, tags, three configs with `default: true`
on intermediate per G12.2), inlining the rewritten `release/README.md`
body with HF-specific link rewrites.  `--variant=instructor` packages the
companion repo (G12.4) from `release/intermediate_instructor/` into a
separate `release/huggingface-instructor/` upload tree.  G12.3 covered
by a parametrised `load_dataset()` smoke test gated on the optional
`datasets` SDK.

Extract shared release-packaging primitives (link rewriter, dir-safety
guard, cover-image validator) into `scripts/_release_common.py`; refactor
the Kaggle packager to import them.  `release/kaggle/dataset-metadata.json`
is byte-stable across the refactor.

Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the
generated card.  Gitignore `release/huggingface{,-instructor}/*` except
the committed README.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 self-review fixes (Kaggle + HF packagers)

Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote
  README/metadata on validation failure, leaving corrupt artifacts on
  disk that would silently get committed.  Gated on `errors == ()`;
  added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
  1-tier dataset card.  Replaced with a dedicated `INSTRUCTOR_BODY`
  constant that links to the public dataset and describes only the
  instructor-specific additions (full-horizon tables, hidden DAG,
  latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
  release_dir; `--huggingface-dir release/intro` would otherwise
  rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
  validate_readme_substitution, replace_file, replace_dir,
  load_manifest now live in scripts/_release_common.py.  Both
  packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
  + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
  preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
  gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata
.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)

Fold Copilot's two real findings on the self-review revision back in.

COPILOT-1 — validate_upload_dir_safe was only invoked inside
assemble_upload_dir, which --dry-run skips.  A dry-run with
--huggingface-dir release (or .) would write the README into the
unsafe path BEFORE the safety net fired.  Hoist the check into
run_packager (both packagers) so it runs before any mkdir or write;
the inner assemble_upload_dir call stays as defence-in-depth for
direct callers.  New tests: dry-run with unsafe upload-dir raises
without writing; the same path through main() returns rc=2.

COPILOT-2 — Cover-image path resolution was inconsistent:
validate_cover_image used cover_image as passed, while
assemble_upload_dir did a separate ``release_dir / cover_image.name``
fallback.  Diverged for bare-basename inputs (false validation
failures) and two-paths-sharing-a-basename (assembler shadowing the
explicit path).  Added resolve_cover_image_path() to
_release_common.py (explicit-wins, release-dir fallback);
run_packager calls it once and threads the resolved path through
validation, the metadata's image field, and assembly.  New
tests/scripts/test_release_common.py covers the four resolution
branches; new packager-side tests confirm bare-basename success +
metadata field plumbing.

COPILOT-3 — outdated; already addressed by self-review fix #8 in
commit f2fc4a2.  Resolved as already treated; no code change.

Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy
clean; hash determinism PASS 67/67; leakage probes rc=0 on every
tier; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.
release/{kaggle,huggingface,huggingface-instructor}/* artifacts
regenerated byte-identically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request Jun 13, 2026
…n.2] (#122)

* refactor: scheme-agnostic WorldBundle + exposure metadata hook [LTV-Pn.2]

Second sub-PR of the split LTV-Pn. Removes the last core/shared-layer
references to the lead-scoring scheme (carried cleanups #2 and #3), so the
lifecycle scheme can plug into the same envelope.

WorldBundle (cleanup #3):
- Replaced the three lead-scoring-typed fields (population / simulation_result
  / world_graph) with a single opaque `artifacts: Any`. Each scheme defines
  and unwraps its own container; lead-scoring adds LeadScoringArtifacts
  (schemes/lead_scoring/artifacts.py). core.models no longer imports any
  lead_scoring type — the layering inversion introduced in LTV-Pf.1 is gone.

Exposure (cleanup #2):
- apply_exposure is now scheme-agnostic: it writes the generic, spec-only
  world_spec.json (kept in exposure/metadata.py as write_world_spec_json) and
  dispatches the scheme-specific hidden-truth files to a new
  GenerationScheme.write_metadata(bundle, meta_dir) hook, resolved from
  bundle.spec.scheme via the registry. The lead-scoring graph / latent
  registry / mechanism-summary writers moved out of exposure/ into
  LeadScoringScheme.write_metadata. exposure/ no longer references lead_scoring.
- Protocol gains write_metadata; the lifecycle stub implements it (raises
  NotImplementedError until Pn.4).

Byte-identity: the full lead-scoring bundle is byte-identical across BOTH
exposure modes (research_instructor 21 files, student_public 14) — verified
against a pre-refactor SHA-256 reference. The metadata writers moved, not
changed; world_spec.json content/order is preserved.

Re-scope: the shared bundle orchestrator (cleanup #1) moves from this PR to
LTV-Pn.4. Per the roadmap's own note it is best designed with the second
scheme's write_bundle in hand; extracting it now against one scheme would
guess the hook shape. Roadmap + deferred-cleanups updated (#2, #3 done; #1 →
Pn.4).

Tests: new tests/schemes/test_scheme_metadata_hook.py (artifacts populated;
WorldBundle has only spec+artifacts; generic world_spec writer; lead-scoring
hook emits the 4 hidden-truth files; unpopulated-bundle + lifecycle-stub
raise). Updated field-access sites in 5 test modules. Full suite 1808 passed /
51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(scripts): migrate build_v*_snapshot to bundle.artifacts [LTV-Pn.2]

Self-review finding on the WorldBundle generalization: five dataset-build
scripts still read the removed bundle.simulation_result / bundle.population
fields and would AttributeError at runtime.

This slipped both nets:
- mypy leadforge does not type-check scripts/, and
- the existing tests/scripts/test_build_v*_snapshot.py cover only the
  pipelines.build_v* transform helpers (pure DataFrame functions), never the
  generate_bundle() entry point that touches the bundle.
CI's "Validate v6/v7" jobs validate a pre-built CSV; they do not regenerate,
so they would not have caught it either.

Fix: build_v4/v5/v6/v7_snapshot.py and build_midproject_lead_scoring.py now
read bundle.artifacts.simulation_result / bundle.artifacts.population. Smoke-
ran the v6 builder end-to-end through Generator.generate() to confirm.

Guard: tests/scripts/test_build_v6_snapshot.py gains
TestGenerateBundleArtifactsPath, which loads the script via importlib and runs
generate_bundle(small) so a future WorldBundle field rename can't silently
break the generate path again. The other four scripts share the identical
access pattern.

Full suite 1809 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(exposure): clear metadata/ before rewrite; fix stale docstring [LTV-Pn.2]

Two Copilot review findings on #122.

1. apply_exposure reused an existing metadata/ via mkdir(exist_ok=True) when
   writing hidden truth, so a reused output path could retain stale files.
   Pre-existing behavior, but Pn.2 makes it newly dangerous: once the
   lifecycle scheme writes a different hidden-truth file set, regenerating a
   lifecycle bundle over a path that previously held a lead-scoring bundle
   would orphan graph.graphml / mechanism_summary.json into the new bundle.
   Now apply_exposure always removes any existing metadata/ first, then
   recreates it when writing — so contents exactly match the current bundle
   (mirroring the non-writing branch, which already rmtree'd it). Byte-identity
   preserved for both modes (fresh paths have no metadata/, so the rmtree is a
   guarded no-op). Regression tests: a pre-seeded stale file is gone after an
   instructor rewrite; student_public still removes the dir entirely.

2. The lead_scoring.write_bundle docstring still described apply_exposure as
   writing the lead-scoring hidden graph + latent registry directly. Updated
   to reflect the Pn.1/Pn.2 reality: build_manifest and apply_exposure are
   scheme-agnostic, and hidden truth is delegated to write_metadata; the
   remaining shared-orchestrator extraction is deferred to Pn.4.

Full suite 1811 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request Jun 16, 2026
* refactor: shared bundle-writing orchestrator [LTV-Pn.4d]

Final sub-PR of the split LTV-Pn.4 (and the last carried M2 cleanup, #1).
Both schemes' write_bundle performed the same on-disk sequence — mkdir →
relational tables → task splits → dataset card → feature dictionary → exposure
metadata → manifest — differing only in the *content*.  Lift that sequence into
one place.

- New render/bundle.py: write_bundle_envelope(bundle, root, *, relational,
  tasks, dataset_card, feature_specs, generation_scheme, redacted, motif_family,
  relational_snapshot_safe, structural_redactions, extra_fields,
  generation_timestamp) + a TaskExport(manifest, frame) record.  It owns the I/O
  and the file-ordering the manifest's hashing depends on; it contains no
  scheme-specific logic.
- LeadScoringScheme.write_bundle and LifecycleScheme.write_bundle now compute
  only their scheme-specific content (which relational frames + snapshot-safe
  projection, which task exports, which card, which visible features, which
  manifest params) and delegate to the envelope.
- The dataset card needs table row counts; each scheme computes
  {name: len(df)} on its final frames (identical to write_relational_tables'
  counts — redaction drops columns, not rows) and renders the card before
  delegating.

Byte-identity: all FOUR bundles — lead_scoring + lifecycle, each in
research_instructor + student_public — verified byte-identical against a
pre-refactor SHA-256 reference of every file.  The orchestration moved; no
output changed.

Tests: new tests/render/test_bundle_envelope.py pins the envelope contract
(all artefacts written, manifest passthrough fields, multiple task dirs,
no metadata in snapshot-safe mode) on a minimal bundle.  Full suite 1877
passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(lead_scoring): clarify the tasks wrapper is off the write path [LTV-Pn.4d]

Self-review of the orchestrator extraction: lead-scoring's write_bundle now
writes tasks through the shared bundle envelope, so the
schemes.lead_scoring.render.tasks wrapper is no longer on the write-bundle path
— its docstring still claimed it kept the default "so existing call sites are
unchanged," which is now stale.  The wrapper is not dead: it remains a tested
convenience (its 7 test consumers rely on the CONVERTED_WITHIN_90_DAYS default),
so removing it would only churn tests to pass an explicit task and reverse the
deliberate Pn.3 layout.  Docstring updated to state it's the scheme's
default-task helper, off the write path since Pn.4d.

No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

foundation type: ci CI/CD pipeline changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants