PR 6.3: break-me guide + issue templates + v2 decision log by shaypal5 · Pull Request #75 · leadforge-dev/leadforge

shaypal5 · 2026-05-07T20:58:22Z

Summary

Phase 6 closer. Five new artefacts plus three follow-up syncs:

docs/release/break_me_guide.md — adversarial playbook.
Meta-recipe (read the dictionary → ablate, don't just probe → check
the time window → treat the train/test split as untrusted) plus
9 patterns grouped by category (leakage / split discipline /
metric and ranking traps / robustness and realism). Each pattern
carries a how to detect on any dataset recipe and a worked
example pointer back into the v1 bundle (notebook section,
validation_report.json JSON path, or feature_dictionary.csv
field). Delivers the three explicit promises notebook 04 §10 made
(target-encoding leakage, train-test contamination via account_id,
cohort-by-segment) plus six others.
docs/release/v2_decision_log.md — empty stub with the schema
documented in the preamble. Seven columns
(received_at / source / topic / severity / verdict /
next_step / link) and verdict vocabulary
(accepted-for-v2 / deferred / wont-fix / needs-investigation)
defined upfront so the first real entry doesn't trigger a refactor.
.github/ISSUE_TEMPLATE/dataset_breakage_report.yml — GitHub
Issue Forms. Bundle-level findings (leakage, contamination, ranking
surprises, notebook failures). Fields: tier dropdown, seed input,
bundle hash, suggested triage label, severity, summary, repro,
expected-vs-actual citing JSON paths, environment, two confirmation
checkboxes.
.github/ISSUE_TEMPLATE/realism_feedback.yml — GitHub Issue
Forms. Distributional / realism critiques. Fields: aspect, tier(s)
affected, domain experience one-liner (helps weight findings), claim,
data observation with pandas-snippet placeholder, suggested fix,
severity, two confirmations.
release/README.md — "Maintenance, adversarial framing, license"
section rewritten. Dead (PR 6.3) forward-pointer replaced with
real Markdown links to the break-me guide, both issue templates, and
the v2 decision log. _release_common.py's existing
](../foo) → GitHub-blob-URL rewriter handles the Kaggle/HF
rendering automatically; the regenerated
release/kaggle/dataset-metadata.json and
release/huggingface/README.md are bundled in this commit and
pass the audit-artifact-sync tests.

Notebook 03 §7 and notebook 04 §10 forward-pointers upgraded from
plain docs/release/break_me_guide.md text to Markdown links
pointing at the GitHub blob URL — a relative path would break on
Kaggle/HF where notebooks ship without the docs/ tree; the blob
URL works in both contexts.

Hostile-reviewer self-review pass

Caught two factual hallucinations in the break-me guide before
shipping: claimed "15 industries" for industry (actually 4:
logistics / healthcare_non_clinical / manufacturing /
professional_services) and used loose segment-column names
("employee tier", "ARR band") instead of the actual columns
(employee_band, estimated_revenue_band). Both fixed.

Out of scope (follow-ups)

The triage labels referenced in the break-me guide and issue
templates (needs-triage, dataset: leadforge-lead-scoring-v1,
realism, critical-leakage, etc.) are not yet defined in the
repo's GitHub label catalogue. Issue Forms fail-open on missing
labels — the form still renders, the label just isn't applied — so
this isn't a merge blocker, but creating these labels is the natural
follow-up admin task before the dataset goes public.
Phase 7 (LLM critique + publish): out of scope, lands next.

Test plan

🤖 Generated with Claude Code

Phase 6 closer. Five new artefacts plus three follow-up syncs: - docs/release/break_me_guide.md — adversarial playbook. Meta-recipe (read the dictionary → ablate, don't just probe → check the time window → treat the train/test split as untrusted) plus 9 patterns grouped by category. Each pattern carries a "how to detect on any dataset" recipe and a worked-example pointer back into the v1 bundle (notebook §, validation_report JSON path, or feature_dictionary.csv field). Delivers the three explicit promises notebook 04 §10 made (target-encoding leakage, train-test contamination via account_id, cohort-by-segment) plus six others (naming smells, standalone-AUC vs tree-ablation gap, time-window violations, value-aware ranking inversions, threshold-vs-rank ties, calibration drift across segments). - docs/release/v2_decision_log.md — empty stub with the schema (7 columns: received_at / source / topic / severity / verdict / next_step / link) and verdict vocabulary documented in the preamble. - .github/ISSUE_TEMPLATE/dataset_breakage_report.yml — GitHub Issue Forms. Fields: tier, seed, bundle hash, suggested triage label, severity, summary, repro, expected-vs-actual, environment, two confirmation checkboxes. - .github/ISSUE_TEMPLATE/realism_feedback.yml — GitHub Issue Forms. Fields: aspect, tier(s)-affected, domain experience, claim, data observation, suggested fix, severity, two confirmations. - release/README.md — "Maintenance, adversarial framing, license" section now links to the break-me guide, both issue templates, and the v2 decision log. _release_common.py's existing relative-link rewriter handles the Kaggle/HF rendering automatically; the regenerated release/kaggle/dataset-metadata.json and release/huggingface/README.md sync are bundled in this commit and pass the audit-artifact-sync tests. Notebook 03 §7 and notebook 04 §10 forward-pointers upgraded from plain `docs/release/break_me_guide.md` text to Markdown links pointing at the GitHub blob URL — relative path would break on Kaggle/HF where notebooks ship without the docs/ tree, the blob URL works in both contexts. Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy clean; leakage probes 0/3 on every public tier; hash determinism PASS 67/67; validate_release_candidate --no-rebuild exits 0; BUNDLE_SCHEMA_VERSION unchanged at 5 (this PR is documentation-only). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds Phase 6.3 “adversarial framing” artifacts to the release docs/tooling so external reviewers can report dataset breakage/realism issues consistently, and so v2 dispositions can be tracked over time.

Changes:

Introduces a break-me guide (9 adversarial patterns) and an empty-but-schematized v2 decision log.
Adds two GitHub Issue Forms (breakage + realism feedback) to standardize incoming reports.
Updates release/readme + notebook forward-pointers and re-syncs Kaggle/HF rendered artifacts to use GitHub blob URLs.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`scripts/build_release_notebook_04.py`	Updates notebook 04 builder text to link to the break-me guide via GitHub blob URL.
`scripts/build_release_notebook_03.py`	Updates notebook 03 builder text to link to the break-me guide via GitHub blob URL.
`release/README.md`	Rewrites “Maintenance…” section to link to break-me guide, issue templates, and v2 decision log.
`release/notebooks/04_lift_calibration_value_ranking.ipynb`	Updates the rendered notebook markdown to include the break-me guide link.
`release/notebooks/03_leakage_and_time_windows.ipynb`	Updates the rendered notebook markdown to include the break-me guide link.
`release/kaggle/dataset-metadata.json`	Regenerates Kaggle metadata description with updated “Maintenance…” section links.
`release/huggingface/README.md`	Regenerates HF README with updated “Maintenance…” section links.
`docs/release/v2_decision_log.md`	Adds an empty decision log stub with schema/vocabulary documented.
`docs/release/break_me_guide.md`	Adds the adversarial “break-me” playbook with patterns + worked pointers.
`.github/ISSUE_TEMPLATE/realism_feedback.yml`	Adds a GitHub Issue Form for distributional/realism critiques.
`.github/ISSUE_TEMPLATE/dataset_breakage_report.yml`	Adds a GitHub Issue Form for bundle-level breakage findings.
`.agent-plan.md`	Marks Phase 6.3 deliverables as complete and documents what landed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Hostile-reviewer pass on the PR 6.3 diff caught seven issues. Fixes: 1. **post_snapshot_aggregates baseline misidentified.** The break-me guide patterns 2/3 called the validator's baseline "a fitted LR"; the actual implementation in leadforge/validation/release_quality.py is HistGBM. Fixed, with a link to the source so the model identity is verifiable. 2. **manifest.json `bundle_hash` is fictitious.** The real manifest has per-file sha256s nested under `tasks.<task>.test_sha256` and `tables.<name>.sha256`, no top-level `bundle_hash`. Fixed both the break-me guide ("What to do when you find one" §1) and the breakage issue template's bundle-hash field; placeholder now carries a real hash from the as-shipped intermediate bundle. 3. **Fabricated "70+ leads" in pattern 8.** The slate-inflation number was made up — the as-shipped intermediate bundle's actually_above readout matches capacity. Reframed as a defensive instrument, with a more honest LR-vs-GBM tie-rate observation. 4. **Misleading "instructor" tier dropdown.** Only `intermediate_instructor` ships in v1; the unqualified label suggested multiple instructor companions. Renamed to `intermediate_instructor`. 5. **Underspecified "~0.55 AUC".** Pinned to "intermediate, median across seeds 42–46" with the cross-tier range (0.52–0.61) so readers know the variance. 6. **Speculative pattern-9 calibration claim.** "Consistently over-predicts on small-revenue accounts" was hypothetical, not observed. Reframed as "in principle… whether v1 actually exhibits such drift is an open question". 7. **Issue templates referenced labels that didn't exist.** Created `dataset: leadforge-lead-scoring-v1`, `needs-triage`, `realism` so the templates' `labels:` fields actually fire on issue submission. (Triage labels in the dropdowns — critical-leakage etc. — stay as free-text choices the maintainer applies post-triage.) Net: 1260/1260 tests still pass; ruff + mypy clean; YAML still validates; no new bundle regeneration. BUNDLE_SCHEMA_VERSION unchanged at 5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Hostile-reviewer pass on round 1 caught 8 more issues; one of them (SOURCE_TREE_BLOCK staleness) was an issue I'd already deferred once, which I'm calling a cop-out and fixing now. Also folds in a planning update for the v1 release roadmap: insert a new PR before the publish PR that lets the maintainer render local mock-page previews of the Kaggle and HuggingFace dataset pages from the *exact* upload artefacts before any platform upload — staging gate so styling, link, embed, and YAML-rendering issues are caught before they hit the live page where rollback is expensive. ### Round-2 fixes (8) 1. **Pattern 6 metric mismatch.** Cohort shift uses HistGBM (release_quality.py:463/514), so the cross-seed band to compare against is `gbm_auc.spread`, not `lr_auc.spread`. Fixed. 2. **Pattern 5 numerically anchored.** "Two leads in the same account can land in train and test" was vague; the empirical reality is **518 of 557 test accounts (93 %) appear in train** — and the same numbers hold across all three tiers because the splitter is `lead_id`-keyed. Added a copy-pasteable pandas snippet so reporters can verify in 4 lines. 3. **Pattern 4 self-contradiction.** Original text said "we deliberately don't show that" while showing the leakage one-liner inline. Reworded so the contrast is clear: notebook 02 doesn't show the leakage variant; the guide does, so reviewers recognise it in code. 4. **Cross-platform hash command.** Issue template said `sha256sum tasks/.../test.parquet` — that's GNU coreutils, not present on macOS. Added `shasum -a 256` (macOS) and a portable Python one-liner; led with "easiest source: copy from manifest.json". 5. **Relative links within docs/.** Round 1 used absolute GitHub URLs for every in-repo link in the break-me guide, breaking local-clone reading (clicks went to GitHub web instead of opening the local file). Switched to relative paths within the docs tree; kept absolute URLs only where they're load-bearing (notebook forward-pointers ship to Kaggle/HF without `docs/`; issue-template Markdown bodies render via GitHub Issues; README's `](../foo)` gets rewritten by `_release_common.py` for Kaggle/HF). 6. **`expected_acv_capture_at_k` JSON path lands on a dict.** `$.tiers.<tier>.per_seed[*].expected_acv_capture_at_k` resolves to `{"50": …, "100": …}` keyed by string K. Pinned the path to `…expected_acv_capture_at_k.50` so a reader following the citation hits a scalar. 7. **Pattern 8 GBM-tie speculation.** "On a coarse-grained GBM probability output ties are routine" was over-stated — HistGBM's `predict_proba` is continuous via leaf-score sums. Dropped the LR-vs-GBM theorising; kept the empirical observation that on the as-shipped intermediate bundle the realised count matches capacity. 8. **SOURCE_TREE_BLOCK staleness.** Both `release/README.md` and `_release_common.py`'s `SOURCE_TREE_BLOCK` constant listed `notebooks/01_baseline_lead_scoring.ipynb` as the only notebook; four ship now. Updated to a one-liner that names all four. Audit-sync test passes. ### Planning update — local mock-page preview PR `docs/release/v1_release_roadmap.md` and `.agent-plan.md` updated: - Phase 7 grows from 2 PRs to 3. - New **PR 7.2** sits between the LLM critique PR (now 7.1) and the publish PR (now 7.3): `scripts/preview_kaggle_page.py` and `scripts/preview_hf_page.py` render offline HTML mocks from the *exact* upload artefacts (`release/kaggle/dataset-metadata.json` + inlined README + cover image; `release/huggingface/README.md` with frontmatter + body), serve over `localhost`, accept `--variant=public|instructor` (HF) and `--port` / `--open-browser` flags. Tests cover required-field presence, link resolution, schema-column listing, configs-block round-trip. - The publish PR's runbook (`v1_release_notes.md`) cites the preview commands as required pre-flight. - Phase summary table + total PR count (14 → 15) updated for consistency. Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy clean; leakage probes 0/3 on every tier; hash determinism PASS 67/67; `validate_release_candidate --no-rebuild` exits 0; BUNDLE_SCHEMA_VERSION unchanged at 5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three Copilot comments on the issue templates, all accepted: 1. **COPILOT-1** (dataset_breakage_report.yml:9 → realism template link). The relative `realism_feedback.yml` link doesn't resolve when the description renders in the GitHub Issue Forms chooser. Replaced with the "open another template directly" URL form (`/issues/new?template=realism_feedback.yml`), which lands a misrouted reporter on the realism *form*, not on raw YAML. 2. **COPILOT-2** (dataset_breakage_report.yml:139 → `render: text`). `render: text` isn't a documented value for Issue Forms textareas (the supported set is language identifiers from GitHub's syntax highlighter, plus `markdown`). The intent here is plain-text formatting, which is the default when `render` is omitted. Dropped the `render` line entirely. 3. **COPILOT-3** (realism_feedback.yml:58 → placeholder mentions `retail`). Same hallucination class as the round-1 "15 industries" fix. The actual industries are logistics, healthcare_non_clinical, manufacturing, professional_services. Rewrote the placeholder to use the real names so a reporter glancing at the example doesn't think `retail` is a category they should be filing about. Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy clean; YAML validates; BUNDLE_SCHEMA_VERSION unchanged at 5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-07T21:34:39Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #75 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25523305756 attempt 1
Comment timestamp: 2026-05-07T21:33:49.952451+00:00
PR head commit: 4f70a16527fd3f9f0864584a2e69214912eaecff

Copilot AI review requested due to automatic review settings May 7, 2026 20:58

shaypal5 added this to the dataset: leadforge-lead-scoring-v1 milestone May 7, 2026

shaypal5 added type: docs Documentation or narrative changes layer: recipes recipes/ recipe assets and registry labels May 7, 2026

Copilot started reviewing on behalf of shaypal5 May 7, 2026 20:58 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread .github/ISSUE_TEMPLATE/dataset_breakage_report.yml Outdated

Comment thread .github/ISSUE_TEMPLATE/dataset_breakage_report.yml Outdated

Comment thread .github/ISSUE_TEMPLATE/realism_feedback.yml Outdated

This comment has been minimized.

Sign in to view

shaypal5 merged commit e124fda into main May 7, 2026
9 checks passed

shaypal5 deleted the docs/break-me-guide-and-issue-templates branch May 7, 2026 21:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR 6.3: break-me guide + issue templates + v2 decision log#75

PR 6.3: break-me guide + issue templates + v2 decision log#75
shaypal5 merged 4 commits into
mainfrom
docs/break-me-guide-and-issue-templates

shaypal5 commented May 7, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 7, 2026

Summary

Hostile-reviewer self-review pass

Out of scope (follow-ups)

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants