Skip to content

PR 6.3: break-me guide + issue templates + v2 decision log#75

Merged
shaypal5 merged 4 commits into
mainfrom
docs/break-me-guide-and-issue-templates
May 7, 2026
Merged

PR 6.3: break-me guide + issue templates + v2 decision log#75
shaypal5 merged 4 commits into
mainfrom
docs/break-me-guide-and-issue-templates

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 6 closer. Five new artefacts plus three follow-up syncs:

  • docs/release/break_me_guide.md — adversarial playbook.
    Meta-recipe (read the dictionary → ablate, don't just probe → check
    the time window → treat the train/test split as untrusted) plus
    9 patterns grouped by category (leakage / split discipline /
    metric and ranking traps / robustness and realism). Each pattern
    carries a how to detect on any dataset recipe and a worked
    example
    pointer back into the v1 bundle (notebook section,
    validation_report.json JSON path, or feature_dictionary.csv
    field). Delivers the three explicit promises notebook 04 §10 made
    (target-encoding leakage, train-test contamination via account_id,
    cohort-by-segment) plus six others.

  • docs/release/v2_decision_log.md — empty stub with the schema
    documented in the preamble. Seven columns
    (received_at / source / topic / severity / verdict /
    next_step / link) and verdict vocabulary
    (accepted-for-v2 / deferred / wont-fix / needs-investigation)
    defined upfront so the first real entry doesn't trigger a refactor.

  • .github/ISSUE_TEMPLATE/dataset_breakage_report.yml — GitHub
    Issue Forms. Bundle-level findings (leakage, contamination, ranking
    surprises, notebook failures). Fields: tier dropdown, seed input,
    bundle hash, suggested triage label, severity, summary, repro,
    expected-vs-actual citing JSON paths, environment, two confirmation
    checkboxes.

  • .github/ISSUE_TEMPLATE/realism_feedback.yml — GitHub Issue
    Forms. Distributional / realism critiques. Fields: aspect, tier(s)
    affected, domain experience one-liner (helps weight findings), claim,
    data observation with pandas-snippet placeholder, suggested fix,
    severity, two confirmations.

  • release/README.md — "Maintenance, adversarial framing, license"
    section rewritten. Dead (PR 6.3) forward-pointer replaced with
    real Markdown links to the break-me guide, both issue templates, and
    the v2 decision log. _release_common.py's existing
    ](../foo) → GitHub-blob-URL rewriter handles the Kaggle/HF
    rendering automatically; the regenerated
    release/kaggle/dataset-metadata.json and
    release/huggingface/README.md are bundled in this commit and
    pass the audit-artifact-sync tests.

Notebook 03 §7 and notebook 04 §10 forward-pointers upgraded from
plain docs/release/break_me_guide.md text to Markdown links
pointing at the GitHub blob URL — a relative path would break on
Kaggle/HF where notebooks ship without the docs/ tree; the blob
URL works in both contexts.

Hostile-reviewer self-review pass

  • Caught two factual hallucinations in the break-me guide before
    shipping: claimed "15 industries" for industry (actually 4:
    logistics / healthcare_non_clinical / manufacturing /
    professional_services) and used loose segment-column names
    ("employee tier", "ARR band") instead of the actual columns
    (employee_band, estimated_revenue_band). Both fixed.

Out of scope (follow-ups)

  • The triage labels referenced in the break-me guide and issue
    templates (needs-triage, dataset: leadforge-lead-scoring-v1,
    realism, critical-leakage, etc.) are not yet defined in the
    repo's GitHub label catalogue. Issue Forms fail-open on missing
    labels — the form still renders, the label just isn't applied — so
    this isn't a merge blocker, but creating these labels is the natural
    follow-up admin task before the dataset goes public.
  • Phase 7 (LLM critique + publish): out of scope, lands next.

Test plan

  • ruff check . / ruff format --check . — All checks passed
  • mypy leadforge/ — Success: no issues found in 82 source files
  • pytest -q — 1260 passed, 5 skipped (publish-extra-gated)
  • python scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65 — exits 0; 0/3 path prediction rates on every public tier
  • python scripts/verify_hash_determinism.py — PASS 67/67 files identical
  • python scripts/validate_release_candidate.py --no-rebuild — exits 0
  • BUNDLE_SCHEMA_VERSION unchanged at 5 (documentation-only PR)
  • Notebook builders re-run; byte-identical to committed .ipynb files; tests/scripts/test_release_notebook_builders.py + tests/release/notebooks/ green
  • Issue template YAML validates (yaml.safe_load round-trip)
  • Markdown links in the new guide resolve (file paths exist; GitHub blob URLs use the _release_common.py GITHUB_BLOB_BASE constant)

🤖 Generated with Claude Code

Phase 6 closer.  Five new artefacts plus three follow-up syncs:

- docs/release/break_me_guide.md — adversarial playbook.  Meta-recipe
  (read the dictionary → ablate, don't just probe → check the time
  window → treat the train/test split as untrusted) plus 9 patterns
  grouped by category.  Each pattern carries a "how to detect on any
  dataset" recipe and a worked-example pointer back into the v1 bundle
  (notebook §, validation_report JSON path, or feature_dictionary.csv
  field).  Delivers the three explicit promises notebook 04 §10 made
  (target-encoding leakage, train-test contamination via account_id,
  cohort-by-segment) plus six others (naming smells, standalone-AUC vs
  tree-ablation gap, time-window violations, value-aware ranking
  inversions, threshold-vs-rank ties, calibration drift across
  segments).

- docs/release/v2_decision_log.md — empty stub with the schema (7
  columns: received_at / source / topic / severity / verdict /
  next_step / link) and verdict vocabulary documented in the preamble.

- .github/ISSUE_TEMPLATE/dataset_breakage_report.yml — GitHub Issue
  Forms.  Fields: tier, seed, bundle hash, suggested triage label,
  severity, summary, repro, expected-vs-actual, environment, two
  confirmation checkboxes.

- .github/ISSUE_TEMPLATE/realism_feedback.yml — GitHub Issue Forms.
  Fields: aspect, tier(s)-affected, domain experience, claim, data
  observation, suggested fix, severity, two confirmations.

- release/README.md — "Maintenance, adversarial framing, license"
  section now links to the break-me guide, both issue templates, and
  the v2 decision log.  _release_common.py's existing relative-link
  rewriter handles the Kaggle/HF rendering automatically; the
  regenerated release/kaggle/dataset-metadata.json and
  release/huggingface/README.md sync are bundled in this commit and
  pass the audit-artifact-sync tests.

Notebook 03 §7 and notebook 04 §10 forward-pointers upgraded from
plain `docs/release/break_me_guide.md` text to Markdown links pointing
at the GitHub blob URL — relative path would break on Kaggle/HF where
notebooks ship without the docs/ tree, the blob URL works in both
contexts.

Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy
clean; leakage probes 0/3 on every public tier; hash determinism PASS
67/67; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5 (this PR is documentation-only).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 20:58
@shaypal5 shaypal5 added type: docs Documentation or narrative changes layer: recipes recipes/ recipe assets and registry labels May 7, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Phase 6.3 “adversarial framing” artifacts to the release docs/tooling so external reviewers can report dataset breakage/realism issues consistently, and so v2 dispositions can be tracked over time.

Changes:

  • Introduces a break-me guide (9 adversarial patterns) and an empty-but-schematized v2 decision log.
  • Adds two GitHub Issue Forms (breakage + realism feedback) to standardize incoming reports.
  • Updates release/readme + notebook forward-pointers and re-syncs Kaggle/HF rendered artifacts to use GitHub blob URLs.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/build_release_notebook_04.py Updates notebook 04 builder text to link to the break-me guide via GitHub blob URL.
scripts/build_release_notebook_03.py Updates notebook 03 builder text to link to the break-me guide via GitHub blob URL.
release/README.md Rewrites “Maintenance…” section to link to break-me guide, issue templates, and v2 decision log.
release/notebooks/04_lift_calibration_value_ranking.ipynb Updates the rendered notebook markdown to include the break-me guide link.
release/notebooks/03_leakage_and_time_windows.ipynb Updates the rendered notebook markdown to include the break-me guide link.
release/kaggle/dataset-metadata.json Regenerates Kaggle metadata description with updated “Maintenance…” section links.
release/huggingface/README.md Regenerates HF README with updated “Maintenance…” section links.
docs/release/v2_decision_log.md Adds an empty decision log stub with schema/vocabulary documented.
docs/release/break_me_guide.md Adds the adversarial “break-me” playbook with patterns + worked pointers.
.github/ISSUE_TEMPLATE/realism_feedback.yml Adds a GitHub Issue Form for distributional/realism critiques.
.github/ISSUE_TEMPLATE/dataset_breakage_report.yml Adds a GitHub Issue Form for bundle-level breakage findings.
.agent-plan.md Marks Phase 6.3 deliverables as complete and documents what landed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/ISSUE_TEMPLATE/dataset_breakage_report.yml Outdated
Comment thread .github/ISSUE_TEMPLATE/dataset_breakage_report.yml Outdated
Comment thread .github/ISSUE_TEMPLATE/realism_feedback.yml Outdated
Hostile-reviewer pass on the PR 6.3 diff caught seven issues. Fixes:

1. **post_snapshot_aggregates baseline misidentified.** The break-me
   guide patterns 2/3 called the validator's baseline "a fitted LR";
   the actual implementation in
   leadforge/validation/release_quality.py is HistGBM. Fixed, with a
   link to the source so the model identity is verifiable.

2. **manifest.json `bundle_hash` is fictitious.** The real manifest
   has per-file sha256s nested under `tasks.<task>.test_sha256` and
   `tables.<name>.sha256`, no top-level `bundle_hash`. Fixed both
   the break-me guide ("What to do when you find one" §1) and the
   breakage issue template's bundle-hash field; placeholder now
   carries a real hash from the as-shipped intermediate bundle.

3. **Fabricated "70+ leads" in pattern 8.** The slate-inflation
   number was made up — the as-shipped intermediate bundle's
   actually_above readout matches capacity. Reframed as a defensive
   instrument, with a more honest LR-vs-GBM tie-rate observation.

4. **Misleading "instructor" tier dropdown.** Only
   `intermediate_instructor` ships in v1; the unqualified label
   suggested multiple instructor companions. Renamed to
   `intermediate_instructor`.

5. **Underspecified "~0.55 AUC".** Pinned to "intermediate, median
   across seeds 42–46" with the cross-tier range (0.52–0.61) so
   readers know the variance.

6. **Speculative pattern-9 calibration claim.** "Consistently
   over-predicts on small-revenue accounts" was hypothetical, not
   observed. Reframed as "in principle… whether v1 actually
   exhibits such drift is an open question".

7. **Issue templates referenced labels that didn't exist.** Created
   `dataset: leadforge-lead-scoring-v1`, `needs-triage`, `realism`
   so the templates' `labels:` fields actually fire on issue
   submission. (Triage labels in the dropdowns — critical-leakage
   etc. — stay as free-text choices the maintainer applies
   post-triage.)

Net: 1260/1260 tests still pass; ruff + mypy clean; YAML still
validates; no new bundle regeneration. BUNDLE_SCHEMA_VERSION
unchanged at 5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Hostile-reviewer pass on round 1 caught 8 more issues; one of them
(SOURCE_TREE_BLOCK staleness) was an issue I'd already deferred once,
which I'm calling a cop-out and fixing now. Also folds in a planning
update for the v1 release roadmap: insert a new PR before the publish
PR that lets the maintainer render local mock-page previews of the
Kaggle and HuggingFace dataset pages from the *exact* upload artefacts
before any platform upload — staging gate so styling, link, embed,
and YAML-rendering issues are caught before they hit the live page
where rollback is expensive.

### Round-2 fixes (8)

1. **Pattern 6 metric mismatch.** Cohort shift uses HistGBM (release_quality.py:463/514), so the cross-seed band to compare against is `gbm_auc.spread`, not `lr_auc.spread`. Fixed.

2. **Pattern 5 numerically anchored.** "Two leads in the same account can land in train and test" was vague; the empirical reality is **518 of 557 test accounts (93 %) appear in train** — and the same numbers hold across all three tiers because the splitter is `lead_id`-keyed. Added a copy-pasteable pandas snippet so reporters can verify in 4 lines.

3. **Pattern 4 self-contradiction.** Original text said "we deliberately don't show that" while showing the leakage one-liner inline. Reworded so the contrast is clear: notebook 02 doesn't show the leakage variant; the guide does, so reviewers recognise it in code.

4. **Cross-platform hash command.** Issue template said `sha256sum tasks/.../test.parquet` — that's GNU coreutils, not present on macOS. Added `shasum -a 256` (macOS) and a portable Python one-liner; led with "easiest source: copy from manifest.json".

5. **Relative links within docs/.** Round 1 used absolute GitHub URLs for every in-repo link in the break-me guide, breaking local-clone reading (clicks went to GitHub web instead of opening the local file). Switched to relative paths within the docs tree; kept absolute URLs only where they're load-bearing (notebook forward-pointers ship to Kaggle/HF without `docs/`; issue-template Markdown bodies render via GitHub Issues; README's `](../foo)` gets rewritten by `_release_common.py` for Kaggle/HF).

6. **`expected_acv_capture_at_k` JSON path lands on a dict.** `$.tiers.<tier>.per_seed[*].expected_acv_capture_at_k` resolves to `{"50": …, "100": …}` keyed by string K. Pinned the path to `…expected_acv_capture_at_k.50` so a reader following the citation hits a scalar.

7. **Pattern 8 GBM-tie speculation.** "On a coarse-grained GBM probability output ties are routine" was over-stated — HistGBM's `predict_proba` is continuous via leaf-score sums. Dropped the LR-vs-GBM theorising; kept the empirical observation that on the as-shipped intermediate bundle the realised count matches capacity.

8. **SOURCE_TREE_BLOCK staleness.** Both `release/README.md` and `_release_common.py`'s `SOURCE_TREE_BLOCK` constant listed `notebooks/01_baseline_lead_scoring.ipynb` as the only notebook; four ship now. Updated to a one-liner that names all four. Audit-sync test passes.

### Planning update — local mock-page preview PR

`docs/release/v1_release_roadmap.md` and `.agent-plan.md` updated:

- Phase 7 grows from 2 PRs to 3.
- New **PR 7.2** sits between the LLM critique PR (now 7.1) and the publish PR (now 7.3): `scripts/preview_kaggle_page.py` and `scripts/preview_hf_page.py` render offline HTML mocks from the *exact* upload artefacts (`release/kaggle/dataset-metadata.json` + inlined README + cover image; `release/huggingface/README.md` with frontmatter + body), serve over `localhost`, accept `--variant=public|instructor` (HF) and `--port` / `--open-browser` flags. Tests cover required-field presence, link resolution, schema-column listing, configs-block round-trip.
- The publish PR's runbook (`v1_release_notes.md`) cites the preview commands as required pre-flight.
- Phase summary table + total PR count (14 → 15) updated for consistency.

Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy
clean; leakage probes 0/3 on every tier; hash determinism PASS 67/67;
`validate_release_candidate --no-rebuild` exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Three Copilot comments on the issue templates, all accepted:

1. **COPILOT-1** (dataset_breakage_report.yml:9 → realism template
   link). The relative `realism_feedback.yml` link doesn't resolve
   when the description renders in the GitHub Issue Forms chooser.
   Replaced with the "open another template directly" URL form
   (`/issues/new?template=realism_feedback.yml`), which lands a
   misrouted reporter on the realism *form*, not on raw YAML.

2. **COPILOT-2** (dataset_breakage_report.yml:139 → `render: text`).
   `render: text` isn't a documented value for Issue Forms textareas
   (the supported set is language identifiers from GitHub's syntax
   highlighter, plus `markdown`). The intent here is plain-text
   formatting, which is the default when `render` is omitted. Dropped
   the `render` line entirely.

3. **COPILOT-3** (realism_feedback.yml:58 → placeholder mentions
   `retail`). Same hallucination class as the round-1 "15 industries"
   fix. The actual industries are logistics, healthcare_non_clinical,
   manufacturing, professional_services. Rewrote the placeholder to
   use the real names so a reporter glancing at the example doesn't
   think `retail` is a category they should be filing about.

Net: 1260/1260 tests pass + 5 publish-extra-gated skips; ruff + mypy
clean; YAML validates; BUNDLE_SCHEMA_VERSION unchanged at 5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #75 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25523305756 attempt 1
Comment timestamp: 2026-05-07T21:33:49.952451+00:00
PR head commit: 4f70a16527fd3f9f0864584a2e69214912eaecff

@shaypal5 shaypal5 merged commit e124fda into main May 7, 2026
9 checks passed
@shaypal5 shaypal5 deleted the docs/break-me-guide-and-issue-templates branch May 7, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: recipes recipes/ recipe assets and registry type: docs Documentation or narrative changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants