refactor: consolidate v4 planning docs + spike experiment by shaypal5 · Pull Request #20 · leadforge-dev/leadforge

shaypal5 · 2026-04-29T13:06:00Z

Summary

Follow-up to #19. Self-review identified 8 issues with the v4 planning PR — this addresses all of them:

Spike experiment (scripts/spike_category_signal.py): validated that CategoricalInfluence scaling was wrong (not wired into simulation). Correct approach: correlate observables with latent traits in population.py. Scale 1.8 → AUC 0.694.
Doc consolidation: 5 overlapping docs → 2 (design.md + validation_spec.md). Single source of truth for shared constants.
Known limitations section: explicit is_sql invariant workaround + role_function gap.
Tuning protocol: decision table for when validation checks fail.
Trimmed AGENTS.md: v4-specific content replaced with pointer to docs/v4/.
ACV mapping: explicit band→midpoint table + null-band behavior.
Merged M1+M2: engine + build pipeline as single milestone (can't validate independently).

Test plan

590 tests passing
Spike experiment runs and produces expected results
All file references in CLAUDE.md, AGENTS.md, .agent-plan.md updated
No stale references to deleted files

🤖 Generated with Claude Code

Self-review of PR #19 identified 8 issues. This commit addresses all of them: 1. Ran spike experiment validating category signal approach — the spec's CategoricalInfluence scaling was wrong (not wired into simulation). Correct approach: correlate observables with latent traits in population.py. Scale 1.8 gives AUC 0.694, within [0.65, 0.90] target. 2. Consolidated 5 overlapping spec docs into 2: design.md (single source of truth for requirements, contract, engine changes, plan) + validation_spec.md. 3. Added "Known limitations" section (is_sql invariant, role_function gap). 4. Added missingness rationale (detectability at n=1000, not arbitrary). 5. Added tuning protocol decision table for when validation checks fail. 6. Trimmed AGENTS.md to durable conventions + pointer to docs/v4/. 7. Added explicit ACV band→midpoint mapping table and null-band behavior. 8. Merged v4-M1 and v4-M2 into single milestone (engine + build pipeline can't be validated independently). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Aligns the local pre-commit hook with CI's ruff version (unpinned, currently 0.11.x). The old v0.4.5 hook accepted formatting that the CI ruff rejects, allowing format violations to slip through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-29T13:10:52Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#20. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25110782532 attempt 1
Comment timestamp: 2026-04-29T13:10:00.453021+00:00
PR head commit: 506ea24c83245374160bdcafd3f46e5dc1afb381

Copilot

Pull request overview

This PR follows up on the v4 planning work by consolidating multiple overlapping v4 spec documents into a single design doc + a companion validation spec, and adds a spike script to empirically validate the proposed “category signal” approach for v4.

Changes:

Added docs/v4/design.md as the consolidated single source of truth (requirements + contract + engine changes + implementation plan) and added a companion note to validation_spec.md.
Added scripts/spike_category_signal.py to measure category spread + AUC under different population-level correlation scales.
Removed superseded v4 planning docs and updated repo pointers (CLAUDE.md, AGENTS.md, .agent-plan.md) to reference the consolidated docs.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`scripts/spike_category_signal.py`	New spike experiment script for measuring category→conversion signal and LR AUC under different correlation scales.
`docs/v4/design.md`	New consolidated v4 design doc (requirements/contract/engine plan) intended as the single source of truth.
`docs/v4/validation_spec.md`	Adds a pointer clarifying this doc is the automated-check companion to `design.md`.
`docs/v4/planning_pr_review.md`	Adds the self-review writeup and treatment plan for PR #19 issues.
`CLAUDE.md`	Updates v4 doc references to point to `docs/v4/design.md` + related consolidated artifacts.
`AGENTS.md`	Removes v4-specific step-by-step guide and replaces with pointers to `docs/v4/`.
`.agent-plan.md`	Updates v4 milestone/checklist references to match the consolidated doc structure and spike script.
`docs/v4/lead_scoring_v4_requirements.md`	Deleted (content consolidated into `docs/v4/design.md`).
`docs/v4/dataset_contract.md`	Deleted (content consolidated into `docs/v4/design.md`).
`docs/v4/engine_changes_spec.md`	Deleted (content consolidated into `docs/v4/design.md`).
`docs/v4/implementation_plan.md`	Deleted (content consolidated into `docs/v4/design.md`).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-29T13:12:33Z

+Include an `expected_acv` numeric feature so students can compute `P(conversion) × expected_acv` and practice value-aware ranking.
+
+**ACV derivation (single source of truth):**
+
+| Condition | Value |
+|---|---|
+| Opportunity created by snapshot day | Opportunity's `estimated_acv` |
+| No opportunity, `estimated_revenue_band` known | Band midpoint (see table below) |
+| No opportunity, band unknown | NaN |
+
+**Revenue band → ACV midpoint mapping:**
+
+| Band | Midpoint ($k) |
+|---|---|
+| $1M–$10M | 25 |
+| $10M–$50M | 55 |
+| $50M–$200M | 85 |
+| $200M+ | 140 |
+
+These midpoints are derived from the engine's `_EMPLOYEE_ACV_RANGES` in `simulation/engine.py`, which maps employee bands to ACV ranges. Since the dataset exposes `estimated_revenue_band` (not employee band), the midpoints approximate the overlap between revenue bands and the engine's ACV sampling.


The ACV midpoint mapping table labels values as "Midpoint ($k)", but the engine’s OpportunityRow.estimated_acv values are in raw USD (see _EMPLOYEE_ACV_RANGES in leadforge/simulation/engine.py). Please clarify the units for expected_acv (USD vs thousands) and ensure the midpoint values/implementation match that unit consistently.

Suggested change

Include an `expected_acv` numeric feature so students can compute `P(conversion) × expected_acv` and practice value-aware ranking.

**ACV derivation (single source of truth):**

| Condition | Value |

|---|---|

| Opportunity created by snapshot day | Opportunity's `estimated_acv` |

| No opportunity, `estimated_revenue_band` known | Band midpoint (see table below) |

| No opportunity, band unknown | NaN |

**Revenue band → ACV midpoint mapping:**

| Band | Midpoint ($k) |

|---|---|

| $1M–$10M | 25 |

| $10M–$50M | 55 |

| $50M–$200M | 85 |

| $200M+ | 140 |

These midpoints are derived from the engine's `_EMPLOYEE_ACV_RANGES` in `simulation/engine.py`, which maps employee bands to ACV ranges. Since the dataset exposes `estimated_revenue_band` (not employee band), the midpoints approximate the overlap between revenue bands and the engine's ACV sampling.

Include an `expected_acv` numeric feature in raw USD so students can compute `P(conversion) × expected_acv` and practice value-aware ranking.

**ACV derivation (single source of truth):**

| Condition | Value |

|---|---|

| Opportunity created by snapshot day | Opportunity's `estimated_acv` |

| No opportunity, `estimated_revenue_band` known | Band midpoint in raw USD (see table below) |

| No opportunity, band unknown | NaN |

**Revenue band → ACV midpoint mapping:**

| Band | Midpoint (USD) |

|---|---|

| $1M–$10M | 25000 |

| $10M–$50M | 55000 |

| $50M–$200M | 85000 |

| $200M+ | 140000 |

These midpoints are derived from the engine's `_EMPLOYEE_ACV_RANGES` in `simulation/engine.py`, which maps employee bands to ACV ranges. Since the dataset exposes `estimated_revenue_band` (not employee band), the midpoints approximate the overlap between revenue bands and the engine's ACV sampling. The fallback midpoint values are expressed in raw USD so they use the same unit as `OpportunityRow.estimated_acv` and keep `expected_acv` consistent regardless of derivation path.

Copilot · 2026-04-29T13:12:33Z

+## Target column set
+
+| # | Column | Type | Source | Notes |
+|---|---|---|---|---|
+| 1 | `industry` | categorical | account | 4 values |
+| 2 | `region` | categorical | account | US, UK |
+| 3 | `company_size` | categorical | account | 4 bands |
+| 4 | `company_revenue` | categorical | account | 4 bands |
+| 5 | `contact_role` | categorical | contact | 4 roles |
+| 6 | `seniority` | categorical | contact | 5 levels (~8% missing for partner_referral) |
+| 7 | `lead_source` | categorical | lead | 3 channels |
+| 8 | `opportunity_created` | binary 0/1 | derived | Opp opened by snapshot day |
+| 9 | `demo_completed` | binary 0/1 | derived | Demo done by snapshot day |
+| 10 | `expected_acv` | numeric | derived | See R1 ACV derivation table |
+| 11 | `inbound_touches` | integer | events ≤ snapshot | Inbound touchpoints |
+| 12 | `outbound_touches` | integer | events ≤ snapshot | Outbound touchpoints |
+| 13 | `touches_week_1` | integer | events ≤ day 7 | First-week touch intensity |
+| 14 | `web_sessions` | integer | events ≤ snapshot | Sessions (~15% missing for outbound) |
+| 15 | `sales_activities` | integer | events ≤ snapshot | Sales activities count |
+| 16 | `days_since_last_touch` | float | events ≤ snapshot | Natural NaN when no touches |
+| 17 | `total_touches_all` | integer | **ALL events** | Leakage trap — full 90-day window |
+| 18 | `converted` | binary 0/1 | target | Converted within 90 days |


Within this doc, the target column set uses names like company_size, company_revenue, contact_role, and converted, but the engine-facing sections and spike refer to current bundle/snapshot names like employee_band, estimated_revenue_band, role_function, and converted_within_90_days. Since this file is intended as the single source of truth, add an explicit mapping (raw bundle column → final CSV column) or standardize on one naming scheme throughout to avoid implementation/validation mismatches.

Copilot · 2026-04-29T13:12:34Z

+| $1M–$10M | 25 |
+| $10M–$50M | 55 |
+| $50M–$200M | 85 |


Several band labels in the mapping/correlation tables use an en-dash ($1M–$10M, $10M–$50M, etc.), but the engine’s estimated_revenue_band values use a plain hyphen ($1M-$10M, …). If these strings are meant to be used as literal keys (e.g., in YAML config / lookup tables), please normalize them to the exact engine labels to avoid hard-to-spot mismatches.

Suggested change

| $1M–$10M | 25 |

| $10M–$50M | 55 |

| $50M–$200M | 85 |

| $1M-$10M | 25 |

| $10M-$50M | 55 |

| $50M-$200M | 85 |

Copilot · 2026-04-29T13:12:34Z

+            )
+
+
+def run_pipeline(label: str, gen: Generator, scale: float | None = None) -> None:


run_pipeline is annotated as returning None, but it returns auc (a float). Update the return annotation to float (or stop returning a value) so type checkers and readers match the actual behavior.

Suggested change

def run_pipeline(label: str, gen: Generator, scale: float | None = None) -> None:

def run_pipeline(label: str, gen: Generator, scale: float | None = None) -> float:

Copilot · 2026-04-29T13:12:34Z

+3. Correlated observables (1.8x boost) — stronger correlation
+
+Reports category spread (max - min conversion rate) per categorical feature
+and logistic regression AUC at day-21 snapshot.


The script header/docstring says results are measured on a day-21 snapshot, but run_pipeline() calls build_snapshot(sim, pop) without any day-21 gating (the current build_snapshot aggregates over the full simulation output). Either update the description/output text to reflect the actual window used, or implement explicit day-21 filtering in this spike so the reported AUC/spreads correspond to the intended snapshot day.

Suggested change

and logistic regression AUC at day-21 snapshot.

and logistic regression AUC from the snapshot produced by the current

simulation output aggregation.

Copilot · 2026-04-29T13:12:34Z

+    config = gen._world_spec.config
+    narrative = gen._world_spec.narrative


This spike reaches into Generator private state (gen._world_spec.*). Since Generator exposes public config and world_spec properties, prefer those to avoid coupling the script to internal implementation details.

Suggested change

config = gen._world_spec.config

narrative = gen._world_spec.narrative

world_spec = gen.world_spec

config = gen.config

narrative = world_spec.narrative

Copilot · 2026-04-29T13:12:35Z

+    feature_cols = [c for c in df.columns if c != "converted_within_90_days"]
+    x_df = df[feature_cols].copy()
+    y = df["converted_within_90_days"].astype(int)


measure_auc() trains on feature_cols = [c for c in df.columns if c != 'converted_within_90_days'], which will include opaque identifiers and meta columns (e.g., lead_id, account_id, contact_id, lead_created_at, current_stage, is_sql). Including these can make the spike AUC hard to interpret/compare against the planned v4 column set (and can spuriously inflate train AUC). Consider restricting to the intended v4 feature subset (or explicitly dropping IDs/timestamps/outcome-adjacent columns) before fitting.

Copilot AI review requested due to automatic review settings April 29, 2026 13:06

Copilot started reviewing on behalf of shaypal5 April 29, 2026 13:06 View session

This comment has been minimized.

Sign in to view

style: fix ruff formatting in spike script

b21cf09

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This comment has been minimized.

Sign in to view

shaypal5 merged commit eee5a86 into main Apr 29, 2026
5 checks passed

shaypal5 deleted the fix/v4-planning-review-fixes branch April 29, 2026 13:11

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: consolidate v4 planning docs + spike experiment#20

refactor: consolidate v4 planning docs + spike experiment#20
shaypal5 merged 3 commits into
mainfrom
fix/v4-planning-review-fixes

shaypal5 commented Apr 29, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 29, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		)


		def run_pipeline(label: str, gen: Generator, scale: float \| None = None) -> None:

	and logistic regression AUC at day-21 snapshot.
	and logistic regression AUC from the snapshot produced by the current
	simulation output aggregation.

		config = gen._world_spec.config
		narrative = gen._world_spec.narrative

-    config = gen._world_spec.config
-    narrative = gen._world_spec.narrative
+    world_spec = gen.world_spec
+    config = gen.config
+    narrative = world_spec.narrative

Conversation

shaypal5 commented Apr 29, 2026

Summary

Test plan

Uh oh!

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 29, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants