scripts/ is outside the mypy and test nets — refactors can break it silently #123

@shaypal5

Problem

The scripts/ tree is not covered by either of our two safety nets:

  • Type checking: CI runs mypy leadforge only — scripts/ is never type-checked.
  • Tests: the tests/scripts/test_build_v*_snapshot.py suites cover the leadforge.pipelines.build_v* transform helpers (pure DataFrame functions), but not the scripts' own generate_bundle() / main() entry points that touch package objects.

As a result, a refactor in leadforge/ can break a script at runtime while the full suite and CI stay green.

Concrete example (already fixed)

In LTV-Pn.2 (#122), WorldBundle's population / simulation_result / world_graph fields were replaced by an opaque artifacts container. Five scripts — build_v4/v5/v6/v7_snapshot.py and build_midproject_lead_scoring.py — still read bundle.simulation_result / bundle.population and would have AttributeError-ed on the next dataset regeneration. Nothing caught it:

  • mypy leadforge doesn't see scripts/;
  • the existing script tests don't call generate_bundle();
  • CI's "Validate v6/v7 lead scoring dataset" jobs validate a pre-built CSV, not a fresh regeneration.
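The failure mode above can be reproduced in miniature. This is only a sketch: the `WorldBundle` below is a hypothetical stand-in (the real class and the shape of its `artifacts` container are not shown in this issue, and a plain dict is assumed here), and `legacy_script_step` is an invented name for a script that still uses the pre-refactor access pattern.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for the post-refactor WorldBundle: the named fields
# are gone and only an opaque container remains (a dict is an assumption).
@dataclass
class WorldBundle:
    artifacts: dict[str, Any] = field(default_factory=dict)


def legacy_script_step(bundle: WorldBundle) -> Any:
    # Pre-refactor access pattern, as a script outside mypy's targets might
    # still have it. A type checker run over this file would flag the line.
    return bundle.population


def run_legacy_step() -> str:
    bundle = WorldBundle(artifacts={"population": [1, 2, 3]})
    try:
        legacy_script_step(bundle)
    except AttributeError as exc:
        # Only reached at runtime, i.e. on the next dataset regeneration.
        return f"AttributeError: {exc}"
    return "ok"


result = run_legacy_step()
print(result)
```

Nothing fails at import time or in tests that never call the step; the error only appears when the stale attribute access actually executes.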

#122 fixed the five call sites and added one representative guard (TestGenerateBundleArtifactsPath, which loads build_v6_snapshot.py via importlib and runs a small generate_bundle()). That closes the specific hole but not the systemic gap.

Proposed work

  1. Bring scripts/ under mypy — add it to the mypy targets (likely a separate, more lenient section in pyproject.toml, since scripts carry many untyped patterns). Expect an initial wave of annotations / # type: ignore to land it cleanly.
  2. Smoke-test the remaining build-script entry points: extend the importlib-load + tiny-run guard (the pattern added in #122 for v6) to v4/v5/v7 and build_midproject_lead_scoring.py, so every generate_*/main path is exercised at least once.
  3. Optionally, decide which one-off scripts are genuinely legacy and could be deleted rather than maintained.
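For item 2, the importlib-load + tiny-run guard might look roughly like the sketch below. It is not the code from #122: `load_script` and `smoke_run` are hypothetical helper names, the `n_leads` parameter is invented, and a throwaway stand-in script is written to a temp directory so the example runs on its own. A real test would point at the files under scripts/ and pass whatever small arguments each generate_bundle() actually accepts.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Any


def load_script(path: Path) -> ModuleType:
    """Import a standalone script file as a module (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top level; __name__ is the file stem, so an
    # `if __name__ == "__main__":` block is not triggered.
    spec.loader.exec_module(module)
    return module


def smoke_run(path: Path, entry_point: str = "generate_bundle", **kwargs: Any) -> Any:
    """Load the script and call one entry point with tiny arguments."""
    module = load_script(path)
    return getattr(module, entry_point)(**kwargs)


# Stand-in for a build script; the signature is an assumption for illustration.
FAKE_SCRIPT = '''
def generate_bundle(n_leads=10):
    return {"n_leads": n_leads}

if __name__ == "__main__":
    raise SystemExit("main() should not run on import")
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "build_vX_snapshot.py"
    script.write_text(FAKE_SCRIPT)
    bundle = smoke_run(script, n_leads=3)

print(bundle)
```

Looping the same helper over each build script (for example via a parametrized test) would give every entry point at least one real call, which is the point at which a stale attribute access would surface.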

Why not in #122

The mechanical .artifacts. migration + a representative guard belonged in #122, since that PR caused the break. Onboarding all of scripts/ into mypy is a noisy, self-contained chore better done on its own so the refactor PR stays reviewable.

Surfaced during the LTV-Pn.2 self-review.

Labels: type: chore (Tooling, config, repo hygiene), type: ci (CI/CD pipeline changes)
