Merged
2 changes: 1 addition & 1 deletion .agent-plan.md
@@ -113,7 +113,7 @@ _Source: `docs/external_review/summaries/v1_release_review_synthesis.md` — cro
- Labels: `type: test`, `layer: validation`, `layer: render`
- Size: S (~450 lines, mostly tests)

-- [ ] **PR 8.4** — `feat(scripts): integration script + preview hardening`
+- [x] **PR 8.4** — `feat(scripts): integration script + preview hardening`
- **Regenerate lockfile + bump to v1.0.1** (HIGH): delete `package-lock.json`, update `package.json` pin to `github:ShmuggingFace/ShmuggingFaceCore#v1.0.1`, regenerate via HTTPS. Fixes SSH lockfile and gets the socks/laundry copy fix in one step.
- **Remove fabricated Kaggle usability scores and medals** (HIGH): delete `TIER_USABILITY`, `TIER_MEDAL` constants from `build_shmuggingface_site.py`. These are dead config today (the framework ignores them) but latent misinformation.
- **Remaining preview-generator cleanup** (HIGH): `build_shmuggingface_site.py` no longer exists on `main`; if a ShmuggingFaceCore build path is reintroduced, it must consume or validate against the canonical lint gate from SMF-PR5 rather than hard-code task, tags, license, splits, or schema.
10 changes: 9 additions & 1 deletion leadforge/validation/invariants.py
@@ -25,6 +25,13 @@
     SNAPSHOT_FILTERED_TABLES,
 )

+# Feature-dictionary rows that are intentionally present only in the student
+# bundle and are exempt from the subset check in check_exposure_monotonicity.
+# ``split`` documents the partition-label column that exists only in the flat
+# ``lead_scoring.csv`` convenience export (student_public only); the
+# instructor bundle's feature dictionary covers the raw Parquet columns.
+STUDENT_ONLY_DICT_ROWS: frozenset[str] = frozenset({"split"})
+

 def check_determinism(bundle_a: Path, bundle_b: Path) -> list[str]:
     """Compare two bundles that should be identical (same seed/config).
@@ -169,12 +176,13 @@ def check_exposure_monotonicity(student_bundle: Path, instructor_bundle: Path) -

     # feature_dictionary.csv: student rows must be a subset of instructor rows
     # (by ``name``). For names present in both, the metadata must agree.
+    #
     s_dict = student_bundle / "feature_dictionary.csv"
     i_dict = instructor_bundle / "feature_dictionary.csv"
     if s_dict.exists() and i_dict.exists():
         s_df = pd.read_csv(s_dict).set_index("name")
         i_df = pd.read_csv(i_dict).set_index("name")
-        extra_in_student = set(s_df.index) - set(i_df.index)
+        extra_in_student = set(s_df.index) - set(i_df.index) - STUDENT_ONLY_DICT_ROWS
         if extra_in_student:
             errors.append(
                 "feature_dictionary.csv: student has rows missing from instructor: "
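For reviewers who want to see the effect of the allowlist in isolation, here is a minimal sketch of the subset check. It is not the code in `invariants.py`: it uses the standard-library `csv` module instead of pandas so it runs without dependencies, and the helpers `extra_student_rows`, `_names`, `_write`, and `demo`, as well as the sample column names, are hypothetical. Only `STUDENT_ONLY_DICT_ROWS` and its value come from the diff.

```python
import csv
import tempfile
from pathlib import Path

# Mirrors the constant added in this PR.
STUDENT_ONLY_DICT_ROWS: frozenset[str] = frozenset({"split"})


def _names(path: Path) -> set[str]:
    """Return the set of values in the ``name`` column of a CSV file."""
    with path.open(newline="") as fh:
        return {row["name"] for row in csv.DictReader(fh)}


def extra_student_rows(student_csv: Path, instructor_csv: Path) -> set[str]:
    """Hypothetical helper: student-only names that are not on the allowlist."""
    return _names(student_csv) - _names(instructor_csv) - STUDENT_ONLY_DICT_ROWS


def _write(path: Path, names: list[str]) -> Path:
    """Write a tiny feature dictionary with made-up metadata."""
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "dtype"])
        writer.writerows([name, "str"] for name in names)
    return path


def demo() -> tuple[set[str], set[str]]:
    """Compare an acceptable and a leaking student dictionary (sample data)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        instructor = _write(root / "instructor.csv", ["lead_id", "industry"])
        ok_student = _write(root / "student_ok.csv", ["lead_id", "split"])
        bad_student = _write(
            root / "student_bad.csv", ["lead_id", "split", "leaked_score"]
        )
        return (
            extra_student_rows(ok_student, instructor),
            extra_student_rows(bad_student, instructor),
        )
```

In this sketch, `split` no longer counts as a violation, while any other student-only name is still reported. Keeping the exemption in a named module-level constant, as the PR does, makes each exception explicit and reviewable rather than loosening the check itself.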