SMF-PR5: Add canonical platform metadata lint gate by shaypal5 · Pull Request #85 · leadforge-dev/leadforge

shaypal5 · 2026-05-25T21:14:12Z

Planning notation

SMF-PR5 / PR 8.4a: canonical platform metadata lint gate

Parent milestone: dataset: leadforge-lead-scoring-v1

Plan source:

/Users/shaypalachy/agents/environments/opensource/projects/shmuggingface/view_only_clones/ShmuggingFaceCore/docs/next_10_review_prs.md
/Users/shaypalachy/agents/handoffs/leadforge-v1-review/leadforge_shmuggingface_integration_issues.md
docs/external_review/summaries/v1_release_review_synthesis.md

What changed

Adds scripts/lint_platform_metadata.py, a canonical metadata diff/lint gate over the actual publication artifacts:

release/kaggle/dataset-metadata.json
release/huggingface/README.md

The lint fails on:

Kaggle isPrivate drifting away from false
Kaggle/HF license mismatches
missing HF tabular-classification task category
exact platform tag vocabulary drift for Kaggle keywords and HF tags
HF config/split declarations that do not exactly match the canonical train/validation/test layout
HF data files absent from Kaggle resources
task-split schema drift against the flat CSV schema minus split
metadata schema drift against actual CSV/parquet files when bundle files are materialized
missing root and per-tier agent-reviewable resources in the canonical Kaggle file list
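
To illustrate the shape of the simpler checks listed above, here is a minimal sketch covering only three of them (privacy, license, and task category). It is not the code in scripts/lint_platform_metadata.py: the function name `lint_metadata`, the case-insensitive license comparison, and the assumed field shapes (`licenses` as a list of `{"name": ...}` objects on the Kaggle side; `license` and `task_categories` in the HF frontmatter) are assumptions made for illustration.

```python
from __future__ import annotations


def lint_metadata(kaggle: dict, hf: dict) -> list[str]:
    """Return human-readable lint errors; an empty list means the checks pass.

    Hypothetical helper: `kaggle` is the parsed dataset-metadata.json and
    `hf` is the parsed README frontmatter (both assumed to be plain dicts).
    """
    errors: list[str] = []

    # Kaggle isPrivate must be exactly false (a missing key also fails).
    if kaggle.get("isPrivate") is not False:
        errors.append("kaggle: isPrivate must be false")

    # License names are compared case-insensitively (an assumption).
    kaggle_licenses = {
        str(lic.get("name", "")).lower() for lic in kaggle.get("licenses", [])
    }
    hf_license = str(hf.get("license", "")).lower()
    if kaggle_licenses != {hf_license}:
        errors.append(
            f"license mismatch: kaggle={sorted(kaggle_licenses)} hf={hf_license!r}"
        )

    # The HF card must declare the tabular-classification task category.
    if "tabular-classification" not in hf.get("task_categories", []):
        errors.append("hf: missing tabular-classification task category")

    return errors
```

Collecting every error into a list, rather than raising on the first one, lets a single CI run report all drift at once.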

The existing preview renderers already consume the canonical artifacts directly; this PR makes that contract explicit and CI-enforced. It also exposes --strict-files for release-readiness runs where missing tier CSV/parquet files should fail instead of being soft-skipped on fresh checkouts.
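
The soft-skip versus strict behavior described above could look roughly like the sketch below. The helper name `check_bundle_files` and its return shape are hypothetical; the real script may organize this differently.

```python
from __future__ import annotations

from pathlib import Path


def check_bundle_files(
    paths: list[str], strict_files: bool = False
) -> tuple[list[str], list[str]]:
    """Split missing bundle files into errors (strict) or soft skips (default).

    Hypothetical helper: a real lint would also compare the schema of each
    file that does exist; this sketch only models the missing-file policy.
    """
    errors: list[str] = []
    skipped: list[str] = []
    for path in paths:
        if Path(path).exists():
            continue  # file is materialized; schema checks would run here
        if strict_files:
            errors.append(f"missing bundle file: {path}")
        else:
            skipped.append(str(path))
    return errors, skipped
```

With the default, a fresh checkout without materialized tier files passes and reports what was skipped; with `strict_files=True`, the same missing files become failures.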

Why

HIGH-LF1 / HIGH-I1 flagged that the preview path could miss platform metadata bugs: privacy, tag, task, license, split, and schema mismatches. This PR adds a focused failing gate ahead of preview/publish so those bugs are caught without relying on visual review.

Tests

python scripts/lint_platform_metadata.py
python scripts/sync_release_docs.py --check && python scripts/build_release_metrics.py --check && python scripts/build_claims_register.py --check && python scripts/verify_claims_register.py && python scripts/lint_platform_metadata.py
python -m pytest tests/scripts/test_lint_platform_metadata.py -q
python -m pytest tests/scripts/test_lint_platform_metadata.py tests/scripts/test_preview_kaggle_page.py tests/scripts/test_preview_hf_page.py -q
ruff check scripts/lint_platform_metadata.py tests/scripts/test_lint_platform_metadata.py
ruff format --check scripts/lint_platform_metadata.py tests/scripts/test_lint_platform_metadata.py
python -m mypy scripts/lint_platform_metadata.py

Note: plain pytest on this machine resolves to /Library/Frameworks/Python.framework/... and lacks repo deps; validation used python -m pytest from the pyenv Python where pandas/pyarrow are installed.
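
For anyone hitting the same interpreter mismatch, a small diagnostic along these lines shows which Python is running and whether the dependencies named in the note are importable. The helper name `missing_modules` is hypothetical and is not part of the repo.

```python
from __future__ import annotations

import importlib.util
import sys


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run with the same interpreter you intend to use for `python -m pytest`.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["pandas", "pyarrow"]))
```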

Follow-up

Remaining preview hardening items in .agent-plan.md stay scoped to later work: dependency pin cleanup, any reintroduced ShmuggingFaceCore builder path, deploy defaults, and broader link-rewrite cleanup.

Copilot

Pull request overview

Adds a CI-enforced lint gate that diffs/validates the canonical publication metadata artifacts for Kaggle and Hugging Face, aiming to prevent preview/publish drift (privacy, license, tags/tasks, split paths, and required resource coverage) from landing unnoticed.

Changes:

Introduce scripts/lint_platform_metadata.py to lint Kaggle dataset-metadata.json vs HF README.md frontmatter and enforce a canonical contract.
Add focused unit tests covering expected pass/fail cases and asserting committed release artifacts pass the lint.
Wire the lint script into the release-artifacts-sync GitHub Actions job.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `scripts/lint_platform_metadata.py` | Implements the canonical Kaggle/HF metadata lint checks and CLI entrypoint. |
| `tests/scripts/test_lint_platform_metadata.py` | Adds unit tests for the lint gate plus a “committed artifacts pass” test. |
| `.github/workflows/ci.yml` | Runs the new metadata lint as part of the release artifact sync job. |
| `.agent-plan.md` | Marks SMF-PR5 planned work item as completed and documents the gate. |

+    parser.add_argument(
+        "--tier",
+        action="append",
+        dest="tiers",
+        default=None,
+        help="tier/config to validate (repeatable; default: intro/intermediate/advanced)",
+    )
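
As background on the flag shown in the diff above: with `action="append"`, argparse appends parsed values to a list-valued default instead of replacing it, which is a common reason to use `default=None` and substitute the defaults after parsing. The sketch below shows that pattern; `parse_tiers` and `DEFAULT_TIERS` are hypothetical names, and the fallback is an assumption based on the help text rather than a copy of the script's logic.

```python
from __future__ import annotations

import argparse

# Assumed from the help text: intro/intermediate/advanced.
DEFAULT_TIERS = ["intro", "intermediate", "advanced"]


def parse_tiers(argv: list[str]) -> list[str]:
    """Parse repeatable --tier flags, falling back to the default tiers."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tier",
        action="append",
        dest="tiers",
        default=None,  # None means "flag never passed"
        help="tier/config to validate (repeatable; default: intro/intermediate/advanced)",
    )
    args = parser.parse_args(argv)
    # Substitute defaults only when the flag was not given at all.
    return args.tiers if args.tiers is not None else list(DEFAULT_TIERS)
```

For example, `parse_tiers(["--tier", "intro"])` yields only the requested tier, while `parse_tiers([])` yields all three defaults.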


github-actions · 2026-05-25T21:27:08Z

pr-agent-context report:

This run includes an unresolved review comment on PR #85 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: scripts/lint_platform_metadata.py:648
URL: https://github.com/leadforge-dev/leadforge/pull/85#discussion_r3299980392
Root author: copilot-pull-request-reviewer

Comment:
    The `--tier` CLI flag is described as selecting which tier/configs to validate, but `_lint_hf_configs()` currently requires the HF frontmatter `configs` list to match `tiers` exactly (same order, no extras). Passing `--tier intro` will fail against the canonical README (which includes 3 configs), so the flag can’t be used as advertised. Either (a) implement subset validation by filtering `configs` to the requested tiers and only checking those, or (b) change/remove the flag/help text to reflect that the README is expected to contain exactly the specified configs.
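
A minimal sketch of option (a) from the comment above, assuming the HF frontmatter `configs` entries follow the usual dataset-card shape (`config_name` plus a `data_files` list of `split`/`path` entries). The function name `lint_hf_configs_subset` is hypothetical and is not the existing `_lint_hf_configs()`; it only illustrates checking the requested tiers while ignoring any other configs in the README.

```python
from __future__ import annotations

CANONICAL_SPLITS = ["train", "validation", "test"]


def lint_hf_configs_subset(configs: list[dict], tiers: list[str]) -> list[str]:
    """Validate only the requested tiers; extra configs are not an error."""
    errors: list[str] = []
    by_name = {cfg.get("config_name"): cfg for cfg in configs}
    for tier in tiers:
        cfg = by_name.get(tier)
        if cfg is None:
            errors.append(f"hf: no config declared for tier {tier!r}")
            continue
        # Split names must match the canonical layout, in order.
        splits = [entry.get("split") for entry in cfg.get("data_files", [])]
        if splits != CANONICAL_SPLITS:
            errors.append(
                f"hf: tier {tier!r} declares splits {splits}, "
                f"expected {CANONICAL_SPLITS}"
            )
    return errors
```

Under this approach `--tier intro` would pass against a README that declares all three configs, at the cost of no longer detecting unexpected extra configs on subset runs.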

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 26420400486 attempt 1
Comment timestamp: 2026-05-25T21:26:20.445392+00:00
PR head commit: e5e907fea17609c884da8a9d031034050c8ac2b9

Copilot AI review requested due to automatic review settings May 25, 2026 21:14

Copilot started reviewing on behalf of shaypal5 May 25, 2026 21:14

shaypal5 added this to the dataset: leadforge-lead-scoring-v1 milestone May 25, 2026

shaypal5 added the "status: needs review" (Ready for review) label May 25, 2026

Copilot AI reviewed May 25, 2026

Comment thread scripts/lint_platform_metadata.py

Comment on lines +469 to +475

    parser.add_argument(
        "--tier",
        action="append",
        dest="tiers",
        default=None,
        help="tier/config to validate (repeatable; default: intro/intermediate/advanced)",
    )

test(scripts): add canonical platform metadata lint

e5e907f

shaypal5 force-pushed the codex/pr5-canonical-metadata branch from 2035975 to e5e907f Compare May 25, 2026 21:26

shaypal5 merged commit 3c44d13 into main May 26, 2026
9 of 10 checks passed

shaypal5 deleted the codex/pr5-canonical-metadata branch May 26, 2026 04:26
