SMF-PR5: Add canonical platform metadata lint gate #85
Merged
Pull request overview
Adds a CI-enforced lint gate that diffs/validates the canonical publication metadata artifacts for Kaggle and Hugging Face, aiming to prevent preview/publish drift (privacy, license, tags/tasks, split paths, and required resource coverage) from landing unnoticed.
Changes:
- Introduce `scripts/lint_platform_metadata.py` to lint Kaggle `dataset-metadata.json` vs HF `README.md` frontmatter and enforce a canonical contract.
- Add focused unit tests covering expected pass/fail cases and asserting committed release artifacts pass the lint.
- Wire the lint script into the `release-artifacts-sync` GitHub Actions job.
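The kind of check this gate performs can be sketched as follows. This is a minimal illustration, not the code in `scripts/lint_platform_metadata.py`: the function name `lint_metadata` is hypothetical, both inputs are assumed to be already parsed into dicts, and the `task_categories` key is assumed to be where the HF frontmatter lists its task categories. Only the two conditions named in the PR description (`isPrivate` and `tabular-classification`) are covered.

```python
from typing import Any


def lint_metadata(kaggle: dict[str, Any], hf: dict[str, Any]) -> list[str]:
    """Return lint error messages; an empty list means the check passed.

    Hypothetical helper: `kaggle` stands for the parsed dataset-metadata.json,
    `hf` for the parsed README.md frontmatter (key names are assumptions).
    """
    errors: list[str] = []

    # Privacy: fail when isPrivate is anything other than an explicit false.
    if kaggle.get("isPrivate") is not False:
        errors.append("kaggle: isPrivate must be false")

    # Task: fail when the tabular-classification category is missing.
    if "tabular-classification" not in hf.get("task_categories", []):
        errors.append("hf: task_categories must include tabular-classification")

    return errors
```

Returning a list of messages rather than raising on the first failure lets a CI run report every drifted field at once.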
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `scripts/lint_platform_metadata.py` | Implements the canonical Kaggle/HF metadata lint checks and CLI entrypoint. |
| `tests/scripts/test_lint_platform_metadata.py` | Adds unit tests for the lint gate plus a “committed artifacts pass” test. |
| `.github/workflows/ci.yml` | Runs the new metadata lint as part of the release artifact sync job. |
| `.agent-plan.md` | Marks SMF-PR5 planned work item as completed and documents the gate. |
Comment on lines +469 to +475

    parser.add_argument(
        "--tier",
        action="append",
        dest="tiers",
        default=None,
        help="tier/config to validate (repeatable; default: intro/intermediate/advanced)",
    )
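For readers unfamiliar with this argparse pattern, the sketch below shows how a repeatable flag with `default=None` is typically resolved to a default list after parsing. The `DEFAULT_TIERS` constant and the `parse_tiers` wrapper are hypothetical names; the three tier names are taken from the help text above, and the fallback logic is an assumption about how the script applies its default.

```python
import argparse

# Assumed default, taken from the help text of the flag under review.
DEFAULT_TIERS = ["intro", "intermediate", "advanced"]


def parse_tiers(argv: list[str]) -> list[str]:
    """Parse repeatable --tier flags, falling back to the default tiers."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tier",
        action="append",
        dest="tiers",
        default=None,
        help="tier/config to validate (repeatable; default: intro/intermediate/advanced)",
    )
    args = parser.parse_args(argv)
    # With action="append" and default=None, args.tiers stays None when the
    # flag is never passed, so the default is applied here explicitly.
    return args.tiers if args.tiers is not None else list(DEFAULT_TIERS)
```

Keeping `default=None` instead of passing the list as the argparse default matters: with `action="append"`, user-supplied values are appended to a list default rather than replacing it.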
2035975 to e5e907f
pr-agent-context report: This run includes an unresolved review comment on PR #85 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: scripts/lint_platform_metadata.py:648
URL: https://github.com/leadforge-dev/leadforge/pull/85#discussion_r3299980392
Root author: copilot-pull-request-reviewer
Comment:
The `--tier` CLI flag is described as selecting which tier/configs to validate, but `_lint_hf_configs()` currently requires the HF frontmatter `configs` list to match `tiers` exactly (same order, no extras). Passing `--tier intro` will fail against the canonical README (which includes 3 configs), so the flag can’t be used as advertised. Either (a) implement subset validation by filtering `configs` to the requested tiers and only checking those, or (b) change/remove the flag/help text to reflect that the README is expected to contain exactly the specified configs.
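Option (a) from the comment above could look roughly like the sketch below. It is not the actual `_lint_hf_configs()` implementation: `select_configs` is a hypothetical helper, and it assumes each entry in the frontmatter `configs` list is a dict with a `config_name` key.

```python
from typing import Any


def select_configs(
    configs: list[dict[str, Any]], tiers: list[str]
) -> list[dict[str, Any]]:
    """Return only the configs for the requested tiers, in the requested order.

    Hypothetical helper: extra configs in the README are ignored, but a
    requested tier with no matching config is still an error.
    """
    by_name = {cfg["config_name"]: cfg for cfg in configs}
    missing = [tier for tier in tiers if tier not in by_name]
    if missing:
        raise ValueError(f"README configs missing requested tiers: {missing}")
    return [by_name[tier] for tier in tiers]
```

With this filtering in place, `--tier intro` would validate only the `intro` config against a README that lists all three, while the default run (all tiers) would still check every config.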
Planning notation
SMF-PR5 / PR 8.4a: canonical platform metadata lint gate
Parent milestone: dataset: leadforge-lead-scoring-v1
Plan source:
- `/Users/shaypalachy/agents/environments/opensource/projects/shmuggingface/view_only_clones/ShmuggingFaceCore/docs/next_10_review_prs.md`
- `/Users/shaypalachy/agents/handoffs/leadforge-v1-review/leadforge_shmuggingface_integration_issues.md`
- `docs/external_review/summaries/v1_release_review_synthesis.md`
What changed
Adds `scripts/lint_platform_metadata.py`, a canonical metadata diff/lint gate over the actual publication artifacts:
- `release/kaggle/dataset-metadata.json`
- `release/huggingface/README.md`
The lint fails on:
- `isPrivate` drifting away from `false`
- `tabular-classification` task category
- `split`
The existing preview renderers already consume the canonical artifacts directly; this PR makes that contract explicit and CI-enforced. It also exposes `--strict-files` for release-readiness runs where missing tier CSV/parquet files should fail instead of being soft-skipped on fresh checkouts.
Why
HIGH-LF1 / HIGH-I1 said the preview path could miss platform metadata bugs such as privacy, tags, task, license, split, and schema mismatches. This adds a focused failing gate before preview/publish so those bugs are caught without relying on visual review.
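The `--strict-files` behavior mentioned in the description (missing tier files fail instead of being soft-skipped) can be illustrated with a small sketch. The function name `check_tier_files` and its return shape are hypothetical and not taken from the script; only the strict-versus-skip distinction comes from the PR text.

```python
from pathlib import Path


def check_tier_files(paths: list[Path], strict: bool) -> tuple[list[str], list[str]]:
    """Classify missing files as errors (strict) or skips (non-strict).

    Hypothetical helper: returns (errors, skipped) as lists of messages.
    """
    errors: list[str] = []
    skipped: list[str] = []
    for path in paths:
        if path.exists():
            continue
        if strict:
            # Release-readiness run: a missing data file is a hard failure.
            errors.append(f"missing required file: {path}")
        else:
            # Fresh checkout: data files may not be generated yet.
            skipped.append(f"skipped (file not present): {path}")
    return errors, skipped
```

A caller would exit non-zero only when `errors` is non-empty, so the same lint can run on fresh checkouts and on release-ready trees.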
Tests
- `python scripts/lint_platform_metadata.py`
- `python scripts/sync_release_docs.py --check && python scripts/build_release_metrics.py --check && python scripts/build_claims_register.py --check && python scripts/verify_claims_register.py && python scripts/lint_platform_metadata.py`
- `python -m pytest tests/scripts/test_lint_platform_metadata.py -q`
- `python -m pytest tests/scripts/test_lint_platform_metadata.py tests/scripts/test_preview_kaggle_page.py tests/scripts/test_preview_hf_page.py -q`
- `ruff check scripts/lint_platform_metadata.py tests/scripts/test_lint_platform_metadata.py`
- `ruff format --check scripts/lint_platform_metadata.py tests/scripts/test_lint_platform_metadata.py`
- `python -m mypy scripts/lint_platform_metadata.py`
Note: plain `pytest` on this machine resolves to `/Library/Frameworks/Python.framework/...` and lacks repo deps; validation used `python -m pytest` from the pyenv Python where pandas/pyarrow are installed.
Follow-up
Remaining preview hardening items in `.agent-plan.md` stay scoped to later work: dependency pin cleanup, any reintroduced ShmuggingFaceCore builder path, deploy defaults, and broader link-rewrite cleanup.