test: verify SHA-256 hash determinism for public release#55
Merged
Conversation
Runs scripts/build_public_release.py twice into temp directories and asserts every generated file hashes identically across runs (modulo manifest.json's wall-clock generation_timestamp, which is stripped before comparison). Enforces the "deterministic given (recipe, config, seed, version)" architectural invariant on the bundle layer. Exits 0 on PASS, 1 on FAIL. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Verified via scripts/verify_hash_determinism.py: 73/73 files in the release bundle hash identically across two consecutive builds with the same seed/config (manifest.json compared after stripping its wall-clock generation_timestamp). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Adds a release-validation script that enforces the repo’s “public release build is deterministic given (recipe, config, seed, version)” invariant by rebuilding the public release twice and comparing SHA-256 hashes for all generated files (with a targeted exception for manifest.json’s wall-clock generation_timestamp).
Changes:
- Add
scripts/verify_hash_determinism.pyto runbuild_public_release.pytwice into temp dirs and compare per-file hashes (special-casingmanifest.jsonby strippinggeneration_timestampbefore comparing payloads). - Update
.agent-plan.mdto mark the determinism verification step as completed and document the local verification result.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/verify_hash_determinism.py | New verification script to confirm byte-level determinism of the public release artifacts (except manifest.json timestamp field). |
| .agent-plan.md | Marks the determinism verification checklist item as completed and records the observed result. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+76
to
+80
| def compare(run_a: Path, run_b: Path) -> list[str]: | ||
| """Return a list of human-readable mismatch messages (empty == identical).""" | ||
| tree_a = hash_tree(run_a) | ||
| tree_b = hash_tree(run_b) | ||
|
|
Issues from review of PR #55, applied here: 1. Reuse existing infrastructure. The original script reimplemented tree-walk + hash compare locally despite leadforge.validation.invariants already exporting check_determinism. Extracted compare_bundle_trees() into the same module as a public, full-tree check (the existing check_determinism only inspects a hardcoded 3-file list). 2. Drop manifest-stripping hack in favour of timestamp pinning. WorldBundle.save() already accepts generation_timestamp=; build_public_release now exposes it as --generation-timestamp. The verifier pins it to the unix epoch on both runs, so manifest.json is byte-identical too — no special-casing required at compare time. compare_bundle_trees keeps a defence-in-depth fallback that strips NON_DETERMINISTIC_MANIFEST_FIELDS and re-dumps with sort_keys=True (catches accidental key reordering). 3. Single source of truth for non-deterministic fields. New constant NON_DETERMINISTIC_MANIFEST_FIELDS in leadforge/render/manifests.py; consumed by the invariants module. No more duplicated string literal. 4. Preserve artifacts on failure. Verifier now writes to release/_determinism/ (gitignored), wipes at start, cleans up only on PASS (unless --keep-on-success). On FAIL the dirs stay so the dev can diff the offending files. 5. Better failure diagnostics. compare_bundle_trees() reports byte-size delta on hash mismatches; manifest mismatches list which fields were stripped before comparison. 6. Self-tested. New TestCompareBundleTrees suite (8 tests) covers identical, only-in-A, only-in-B, hash mismatch, manifest timestamp-only diff, manifest real diff, manifest key reorder, and the nested- manifest.json edge case (only the top-level manifest is special-cased). 7. argparse on both scripts (--out, --keep-on-success on verifier; --generation-timestamp on build_public_release). Verifier still runs subprocess (intentional — the script's job is to test the build script end-to-end). The fast in-process determinism check that runs in CI on every PR continues to live in tests/validation/test_invariants.py::TestDeterminism. Result: PASS — 73/73 files identical across two pinned-timestamp runs; all 876 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
pr-agent-context report: This run includes an unresolved review comment on PR #55 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: scripts/verify_hash_determinism.py
URL: https://github.com/leadforge-dev/leadforge/pull/55#discussion_r3180123178
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
This script adds comparison logic (tree hashing plus special-casing `manifest.json`), but there are no tests exercising `compare()`/`manifest_payload_without_timestamp()`. Consider adding a small unit test (e.g., two temp dirs with `manifest.json` differing only by `generation_timestamp`) to prevent regressions in the determinism check itself.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
scripts/verify_hash_determinism.py— runsscripts/build_public_release.pytwice into temp directories with the same seed/config and asserts every generated file hashes identically across runs.How
manifest.jsonis handledbuild_manifest()stampsgeneration_timestampwithdatetime.now(UTC), somanifest.jsonbytes legitimately differ between runs. The script strips that one field and compares the remaining payload (which already carries per-file SHA-256 digests for the relational and task Parquet files). Every other file is compared byte-for-byte.Verification result
Ran locally on
main+ this branch:All four bundles (
intro,intermediate,advanced,intermediate_instructor) plusLICENSEare byte-identical across runs.Test plan
python scripts/verify_hash_determinism.pyexits 0 and prints PASSruff check scripts/verify_hash_determinism.pycleanruff format --check scripts/verify_hash_determinism.pyclean🤖 Generated with Claude Code