Add metadata fixture UTF-8 safety tests #121

Merged
punk6529 merged 4 commits into main from codex/metadata-fixture-utf8-attributes
Jun 11, 2026
Conversation

punk6529 commented Jun 11, 2026

Contributor

Closes #119.
Closes #120.

Summary

  • add metadata fixture regressions for invalid UTF-8 JSON and animation HTML data URI payloads
  • add semantic attribute-shape regressions for missing keys, unexpected keys, and non-string values in committed fixtures
  • pin release-artifact, JavaScript, and Python text files to LF line endings so dependency artifact hashes stay deterministic across Windows and Linux checkouts
  • refresh release manifests/checksums and update metadata docs, status, roadmap, changelog, and autonomous run state
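
The invalid UTF-8 regressions above depend on decoding data URI payloads strictly rather than with replacement characters. A minimal sketch of that idea follows, assuming a base64 JSON data URI; the prefix constant and the `decode_data_uri_strict` name are hypothetical and are not taken from the repository's checker.

```python
import base64
import binascii

# Hypothetical prefix; the repository's checker defines its own constants.
JSON_DATA_URI_PREFIX = "data:application/json;base64,"


def decode_data_uri_strict(uri: str, prefix: str = JSON_DATA_URI_PREFIX) -> str:
    """Return the payload text, or raise ValueError if it is not valid UTF-8."""
    if not uri.startswith(prefix):
        raise ValueError("unexpected data URI prefix")
    try:
        raw = base64.b64decode(uri[len(prefix):], validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    try:
        # errors="strict" is the default; it is spelled out to show the intent.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("payload is not valid UTF-8") from exc


# One well-formed payload and one carrying a lone 0xFF byte, which is never valid UTF-8.
good = JSON_DATA_URI_PREFIX + base64.b64encode(b'{"name":"ok"}').decode("ascii")
bad = JSON_DATA_URI_PREFIX + base64.b64encode(b'{"name":"\xff"}').decode("ascii")
```

With this sketch, `decode_data_uri_strict(good)` returns the JSON text, while `decode_data_uri_strict(bad)` raises `ValueError` instead of silently substituting U+FFFD.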

Notes

  • No Solidity source or bytecode changes are included.
  • Production raw-attribute schema enforcement, production invalid UTF-8 enforcement, and full browser execution sandboxing remain follow-up work.
  • StreamCore remains at 24,135 runtime bytes with 441 bytes of EIP-170 headroom.
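
The headroom figure in the last note follows from the EIP-170 runtime code size limit of 24,576 bytes (0x6000). A quick arithmetic check using only the numbers quoted above (the variable names are illustrative):

```python
# EIP-170 caps deployed contract runtime code at 0x6000 bytes.
EIP170_MAX_RUNTIME_BYTES = 0x6000  # 24,576

# Runtime size quoted in the note above.
stream_core_runtime_bytes = 24_135

headroom = EIP170_MAX_RUNTIME_BYTES - stream_core_runtime_bytes
print(headroom)  # 441
```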

Validation

  • python scripts\test_metadata_fixtures.py
  • python scripts\check_metadata_fixtures.py
  • make metadata-fixtures-check
  • python scripts\generate_dependency_artifact_manifest.py --check
  • python scripts\test_dependency_artifact_manifest.py
  • python -m py_compile scripts\check_metadata_fixtures.py scripts\test_metadata_fixtures.py scripts\generate_dependency_artifact_manifest.py scripts\test_dependency_artifact_manifest.py
  • make release-checksums
  • make release-manifest-check
  • make release-checksums-check
  • python scripts\check_changelog.py
  • make check
  • powershell -ExecutionPolicy Bypass -File scripts\check.ps1
  • bash -n scripts/check.sh
  • git diff --check

Summary by CodeRabbit

  • New Features

    • Metadata validation now rejects invalid UTF-8 in metadata and animation data URIs.
    • Fixture-level semantic attribute shape validation added.
  • Documentation

    • Clarified fixture-level vs. production enforcement and updated metadata hardening guidance.
    • Expanded testing documentation for metadata fixture validation.
  • Tests

    • Added negative tests covering invalid UTF-8 data-URI fixtures.
  • Chores

    • Pinned LF line endings for release text artifacts; updated manifests and checksums.
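
The LF pinning matters because artifact hashes are computed over raw bytes, so a checkout that converts `\n` to `\r\n` yields a different digest for otherwise identical text. A small illustration with made-up file contents (not an actual release artifact):

```python
import hashlib

# Illustrative content only; any text file behaves the same way.
lf_bytes = b"line one\nline two\n"
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")

lf_digest = hashlib.sha256(lf_bytes).hexdigest()
crlf_digest = hashlib.sha256(crlf_bytes).hexdigest()

# The text looks identical in an editor, but the byte-for-byte hashes differ.
digests_differ = lf_digest != crlf_digest
print(digests_differ)  # True
```

Pinning the affected paths to LF in `.gitattributes` keeps the working-tree bytes, and therefore the digests, the same on Windows and Linux checkouts.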

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

Contributor Author

@coderabbitai review

coderabbitai Bot commented Jun 11, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 753af421-c1ca-4ee2-b61d-824dd5209f5d

📥 Commits

Reviewing files that changed from the base of the PR and between 7b09248 and ea90b91.

📒 Files selected for processing (4)
  • ops/AUTONOMOUS_RUN.md
  • release-artifacts/latest/SHA256SUMS
  • release-artifacts/latest/release-checksums.json
  • release-artifacts/latest/release-manifest.json
✅ Files skipped from review due to trivial changes (4)
  • release-artifacts/latest/release-checksums.json
  • release-artifacts/latest/SHA256SUMS
  • ops/AUTONOMOUS_RUN.md
  • release-artifacts/latest/release-manifest.json

📝 Walkthrough

Adds fixture-level negative tests for invalid UTF-8 in metadata/animation data URIs, enforces LF line endings for release artifacts/JS/Python via .gitattributes, updates docs to separate fixture-level checks from remaining production enforcement, and refreshes release manifest checksums and ops/roadmap entries.

Changes

Metadata Fixture Safety & Release Artifact Consistency

| Layer | File(s) | Summary |
| --- | --- | --- |
| Release artifact line-ending policy and documentation | .gitattributes, release-artifacts/README.md, CHANGELOG.md | Enforce LF line endings for release-artifacts/**, *.js, and *.py in .gitattributes; document byte-for-byte dependency artifact hashing and record LF pinning in the changelog. |
| Metadata fixture test enhancement: infrastructure and test cases | scripts/test_metadata_fixtures.py | Add encode_raw_data_uri() helper to produce base64 data URIs from raw bytes, extend write_fixture_set() to accept attributes and malformed_animation_uri, and add negative tests rejecting invalid UTF-8 in metadata and animation data URIs. |
| Metadata specification and safety documentation | docs/metadata.md, docs/status.md | Document newly present fixture-level checks (strict UTF-8 decoding, semantic trait_type/value attribute shape, JSON/data-URI/HTML boundary checks, wrapper/script-boundary assertions) and explicitly separate fixture-level coverage from remaining production enforcement items. |
| Test documentation and coverage mapping | test/README.md | Enumerate fixture validation stages (JSON parsing, invalid UTF-8 rejection, semantic attribute-shape validation, URI policy enforcement, animation wrapper script constraints) and clarify remaining sandboxing/production enforcement work. |
| Roadmap and ops tracking | ops/ROADMAP.md, ops/AUTONOMOUS_RUN.md | Mark Queue Item 62 merged, activate Queue Item 63/PR #121 with UTF-8/attribute fixture and LF pinning goals, update timestamps and decision-log entries. |
| Release artifact manifest checksum updates | release-artifacts/latest/SHA256SUMS, release-artifacts/latest/release-checksums.json, release-artifacts/latest/release-manifest.json | Replace SHA256 and size metadata for CHANGELOG.md and docs/status.md entries and update checksum files to match the modified artifacts. |
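
The semantic attribute-shape check named in the walkthrough (missing keys, unexpected keys, non-string values) could look roughly like the sketch below. The `trait_type` and `value` key names come from the summary above; the function name and error wording are assumptions, not the repository's implementation.

```python
# Keys named in the PR summary; everything else here is a hypothetical sketch.
EXPECTED_KEYS = {"trait_type", "value"}


def validate_attributes(attributes):
    """Raise ValueError unless every attribute is {"trait_type": str, "value": str}."""
    if not isinstance(attributes, list):
        raise ValueError("attributes must be a list")
    for index, attribute in enumerate(attributes):
        if not isinstance(attribute, dict):
            raise ValueError(f"attribute {index} must be an object")
        keys = set(attribute)
        missing = EXPECTED_KEYS - keys
        if missing:
            raise ValueError(f"attribute {index} missing keys: {sorted(missing)}")
        unexpected = keys - EXPECTED_KEYS
        if unexpected:
            raise ValueError(f"attribute {index} unexpected keys: {sorted(unexpected)}")
        for key in sorted(EXPECTED_KEYS):
            if not isinstance(attribute[key], str):
                raise ValueError(f"attribute {index} {key} must be a string")


# A well-formed attribute list passes silently.
validate_attributes([{"trait_type": "Palette", "value": "Mono"}])
```

Checking missing keys before unexpected keys is an arbitrary ordering choice in this sketch; a real checker may report all problems at once.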

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • 6529-Collections/6529Stream#7: Also updated .gitattributes to enforce LF normalization for release-text artifacts/metadata-related files.
  • 6529-Collections/6529Stream#112: Prior PR introducing metadata fixture safety checks and test harness that this change extends with invalid-UTF-8 and semantic-attribute regression tests.

Poem

🐰 I nibble bytes and check the glue,
If UTF-8 breaks, I’ll say adieu.
LF lines neat across the plain,
Manifests match again and again.
Fixtures guarded — tests hop true.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Add metadata fixture UTF-8 safety tests' is partially related to the changeset. While it accurately describes the main coding work (fixture UTF-8 tests), it omits the secondary but significant objective of pinning line endings for deterministic hashes (#120), which represents a meaningful portion of the changes. |
| Linked Issues check | ✅ Passed | All primary coding objectives from both linked issues are met: UTF-8 and semantic attribute fixture tests added [#119], semantic attribute validation extended [#119], .gitattributes pinned to LF [#120], dependency artifact hashes updated [#120], and documentation updated [#119]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue objectives. Metadata fixture enhancements, line-ending normalization, documentation updates, release-artifact regeneration, and autonomous run state tracking are all within scope of #119 and #120. |


coderabbitai Bot commented Jun 11, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
release-artifacts/latest/release-manifest.json (1)

232-257: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Regenerate and recommit release-manifest.json to clear manifest drift

CI already shows this file is stale: python3 scripts/generate_release_manifest.py --check reports release-artifacts/latest/release-manifest.json as changed. Until that is fixed, the sha256/size entries on lines 234–235 and 255–256 cannot be treated as authoritative for release integrity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@release-artifacts/latest/release-manifest.json` around lines 232 - 257, Run
the release manifest generator and commit the updated JSON to eliminate drift:
execute the same command used by CI (python3
scripts/generate_release_manifest.py --check) or run python3
scripts/generate_release_manifest.py to regenerate
release-artifacts/latest/release-manifest.json, verify the output, and recommit
the updated release-manifest.json so the sha256/size entries are consistent with
the current files.

Source: Pipeline failures

scripts/test_metadata_fixtures.py (1)

63-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Respect explicit malformed URI inputs.

Using or here silently turns an explicit empty-string malformed URI into a valid generated fixture. Switch to an explicit is not None check so the helper can faithfully build intentionally bad fixtures.

🔧 Proposed fix
-    pending_uri = malformed_pending_uri or encode_data_uri(
-        checker.JSON_DATA_URI_PREFIX, json.dumps(pending_metadata, separators=(",", ":"))
-    )
+    pending_uri = (
+        malformed_pending_uri
+        if malformed_pending_uri is not None
+        else encode_data_uri(
+            checker.JSON_DATA_URI_PREFIX, json.dumps(pending_metadata, separators=(",", ":"))
+        )
+    )
@@
-    final_metadata["animation_url"] = malformed_animation_uri or encode_data_uri(
-        checker.HTML_DATA_URI_PREFIX, animation_html
-    )
+    final_metadata["animation_url"] = (
+        malformed_animation_uri
+        if malformed_animation_uri is not None
+        else encode_data_uri(checker.HTML_DATA_URI_PREFIX, animation_html)
+    )

Also applies to: 78-80
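
To make the difference concrete, here is a minimal standalone comparison of the two fallback styles. The helper names and the default URI are illustrative only and do not come from scripts/test_metadata_fixtures.py.

```python
def pick_with_or(explicit, default):
    """Fallback via `or`: any falsy value, including "", is replaced."""
    return explicit or default


def pick_with_none_check(explicit, default):
    """Fallback via `is not None`: only a missing argument is replaced."""
    return explicit if explicit is not None else default


# Illustrative default: base64 of "{}" behind a JSON data URI prefix.
default_uri = "data:application/json;base64,e30="

# An intentionally malformed (empty) URI is lost with `or` ...
lost = pick_with_or("", default_uri)
# ... but preserved with the explicit None check.
kept = pick_with_none_check("", default_uri)
```

Here `lost` equals `default_uri`, so a negative test built on it would exercise a valid fixture, while `kept` is the empty string the test author asked for.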

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_metadata_fixtures.py` around lines 63 - 65, The current
construction of pending_uri uses "malformed_pending_uri or encode_data_uri(...)"
which treats an explicit empty-string malformed_pending_uri as falsy and
overrides the intent; change this to an explicit None check: set pending_uri to
malformed_pending_uri if malformed_pending_uri is not None, otherwise call
encode_data_uri with checker.JSON_DATA_URI_PREFIX and
json.dumps(pending_metadata, separators=(",", ":")); make the same change for
the corresponding final_uri block (the similar code at the later section around
lines 78-80) so an explicit "" malformed URI is preserved rather than replaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ops/ROADMAP.md`:
- Around line 1635-1637: Update the remaining-work bullet that currently reads
"Complete the remaining render-safety work for full browser execution
sandboxing, production invalid UTF-8 enforcement, and production structured
attributes or semantic attribute schema validation" to explicitly call out
"production raw-attribute enforcement" (e.g., append or insert "production
raw-attribute enforcement" alongside the structured/semantic attribute
validation wording) so the roadmap/test matrix clearly distinguishes
fixture-level coverage from remaining production enforcement work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e46a2a01-1356-48fb-86b1-660dbd74f5ce

📥 Commits

Reviewing files that changed from the base of the PR and between 97ea7ae and 7b09248.

📒 Files selected for processing (12)
  • .gitattributes
  • CHANGELOG.md
  • docs/metadata.md
  • docs/status.md
  • ops/AUTONOMOUS_RUN.md
  • ops/ROADMAP.md
  • release-artifacts/README.md
  • release-artifacts/latest/SHA256SUMS
  • release-artifacts/latest/release-checksums.json
  • release-artifacts/latest/release-manifest.json
  • scripts/test_metadata_fixtures.py
  • test/README.md

Comment thread ops/ROADMAP.md Outdated

Contributor Author

CI follow-up pushed in ea90b91: the post-open ops/AUTONOMOUS_RUN.md state commit changed a release-manifest-covered input, so I updated the state first and regenerated release-manifest.json, SHA256SUMS, and release-checksums.json. Local make release-manifest-check, make release-checksums-check, and git diff --check pass.

Contributor Author

CodeRabbit follow-up pushed in 98094a5:

  • changed malformed metadata/animation fixture URI helpers to use explicit is not None checks
  • updated the roadmap/test-matrix wording to call out remaining production raw-attribute enforcement separately
  • reran python scripts\test_metadata_fixtures.py, python scripts\check_metadata_fixtures.py, python -m py_compile scripts\check_metadata_fixtures.py scripts\test_metadata_fixtures.py, make metadata-fixtures-check, make release-checksums, make release-manifest-check, make release-checksums-check, and git diff --check

The prior release-manifest drift finding was fixed in ea90b91 and remained clean in the follow-up checks.

Contributor Author

@coderabbitai review

Please review the latest head after the follow-up fixes for the prior comments.

coderabbitai Bot commented Jun 11, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get current state of the PR diff for the follow-up commits
gh pr view 121 --repo 6529-Collections/6529Stream --json headRefName,headRefOid,commits | jq '{branch: .headRefName, head: .headRefOid, lastCommits: .commits[-5:]}'

# Get the latest diff
gh pr diff 121 --repo 6529-Collections/6529Stream | head -500

Length of output: 39372


🏁 Script executed:

#!/bin/bash
# Read the key files changed in the follow-up
cat scripts/test_metadata_fixtures.py

Length of output: 8977


🏁 Script executed:

#!/bin/bash
# Check roadmap changes
cat ops/ROADMAP.md

# Check autonomous run state
tail -80 ops/AUTONOMOUS_RUN.md

Length of output: 161873


🏁 Script executed:

#!/bin/bash
# Verify release manifest consistency
cat release-artifacts/latest/release-manifest.json | python3 -c "import json,sys; data=json.load(sys.stdin); print(json.dumps(data, indent=2))" | head -100

# Check SHA256SUMS
cat release-artifacts/latest/SHA256SUMS

# Verify checksums file
cat release-artifacts/latest/release-checksums.json | python3 -c "import json,sys; data=json.load(sys.stdin); print(json.dumps(data, indent=2))"

Length of output: 12441


🏁 Script executed:

#!/bin/bash
# Check .gitattributes
cat .gitattributes

# Check docs changes
cat docs/metadata.md | head -100

# Check test README
cat test/README.md

Length of output: 20851



@punk6529 The latest head (98094a5) looks clean. Here's a summary of what I verified:

Follow-up fixes confirmed (98094a5)

  • write_fixture_set now uses explicit is not None guards for both malformed_pending_uri and malformed_animation_uri, preventing any falsy-string bypass.
  • encode_raw_data_uri correctly accepts bytes and skips UTF-8 encoding, ensuring \xff payloads reach the checker unmodified.
  • All five new negative tests (invalid_utf8_metadata_data_uri, invalid_utf8_animation_data_uri, attribute_missing_key, attribute_unexpected_key, non_string_value) are structurally correct and their assertRaisesRegex patterns match the error messages they target.
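
The shape of such a negative test can be sketched as follows. The helper mirrors what the review says about `encode_raw_data_uri` (raw bytes in, no UTF-8 encoding step), but its body, the prefix constant, the stand-in `check_animation_uri`, and the error message are all assumptions rather than the repository's code.

```python
import base64
import unittest

# Assumed prefix; the real value lives in the repository's checker module.
HTML_DATA_URI_PREFIX = "data:text/html;base64,"


def encode_raw_data_uri(prefix: str, payload: bytes) -> str:
    """Base64-encode raw bytes as-is so invalid byte sequences survive."""
    return prefix + base64.b64encode(payload).decode("ascii")


def check_animation_uri(uri: str) -> str:
    """Stand-in for the real checker: decode strictly or raise ValueError."""
    raw = base64.b64decode(uri[len(HTML_DATA_URI_PREFIX):], validate=True)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("animation data URI is not valid UTF-8") from exc


class InvalidUtf8AnimationTest(unittest.TestCase):
    def test_invalid_utf8_animation_data_uri(self):
        # 0xFF can never appear in well-formed UTF-8.
        uri = encode_raw_data_uri(HTML_DATA_URI_PREFIX, b"<html>\xff</html>")
        with self.assertRaisesRegex(ValueError, "not valid UTF-8"):
            check_animation_uri(uri)
```

Matching on a fragment of the error message, as `assertRaisesRegex` does here, keeps the test from passing on an unrelated `ValueError`.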

Release manifest chain verified (ea90b91)

  • SHA256SUMS entry for release-manifest.json: 960816ef9818ff03...
  • release-checksums.json files[] entry for release-manifest.json: sha256:960816ef9818ff03...
  • release-checksums.json text_checksum_file.sha256 for SHA256SUMS: sha256:27b5970ebe836880...
  • All hashes are internally self-consistent.
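
A consistency check of this kind can be reproduced locally with a few lines of Python. The sketch below assumes the conventional `<hex digest>  <filename>` layout for SHA256SUMS and uses in-memory stand-in content instead of the real release files; the function names are hypothetical.

```python
import hashlib


def parse_sha256sums(text: str) -> dict:
    """Map file name to hex digest for lines shaped like '<digest>  <name>'."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # A leading '*' marks binary mode in the coreutils format.
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def matches(entries: dict, name: str, data: bytes) -> bool:
    """True when the recorded digest equals the SHA-256 of the given bytes."""
    return entries.get(name) == hashlib.sha256(data).hexdigest()


# Stand-in content; the real check would read the files from release-artifacts/latest/.
manifest_bytes = b'{"files": []}\n'
sums_text = f"{hashlib.sha256(manifest_bytes).hexdigest()}  release-manifest.json\n"
entries = parse_sha256sums(sums_text)
print(matches(entries, "release-manifest.json", manifest_bytes))  # True
```

The same comparison applies to the `sha256:`-prefixed values in release-checksums.json once the prefix is stripped.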

Docs and roadmap

  • docs/metadata.md, docs/status.md, ops/ROADMAP.md, test/README.md, and CHANGELOG.md all cleanly separate fixture-level coverage from the remaining production raw-attribute, invalid-UTF-8, and browser-sandbox work.

No actionable findings on the current head. The PR is good to merge.

[approve]

@punk6529 punk6529 merged commit 0b5a63d into main Jun 11, 2026
1 check passed