Release Date: 2025-11-06
Status: ✅ Production Ready
License: MIT
This release introduces the Research Benchmark Module (RBM) skeleton to ProofCore. RBM standardises step-level dataset ingestion, computer-assisted proof checks, evaluation metrics, and reporting so we can integrate internal corpora and public benchmarks (e.g., IMO-Bench) in a repeatable way. It also kicks off a broader plan to externalise Pyodide assets for security and performance.
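The flow described above (ingest step-level records, check each step, compute metrics, emit a report) can be sketched roughly as follows. This is a minimal illustration only: the record shape, `check_arithmetic`, and `evaluate` are hypothetical names invented for the sketch, not the RBM API, and the balanced accuracy computed here is a generic stand-in rather than the module's own metrics helpers.

```python
import json
from typing import Callable

# Hypothetical record shape (an assumption, not the RBM schema): each problem
# holds a list of steps, and each step has a claim plus a gold validity label.
SAMPLE = [
    {"id": "p1", "steps": [
        {"claim": "2 + 2 = 4", "label": True},
        {"claim": "4 * 3 = 13", "label": False},
    ]},
    {"id": "p2", "steps": [
        {"claim": "7 - 5 = 2", "label": True},
    ]},
]


def check_arithmetic(claim: str) -> bool:
    """Toy stand-in for a proof check: evaluate 'a op b = c' over integers."""
    lhs, rhs = claim.split("=")
    a, op, b = lhs.split()
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    return ops[op](int(a), int(b)) == int(rhs)


def evaluate(problems: list, check: Callable[[str], bool]) -> dict:
    """Run the check on every step and summarise the results."""
    tp = tn = fp = fn = 0
    for problem in problems:
        for step in problem["steps"]:
            predicted = check(step["claim"])
            if step["label"]:
                tp, fn = (tp + 1, fn) if predicted else (tp, fn + 1)
            else:
                tn, fp = (tn, fp + 1) if predicted else (tn + 1, fp)
    # Balanced accuracy: mean of the recall on valid and on invalid steps.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {
        "steps": tp + tn + fp + fn,
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "balanced_accuracy": (tpr + tnr) / 2,
    }


report = evaluate(SAMPLE, check_arithmetic)
print(json.dumps(report, indent=2))
```

Scoring each step rather than each problem keeps a single wrong step from hiding behind an otherwise correct proof, and balancing the two recalls prevents a checker that accepts everything from looking good on a dataset dominated by valid steps.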
- Added `proofcore/research_benchmark/` with loader/parser helpers, a cascade validator that leverages CA proof hooks, and the first metrics helpers (`balanced_scores`, `omega_rbm`).
- Delivered `rbm_cli` to run end-to-end evaluations and emit JSON reports; bundled `data_examples/sample_set.json` for smoke testing.
- Added Python regression suites in `backend/tests_rbm/` covering the CA proof hooks, cascade pipeline, metrics, and CLI execution.
- Repository now ships with an empty `public/pyodide/` plus `pyodide-manifest.json`; assets are fetched on demand.
- Added `npm run setup:pyodide` (fetch) and `npm run verify:offline-assets` (manifest verification) to manage downloads safely.
- Hash verification is supported via manifest entries and should be enabled for production deployments.
- Expand manifest generation to include Subresource Integrity hashes automatically.
- Integrate dependency scanning (npm audit, pip-audit, OSV) into CI for Pyodide bundles.
- Introduce Service Worker background caching to improve first-use latency.
- CHANGELOG, README, and release notes updated to describe RBM usage and Pyodide asset strategy.
- Version bumped to 1.0.3 in `pyproject.toml`, `package.json`, and `setup.py`.
- `npm install` (updates lockfile to 1.0.3).
- `python -m pytest backend/tests_rbm -q --no-cov` (verifies RBM stack).
- `npm run test -- tests/offline/offline_guarantee.test.ts` (ensures offline hardening remains intact).
- Manifest entries currently ship without hashes; add SRI/hash values before production deployment.
- Remaining npm audit warnings require larger upgrades (Vite/Vitest majors).
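
For the manifest entries that currently ship without hashes, a Subresource Integrity style value can be computed and checked along these lines. This is a sketch under assumptions: the `integrity` field name and the entry layout are hypothetical, since the actual `pyodide-manifest.json` schema is not described here, and the payload is an in-memory stand-in for a downloaded asset.

```python
import base64
import hashlib
import hmac

# Algorithms permitted by the Subresource Integrity format.
ALLOWED = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_asset(data: bytes, expected: str) -> bool:
    """Check asset bytes against an expected SRI string."""
    algorithm = expected.partition("-")[0]
    if algorithm not in ALLOWED:
        return False
    # Constant-time comparison avoids leaking how much of the hash matched.
    return hmac.compare_digest(sri_digest(data, algorithm), expected)


# Stand-in for a fetched asset; a real run would read the file from disk.
payload = b"example asset bytes"

# Hypothetical manifest entry; the field names are assumptions.
entry = {"path": "example.bin", "integrity": sri_digest(payload)}

print(entry["integrity"])
print(verify_asset(payload, entry["integrity"]))         # intact asset
print(verify_asset(payload + b"x", entry["integrity"]))  # tampered asset
```

Storing the algorithm name inside the value lets a verifier accept several hash strengths from one field, which makes a later move from one algorithm to another a data change rather than a code change.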
Thanks to the ProofCore maintainers for laying the groundwork for research dataset integration and tightening our asset security posture.