Open benchmark suite for verifying Catalyst Brain SDK behavior from the public PyPI wheels.
The benchmark code is MIT licensed. The catalyst-brain SDK is distributed
through PyPI with a generous free tier. These tests use only the public SDK API,
do not require source access, and do not require signup, registration, or an API
key for normal local benchmark runs.
| Suite | What it checks |
|---|---|
| Install smoke | pip install catalyst-brain==1.3.3, package versions, HDC import, HoloSwarm API parity |
| Token discovery | Progressive tool discovery versus repeatedly sending full verbose schemas |
| Tool selection accuracy | Compact discovery still routes task intents to the expected tool |
| Deferred outputs | Code/tool outputs stay out of context until explicitly fetched |
| HDC primitives | Bind, unbind, bundle, resonance latency and throughput |
| Quantum attention heads | Quantum-inspired attention routing accuracy and latency against a classical reference |
| Bind/unbind correctness | Exact recovery through direct and chained HDC binding |
| HKVC scaling | Median and p95 query latency as stored entries increase |
| HKVC path breakdown | Exact indexed hits measured separately from missing-key fallback |
| HKVC recency probe | First/middle/last entry retrieval and latency checks |
| Rain state transfer | Binary/header size and round-trip checks for stateless handoff |
| Memory model | Explicit FP16 KV-cache model compared with fixed Catalyst Rain state |
| KV-cache comparison | FP16, TurboQuant, KIVI, PyramidKV, and Catalyst HKVC memory-model comparison |
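The HDC primitive and bind/unbind correctness rows can be illustrated with a minimal pure-Python sketch. It assumes a bipolar (±1) elementwise-multiply encoding, which is one common HDC scheme. It is not the catalyst-brain implementation, and none of the function names below are SDK APIs.

```python
import random
from typing import List

DIM = 10_000  # hypothetical dimensionality, not taken from the SDK


def random_hv(rng: random.Random, dim: int = DIM) -> List[int]:
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a: List[int], b: List[int]) -> List[int]:
    """Elementwise multiply; for bipolar vectors this is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def unbind(bound: List[int], key: List[int]) -> List[int]:
    """Since key[i] * key[i] == 1, unbinding is the same operation as binding."""
    return bind(bound, key)


def bundle(*vectors: List[int]) -> List[int]:
    """Componentwise majority vote (sign of the sum); ties go to +1."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def resonance(a: List[int], b: List[int]) -> float:
    """Cosine similarity; for bipolar vectors this is dot / dim."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
role, filler, other = random_hv(rng), random_hv(rng), random_hv(rng)

# Direct bind/unbind recovers the filler exactly.
assert unbind(bind(role, filler), role) == filler

# Chained binding: unbinding every key recovers the filler exactly.
k1, k2, k3 = random_hv(rng), random_hv(rng), random_hv(rng)
chained = bind(bind(bind(filler, k1), k2), k3)
assert unbind(unbind(unbind(chained, k3), k2), k1) == filler

# Bundle members stay similar to the bundle; unrelated vectors score near 0.
memory = bundle(filler, other, role)
print(round(resonance(memory, filler), 2), round(resonance(memory, random_hv(rng)), 2))
```

Exact recovery holds here because every component of a bipolar key squares to 1; encodings with real-valued or permuted keys need different unbinding and may only recover approximately.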
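The token discovery row compares progressive discovery with resending full verbose schemas on every turn. A minimal sketch of that kind of byte-based estimate follows. The tool catalog, the turn count, and the "about 4 bytes per token" heuristic are all assumptions for illustration, not the suite's harness or a real tokenizer.

```python
import json

# Hypothetical catalog of 40 verbose tool schemas; the suite's real harness differs.
TOOLS = {
    f"tool_{i}": {
        "name": f"tool_{i}",
        "description": f"Performs operation {i} on the workspace. " * 5,
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Target file path."},
                "options": {"type": "object", "description": "Extra flags."},
            },
            "required": ["path"],
        },
    }
    for i in range(40)
}


def est_tokens(payload: object) -> int:
    """Rough estimate: about 4 UTF-8 bytes per token (a heuristic, not a tokenizer)."""
    return len(json.dumps(payload).encode("utf-8")) // 4


def full_schema_cost(turns: int) -> int:
    """Every turn resends the complete verbose catalog."""
    return turns * est_tokens(list(TOOLS.values()))


def progressive_cost(turns: int, tools_used: int) -> int:
    """Each turn sends only a compact name index; full schemas are fetched
    once, and only for the tools actually used."""
    index = list(TOOLS)
    fetched = [TOOLS[name] for name in index[:tools_used]]
    return turns * est_tokens(index) + est_tokens(fetched)


full = full_schema_cost(turns=10)
progressive = progressive_cost(turns=10, tools_used=3)
print(full, progressive, f"{1 - progressive / full:.0%} estimated savings")
```

As with the suite's own numbers, these are payload-size estimates, not provider billing figures.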
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
catalyst-brain-bench --mode quick --out results/local
```

The command writes:

- `results/local/latest.json`
- `results/local/*.csv`
- `results/local/README.md`
- `results/local/charts/*.svg`

For a larger run:

```bash
catalyst-brain-bench --mode full --out results/local-full
```

See `results/README.md`.
Charts from the latest checked-in run:
- Token savings
- Tool selection accuracy
- Deferred output savings
- HKVC query latency
- HKVC path latency
- HKVC recency latency
- HDC primitive latency
- Quantum attention latency
- Quantum attention accuracy
- Bind/unbind chain correctness
- Rain state transfer
- Memory model
- KV-cache comparison
Comparison assumptions and primary paper links are documented in docs/SOURCES.md. Claim coverage and next evidence targets are tracked in docs/EVIDENCE_MAP.md.
These benchmarks are meant to be reproducible, not rhetorical.
- The suite uses public APIs only.
- Results vary by hardware, Python version, and operating system.
- The memory chart is an explicit model, not process RSS. It compares a stated FP16 transformer KV-cache formula with the SDK's fixed world-vector size and measured compressed Rain header size.
- The KV-cache comparison chart models memory footprint from published compression or retention targets. It does not claim equal model quality, equal serving behavior, or source access to competitor systems.
- Catalyst Brain uses classical HDC and quantum-inspired algorithms. This suite does not claim physical quantum behavior.
- Quantum attention benchmarks compare the routing implemented in the public wheel against a pure-Python cosine softmax reference; they do not claim quantum hardware acceleration.
- Token savings are byte/token estimates for agent context payloads, not LLM billing statements from a provider.
- HKVC exact-key hit latency and missing-key fallback latency are reported separately. Any O(1) wording applies only to the indexed exact-key path, not to the fallback.
- Tool-selection accuracy is measured on a deterministic public harness; it is not a claim of general semantic search quality on arbitrary tool catalogs.
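The memory-chart note refers to an explicit FP16 KV-cache formula. A commonly used form is sketched below: 2 (keys and values) × layers × KV heads × head dimension × tokens × 2 bytes per FP16 element. The suite's own formula and model shape may differ (see docs/SOURCES.md); the 7B-class dimensions here are illustrative assumptions.

```python
def fp16_kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int = 1,
    bytes_per_elem: int = 2,  # FP16 = 2 bytes per element
) -> int:
    """Bytes for keys + values across all layers (leading factor 2 = K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Illustrative shape (assumption): 32 layers, 32 KV heads, head_dim 128.
# That is 524,288 bytes (0.5 MiB) per token.
for tokens in (1_024, 8_192, 32_768):
    gib = fp16_kv_cache_bytes(32, 32, 128, tokens) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.2f} GiB")  # 0.50, 4.00, 16.00 GiB
```

The result grows linearly with token count, which is what the chart contrasts with the SDK's fixed world-vector size; this sketch does not model the Catalyst side.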
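The quantum attention note names a pure-Python cosine softmax reference. A minimal sketch of such a reference follows: softmax over cosine similarities between a query and candidate keys, with routing taken as the highest-weight index. The temperature value and the function names are hypothetical; this is not the suite's reference code.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_softmax(query: List[float], keys: List[List[float]], temperature: float = 0.1) -> List[float]:
    """Softmax over cosine similarities; temperature is a hypothetical knob."""
    logits = [cosine(query, k) / temperature for k in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(query: List[float], keys: List[List[float]], temperature: float = 0.1) -> int:
    """Index of the key with the highest attention weight."""
    weights = cosine_softmax(query, keys, temperature)
    return max(range(len(weights)), key=weights.__getitem__)


keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.2, 0.1]
print(route(query, keys))  # → 0
print([round(w, 3) for w in cosine_softmax(query, keys)])
```

One way to score agreement is the fraction of queries on which another router picks the same index as this reference; how the suite defines its accuracy metric is documented in the suite itself.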
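The HKVC note separates exact-key hits from missing-key fallback. The sketch below shows why the two paths should be timed and reported separately, using a hypothetical toy store (not the HKVC API): a dict lookup for exact keys and a linear scan on a miss, each summarized as median and p95 latency.

```python
import random
import statistics
import time
from typing import Dict, List, Optional, Tuple


class ToyStore:
    """Hypothetical key-value store, not the HKVC API: a dict index for exact
    keys plus a linear-scan fallback when the key is missing."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.index[key] = value

    def get(self, key: str) -> Optional[str]:
        if key in self.index:  # exact-key path: average O(1) hash lookup
            return self.index[key]
        # Missing-key fallback: O(n) scan for the stored key with the most
        # position-wise matching characters.
        best_value, best_score = None, -1
        for stored_key, value in self.index.items():
            score = sum(a == b for a, b in zip(stored_key, key))
            if score > best_score:
                best_value, best_score = value, score
        return best_value


def median_p95_us(samples: List[float]) -> Tuple[float, float]:
    """Median and 95th percentile of timings, in microseconds."""
    p95 = statistics.quantiles(samples, n=100)[94]
    return statistics.median(samples) * 1e6, p95 * 1e6


def time_path(store: ToyStore, keys: List[str], repeats: int = 100) -> Tuple[float, float]:
    samples = []
    for _ in range(repeats):
        key = random.choice(keys)
        start = time.perf_counter()
        store.get(key)
        samples.append(time.perf_counter() - start)
    return median_p95_us(samples)


random.seed(0)
store = ToyStore()
for i in range(2_000):
    store.put(f"key-{i:05d}", f"value-{i}")

hit = time_path(store, [f"key-{i:05d}" for i in range(2_000)])
miss = time_path(store, [f"absent-{i:05d}" for i in range(100)])
print(f"exact hit:     median {hit[0]:.2f} us, p95 {hit[1]:.2f} us")
print(f"miss fallback: median {miss[0]:.2f} us, p95 {miss[1]:.2f} us")
```

Averaging hits and misses together would hide the scan cost behind the fast path, which is why a single blended latency figure would overstate any O(1) claim.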
The GitHub workflow installs catalyst-brain==1.3.3 from PyPI, runs the quick
benchmark, and uploads generated results as artifacts. It does not need SDK
source code or private credentials.
Benchmark code is MIT licensed. The Catalyst Brain SDK is governed by its own terms.
Install catalyst-brain from PyPI and run these benchmarks without signup,
registration, or an API key. Most users should not hit free-tier limits during
early benchmark reproduction. If your use case moves toward production,
enterprise evaluation, hosted benchmarking, customer pilots, or needs higher
quotas/support, contact hello@strategic-innovations.ai.