A reference simulator + benchmark harness for evaluating HBM residency-control policies and fragmentation behavior. It is intentionally hardware-agnostic: no proprietary hooks required.
This repo models:
- probabilistic reuse forecasts with confidence bounds
- confidence-gated hysteresis (LB admission, UB eviction)
- a thrash budget ledger with deterministic fallback
- a simple contiguous HBM allocator + fragmentation metrics
- optional compaction only inside safe windows
- trace-driven “LLM-style” workload mocks
What it is not: a device microcode implementation. It’s a policy simulator for reproducible evaluation.
Patent: Patent application filed (India). Patent pending.
This repository is provided for research and evaluation purposes only. It is not production software and is provided “AS IS” without warranties of any kind (including fitness for a particular purpose, accuracy, or non-infringement).
No endorsement or affiliation with any HBM/GPU vendor is implied.
This software license does not grant rights to any patents.
On Windows PowerShell, create a virtual environment, install dependencies, and run the benchmark:

    python -m venv .venv
    .\.venv\Scripts\python.exe -m pip install -r requirements.txt
    .\.venv\Scripts\python.exe .\bench.py

To run individual simulations on Windows PowerShell:

    python -m venv .venv
    .\.venv\Scripts\python.exe -m pip install -r requirements.txt
    # Compare confidence vs LRU under demand paging
    .\.venv\Scripts\python.exe .\run_sim.py --trace .\traces\llm_kvcache_growth.jsonl --policy confidence --miss-mode demand
    .\.venv\Scripts\python.exe .\run_sim.py --trace .\traces\llm_kvcache_growth.jsonl --policy lru --miss-mode demand

On Linux/macOS:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python run_sim.py --trace traces/llm_kvcache_growth.jsonl --policy confidence --miss-mode demand
    python run_sim.py --trace traces/llm_kvcache_growth.jsonl --policy lru --miss-mode demand

bench.py runs the canonical scenarios and prints a compact comparison table:

    python bench.py

On Windows PowerShell (without activation):

    .\.venv\Scripts\python.exe .\bench.py

The simulator prints:
- Faults (serving from “system memory” / non-resident access)
- Migrations / Bytes moved (HBM fill/relocation proxy)
- Fallback epochs (budgets exceeded → discretionary actions blocked)
- Fragmentation metrics
  - LFE: largest free extent (contiguous space)
  - holes: number of free extents
  - external_frag: 1 - LFE/total_free
  - entropy: entropy of free-extent sizes (higher = more scattered)
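The fragmentation metrics above can be computed from a plain list of free-extent sizes. The following is a minimal sketch, not the code in memory/fragmentation.py: the function name `fragmentation_metrics` is hypothetical, and the entropy definition (Shannon entropy, in bits, over each extent's share of total free space) is an assumption, since the exact formula is not stated here.

```python
import math

def fragmentation_metrics(free_extents):
    """Summarize free-extent sizes (any consistent unit, e.g. bytes)."""
    sizes = [s for s in free_extents if s > 0]
    total_free = sum(sizes)
    if total_free == 0:
        # No free space at all: report zeros rather than dividing by zero.
        return {"LFE": 0, "holes": 0, "external_frag": 0.0, "entropy": 0.0}
    lfe = max(sizes)
    # Assumption: Shannon entropy (bits) of each extent's share of free space.
    shares = [s / total_free for s in sizes]
    entropy = sum(-p * math.log2(p) for p in shares)
    return {
        "LFE": lfe,
        "holes": len(sizes),
        "external_frag": 1.0 - lfe / total_free,
        "entropy": entropy,
    }

# One contiguous free region versus the same space split into four holes.
compact = fragmentation_metrics([1024])
scattered = fragmentation_metrics([256, 256, 256, 256])
```

With the same total free space, the single-extent case yields an external_frag of 0, while the four-hole case yields 0.75 and a higher entropy, which matches the "higher = more scattered" reading.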
A typical desired pattern:
- confidence policy: fewer fallback epochs, fewer blocked actions, and lower fragmentation than LRU demand paging.
- --policy confidence: confidence-gated LB admission / UB eviction
- --policy lru: baseline LRU (demand paging only in demand mode)
- --miss-mode serve: model misses as faults without forced admission
- --miss-mode demand: demand-load admission behavior
By default, confidence policy in demand mode only demand-loads after budgets are exceeded (deterministic fallback path). This preserves the confidence gate as the primary admission criterion.
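One way to picture the interaction between the confidence gate and the thrash budget is the sketch below. It is illustrative only and does not reproduce policy/confidence_gated.py or control/safety_gate.py: the names `ThrashBudget` and `decide`, the bound width `k`, the thresholds `admit_lb` and `evict_ub`, and the choice to charge only admissions against the budget are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ThrashBudget:
    """Hypothetical per-epoch ledger of discretionary bytes moved."""
    max_bytes: int
    spent: int = 0
    in_fallback: bool = False

    def try_spend(self, nbytes: int) -> bool:
        # Exceeding the budget flips the ledger into fallback for the epoch.
        if self.spent + nbytes > self.max_bytes:
            self.in_fallback = True
            return False
        self.spent += nbytes
        return True

def decide(mu, sigma, resident, size, budget, k=1.0, admit_lb=0.6, evict_ub=0.3):
    """Return an action for one touched object under demand miss mode."""
    lb = max(0.0, mu - k * sigma)  # pessimistic reuse estimate
    ub = min(1.0, mu + k * sigma)  # optimistic reuse estimate
    if budget.in_fallback:
        # Deterministic fallback: discretionary actions are blocked,
        # and misses are simply demand-loaded.
        return "keep" if resident else "demand_load"
    if not resident and lb >= admit_lb:
        # Admit only when even the lower bound clears the admission bar.
        return "admit" if budget.try_spend(size) else "demand_load"
    if resident and ub <= evict_ub:
        # Evict only when even the upper bound falls below the eviction bar.
        return "evict"
    # Inside the hysteresis band: leave residency unchanged.
    return "keep" if resident else "skip"

budget = ThrashBudget(max_bytes=100)
actions = [
    decide(0.9, 0.1, resident=False, size=60, budget=budget),   # confident miss
    decide(0.7, 0.3, resident=False, size=10, budget=budget),   # uncertain miss
    decide(0.2, 0.05, resident=True, size=10, budget=budget),   # cold resident
    decide(0.5, 0.1, resident=True, size=10, budget=budget),    # in the band
    decide(0.9, 0.1, resident=False, size=60, budget=budget),   # budget exceeded
    decide(0.1, 0.0, resident=False, size=10, budget=budget),   # fallback path
]
```

Because the admission threshold applies to the lower bound and the eviction threshold to the upper bound, an object with a wide confidence interval tends to stay where it is, which is the hysteresis effect the policy relies on.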
- policy/
  - confidence_gated.py: LB admission / UB eviction + compaction triggers
  - baselines.py: LRU baseline and simple greedy stub
- control/
  - safety_gate.py: thrash budgets + fallback
  - scheduler.py: safe-window gating
- memory/
  - allocator.py: contiguous allocator + compaction primitive
  - fragmentation.py: LFE/external frag/entropy metrics
- viz/
  - ascii_map.py: ASCII HBM map for quick inspection
- traces/
  - schema.json: JSONL trace schema
  - llm_kvcache_growth.jsonl: KV-cache growth mock
  - moe_expert_swap.jsonl: MoE swap mock
  - fragmentation_stressor.jsonl: designed to induce fragmentation
Each line is a JSON event:
- t: integer timestamp (arbitrary units)
- event: alloc | free | touch | safe_window
- id: object id (alloc/free/touch)
- size: object size (alloc)
Optional for touch:
- mu, sigma: reuse forecast parameters (mean/std-dev, 0..1)
- phase: label string
See: traces/schema.json and docs/TRACE_FORMAT.md.
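As a rough illustration of the event fields described above, the sketch below writes a few events as JSONL and reads them back with a basic required-field check. The sample values (string ids such as "kv0", the phase label "decode") and the helper name `parse_trace` are assumptions made for the example; traces/schema.json remains the authoritative definition.

```python
import json

# Required fields per event type, following the field list above.
REQUIRED = {
    "alloc": {"t", "event", "id", "size"},
    "free": {"t", "event", "id"},
    "touch": {"t", "event", "id"},
    "safe_window": {"t", "event"},
}

def parse_trace(lines):
    """Parse JSONL lines into event dicts, rejecting malformed events."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        ev = json.loads(line)
        kind = ev.get("event")
        if kind not in REQUIRED:
            raise ValueError(f"line {lineno}: unknown event {kind!r}")
        missing = REQUIRED[kind] - set(ev)
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        events.append(ev)
    return events

# Illustrative events only; the id type and phase label are assumed.
sample = "\n".join(json.dumps(e) for e in [
    {"t": 0, "event": "alloc", "id": "kv0", "size": 4096},
    {"t": 1, "event": "touch", "id": "kv0", "mu": 0.8, "sigma": 0.1, "phase": "decode"},
    {"t": 2, "event": "safe_window"},
    {"t": 3, "event": "free", "id": "kv0"},
])
events = parse_trace(sample.splitlines())
```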
- docs/TRACE_FORMAT.md: trace events + examples
- docs/INTERPRETING_OUTPUT.md: how to read the summary and compare policies
- docs/EXTENDING.md: adding traces, policies, and visualizations
Generate a simple HBM occupancy heatmap from a trace:

    python -m tools.visualize_fragmentation --trace .\traces\fragmentation_stressor.jsonl --out .\docs\img\fragmentation_demo.png

License: MIT (see LICENSE).
