Adaptive speculative-decoding inference engine with Triton-optimised verification and online bandit draft selection.
machine-learning triton bandit-algorithms inference-optimization llm-inference speculative-decoding production-inference
Updated Jun 15, 2026 - Python