What's Changed
Added Docker build system.
Added inference cancellation for LLM/VLM. You can now cancel streaming mid-flight by closing the client connection; the KV cache is rolled back to its state before that request. Previously, cancelling mid-flight left inference running in a background thread until it completed on its own, burning GPU cycles in the meantime. Not anymore! Note: cancellation does not work for non-streaming requests (stream=False). In the stream=False case, generation continues until max_tokens is reached, the process runs out of memory, or inference completes on its own.
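A minimal sketch of cancelling mid-stream from a client, assuming an OpenAI-style streaming chat endpoint at `/v1/chat/completions` (the host, port, route, and model name below are placeholders, not confirmed by this release; adapt them to your server):

```python
import http.client
import json

def cancel_midstream(host="localhost", port=8080, max_chunks=5):
    """Stream a completion and abort early by closing the connection.

    The endpoint and payload are assumptions (OpenAI-style API).
    Closing the socket mid-stream is what triggers the server-side
    cancel described in the release notes.
    """
    conn = http.client.HTTPConnection(host, port)
    body = json.dumps({
        "model": "default",  # hypothetical model name
        "stream": True,      # cancellation only applies to streaming
        "messages": [{"role": "user", "content": "Write a long story."}],
    })
    conn.request("POST", "/v1/chat/completions", body,
                 {"Content-Type": "application/json"})
    resp = conn.getresponse()
    # Consume a few SSE chunks, then stop.
    for _ in range(max_chunks):
        chunk = resp.readline()
        if not chunk:
            break
        print(chunk.decode(errors="replace").rstrip())
    # Closing the client connection cancels the in-flight inference.
    conn.close()

if __name__ == "__main__":
    cancel_midstream()
```

With this change the server stops generating as soon as the connection drops, rather than finishing the request in the background.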