Add Hopper LLM provider #6132

Open
pavanyellow wants to merge 1 commit into livekit:main from pavanyellow:feat/hopper-llm

@pavanyellow pavanyellow commented Jun 17, 2026

Summary

Adds an LLM.with_hopper() factory to the OpenAI plugin, alongside the other
OpenAI-compatible providers (Cerebras, Together, Nebius, Telnyx, …).

Hopper serves open-source models optimized for low time-to-first-token (TTFT),
aimed at voice agents. The API is OpenAI-compatible, so this follows the
existing with_* provider pattern: no new plugin, just a factory method plus a
HopperChatModels type.

Usage

from livekit.plugins import openai

llm = openai.LLM.with_hopper(
    # api_key defaults to HOPPER_API_KEY env var
    model="Qwen/Qwen3.6-35B-A3B",
)

Get an API key at https://withhopper.com.

Changes

  • with_hopper() classmethod on openai.LLM (defaults: base_url=https://api.withhopper.com/v1, api_key read from the HOPPER_API_KEY env var).
  • HopperChatModels literal in models.py.
  • _strict_tool_schema=False, matching the other open-model providers served on vLLM-style backends (same as Cerebras; see "use non-strict tool schema for cerebras llm", #3134).
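
For reviewers unfamiliar with the with_* pattern, here is a minimal sketch of what such a factory does. SketchLLM is a hypothetical stand-in written for illustration, not the plugin's openai.LLM class; the field names and the ValueError on a missing key are assumptions. Only the base URL, the HOPPER_API_KEY variable, and the non-strict tool schema default come from this PR.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Default endpoint stated in this PR.
HOPPER_BASE_URL = "https://api.withhopper.com/v1"


@dataclass
class SketchLLM:
    """Hypothetical stand-in for openai.LLM (illustration only)."""

    model: str
    base_url: str
    api_key: str
    strict_tool_schema: bool = True

    @classmethod
    def with_hopper(
        cls,
        *,
        model: str,
        api_key: Optional[str] = None,
        base_url: str = HOPPER_BASE_URL,
    ) -> "SketchLLM":
        # Fall back to the environment when no key is passed explicitly.
        api_key = api_key or os.environ.get("HOPPER_API_KEY")
        if not api_key:
            # Assumption: fail early rather than send unauthenticated requests.
            raise ValueError(
                "Hopper API key is required: pass api_key or set HOPPER_API_KEY"
            )
        # Non-strict tool schema, as described in the Changes list.
        return cls(
            model=model,
            base_url=base_url,
            api_key=api_key,
            strict_tool_schema=False,
        )
```

The factory only fills in provider defaults; everything else is the shared OpenAI-compatible client path, which is why no separate plugin is needed.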

Latency

TTFT (time to first token) over a warm connection, with a voice-agent-shaped
context (~2k-token system prompt + short user turn), measured over 10 runs:

  • From us-west-2 (same region as the model server): p50 62 ms (min 56 ms, max 77 ms)
  • From a residential laptop in SF: ~170 ms; the difference is network round-trip time

First-token latency is dominated by where your agent runs relative to the model,
not the serving itself; colocated, it's ~60ms.
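
To reproduce numbers of this shape, a harness along the following lines should work. This is a sketch, not the script used for the figures above: time_to_first_token, ttft_summary, and fake_stream are hypothetical names, and fake_stream stands in for a real streaming request so the block runs without network access.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_to_first_token(start_stream: Callable[[], Iterable[str]]) -> float:
    """Return seconds from starting the request until the first chunk arrives."""
    started = time.perf_counter()
    for _chunk in start_stream():
        # Stop at the first chunk; the rest of the stream is not timed.
        return time.perf_counter() - started
    raise RuntimeError("stream ended without producing a chunk")


def ttft_summary(
    start_stream: Callable[[], Iterable[str]], runs: int = 10
) -> Dict[str, float]:
    """Collect TTFT samples and report p50, min, and max in milliseconds."""
    samples_ms = [time_to_first_token(start_stream) * 1000 for _ in range(runs)]
    return {
        "p50_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def fake_stream() -> Iterable[str]:
    """Stand-in for a streaming completion: ~10 ms until the first chunk."""
    time.sleep(0.01)
    yield "hello"
    yield " world"


if __name__ == "__main__":
    print(ttft_summary(fake_stream))
```

To measure a real endpoint, replace fake_stream with a callable that issues the streaming request and yields chunks, and send one untimed request first so the connection is warm.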

Testing

Verified against the live endpoint through the plugin:

  • factory builds with the correct base URL and model
  • streaming chat() returns a valid completion
  • function calling emits a correct tool call (validates _strict_tool_schema=False)
  • ruff check passes on the changed files
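
For context on what the function-calling check exercises, the sketch below contrasts a strict and a non-strict tool definition in the OpenAI chat-completions tool format. build_tool is a hypothetical helper for illustration; how the plugin builds schemas internally is not shown in this PR. Strict mode marks the function with strict: true, disallows additional properties, and requires every property, which some open-model backends may not accept.

```python
import copy
from typing import Any, Dict


def build_tool(
    name: str,
    description: str,
    parameters: Dict[str, Any],
    *,
    strict: bool,
) -> Dict[str, Any]:
    """Build an OpenAI-style tool definition (hypothetical helper)."""
    # Copy so the caller's schema is never mutated.
    params = copy.deepcopy(parameters)
    function: Dict[str, Any] = {
        "name": name,
        "description": description,
        "parameters": params,
    }
    if strict:
        # Strict mode: every property is required and extras are rejected.
        params["additionalProperties"] = False
        params["required"] = list(params.get("properties", {}))
        function["strict"] = True
    return {"type": "function", "function": function}


weather_params = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string"},
    },
    "required": ["city"],
}

# Non-strict form: the schema is passed through unchanged.
loose_tool = build_tool(
    "get_weather", "Look up the weather", weather_params, strict=False
)
```

With strict=False the tool keeps its optional unit parameter and carries no strict flag, which is the shape a non-strict provider would receive.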

CLAassistant commented Jun 17, 2026

CLA assistant check
All committers have signed the CLA.

@pavanyellow pavanyellow changed the title from "feat(openai): add Hopper LLM provider (with_hopper)" to "Add Hopper LLM provider" Jun 17, 2026
Hopper serves open-source models optimized for low time-to-first-token,
aimed at voice agents. The API is OpenAI-compatible, so this adds an
LLM.with_hopper() factory in the livekit-plugins-openai package alongside
the other OpenAI-compatible providers (Cerebras, Together, Nebius, etc.),
plus a HopperChatModels type.
@pavanyellow pavanyellow marked this pull request as ready for review June 17, 2026 00:52
@pavanyellow pavanyellow requested a review from a team as a code owner June 17, 2026 00:52

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.
