
feat: add --llama-server-port for a fixed llama-server runtime port #499

Merged
Defilan merged 1 commit into defilantech:main from Defilan:feat/metal-agent-llama-server-port on May 20, 2026


@Defilan commented May 19, 2026

What

Add a --llama-server-port flag to the metal-agent so the spawned llama-server can bind a fixed port instead of an ephemeral one.

Why

Refs #406.

Per #406: the metal-agent allocates a dynamic port per spawned child, so host-side clients (an agentic coding tool, a quick curl, anything using an OpenAI SDK against localhost) have no stable target. They have to be re-pointed every time the agent respawns the process. The mlx-server runtime already has --mlx-server-port for exactly this reason; this brings the llama-server runtime to parity.

Scope note: this PR addresses only the llama-server portion of #406. Support for the vllm-swift runtime, and absorbing vllm-swift-proxy.py into the metal-agent, are still TBD; I'd suggest tracking those as follow-ups so that #406 isn't closed by this change alone.

How

  • New --llama-server-port int CLI flag on the metal-agent. Default 0 keeps the historical ephemeral-port behavior; a non-zero value pins the spawned llama-server to that port.
  • MetalAgentConfig.LlamaServerPort is wired through to MetalExecutor.SetPort(int). SetPort clamps negative values back to 0.
  • MetalExecutor.StartProcess uses the fixed port when it is non-zero; otherwise it falls back to allocatePort() exactly as before.
  • waitForHealthy polls whichever port was resolved, so a fixed port works end to end with no further change.
  • Test TestMetalExecutorSetPort covers the default-zero, set-to-8080, and negative-clamped-to-zero paths.
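The port handling described in the list above can be sketched roughly as follows. This is a minimal, self-contained approximation, not the code in pkg/agent/executor.go: the struct is a hypothetical stand-in for MetalExecutor, `resolvePort` is a hypothetical helper standing in for the decision made inside StartProcess, and `allocatePort` is assumed to work by binding to port 0 and reading back the port the OS assigned.

```go
package main

import (
	"fmt"
	"net"
)

// MetalExecutor is a hypothetical, stripped-down stand-in for the real
// executor; only the port-handling logic is modeled here.
type MetalExecutor struct {
	port int // 0 means "allocate an ephemeral port per spawn"
}

// SetPort pins the llama-server port. Negative values are clamped to 0,
// which restores the ephemeral-port behavior.
func (e *MetalExecutor) SetPort(port int) {
	if port < 0 {
		port = 0
	}
	e.port = port
}

// allocatePort is an assumed implementation: ask the OS for a free port by
// binding to port 0, then release it so the child process can bind it.
func allocatePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// resolvePort (hypothetical) mirrors the StartProcess decision: use the
// fixed port when set, otherwise fall back to an ephemeral one.
func (e *MetalExecutor) resolvePort() (int, error) {
	if e.port != 0 {
		return e.port, nil
	}
	return allocatePort()
}

func main() {
	e := &MetalExecutor{}
	for _, p := range []int{0, 8080, -1} {
		e.SetPort(p)
		port, err := e.resolvePort()
		if err != nil {
			fmt.Println("allocate failed:", err)
			continue
		}
		fmt.Printf("SetPort(%d) -> stored %d, resolved %d\n", p, e.port, port)
	}
}
```

If ephemeral allocation works this way, the port is released before the child binds it, so another process could in principle claim it in between. A pinned port sidesteps that handoff, at the cost of failing outright if something else already holds the port.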

Non-breaking: every change is additive. With the flag unset, behavior is byte-identical to pre-patch. No public-API signature changes.
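Because the flag defaults to 0, leaving it off produces the same zero value the executor already treats as "allocate an ephemeral port". A minimal sketch of that wiring, assuming the standard library `flag` package (the actual parser in cmd/metal-agent/main.go may differ) and a hypothetical `agentConfig` type standing in for MetalAgentConfig:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// agentConfig is a hypothetical subset of MetalAgentConfig holding only
// the field this PR adds.
type agentConfig struct {
	LlamaServerPort int
}

// parseFlags registers --llama-server-port with a default of 0, so omitting
// the flag keeps the historical ephemeral-port behavior.
func parseFlags(args []string) (agentConfig, error) {
	var cfg agentConfig
	fs := flag.NewFlagSet("metal-agent", flag.ContinueOnError)
	fs.IntVar(&cfg.LlamaServerPort, "llama-server-port", 0,
		"fixed port for the spawned llama-server (0 = ephemeral)")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	for _, args := range [][]string{
		{},
		{"--llama-server-port", "8080"},
		{"--llama-server-port=9000"},
	} {
		cfg, err := parseFlags(args)
		if err != nil {
			fmt.Fprintln(os.Stderr, "parse error:", err)
			continue
		}
		fmt.Printf("%v -> LlamaServerPort=%d\n", args, cfg.LlamaServerPort)
	}
}
```

Go's `flag` package accepts both `--llama-server-port 8080` and `--llama-server-port=8080`, so either spelling yields the same config.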

Checklist

  • Tests added/updated (TestMetalExecutorSetPort in pkg/agent/executor_test.go)
  • `make test` passes locally (go test ./pkg/agent/... -> ok)
  • `make lint` passes locally
  • Commit messages follow conventional commits (feat:)
  • All commits are signed off (git commit -s) per DCO
  • Documentation updated: n/a (internal operator flag)

The llama-server runtime allocated an ephemeral port for every spawned
process, so a native OpenAI-compatible client (e.g. an agentic coding
tool pointed at localhost) had no stable endpoint and had to be
re-pointed whenever the agent respawned the process.

Add a --llama-server-port flag, mirroring the existing --mlx-server-port
for the mlx-server runtime. Zero (the default) keeps the historical
ephemeral-port behavior; a non-zero value fixes the port the spawned
llama-server binds, which the health check then polls consistently.

Signed-off-by: Christopher Maher <chris@mahercode.io>

@codecov commented May 19, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/agent/executor.go | 40.00% | 6 Missing ⚠️ |
| cmd/metal-agent/main.go | 0.00% | 5 Missing ⚠️ |
| pkg/agent/agent.go | 0.00% | 1 Missing ⚠️ |


@Defilan merged commit cc30b0d into defilantech:main on May 20, 2026
21 checks passed