feat: add --llama-server-port for a fixed llama-server runtime port #499
Merged
Defilan merged 1 commit on May 20, 2026
The llama-server runtime allocated an ephemeral port for every spawned process, so a native OpenAI-compatible client (e.g. an agentic coding tool pointed at localhost) had no stable endpoint and had to be re-pointed whenever the agent respawned the process.

Add a --llama-server-port flag, mirroring the existing --mlx-server-port for the mlx-server runtime. Zero (the default) keeps the historical ephemeral-port behavior; a non-zero value fixes the port the spawned llama-server binds, which the health check then polls consistently.

Signed-off-by: Christopher Maher <chris@mahercode.io>
What
Add a `--llama-server-port` flag to the metal-agent so the spawned `llama-server` can bind a fixed port instead of an ephemeral one.

Why
Refs #406.
Per #406: the metal-agent allocates a dynamic port per spawned child, so host-side clients (an agentic coding tool, a quick `curl`, anything using an OpenAI SDK against `localhost`) have no stable target. They have to be re-pointed every time the agent respawns the process. The mlx-server runtime already has `--mlx-server-port` for exactly this reason; this brings the llama-server runtime to parity.

Scope note: this PR addresses the llama-server portion of #406. The vllm-swift runtime and absorbing `vllm-swift-proxy.py` into the metal-agent are still TBD; I'd suggest tracking those as follow-ups so #406 isn't closed by this single change.

How
- `--llama-server-port int` CLI flag on the metal-agent. Default `0` keeps the historical ephemeral-port behavior; a non-zero value pins the spawned llama-server to that port.
- `MetalAgentConfig.LlamaServerPort` is wired through to `MetalExecutor.SetPort(int)`. `SetPort` clamps negative values back to 0.
- `MetalExecutor.StartProcess` uses the fixed port when non-zero; otherwise falls back to `allocatePort()` exactly as before.
- `waitForHealthy` polls whichever port was resolved, so a fixed port works end to end with no further change.
- `TestMetalExecutorSetPort` covers the default-zero, set-to-8080, and negative-clamped-to-zero paths.

Non-breaking: every change is additive. With the flag unset, behavior is byte-identical to pre-patch. No public-API signature changes.
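For reviewers who want the shape of the change without opening the diff, the port handling described under How can be sketched roughly as below. This is an illustration, not the code in this PR: the `executor` type and `resolvePort` helper are hypothetical stand-ins for `MetalExecutor` and the logic inside `StartProcess`, and the body of `allocatePort` (bind to port 0 on loopback, read the assigned port, release it) is an assumption about one common way to obtain an ephemeral port.

```go
package main

import (
	"fmt"
	"net"
)

// executor is a hypothetical stand-in for MetalExecutor; only the port
// handling described in this PR is modeled.
type executor struct {
	port int // 0 means "allocate an ephemeral port for each spawn"
}

// SetPort stores the fixed port. Negative values are clamped back to 0,
// matching the behavior the PR description states for SetPort.
func (e *executor) SetPort(p int) {
	if p < 0 {
		p = 0
	}
	e.port = p
}

// allocatePort asks the kernel for a free TCP port by binding to port 0
// on loopback and releasing the listener right away. This body is an
// assumption; the real allocatePort may work differently.
func allocatePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// resolvePort (hypothetical helper) returns the fixed port when one is
// set and falls back to an ephemeral port otherwise.
func (e *executor) resolvePort() (int, error) {
	if e.port != 0 {
		return e.port, nil
	}
	return allocatePort()
}

func main() {
	e := &executor{}

	e.SetPort(-1)
	fmt.Println("after SetPort(-1):", e.port) // clamped to 0

	e.SetPort(8080)
	p, err := e.resolvePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved fixed port:", p) // 8080
}
```

Because the health check polls whatever port this resolution step returns, a client pointed at the fixed port keeps working across respawns, while leaving the flag at `0` preserves the ephemeral path.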
Checklist
- Tests added (`TestMetalExecutorSetPort` in `pkg/agent/executor_test.go`)
- Tests pass (`go test ./pkg/agent/...` -> ok)
- Conventional commit prefix (`feat:`)
- Signed off (`git commit -s`) per DCO