llmctl manages a single local GGUF backend served by llama-server.
llmctld owns the process lifecycle and exposes both a control API and an OpenAI-compatible proxy. The CLI and Pi extension are clients of that daemon. Pi registers one model, llmctl/local; the active GGUF behind it is switched with llmctl switch or /llm.
llmctld
llmctl models
llmctl switch qwen3.5
llmctl status
llmctl stopPi:
/model llmctl/local
/llm # dashboard
/llm switch # model picker
/llm switch qwen3 # fuzzy switch
CLI / Pi extension / OpenAI client
│
▼
llmctld :16384
├─ control API: /status, /switch, /models, ...
└─ OpenAI proxy: /v1/*
│
▼
active llama-server :11435
Rules:
- one active model at a time
- switch is rejected while requests are in flight
- no client talks directly to
llama-server llama-serverbinds localhost only- no auth token is used; this is localhost tooling
- if no model is active and a default is configured, the daemon loads it on first request
- CLI and Pi both display loading/switching progress
Global config lives at ~/.local/share/llmctl/config/default.yaml:
server:
host: 127.0.0.1
port: 11435
binary: llama-server
readyTimeoutMs: 600000
daemon:
host: 127.0.0.1
port: 16384
default_model: qwen3.5-35b-a3b-q8_0
load_default_on_startup: false
llama:
flash_attn: auto
gpu_layers: 999
mmap: false
ctx_size: 0
cache_type_k: q8_0
cache_type_v: q8_0
parallel: 1
metrics: true
slots: true
reasoning_format: deepseek
pi:
model:
reasoning: false
input: [text]
contextWindow: 131072
maxTokens: 8192
provider_compat:
supportsDeveloperRole: falsePer-model config lives beside the GGUF:
# ~/.local/share/llmctl/gguf/qwen3.5/config.yaml
id: qwen3.5-35b-a3b-q8_0
name: Qwen3.5 35B A3B Q8
model_file: Qwen3.5-35B-A3B-Q8_0.gguf
pi:
model:
reasoning: true
contextWindow: 262144
maxTokens: 32768Precedence:
- built-in defaults
- global
config/default.yaml - model
config.yaml
Control API:
GET /health
GET /status
GET /models
GET /models/:id
GET /active
POST /switch/:model # fuzzy match; rejects if busy
POST /switch/:model?stream=1 # NDJSON progress
POST /stop
GET /logs[/model]
POST /reload
OpenAI-compatible proxy:
GET /v1/models
POST /v1/chat/completions
POST /v1/completions
... /v1/* is proxied to the active backend
Pi uses http://127.0.0.1:16384/v1 and model id local.
The flake exposes a batteries-included package for general use:
nix run github:sigilmakes/llmctl#llmctl -- status
nix run github:sigilmakes/llmctl#llmctldPackages:
packages.<system>.llmctl llmctl wrapped with vanilla pkgs.llama-cpp
packages.<system>.llmctl-unwrapped daemon/CLI only, no backend in PATH
packages.<system>.default same as llmctl
NixOS module:
{
imports = [ inputs.llmctl.nixosModules.default ];
services.llmctld.enable = true;
}For hardware-specific llama.cpp builds, override the backend explicitly:
services.llmctld.backendPackage = pkgs.llama-cpp.override {
vulkanSupport = true;
};The module uses services.llmctld.package for the daemon/CLI and
services.llmctld.backendPackage for the package providing bin/llama-server.
The service PATH is built only from those packages.
llmctl switch <model> Switch active model, printing progress
llmctl serve <model> Alias for switch
llmctl stop Stop active model
llmctl status Show daemon status
llmctl active Print active model id
llmctl models List discovered models
llmctl inspect <model> Show model configuration
llmctl logs [model] Show active/model log tail
llmctl reload Re-discover models from disk
Install:
pi install https://github.com/sigilmakes/llmctlThe extension registers only:
llmctl/local
/llm opens a dashboard with active/default/status and actions. /llm switch opens the picker. /llm switch <query> fuzzy-matches; ambiguous matches open the picker in Pi and error in CLI.
When the active backend changes, the extension refreshes llmctl/local metadata from the active model's pi.model config.
~/.local/share/llmctl/
├── config/default.yaml
├── gguf/<model>/config.yaml
├── state.json
├── llmctld.pid
└── logs/<model>.log
Environment overrides:
LLMCTL_HOMEchanges config/model rootLLMCTL_RUNTIME_HOMEchanges runtime/log rootLLMCTL_HOSTchanges daemon hostLLMCTL_PORTchanges daemon port
src/
client/ shared daemon API client
daemon/ daemon entrypoint, HTTP API, OpenAI proxy
core/ single-active-model lifecycle
cli/ CLI frontend
pi/ Pi extension frontend