llmctl

llmctl manages a single local GGUF backend served by llama-server.

llmctld is the daemon: it owns the llama-server process lifecycle and exposes both a control API and an OpenAI-compatible proxy. The CLI and the Pi extension are clients of that daemon. Pi registers a single model, llmctl/local; the active GGUF behind it is switched with llmctl switch or /llm.

Quick start

llmctld

llmctl models
llmctl switch qwen3.5
llmctl status
llmctl stop

Pi:

/model llmctl/local
/llm              # dashboard
/llm switch       # model picker
/llm switch qwen3 # fuzzy switch

Architecture

CLI / Pi extension / OpenAI client
            │
            ▼
   llmctld :16384
   ├─ control API: /status, /switch, /models, ...
   └─ OpenAI proxy: /v1/*
            │
            ▼
   active llama-server :11435

Rules:

one active model at a time
switch is rejected while requests are in flight
no client talks directly to llama-server
llama-server binds localhost only
no auth token is used; this is localhost tooling
if no model is active and a default is configured, the daemon loads it on first request
CLI and Pi both display loading/switching progress
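
The "rejected while requests are in flight" rule can be pictured as a counter around proxied requests. The following is a minimal sketch, not the daemon's actual code: SwitchGate, track, and switchTo are hypothetical names, and it assumes a switch is refused outright rather than queued.

```typescript
// Hypothetical sketch: none of these names come from llmctl's source.
class SwitchGate {
  private inFlight = 0;
  private active: string | null = null;

  get activeModel(): string | null {
    return this.active;
  }

  // Wrap every proxied request so the counter reflects open requests.
  async track<T>(request: () => Promise<T>): Promise<T> {
    this.inFlight += 1;
    try {
      return await request();
    } finally {
      // Runs on success and on failure, so the counter cannot leak.
      this.inFlight -= 1;
    }
  }

  // Assumption: a busy daemon refuses the switch instead of queueing it.
  switchTo(model: string): { ok: true } | { ok: false; reason: string } {
    if (this.inFlight > 0) {
      return { ok: false, reason: `busy: ${this.inFlight} request(s) in flight` };
    }
    this.active = model;
    return { ok: true };
  }
}
```

Refusing rather than queueing keeps the behavior predictable for callers: a client either gets the model it asked for or an immediate error it can retry.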

Config

Global config lives at ~/.local/share/llmctl/config/default.yaml:

server:
  host: 127.0.0.1
  port: 11435
  binary: llama-server
  readyTimeoutMs: 600000

daemon:
  host: 127.0.0.1
  port: 16384
  default_model: qwen3.5-35b-a3b-q8_0
  load_default_on_startup: false

llama:
  flash_attn: auto
  gpu_layers: 999
  mmap: false
  ctx_size: 0
  cache_type_k: q8_0
  cache_type_v: q8_0
  parallel: 1
  metrics: true
  slots: true
  reasoning_format: deepseek

pi:
  model:
    reasoning: false
    input: [text]
    contextWindow: 131072
    maxTokens: 8192
  provider_compat:
    supportsDeveloperRole: false

Per-model config lives beside the GGUF:

# ~/.local/share/llmctl/gguf/qwen3.5/config.yaml
id: qwen3.5-35b-a3b-q8_0
name: Qwen3.5 35B A3B Q8
model_file: Qwen3.5-35B-A3B-Q8_0.gguf

pi:
  model:
    reasoning: true
    contextWindow: 262144
    maxTokens: 32768

Precedence, lowest to highest (later layers override earlier ones):

built-in defaults
global config/default.yaml
model config.yaml
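
To illustrate how these layers could combine, here is a small sketch that merges the pi.model values from the two config examples above. It assumes nested maps are merged key by key while scalars and arrays are replaced; the README does not state the exact merge semantics, and mergeLayers is a hypothetical helper.

```typescript
type Config = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Config {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win. Nested maps merge recursively; everything else is replaced.
function mergeLayers(...layers: Config[]): Config {
  const result: Config = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const existing = result[key];
      result[key] =
        isPlainObject(existing) && isPlainObject(value)
          ? mergeLayers(existing, value)
          : value;
    }
  }
  return result;
}

// Values copied from the global and per-model examples in this section.
const globalConfig: Config = {
  pi: { model: { reasoning: false, input: ["text"], contextWindow: 131072, maxTokens: 8192 } },
};
const modelConfig: Config = {
  pi: { model: { reasoning: true, contextWindow: 262144, maxTokens: 32768 } },
};

const effective = mergeLayers(globalConfig, modelConfig);
console.log(JSON.stringify(effective));
```

Under this assumption the per-model file only needs to list the keys it changes: input is inherited from the global config, while reasoning, contextWindow, and maxTokens come from the model.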

HTTP API

Control API:

GET  /health
GET  /status
GET  /models
GET  /models/:id
GET  /active
POST /switch/:model        # fuzzy match; rejects if busy
POST /switch/:model?stream=1  # NDJSON progress
POST /stop
GET  /logs[/:model]
POST /reload
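
A client consuming the ?stream=1 progress output has to split the response into lines and parse each one as JSON. The sketch below shows only that decoding step. NdjsonDecoder is a hypothetical helper, and because the README does not document the fields of each progress event, the events are left untyped.

```typescript
// Incrementally decodes newline-delimited JSON (NDJSON).
class NdjsonDecoder {
  private buffer = "";

  // Feed a chunk of text; returns the objects for every line completed so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line, or "" after a trailing newline.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

In a real client the chunks would come from the body of POST /switch/:model?stream=1, decoded as UTF-8 text. Buffering matters because a network chunk can end in the middle of a line.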

OpenAI-compatible proxy:

GET  /v1/models
POST /v1/chat/completions
POST /v1/completions
... /v1/* is proxied to the active backend

Pi uses http://127.0.0.1:16384/v1 and model id local.
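
Other OpenAI-compatible clients can presumably target the same base URL. The sketch below builds a standard chat completions request against it. buildChatRequest is a hypothetical helper, and using the model id local for non-Pi clients is an assumption carried over from the Pi setup above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the pieces of a POST /v1/chat/completions call without sending it.
function buildChatRequest(prompt: string, baseUrl = "http://127.0.0.1:16384/v1") {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  return {
    url: `${baseUrl}/chat/completions`,
    method: "POST" as const,
    headers: { "Content-Type": "application/json" },
    // Assumption: the proxy accepts the same model id that Pi uses.
    body: JSON.stringify({ model: "local", messages }),
  };
}
```

The returned url, method, headers, and body can be passed to any HTTP client, for example fetch. No Authorization header is set because the daemon uses no auth token.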

Nix

The flake exposes a batteries-included package for general use:

nix run github:sigilmakes/llmctl#llmctl -- status
nix run github:sigilmakes/llmctl#llmctld

Packages:

packages.<system>.llmctl            llmctl wrapped with vanilla pkgs.llama-cpp
packages.<system>.llmctl-unwrapped  daemon/CLI only, no backend in PATH
packages.<system>.default           same as llmctl

NixOS module:

{
  imports = [ inputs.llmctl.nixosModules.default ];

  services.llmctld.enable = true;
}

For hardware-specific llama.cpp builds, override the backend explicitly:

services.llmctld.backendPackage = pkgs.llama-cpp.override {
  vulkanSupport = true;
};

The module uses services.llmctld.package for the daemon/CLI and services.llmctld.backendPackage for the package providing bin/llama-server. The service PATH is built only from those packages.

CLI

llmctl switch <model>    Switch active model, printing progress
llmctl serve <model>     Alias for switch
llmctl stop              Stop active model
llmctl status            Show daemon status
llmctl active            Print active model id
llmctl models            List discovered models
llmctl inspect <model>   Show model configuration
llmctl logs [model]      Show active/model log tail
llmctl reload            Re-discover models from disk

Pi extension

Install:

pi install https://github.com/sigilmakes/llmctl

The extension registers only:

llmctl/local

/llm opens a dashboard showing the active model, the default model, and daemon status, along with actions. /llm switch opens the model picker. /llm switch <query> fuzzy-matches the query against model ids; an ambiguous query opens the picker in Pi, whereas the CLI reports an error.
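
The difference between a unique and an ambiguous match can be sketched as follows. The matching rule here (exact id first, then case-insensitive substring) is an assumption and may differ from llmctl's actual fuzzy matcher; resolveModel and the second model id are hypothetical.

```typescript
type MatchResult =
  | { kind: "match"; id: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "none" };

// Assumed rule: an exact id wins, otherwise case-insensitive substring match.
function resolveModel(query: string, ids: string[]): MatchResult {
  const q = query.toLowerCase();
  const exact = ids.find((id) => id.toLowerCase() === q);
  if (exact !== undefined) return { kind: "match", id: exact };

  const candidates = ids.filter((id) => id.toLowerCase().includes(q));
  const [first, ...rest] = candidates;
  if (first === undefined) return { kind: "none" };
  if (rest.length === 0) return { kind: "match", id: first };
  return { kind: "ambiguous", candidates };
}

// The first id appears in this README; the second is made up for illustration.
const knownIds = ["qwen3.5-35b-a3b-q8_0", "qwen3-8b-q4_k_m"];
console.log(resolveModel("qwen3.5", knownIds).kind);
console.log(resolveModel("qwen3", knownIds).kind);
```

Returning the candidate list in the ambiguous case lets each frontend decide what to do with it: Pi can feed it to the picker, and the CLI can print it in the error message.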

When the active backend changes, the extension refreshes llmctl/local metadata from the active model's pi.model config.

Layout

~/.local/share/llmctl/
├── config/default.yaml
├── gguf/<model>/config.yaml
├── state.json
├── llmctld.pid
└── logs/<model>.log

Environment overrides:

LLMCTL_HOME changes config/model root
LLMCTL_RUNTIME_HOME changes runtime/log root
LLMCTL_HOST changes daemon host
LLMCTL_PORT changes daemon port
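
As an illustration of the host and port overrides, a client could resolve the daemon address like this. The sketch falls back to the documented defaults (127.0.0.1 and 16384) and ignores the config file; how the real tools rank environment variables against config/default.yaml is not stated here. daemonBaseUrl is a hypothetical helper.

```typescript
type Env = Record<string, string | undefined>;

// Resolves the daemon base URL from LLMCTL_HOST and LLMCTL_PORT.
function daemonBaseUrl(env: Env): string {
  // Assumption: an empty variable is treated the same as an unset one.
  const host = env["LLMCTL_HOST"] || "127.0.0.1";
  const port = env["LLMCTL_PORT"] || "16384";
  return `http://${host}:${port}`;
}

console.log(daemonBaseUrl({}));
console.log(daemonBaseUrl({ LLMCTL_PORT: "17000" }));
```

Taking the environment as a parameter keeps the function easy to test; in Node it would be called with process.env.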

Development layout

src/
  client/    shared daemon API client
  daemon/    daemon entrypoint, HTTP API, OpenAI proxy
  core/      single-active-model lifecycle
  cli/       CLI frontend
  pi/        Pi extension frontend
