Skip to content

sigilmakes/llmctl

Repository files navigation

llmctl

llmctl manages a single local GGUF backend served by llama-server.

llmctld owns the process lifecycle and exposes both a control API and an OpenAI-compatible proxy. The CLI and Pi extension are clients of that daemon. Pi registers one model, llmctl/local; the active GGUF behind it is switched with llmctl switch or /llm.

Quick start

llmctld

llmctl models
llmctl switch qwen3.5
llmctl status
llmctl stop

Pi:

/model llmctl/local
/llm              # dashboard
/llm switch       # model picker
/llm switch qwen3 # fuzzy switch

Architecture

CLI / Pi extension / OpenAI client
            │
            ▼
   llmctld :16384
   ├─ control API: /status, /switch, /models, ...
   └─ OpenAI proxy: /v1/*
            │
            ▼
   active llama-server :11435

Rules:

  • one active model at a time
  • switch is rejected while requests are in flight
  • no client talks directly to llama-server
  • llama-server binds localhost only
  • no auth token is used; this is localhost tooling
  • if no model is active and a default is configured, the daemon loads it on first request
  • CLI and Pi both display loading/switching progress

Config

Global config lives at ~/.local/share/llmctl/config/default.yaml:

server:
  host: 127.0.0.1
  port: 11435
  binary: llama-server
  readyTimeoutMs: 600000

daemon:
  host: 127.0.0.1
  port: 16384
  default_model: qwen3.5-35b-a3b-q8_0
  load_default_on_startup: false

llama:
  flash_attn: auto
  gpu_layers: 999
  mmap: false
  ctx_size: 0
  cache_type_k: q8_0
  cache_type_v: q8_0
  parallel: 1
  metrics: true
  slots: true
  reasoning_format: deepseek

pi:
  model:
    reasoning: false
    input: [text]
    contextWindow: 131072
    maxTokens: 8192
  provider_compat:
    supportsDeveloperRole: false

Per-model config lives beside the GGUF:

# ~/.local/share/llmctl/gguf/qwen3.5/config.yaml
id: qwen3.5-35b-a3b-q8_0
name: Qwen3.5 35B A3B Q8
model_file: Qwen3.5-35B-A3B-Q8_0.gguf

pi:
  model:
    reasoning: true
    contextWindow: 262144
    maxTokens: 32768

Precedence:

  1. built-in defaults
  2. global config/default.yaml
  3. model config.yaml

HTTP API

Control API:

GET  /health
GET  /status
GET  /models
GET  /models/:id
GET  /active
POST /switch/:model        # fuzzy match; rejects if busy
POST /switch/:model?stream=1  # NDJSON progress
POST /stop
GET  /logs[/model]
POST /reload

OpenAI-compatible proxy:

GET  /v1/models
POST /v1/chat/completions
POST /v1/completions
... /v1/* is proxied to the active backend

Pi uses http://127.0.0.1:16384/v1 and model id local.

Nix

The flake exposes a batteries-included package for general use:

nix run github:sigilmakes/llmctl#llmctl -- status
nix run github:sigilmakes/llmctl#llmctld

Packages:

packages.<system>.llmctl            llmctl wrapped with vanilla pkgs.llama-cpp
packages.<system>.llmctl-unwrapped  daemon/CLI only, no backend in PATH
packages.<system>.default           same as llmctl

NixOS module:

{
  imports = [ inputs.llmctl.nixosModules.default ];

  services.llmctld.enable = true;
}

For hardware-specific llama.cpp builds, override the backend explicitly:

services.llmctld.backendPackage = pkgs.llama-cpp.override {
  vulkanSupport = true;
};

The module uses services.llmctld.package for the daemon/CLI and services.llmctld.backendPackage for the package providing bin/llama-server. The service PATH is built only from those packages.

CLI

llmctl switch <model>    Switch active model, printing progress
llmctl serve <model>     Alias for switch
llmctl stop              Stop active model
llmctl status            Show daemon status
llmctl active            Print active model id
llmctl models            List discovered models
llmctl inspect <model>   Show model configuration
llmctl logs [model]      Show active/model log tail
llmctl reload            Re-discover models from disk

Pi extension

Install:

pi install https://github.com/sigilmakes/llmctl

The extension registers only:

llmctl/local

/llm opens a dashboard with active/default/status and actions. /llm switch opens the picker. /llm switch <query> fuzzy-matches; ambiguous matches open the picker in Pi and error in CLI.

When the active backend changes, the extension refreshes llmctl/local metadata from the active model's pi.model config.

Layout

~/.local/share/llmctl/
├── config/default.yaml
├── gguf/<model>/config.yaml
├── state.json
├── llmctld.pid
└── logs/<model>.log

Environment overrides:

  • LLMCTL_HOME changes config/model root
  • LLMCTL_RUNTIME_HOME changes runtime/log root
  • LLMCTL_HOST changes daemon host
  • LLMCTL_PORT changes daemon port

Development layout

src/
  client/    shared daemon API client
  daemon/    daemon entrypoint, HTTP API, OpenAI proxy
  core/      single-active-model lifecycle
  cli/       CLI frontend
  pi/        Pi extension frontend

About

Daemon and pi client for managing local llama-server GGUF models cleanly

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors