___ _
/ _ \ _ __ __| | ___
| | | | '__/ _` |/ _ \
| |_| | | | (_| | (_) |
\___/|_| \__,_|\___/
──────────────────────────────────────────────────
Docker Compose stack for local LLMs, chat UI, image/video (ComfyUI), and automation (n8n) — with a unified dashboard.
Ordo AI Stack packages a local-first, operator-deployed stack: llama.cpp-backed models behind an OpenAI-compatible LiteLLM model gateway, Open WebUI for chat, ComfyUI for diffusion workflows, n8n for workflows, and an MCP gateway for shared tools. A dashboard provides a single place to inspect dependencies, pull models, and control the stack.
Deployment model: Single homelab operator. All user-facing UIs sit behind a Caddy + oauth2-proxy + Tailscale + Google SSO front door — no UI service publishes a host port directly. The operator brings their own Tailscale tailnet and Google OAuth client; the stack stitches them together so every UI is reachable at https://${CADDY_TAILNET_HOSTNAME}/<service>/ after a single Google sign-in, with an email allowlist gating access. See docs/runbooks/auth.md for the one-time setup.
Who it is for: A homelab operator running the stack on their own hardware, exposed over their tailnet to a small allowlist of personal Google accounts. The focus is on local AI models and strong operator-deployment principles.
Docs: Getting started · Auth front door · Secrets · Configuration · Data · Hermes Agent · PRD index
All UI ports below are internal (container-network). Operators reach them via the Caddy front door under https://${CADDY_TAILNET_HOSTNAME}/<path>/; the only host-published ports are Caddy :443 (tailnet-bound), and 127.0.0.1-bound publishes of model-gateway:11435, mcp-gateway:8811, and qdrant:6333 for host-side tools (Cursor, Cline, scripts).
- Unified dashboard (internal 8080, front-door /dash/): model lists, service links, dependency health, model pulls.
- Model gateway (host 127.0.0.1:11435, also internal): LiteLLM OpenAI-compatible API in front of llama.cpp backends.
- Open WebUI (internal 8080, front-door /): chat UI at the root of the tailnet hostname.
- ComfyUI (internal 8188, front-door /comfy/): workflows; large optional model downloads on demand.
- n8n (internal 5678, front-door /n8n/): automation.
- MCP gateway (host 127.0.0.1:8811, also internal): shared MCP tools for host clients and in-stack services.
- Ops controller (internal 9000; no host port): compose lifecycle from the dashboard with OPS_CONTROLLER_TOKEN.
- Hermes dashboard (internal 9119, front-door /hermes/): assistant-agent UI.
- GPU profiles: scripts/detect_hardware.py generates overrides/compute.yml (gitignored) for NVIDIA / AMD / Intel / CPU paths.
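The list above names three loopback-published ports for host-side tools. A minimal sketch of a reachability check from the host, using only the standard library, could look like the following. The function names are illustrative and not part of the repo, and qdrant will normally answer only when the RAG profile is running.

```python
import socket

# Loopback-published ports named in the service list above.
HOST_PORTS = {
    "model-gateway": 11435,
    "mcp-gateway": 8811,
    "qdrant": 6333,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def report(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe every port in HOST_PORTS and return a name -> reachable mapping."""
    return {name: port_open(host, port) for name, port in HOST_PORTS.items()}
```

Calling `report()` on the host returns one boolean per service; a `False` only means the TCP port did not accept a connection, not that the service is unhealthy in other ways.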
Prerequisites:
- Docker with Compose, and enough disk for models.
- Tailscale installed on the host machine, with a Tailscale-issued TLS cert for the chosen tailnet hostname (tailscale cert ordo.<tailnet>.ts.net).
- A Google Cloud OAuth 2.0 Web client for the SSO front door (Client ID + secret).
- SOPS + age for secrets at rest.
- For tests / lint, Python 3.12+ (see pyproject.toml).
- Clone this repository and open a shell at the repo root.
- Environment: If .env is missing, init scripts can create it from .env.example. Otherwise copy manually:
  cp .env.example .env
  Set at least BASE_PATH, CADDY_BIND (your tailnet IPv4 from tailscale ip -4), and CADDY_TAILNET_HOSTNAME (e.g. ordo.<tailnet>.ts.net). See comments in .env.example.
- Auth front door (one-time): Follow docs/runbooks/auth.md to configure the Tailscale cert, Google OAuth client, cookie secret, and email allowlist.
- Secrets (one-time): Follow docs/runbooks/secrets.md: generate an age keypair, register your public key in secrets/.sops.yaml, and run make decrypt-secrets to materialize runtime tokens at ~/.ai-toolkit/runtime/secrets/.
- Full bring-up: the compose wrapper runs hardware detection, then builds and starts the stack.
  Windows (PowerShell):
  .\compose.ps1 up -d --build --force-recreate
  Linux / macOS:
  ./compose up -d --build --force-recreate
- From any device on your tailnet, browse to https://${CADDY_TAILNET_HOSTNAME}/. Google sign-in gates the front door, then Open WebUI loads at /, the dashboard at /dash/, n8n at /n8n/, ComfyUI at /comfy/, and the Hermes UI at /hermes/.
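To make the front-door layout concrete, here is a small sketch that expands CADDY_TAILNET_HOSTNAME into the per-service URLs listed above. The dictionary keys and the function name are illustrative labels, not identifiers from the repo.

```python
import os

# Path prefixes taken from the quickstart text above.
FRONT_DOOR_PATHS = {
    "open-webui": "/",
    "dashboard": "/dash/",
    "n8n": "/n8n/",
    "comfyui": "/comfy/",
    "hermes": "/hermes/",
}


def front_door_urls(hostname: str) -> dict[str, str]:
    """Build the HTTPS URL for each UI behind the Caddy front door."""
    if not hostname:
        raise ValueError("CADDY_TAILNET_HOSTNAME is empty")
    return {
        name: f"https://{hostname}{path}"
        for name, path in FRONT_DOOR_PATHS.items()
    }


def urls_from_environment() -> dict[str, str]:
    """Read the hostname from the process environment (empty if unset)."""
    return front_door_urls(os.environ.get("CADDY_TAILNET_HOSTNAME", ""))
```

For example, `front_door_urls("ordo.example.ts.net")["dashboard"]` gives `https://ordo.example.ts.net/dash/`, where `ordo.example.ts.net` stands in for your own tailnet hostname.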
Lighter bring-up (no forced rebuild/recreate; still runs hardware detection):
.\compose.ps1 up -d
./compose up -d
CPU-only / minimal services: bring up a subset after init, e.g. ./compose up -d ollama dashboard open-webui.
- Runtime: Everything runs in containers; install Docker and use the repo from a fixed path (set BASE_PATH accordingly).
- Development: Python 3.12+. Install test dependencies:
  pip install -r tests/requirements.txt
  On Linux/macOS you can use make test, make lint, and make smoke-test (see Makefile).
Primary reference: .env.example (copy to .env).
| Area | Variables (examples) |
|---|---|
| Paths | BASE_PATH, DATA_PATH |
| Models | MODELS, DEFAULT_MODEL |
| Security / APIs | DASHBOARD_AUTH_TOKEN, OPS_CONTROLLER_TOKEN, WEBUI_AUTH, HF_TOKEN, GITHUB_PERSONAL_ACCESS_TOKEN |
| MCP | MCP_GATEWAY_SERVERS |
| Compute | COMPUTE_MODE, COMPOSE_FILE (see comments for overrides/*.yml) |
| RAG profile | EMBED_MODEL, QDRANT_PORT, RAG_COLLECTION, … |
Auto-generated: overrides/compute.yml (from hardware detection). Do not commit secrets; .env is gitignored.
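Before a bring-up it can help to confirm that the variables the quickstart asks for (BASE_PATH, CADDY_BIND, CADDY_TAILNET_HOSTNAME) are actually set in .env. The sketch below is a deliberately simplified KEY=VALUE reader, not a full dotenv parser: it ignores `export` prefixes, inline comments, and multi-line values. The helper names are hypothetical.

```python
REQUIRED = ("BASE_PATH", "CADDY_BIND", "CADDY_TAILNET_HOSTNAME")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop one layer of surrounding quotes, if present.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(text: str, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]
```

A typical use is `missing_required(open(".env", encoding="utf-8").read())`, which returns an empty list when all three variables have values.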
- Daily restart / full rebuild: same as the full bring-up step in Quickstart.
- On-demand one-off containers:
  ./compose run --rm model-puller
  ./compose run --rm comfyui-model-puller
- RAG: docker compose --profile rag up -d and ingest paths per the RAG section of Getting started.
- MCP clients: connect to http://localhost:8811/mcp (see mcp/README.md).
Reach the dashboard at https://${CADDY_TAILNET_HOSTNAME}/dash/ (Google SSO front door; allowlist via auth/oauth2-proxy/emails.txt). It lists models (Ollama and ComfyUI), links to other services, dependency health, and searchable model pulls. OPS_CONTROLLER_TOKEN lets it restart services and run POST /api/comfyui/install-node-requirements. DASHBOARD_AUTH_TOKEN is an optional bearer layer for non-browser API access; the browser path is gated by SSO at the proxy level.
After code changes affecting the dashboard image: .\compose.ps1 build dashboard then .\compose.ps1 up -d (or ./compose equivalents).
Pull lists and defaults come from .env (MODELS, DEFAULT_MODEL). Pull via the dashboard or:
./compose run --rm model-puller
ComfyUI models are large optional downloads on demand; the first run can take a long time. Pull via the dashboard or ./compose run --rm comfyui-model-puller.
- Front door: Caddy + oauth2-proxy + Google SSO gates all browser-reachable UIs at the network edge. Email allowlist in auth/oauth2-proxy/emails.txt (replace YOUR_ALLOWLIST_EMAIL locally; never commit your real email). See docs/runbooks/auth.md.
- Open WebUI: runs with native auth disabled by default because Google SSO already gates it at the proxy; flip WEBUI_AUTH=True if you want a second auth layer for multi-user workspaces.
- Dashboard: DASHBOARD_AUTH_TOKEN provides a bearer-token fallback for non-browser API access (e.g. host scripts). Browser traffic is SSO-gated.
- Ops controller: requires OPS_CONTROLLER_TOKEN for dashboard-driven lifecycle and installs; no host port at all.
- Secrets at rest: SOPS + age, with high-value tokens mounted as Docker secrets at /run/secrets/<name>. See docs/runbooks/secrets.md.
- Never commit .env or any plaintext secret. Full notes: SECURITY.md.
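Since the allowlist ships with a YOUR_ALLOWLIST_EMAIL placeholder, a quick local check before bring-up can catch a file that was never edited. This sketch assumes the file holds one address per line and applies only a rough shape test, not real email validation; the function name is hypothetical.

```python
PLACEHOLDER = "YOUR_ALLOWLIST_EMAIL"


def allowlist_problems(text: str) -> list[str]:
    """Return human-readable problems found in an allowlist file's contents."""
    problems: list[str] = []
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    if not entries:
        problems.append("allowlist is empty")
    for entry in entries:
        if PLACEHOLDER in entry:
            problems.append(f"placeholder still present: {entry}")
        elif entry.count("@") != 1 or entry.startswith("@") or entry.endswith("@"):
            # Rough shape check only: exactly one "@" with text on both sides.
            problems.append(f"not an email address: {entry}")
    return problems
```

Run it against the contents of auth/oauth2-proxy/emails.txt on your machine; an empty list means no obvious problems were found.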
Hardware detection writes overrides/compute.yml. The compose wrapper runs detection before commands. No GPU: use a minimal service set (./compose up -d ollama dashboard open-webui); ComfyUI will be slower.
Tailnet device → Caddy :443 (TLS) → oauth2-proxy (Google SSO + email allowlist)
│
├── / → Open WebUI
├── /dash/ → Dashboard
├── /n8n/ → n8n
├── /comfy/ → ComfyUI
└── /hermes/ → Hermes dashboard
│
├── Model Gateway → LiteLLM → llama.cpp / Ollama / (vLLM)
├── MCP Gateway → shared tools (SearXNG, n8n, ComfyUI, …)
└── Ops Controller → Docker Compose lifecycle (token-auth, no host port)
Local-first AI; operator-deployed front door. Dashboard does not mount docker.sock. Details: PRD index.
Bind mounts only. Set BASE_PATH (and optionally DATA_PATH). Ollama blobs under models/ollama. See docs/data.md.
MCP Gateway — configure servers with MCP_GATEWAY_SERVERS in .env. Endpoint: http://localhost:8811/mcp. See mcp/README.md.
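MCP messages are JSON-RPC 2.0, and a session begins with an `initialize` call. The sketch below builds that first request for the endpoint above. It assumes the gateway speaks MCP's streamable HTTP transport at this URL, and the client name and protocol version are illustrative; for real use, prefer an existing MCP client as described in mcp/README.md.

```python
import json
import urllib.request

MCP_ENDPOINT = "http://localhost:8811/mcp"


def initialize_request(
    endpoint: str = MCP_ENDPOINT,
    client_name: str = "example-client",
    protocol_version: str = "2025-03-26",
) -> urllib.request.Request:
    """Build the JSON-RPC initialize request that opens an MCP session."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

The request is only constructed here; sending it and handling a possible event-stream response is left to the caller.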
Hermes Agent runs as two compose services (hermes-gateway + hermes-dashboard) with persistent state under data/hermes/. Setup and upgrade notes: docs/hermes-agent.md.
- Python layout: dashboard/, model-gateway/, ops-controller/, rag-ingestion/, scripts/; Ruff config in pyproject.toml.
- Do not commit: .env, data/, models/, overrides/compute.yml, mcp/.env (see CONTRIBUTING.md).
pip install -r tests/requirements.txt
python -m pytest tests/ -v
python -m ruff check dashboard tests model-gateway ops-controller rag-ingestion scripts comfyui-mcp orchestration-mcp worker
Health / diagnostics:
.\scripts\doctor.ps1
./scripts/doctor.sh
Optional: DOCTOR_DEPS_TIMEOUT_SEC; DASHBOARD_AUTH_TOKEN from .env when probing the dashboard.
Smoke (Docker required):
.\scripts\smoke_test.ps1
./scripts/smoke_test.sh
# or: make smoke-test
CI (.github/workflows/ci.yml): TruffleHog secret scan; pytest + ruff; docker compose config; optional compose smoke via workflow dispatch.
- Services won’t start or images are stale: Rebuild affected images and recreate, e.g. docker compose build dashboard model-gateway (or the compose wrapper), then up -d. Doctor WARN on missing /api/dependencies or /ready often indicates an old image.
- Doctor warns on Ollama (11434) or MCP (8811): Expected if those ports are not published; use overrides/ollama-expose.yml / overrides/mcp-expose.yml or set DOCTOR_STRICT=1 only when you intend strict probes (see doctor script comments in repo).
- No GPU: Use a minimal service set or CPU-oriented overrides; ComfyUI will be slower.
- Exposing to a network: Enable Open WebUI auth (WEBUI_AUTH=True), set DASHBOARD_AUTH_TOKEN, and harden n8n; see SECURITY.md.
Rolling changes: CHANGELOG.md.
See CONTRIBUTING.md. Report security issues per SECURITY.md (do not use public issues for vulnerabilities).
MIT License — Copyright (c) 2026 Ordo AI Stack contributors.