This guide walks you through installing HelixLLM, running it for the first time, and making your first API call.
- Go 1.24+ (the project uses Go 1.26.1 module syntax but builds with 1.24+)
- openssl -- used to generate self-signed TLS certificates for local development
- Podman (preferred) or Docker -- required for container builds and multi-host deployment
- Git -- with submodule support
Optional:
- golangci-lint -- for make lint
- goimports -- for make fmt
Clone the repository with all submodules:
git clone --recurse-submodules https://github.com/HelixDevelopment/HelixLLM.git
cd HelixLLM
If you already cloned without --recurse-submodules, initialize them:
make deps
This runs git submodule update --init --recursive and go mod tidy.
Copy the example environment file:
cp .env.example .env
For a minimal local setup, the defaults work out of the box. The system starts in full mode with all subsystems active, listening on port 8443 with self-signed TLS.
To use cloud providers, add your API keys:
HELIX_LLM_OPENAI_KEY=sk-your-openai-key
HELIX_LLM_ANTHROPIC_KEY=sk-ant-your-anthropic-key
See configuration.md for the full variable reference.
Generate TLS certificates and start the server:
make dev
This:
- Creates self-signed TLS certificates in ./certs/ (if not already present)
- Sets HELIX_MODE=full
- Runs go run ./cmd/helixllm
You should see output like:
[GIN] mode=release
INFO starting HelixLLM mode=full
INFO server listening addr=0.0.0.0:8443
The server is now running at https://localhost:8443. Since it uses a self-signed certificate, use -k with curl to skip verification.
curl -k https://localhost:8443/internal/health
Expected response:
{
"status": "healthy",
"checks": []
}
Send a chat completion request:
curl -k https://localhost:8443/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Llama-3.1-70B-Instruct-Q4_K_M",
"messages": [
{"role": "user", "content": "Hello, what is HelixLLM?"}
]
}'
List the available models:
curl -k https://localhost:8443/v1/models
Send a request in the Anthropic Messages format:
curl -k https://localhost:8443/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
For development or small-scale use, full mode is the simplest deployment. Everything runs in a single process:
- Copy config: cp .env.example .env
- Set mode: HELIX_MODE=full (this is the default)
- Start: make dev
- Test: curl -k https://localhost:8443/internal/health
All subsystems (gateway, brain, knowledge, agents, control plane) are active and communicate via direct Go function calls -- no network overhead.
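To script the test step, the health response can be decoded and checked in Go. This is a sketch based only on the sample response shown earlier in this guide: the healthResponse type and isHealthy helper are hypothetical, and because the sample shows an empty checks array, the shape of its elements is unknown and kept as raw JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthResponse mirrors the sample /internal/health body from this guide.
// The element type of "checks" is not documented here, so it stays raw.
type healthResponse struct {
	Status string            `json:"status"`
	Checks []json.RawMessage `json:"checks"`
}

// isHealthy reports whether a health response body has status "healthy".
func isHealthy(body []byte) (bool, error) {
	var h healthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return false, fmt.Errorf("decoding health response: %w", err)
	}
	return h.Status == "healthy", nil
}

func main() {
	sample := []byte(`{"status": "healthy", "checks": []}`)
	ok, err := isHealthy(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("healthy:", ok) // prints "healthy: true" for the sample body
}
```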
To build a production binary:
make build
The binary is created at ./bin/helixllm. Run it directly:
./bin/helixllm
Or with a specific mode:
./bin/helixllm --mode=gateway
To build a container image:
make container
This auto-detects Podman or Docker and builds the image as helixllm:dev.
- Configuration Reference -- all environment variables and their defaults
- API Reference -- complete endpoint documentation
- Models -- local vs cloud model configuration
- Multi-Host Setup -- deploying across multiple machines
Most Linux distributions provide the required tools via package managers.
Fedora / RHEL / ALT Linux:
sudo dnf install golang openssl podman git
Ubuntu / Debian:
sudo apt update
sudo apt install golang openssl podman git
Arch Linux:
sudo pacman -S go openssl podman git
Verify Go version (1.24+ required):
go version
Install optional development tools:
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install golang.org/x/tools/cmd/goimports@latest
On Linux, Podman runs rootless by default. No additional configuration is needed for container builds.
Install prerequisites via Homebrew:
brew install go openssl podman git
Initialize the Podman machine (macOS requires a Linux VM for containers):
podman machine init
podman machine start
Verify the setup:
podman info
go version
Note: macOS uses LibreSSL by default. The Homebrew openssl is required for make certs to generate certificates with the correct extensions.
HelixLLM runs on Windows via WSL 2. Native Windows builds are not supported.
- Install WSL 2 with Ubuntu:
wsl --install -d Ubuntu
- Inside the WSL terminal, install dependencies:
sudo apt update
sudo apt install golang openssl podman git
- Clone and build as on Linux:
git clone --recurse-submodules https://github.com/HelixDevelopment/HelixLLM.git
cd HelixLLM
make deps
make build
WSL 2 forwards localhost ports to Windows by default, so the server at https://localhost:8443 is accessible from Windows browsers and tools.
Local LLM inference via llama.cpp benefits significantly from GPU acceleration. HelixLLM delegates GPU management to the llama.cpp server process.
- Install the NVIDIA driver and CUDA toolkit:
# Fedora / RHEL
sudo dnf install nvidia-driver cuda-toolkit
# Ubuntu
sudo apt install nvidia-driver-550 nvidia-cuda-toolkit
- Verify GPU detection:
nvidia-smi
- The llama.cpp server automatically detects CUDA GPUs. Set the number of GPU layers to offload in your llama.cpp server configuration. More layers offloaded to GPU means faster inference but higher VRAM usage.
- For multi-GPU setups, the control plane scheduler uses the gpu-affinity strategy:
HELIX_SCHEDULE_STRATEGY=gpu-affinity
- For AMD GPUs, install ROCm:
sudo apt install rocm-hip-libraries rocm-dev
- Verify with:
rocm-smi
- Ensure your llama.cpp build includes ROCm support (-DLLAMA_HIPBLAS=ON; newer llama.cpp releases name this option -DGGML_HIP=ON).
If no GPU is available, llama.cpp falls back to CPU inference. Performance depends on the CPU core count and available RAM. Quantized models (Q4_K_M, Q5_K_M) are recommended for CPU-only setups to reduce memory requirements.
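As a rough illustration of why quantization matters on CPU-only hosts, weight memory can be approximated as parameter count times bits per weight, divided by eight. The bits-per-weight figures below are approximate values assumed for illustration (they vary somewhat by model), the function name is hypothetical, and the estimate ignores the KV cache and runtime overhead.

```go
package main

import "fmt"

// estimateWeightGB approximates the memory needed for model weights in
// gigabytes (1 GB = 1e9 bytes): params * bitsPerWeight / 8.
// It ignores the KV cache, activations, and runtime overhead.
func estimateWeightGB(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8 / 1e9
}

func main() {
	const params = 70e9 // nominal size of a 70B-parameter model

	// Approximate bits per weight; assumed values for illustration only.
	formats := []struct {
		name string
		bpw  float64
	}{
		{"F16", 16},
		{"Q5_K_M", 5.7},
		{"Q4_K_M", 4.85},
	}

	for _, f := range formats {
		fmt.Printf("%-7s ~%.1f GB\n", f.name, estimateWeightGB(params, f.bpw))
	}
}
```

Under these assumptions a 70B model needs about 140 GB of RAM for 16-bit weights but roughly 42 GB at Q4_K_M, which is the difference between not fitting and fitting on many single hosts.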
For CPU-only deployments, consider using cloud providers instead:
HELIX_LLM_DEFAULT_PROVIDER=openai
# or
HELIX_LLM_DEFAULT_PROVIDER=anthropic
The agent system provides a conversational interface with tool calling, session memory, and RAG-augmented responses.
Start the server:
make dev
Wait until you see INFO server listening addr=0.0.0.0:8443.
curl -k https://localhost:8443/v1/agents/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What time is it?"}
]
}'
The agent uses its built-in time tool to respond with the current time. The response includes:
{
"session_id": "a1b2c3d4-...",
"choices": [
{
"message": {
"role": "assistant",
"content": "The current time is 2026-04-05T14:30:00Z."
}
}
]
}
Use the session_id from the previous response to maintain context:
curl -k https://localhost:8443/v1/agents/chat \
-H "Content-Type: application/json" \
-d '{
"session_id": "a1b2c3d4-...",
"messages": [
{"role": "user", "content": "And what timezone is that in?"}
]
}'
The agent remembers the previous exchange and responds in context.
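The two-step exchange above can be sketched in Go as follows. The JSON field names come from the request and response samples in this guide; the Go type and function names are hypothetical, the session ID is a placeholder, and the real response may carry more fields than this sketch decodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// agentChatRequest is the body sent to /v1/agents/chat. SessionID is left
// out of the JSON on the first turn and set on follow-up turns.
type agentChatRequest struct {
	SessionID string    `json:"session_id,omitempty"`
	Messages  []message `json:"messages"`
}

// agentChatResponse decodes only the fields shown in this guide's sample.
type agentChatResponse struct {
	SessionID string `json:"session_id"`
	Choices   []struct {
		Message message `json:"message"`
	} `json:"choices"`
}

// buildTurn encodes one user turn; pass "" as sessionID to start a session.
func buildTurn(sessionID, content string) ([]byte, error) {
	return json.Marshal(agentChatRequest{
		SessionID: sessionID,
		Messages:  []message{{Role: "user", Content: content}},
	})
}

// parseReply returns the session ID and first message of a response body.
func parseReply(body []byte) (sessionID, reply string, err error) {
	var r agentChatResponse
	if err = json.Unmarshal(body, &r); err != nil {
		return "", "", fmt.Errorf("decoding agent response: %w", err)
	}
	if len(r.Choices) == 0 {
		return r.SessionID, "", fmt.Errorf("response contains no choices")
	}
	return r.SessionID, r.Choices[0].Message.Content, nil
}

func main() {
	first, err := buildTurn("", "What time is it?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(first))

	// Placeholder body shaped like the sample response; in practice this is
	// what the server returns when `first` is POSTed to /v1/agents/chat.
	sample := []byte(`{"session_id":"example-session-id","choices":[{"message":{"role":"assistant","content":"The current time is 2026-04-05T14:30:00Z."}}]}`)
	sid, reply, err := parseReply(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)

	next, err := buildTurn(sid, "And what timezone is that in?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(next))
}
```

Using omitempty on session_id keeps the first request identical to the curl example, which sends no session field at all.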
List the agent's tools:
curl -k https://localhost:8443/v1/agents/tools
This returns the registered tools (echo, time, knowledge_query, etc.) with their descriptions and parameter schemas.
If you have ingested documents into the knowledge layer, the agent automatically uses RAG to augment its responses:
# First, ingest some documents
curl -k https://localhost:8443/internal/knowledge/ingest \
-H "Content-Type: application/json" \
-d '{
"title": "Project README",
"content": "HelixLLM is a distributed LLM serving platform...",
"collection": "docs"
}'
# Then ask the agent about it
curl -k https://localhost:8443/v1/agents/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is HelixLLM?"}
]
}'
The RAG hook automatically retrieves relevant chunks from the knowledge base and includes them in the agent's context.
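The ingest call can also be prepared from Go. The sketch below only builds the request and does not send it; the field names and path are taken from the curl example above, while ingestDocument and newIngestRequest are hypothetical names, and the response format is not covered by this guide.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ingestDocument mirrors the ingest payload shown in this guide.
type ingestDocument struct {
	Title      string `json:"title"`
	Content    string `json:"content"`
	Collection string `json:"collection"`
}

// newIngestRequest builds (but does not send) a POST to the ingest endpoint.
func newIngestRequest(baseURL string, doc ingestDocument) (*http.Request, error) {
	payload, err := json.Marshal(doc)
	if err != nil {
		return nil, fmt.Errorf("encoding document: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost,
		baseURL+"/internal/knowledge/ingest", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newIngestRequest("https://localhost:8443", ingestDocument{
		Title:      "Project README",
		Content:    "HelixLLM is a distributed LLM serving platform...",
		Collection: "docs",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it, pass req to the Do method of an *http.Client that trusts
	// (or, for local development, skips verifying) the self-signed certificate.
}
```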
| Problem | Cause | Solution |
|---|---|---|
| TLS handshake error | Missing or expired certificates | Run make certs to regenerate self-signed certificates |
| bind: address already in use | Port 8443 is occupied by another process | Change HELIX_PORT in .env, or find the conflicting process with lsof -i :8443 and stop it |
| submodule path 'submodules/X' not initialized | Submodules were not cloned | Run make deps to initialize all submodules |
| go: module not found | Missing replace directives or stale module cache | Run make deps then go clean -modcache if needed |
| connection refused on API calls | Server not running or wrong port | Verify with curl -k https://localhost:8443/internal/health |
| 401 Unauthorized | API key auth is enabled but no key provided | Add -H "Authorization: Bearer <key>" or clear HELIX_AUTH_API_KEYS in .env |
| no provider available | No LLM provider is configured or reachable | Set at least one provider key (HELIX_LLM_OPENAI_KEY or HELIX_LLM_ANTHROPIC_KEY) or ensure llama.cpp is running |
| certificate signed by unknown authority | Using self-signed cert without -k flag | Add -k to curl commands, or import the cert into your system trust store |
| QUIC handshake timeout | Firewall blocking UDP on the server port | Ensure UDP is allowed on port 8443 or the configured HELIX_PORT; the server falls back to HTTP/2 over TCP |
| permission denied on SSH commands | SSH key not authorized on remote host | Add your public key to ~/.ssh/authorized_keys on the target host |
| race detected during execution | Concurrent access issue (development only) | Report the full race trace as a bug; this should not occur in released builds |
Check server health:
curl -k https://localhost:8443/internal/health
View available models:
curl -k https://localhost:8443/v1/models
Check cluster status (multi-host mode):
curl -k https://localhost:8443/internal/cluster/status
View knowledge store stats:
curl -k https://localhost:8443/internal/knowledge/stats
Enable debug logging for verbose output:
HELIX_LOG_LEVEL=debug make dev