Getting Started

This guide walks you through installing HelixLLM, running it for the first time, and making your first API call.

Prerequisites

Go 1.24+ (the project uses Go 1.26.1 module syntax but builds with 1.24+)
openssl -- used to generate self-signed TLS certificates for local development
Podman (preferred) or Docker -- required for container builds and multi-host deployment
Git -- with submodule support

Optional:

golangci-lint -- for make lint
goimports -- for make fmt

Installation

Clone the repository with all submodules:

git clone --recurse-submodules https://github.com/HelixDevelopment/HelixLLM.git
cd HelixLLM

If you already cloned without --recurse-submodules, initialize them:

make deps

This runs git submodule update --init --recursive and go mod tidy.

Configuration

Copy the example environment file:

cp .env.example .env

For a minimal local setup, the defaults work out of the box. The system starts in full mode with all subsystems active, listening on port 8443 with self-signed TLS.

To use cloud providers, add your API keys:

HELIX_LLM_OPENAI_KEY=sk-your-openai-key
HELIX_LLM_ANTHROPIC_KEY=sk-ant-your-anthropic-key

See configuration.md for the full variable reference.
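
Before starting the server, it can help to confirm which provider keys actually made it into the file. The following is a minimal sketch, not part of HelixLLM: the function name helix_provider_keys is hypothetical, and it only checks that each of the two variables shown above has a non-empty value on an uncommented line.

```shell
# Report which provider keys have a non-empty value in an env file.
# Usage: helix_provider_keys [path-to-env-file]   (defaults to ./.env)
# NOTE: helix_provider_keys is a hypothetical helper, not a HelixLLM command.
helix_provider_keys() {
  env_file="${1:-.env}"
  for name in HELIX_LLM_OPENAI_KEY HELIX_LLM_ANTHROPIC_KEY; do
    # Match NAME=<at least one character> at the start of a line,
    # so commented-out entries ("# NAME=...") are not counted.
    if grep -Eq "^${name}=.+" "$env_file" 2>/dev/null; then
      echo "$name: set"
    else
      echo "$name: missing"
    fi
  done
}
```

Running helix_provider_keys in the repository root prints one line per variable. A missing cloud key is not an error if you only use local models.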

First Run

Generate TLS certificates and start the server:

make dev

This:

Creates self-signed TLS certificates in ./certs/ (if not already present)
Sets HELIX_MODE=full
Runs go run ./cmd/helixllm

You should see output like:

[GIN] mode=release
INFO starting HelixLLM                mode=full
INFO server listening                 addr=0.0.0.0:8443
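
For reference, the certificate step that make dev performs can be approximated by hand. This is a sketch under assumptions, not the project's actual Makefile recipe: the function name helix_make_certs and the file names server.crt and server.key are illustrative, and the real target may use different names, key sizes, or extensions.

```shell
# Create a self-signed certificate for localhost unless one is already present.
# ASSUMPTION: the file names server.crt / server.key are illustrative; check the
# Makefile for the names the project really uses.
helix_make_certs() {
  dir="${1:-./certs}"
  crt="$dir/server.crt"
  key="$dir/server.key"
  if [ -f "$crt" ] && [ -f "$key" ]; then
    echo "certificates already present in $dir"
    return 0
  fi
  mkdir -p "$dir" || return 1
  # -x509   emit a self-signed certificate instead of a signing request
  # -nodes  leave the private key unencrypted (local development only)
  # -addext add a subjectAltName so clients accept localhost and 127.0.0.1
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$key" -out "$crt" -days 365 \
    -subj "/CN=localhost" \
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
}
```

The -addext option requires a reasonably recent OpenSSL, which is one reason an older or alternative openssl binary may produce certificates without the expected extensions.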

Your First API Call

The server is now running at https://localhost:8443. Since it uses a self-signed certificate, use -k with curl to skip verification.

Health Check

curl -k https://localhost:8443/internal/health

Expected response:

{
  "status": "healthy",
  "checks": []
}
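
In scripts, it is often useful to wait for the server to report healthy before sending requests. The sketch below polls the health endpoint shown above. The function names, the 30-attempt default, and the one-second delay are illustrative choices, and the check only looks for the "status": "healthy" field from the expected response.

```shell
# Succeed if the JSON body on stdin reports "status": "healthy".
helix_is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Poll the health endpoint until it reports healthy or the attempts run out.
# $1 = URL (defaults to the local endpoint), $2 = attempts (defaults to 30).
# NOTE: both helpers are hypothetical; they are not shipped with HelixLLM.
helix_wait_healthy() {
  url="${1:-https://localhost:8443/internal/health}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -k skips verification of the self-signed certificate, -s hides progress.
    if curl -ks --max-time 2 "$url" | helix_is_healthy; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, helix_wait_healthy && echo "server is up" blocks for at most about 30 attempts before giving up.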

Chat Completion (OpenAI Compatible)

curl -k https://localhost:8443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.1-70B-Instruct-Q4_K_M",
    "messages": [
      {"role": "user", "content": "Hello, what is HelixLLM?"}
    ]
  }'

List Models

curl -k https://localhost:8443/v1/models

Anthropic Messages API

curl -k https://localhost:8443/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Single-Host Quick Start

For development or small-scale use, full mode is the simplest deployment. Everything runs in a single process:

Copy config: cp .env.example .env
Set mode: HELIX_MODE=full (this is the default)
Start: make dev
Test: curl -k https://localhost:8443/internal/health

All subsystems (gateway, brain, knowledge, agents, control plane) are active and communicate via direct Go function calls -- no network overhead.

Building the Binary

To build a production binary:

make build

The binary is created at ./bin/helixllm. Run it directly:

./bin/helixllm

Or with a specific mode:

./bin/helixllm --mode=gateway

Building a Container Image

make container

This auto-detects Podman or Docker and builds the image as helixllm:dev.
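
The auto-detection can be pictured as a simple lookup that prefers Podman and falls back to Docker. This is a sketch of the idea only. The Makefile's real logic may differ, and helix_container_tool is a hypothetical name.

```shell
# Print the container tool to use: podman if available, otherwise docker.
# Returns non-zero when neither is found in PATH.
# NOTE: hypothetical helper illustrating the preference order described above.
helix_container_tool() {
  if command -v podman >/dev/null 2>&1; then
    echo podman
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo "neither podman nor docker found in PATH" >&2
    return 1
  fi
}

# Example (assumes a Containerfile or Dockerfile in the repository root):
#   "$(helix_container_tool)" build -t helixllm:dev .
```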

Next Steps

Configuration Reference -- all environment variables and their defaults
API Reference -- complete endpoint documentation
Models -- local vs cloud model configuration
Multi-Host Setup -- deploying across multiple machines

Platform-Specific Setup

Linux

Most Linux distributions provide the required tools via package managers.

Fedora / RHEL / ALT Linux:

sudo dnf install golang openssl podman git

Ubuntu / Debian:

sudo apt update
sudo apt install golang openssl podman git

Arch Linux:

sudo pacman -S go openssl podman git

Verify Go version (1.24+ required):

go version
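
Distribution packages sometimes lag behind upstream Go releases, so the check is worth automating. The sketch below parses a go version string and compares it against the 1.24 minimum stated above. The function name helix_go_version_ok is hypothetical.

```shell
# Succeed if the given `go version` output reports Go 1.24 or newer.
# Returns 2 when no version number can be found in the string.
# NOTE: hypothetical helper; pass it the output of `go version`.
helix_go_version_ok() {
  # Extract "MAJOR MINOR" from text such as "go version go1.24.1 linux/amd64".
  ver=$(printf '%s\n' "$1" | sed -n 's/.*go\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  [ -n "$ver" ] || return 2
  # Split the two numbers into $1 (major) and $2 (minor).
  set -- $ver
  [ "$1" -gt 1 ] || { [ "$1" -eq 1 ] && [ "$2" -ge 24 ]; }
}

# Example:
#   helix_go_version_ok "$(go version)" || echo "Go 1.24+ is required"
```

If the check fails, install a newer toolchain from go.dev rather than relying on the distribution package.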

Install optional development tools:

go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install golang.org/x/tools/cmd/goimports@latest

On Linux, Podman runs rootless by default. No additional configuration is needed for container builds.

macOS

Install prerequisites via Homebrew:

brew install go openssl podman git

Initialize the Podman machine (macOS requires a Linux VM for containers):

podman machine init
podman machine start

Verify the setup:

podman info
go version

Note: The openssl that ships with macOS is LibreSSL. The Homebrew openssl is required for make certs to generate certificates with the correct extensions.

Windows (WSL 2)

HelixLLM runs on Windows via WSL 2. Native Windows builds are not supported.

Install WSL 2 with Ubuntu:

wsl --install -d Ubuntu

Inside the WSL terminal, install dependencies:

sudo apt update
sudo apt install golang openssl podman git

Clone and build as on Linux:

git clone --recurse-submodules https://github.com/HelixDevelopment/HelixLLM.git
cd HelixLLM
make deps
make build

By default, WSL 2 forwards localhost ports to the Windows host, so the server at https://localhost:8443 is accessible from Windows browsers and tools.

GPU Setup

Local LLM inference via llama.cpp benefits significantly from GPU acceleration. HelixLLM delegates GPU management to the llama.cpp server process.

NVIDIA GPU (CUDA)

Install the NVIDIA driver and CUDA toolkit:

# Fedora / RHEL
sudo dnf install nvidia-driver cuda-toolkit

# Ubuntu
sudo apt install nvidia-driver-550 nvidia-cuda-toolkit

Verify GPU detection:

nvidia-smi

A llama.cpp server built with CUDA support detects CUDA GPUs automatically. Set the number of GPU layers to offload in your llama.cpp server configuration. Offloading more layers to the GPU speeds up inference but uses more VRAM.

For multi-GPU setups, the control plane scheduler uses the gpu-affinity strategy:

HELIX_SCHEDULE_STRATEGY=gpu-affinity

AMD GPU (ROCm)

Install ROCm:

sudo apt install rocm-hip-libraries rocm-dev

Verify with:

rocm-smi

Ensure your llama.cpp build includes ROCm support (-DGGML_HIP=ON in current llama.cpp; older releases used -DLLAMA_HIPBLAS=ON).

CPU-Only Mode

If no GPU is available, llama.cpp falls back to CPU inference. Performance depends on the CPU core count and available RAM. Quantized models (Q4_K_M, Q5_K_M) are recommended for CPU-only setups to reduce memory requirements.

For CPU-only deployments, consider using cloud providers instead:

HELIX_LLM_DEFAULT_PROVIDER=openai
# or
HELIX_LLM_DEFAULT_PROVIDER=anthropic

First Agent Chat Walkthrough

The agent system provides a conversational interface with tool calling, session memory, and RAG-augmented responses.

1. Start the Server

make dev

Wait until you see INFO server listening addr=0.0.0.0:8443.

2. Send Your First Agent Message

curl -k https://localhost:8443/v1/agents/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What time is it?"}
    ]
  }'

The agent uses its built-in time tool to respond with the current time. The response includes:

{
  "session_id": "a1b2c3d4-...",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The current time is 2026-04-05T14:30:00Z."
      }
    }
  ]
}

3. Continue the Conversation

Use the session_id from the previous response to maintain context:

curl -k https://localhost:8443/v1/agents/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "a1b2c3d4-...",
    "messages": [
      {"role": "user", "content": "And what timezone is that in?"}
    ]
  }'

The agent remembers the previous exchange and responds in context.
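
Copying the session_id by hand gets tedious across several turns. The sketch below extracts it from a response and builds the next request body. Both function names are hypothetical. The extraction relies on the response shape shown in step 2, and the payload builder performs no JSON escaping, so it assumes the message contains no double quotes, backslashes, or newlines.

```shell
# Read an agent response (JSON) on stdin and print the first session_id value.
helix_session_id() {
  sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print a request body for /v1/agents/chat.
# $1 = session id (empty for a new conversation), $2 = user message.
# ASSUMPTION: no argument contains double quotes, backslashes, or newlines.
helix_agent_payload() {
  if [ -n "$1" ]; then
    printf '{"session_id": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$2"
  else
    printf '{"messages": [{"role": "user", "content": "%s"}]}\n' "$2"
  fi
}

# Example: two turns in one session.
#   url=https://localhost:8443/v1/agents/chat
#   sid=$(curl -ks "$url" -H "Content-Type: application/json" \
#           -d "$(helix_agent_payload "" "What time is it?")" | helix_session_id)
#   curl -ks "$url" -H "Content-Type: application/json" \
#     -d "$(helix_agent_payload "$sid" "And what timezone is that in?")"
```

For anything beyond quick experiments, a JSON-aware tool is a safer way to build and parse these bodies.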

4. List Available Tools

curl -k https://localhost:8443/v1/agents/tools

Returns the registered tools (echo, time, knowledge_query, etc.) with their descriptions and parameter schemas.

5. Query Knowledge

If you have ingested documents into the knowledge layer, the agent automatically uses RAG to augment its responses:

# First, ingest some documents
curl -k https://localhost:8443/internal/knowledge/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Project README",
    "content": "HelixLLM is a distributed LLM serving platform...",
    "collection": "docs"
  }'

# Then ask the agent about it
curl -k https://localhost:8443/v1/agents/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is HelixLLM?"}
    ]
  }'

The RAG hook automatically retrieves relevant chunks from the knowledge base and includes them in the agent's context.

Troubleshooting

Common First-Run Issues

Problem	Cause	Solution
`TLS handshake error`	Missing or expired certificates	Run `make certs` to regenerate self-signed certificates
`bind: address already in use`	Port 8443 is occupied by another process	Change `HELIX_PORT` in `.env`, or find the conflicting process with `lsof -i :8443` and stop it
`submodule path 'submodules/X' not initialized`	Submodules were not cloned	Run `make deps` to initialize all submodules
`go: module not found`	Missing replace directives or stale module cache	Run `make deps` then `go clean -modcache` if needed
`connection refused` on API calls	Server not running or wrong port	Verify with `curl -k https://localhost:8443/internal/health`
`401 Unauthorized`	API key auth is enabled but no key provided	Add `-H "Authorization: Bearer <key>"` or clear `HELIX_AUTH_API_KEYS` in `.env`
`no provider available`	No LLM provider is configured or reachable	Set at least one provider key (`HELIX_LLM_OPENAI_KEY` or `HELIX_LLM_ANTHROPIC_KEY`) or ensure llama.cpp is running
`certificate signed by unknown authority`	Using self-signed cert without `-k` flag	Add `-k` to curl commands, or import the cert into your system trust store
`QUIC handshake timeout`	Firewall blocking UDP on the server port	Ensure UDP is allowed on port 8443 or the configured `HELIX_PORT`; the server falls back to HTTP/2 over TCP
`permission denied` on SSH commands	SSH key not authorized on remote host	Add your public key to `~/.ssh/authorized_keys` on the target host
`race detected during execution`	Concurrent access issue (development only)	Report the full race trace as a bug; this should not occur in released builds

Diagnostic Commands

Check server health:

curl -k https://localhost:8443/internal/health

View available models:

curl -k https://localhost:8443/v1/models

Check cluster status (multi-host mode):

curl -k https://localhost:8443/internal/cluster/status

View knowledge store stats:

curl -k https://localhost:8443/internal/knowledge/stats

Enable debug logging for verbose output:

HELIX_LOG_LEVEL=debug make dev
