This repository contains Dockerfiles for running the LLAMA server with CUDA and Vulkan support, as well as a Simple Diffusion server.
- Docker and Docker Compose installed on your system
- NVIDIA GPU with CUDA support (for CUDA version)
- AMD GPU with Vulkan support (for Vulkan version)
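You can sanity-check these prerequisites before proceeding (`vulkaninfo` comes from the `vulkan-tools` package and may need to be installed separately):

```bash
docker --version           # Docker CLI is installed
docker-compose --version   # Docker Compose is installed
nvidia-smi                 # NVIDIA driver is loaded (CUDA version)
vulkaninfo | head -n 20    # Vulkan is available (Vulkan version)
```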
- Create a `.env` file in the root directory of the project and set the following variables:
```env
BUILD_CONTEXT=./
LLAMA_CUDA_DOCKERFILE=./llama-cuda/Dockerfile
LLAMA_VK_DOCKERFILE=./llama-vk/Dockerfile
SD_CUDA_DOCKERFILE=./sd-cuda/Dockerfile
SD_VK_DOCKERFILE=./sd-vk/Dockerfile
NGINX_SECRET=your_secret_token
LLAMA_ARGS="--port 8000 --host 0.0.0.0 -n 8192 -ngl 100 -c 16384 --embedding -m \"/app/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf\" "
LLAMA_MODEL_PATH=/home/fightgpt/models
SD_ARGS="--port 8001 --host 0.0.0.0 --model \"/app/models/DynaVisionXL.safetensors\" --vae \"/app/models/sdxl.vae.safetensors\" --scheduler euler_a"
SD_MODEL_PATH=/home/fightgpt/models
ENABLE_LLAMA=true
ENABLE_SD=true
```
Replace `your_secret_token` with your desired secret token for Nginx authentication, and update the `LLAMA_MODEL_PATH` and `SD_MODEL_PATH` variables with the appropriate paths to your models.
- Generate or provide SSL certificates for Nginx:
```bash
openssl genrsa -out nginx/cert.key 2048
openssl req -new -x509 -days 365 -key nginx/cert.key -out nginx/cert.crt
```
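Optionally, confirm the generated certificate's subject and validity window:

```bash
openssl x509 -in nginx/cert.crt -noout -subject -dates
```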
Before running the services, make sure to build the Docker images:

```bash
docker-compose -f docker-compose.nvidia.yml build llama sd
```

or

```bash
docker-compose -f docker-compose.amd.yml build llama sd
```
To build and run the LLAMA server and Simple Diffusion server with CUDA support:

```bash
docker-compose -f docker-compose.nvidia.yml --profile llama --profile sd up -d --remove-orphans
```
To build and run the LLAMA server and Simple Diffusion server with Vulkan support (note that `--profile` must come before `up`):

```bash
docker-compose -f docker-compose.amd.yml --profile llama --profile sd up -d --remove-orphans
```
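When you are done, the services can be stopped and removed with the standard compose `down` command, using the same file and profiles:

```bash
docker-compose -f docker-compose.nvidia.yml --profile llama --profile sd down
```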
Alternatively, use the provided helper scripts, which read the `ENABLE_LLAMA` and `ENABLE_SD` variables from the `.env` file to determine which services to build and run:

```bash
./compose-nvidia.sh build
./compose-nvidia.sh start
```

or

```bash
./compose-amd.sh build
./compose-amd.sh start
```
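As a rough illustration of the described behavior (reading `ENABLE_LLAMA`/`ENABLE_SD` from `.env` and mapping them to compose profiles), a minimal sketch of what `compose-nvidia.sh` might look like is shown below; the actual script in the repository may differ:

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real compose-nvidia.sh may differ.
set -e

# Pull ENABLE_LLAMA / ENABLE_SD (and the other variables) from .env.
source .env

PROFILES=""
[ "$ENABLE_LLAMA" = "true" ] && PROFILES="$PROFILES --profile llama"
[ "$ENABLE_SD" = "true" ] && PROFILES="$PROFILES --profile sd"

case "$1" in
  build) docker-compose -f docker-compose.nvidia.yml $PROFILES build ;;
  start) docker-compose -f docker-compose.nvidia.yml $PROFILES up -d --remove-orphans ;;
  *)     echo "usage: $0 {build|start}" >&2; exit 1 ;;
esac
```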
To check the status of running containers:

```bash
docker-compose -f docker-compose.nvidia.yml ps
```

or

```bash
docker-compose -f docker-compose.amd.yml ps
```
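To inspect a service's output, standard docker-compose log streaming works as usual, for example for the LLAMA container:

```bash
docker-compose -f docker-compose.nvidia.yml logs -f llama
```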
You can make requests to the LLAMA server using curl:

```bash
curl --request POST \
  --url http://localhost:8000/text/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
```
You can make requests to the Simple Diffusion server using curl:

```bash
curl --request POST \
  --url http://localhost:8001/image/generate \
  --header "Content-Type: application/json" \
  --data '{"prompt": "A beautiful sunset over the ocean", "negative_prompt": "fog, mist", "num_inference_steps": 30, "guidance_scale": 7.5, "width": 512, "height": 512}'
```

Make sure to replace `localhost` with the appropriate hostname or IP address if running the server on a different machine.
- The `docker-compose.nvidia.yml` file is configured to use CUDA and run on NVIDIA GPUs.
- The `docker-compose.amd.yml` file is configured to use Vulkan and run on AMD GPUs.
- Adjust the `LLAMA_ARGS` and `SD_ARGS` variables in the `.env` file to customize the server and model settings.
- Ensure that you have the necessary models in the specified `LLAMA_MODEL_PATH` and `SD_MODEL_PATH` directories.
For more information on the individual Dockerfiles and their usage, please refer to the respective directories (`llama-cuda`, `llama-vk`, `sd-cuda`, `sd-vk`).
The Simple Diffusion server provides endpoints for generating images and performing image-to-image translation. Refer to the `server.py` file in the `sd-server` directory for more details on the available endpoints and their usage.