This repository contains Dockerfiles for running the LLAMA server with CUDA and Vulkan support, as well as a Simple Diffusion server.
- Docker and Docker Compose installed on your system
- NVIDIA GPU with CUDA support (for CUDA version)
- AMD GPU with Vulkan support (for Vulkan version)
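You can sanity-check these prerequisites before proceeding (`vulkaninfo` comes from the `vulkan-tools` package and may need to be installed separately):

```bash
docker --version           # Docker CLI is installed
docker-compose --version   # Docker Compose is installed
nvidia-smi                 # NVIDIA driver is loaded (CUDA version)
vulkaninfo | head -n 20    # Vulkan is available (Vulkan version)
```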
- Create a `.env` file in the root directory of the project and set the following variables:
```env
BUILD_CONTEXT=./
LLAMA_CUDA_DOCKERFILE=./llama-cuda/Dockerfile
LLAMA_VK_DOCKERFILE=./llama-vk/Dockerfile
SD_CUDA_DOCKERFILE=./sd-cuda/Dockerfile
SD_VK_DOCKERFILE=./sd-vk/Dockerfile
NGINX_SECRET=your_secret_token
LLAMA_ARGS="--port 8000 --host 0.0.0.0 -n 8192 -ngl 100 -c 16384 --embedding -m \"/app/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf\" "
LLAMA_MODEL_PATH=/home/fightgpt/models
SD_ARGS="--port 8001 --host 0.0.0.0 --model \"/app/models/DynaVisionXL.safetensors\" --vae \"/app/models/sdxl.vae.safetensors\" --scheduler euler_a"
SD_MODEL_PATH=/home/fightgpt/models
ENABLE_LLAMA=true
ENABLE_SD=true
```
Replace `your_secret_token` with your desired secret token for Nginx authentication, and update the `LLAMA_MODEL_PATH` and `SD_MODEL_PATH` variables with the appropriate paths to your models.
- Generate or provide SSL certificates for Nginx:
```bash
openssl genrsa -out nginx/cert.key 2048
openssl req -new -x509 -days 365 -key nginx/cert.key -out nginx/cert.crt
```
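Optionally, confirm the generated certificate's subject and validity window:

```bash
openssl x509 -in nginx/cert.crt -noout -subject -dates
```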
Before running the services, make sure to build the Docker images:

```bash
docker-compose -f docker-compose.nvidia.yml build llama sd
```

or

```bash
docker-compose -f docker-compose.amd.yml build llama sd
```
To build and run the LLAMA server and Simple Diffusion server with CUDA support:

```bash
docker-compose -f docker-compose.nvidia.yml --profile llama --profile sd up -d --remove-orphans
```
To build and run the LLAMA server and Simple Diffusion server with Vulkan support (note that `--profile` must come before `up`):

```bash
docker-compose -f docker-compose.amd.yml --profile llama --profile sd up -d --remove-orphans
```
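When you are done, the services can be stopped and removed with the standard compose `down` command, using the same file and profiles:

```bash
docker-compose -f docker-compose.nvidia.yml --profile llama --profile sd down
```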
Alternatively, use the provided helper scripts, which read the `ENABLE_LLAMA` and `ENABLE_SD` variables from the `.env` file to determine which services to build and run:

```bash
./compose-nvidia.sh build
./compose-nvidia.sh start
```

or

```bash
./compose-amd.sh build
./compose-amd.sh start
```
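As a rough illustration of the described behavior (reading `ENABLE_LLAMA`/`ENABLE_SD` from `.env` and mapping them to compose profiles), a minimal sketch of what `compose-nvidia.sh` might look like is shown below; the actual script in the repository may differ:

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real compose-nvidia.sh may differ.
set -e

# Pull ENABLE_LLAMA / ENABLE_SD (and the other variables) from .env.
source .env

PROFILES=""
[ "$ENABLE_LLAMA" = "true" ] && PROFILES="$PROFILES --profile llama"
[ "$ENABLE_SD" = "true" ] && PROFILES="$PROFILES --profile sd"

case "$1" in
  build) docker-compose -f docker-compose.nvidia.yml $PROFILES build ;;
  start) docker-compose -f docker-compose.nvidia.yml $PROFILES up -d --remove-orphans ;;
  *)     echo "usage: $0 {build|start}" >&2; exit 1 ;;
esac
```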
To check the status of running containers:

```bash
docker-compose -f docker-compose.nvidia.yml ps
```

or

```bash
docker-compose -f docker-compose.amd.yml ps
```
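To inspect a service's output, standard docker-compose log streaming works as usual, for example for the LLAMA container:

```bash
docker-compose -f docker-compose.nvidia.yml logs -f llama
```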
You can make requests to the LLAMA server using curl:

```bash
curl --request POST \
  --url http://localhost:8000/text/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
```
You can make requests to the Simple Diffusion server using curl:

```bash
curl --request POST \
  --url http://localhost:8001/image/generate \
  --header "Content-Type: application/json" \
  --data '{"prompt": "A beautiful sunset over the ocean", "negative_prompt": "fog, mist", "num_inference_steps": 30, "guidance_scale": 7.5, "width": 512, "height": 512}'
```

Make sure to replace `localhost` with the appropriate hostname or IP address if running the server on a different machine.
- The `docker-compose.nvidia.yml` file is configured to use CUDA and run on NVIDIA GPUs.
- The `docker-compose.amd.yml` file is configured to use Vulkan and run on AMD GPUs.
- Adjust the `LLAMA_ARGS` and `SD_ARGS` variables in the `.env` file to customize the server and model settings.
- Ensure that you have the necessary models in the specified `LLAMA_MODEL_PATH` and `SD_MODEL_PATH` directories.
For more information on the individual Dockerfiles and their usage, please refer to the respective directories (`llama-cuda`, `llama-vk`, `sd-cuda`, `sd-vk`).
The Simple Diffusion server provides endpoints for generating images and performing image-to-image translation. Refer to the `server.py` file in the `sd-server` directory for more details on the available endpoints and their usage.