
Docker files for LLAMA and Simple Diffusion Server

This repository contains Docker files for running the LLAMA server with CUDA and Vulkan support, as well as a Simple Diffusion server.

Prerequisites

  • Docker and Docker Compose installed on your system
  • NVIDIA GPU with CUDA support (for CUDA version)
  • AMD GPU with Vulkan support (for Vulkan version)

Configuration

  1. Create a .env file in the root directory of the project and set the following variables:
BUILD_CONTEXT=./
LLAMA_CUDA_DOCKERFILE=./llama-cuda/Dockerfile
LLAMA_VK_DOCKERFILE=./llama-vk/Dockerfile
SD_CUDA_DOCKERFILE=./sd-cuda/Dockerfile
SD_VK_DOCKERFILE=./sd-vk/Dockerfile
NGINX_SECRET=your_secret_token
LLAMA_ARGS="--port 8000 --host 0.0.0.0 -n 8192 -ngl 100 -c 16384 --embedding -m \"/app/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf\" "
LLAMA_MODEL_PATH=/home/fightgpt/models
SD_ARGS="--port 8001 --host 0.0.0.0 --model \"/app/models/DynaVisionXL.safetensors\" --vae \"/app/models/sdxl.vae.safetensors\" --scheduler euler_a"
SD_MODEL_PATH=/home/fightgpt/models
ENABLE_LLAMA=true
ENABLE_SD=true

Replace your_secret_token with your desired secret token for Nginx authentication, and update the LLAMA_MODEL_PATH and SD_MODEL_PATH variables with the appropriate paths to your models.
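For reference, the LLAMA_ARGS flags are standard llama.cpp server options: --port and --host set the listen address, -n the maximum number of tokens to predict, -ngl the number of model layers to offload to the GPU, -c the context size, --embedding enables the embeddings endpoint, and -m the model file to load. SD_ARGS is passed to the diffusion server in the same way.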

  2. Generate self-signed SSL certificates for Nginx (or provide your own):
openssl genrsa -out nginx/cert.key 2048
openssl req -new -x509 -days 365 -key nginx/cert.key -out nginx/cert.crt
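
These openssl commands create a self-signed certificate, so clients such as curl will need -k/--insecure when connecting to Nginx over HTTPS. To sanity-check what was generated:

openssl x509 -in nginx/cert.crt -noout -subject -dates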

Building and Running

Before running the services, make sure to build the Docker images:

docker-compose -f docker-compose.nvidia.yml build llama sd

or

docker-compose -f docker-compose.amd.yml build llama sd
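
If you use the Docker Compose v2 plugin rather than the standalone binary, the same commands work with docker compose (no hyphen), e.g. docker compose -f docker-compose.nvidia.yml build llama sd.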

To build and run the LLAMA server and Simple Diffusion server with CUDA support:

docker-compose -f docker-compose.nvidia.yml --profile llama --profile sd up -d --remove-orphans

To build and run the LLAMA server and Simple Diffusion server with Vulkan support:

docker-compose -f docker-compose.amd.yml --profile llama --profile sd up -d --remove-orphans

A Better Way to Build and Run

These helper scripts use the ENABLE_LLAMA and ENABLE_SD variables from the .env file to determine which services to build and run (a sketch of what such a script might contain is shown after the commands below).

./compose-nvidia.sh build
./compose-nvidia.sh start

or

./compose-amd.sh build
./compose-amd.sh start
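
If you are curious what the wrapper does, a minimal sketch of such a script might look like the following; the actual compose-nvidia.sh in the repository may differ, so treat this as illustrative only:

#!/usr/bin/env bash
# Illustrative sketch only -- see compose-nvidia.sh in the repository for the real script.
set -e

# Export everything defined in .env (ENABLE_LLAMA, ENABLE_SD, ...)
set -a
source .env
set +a

# Turn the enable flags into --profile arguments
PROFILES=""
[ "$ENABLE_LLAMA" = "true" ] && PROFILES="$PROFILES --profile llama"
[ "$ENABLE_SD" = "true" ] && PROFILES="$PROFILES --profile sd"

case "$1" in
  build) docker-compose -f docker-compose.nvidia.yml $PROFILES build ;;
  start) docker-compose -f docker-compose.nvidia.yml $PROFILES up -d --remove-orphans ;;
  *)     echo "usage: $0 {build|start}" >&2; exit 1 ;;
esac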

Checking Container Status

To check the status of running containers:

docker-compose -f docker-compose.nvidia.yml ps

or

docker-compose -f docker-compose.amd.yml ps
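
To follow a service's logs while it starts (handy for catching model-loading errors):

docker-compose -f docker-compose.nvidia.yml logs -f llama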

Making Requests

LLAMA Server

You can make requests to the LLAMA server using curl:

curl --request POST \
  --url http://localhost:8000/text/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

Simple Diffusion Server

You can make requests to the Simple Diffusion server using curl:

curl --request POST \
  --url http://localhost:8001/image/generate \
  --header "Content-Type: application/json" \
  --data '{"prompt": "A beautiful sunset over the ocean", "negative_prompt": "fog, mist", "num_inference_steps": 30, "guidance_scale": 7.5, "width": 512, "height": 512}'

Make sure to replace localhost with the appropriate hostname or IP address if running the server on a different machine.
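
Note that Nginx fronts both servers and authenticates requests with the NGINX_SECRET token. How the token must be presented depends on the Nginx configuration in this repository; as a sketch, assuming it is expected as a bearer token over HTTPS with the self-signed certificate from above:

curl -k --request POST \
  --url https://your-host/text/completion \
  --header "Authorization: Bearer your_secret_token" \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Hello", "n_predict": 32}'

Check the nginx directory in the repository for the actual port and authentication scheme.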

Additional Notes

  • The docker-compose.nvidia.yml file is configured to use CUDA and run on NVIDIA GPUs.
  • The docker-compose.amd.yml file is configured to use Vulkan and run on AMD GPUs.
  • Adjust the LLAMA_ARGS and SD_ARGS variables in the .env file to customize the server and model settings.
  • Ensure that you have the necessary models in the specified LLAMA_MODEL_PATH and SD_MODEL_PATH directories.

For more information on the individual Dockerfiles and their usage, please refer to the respective directories (llama-cuda, llama-vk, sd-cuda, sd-vk).

The Simple Diffusion server provides endpoints for generating images and performing image-to-image translation. Refer to the server.py file in the sd-server directory for more details on the available endpoints and their usage.
