⚔️ OELLM Arena

OELLM Arena (Open European LLM Arena) is a specialized "Chatbot Arena" for evaluating open-source Large Language Models (LLMs) across a wide spectrum of European languages.
It facilitates blind A/B testing between two distinct families of monolingual models:

Model A (MultiSynt): Instruction-tuned models (Tower, Opus) based on Nemotron architecture.
Model B (HPLT): Monolingual reference models (Base models) from the High Performance Language Technologies project.

The application is designed for GPU-accelerated servers (Dual L4 GPUs recommended) and orchestrates inference to allow parallel loading of competing models.

✨ Key Features

Language-First Workflow: Users select a target language (e.g., Basque, Swedish, Italian) rather than specific models, ensuring unbiased evaluation.
Blind Evaluation: Model identities are strictly hidden until the vote is cast.
Dynamic Matchups: The system randomly selects a specific MultiSynt variant (e.g., MultiSynt Tower9b or MultiSynt Opus) to compete against the HPLT reference model.
Slurm Inference Offloading: Streamlit delegates large-model inference to cluster jobs via subprocess and srun, so the two competing models load concurrently on separate cluster resources.
Dual-GPU Orchestration: Maps Model A to GPU 0 and Model B to GPU 1 by running each model in its own isolated --gpus=1 cluster job.
Live Analytics: Visualizes win rates by language and by model architecture (e.g., Opus vs Tower).
Persistent Logging: All prompts, generated responses, and user votes are saved to arena_results.csv for linguistic analysis.
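
The Slurm offloading and dual-GPU features above can be illustrated with a minimal sketch. It is not the project's actual backend: the worker script name `generate_worker.py` and its `--model` / `--prompt` arguments are hypothetical, and only the `srun --gpus=1` part is taken from the feature list. Each model gets its own single-GPU job, and both jobs are launched from threads so that they load concurrently.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_srun_command(model_id: str, prompt: str,
                       worker: str = "generate_worker.py") -> list[str]:
    """Build an srun invocation that requests one GPU for a single model."""
    return [
        "srun", "--gpus=1",        # one GPU per job
        sys.executable, worker,    # hypothetical worker script (assumption)
        "--model", model_id,
        "--prompt", prompt,
    ]


def run_pair(model_a: str, model_b: str, prompt: str,
             build=build_srun_command) -> tuple[str, str]:
    """Launch both jobs concurrently and return their stdout as (a, b)."""

    def run(model_id: str) -> str:
        result = subprocess.run(
            build(model_id, prompt),
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    # Two threads are enough: each one just blocks on its own subprocess.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(run, model_a)
        future_b = pool.submit(run, model_b)
        return future_a.result(), future_b.result()
```

The `build` parameter lets the command builder be swapped out, for example to exercise `run_pair` on a machine without Slurm.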

📊 Arena Statistics

Here are the current standings and activity in the OELLM Arena:

Win Rate by Language

Total Votes by Language

Votes Over Time
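
The statistics above are derived from the votes logged to arena_results.csv. The sketch below shows one way such a log could be appended to and summarized into per-language win rates. The column names in `FIELDS` are an assumed schema for illustration only; the real file may use different columns.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed schema (hypothetical); adjust to the actual columns of arena_results.csv.
FIELDS = ["language", "prompt", "model_a", "model_b",
          "response_a", "response_b", "winner"]


def append_vote(path: Path, row: dict) -> None:
    """Append one vote, writing the header first if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def win_rate_by_language(path: Path) -> dict[str, dict[str, float]]:
    """Return, per language, the share of votes for each value of `winner`."""
    counts: dict[str, Counter] = {}
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts.setdefault(row["language"], Counter())[row["winner"]] += 1
    return {
        language: {name: n / sum(tally.values()) for name, n in tally.items()}
        for language, tally in counts.items()
    }
```

Calling `win_rate_by_language(Path("arena_results.csv"))` would then give a nested mapping such as language to winner label to vote share, which is the shape a bar chart per language needs.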

🌍 Supported Languages

The arena supports 13+ languages across multiple families:

Nordic: Icelandic, Swedish, Danish, Norwegian (Bokmål), Finnish.
Germanic: German, Dutch.
Romance: Spanish, Italian, Portuguese, Romanian, Catalan.
Other: Basque.
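
To make the language-first workflow concrete, here is a minimal sketch of how a selected language could be turned into a matchup. The registry contents are placeholders, not real model identifiers, and the shuffled display order is an assumption about how identities might be kept hidden; neither is taken from the project's config.py.

```python
import random

# Hypothetical registry: language -> MultiSynt candidates and one HPLT reference.
# All identifiers below are placeholders for illustration.
REGISTRY = {
    "Basque": {
        "multisynt": ["multisynt-tower-eu", "multisynt-opus-eu"],
        "hplt": "hplt-eu",
    },
    "Swedish": {
        "multisynt": ["multisynt-tower-sv", "multisynt-opus-sv"],
        "hplt": "hplt-sv",
    },
}


def pick_matchup(language: str, rng: random.Random) -> dict[str, str]:
    """Pick a random MultiSynt variant (Model A) to face the HPLT reference (Model B)."""
    entry = REGISTRY[language]
    return {"model_a": rng.choice(entry["multisynt"]), "model_b": entry["hplt"]}


def display_order(matchup: dict[str, str], rng: random.Random) -> list[str]:
    """Shuffle which model is shown first so position does not reveal the family."""
    order = [matchup["model_a"], matchup["model_b"]]
    rng.shuffle(order)
    return order
```

Passing an explicit `random.Random` instance keeps the selection reproducible in tests while still allowing an unseeded generator in the running app.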

🛠️ Installation

Prerequisites

Hardware: Linux server with NVIDIA GPUs managed by Slurm (min. 2x L4 or A100 recommended for 24GB+ VRAM). The srun CLI must be properly configured.
Software: Python 3.10+, CUDA drivers.
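
Before installing, the prerequisites above can be checked with a short script. This is a sketch rather than part of the repository: it only tests that the interpreter is recent enough and that the `srun` and `nvidia-smi` executables are on the PATH, using the presence of `nvidia-smi` as a rough proxy for installed NVIDIA drivers.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        "python": sys.version_info >= min_python,
        "srun": shutil.which("srun") is not None,            # Slurm client on PATH
        "nvidia-smi": shutil.which("nvidia-smi") is not None,  # proxy for NVIDIA drivers
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```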

Setup

Clone the repository:
git clone https://github.com/your-username/oellm-arena.git
cd oellm-arena

Install dependencies with uv:

# Install uv if not already installed
pip install uv

# Initialize and sync dependencies
uv sync
source .venv/bin/activate

Alternatively, using standard pip:

pip install -r requirements.txt

🚀 Usage

Local Development

To run the app on your local machine for testing:
streamlit run app.py

Automated Backend Benchmark

To run the automated model comparison script, which generates responses for all languages (or for a limited number of them):

# Run for all languages
uv run python benchmark_backend.py

# Run for a specific number of languages (e.g., test 1)
uv run python benchmark_backend.py --limit 1

Results are saved to backend_benchmark_results.csv.
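
For readers unfamiliar with this pattern, the following sketch shows how a `--limit` flag like the one above could be parsed and applied. It is an illustration using argparse, not the actual contents of benchmark_backend.py; the helper names `parse_args` and `select_languages` are hypothetical.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; --limit is optional and defaults to all languages."""
    parser = argparse.ArgumentParser(description="Backend benchmark (sketch)")
    parser.add_argument(
        "--limit", type=int, default=None,
        help="maximum number of languages to benchmark (default: all)",
    )
    return parser.parse_args(argv)


def select_languages(languages, limit):
    """Return every language, or only the first `limit` when a limit is given."""
    languages = list(languages)
    return languages if limit is None else languages[:limit]
```

With this shape, `--limit 1` restricts a run to the first language in the list, which is convenient for a quick smoke test before a full benchmark.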

Server Production (with Sub-path)

To run the app persistently on a server under a specific URL path (e.g., domain.com/oellmarena), use tmux:

Start a persistent session:
tmux new -s arena
Run the app with the Base URL flag:
streamlit run app.py --server.port 8501 --server.baseUrlPath=oellmarena
Detach: Press Ctrl+B, then D.

🌐 Nginx Deployment & SSL

This application is designed to sit behind an Nginx reverse proxy. Below is the configuration for serving the app securely under /oellmarena.

Nginx Configuration

File: /etc/nginx/sites-available/your-domain.com
server {
    listen 80;
    server_name your-domain.com;

    # Allow Certbot challenges
    location ^~ /.well-known/acme-challenge/ {
        default_type "text/plain";
        root /var/www/certbot;
    }

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name your-domain.com;

    # SSL Certificates (Managed by Certbot)
    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # Security Protocols
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # 1. Trailing Slash Redirect (Vital for Streamlit)
    location = /oellmarena {
        return 301 /oellmarena/;
    }

    # 2. Proxy to Streamlit App
    location /oellmarena/ {
        proxy_pass http://127.0.0.1:8501/oellmarena/;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}

🤖 Models & Attribution

This arena compares models from two primary collections:

MultiSynt Models (Hugging Face Collection): Instruction-tuned models optimized for specific languages.
HPLT Models (Hugging Face Collection): Monolingual base models from the High Performance Language Technologies project.

📄 License

MIT License
