OELLM Arena (Open European LLM Arena) is a specialized "Chatbot Arena" for evaluating open-source Large Language Models (LLMs) across a wide spectrum of European languages.
It facilitates blind A/B testing between two distinct families of monolingual models:
- Model A (MultiSynt): Instruction-tuned models (Tower, Opus) based on the Nemotron architecture.
- Model B (HPLT): Monolingual reference models (Base models) from the High Performance Language Technologies project.
The application is designed for GPU-accelerated servers (Dual L4 GPUs recommended) and orchestrates inference to allow parallel loading of competing models.
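A minimal sketch of how that parallel, GPU-isolated launch might look, assuming a working Slurm installation. The `generate.py` worker script, model identifiers, and flag values here are illustrative placeholders, not the app's actual code:

```python
import shutil
import subprocess

def build_srun_command(model_id: str, prompt: str) -> list[str]:
    """Build an srun invocation that runs inference in an isolated 1-GPU job."""
    return [
        "srun",
        "--gpus=1",               # each competing model gets its own GPU
        "--time=00:05:00",
        "python", "generate.py",  # hypothetical inference worker script
        "--model", model_id,
        "--prompt", prompt,
    ]

prompt = "Kaixo! Nola zaude?"  # a Basque test prompt
cmds = [
    build_srun_command("multisynt/tower-9b-example", prompt),  # Model A
    build_srun_command("HPLT/hplt-base-example", prompt),      # Model B
]

if shutil.which("srun"):
    # Launch both jobs concurrently; Slurm schedules each onto its own GPU.
    procs = [subprocess.Popen(c, stdout=subprocess.PIPE) for c in cmds]
    outputs = [p.communicate()[0].decode() for p in procs]
else:
    print("srun not found; commands built but not executed.")
```

Because each job requests exactly one GPU, Slurm handles device isolation without the app manipulating `CUDA_VISIBLE_DEVICES` itself.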
- Language-First Workflow: Users select a target language (e.g., Basque, Swedish, Italian) rather than specific models, ensuring unbiased evaluation.
- Blind Evaluation: Model identities are strictly hidden until the vote is cast.
- Dynamic Matchups: The system randomly selects specific architectures (e.g., MultiSynt Tower9b vs MultiSynt Opus) to compete against the HPLT reference model.
- Slurm Inference Offloading: Streamlit delegates large models to cluster jobs via `subprocess` and `srun`, allowing competing models to load concurrently on distributed execution resources.
- Dual-GPU Orchestration: Maps Model A to GPU 0 and Model B to GPU 1 via isolated `--gpus=1` cluster jobs.
- Live Analytics: Visualizes win rates by language and by model architecture (e.g., Opus vs Tower).
- Persistent Logging: All prompts, generated responses, and user votes are saved to arena_results.csv for linguistic analysis.
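The matchup and logging flow described above could be sketched roughly as follows. The model names and CSV column names are assumptions for illustration, not the app's actual schema:

```python
import csv
import random
from pathlib import Path

# Hypothetical candidate pools (illustrative names only).
MULTISYNT_VARIANTS = ["multisynt-tower-9b", "multisynt-opus"]
HPLT_REFERENCE = "hplt-base"

def pick_matchup() -> tuple[str, str]:
    """Randomly pick a MultiSynt architecture to face the HPLT reference,
    then shuffle sides so the user cannot infer identity from position."""
    pair = [random.choice(MULTISYNT_VARIANTS), HPLT_REFERENCE]
    random.shuffle(pair)
    return pair[0], pair[1]

def log_vote(path: Path, language: str, prompt: str,
             model_a: str, model_b: str, vote: str) -> None:
    """Append one blind-vote record; header is written on first use."""
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["language", "prompt", "model_a", "model_b", "vote"])
        writer.writerow([language, prompt, model_a, model_b, vote])

a, b = pick_matchup()
log_vote(Path("arena_results.csv"), "Basque", "Kaixo!", a, b, "A")
```

Shuffling sides after picking the matchup is what keeps the evaluation blind: the vote is recorded against positions A/B, and identities are only revealed afterwards.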
The arena supports 13+ languages across multiple families:
- Nordic: Icelandic, Swedish, Danish, Norwegian (Bokmål), Finnish.
- Germanic: German, Dutch.
- Romance: Spanish, Italian, Portuguese, Romanian, Catalan.
- Other: Basque.
- Hardware: Linux server with NVIDIA GPUs managed by Slurm (minimum 2x L4 or A100, 24GB+ VRAM recommended). The `srun` CLI must be properly configured.
- Software: Python 3.10+, CUDA drivers.
1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/oellm-arena.git
   cd oellm-arena
   ```

2. Install dependencies with uv:

   ```bash
   # Install uv if not already installed
   pip install uv

   # Initialize and sync dependencies
   uv sync
   source .venv/bin/activate
   ```

   Alternatively, using standard pip:

   ```bash
   pip install -r requirements.txt
   ```
To run the app on your local machine for testing:
```bash
streamlit run app.py
```
To run the automated model comparison script which generates responses for all languages (or a limit):
```bash
# Run for all languages
uv run python benchmark_backend.py

# Run for a specific number of languages (e.g., test 1)
uv run python benchmark_backend.py --limit 1
```

Results are saved to `backend_benchmark_results.csv`.
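Once votes accumulate, per-language win rates like those shown in the Live Analytics view can be computed with a few lines of standard-library Python. The records below are fabricated samples and the column names are assumptions, not the app's actual schema:

```python
from collections import Counter, defaultdict

# Fabricated sample records mirroring an assumed results schema.
records = [
    {"language": "Basque",  "winner": "multisynt-tower-9b"},
    {"language": "Basque",  "winner": "hplt-base"},
    {"language": "Swedish", "winner": "multisynt-opus"},
    {"language": "Swedish", "winner": "multisynt-opus"},
]

# Count wins per model, grouped by language.
wins_by_lang = defaultdict(Counter)
for r in records:
    wins_by_lang[r["language"]][r["winner"]] += 1

for lang, counts in wins_by_lang.items():
    total = sum(counts.values())
    for model, w in counts.most_common():
        print(f"{lang}: {model} win rate = {w / total:.0%}")
```

The same grouping works equally well by model architecture (e.g., Opus vs Tower) by keying the outer dictionary on the architecture field instead of the language.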
To run the app persistently on a server under a specific URL path (e.g., domain.com/oellmarena), use tmux:
1. Start a persistent session:

   ```bash
   tmux new -s arena
   ```

2. Run the app with the base URL flag:

   ```bash
   streamlit run app.py --server.port 8501 --server.baseUrlPath=oellmarena
   ```

3. Detach with Ctrl+B, then D. Reattach later with `tmux attach -t arena`.
This application is designed to sit behind an Nginx reverse proxy. Below is the configuration for serving the app securely under /oellmarena.
File: `/etc/nginx/sites-available/your-domain.com`
```nginx
server {
    listen 80;
    server_name your-domain.com;

    # Allow Certbot challenges
    location ^~ /.well-known/acme-challenge/ {
        default_type "text/plain";
        root /var/www/certbot;
    }

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name your-domain.com;

    # SSL certificates (managed by Certbot)
    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # Security protocols
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # 1. Trailing slash redirect (vital for Streamlit)
    location = /oellmarena {
        return 301 /oellmarena/;
    }

    # 2. Proxy to Streamlit app (WebSocket headers required)
    location /oellmarena/ {
        proxy_pass http://127.0.0.1:8501/oellmarena/;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}
```
This arena compares models from two primary collections:
- MultiSynt Models (Hugging Face collection): Instruction-tuned models optimized for specific languages.
- HPLT Models (Hugging Face collection): Monolingual base models from the High Performance Language Technologies project.
MIT License


