The evaluate.py script now supports switching between Ollama and Gemini for agent models via a CLI argument, while judges always use Gemini for consistency.
Select the model provider for agents (MonolithicAgent and EnsembleAgent):
- `ollama` (default): Uses local Ollama models
  - MonolithicAgent: Uses `OLLAMA_MODEL` from `.env` (default: `qwen2.5:7b`)
  - EnsembleAgent: Uses `CREWAI_MODEL` from `.env` (default: `openai/qwen2.5:7b`)
  - Judges: Always use Gemini (`JUDGE_MODEL` from `.env`)
- `gemini`: Uses the Google Gemini API for all agents
  - MonolithicAgent: `gemini-2.5-flash-lite`
  - EnsembleAgent: `gemini/gemini-2.5-flash-lite`
  - Judges: Always use Gemini (`JUDGE_MODEL` from `.env`)
  - Requires: the `GEMINI_API_KEY` environment variable must be set
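The provider-to-model mapping above can be sketched as a small resolver. This is illustrative only (the function name and return shape are assumptions, not the actual `evaluate.py` internals); the `.env` keys and defaults mirror the list above.

```python
import os

def resolve_agent_models(agents_model: str) -> dict:
    """Map the --agents-model choice to per-agent model names (sketch)."""
    if agents_model == "ollama":
        return {
            "monolithic": os.getenv("OLLAMA_MODEL", "qwen2.5:7b"),
            "ensemble": os.getenv("CREWAI_MODEL", "openai/qwen2.5:7b"),
            # Judges always use Gemini, regardless of provider
            "judge": os.getenv("JUDGE_MODEL"),
        }
    if agents_model == "gemini":
        return {
            "monolithic": "gemini-2.5-flash-lite",
            "ensemble": "gemini/gemini-2.5-flash-lite",
            "judge": os.getenv("JUDGE_MODEL"),
        }
    raise ValueError(f"Unknown provider: {agents_model}")

print(resolve_agent_models("gemini")["monolithic"])  # gemini-2.5-flash-lite
```

Note the asymmetry in the Gemini names: the EnsembleAgent (CrewAI/LiteLLM-style) expects a provider-prefixed identifier, hence `gemini/gemini-2.5-flash-lite`.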
Run with Ollama (default):

```bash
python evaluate.py
# or explicitly:
python evaluate.py --agents-model=ollama
```

MLflow experiments created: `document_synthesis_monolithic`, `document_synthesis_ensemble`

Run with Gemini:

```bash
python evaluate.py --agents-model=gemini
```

MLflow experiments created: `document_synthesis_monolithic_gemini`, `document_synthesis_ensemble_gemini`

Run in test mode (single paper, single task) for quick testing:

```bash
python evaluate.py --test
# or:
python evaluate.py -t --agents-model=ollama
python evaluate.py -t --agents-model=gemini
```

The experiment naming scheme automatically appends a `_gemini` suffix when running with Gemini agents:
| Agent Type | Ollama Experiment Name | Gemini Experiment Name |
|---|---|---|
| Monolithic | `document_synthesis_monolithic` | `document_synthesis_monolithic_gemini` |
| Ensemble | `document_synthesis_ensemble` | `document_synthesis_ensemble_gemini` |
This allows you to:
- Compare Ollama vs Gemini performance side-by-side in MLflow UI
- Track experiments separately by model provider
- Avoid mixing results from different model providers
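The suffix rule reduces to a one-line helper. This is a sketch of the naming scheme described above; the function name is an assumption, not the script's actual code.

```python
def experiment_name(agent_type: str, agents_model: str) -> str:
    """Build the MLflow experiment name; append _gemini only for Gemini runs."""
    base = f"document_synthesis_{agent_type}"
    return f"{base}_gemini" if agents_model == "gemini" else base

print(experiment_name("monolithic", "gemini"))  # document_synthesis_monolithic_gemini
```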
Example `.env` configuration:

```
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b
OLLAMA_NUM_CTX=32768
CREWAI_MODEL=openai/qwen2.5:7b
GEMINI_API_KEY=your_api_key_here
JUDGE_MODEL=gemini/gemini-2.5-flash-lite
```

If you run with `--agents-model=gemini` without setting `GEMINI_API_KEY`, the script will:

- Display a clear error message
- Exit with code 1
- Prompt you to set the API key in your `.env` file
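The fail-fast behavior described above can be sketched as a simple guard (illustrative only; the function name is hypothetical and this is not the script's actual source):

```python
import os
import sys

def require_gemini_key() -> None:
    """Exit with code 1 if GEMINI_API_KEY is missing (sketch of the check)."""
    if not os.getenv("GEMINI_API_KEY"):
        print("ERROR: GEMINI_API_KEY environment variable is not set.", file=sys.stderr)
        print("Please set GEMINI_API_KEY in your .env file to use "
              "--agents-model=gemini", file=sys.stderr)
        sys.exit(1)
```

Checking this up front, before any agents are constructed, avoids a partial run that only fails when the first Gemini call is made.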
Example error:

```
ERROR: GEMINI_API_KEY environment variable is not set.
Please set GEMINI_API_KEY in your .env file to use --agents-model=gemini
```
View all available options:

```bash
python evaluate.py --help
```

Output:

```
usage: evaluate.py [-h] [--agents-model {ollama,gemini}] [-t]

Evaluate Monolithic vs Ensemble agents for document synthesis

options:
  -h, --help            show this help message and exit
  --agents-model {ollama,gemini}
                        Model to use for agents: 'ollama' (default) or 'gemini'.
                        Judges always use Gemini.
  -t, --test            Run in test mode (single paper, single task)
```
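An argparse parser matching the help output above would look roughly like this (a sketch; the actual parser in `evaluate.py` may differ in details):

```python
import argparse

parser = argparse.ArgumentParser(
    prog="evaluate.py",
    description="Evaluate Monolithic vs Ensemble agents for document synthesis",
)
parser.add_argument(
    "--agents-model",
    choices=["ollama", "gemini"],
    default="ollama",
    help="Model to use for agents: 'ollama' (default) or 'gemini'. "
         "Judges always use Gemini.",
)
parser.add_argument(
    "-t", "--test",
    action="store_true",
    help="Run in test mode (single paper, single task)",
)

args = parser.parse_args(["-t", "--agents-model=gemini"])
print(args.agents_model, args.test)  # gemini True
```

Because `choices=["ollama", "gemini"]` is enforced by argparse, an unrecognized provider fails at argument parsing rather than mid-run.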