CLI Usage Guide for Evaluation Script

Overview

The evaluate.py script now supports switching between Ollama and Gemini for agent models via a CLI argument, while judges always use Gemini for consistency.

Command-Line Arguments

`--agents-model {ollama,gemini}`

Select the model provider for agents (MonolithicAgent and EnsembleAgent):

- `ollama` (default): Uses local Ollama models
  - MonolithicAgent: Uses OLLAMA_MODEL from .env (default: qwen2.5:7b)
  - EnsembleAgent: Uses CREWAI_MODEL from .env (default: openai/qwen2.5:7b)
  - Judges: Always use Gemini (JUDGE_MODEL from .env)
- `gemini`: Uses Google Gemini API for all agents
  - MonolithicAgent: gemini-2.5-flash-lite
  - EnsembleAgent: gemini/gemini-2.5-flash-lite
  - Judges: Always use Gemini (JUDGE_MODEL from .env)
  - Requires: GEMINI_API_KEY environment variable must be set
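
The provider-to-model mapping described above can be sketched roughly as follows. This is an illustration only, not the code in evaluate.py: the function name `resolve_agent_models` and its return shape are hypothetical, while the variable names and default values come from the list above.

```python
import os


def resolve_agent_models(provider, env=None):
    """Return the model identifier each agent would use for a provider.

    Hypothetical helper: the name and the dict layout are assumptions made
    for illustration; only the env variable names and defaults are from
    this guide.
    """
    env = os.environ if env is None else env
    if provider == "ollama":
        return {
            # Fall back to the documented defaults when .env does not set them.
            "monolithic": env.get("OLLAMA_MODEL", "qwen2.5:7b"),
            "ensemble": env.get("CREWAI_MODEL", "openai/qwen2.5:7b"),
        }
    if provider == "gemini":
        return {
            "monolithic": "gemini-2.5-flash-lite",
            "ensemble": "gemini/gemini-2.5-flash-lite",
        }
    raise ValueError(f"unknown provider: {provider!r}")
```

Passing an explicit `env` mapping instead of reading `os.environ` directly keeps the lookup easy to exercise without touching the real environment.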

`-t, --test`

Run in test mode (single paper, single task) for quick testing.
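
For reference, a parser that accepts these two arguments could look like the sketch below. It is reconstructed from the help output shown in the Help section and may differ from the actual code in evaluate.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="evaluate.py",
        description="Evaluate Monolithic vs Ensemble agents for document synthesis",
    )
    parser.add_argument(
        "--agents-model",
        choices=["ollama", "gemini"],  # any other value is rejected by argparse
        default="ollama",
        help="Model to use for agents: 'ollama' (default) or 'gemini'. "
             "Judges always use Gemini.",
    )
    parser.add_argument(
        "-t", "--test",
        action="store_true",  # flag is False unless passed
        help="Run in test mode (single paper, single task)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.agents_model, args.test)
```

Because `choices` is set, an unsupported provider name fails at argument parsing rather than later in the run.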

Usage Examples

Standard Evaluation with Ollama (default)

python evaluate.py
# or explicitly:
python evaluate.py --agents-model=ollama

MLflow Experiments Created:

document_synthesis_monolithic
document_synthesis_ensemble

Standard Evaluation with Gemini

python evaluate.py --agents-model=gemini

MLflow Experiments Created:

document_synthesis_monolithic_gemini
document_synthesis_ensemble_gemini

Test Mode with Ollama

python evaluate.py --test
# or:
python evaluate.py -t --agents-model=ollama

Test Mode with Gemini

python evaluate.py -t --agents-model=gemini

MLflow Experiment Naming

When you run with Gemini agents, the script automatically appends a `_gemini` suffix to each experiment name:

| Agent Type | Ollama Experiment Name | Gemini Experiment Name |
| --- | --- | --- |
| Monolithic | `document_synthesis_monolithic` | `document_synthesis_monolithic_gemini` |
| Ensemble | `document_synthesis_ensemble` | `document_synthesis_ensemble_gemini` |

This allows you to:

- Compare Ollama vs Gemini performance side-by-side in the MLflow UI
- Track experiments separately by model provider
- Avoid mixing results from different model providers
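
The naming rule is simple enough to express in a few lines. The sketch below is an assumption about how such a rule might be written, not the script's actual implementation; `experiment_name` and `BASE_NAME` are hypothetical names.

```python
BASE_NAME = "document_synthesis"  # common prefix seen in the experiment names


def experiment_name(agent_type, agents_model):
    """Return the MLflow experiment name for an agent type and provider.

    Hypothetical helper: Ollama runs keep the plain name, Gemini runs get
    a "_gemini" suffix.
    """
    suffix = "_gemini" if agents_model == "gemini" else ""
    return f"{BASE_NAME}_{agent_type}{suffix}"


for agent in ("monolithic", "ensemble"):
    print(experiment_name(agent, "ollama"), experiment_name(agent, "gemini"))
```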

Environment Variables

Required for Ollama (when `--agents-model=ollama`)

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b
OLLAMA_NUM_CTX=32768
CREWAI_MODEL=openai/qwen2.5:7b

Required for Gemini (when `--agents-model=gemini`)

GEMINI_API_KEY=your_api_key_here

Always Required (Judges)

JUDGE_MODEL=gemini:/gemini-2.5-flash-lite

Error Handling

If you run with `--agents-model=gemini` without setting GEMINI_API_KEY, the script will:

- Display a clear error message
- Prompt you to set the API key in your .env file
- Exit with code 1

Example error:

ERROR: GEMINI_API_KEY environment variable is not set.
Please set GEMINI_API_KEY in your .env file to use --agents-model=gemini
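
A pre-flight check with this behavior might look like the following sketch. The function name `require_gemini_key` and the choice to print to stderr are assumptions; only the message text and the exit code come from this guide.

```python
import os
import sys


def require_gemini_key(agents_model, env=None):
    """Exit with code 1 if Gemini agents are requested without an API key.

    Hypothetical helper for illustration; evaluate.py may perform this
    check differently.
    """
    env = os.environ if env is None else env
    if agents_model == "gemini" and not env.get("GEMINI_API_KEY"):
        # Writing to stderr is an assumption; the guide only shows the text.
        print("ERROR: GEMINI_API_KEY environment variable is not set.", file=sys.stderr)
        print(
            "Please set GEMINI_API_KEY in your .env file to use --agents-model=gemini",
            file=sys.stderr,
        )
        sys.exit(1)
```

Running the check right after argument parsing means a missing key is reported before any papers are loaded or models are called.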

Help

View all available options:

python evaluate.py --help

Output:

usage: evaluate.py [-h] [--agents-model {ollama,gemini}] [-t]

Evaluate Monolithic vs Ensemble agents for document synthesis

options:
  -h, --help            show this help message and exit
  --agents-model {ollama,gemini}
                        Model to use for agents: 'ollama' (default) or 'gemini'. 
                        Judges always use Gemini.
  -t, --test            Run in test mode (single paper, single task)
