Add configurable default model support #161

Merged
Stephen Belanger (Qard) merged 1 commit into main from model-flexibility on Jan 13, 2026

@Qard
Contributor

This change allows users to configure which model to use as the default for all evaluations, replacing the hardcoded gpt-4o default.

Changes:

  • Add defaultModel parameter to init() in both JS and Python
  • Add getDefaultModel() function to retrieve configured default model
  • Update LLMClassifier and RAGAS scorers to use configurable default model
  • Update documentation with examples for different use cases

This enables:

  • Using different OpenAI models (gpt-4-turbo, o1, gpt-3.5-turbo, etc.)
  • Using non-OpenAI models via Braintrust proxy (Claude, Gemini, Llama, etc.)
  • Configuring once and having all evaluators use the preferred model
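The resolution order described in the lists above can be sketched as follows. This is a minimal illustration under stated assumptions, not the autoevals source: the module-level variable, the `FALLBACK_MODEL` constant, and the `resolveModel` helper are hypothetical names, and the rule that an explicit per-scorer model takes precedence over the configured default is an assumption.

```javascript
// Hypothetical sketch of a configurable default model; not the autoevals source.
const FALLBACK_MODEL = "gpt-4o"; // the previously hardcoded default named in this PR

let configuredDefaultModel; // stays undefined until init() receives a defaultModel

// Store the default model supplied at initialization (cleared when omitted).
function init({ defaultModel } = {}) {
  configuredDefaultModel = defaultModel;
}

// Return the configured default, falling back to gpt-4o when none is set.
function getDefaultModel() {
  return configuredDefaultModel ?? FALLBACK_MODEL;
}

// Hypothetical helper: an explicit per-scorer model wins over the default (assumed).
function resolveModel(scorerOptions = {}) {
  return scorerOptions.model ?? getDefaultModel();
}

init({ defaultModel: "claude-3-5-sonnet-20241022" });
console.log(resolveModel()); // claude-3-5-sonnet-20241022
console.log(resolveModel({ model: "gpt-4-turbo" })); // gpt-4-turbo
```

Keeping the fallback inside `getDefaultModel()` means callers that never pass `defaultModel` keep the earlier gpt-4o behavior.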

Example usage:

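```javascript
import OpenAI from "openai";
import { init } from "autoevals";

// Route requests through the Braintrust proxy and set the default model once.
init({
  client: new OpenAI({
    apiKey: process.env.BRAINTRUST_API_KEY,
    baseURL: "https://api.braintrust.dev/v1/proxy",
  }),
  defaultModel: "claude-3-5-sonnet-20241022",
});
```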

Fixes #136

@github-actions

github-actions Bot commented Jan 12, 2026

Braintrust eval report

Autoevals (model-flexibility-1768324125)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 73.4% (+1pp) | 3 🟢 | 1 🔴 |
| Time_to_first_token | 1.33tok (-0.06tok) | 85 🟢 | 33 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 3.14s (-0.42s) | 140 🟢 | 79 🔴 |
| Llm_duration | 2.58s (-0.22s) | 106 🟢 | 13 🔴 |

@Qard Stephen Belanger (Qard) force-pushed the model-flexibility branch 3 times, most recently from 48320a5 to 91f19f8, on January 13, 2026 00:00
Collaborator

@ibolmo Olmo Maldonado (ibolmo) left a comment

LGTM!

This change allows users to configure which model to use as the default
for all evaluations, replacing the hardcoded gpt-4o default.

Changes:
- Add `defaultModel` parameter to `init()` in both JS and Python
- Add `getDefaultModel()` function to retrieve configured default model
- Update LLMClassifier and RAGAS scorers to use configurable default model
- Update documentation with examples for different use cases

This enables:
- Using different OpenAI models (gpt-4-turbo, o1, gpt-3.5-turbo, etc.)
- Using non-OpenAI models via Braintrust proxy (Claude, Gemini, Llama, etc.)
- Configuring once and having all evaluators use the preferred model

Example usage:
```javascript
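import OpenAI from "openai";
import { init } from "autoevals";

// Route requests through the Braintrust proxy and set the default model once.
init({
  client: new OpenAI({
    apiKey: process.env.BRAINTRUST_API_KEY,
    baseURL: "https://api.braintrust.dev/v1/proxy",
  }),
  defaultModel: "claude-3-5-sonnet-20241022",
});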
```

Fixes #136

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@Qard Stephen Belanger (Qard) merged commit 1ff945d into main Jan 13, 2026
7 checks passed
@Qard Stephen Belanger (Qard) deleted the model-flexibility branch January 13, 2026 17:10
@github-actions

github-actions Bot commented Jan 13, 2026

Braintrust eval report

Autoevals (main-1768324249)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 72.5% (-1pp) | 1 🟢 | 3 🔴 |
| Time_to_first_token | 1.34tok (+0.01tok) | 50 🟢 | 68 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 2.86s (-0.28s) | 105 🟢 | 111 🔴 |
| Llm_duration | 2.72s (+0.14s) | 30 🟢 | 89 🔴 |

Development

Successfully merging this pull request may close these issues.

How to use an Anthropic model for evals without unsetting OPENAI_API_KEY?