
Fix ContextRelevancy score #159

Merged
Stephen Belanger (Qard) merged 1 commit into main from fix-context-relevancy-scoring
Jan 13, 2026

@Qard
Contributor

Fixes #80

@github-actions

github-actions Bot commented Jan 12, 2026

Braintrust eval report

Autoevals (fix-context-relevancy-scoring-1768258918)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 73.8% (+2pp) | 7 🟢 | 1 🔴 |
| Time_to_first_token | 1.48tok (+0.1tok) | 16 🟢 | 102 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 1.5s (-1.27s) | 109 🟢 | 110 🔴 |
| Llm_duration | 3.04s (+0.17s) | 14 🟢 | 105 🔴 |

@github-actions

Braintrust eval report

Autoevals (fix-context-relevancy-scoring-1768256188)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 73.8% (+0pp) | 2 🟢 | - |
| Time_to_first_token | 1.49tok (+0.1tok) | 15 🟢 | 103 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (0$) | 1 🟢 | - |
| Duration | 1.5s (+0.04s) | 60 🟢 | 159 🔴 |
| Llm_duration | 3.07s (+0.17s) | 10 🟢 | 109 🔴 |

Stephen Belanger (Qard) force-pushed the fix-context-relevancy-scoring branch from 18fe998 to 1c6bf5b on January 12, 2026 22:18
Stephen Belanger (Qard) merged commit 0e5793b into main on Jan 13, 2026
7 checks passed
Stephen Belanger (Qard) deleted the fix-context-relevancy-scoring branch on January 13, 2026 18:20
@github-actions

github-actions Bot commented Jan 13, 2026

Braintrust eval report

Autoevals (main-1768328420)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 72.9% (0pp) | 3 🟢 | 2 🔴 |
| Time_to_first_token | 1.32tok (-0.08tok) | 97 🟢 | 22 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 3.22s (+0.2s) | 142 🟢 | 77 🔴 |
| Llm_duration | 2.72s (-0.19s) | 106 🟢 | 13 🔴 |


Development

Successfully merging this pull request may close these issues.

Context Relevancy issue with score not between 0 and 1.
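
The diff itself is not shown on this page, so the following is only a sketch of the invariant the linked issue title describes (a score that must stay between 0 and 1), not the change that was merged. It assumes a ratio-style scorer; `clamp_score` and `length_ratio_score` are hypothetical names chosen for illustration and are not taken from the Autoevals codebase.

```python
import math


def clamp_score(raw: float) -> float:
    """Clamp a raw score into the closed interval [0, 1]."""
    if math.isnan(raw):
        raise ValueError("score is NaN")
    return max(0.0, min(1.0, raw))


def length_ratio_score(extracted: str, context: str) -> float:
    """Hypothetical scorer: length of extracted text relative to the context.

    Assumption: without clamping, this ratio can exceed 1 whenever the
    extracted text is longer than the context it was drawn from.
    """
    if not context:
        return 0.0
    return clamp_score(len(extracted) / len(context))
```

Clamping is one way to enforce the range; whether the merged fix clamps, renormalizes, or changes how the ratio is computed cannot be determined from this page.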
