Comprehensive documentation improvements for scorers and evaluators by Qard · Pull Request #163 · braintrustdata/autoevals

Stephen Belanger (Qard) · 2026-01-13T00:47:53Z

Summary

Comprehensive documentation improvements addressing multiple user requests for better examples and reference materials.

Changes

1. Custom JSON Scorers (#137)

Added detailed JSDoc documentation to TypeScript JSON scorers (JSONDiff, ValidJSON)
Added module-level documentation with practical examples for both TypeScript and Python
Included examples for:
- Composing existing scorers (schema validation + semantic comparison)
- Custom validation logic (API response validation)
Improved consistency between TypeScript and Python documentation
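
The composition pattern listed above can be sketched without depending on the library itself. This is a minimal illustration, assuming a scorer is any callable that returns a number between 0 and 1. The names `valid_json_score`, `key_overlap_score`, and `compose` are hypothetical and not part of autoevals; they stand in for ValidJSON and JSONDiff rather than reproduce them.

```python
import json
from typing import Any, Callable

# Hypothetical scorer type for this sketch: (output, expected) -> score in [0, 1].
Scorer = Callable[[Any, Any], float]


def valid_json_score(output: Any, expected: Any = None) -> float:
    """Return 1.0 if `output` parses as JSON, else 0.0."""
    try:
        json.loads(output)
        return 1.0
    except (TypeError, ValueError):
        return 0.0


def key_overlap_score(output: Any, expected: Any) -> float:
    """Fraction of expected top-level keys whose values match in the output."""
    out, exp = json.loads(output), json.loads(expected)
    if not isinstance(out, dict) or not isinstance(exp, dict):
        return 1.0 if out == exp else 0.0
    if not exp:
        return 1.0 if not out else 0.0
    matches = sum(1 for k, v in exp.items() if k in out and out[k] == v)
    return matches / len(exp)


def compose(gate: Scorer, detail: Scorer) -> Scorer:
    """Run `gate` first; only if it passes fully, return the `detail` score."""

    def scorer(output: Any, expected: Any) -> float:
        if gate(output, expected) < 1.0:
            return 0.0
        return detail(output, expected)

    return scorer


# Custom validation for an API response: must be valid JSON, then compared by key.
api_response_scorer = compose(valid_json_score, key_overlap_score)
```

With these definitions, an output that is not valid JSON scores 0.0 regardless of the expected value, while a valid response that matches one of two expected keys scores 0.5.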

2. Context-Based Evaluators with Braintrust Eval (#82)

Added comprehensive RAGAS module documentation with Braintrust Eval integration
Demonstrated how to pass context through metadata in Eval runs
Included practical examples for both TypeScript and Python showing:
- Direct usage of context-based evaluators
- Integration with Braintrust Eval runs
- Proper extraction of context from dataset metadata
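
The metadata pattern listed above can be illustrated with a small adapter. This is a sketch under assumptions: it presumes that an Eval run calls each scorer with `input`, `output`, `expected`, and `metadata` keyword arguments, and that the context-based evaluator accepts a `context` keyword. The names `with_context_from_metadata` and `toy_context_scorer` are hypothetical, and the toy scorer only stands in for a real RAGAS evaluator.

```python
from typing import Any, Callable, Dict, Optional


def with_context_from_metadata(
    evaluator: Callable[..., float], key: str = "context"
) -> Callable[..., float]:
    """Wrap a context-based evaluator so it reads its context from metadata."""

    def scorer(
        input: Any,
        output: Any,
        expected: Any = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> float:
        context = (metadata or {}).get(key)
        if context is None:
            # Fail loudly instead of silently scoring without context.
            raise ValueError(f"metadata is missing required key {key!r}")
        return evaluator(
            input=input, output=output, expected=expected, context=context
        )

    return scorer


def toy_context_scorer(input: Any, output: str, expected: Any, context: str) -> float:
    """Stand-in evaluator: fraction of output words that appear in the context."""
    words = output.lower().split()
    if not words:
        return 0.0
    context_words = set(context.lower().split())
    return sum(w in context_words for w in words) / len(words)


score_with_context = with_context_from_metadata(toy_context_scorer)
```

Raising on a missing key is a deliberate choice in this sketch: a dataset row without context would otherwise produce a score that looks valid but measures nothing.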

3. Complete Scorer Reference (#101)

Created comprehensive SCORERS.md reference documentation
Documented all 30+ available scorers organized by category:
- LLM-as-a-Judge scorers (Factuality, Battle, ClosedQA, etc.)
- RAG scorers (ContextRelevancy, Faithfulness, AnswerCorrectness, etc.)
- Heuristic scorers (Levenshtein, NumericDiff, etc.)
- JSON scorers (JSONDiff, ValidJSON)
- List scorers (ListContains)
For each scorer, documented:
- All parameters with descriptions
- Score ranges and interpretation
- Practical usage examples
Added score interpretation guidelines
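
To make the idea of a score range concrete, here is a sketch of how a heuristic string scorer can map an edit distance onto the 0 to 1 interval. It assumes the common normalization of one minus the distance divided by the longer string's length. This PR does not state whether the library's Levenshtein scorer uses exactly this formula, and `levenshtein_distance` and `levenshtein_score` are illustrative names.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_score(output: str, expected: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1 - levenshtein_distance(output, expected) / longest
```

Under this normalization a score of 0.0 means the edit distance equals the length of the longer string, and intermediate values shrink as more edits are needed.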

Documentation Files

js/json.ts - Enhanced with module and JSDoc examples
py/autoevals/json.py - Enhanced with module examples
js/ragas.ts - Added comprehensive module documentation
py/autoevals/ragas.py - Added Braintrust Eval integration examples
SCORERS.md - New comprehensive scorer reference (650+ lines)
TODO.md - Updated to track documentation progress

Issues Addressed

Closes #137 - Custom scorer for JSONs
Closes #82 - Better docs for context-based evaluators
Closes #101 - Document supported scores

github-actions · 2026-01-13T00:52:46Z

Braintrust eval report

Autoevals (json-scorer-docs-1768323434)

Score	Average	Improvements	Regressions
NumericDiff	73.7% (-1pp)	8 🟢	9 🔴
Time_to_first_token	1.43tok (0tok)	62 🟢	57 🔴
Llm_calls	1.55 (+0)	-	-
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	279.25tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Completion_tokens	19.3tok (+1.15tok)	33 🟢	44 🔴
Completion_reasoning_tokens	0tok (+0tok)	-	-
Total_tokens	298.54tok (+1.15tok)	33 🟢	44 🔴
Estimated_cost	0$ (+0$)	-	-
Duration	1.44s (-0.1s)	161 🟢	57 🔴
Llm_duration	2.92s (-0.19s)	73 🟢	46 🔴

Addresses multiple documentation requests with practical examples and complete reference materials.

- Custom JSON scorers (#137): Added detailed JSDoc/docstrings with examples showing how to compose scorers (schema validation + semantic comparison) and create custom validators (API response validation)
- Context-based evaluators with Braintrust Eval (#82): Added RAGAS module documentation demonstrating how to pass context through metadata in Eval runs, with practical examples for both TypeScript and Python
- Complete scorer reference (#101): Created comprehensive SCORERS.md documenting all 30+ available scorers with parameters, score ranges, interpretation guidelines, and usage examples organized by category (LLM-as-judge, RAG, heuristic, JSON, list)

Files modified:
- js/json.ts, py/autoevals/json.py: Enhanced with module-level examples
- js/ragas.ts, py/autoevals/ragas.py: Added Eval integration examples
- SCORERS.md: New 650+ line comprehensive reference

github-actions · 2026-01-13T18:20:37Z

Braintrust eval report

Autoevals (main-1768328440)

Score	Average	Improvements	Regressions
NumericDiff	73.4% (+1pp)	3 🟢	2 🔴
Time_to_first_token	1.35tok (+0.03tok)	38 🟢	81 🔴
Llm_calls	1.55 (+0)	-	-
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	279.25tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Completion_tokens	19.3tok (+0tok)	-	-
Completion_reasoning_tokens	0tok (+0tok)	-	-
Total_tokens	298.54tok (+0tok)	-	-
Estimated_cost	0$ (+0$)	-	-
Duration	2.87s (-0.36s)	94 🟢	124 🔴
Llm_duration	2.82s (+0.11s)	32 🟢	86 🔴

Stephen Belanger (Qard) changed the title from "Add comprehensive documentation for custom JSON scorers" to "Comprehensive documentation improvements for scorers and evaluators" on Jan 13, 2026

Stephen Belanger (Qard) force-pushed the json-scorer-docs branch 2 times, most recently from 95ca6c8 to e2f72e5 on January 13, 2026 01:00

Ankur Goyal (ankrgyl) approved these changes Jan 13, 2026

Comment thread SCORERS.md Outdated

Comment thread SCORERS.md Outdated

Stephen Belanger (Qard) force-pushed the json-scorer-docs branch from e2f72e5 to 421fe0e on January 13, 2026 16:56

Matt Perpick (clutchski) approved these changes Jan 13, 2026

Stephen Belanger (Qard) merged commit f4e62ec into main Jan 13, 2026
7 checks passed

Stephen Belanger (Qard) deleted the json-scorer-docs branch January 13, 2026 18:20

Stephen Belanger (Qard) merged 1 commit into main from json-scorer-docs
