 # evals
 
-Evaluation framework for testing AI agents' ability to write Dart and Flutter code. Built on [Inspect AI](https://inspect.aisi.org.uk/).
+> [!Warning]
+> This repo is _highly unstable_ and the APIs _will_ change.
 
-> [!TIP]
-> Full documentation at [evals-docs.web.app/](https://evals-docs.web.app/)
+Evaluation framework for testing AI agents' ability to write Dart and Flutter code. Built on [Inspect AI](https://inspect.aisi.org.uk/).
 
 ## Overview
 
-evals provides:
+This repo includes
 
-- **Evaluation Runner** — Python package for running LLM evaluations with configurable tasks, variants, and models
-- **Evaluation Configuration** — Dart and Python packages that resolve dataset YAML into EvalSet JSON for the runner
+- **eval runner** — Python package for running LLM evaluations with configurable tasks, variants, and models
+- **config packages** — Dart and Python packages that resolve dataset YAML into EvalSet JSON for the runner
 - **devals CLI** — Dart CLI for creating and managing dataset samples, tasks, and jobs
 - **Evaluation Explorer** — Dart/Flutter app for browsing and analyzing results
-- **Dataset** — Curated samples for Dart/Flutter Q&A, code generation, and debugging tasks
+
+> [!CAUTION]
+> Full documentation at [evals-docs.web.app/](https://evals-docs.web.app/)
 
 ## Packages
 
@@ -45,4 +47,4 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for details, or go directly to the [Contr
 
 ## License
 
-See [LICENSE](LICENSE) for details.
+See [LICENSE](LICENSE) for details.