Summary
The dashboard model selector becomes hard to use when a provider exposes many models or when multiple endpoints are configured.
Grouping models by provider/endpoint, with groups that can be collapsed or expanded, would make the selector easier to scan.
Problem
When BenchLoop detects several providers or endpoints, the model dropdown can become long and difficult to scan.
This is especially noticeable with setups such as:
- Ollama with many local models
- OpenAI-compatible proxy endpoints
- multiple tunneled/local endpoints
- providers that expose many model IDs
- duplicate model names available from different endpoints
In those cases, users need to distinguish not only the model name, but also which endpoint/provider it belongs to.
Suggested UX
Group models by provider/endpoint, for example:
- Ollama (local)
- llama.cpp @ :8088
- vLLM @ :8000
- OpenRouter
- Custom endpoints
Each group could be collapsible. The selected model should still show both:
- model name
- endpoint/provider
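The grouping described above could be sketched roughly as follows. This is only an illustration: `ModelEntry`, `ModelGroup`, `groupLabel`, `groupModels`, and `selectedLabel` are hypothetical names, and the fields on `ModelEntry` are assumptions about what the frontend knows for each model, not BenchLoop's actual data shape.

```typescript
// Hypothetical shape of one detected model; the real type may differ.
interface ModelEntry {
  name: string;       // model ID as reported by the endpoint
  provider: string;   // e.g. "Ollama", "vLLM", "OpenRouter"
  endpoint?: string;  // e.g. "local", ":8000"; omitted for hosted providers
}

// One collapsible section in the selector.
interface ModelGroup {
  label: string;
  collapsed: boolean;
  models: ModelEntry[];
}

// Build the group heading from provider and (optional) endpoint.
function groupLabel(m: ModelEntry): string {
  return m.endpoint ? `${m.provider} @ ${m.endpoint}` : m.provider;
}

// Bucket models by provider/endpoint, preserving first-seen group order.
function groupModels(models: ModelEntry[]): ModelGroup[] {
  const byLabel: Record<string, ModelGroup> = {};
  const ordered: ModelGroup[] = [];
  for (const m of models) {
    const label = groupLabel(m);
    let group = byLabel[label];
    if (!group) {
      // Assumption: groups start collapsed; the default is a UX decision.
      group = { label, collapsed: true, models: [] };
      byLabel[label] = group;
      ordered.push(group);
    }
    group.models.push(m);
  }
  return ordered;
}

// Text for the selected model: always shows model name and provider/endpoint.
function selectedLabel(m: ModelEntry): string {
  return `${m.name} (${groupLabel(m)})`;
}
```

With this shape, two models that share a name but come from different endpoints land in different groups, and the selected-model label keeps them distinguishable even when every group is collapsed.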
Why this helps
- Reduces visual noise in the benchmark model selector.
- Makes duplicate model names across endpoints easier to distinguish.
- Helps users avoid accidentally benchmarking the wrong endpoint.
- Scales better for users with many local and remote models.
Notes
This should be a frontend UX change only. It should not affect benchmark scoring or provider behavior.