model routing: cost/latency ranking with ranked fallback list by adilhafeez · Pull Request #849 · katanemo/plano

adilhafeez · 2026-03-27T00:36:17Z

Summary

Top-level routing_preferences (v0.4.0+) with candidate model list and selection_policy
/routing/v1/* returns ranked models[] array; client uses models[0], falls back on 429/5xx
selection_policy.prefer: cheapest, fastest, none (random was removed later in this PR, in 5b86964)
model_metrics_sources: cost_metrics, prometheus_metrics, digitalocean_pricing (public DigitalOcean catalog with model_aliases)
Startup errors for missing metric sources; startup and request-time warnings for models with no matching metric data
Dropped legacy per-provider routing format
Demo updated to v0.4.0 with docker-compose (Prometheus + mock latency server)
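On the client side, the ranked list becomes a simple loop: try models[0] and move to the next entry on 429 or 5xx. Below is a minimal, transport-agnostic sketch. `call_with_fallback` and the `send` callback are hypothetical names. Treating other 4xx responses as final (no fallback) is an assumption; the PR does not specify it.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical transport: takes a model id, returns (HTTP status, response body).
Send = Callable[[str], Tuple[int, str]]


def is_retryable(status: int) -> bool:
    """429 (rate limited) and any 5xx trigger a fallback to the next model."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(models: Sequence[str], send: Send) -> Tuple[str, int, str]:
    """Try models in ranked order (models[0] first) until one gives a final answer.

    Returns (model, status, body) for the first non-retryable response.
    Raises RuntimeError if every model returned 429/5xx.
    """
    if not models:
        raise ValueError("routing returned an empty models[] list")
    last_status = None
    for model in models:
        status, body = send(model)
        if not is_retryable(status):
            # Success, or an error (e.g. 400) that another model is assumed not to fix.
            return model, status, body
        last_status = status
    raise RuntimeError(f"all {len(models)} models failed; last status {last_status}")


if __name__ == "__main__":
    ranked = ["groq/llama-3.3-70b-versatile", "openai/gpt-4o"]  # example models[] list
    responses = {
        "groq/llama-3.3-70b-versatile": (429, "rate limited"),
        "openai/gpt-4o": (200, "ok"),
    }
    print(call_with_fallback(ranked, lambda m: responses[m]))
```

Running it prints ('openai/gpt-4o', 200, 'ok'): the first model is rate limited, so the second one serves the request.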

fixes #848

- MetricsSource::DigitalOceanPricing variant: fetch public DO Gen-AI pricing, normalize as lowercase(creator)/model_id, cost = input + output per million
- cost_metrics endpoint format updated to { "model": { "input_per_million": X, "output_per_million": Y } }
- Startup errors: prefer:cheapest requires cost source, prefer:fastest requires prometheus
- Startup warning: models with no pricing/latency data ranked last
- One-per-type enforcement: digitalocean_pricing; error if cost_metrics + digitalocean_pricing both configured
- cost_snapshot() / latency_snapshot() on ModelMetricsService for startup checks
- Demo config updated to v0.4.0 top-level routing_preferences with cheapest + fastest policies
- docker-compose.yaml + prometheus.yaml + metrics_server.py for demo latency metrics
- Schema and docs updated
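The startup checks and ranking rules in this commit message can be sketched in a few functions. This is a minimal sketch, not Plano's Rust implementation. The function names, the plain dict/list config shape, and the direction of the model_aliases mapping (normalized DO key to configured model name) are assumptions. Applying the DO formula (input + output per million) to cost_metrics entries is also assumed. The rules themselves come from the message: cheapest needs a cost source, fastest needs Prometheus, cost_metrics and digitalocean_pricing are mutually exclusive, and models without data are ranked last.

```python
import logging
from typing import Dict, List, Sequence

log = logging.getLogger("routing_sketch")

# Source type names as they appear under model_metrics_sources.
COST_SOURCES = {"cost_metrics", "digitalocean_pricing"}


def validate_sources(prefer: str, sources: Sequence[str]) -> None:
    """Startup checks from the commit message; raises ValueError on bad config."""
    configured = set(sources)
    # Assumed reading of "one-per-type enforcement": no source type listed twice.
    if len(configured) != len(sources):
        raise ValueError("each metrics source type may be configured only once")
    if COST_SOURCES <= configured:
        raise ValueError("configure cost_metrics or digitalocean_pricing, not both")
    if prefer == "cheapest" and not (configured & COST_SOURCES):
        raise ValueError("prefer: cheapest requires a cost source")
    if prefer == "fastest" and "prometheus_metrics" not in configured:
        raise ValueError("prefer: fastest requires prometheus_metrics")


def cost_per_million(entry: Dict[str, float]) -> float:
    """Total price from a cost_metrics-style entry: input + output per million tokens."""
    return entry["input_per_million"] + entry["output_per_million"]


def normalize_do_key(creator: str, model_id: str, aliases: Dict[str, str]) -> str:
    """Build lowercase(creator)/model_id, then apply the alias map (direction assumed)."""
    key = f"{creator.lower()}/{model_id}"
    return aliases.get(key, key)


def rank(candidates: Sequence[str], metric: Dict[str, float]) -> List[str]:
    """Order candidates by ascending metric (cost or latency); no-data models go last."""
    known = [m for m in candidates if m in metric]
    missing = [m for m in candidates if m not in metric]
    for m in missing:
        log.warning("no metric data for %s; ranking it last", m)
    # sorted() is stable, so ties keep their configured order; missing ones do too.
    return sorted(known, key=lambda m: metric[m]) + missing
```

For example, rank(["x/a", "x/c", "x/b"], {"x/a": 4.0, "x/b": 1.0}) returns ["x/b", "x/a", "x/c"] and logs a warning for x/c, mirroring the "ranked last" behaviour rather than dropping the model.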

alex-paperspace

lgtm

PR #851 introduced duplicate openai/gpt-4o entries and set use_agent_orchestrator: true with multiple endpoints. Fixed by using groq/llama-3.3-70b-versatile for the routing_preferences example and setting use_agent_orchestrator: false.

adilhafeez added 7 commits March 26, 2026 17:35

add top-level routing_preferences with selection_policy and model metrics fetch

2ef938a

cargo fmt

b12bf74

redesign model_metrics_sources, drop legacy per-provider routing, return ranked model list

76b1f37

fix DO pricing URL, model_providers name validation, cost_metrics demo endpoint

bd335cd

add model_aliases to digitalocean_pricing, use model_id as key, warn on missing data at request time

a7903d9

docs: note per-request warning for models with no metric data

3af94d3

adilhafeez changed the title ~~add top-level routing_preferences with selection_policy and model metrics fetch~~ model routing: cost/latency ranking with ranked fallback list Mar 28, 2026

fix pre-commit: black format metrics_server.py, remove trailing newline in config.yaml

41e6b48

alex-paperspace approved these changes Mar 30, 2026

adilhafeez added 5 commits March 30, 2026 12:33

remove random selection policy — consumers can shuffle client-side

5b86964

Plano should only handle ranking that requires server-side data (cost metrics, latency). Random shuffling is trivial for callers.
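A caller that still wants random spreading can shuffle the returned models[] list itself, as the commit suggests. A minimal sketch: `shuffled` is a hypothetical helper, and the optional seeded random.Random is only there to make tests reproducible.

```python
import random
from typing import List, Optional, Sequence


def shuffled(models: Sequence[str], rng: Optional[random.Random] = None) -> List[str]:
    """Return a shuffled copy of models[]; the input sequence is left unchanged."""
    out = list(models)
    # Use the caller's generator if given, else the module-level one.
    (rng or random).shuffle(out)
    return out
```

The shuffled copy can then be fed to the same fallback loop a client already uses for ranked lists.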

Merge remote-tracking branch 'origin/main' into adil/top-level-routing-preferences

87343e1

use uv run in validate script for local dev

d96a2b3

Falls back to bare python when uv is not available (CI).

add day time unit for ratelimits

21d4806

PR #851 added ratelimit examples using unit: day but the Rust TimeUnit enum only had second/minute/hour. Adds Day variant and maps it to per-hour quota (tokens/24).
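The day mapping is simple arithmetic, but its rounding matters. The sketch below assumes truncating integer division, which Rust integer division would give if the quota is stored as an integer; the PR text only says tokens/24. `per_hour_quota` is a hypothetical name, and only the hour and day cases described here are covered.

```python
def per_hour_quota(tokens: int, unit: str) -> int:
    """Map a ratelimit to an hourly quota (sketch; only 'hour' and 'day' handled).

    Truncating division for 'day' is an assumption, not confirmed by the PR.
    """
    if unit == "hour":
        return tokens
    if unit == "day":
        return tokens // 24
    raise ValueError(f"unit {unit!r} is not covered by this sketch")
```

Under that assumption, a limit of 1000 tokens per day becomes 41 tokens per hour, so at most 984 tokens can be spent across a full day, and no single hour can use more than 1/24 of the daily budget.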

adilhafeez merged commit e5751d6 into main Mar 30, 2026
36 of 37 checks passed

adilhafeez merged 13 commits into main from adil/top-level-routing-preferences