Commit 9f838e9

author
Mateusz
committed
fix: add first-request override for weighted composite routing
Support [first] annotations in weighted composite selectors so one branch can be pinned for the first request in a session before normal weighted routing resumes. Persist per-session consumption state, add parser/selector/coordinator/resolver coverage, and document the behavior in routing docs and changelog.

Made-with: Cursor
1 parent 4197594 commit 9f838e9

15 files changed

Lines changed: 842 additions & 25 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,7 @@ All notable changes to this project will be documented in this file.
1010

1111
### Added
1212

13+
- **Weighted Routing `[first]` Override**: Added a `[first]` annotation for weighted composite selectors (`^`) that forces the tagged backend/model for the very first request of a session, bypassing the weighted random draw. Subsequent requests use normal weighted routing. Accepted forms: `[first]`, `[first=1]`, `[first=yes]`, `[first=true]`. At most one branch may be tagged; negative forms (`[first=false]`, etc.) are rejected. The weight on the `[first]`-tagged branch does not affect the first request. A session flag (`weighted_first_request_consumed`) is persisted after the first request is routed, and retry paths ignore the `[first]` tag. See [Routing Selectors](docs/development_guide/routing-selectors.md).
1314
- **Composite Model Routing**: Added ordered failover (`|`) and weighted random (`^`) selector syntax for intelligent backend failover and traffic distribution
1415
- **OpenCode Go Connector**: New hybrid connector for OpenCode Go with dedicated user guide and environment key support
1516
- **Ollama Local Connector**: Connect to locally running Ollama instances with support for both local and cloud model discovery (30-min TTL cache)

docs/development_guide/routing-selectors.md

Lines changed: 30 additions & 0 deletions
@@ -47,6 +47,33 @@ Uses `^` with optional `[weight=N]` prefixes for weighted distribution:
4747

4848
This distributes traffic 75% to OpenAI and 25% to Anthropic.
4949

50+
#### First-Request Override in Weighted Routing
51+
52+
Add a `[first]` annotation to one branch to force that backend/model for the very first request of a session. Subsequent requests use normal weighted routing based on weights.
53+
54+
```
55+
openai-codex:gpt-5.3-codex?reasoning_effort=low^[first]openai-codex:gpt-5.4?reasoning_effort=xhigh
56+
```
57+
58+
On the first request of a session, the `[first]`-tagged branch is selected regardless of weight. From the second request onward, weighted routing applies normally (equal 50/50 split in this example since no `[weight=N]` annotations are present).
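The selection behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Branch`, `pick_branch`, and the `first_consumed` argument are hypothetical names, and the weighted draw simply uses the standard library's `random.Random.choices`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical stand-in for one parsed weighted branch."""

    selector: str
    weight: int = 1
    first: bool = False


def pick_branch(
    branches: list[Branch], first_consumed: bool, rng: random.Random
) -> Branch:
    """Return the [first]-tagged branch on a session's first request, else a weighted pick."""
    if not first_consumed:
        tagged = [b for b in branches if b.first]
        if len(tagged) == 1:
            return tagged[0]
        # No tagged branch: fall through to the ordinary weighted draw.
    # Normal weighted routing: probability is proportional to each weight.
    return rng.choices(branches, weights=[b.weight for b in branches], k=1)[0]


branches = [
    Branch("openai-codex:gpt-5.3-codex?reasoning_effort=low"),
    Branch("openai-codex:gpt-5.4?reasoning_effort=xhigh", first=True),
]
rng = random.Random(0)
first_pick = pick_branch(branches, first_consumed=False, rng=rng)  # pinned branch
later_pick = pick_branch(branches, first_consumed=True, rng=rng)   # weighted draw
```

Here `first_pick` is always the tagged branch, while `later_pick` can be either branch with equal probability because both weights default to 1.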
59+
60+
Accepted annotation forms: `[first]`, `[first=1]`, `[first=yes]`, `[first=true]`. Forms like `[first=false]`, `[first=0]`, `[first=no]` are rejected.
61+
62+
Rules:
63+
- **At most one** branch may be tagged `[first]` per weighted selector. Multiple `[first]` tags cause a validation error.
64+
- The `[first]` tag only affects the first request of a session. A session-level flag (`weighted_first_request_consumed`) is set after the first request is routed, ensuring subsequent requests use weighted selection even if routing fails and retries.
65+
- Retry paths (failover bridge) ignore the `[first]` tag and always use weighted selection among remaining candidates.
66+
- The weight on the `[first]`-tagged branch does not influence the first request; it only applies from the second request onward.
67+
- The `[first]` annotation is only valid within weighted (`^`) selectors. Using it in failover (`|`) selectors is a validation error.
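As a rough sketch of how the accepted forms and the one-tag limit could be checked, consider the snippet below. It is an assumption-laden illustration: `strip_first` and `check_weighted_selector` are hypothetical helpers, the regular expressions are only modeled on the forms listed above, and `[weight=N]` prefixes are deliberately not handled.

```python
import re

# Patterns modeled on the accepted and rejected forms listed above (illustrative only).
FIRST_OK = re.compile(r"^\[first(?:=(?:true|yes|1))?\]", re.IGNORECASE)
FIRST_NEGATIVE = re.compile(r"^\[first=(?:false|no|0)\]", re.IGNORECASE)


def strip_first(branch: str) -> tuple[bool, str]:
    """Return (is_first, remainder) for one branch, rejecting negative forms."""
    if FIRST_NEGATIVE.match(branch):
        raise ValueError(f"negative [first] form is not allowed: {branch!r}")
    match = FIRST_OK.match(branch)
    if match is None:
        return False, branch
    return True, branch[match.end():]


def check_weighted_selector(selector: str) -> list[tuple[bool, str]]:
    """Split on '^' and enforce that at most one branch carries [first]."""
    parsed = [strip_first(part) for part in selector.split("^")]
    if sum(1 for is_first, _ in parsed if is_first) > 1:
        raise ValueError("only one branch may carry [first]")
    return parsed


ok = check_weighted_selector("a:model-x^[first=yes]b:model-y")
```

With this sketch, `ok` holds one untagged and one tagged branch, while selectors such as `[first=false]a:m^b:n` or `[first]a:m^[first]b:n` raise `ValueError`.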
68+
69+
Combining both annotations on the same branch:
70+
71+
```
72+
[first][weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet
73+
```
74+
75+
This uses `gpt-4` for the first request of a session, then applies a 75% / 25% weighted split from the second request onward.
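The 75% / 25% figures come from dividing each weight by the sum of all weights. A small worked example, assuming selection probability is strictly proportional to weight (the variable names here are illustrative):

```python
from fractions import Fraction

# Weights taken from the selector above: [weight=3] and [weight=1].
weights = {"openai:gpt-4": 3, "anthropic:claude-3-5-sonnet": 1}

total = sum(weights.values())  # 3 + 1 = 4
shares = {name: Fraction(w, total) for name, w in weights.items()}
percent = {name: float(share) * 100 for name, share in shares.items()}
```

The resulting shares are 3/4 and 1/4, which is where the 75% and 25% come from.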
76+
5077
### Selector Rules
5178

5279
1. **No mixing operators** - Composite selectors must not mix `|` and `^` in the same selector string. These are rejected during validation.
@@ -73,6 +100,9 @@ python -m src.core.cli --default-backend "openai:gpt-4o|anthropic:claude-3-5-son
73100

74101
# Weighted routing
75102
python -m src.core.cli --default-backend "[weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet"
103+
104+
# Weighted with first-request override
105+
python -m src.core.cli --default-backend "[weight=3]openai:gpt-4^[first][weight=1]anthropic:claude-3-5-sonnet"
76106
```
77107

78108
### Environment Variables

src/core/domain/composite_routing.py

Lines changed: 9 additions & 0 deletions
@@ -107,6 +107,7 @@ class CompositeLeafSelector(DomainModel):
107107
raw_selector: str
108108
normalized_selector: str
109109
weight_annotation: int | None = None
110+
first_annotation: bool = False
110111
uri_params: dict[str, JsonValue] = Field(default_factory=dict)
111112
backend_type: str = ""
112113
model_name: str = ""
@@ -128,6 +129,13 @@ def _validate_weight_annotation(cls, value: int | None) -> int | None:
128129
raise ValueError("weight annotation must be positive")
129130
return value
130131

132+
@field_validator("first_annotation")
133+
@classmethod
134+
def _validate_first_annotation(cls, value: bool) -> bool:
135+
if not isinstance(value, bool):
136+
raise ValueError("first_annotation must be a boolean")
137+
return value
138+
131139

132140
class CompositeLeafNode(DomainModel):
133141
"""Leaf node that can be resolved by existing single-target selector semantics."""
@@ -296,6 +304,7 @@ class CompositeRoutingInput(DomainModel):
296304
configured_max_hops: int | None = None
297305
max_branch_history: int = _DEFAULT_BRANCH_HISTORY_LIMIT
298306
default_backend: str = ""
307+
prefer_first_weighted_branch: bool = False
299308

300309
@field_validator("selector")
301310
@classmethod

src/core/domain/session.py

Lines changed: 10 additions & 0 deletions
@@ -88,6 +88,7 @@ class SessionState(ValueObject):
8888
# (main model, excluding tool-result continuations and replacement-model turns).
8989
quality_verifier_eligible_turn_count: int = 0
9090
auto_append_first_prompt_applied: bool = False
91+
weighted_first_request_consumed: bool = False
9192

9293
def with_backend_config(self, backend_config: BackendConfiguration) -> SessionState:
9394
"""Create a new session state with updated backend config."""
@@ -172,6 +173,10 @@ def with_auto_append_first_prompt_applied(self, applied: bool) -> SessionState:
172173
"""Create a new session state with updated auto-append-first-prompt flag."""
173174
return self.model_copy(update={"auto_append_first_prompt_applied": applied})
174175

176+
def with_weighted_first_request_consumed(self, consumed: bool) -> SessionState:
177+
"""Create a new session state with updated weighted_first_request_consumed flag."""
178+
return self.model_copy(update={"weighted_first_request_consumed": consumed})
179+
175180
def with_multiple_updates(self, **updates: Any) -> SessionState:
176181
"""Create a new session state with multiple field updates in a single model_copy operation.
177182
@@ -341,6 +346,11 @@ def auto_append_first_prompt_applied(self) -> bool:
341346
"""Whether the per-session first user-message append has already run."""
342347
return bool(getattr(self._state, "auto_append_first_prompt_applied", False))
343348

349+
@property
350+
def weighted_first_request_consumed(self) -> bool:
351+
"""Whether the weighted routing [first] request has been consumed."""
352+
return bool(getattr(self._state, "weighted_first_request_consumed", False))
353+
344354
@property
345355
def planning_phase_turn_count(self) -> int:
346356
"""Number of turns completed in planning phase."""

src/core/services/backend_model_resolver.py

Lines changed: 77 additions & 5 deletions
@@ -23,6 +23,7 @@
2323
parse_model_with_params,
2424
)
2525
from src.core.domain.request_context import RequestContext
26+
from src.core.domain.session import SessionState
2627
from src.core.interfaces.backend_lifecycle_manager_interface import (
2728
IBackendLifecycleManager,
2829
)
@@ -142,6 +143,18 @@ async def resolve_target(
142143
should_route_via_composite = True
143144

144145
if should_route_via_composite:
146+
session, prefer_first_weighted_branch = (
147+
await self._resolve_weighted_first_session_state(
148+
request=request,
149+
context=context,
150+
)
151+
)
152+
should_consume_weighted_first_flag = bool(
153+
session is not None
154+
and not bool(
155+
getattr(session.state, "weighted_first_request_consumed", False)
156+
)
157+
)
145158
composite_input = CompositeRoutingInput(
146159
selector=selector_for_routing,
147160
surface=routing_surface,
@@ -150,12 +163,27 @@ async def resolve_target(
150163
),
151164
configured_max_hops=self._resolve_max_hops_from_config(),
152165
default_backend=self._resolve_default_backend(),
166+
prefer_first_weighted_branch=prefer_first_weighted_branch,
153167
)
154-
return await self._composite_routing_service.resolve_target(
155-
routing_input=composite_input,
156-
request=request,
157-
context=context,
158-
)
168+
try:
169+
return await self._composite_routing_service.resolve_target(
170+
routing_input=composite_input,
171+
request=request,
172+
context=context,
173+
)
174+
finally:
175+
if should_consume_weighted_first_flag:
176+
session_for_update = cast(Any, session)
177+
base_state = self._session_state_as_session_state(
178+
session_for_update.state
179+
)
180+
if base_state is not None:
181+
session_for_update.update_state(
182+
base_state.with_weighted_first_request_consumed(True)
183+
)
184+
await self._session_service.update_session(
185+
session_for_update
186+
)
159187

160188
# Extract session ID and fetch session
161189
session_id = (
@@ -451,6 +479,50 @@ def _build_default_composite_routing_service(self) -> CompositeRoutingService:
451479
diagnostics_publisher=diagnostics_publisher,
452480
)
453481

482+
@staticmethod
483+
def _session_state_as_session_state(state_obj: Any) -> SessionState | None:
484+
if isinstance(state_obj, SessionState):
485+
return state_obj
486+
487+
to_dict_fn = getattr(state_obj, "to_dict", None)
488+
if callable(to_dict_fn):
489+
raw_state = to_dict_fn()
490+
if isinstance(raw_state, dict):
491+
try:
492+
return cast(SessionState, SessionState.from_dict(raw_state))
493+
except Exception:
494+
return None
495+
return None
496+
497+
async def _resolve_weighted_first_session_state(
498+
self,
499+
*,
500+
request: ChatRequest,
501+
context: RequestContext | None,
502+
) -> tuple[Any | None, bool]:
503+
session_id: str | None = None
504+
if context is not None and isinstance(context.session_id, str):
505+
candidate = context.session_id.strip()
506+
if candidate:
507+
session_id = candidate
508+
if session_id is None and isinstance(request.extra_body, dict):
509+
extra_session_id = request.extra_body.get("session_id")
510+
if isinstance(extra_session_id, str):
511+
candidate = extra_session_id.strip()
512+
if candidate:
513+
session_id = candidate
514+
515+
if session_id is None:
516+
return None, True
517+
518+
session = cast(Any, await self._session_service.get_session(session_id))
519+
if session is None:
520+
return None, True
521+
522+
return session, not bool(
523+
getattr(session.state, "weighted_first_request_consumed", False)
524+
)
525+
454526
def _resolve_default_backend(self) -> str:
455527
app_config = cast(AppConfig, self._config)
456528
if hasattr(app_config, "backends"):

src/core/services/composite_routing_coordinator.py

Lines changed: 3 additions & 1 deletion
@@ -103,7 +103,9 @@ async def execute(
103103
)
104104
return resolved
105105
if isinstance(root, CompositeWeightedGroupNode):
106-
selected_leaf = self._weighted_branch_selector.select(root)
106+
selected_leaf = self._weighted_branch_selector.select(
107+
root, prefer_first=routing_input.prefer_first_weighted_branch
108+
)
107109
resolved = await self._resolve_leaf(
108110
request=request,
109111
context=context,

src/core/services/composite_selector_parser.py

Lines changed: 74 additions & 5 deletions
@@ -25,6 +25,10 @@
2525
__all__ = ["CompositeSelectorParser"]
2626

2727
_WEIGHT_PREFIX_PATTERN = re.compile(r"^\[weight=([^\]]+)\](.*)$")
28+
_FIRST_PREFIX_PATTERN = re.compile(r"^\[first(?:=(true|yes|1))?\](.*)$", re.IGNORECASE)
29+
_FIRST_NEGATIVE_PREFIX_PATTERN = re.compile(
30+
r"^\[first=(false|no|0)\](.*)$", re.IGNORECASE
31+
)
2832
_INTEGER_PATTERN = re.compile(r"^[+-]?\d+$")
2933

3034

@@ -103,6 +107,13 @@ def parse(self, routing_input: CompositeRoutingInput) -> CompositeRoutePlan:
103107
)
104108
for segment in parts
105109
]
110+
first_count = sum(1 for leaf in leaves if leaf.leaf.first_annotation)
111+
if first_count > 1:
112+
self._raise_validation_error(
113+
code=CompositeValidationErrorCode.UNSUPPORTED_CONSTRUCT,
114+
selector=routing_input.selector,
115+
message="Only one branch can have a [first] annotation in a weighted group.",
116+
)
106117
normalized = "^".join(leaf.normalized_leaf_for_plan for leaf in leaves)
107118
return CompositeRoutePlan(
108119
source_selector=routing_input.selector,
@@ -182,25 +193,51 @@ def _parse_leaf(
182193
)
183194

184195
weight_annotation: int | None = None
196+
first_annotation: bool = False
185197
normalized_leaf_selector = raw_leaf_text
186198
normalized_leaf_for_plan = raw_leaf_text
187199

188200
if is_weighted_group:
189-
weight_annotation, normalized_leaf_selector = self._extract_weight_prefix(
190-
raw_leaf_text,
201+
remaining = raw_leaf_text
202+
weight_annotation = None
203+
first_annotation = False
204+
205+
# Annotations may appear in either order ([first][weight=N] or [weight=N][first]); try [first] before [weight=N] here.
206+
first_annotation, remaining = self._extract_first_prefix(
207+
remaining,
191208
source_selector=routing_input.selector,
192209
)
210+
weight_annotation, remaining = self._extract_weight_prefix(
211+
remaining,
212+
source_selector=routing_input.selector,
213+
)
214+
215+
# If first pass didn't find [first], try after extracting [weight]
216+
if not first_annotation:
217+
first_annotation, remaining = self._extract_first_prefix(
218+
remaining,
219+
source_selector=routing_input.selector,
220+
)
221+
222+
normalized_leaf_selector = remaining
193223
if weight_annotation is None:
194224
weight_annotation = 1
195-
normalized_leaf_for_plan = (
196-
f"[weight={weight_annotation}]{normalized_leaf_selector}"
197-
)
225+
prefix_parts = ""
226+
prefix_parts += f"[weight={weight_annotation}]"
227+
prefix_parts += "[first]" if first_annotation else ""
228+
normalized_leaf_for_plan = f"{prefix_parts}{normalized_leaf_selector}"
198229
elif raw_leaf_text.startswith("[weight="):
199230
self._raise_validation_error(
200231
code=CompositeValidationErrorCode.UNSUPPORTED_CONSTRUCT,
201232
selector=routing_input.selector,
202233
message="Weight annotations are only supported for weighted ('^') selectors.",
203234
)
235+
elif raw_leaf_text.lower().startswith("[first"):
236+
self._raise_validation_error(
237+
code=CompositeValidationErrorCode.UNSUPPORTED_CONSTRUCT,
238+
selector=routing_input.selector,
239+
message="First annotations are only supported for weighted ('^') selectors.",
240+
)
204241

205242
parsed_leaf = parse_model_with_params(
206243
normalized_leaf_selector,
@@ -242,6 +279,7 @@ def _parse_leaf(
242279
raw_selector=raw_leaf_text,
243280
normalized_selector=normalized_leaf_selector,
244281
weight_annotation=weight_annotation if is_weighted_group else None,
282+
first_annotation=first_annotation if is_weighted_group else False,
245283
uri_params=parsed_leaf.uri_params,
246284
backend_type=parsed_leaf.backend_type,
247285
model_name=parsed_leaf.model_name,
@@ -296,6 +334,37 @@ def _extract_weight_prefix(
296334

297335
return weight_value, trailing_selector.strip()
298336

337+
def _extract_first_prefix(
338+
self,
339+
leaf_text: str,
340+
*,
341+
source_selector: str,
342+
) -> tuple[bool, str]:
343+
# Reject negative forms like [first=false], [first=0], [first=no]
344+
neg_match = _FIRST_NEGATIVE_PREFIX_PATTERN.match(leaf_text)
345+
if neg_match:
346+
self._raise_validation_error(
347+
code=CompositeValidationErrorCode.UNSUPPORTED_CONSTRUCT,
348+
selector=source_selector,
349+
message=f"Unsupported [first] annotation '{neg_match.group(0)}'. Use [first] without negation.",
350+
)
351+
352+
match = _FIRST_PREFIX_PATTERN.match(leaf_text)
353+
if not match:
354+
return False, leaf_text
355+
356+
trailing_selector = match.group(2) or ""
357+
if not trailing_selector or trailing_selector[0].isspace():
358+
self._raise_validation_error(
359+
code=CompositeValidationErrorCode.UNSUPPORTED_CONSTRUCT,
360+
selector=source_selector,
361+
message=(
362+
"[first] must appear immediately before a selector without whitespace."
363+
),
364+
)
365+
366+
return True, trailing_selector.strip()
367+
299368
@staticmethod
300369
def _raise_validation_error(
301370
*,

src/core/services/weighted_branch_selector.py

Lines changed: 21 additions & 1 deletion
@@ -60,10 +60,30 @@ def select_index_from_weights(self, weights: Sequence[int]) -> int:
6060

6161
return len(normalized_weights) - 1
6262

63-
def select(self, weighted_node: CompositeWeightedGroupNode) -> CompositeLeafNode:
63+
def select(
64+
self,
65+
weighted_node: CompositeWeightedGroupNode,
66+
*,
67+
prefer_first: bool = False,
68+
) -> CompositeLeafNode:
6469
if not weighted_node.children:
6570
raise ValueError("Weighted node must contain at least one branch.")
6671

72+
if prefer_first:
73+
first_branches = [
74+
child
75+
for child in weighted_node.children
76+
if child.leaf_selector.first_annotation
77+
]
78+
if len(first_branches) == 1:
79+
return first_branches[0]
80+
if len(first_branches) > 1:
81+
raise ValueError(
82+
"multiple branches annotated with [first]; "
83+
"exactly one is required when prefer_first is enabled"
84+
)
85+
# No first_annotation found — fall through to weighted selection
86+
6787
normalized_weights: list[int] = []
6888
for branch in weighted_node.children:
6989
weight = branch.leaf_selector.weight_annotation
