[FEATURE] ModelRouter: per-rule and per-backend latency timeout fields

## Feature Description

Add `timeout` (`*metav1.Duration`) to `ModelRouter.spec.rules[]` and
`ModelRouter.spec.backends[]`. At dispatch time the proxy applies
the timeout via a per-request `context.WithTimeout` with the
resolution order `rule.timeout || backend.timeout || proxy default`.

## Problem Statement

> As a platform operator with rules of mixed latency profiles in a
> single ModelRouter, I want each rule and backend to have its own
> timeout budget, so PII rules can fail-fast while long reasoning
> rules can be patient.

Today there is exactly one proxy-side timeout
(`ResponseHeaderTimeout`, currently 30s). This collapses three
operationally distinct profiles into one:

- **Strict / fail-fast rules** (sensitive data, health, sync agent
  steps): want sub-10s caps. Hung backend should fail fast so the
  caller falls over to its own retry.
- **Normal interactive chat** (default route): 30-60s.
- **Deliberately slow reasoning** (`taskComplexity: complex`,
  Anthropic Claude thinking, DeepSeek-V4 reasoning): 120s+.

Per-backend overrides matter too: in-cluster Gemma 12B at ~30tok/s
and a remote Anthropic API with global LB in front have different
P99 envelopes for the same prompt.

## Proposed Solution

### Per-rule

```yaml
rules:
  - name: pii-stays-local
    match: { dataClassification: [pii] }
    route: { backends: [local-chat] }
    failClosed: true
    timeout: 8s            # NEW

  - name: complex-to-chat
    match: { taskComplexity: complex }
    route:
      backends: [local-chat, local-coder]
      strategy: primary-fallback
    timeout: 120s          # NEW
```

`spec.rules[].timeout` (`*metav1.Duration`, optional). Overrides
the proxy default for any dispatch matched by this rule. Applied
via per-request `context.WithTimeout`, not by mutating the shared
transport.

### Per-backend

```yaml
backends:
  - name: cloud-anthropic
    external:
      provider: anthropic
      model: claude-opus-4-7
    tier: cloud
    timeout: 180s          # NEW
```

`spec.backends[].timeout` (`*metav1.Duration`, optional).

### Resolution order

`rule.timeout` wins over `backend.timeout` wins over the proxy
default (set in #457).

### Status surface

`status.rules[].observedTimeout` reflects what the proxy is
actually using after resolution. Helps catch CRD-to-runtime drift
via `kubectl describe`.

## Alternatives Considered

- **One global flag (`--response-header-timeout`), no per-rule**:
  shipped in #457. Sufficient for single-purpose routers; falls
  apart on mixed-workload routers where PII and long reasoning
  share a config.
- **Per-namespace timeout**: too coarse; doesn't help when one
  ModelRouter mixes routes.
- **Header-driven timeout (`x-llmkube-timeout`)**: leaks operator
  policy to application teams. Wrong layer.

## Additional Context

- Related issues: #457 (global default knob — necessary precursor),
  #459 (keep-alive pool health — addresses the same symptom from a
  different angle).
- Mentioned in epic #437 as 'latency SLO' but not wired in v1alpha1.
- Surfaced by `hack/demo-modelrouter.sh`: the PII rule and the
  complex-reasoning rule both target Gemma but want very different
  timeouts. PII shouldn't wait 30s for a hung backend before fail-
  closing; complex reasoning shouldn't 503 on a healthy 60s call.

## Priority

- [x] High - Would significantly improve my workflow
- Mixed-workload ModelRouters are the common enterprise case;
  single-timeout is awkward.

## Willingness to Contribute

- [x] Yes, I can submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] ModelRouter: per-rule and per-backend latency timeout fields #458

Feature Description

Problem Statement

Proposed Solution

Per-rule

Per-backend

Resolution order

Status surface

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEATURE] ModelRouter: per-rule and per-backend latency timeout fields #458

Description

Feature Description

Problem Statement

Proposed Solution

Per-rule

Per-backend

Resolution order

Status surface

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions