Skip to content

delegate-to-ai: role-alias route table + local-offload criteria #348

@JacobPEvans-personal

Description

@JacobPEvans-personal

delegate-to-ai: role-alias route table + local-offload criteria

Context

The local MLX serving stack (llama-swap on 127.0.0.1:11434, Bifrost gateway at
localhost:30080) now serves a fast, crash-free MoE workhorse (~80 tok/s decode,
stable under 4-way concurrency, measured 2026-06-09 on the M4 Max). The
delegate-to-ai skill's route table still lists stale physical model ids
(Qwen3-235B-A22B, Qwen3.5-122B, Qwen3.5-27B) that are no longer registered.

Changes

  1. Route table → role aliases. Replace physical model ids with the llama-swap role
    aliases, which survive serving-side model swaps:

    • mlx-local/default — general summarize/extract/classify workhorse
    • mlx-local/quickest — shortest-latency small model
    • mlx-local/coding — code-leaning tasks
    • Paths: Bifrost http://localhost:30080/v1/chat/completions, or PAL chat with
      mlx-local/default.
  2. Add a "Local offload criteria" section, e.g.:

    Route to the local model when ALL hold:

    • task is summarization / extraction / classification / boilerplate / bulk
      mechanical transforms,
    • input fits ~30K chars,
    • no multi-step reasoning or correctness-critical judgment required.

    Everything else stays on cloud models.

  3. Keep cloud routing rules unchanged.

Verification

curl -s http://localhost:30080/v1/chat/completions -d '{"model":"mlx-local/default",...}'
returns 200 with a completion in a few seconds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions