feat(client): add dynamo_chat transport + routed_experts to renderer generate by biswapanda · Pull Request #79 · PrimeIntellect-ai/renderers

biswapanda · 2026-06-09T00:19:26Z

Description

Adds a dynamo_chat transport to the renderer-based generate() client so it can run against NVIDIA Dynamo, which serves no /inference/v1/generate route. Selected per-call via transport=; defaults to the existing vLLM path, so behavior is unchanged unless opted in.

Two transports:

vllm_generate (default): unchanged — messages → render_ids() → POST /inference/v1/generate → parse_response() (vLLM TITO surface).
dynamo_chat: messages → render_ids() → POST /v1/chat/completions with nvext.token_data (pre-tokenized prompt) + nvext.extra_fields=["engine_data"]. Completion token IDs and logprobs are read back from nvext.engine_data.

Dynamo wire shape (`_post_dynamo_chat`)

Mirrors the verifiers token client so the payload is identical whether a rollout goes through the token client or the renderer client. nvext.token_data (Dynamo skips tokenization when present); cache_salt → nvext.cache_salt, priority → nvext.agent_hints.priority; a single placeholder user message; sampling remap (max_tokens → max_completion_tokens, logprobs=N → logprobs=true + top_logprobs=N); passthrough fields ride the Dynamo allowlist. Tools are baked into token_data by the renderer (not sent on the wire).

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

(Supersedes the earlier "routed_experts intentionally NOT surfaced" note — it now is.) parse reads routed_experts from nvext.routed_experts (or nvext.engine_data.routed_experts) and maps it to the downstream RoutedExpertsPayload {data, shape, start, dtype}. The Dynamo worker returns full-sequence routing with start=0; the renderer row-trims the leading prompt rows only when the caller explicitly sets routed_experts_prompt_start — a first-turn request with no caller start stays full-sequence with start=0 (no phantom prefix). Completion logprobs prefer nvext.engine_data.completion_logprobs (the same authoritative source as the engine token IDs) over the chat echo; a present-but-empty engine list is authoritative and does not fall back to chat.

Other

Public RendererTransport = Literal["vllm_generate", "dynamo_chat"] alias. A present-but-empty completion_token_ids is a valid zero-token completion; only a fully absent field raises. Multimodal renderers raise NotImplementedError on dynamo_chat (vLLM path / token-client TITO remain available for VLMs).

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (F1/F2/F3 + the N1 logprob-presence finding resolved; head 5f2a914). All review threads resolved.

Testing

tests/test_client.py covers the Dynamo request body shape (priority/detokenize/sampling remap), routed_experts parse + row-trim (explicit prompt_start vs first-turn full-sequence), engine-logprob preference incl. present-but-empty, and missing/empty completion IDs.

Note

Medium Risk
New Dynamo wire/parse path affects RL-critical completion IDs, logprobs, and MoE routed_experts; strict runtime errors and no Dynamo multimodal are new failure modes for opted-in rollouts.

Overview
Adds a per-call transport parameter to generate() ("vllm" default, "dynamo" opt-in). The existing vLLM TITO flow is moved into _VllmGenerateTransport; behavior stays the same when transport is omitted.

Dynamo uses _DynamoChatTransport: pre-tokenized prompts go to POST /v1/chat/completions via nvext.token_data, with cache_salt, priority, and routed_experts_prompt_start mapped into nvext and vLLM-only sampling keys dropped. Responses read nvext.engine_data for completion IDs and logprobs (not chat echo), normalize routed_experts, keep large blobs as zero-copy memoryview, and optionally client-trim prompt rows when an older worker returns full-sequence routing.

Multimodal on Dynamo raises NotImplementedError; missing engine completion IDs or logprob length mismatches raise RuntimeError. Tests cover wire shape, nvext merge, and parse edge cases.

^{Reviewed by Cursor Bugbot for commit 57846ec. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add `dynamo` transport and routed-experts support to `generate()` in the renderer client

Adds a transport parameter to generate() in renderers/client.py, defaulting to 'vllm' (existing /inference/v1/generate path); passing 'dynamo' routes to OpenAI-compatible /v1/chat/completions with NVIDIA Dynamo nvext fields.
Introduces a _Transport ABC with _VllmGenerateTransport and _DynamoChatTransport implementations; each handles body construction, POST, and response normalization into a common _WireResult.
The Dynamo transport maps sampling params (dropping vLLM-only keys), moves cache_salt/priority into nvext, and prefers engine_data fields over chat-echo fields when parsing responses.
Adds client-side trimming of base64-encoded routed_experts via _trim_dynamo_routed_experts when routed_experts_prompt_start is set and the worker has not already trimmed.
Risk: generate() with transport='dynamo' raises RuntimeError on missing completion_token_ids or logprob/token-ID length mismatches, where the vLLM path does not.

^{Macroscope summarized 57846ec.}

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

…d, drop routed_experts on dynamo (codex round 2)

…ake); docstring fix

…ached endpoints

…erge nvext, canonical completion-ids, logprobs alignment)

… on dynamo path

…payload to contract

…only

…fset

…ing only)

…first-turn stays full)

…ith engine ids

… chat fallback)

…trim is now a back-compat fallback

…omments

…oid event-loop json.loads)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit fec0a81. Configure here.}

…comments

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

biswapanda mentioned this pull request Jun 9, 2026

feat: dynamo inference backend integration PrimeIntellect-ai/prime-rl#2737

Open

1 task

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer generate()~~ feat(client): add dynamo_chat_nvext transport to renderer Jun 9, 2026

biswapanda added 2 commits June 8, 2026 19:11

feat(client): add transport selector + dynamo_chat_nvext branch

334e496

feat: forward Dynamo nvext TITO fields

6a21574

biswapanda force-pushed the rl-sdk-4 branch from 268e16b to 6a21574 Compare June 9, 2026 02:13

fix(client): address codex review — revert default vLLM path, drop to…

a35e023

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

biswapanda mentioned this pull request Jun 9, 2026

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) PrimeIntellect-ai/verifiers#1574

Open

1 task

biswapanda added 2 commits June 9, 2026 01:21

fix(client): gate nvext fallbacks to dynamo path, fix zero-token guar…

b6f50d0

…d, drop routed_experts on dynamo (codex round 2)

test(client): prove routed_experts dropped on dynamo (Dynamo-shaped f…

5dbf494

…ake); docstring fix

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer~~ feat(client): add dynamo_chat transport to renderer generate() Jun 9, 2026

biswapanda added 4 commits June 9, 2026 15:31

style: apply ruff format to client + tests (fix CI)

287871c

docs(client): trim verbose comments in dynamo_chat path

6041134

refactor(client): replace transport if/else with strategy classes + c…

503846c

…ached endpoints

fix(client): address codex F1-F4 on dynamo_chat (denylist sampling, m…

ed03eaa

…erge nvext, canonical completion-ids, logprobs alignment)

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate()~~ feat(client): add dynamo_chat transport to renderer generate Jun 10, 2026

biswapanda added 2 commits June 9, 2026 20:05

fix(client): route sampling_params cache_salt and priority into nvext…

eb0bdb2

… on dynamo path

feat(client): surface routed_experts on dynamo_chat transport

30c01b6

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

biswapanda added 8 commits June 10, 2026 11:30

fix(client): drop duplicate routed_experts request; normalize parsed …

28e3d02

…payload to contract

test(client): update dynamo extra_fields expectations to engine_data …

c31854f

…only

fix(client): stamp routed_experts.start on dynamo_chat from prompt of…

59553da

…fset

test(client): expect stamped routed_experts.start on dynamo_chat

b7927f8

fix(client): trim dynamo_chat routed_experts rows to start (was stamp…

b554520

…ing only)

fix(client): only trim routed_experts when caller sets prompt_start (…

010c894

…first-turn stays full)

fix(client): prefer engine_data.completion_logprobs to stay aligned w…

51d9154

…ith engine ids

fix(client): treat present-empty engine logprobs as authoritative (no…

5f2a914

… chat fallback)

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate~~ feat(client): add dynamo_chat transport + routed_experts to renderer generate Jun 10, 2026

feat(client): send routed_experts_prompt_start in nvext; client-side …

7567377

…trim is now a back-compat fallback

biswapanda mentioned this pull request Jun 11, 2026

feat(RL): forward routed_experts_prompt_start via nvext ai-dynamo/dynamo#10562

Open

3 tasks

biswapanda added 2 commits June 10, 2026 17:30

docs(client): drop PR-number references and stale vLLM version from c…

f5c480d

…omments

perf(client): zero-copy routed_experts on dynamo_chat (strip blob, av…

b62aabf

…oid event-loop json.loads)

biswapanda mentioned this pull request Jun 11, 2026

feat(client): add Dynamo inference backend PrimeIntellect-ai/prime-rl#2773

Open

biswapanda added 2 commits June 12, 2026 02:20

fix(dynamo): require aligned renderer logprobs

fe93638

chore(client): rename renderer transport values

fec0a81

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread renderers/client.py

fix(client): fall back to engine routed experts

b9d25b1

AmeenP reviewed Jun 18, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

AmeenP reviewed Jun 18, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

AmeenP reviewed Jun 18, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

fix(client): hard-fail on unknown routed_experts dtype; trim verbose …

57846ec

…comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79
biswapanda wants to merge 26 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

biswapanda commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Dynamo wire shape (_post_dynamo_chat)

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

Other

Type of Change

Review

Testing

Add dynamo transport and routed-experts support to generate() in the renderer client

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Dynamo wire shape (`_post_dynamo_chat`)

Add `dynamo` transport and routed-experts support to `generate()` in the renderer client