Skip to content

[codex] Support raw image offload in v1 train client#1746

Open
eligotts wants to merge 11 commits into
mainfrom
codex/v1-raw-image-offload
Open

[codex] Support raw image offload in v1 train client#1746
eligotts wants to merge 11 commits into
mainfrom
codex/v1-raw-image-offload

Conversation

@eligotts

@eligotts eligotts commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Design update — inline/offload image storage

This PR now follows the prime-rl multimodal image storage policy:

  • offload: current behavior, rewrite base64 data images to file:// run assets and require file-backed image URLs.
  • inline: keep data:image/...;base64,... URLs in the message payload and validate them without rewriting.

TrainClient now calls the policy-aware image preparation helper, so prime-rl can be the single source of truth via environment/config propagation.

Validation after latest push: uv run pytest tests/v1/test_train_client_multimodal.py -q passed (5 passed). Commit/push hooks also passed (ruff check, ruff format, generated AGENTS/CLAUDE check, ty).

Design update — dropped the None/cache-only image path

This PR and its companions (prime-rl #2836 / verifiers #1746 / renderers #89) no longer use the "send None for already-cached images" mechanism. Every image carries its raw descriptor ref at every slot (current and prior turns); /inference/v1/generate rematerializes each ref from disk every request.

Why: the None path coupled correctness to deployment (LRU cache present, single replica / DP-affinity, no eviction) and surfaced a miss as a hard vLLM EngineDeadError (qwen3-vl mrope dereferences a None image_grid_thw) that the retry net couldn't catch across the engine→API IPC. Dropping it is deployment-agnostic (a miss is impossible) and non-hacky. vLLM's mm_hash encoder cache still skips the expensive GPU re-encode for free — we only forgo the cheap IPC/CPU-reprocess dedup.

Validated: color-codeword (Qwen3-VL-4B) under DP=2, no affinity / no cache reliance: 0 crashes, 0 data=None, multi-turn accumulation correct, reward ~0.84. Also confirmed under TP.

This repo: with every image carrying its ref, no cache miss can occur — removed the retry subsystem (_generate_with_image_ref_retry, _has_descriptor_only_images, _retryable_mm_error_type, _json_error_type, _RETRYABLE_MM_ERROR_TYPES). Rollouts call renderers.client.generate directly. Obsolete retry tests removed.


Original description

Summary

  • tighten v1 multimodal graph serialization around strict raw descriptor sidecars
  • reject processed multimodal payload keys recursively, including nested pixel_values, image_embeds, and image_features
  • update v1 multimodal tests to use strict prime_raw_mm_item envelopes instead of descriptor-only Qwen payloads
  • keep raw image offload and retry behavior aligned with the companion Renderers and Prime-RL PRs

Companion PRs

Notes

  • Draft/WIP: this depends on the renderer generic raw multimodal ref contract in the companion PR.
  • v1 multimodal sidecars intentionally carry raw descriptors only, not processed image tensors or image-processor payloads.
  • Prime/vLLM materialization happens from raw image refs rather than Verifiers-held processor outputs.

Validation

  • Commit hooks: ruff check, ruff format, generated AGENTS/CLAUDE check passed.
  • Push hook: ty (ci parity) passed.
  • End-to-end hosted-style smoke through Prime-RL with /home/ubuntu/verifiers, /home/ubuntu/renderers, and /home/ubuntu/prime-rl-v1-raw-mm-offload completed inference, env rollouts, train batch creation, trainer step 0, and decoded strict trainer-bound raw image refs.

[!NOTE]

Support raw image offload in the v1 train client for multimodal conversations

  • Adds prepare_images_inplace in multimodal.py that recursively walks message structures, offloads image URLs to run-asset file:// paths, and raises RuntimeError for invalid or non-offloaded URLs.
  • Adds prepare_request_body and prepare_messages hooks to the base Client class; TrainClient overrides both to run image preparation in a background thread for chat-dialect requests.
  • Extends TrainClient.get_response to allow bridge-to-next-turn for multimodal conversations, passing previous_multi_modal_data to multimodal renderers to maintain placeholder alignment.
  • Updates MessageNode serialization in graph.py to reject processed tensor payloads and only accept raw image descriptor sidecars; _attribute_mm now also tracks mm_placeholders.
  • Risk: image URLs that are not offloadable or do not resolve to file:// will raise at request time, breaking any callers that previously passed through non-file image URLs.

Macroscope summarized 22c7cf4.


Note

Medium Risk
Touches the multimodal training hot path (image offload failures halt rollouts at request time) and changes wire/graph multimodal payload shape, which must stay aligned with companion renderers and prime-rl PRs.

Overview
Adds prepare_images_inplace so renderer-backed paths rewrite image_url parts to file:// run assets (via renderers.mm_store) before tokenization, and wires that through v0 RendererClient.to_native_prompt, v1 TrainClient (prepare_request_body / prepare_messages), and the interception server so harness bodies and user-simulator messages match what the trace and trainer see.

Multimodal training sidecars move from processed tensors to raw image descriptors (raw_image_uri required; pixel_values / embed keys rejected). Graph serialization no longer numpy-encodes mm_items; mm_placeholders are attributed per node and merged on branches. PendingTurn.previous_multi_modal_data() feeds bridge_to_next_turn so multi-turn multimodal prompts can bridge (the old multimodal bridge block is removed).

Legacy v0 rollouts preserve live trajectory multi_modal_data when mapping to v1 traces. Docs and type comments are updated; eval dashboard reward rows use an explicit loop (no behavior change).

Reviewed by Cursor Bugbot for commit 22c7cf4. Bugbot is set up for automated code reviews on this repo. Configure here.

@eligotts eligotts force-pushed the codex/v1-raw-image-offload branch from 3f5bb1a to de37650 Compare June 25, 2026 06:40
Comment thread verifiers/v1/clients/train.py
Comment thread verifiers/v1/graph.py Outdated
Comment thread verifiers/v1/clients/train.py Outdated
S1ro1 and others added 3 commits June 27, 2026 00:18
Every image carries its ref, so no cache miss can occur. Removes _generate_with_image_ref_retry / _has_descriptor_only_images / _retryable_mm_error_type / _json_error_type / _RETRYABLE_MM_ERROR_TYPES; rollouts call generate() directly. Obsolete retry tests removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fload

# Conflicts:
#	verifiers/v1/clients/train.py
Comment thread verifiers/v1/utils/multimodal.py Outdated
@eligotts eligotts marked this pull request as ready for review June 29, 2026 16:36

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.

result = _offload_image_url(_image_source_url(source), image_dir)
if result is not None:
_set_image_source_url(source, result)
_require_file_image_url(source)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline image URLs always rejected

High Severity

The v1 train path always requires every image_url to become a file:// run asset after preparation, even when offload leaves a data:image/...;base64,... URL unchanged. That conflicts with the intended inline multimodal storage mode where base64 image URLs stay in the message payload, so inline training rollouts fail at request preparation instead of validating in place.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed inline mode so this is irrelevant

@macroscopeapp

macroscopeapp Bot commented Jun 29, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

2 blocking correctness issues found. This PR adds new multimodal image handling capabilities with significant new code and runtime behavior changes. Multiple unresolved review comments identify potential bugs including rejected inline images (high severity) and backwards compatibility concerns for existing saved rollouts.

You can customize Macroscope's approvability policy. Learn more.

Comment thread verifiers/v1/graph.py
return False


def _validate_raw_mm_item(item: Any) -> dict[str, Any]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium v1/graph.py:76

_validate_raw_mm_item now unconditionally rejects processed multimodal payloads containing keys like pixel_values, and deserialize_multi_modal_data runs it on every multi_modal_data field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain pixel_values now raises TypeError instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's before path) so old traces can still be loaded.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/v1/graph.py around line 76:

`_validate_raw_mm_item` now unconditionally rejects processed multimodal payloads containing keys like `pixel_values`, and `deserialize_multi_modal_data` runs it on every `multi_modal_data` field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain `pixel_values` now raises `TypeError` instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's `before` path) so old traces can still be loaded.

Comment on lines +64 to +67
if value.get("type") == "image_url":
source = value.get("image_url")
if source is not None:
_prepare_image_source(source, image_dir=image_dir)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium utils/multimodal.py:64

prepare_images_inplace skips validation when an image_url part has a missing or None image_url field: lines 65-67 only call _prepare_image_source when source is not None, so the malformed part passes through unchecked. Downstream, ChatDialect.parse_request normalizes it to ImageUrlSource(url=""), forwarding a request with an empty image URL instead of rejecting it. Consider calling _require_file_image_url(value) (or otherwise validating) when source is None so malformed parts are rejected.

Suggested change
if value.get("type") == "image_url":
source = value.get("image_url")
if source is not None:
_prepare_image_source(source, image_dir=image_dir)
if value.get("type") == "image_url":
source = value.get("image_url")
if source is not None:
_prepare_image_source(source, image_dir=image_dir)
else:
_require_file_image_url(value)
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/utils/multimodal.py around lines 64-67:

`prepare_images_inplace` skips validation when an `image_url` part has a missing or `None` `image_url` field: lines 65-67 only call `_prepare_image_source` when `source is not None`, so the malformed part passes through unchecked. Downstream, `ChatDialect.parse_request` normalizes it to `ImageUrlSource(url="")`, forwarding a request with an empty image URL instead of rejecting it. Consider calling `_require_file_image_url(value)` (or otherwise validating) when `source` is `None` so malformed parts are rejected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants