[codex] Support raw image offload in v1 train client#1746
Conversation
7556743 to
3f5bb1a
Compare
3f5bb1a to
de37650
Compare
Every image carries its ref, so no cache miss can occur. Removes _generate_with_image_ref_retry / _has_descriptor_only_images / _retryable_mm_error_type / _json_error_type / _RETRYABLE_MM_ERROR_TYPES; rollouts call generate() directly. Obsolete retry tests removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fload # Conflicts: # verifiers/v1/clients/train.py
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.
| result = _offload_image_url(_image_source_url(source), image_dir) | ||
| if result is not None: | ||
| _set_image_source_url(source, result) | ||
| _require_file_image_url(source) |
There was a problem hiding this comment.
Inline image URLs always rejected
High Severity
The v1 train path always requires every image_url to become a file:// run asset after preparation, even when offload leaves a data:image/...;base64,... URL unchanged. That conflicts with the intended inline multimodal storage mode where base64 image URLs stay in the message payload, so inline training rollouts fail at request preparation instead of validating in place.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.
There was a problem hiding this comment.
removed inline mode so this is irrelevant
ApprovabilityVerdict: Needs human review 2 blocking correctness issues found. This PR adds new multimodal image handling capabilities with significant new code and runtime behavior changes. Multiple unresolved review comments identify potential bugs including rejected inline images (high severity) and backwards compatibility concerns for existing saved rollouts. You can customize Macroscope's approvability policy. Learn more. |
| return False | ||
|
|
||
|
|
||
| def _validate_raw_mm_item(item: Any) -> dict[str, Any]: |
There was a problem hiding this comment.
🟡 Medium v1/graph.py:76
_validate_raw_mm_item now unconditionally rejects processed multimodal payloads containing keys like pixel_values, and deserialize_multi_modal_data runs it on every multi_modal_data field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain pixel_values now raises TypeError instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's before path) so old traces can still be loaded.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/v1/graph.py around line 76:
`_validate_raw_mm_item` now unconditionally rejects processed multimodal payloads containing keys like `pixel_values`, and `deserialize_multi_modal_data` runs it on every `multi_modal_data` field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain `pixel_values` now raises `TypeError` instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's `before` path) so old traces can still be loaded.
| if value.get("type") == "image_url": | ||
| source = value.get("image_url") | ||
| if source is not None: | ||
| _prepare_image_source(source, image_dir=image_dir) |
There was a problem hiding this comment.
🟡 Medium utils/multimodal.py:64
prepare_images_inplace skips validation when an image_url part has a missing or None image_url field: lines 65-67 only call _prepare_image_source when source is not None, so the malformed part passes through unchecked. Downstream, ChatDialect.parse_request normalizes it to ImageUrlSource(url=""), forwarding a request with an empty image URL instead of rejecting it. Consider calling _require_file_image_url(value) (or otherwise validating) when source is None so malformed parts are rejected.
| if value.get("type") == "image_url": | |
| source = value.get("image_url") | |
| if source is not None: | |
| _prepare_image_source(source, image_dir=image_dir) | |
| if value.get("type") == "image_url": | |
| source = value.get("image_url") | |
| if source is not None: | |
| _prepare_image_source(source, image_dir=image_dir) | |
| else: | |
| _require_file_image_url(value) |
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/utils/multimodal.py around lines 64-67:
`prepare_images_inplace` skips validation when an `image_url` part has a missing or `None` `image_url` field: lines 65-67 only call `_prepare_image_source` when `source is not None`, so the malformed part passes through unchecked. Downstream, `ChatDialect.parse_request` normalizes it to `ImageUrlSource(url="")`, forwarding a request with an empty image URL instead of rejecting it. Consider calling `_require_file_image_url(value)` (or otherwise validating) when `source` is `None` so malformed parts are rejected.


Design update — inline/offload image storage
This PR now follows the prime-rl multimodal image storage policy:
offload: current behavior, rewrite base64 data images tofile://run assets and require file-backed image URLs.inline: keepdata:image/...;base64,...URLs in the message payload and validate them without rewriting.TrainClientnow calls the policy-aware image preparation helper, so prime-rl can be the single source of truth via environment/config propagation.Validation after latest push:
uv run pytest tests/v1/test_train_client_multimodal.py -qpassed (5 passed). Commit/push hooks also passed (ruff check,ruff format, generated AGENTS/CLAUDE check,ty).Design update — dropped the
None/cache-only image pathThis PR and its companions (prime-rl #2836 / verifiers #1746 / renderers #89) no longer use the "send
Nonefor already-cached images" mechanism. Every image carries its raw descriptor ref at every slot (current and prior turns);/inference/v1/generaterematerializes each ref from disk every request.Why: the
Nonepath coupled correctness to deployment (LRU cache present, single replica / DP-affinity, no eviction) and surfaced a miss as a hard vLLMEngineDeadError(qwen3-vl mrope dereferences aNoneimage_grid_thw) that the retry net couldn't catch across the engine→API IPC. Dropping it is deployment-agnostic (a miss is impossible) and non-hacky. vLLM'smm_hashencoder cache still skips the expensive GPU re-encode for free — we only forgo the cheap IPC/CPU-reprocess dedup.Validated: color-codeword (Qwen3-VL-4B) under DP=2, no affinity / no cache reliance: 0 crashes, 0
data=None, multi-turn accumulation correct, reward ~0.84. Also confirmed under TP.This repo: with every image carrying its ref, no cache miss can occur — removed the retry subsystem (
_generate_with_image_ref_retry,_has_descriptor_only_images,_retryable_mm_error_type,_json_error_type,_RETRYABLE_MM_ERROR_TYPES). Rollouts callrenderers.client.generatedirectly. Obsolete retry tests removed.Original description
Summary
pixel_values,image_embeds, andimage_featuresprime_raw_mm_itemenvelopes instead of descriptor-only Qwen payloadsCompanion PRs
Notes
Validation
ruff check,ruff format, generated AGENTS/CLAUDE check passed.ty (ci parity)passed./home/ubuntu/verifiers,/home/ubuntu/renderers, and/home/ubuntu/prime-rl-v1-raw-mm-offloadcompleted inference, env rollouts, train batch creation, trainer step 0, and decoded strict trainer-bound raw image refs.Note
Medium Risk
Touches the multimodal training hot path (image offload failures halt rollouts at request time) and changes wire/graph multimodal payload shape, which must stay aligned with companion renderers and prime-rl PRs.
Overview
Adds
prepare_images_inplaceso renderer-backed paths rewriteimage_urlparts tofile://run assets (viarenderers.mm_store) before tokenization, and wires that through v0RendererClient.to_native_prompt, v1TrainClient(prepare_request_body/prepare_messages), and the interception server so harness bodies and user-simulator messages match what the trace and trainer see.Multimodal training sidecars move from processed tensors to raw image descriptors (
raw_image_urirequired;pixel_values/ embed keys rejected). Graph serialization no longer numpy-encodesmm_items;mm_placeholdersare attributed per node and merged on branches.PendingTurn.previous_multi_modal_data()feedsbridge_to_next_turnso multi-turn multimodal prompts can bridge (the old multimodal bridge block is removed).Legacy v0 rollouts preserve live trajectory
multi_modal_datawhen mapping to v1 traces. Docs and type comments are updated; eval dashboard reward rows use an explicit loop (no behavior change).Reviewed by Cursor Bugbot for commit 22c7cf4. Bugbot is set up for automated code reviews on this repo. Configure here.