feat(mtmd): add Qwen2VL SMT patch preprocessing by LFF28 · Pull Request #9 · spacemit-com/llama.cpp

LFF28 · 2026-06-18T01:32:58Z

feat(mtmd): add Qwen2VL SMT patch preprocessing

Overview

This PR adds a Qwen2VL-specific SMT vision preprocessing path for ONNX vision encoders exported from model.visual.

Qwen2VL model.visual does not consume a normal CHW image tensor. It expects the processor output tensor shaped as:

[grid_h * grid_w, 3 * temporal_patch_size * 14 * 14]

The SMT preprocessor now detects qwen2vl / qwen2_vl architectures and converts resized RGB input into the flattened patch tensor expected by the exported ONNX model. The path applies the configured rescale and normalization values, then lays out patches in Qwen2VL merge order instead of returning a CHW tensor.
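The merge-ordered layout can be sketched as follows. This is a minimal illustration, not the code from the PR: `PatchParams` and `flatten_patches_merge_order` are hypothetical names, and the defaults `temporal_patch_size = 2` and `merge_size = 2` are assumptions based on the usual Qwen2VL processor configuration. It also assumes a single still image whose frame is repeated across the temporal axis, and an input that is already normalized CHW float data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; the defaults are assumptions, not values read from the PR.
struct PatchParams {
    int patch_size = 14;
    int temporal_patch_size = 2;
    int merge_size = 2;
};

// Converts a normalized CHW float image (3 channels) into a row-major
// [grid_h * grid_w, 3 * temporal_patch_size * patch_size * patch_size] buffer.
// Rows are ordered so that the merge_size x merge_size patches of one merge
// block are adjacent; columns are ordered (channel, temporal, patch_y, patch_x).
std::vector<float> flatten_patches_merge_order(const std::vector<float>& chw,
                                               int height, int width,
                                               const PatchParams& p = PatchParams()) {
    const int channels = 3;
    assert(chw.size() == static_cast<size_t>(channels) * height * width);
    assert(height % (p.patch_size * p.merge_size) == 0);
    assert(width % (p.patch_size * p.merge_size) == 0);

    const int grid_h = height / p.patch_size;
    const int grid_w = width / p.patch_size;
    const size_t row_len = static_cast<size_t>(channels) * p.temporal_patch_size *
                           p.patch_size * p.patch_size;
    std::vector<float> out(static_cast<size_t>(grid_h) * grid_w * row_len);

    size_t row = 0;
    for (int hb = 0; hb < grid_h / p.merge_size; ++hb) {          // merge block row
        for (int wb = 0; wb < grid_w / p.merge_size; ++wb) {      // merge block column
            for (int mh = 0; mh < p.merge_size; ++mh) {           // patch row inside block
                for (int mw = 0; mw < p.merge_size; ++mw) {       // patch column inside block
                    const int py = (hb * p.merge_size + mh) * p.patch_size;
                    const int px = (wb * p.merge_size + mw) * p.patch_size;
                    float* dst = out.data() + row * row_len;
                    size_t col = 0;
                    for (int c = 0; c < channels; ++c) {
                        // A still image has one frame, so each temporal slot repeats it.
                        for (int t = 0; t < p.temporal_patch_size; ++t) {
                            for (int ph = 0; ph < p.patch_size; ++ph) {
                                for (int pw = 0; pw < p.patch_size; ++pw) {
                                    dst[col++] = chw[(static_cast<size_t>(c) * height + (py + ph)) * width + (px + pw)];
                                }
                            }
                        }
                    }
                    ++row;
                }
            }
        }
    }
    return out;
}
```

With these defaults, a 28 x 28 input yields 4 rows of 1176 floats each, and the four rows belong to the same 2 x 2 merge block.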

This enables the MinerU2.5 split deployment path:

text side:   GGUF model loaded by llama.cpp
vision side: Qwen2VL visual ONNX loaded through the SMT vision backend

Additional information

Companion deployment/export project:

https://gitlab.dc.com:8443/liangjunzhao/mineru2.5-split

Model artifacts have been uploaded externally and are not included in this PR:

https://archive.spacemit.com/spacemit-ai/model_zoo/vlm/

Validation context:

Model: MinerU2.5-Pro-2605-1.2B
Text model: mineru2.5-text-Q4_1.gguf
Vision models:
- mineru2.5-vision-224.f16.onnx
- mineru2.5-vision-504.f16.onnx
- shared external data: mineru2.5-vision-shared.f16.data
Current deployment config uses the 504 static vision graph.
224 and 504 ONNX graphs share the same external initializer data; the graph files differ by static input/output shapes.
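To make the shape difference between the two static graphs concrete, the sketch below computes the patch tensor shape from the image size. It assumes square 224 x 224 and 504 x 504 inputs (inferred from the file names, not stated in the PR) and `temporal_patch_size = 2`; `PatchTensorShape` and `qwen2vl_patch_tensor_shape` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type holding the [rows, cols] shape of the patch tensor.
struct PatchTensorShape {
    int64_t rows;  // grid_h * grid_w
    int64_t cols;  // 3 * temporal_patch_size * patch_size * patch_size
};

// Computes the flattened patch tensor shape for an image of the given size.
// temporal_patch_size = 2 is an assumption; patch_size = 14 matches the
// 14 * 14 factor in the shape given in the PR description.
inline PatchTensorShape qwen2vl_patch_tensor_shape(int height, int width,
                                                   int temporal_patch_size = 2,
                                                   int patch_size = 14) {
    assert(height % patch_size == 0 && width % patch_size == 0);
    const int64_t grid_h = height / patch_size;
    const int64_t grid_w = width / patch_size;
    const int64_t cols = static_cast<int64_t>(3) * temporal_patch_size * patch_size * patch_size;
    return PatchTensorShape{grid_h * grid_w, cols};
}
```

Under these assumptions the 224 graph would take a [256, 1176] input and the 504 graph a [1296, 1176] input.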

The 1008 static vision export was attempted but could not complete on the available local machine configuration. The 504 static graph was exported and validated on the target device.

Requirements

- git diff --check
- cmake --build build --target llama-server -j 8
- Verified that tools/mtmd/smt-vision-preprocess.cpp compiles as part of llama-server
- Verified the MinerU2.5 504 deployment starts with SMT media backend and /health returns {"status":"ok"}
- Verified input.pdf inference through the split GGUF + ONNX deployment on the target device
- Model binaries and exported ONNX/GGUF/data files are archived externally, not committed to source

Add a Qwen2VL preprocessing path that converts resized RGB images into the flattened patch tensor expected by model.visual ONNX exports. Detect qwen2vl/qwen2_vl architectures in the SMT vision preprocessor and route them through rescale, normalization, and merge-ordered patch flattening instead of CHW image tensors.
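The rescale and normalization step that precedes patch flattening might look like the sketch below. `NormalizeConfig` and `rgb_u8_to_normalized_chw` are hypothetical names, and the input is assumed to be interleaved 8-bit RGB (HWC); the actual values of the rescale factor, mean, and standard deviation come from the model's preprocessing configuration and are not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical config holder; real values are read from the preprocessing config.
struct NormalizeConfig {
    float rescale_factor;         // e.g. maps 0..255 into 0..1
    std::array<float, 3> mean;    // per-channel mean
    std::array<float, 3> stddev;  // per-channel standard deviation
};

// Converts interleaved RGB bytes (HWC) into planar float data (CHW), applying
// value = (pixel * rescale_factor - mean[c]) / stddev[c] per channel.
std::vector<float> rgb_u8_to_normalized_chw(const std::vector<uint8_t>& rgb,
                                            int height, int width,
                                            const NormalizeConfig& cfg) {
    const size_t plane = static_cast<size_t>(height) * width;
    assert(rgb.size() == plane * 3);

    std::vector<float> chw(plane * 3);
    for (size_t i = 0; i < plane; ++i) {
        for (size_t c = 0; c < 3; ++c) {
            const float rescaled = static_cast<float>(rgb[i * 3 + c]) * cfg.rescale_factor;
            chw[c * plane + i] = (rescaled - cfg.mean[c]) / cfg.stddev[c];
        }
    }
    return chw;
}
```

The CHW buffer produced here is the kind of input a patch-flattening step would then rearrange into the flattened patch tensor.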

* spec: support MTP
* fix batch size
* rename files
* cont : simplify (spacemit-com#7)
* MTP: clean-up (spacemit-com#9)
* MTP: clean-up
* review: use llama_context_type instead of llama_graph_type
* review: remove llama_model_has_mtp
* review: fix convert issues
* convert: fix pycheck
* review: formatting
* use `mtp-` for identifying mtp models
* convert: fix mtp conversion
* mtp -> draft-mtp
* remove unused llama_arch
* add need_embd in speculative
* llama: allow partial seq_rm for GDN models for speculative decoding

  Currently, speculative checkpointing needs to restart from a checkpoint after some draft tokens are not accepted; this wastes work by running the target again. This PR adds the ability to roll back up to `draft_max` by storing the GDN intermediates.
* fix pending state
* vulkan: add GDN partial rollback
* meta: extend check to axis 1
* metal: add GDN partial rollback

  Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend.
  - Add K (snapshot slot count) as a function constant
  - Read input state from slot 0 of the 3D state tensor
  - Write intermediate states to different slots during token loop
  - For K=1, maintain backward-compatible single-slot behavior

  Ref: ggml-org@8c05923
  Assisted-by: llama.cpp:local pi
* delta_net_base: use ggml_pad instead of new_tensor
* review: add need_rs_seq
* review: rename part_bounded to n_rs
* review: deslop comments
* review: rename, add asserts
* server : adjust checkpoint logic (spacemit-com#11)
* server : adjust checkpoint logic
* cont : rm asserts
* server-context: fix early exit
* spec : fix compatibility with n-gram and add TODOs (ggml-org#13)
* metal : cleanup
* llama : fix faulty bitwise check in recurrent memory
* server : disable RS-based MTP in combination with other spec types
* spec : add TODOs
* cont : fix comment
* cont : update comment
* common : fix logic for ngram + mtp compat
* llama-memory: enable checkpointing with partial rollback
* cont: add test-case for loading into a dirty ctx
* llama-memory-recurrent: clear rs_idx in clear
* download: fix mtp path
* llama-arch: fix enorm op
* docs: update docs
* conversion: fix type annotations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Copilot

Pull request overview

Adds a Qwen2VL-specific vision preprocessing path to the SMT (ONNX) vision pipeline so that Qwen2VL model.visual exports can consume the flattened, merged patch tensor layout they expect (instead of a CHW image tensor). This supports split deployments where llama.cpp runs the text GGUF while SMT runs the Qwen2VL vision ONNX.

Changes:

- Extend SMT vision preprocessing spec resolution to detect qwen2vl / qwen2_vl architectures and enable a new patch-flatten mode.
- Implement RGB->normalized CHW conversion + Qwen2VL patch flattening in Qwen2VL merge order, producing [grid_h * grid_w, 3 * temporal_patch_size * 14 * 14]-style packed float data.
- Route preprocessing output selection through the new qwen2vl_patch_flatten flag and reuse a single resolved preprocess_config.
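The architecture detection and routing described in this list could be sketched roughly as below. Only the `qwen2vl_patch_flatten` flag name and the two architecture strings come from the PR; `PreprocessSpec`, `is_qwen2vl_arch`, and `resolve_spec` are hypothetical, and the case-insensitive exact match is an assumption about how the comparison is done.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical resolved spec; only the flag name is taken from the PR.
struct PreprocessSpec {
    bool qwen2vl_patch_flatten = false;  // true: emit flattened patches instead of CHW
};

// Returns true for the two architecture spellings named in the PR.
// Lowercasing before comparison is an assumption made for this sketch.
inline bool is_qwen2vl_arch(std::string arch) {
    std::transform(arch.begin(), arch.end(), arch.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return arch == "qwen2vl" || arch == "qwen2_vl";
}

// Resolves the spec once so later stages only need to check the flag.
inline PreprocessSpec resolve_spec(const std::string& arch) {
    PreprocessSpec spec;
    spec.qwen2vl_patch_flatten = is_qwen2vl_arch(arch);
    return spec;
}
```

Resolving the flag once and branching on it at the output stage keeps the existing CHW path unchanged for every other architecture.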


github-actions Bot added the examples label Jun 18, 2026

alex-spacemit requested a review from Copilot June 22, 2026 06:40

Copilot started reviewing on behalf of alex-spacemit June 22, 2026 06:40

Copilot AI reviewed Jun 22, 2026

alex-spacemit merged commit 2415c9a into spacemit-com:spacemit-mtmd Jun 22, 2026
10 of 11 checks passed

alex-spacemit merged 1 commit into spacemit-com:spacemit-mtmd from LFF28:mineru2.5

3 participants
