Add text-only mode support for Qwen 3.5 model builder by apsonawane · Pull Request #2157 · microsoft/onnxruntime-genai

apsonawane · 2026-05-12T21:47:45Z

Description

Adds support for running Qwen 3.5 as a standalone text-only LLM (without vision/embedding pipelines).

Changes

2D position_ids support: When exclude_embeds=false, the builder creates a 2D [B, S] position_ids graph input and internally expands it to 3D [3, B, S] for mRoPE compatibility. This allows the standard onnxruntime-genai runtime to provide position_ids without requiring the multimodal pipeline.
Tokenizer regex fix: Added a save_processing override that patches the unsupported \p{M} (Unicode Mark category) class out of the tokenizer regex patterns after export. The C++ std::regex engine in onnxruntime-extensions does not support this Unicode property class.
genai_config correctness: The internal 3D expanded tensor name is stored in a separate _pos_ids_3d attribute so self.input_names["position_ids"] remains as "position_ids" — ensuring genai_config.json references the actual graph input.
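
As a rough illustration of the tokenizer regex fix described above, the following sketch removes every literal \p{M} from the regex strings in a tokenizer.json-like structure. It is not the code from this PR: the helper names (strip_mark_class, patch_regex_fields), the sample pattern, and the assumption that patterns live under a "Regex" key are all hypothetical, and the real save_processing override may patch the patterns differently.

```python
import json

# The Unicode property class that the PR description says std::regex rejects.
UNSUPPORTED_CLASS = r"\p{M}"


def strip_mark_class(pattern: str) -> str:
    """Remove every literal occurrence of the Unicode Mark class from a pattern.

    Caveat: a character class that contained only this token would be left
    empty, so a real patch would need to handle that case separately.
    """
    return pattern.replace(UNSUPPORTED_CLASS, "")


def patch_regex_fields(node):
    """Return a copy of a JSON-like tree with every "Regex" string patched."""
    if isinstance(node, dict):
        return {
            key: strip_mark_class(value)
            if key == "Regex" and isinstance(value, str)
            else patch_regex_fields(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [patch_regex_fields(item) for item in node]
    return node


# Illustrative fragment only, not the real Qwen 3.5 tokenizer.json.
tokenizer_cfg = {
    "pre_tokenizer": {
        "type": "Split",
        "pattern": {"Regex": r"[\p{L}\p{M}]+|\p{N}"},
    }
}

patched = patch_regex_fields(tokenizer_cfg)
print(json.dumps(patched, indent=2))
```

The sketch returns a patched copy rather than editing in place, so the original configuration stays available for comparison.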

Usage

python3 -m onnxruntime_genai.models.builder \
  --model_name Qwen/Qwen3.5-2B \
  --precision int4 \
  --execution_provider cuda \
  --output /path/to/output \
  --extra_options exclude_embeds=false prune_lm_head=true int4_algo_config=k_quant_last

Copilot

Pull request overview

Adds support for building and running Qwen 3.5 as a standalone text-only LLM (without multimodal embedding/vision pipeline), including runtime-side fixes to avoid incorrectly injecting input_ids into decoders that only accept inputs_embeds.

Changes:

Add “text-only mode” to the Qwen 3.5 builder, including 2D position_ids support with internal expansion for mRoPE and a tokenizer-regex post-export patch.
Fix multimodal runtime logic to check decoder inputs against decoder-only session metadata (avoids false positives from the embedding session).
Register qwen3_5_text as an LLM model type in C++.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
src/python/py/models/builders/qwen.py	Adds text-only build path for Qwen 3.5, adjusts `position_ids` handling, and patches exported tokenizer regex.
src/models/multi_modal.cpp	Refines decoder input detection to avoid injecting `input_ids` when the decoder session doesn’t accept it.
src/models/model_type.h	Adds `qwen3_5_text` to the LLM model-type allowlist.
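
The decoder-input fix summarized above can be pictured with a small sketch. The actual change is in C++ (src/models/multi_modal.cpp); the Python below only models the idea, and the function name should_feed_input_ids, the sets of input names, and the pixel_values and attention_mask inputs are assumptions made for illustration.

```python
def should_feed_input_ids(session_input_names):
    """Return True only if the given session declares an input_ids input."""
    return "input_ids" in session_input_names


# Hypothetical input names for the two sessions of a multimodal pipeline.
embedding_inputs = {"input_ids", "pixel_values"}
decoder_inputs = {"inputs_embeds", "position_ids", "attention_mask"}

# Checking against the inputs of all sessions yields a false positive,
# because input_ids belongs to the embedding session, not the decoder.
combined = embedding_inputs | decoder_inputs
print(should_feed_input_ids(combined))        # True

# Checking against decoder-only metadata gives the intended answer.
print(should_feed_input_ids(decoder_inputs))  # False
```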

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

kunal-vaishnavi · 2026-05-15T06:43:14Z

+        # Position IDs input.
+        # In text-only mode the runtime provides standard 2D [B, S] position_ids.
+        # We expand them to 3D [3, B, S] inside the graph so mRoPE works unchanged.
+        # In VL mode the pipeline provides 3D position_ids directly.


Why do the position ids have to differ in text-only mode vs. vision-language mode? It should be an identical decoder besides the input_ids vs inputs_embeds input.

apsonawane · May 15, 2026 (edited)

The difference exists because of who provides them, not the decoder itself.

VL mode: The multimodal pipeline computes 3D [3, B, S] position IDs externally — image patches need distinct spatial (H/W) positions vs text tokens (temporal only).

Text-only mode: The onnxruntime-genai runtime only generates standard 2D [B, S] position IDs. For pure text, all 3 mRoPE dimensions are identical (sequential), so we Tile [B, S] → [3, B, S] in the graph.

To eliminate the Tile, we'd need to:

Add qwen3_5_text to IsQwenVLFamily() in model_type.h

Guard vision token config reads in Qwen2VLPositionInputs for the no-vision case in position_inputs.cpp

Remove the is_text_only branching in the Python builder

The Tile causes no perf regression, whereas handling this in the runtime would add code complexity, I would say. Also, given the changes listed above, keeping the text model out of the VL model type makes sense to me.
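
To make the [B, S] → [3, B, S] expansion discussed in this thread concrete, here is a minimal sketch using plain nested lists. It only mirrors the shape semantics; in the exported graph the same effect would come from ONNX ops (the thread mentions Tile, presumably after adding a leading axis). The function name expand_position_ids and the sample batch are hypothetical.

```python
def expand_position_ids(position_ids_2d, num_sections=3):
    """Tile a [B, S] nested list into [num_sections, B, S].

    For pure text, every mRoPE dimension uses the same sequential
    positions, so each section is simply a copy of the 2D input.
    """
    return [[list(row) for row in position_ids_2d] for _ in range(num_sections)]


# Batch of 2 sequences, 4 tokens each: shape [B=2, S=4].
position_ids = [[0, 1, 2, 3], [0, 1, 2, 3]]

expanded = expand_position_ids(position_ids)
print(len(expanded), len(expanded[0]), len(expanded[0][0]))  # 3 2 4
```

In VL mode the three sections can differ (image patches carry distinct spatial positions), which is why that pipeline supplies the 3D tensor directly instead of relying on a copy like this.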

apsonawane added 2 commits May 12, 2026 21:44

Add qwen3.5 text model type

495e4c4

Merge branch 'main' into asonawane/qwen_text

74e7a97

Copilot AI review requested due to automatic review settings May 12, 2026 21:47

apsonawane requested a review from a team as a code owner May 12, 2026 21:47

Copilot started reviewing on behalf of apsonawane May 12, 2026 21:48

Copilot AI reviewed May 12, 2026

Comment thread src/python/py/models/builders/qwen.py Outdated

Comment thread src/python/py/models/builders/qwen.py

Comment thread src/models/multi_modal.cpp Outdated

Address comments and add test

5fe0d8e

github-advanced-security AI found potential problems May 12, 2026

Comment thread test/python/test_qwen35_text_only.py Fixed

apsonawane requested a review from Copilot May 14, 2026 17:56

Copilot started reviewing on behalf of apsonawane May 14, 2026 17:59

Copilot AI reviewed May 14, 2026

Comment thread src/python/py/models/builders/qwen.py Outdated

Comment thread src/python/py/models/builders/qwen.py Outdated

Comment thread src/python/py/models/builders/qwen.py Outdated

Comment thread test/python/test_qwen35_text_only.py Outdated

hanbitmyths added the 0.14.0 label May 14, 2026

apsonawane added 2 commits May 15, 2026 01:07

Merge branch 'main' into asonawane/qwen_text

0e7c68e

Address copilot comments

c14354a

kunal-vaishnavi reviewed May 15, 2026

Comment thread src/python/py/models/builders/qwen.py Outdated

kunal-vaishnavi reviewed May 15, 2026

Comment thread src/python/py/models/builders/qwen.py Outdated

kunal-vaishnavi reviewed May 15, 2026

apsonawane added 2 commits May 15, 2026 15:58

Address comments

40a253d

Merge branch 'main' into asonawane/qwen_text

faf35a4

github-advanced-security AI found potential problems May 15, 2026

Comment thread src/python/py/models/builders/qwen.py Fixed

kunal-vaishnavi mentioned this pull request May 15, 2026

Enable Qwen3.5 TRT-RTX EP path with CUDA graph #2139

Open

apsonawane and others added 2 commits May 15, 2026 18:04

Update extensions commit

83b2bb9

Potential fix for pull request finding 'CodeQL / Unused import'

2d08052

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

apsonawane wants to merge 9 commits into main from asonawane/qwen_text

5 participants
