Add text-only mode support for Qwen 3.5 model builder #2157

Open
apsonawane wants to merge 9 commits into main from asonawane/qwen_text

@apsonawane apsonawane commented May 12, 2026

Description

Adds support for running Qwen 3.5 as a standalone text-only LLM (without vision/embedding pipelines).

Changes

  • 2D position_ids support: When exclude_embeds=false, the builder creates a 2D [B, S] position_ids graph input and internally expands it to 3D [3, B, S] for mRoPE compatibility. This allows the standard onnxruntime-genai runtime to provide position_ids without requiring the multimodal pipeline.
  • Tokenizer regex fix: Added a save_processing override that strips the unsupported \p{M} (Unicode Mark category) class from tokenizer regex patterns after export. The C++ std::regex engine in onnxruntime-extensions does not support this Unicode property class.
  • genai_config correctness: The internal 3D expanded tensor name is stored in a separate _pos_ids_3d attribute so self.input_names["position_ids"] remains as "position_ids" — ensuring genai_config.json references the actual graph input.
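
A minimal sketch of the tokenizer regex patch described above, not the PR's actual save_processing code. It assumes the exported tokenizer config is a JSON document whose patterns live under "Regex" keys, and that \p{M} only appears inside character classes alongside other members (so removing it leaves a valid pattern). The names strip_mark_class and patch_regex_fields, and the sample pattern, are hypothetical.

```python
UNSUPPORTED_CLASS = r"\p{M}"  # Unicode Mark category, rejected by std::regex


def strip_mark_class(pattern: str) -> str:
    """Remove every literal occurrence of \\p{M} from a regex pattern string."""
    return pattern.replace(UNSUPPORTED_CLASS, "")


def patch_regex_fields(node):
    """Walk a parsed tokenizer config in place and patch every "Regex" string."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Regex" and isinstance(value, str):
                node[key] = strip_mark_class(value)
            else:
                patch_regex_fields(value)
    elif isinstance(node, list):
        for item in node:
            patch_regex_fields(item)
    return node


# Hypothetical, heavily shortened stand-in for an exported tokenizer config.
tokenizer_config = {
    "pre_tokenizer": {
        "type": "Sequence",
        "pretokenizers": [
            {"type": "Split", "pattern": {"Regex": r"[\p{L}\p{M}]+|\p{N}"}},
        ],
    }
}

patched = patch_regex_fields(tokenizer_config)
patched_regex = patched["pre_tokenizer"]["pretokenizers"][0]["pattern"]["Regex"]
print(patched_regex)
```

A plain string replacement like this would leave a dangling quantifier if \p{M} ever appeared on its own (for example \p{M}+), so a real patch would likely need to check for that case.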

Usage

python3 -m onnxruntime_genai.models.builder \
  --model_name Qwen/Qwen3.5-2B \
  --precision int4 \
  --execution_provider cuda \
  --output /path/to/output \
  --extra_options exclude_embeds=false prune_lm_head=true int4_algo_config=k_quant_last

Copilot AI review requested due to automatic review settings May 12, 2026 21:47
@apsonawane apsonawane requested a review from a team as a code owner May 12, 2026 21:47

Copilot AI left a comment

Pull request overview

Adds support for building and running Qwen 3.5 as a standalone text-only LLM (without multimodal embedding/vision pipeline), including runtime-side fixes to avoid incorrectly injecting input_ids into decoders that only accept inputs_embeds.

Changes:

  • Add “text-only mode” to the Qwen 3.5 builder, including 2D position_ids support with internal expansion for mRoPE and a tokenizer-regex post-export patch.
  • Fix multimodal runtime logic to check decoder inputs against decoder-only session metadata (avoids false positives from the embedding session).
  • Register qwen3_5_text as an LLM model type in C++.
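
The decoder-input check in the second bullet can be illustrated with a small Python sketch. This is not the C++ code in multi_modal.cpp; the input names and the helper decoder_accepts_input_ids are assumptions chosen for illustration. The idea is that the decision to pass input_ids to the decoder should look only at the decoder session's own input names, not at names from the embedding session.

```python
def decoder_accepts_input_ids(decoder_input_names):
    """Return True only if the decoder session itself declares an input_ids input."""
    return "input_ids" in set(decoder_input_names)


# Hypothetical session metadata for a multimodal pipeline:
# the embedding session consumes input_ids, the decoder consumes inputs_embeds.
embedding_inputs = ["input_ids", "image_features"]
decoder_inputs = ["inputs_embeds", "attention_mask", "position_ids"]

# Checking against the combined set gives a false positive,
# because input_ids belongs to the embedding session only.
all_pipeline_inputs = embedding_inputs + decoder_inputs
buggy_decision = "input_ids" in all_pipeline_inputs

# Checking decoder-only metadata gives the intended answer.
fixed_decision = decoder_accepts_input_ids(decoder_inputs)

# A text-only decoder (hypothetical input list) does take input_ids directly.
text_only_decision = decoder_accepts_input_ids(
    ["input_ids", "attention_mask", "position_ids"]
)

print(buggy_decision, fixed_decision, text_only_decision)
```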

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
src/python/py/models/builders/qwen.py | Adds text-only build path for Qwen 3.5, adjusts position_ids handling, and patches exported tokenizer regex.
src/models/multi_modal.cpp | Refines decoder input detection to avoid injecting input_ids when the decoder session doesn’t accept it.
src/models/model_type.h | Adds qwen3_5_text to the LLM model-type allowlist.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread src/python/py/models/builders/qwen.py Outdated
# Position IDs input.
# In text-only mode the runtime provides standard 2D [B, S] position_ids.
# We expand them to 3D [3, B, S] inside the graph so mRoPE works unchanged.
# In VL mode the pipeline provides 3D position_ids directly.
Contributor

Why do the position ids have to differ in text-only mode vs. vision-language mode? It should be an identical decoder besides the input_ids vs inputs_embeds input.

Contributor Author

@apsonawane apsonawane May 15, 2026

The difference exists because of who provides them, not the decoder itself.

  1. VL mode: The multimodal pipeline computes 3D [3, B, S] position IDs externally — image patches need distinct spatial (H/W) positions vs text tokens (temporal only).
  2. Text-only mode: The onnxruntime-genai runtime only generates standard 2D [B, S] position IDs. For pure text, all 3 mRoPE dimensions are identical (sequential), so we Tile [B, S] → [3, B, S] in the graph.
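
As a rough NumPy analogue of the in-graph Tile described in item 2 (the builder emits ONNX nodes, not NumPy, and the helper name expand_position_ids is hypothetical), the [B, S] to [3, B, S] expansion for pure text looks like this:

```python
import numpy as np


def expand_position_ids(position_ids_2d: np.ndarray) -> np.ndarray:
    """Tile 2D [B, S] position ids into 3D [3, B, S] with three identical copies."""
    if position_ids_2d.ndim != 2:
        raise ValueError("expected position_ids of shape [B, S]")
    # Add a leading axis of size 1, then repeat it 3 times: [1, B, S] -> [3, B, S].
    return np.tile(position_ids_2d[np.newaxis, :, :], (3, 1, 1))


batch_size, seq_len = 2, 4
# Standard sequential positions 0..S-1 for each batch row, as a text runtime would supply.
pos_2d = np.tile(np.arange(seq_len, dtype=np.int64), (batch_size, 1))
pos_3d = expand_position_ids(pos_2d)

print(pos_3d.shape)
```

Because all three mRoPE dimensions carry the same sequential positions for text, each of the three slices of the result equals the original 2D input.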

To eliminate the Tile, we'd need to:

  1. Add qwen3_5_text to IsQwenVLFamily() in model_type.h
  2. Guard vision token config reads in Qwen2VLPositionInputs for the no-vision case in position_inputs.cpp
  3. Remove the is_text_only branching in the Python builder

The Tile causes no perf regression, whereas handling this in the runtime would add code complexity, I would say. Also, regarding the changes above, keeping the text model out of the VL model type makes sense to me.

apsonawane and others added 2 commits May 15, 2026 18:04
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

5 participants