
feat(vllm_batch): add reasoning_parser support for batch inference #804

Open
lilyz-ai wants to merge 2 commits into main from lilyzhu/nemotron-reasoning-parser

Conversation


@lilyz-ai lilyz-ai commented Apr 3, 2026

Summary

  • Adds reasoning_parser field to VLLMModelConfig so batch jobs can configure reasoning parsers (e.g. nemotron_v3, deepseek_r1)
  • Passes it to structured_outputs_config in init_engine (async path) — previously hardcoded to None
  • Filters it from AsyncEngineArgs (it's a serving-layer arg, not an engine arg)
  • init_vllm subprocess path already handles it correctly via CLI arg conversion

Motivation

Needed to deploy NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 to batch inference with reasoning_parser=nemotron_v3, tool_call_parser=qwen3_coder, and enable_auto_tool_choice=True.

🤖 Generated with Claude Code

Greptile Summary

This PR adds reasoning_parser support for vLLM batch inference by introducing the field in VLLMModelConfig and wiring it through the async init_engine path's structured_outputs_config. It also correctly expands _serving_only_keys to filter tool_call_parser, enable_auto_tool_choice, and chat_template from AsyncEngineArgs — fixing a latent TypeError for any batch job using those fields.

Key changes:

  • VLLMModelConfig gains a new reasoning_parser: Optional[str] field alongside the existing serving-layer fields.
  • _serving_only_keys now covers all four serving-layer args so they are safely excluded when constructing AsyncEngineArgs.
  • structured_outputs_config is populated with parsed_configs.reasoning_parser instead of the previously hardcoded None.
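The filtering described above can be sketched as follows. This is a minimal illustration, not the actual `vllm_batch.py`: the names `parsed_configs`, `_serving_only_keys`, `engine_args_dict`, and the field names come from the PR description, while the dataclass stand-in for the pydantic `VLLMModelConfig` and the example values are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the pydantic VLLMModelConfig (field names per the PR).
@dataclass
class VLLMModelConfig:
    model: str
    reasoning_parser: Optional[str] = None
    tool_call_parser: Optional[str] = None
    enable_auto_tool_choice: bool = False
    chat_template: Optional[str] = None

    def model_dump(self, exclude_none: bool = False) -> dict:
        d = vars(self).copy()
        if exclude_none:
            d = {k: v for k, v in d.items() if v is not None}
        return d

# Serving-layer args that AsyncEngineArgs does not accept.
_serving_only_keys = {"reasoning_parser", "tool_call_parser",
                      "enable_auto_tool_choice", "chat_template"}

parsed_configs = VLLMModelConfig(
    model="nemotron",
    reasoning_parser="nemotron_v3",
    tool_call_parser="qwen3_coder",
    enable_auto_tool_choice=True,
)

# Keep only true engine args; the serving-layer keys would raise a TypeError
# if passed through AsyncEngineArgs(**engine_args_dict).
engine_args_dict = {
    k: v for k, v in parsed_configs.model_dump(exclude_none=True).items()
    if k not in _serving_only_keys
}

# reasoning_parser is wired separately into structured_outputs_config,
# replacing the previously hardcoded None.
structured_outputs_config = {"reasoning_parser": parsed_configs.reasoning_parser}
```

After filtering, `engine_args_dict` contains only engine-level keys (here just `model`), while `reasoning_parser` travels via `structured_outputs_config`.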

Issue found:

  • tool_call_parser and enable_auto_tool_choice are correctly removed from engine_args_dict but their user-configured values are never forwarded to app_state_args. default_app_state_args hardcodes them as None/False, and since they no longer arrive via the **engine_args_dict merge, any batch job configured with tool_call_parser=qwen3_coder or enable_auto_tool_choice=True will silently use the defaults — the exact use case motivating this PR.
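The silent fallback follows from Python's dict-merge semantics: a later `**` expansion only overrides keys it actually contains, so once the keys are filtered out of `engine_args_dict`, the hardcoded defaults win. A minimal sketch of the behavior and the proposed fix (variable names mirror the review; `parsed_configs` values are simulated with plain variables):

```python
# User configured these in VLLMModelConfig, but they were filtered out of
# engine_args_dict so that AsyncEngineArgs(**engine_args_dict) doesn't raise.
engine_args_dict = {"model": "nemotron"}  # tool_call_parser et al. already removed

# Hardcoded defaults in default_app_state_args (per the review comment).
default_app_state_args = {"tool_call_parser": None, "enable_auto_tool_choice": False}

# A later **expansion only overrides keys it actually contains. The filtered
# keys are absent from engine_args_dict, so the defaults silently win.
broken_app_state_args = {**default_app_state_args, **engine_args_dict}

# The proposed fix: source the values from parsed_configs when building
# default_app_state_args (simulated here with plain variables).
parsed_tool_call_parser = "qwen3_coder"  # stands in for parsed_configs.tool_call_parser
parsed_enable_auto_tool_choice = True    # stands in for parsed_configs.enable_auto_tool_choice
fixed_defaults = {
    "tool_call_parser": parsed_tool_call_parser,
    "enable_auto_tool_choice": parsed_enable_auto_tool_choice or False,
}
fixed_app_state_args = {**fixed_defaults, **engine_args_dict}
```

With the fix, the user-configured values survive the merge instead of reverting to `None`/`False`.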

Confidence Score: 4/5

Not safe to merge as-is — the primary use case (tool_call_parser + enable_auto_tool_choice for the Nemotron deployment) is silently broken.

One P1 bug: tool_call_parser and enable_auto_tool_choice are filtered from engine_args_dict but never explicitly propagated to app_state_args, so user-configured values are silently ignored. This directly undermines the stated motivation of the PR. The fix is a two-line change in default_app_state_args. The reasoning_parser wiring and the DTO addition are correct.

model-engine/model_engine_server/inference/vllm/vllm_batch.py — specifically default_app_state_args where tool_call_parser and enable_auto_tool_choice need to reference parsed_configs instead of being hardcoded.

Important Files Changed

| Filename | Overview |
|----------|----------|
| model-engine/model_engine_server/common/dtos/llms/vllm.py | Adds `reasoning_parser: Optional[str]` field to VLLMModelConfig — straightforward, well-placed alongside the other serving-layer fields (`tool_call_parser`, `enable_auto_tool_choice`). |
| model-engine/model_engine_server/inference/vllm/vllm_batch.py | Extends `_serving_only_keys` to filter serving-layer args from AsyncEngineArgs and correctly passes `reasoning_parser` through `structured_outputs_config`, but `tool_call_parser` and `enable_auto_tool_choice` from `parsed_configs` are silently dropped — they're filtered out of `engine_args_dict` but hardcoded as None/False in `default_app_state_args` instead of being sourced from `parsed_configs`. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[VLLMModelConfig\nreasoning_parser\ntool_call_parser\nenable_auto_tool_choice\nchat_template] --> B[parsed_configs.model_dump\nexclude_none=True]
    B --> C{k in _serving_only_keys?}
    C -- No --> D[engine_args_dict]
    C -- Yes --> E[Filtered out]
    D --> F[AsyncEngineArgs **engine_args_dict]
    D --> G[app_state_args merge\n**default_app_state_args\n**engine_args_dict]
    E --> H[reasoning_parser\n→ structured_outputs_config ✅]
    E --> I[chat_template\n→ default_app_state_args ✅]
    E --> J[tool_call_parser\n→ hardcoded None ❌]
    E --> K[enable_auto_tool_choice\n→ hardcoded False ❌]
    H --> G
    I --> G
    G --> L[init_app_state]
```
Review Comments
This is a comment left during a code review.
Path: model-engine/model_engine_server/inference/vllm/vllm_batch.py
Line: 327-329

Comment:
**`tool_call_parser` and `enable_auto_tool_choice` silently ignored**

`tool_call_parser` and `enable_auto_tool_choice` are now excluded from `engine_args_dict` (correctly, to prevent `TypeError` in `AsyncEngineArgs`), but they are also **not** propagated to `app_state_args`. Since `engine_args_dict` no longer carries them into the `**engine_args_dict` merge on line 351, the final `app_state_args` will always use these hardcoded defaults — ignoring whatever the user configured in `VLLMModelConfig`.

This contradicts the PR's stated motivation of deploying with `tool_call_parser=qwen3_coder` and `enable_auto_tool_choice=True`.

Compare how `chat_template` (line 343) and `reasoning_parser` (lines 330–332) are handled — both are sourced directly from `parsed_configs` inside `default_app_state_args`. The same pattern needs to be applied here:

```python
        enable_auto_tool_choice=parsed_configs.enable_auto_tool_choice or False,
        tool_call_parser=parsed_configs.tool_call_parser,
```


---

Path: model-engine/model_engine_server/inference/vllm/vllm_batch.py
Line: 293

Comment:
**Add a comment explaining why these keys are excluded**

The `_serving_only_keys` set is non-obvious — it's not immediately clear why these specific keys must be excluded from `engine_args_dict`. A short inline comment would make the intent clear for future contributors who add new serving-layer fields to `VLLMModelConfig`.

```suggestion
    # These are vLLM serving-layer args that are not accepted by AsyncEngineArgs.
    # They are handled separately: chat_template and tool_call_parser/enable_auto_tool_choice
    # go into default_app_state_args directly, and reasoning_parser goes into structured_outputs_config.
    _serving_only_keys = {"reasoning_parser", "tool_call_parser", "enable_auto_tool_choice", "chat_template"}
```

**Rule Used:** Add comments to explain complex or non-obvious log... ([source](https://app.greptile.com/review/custom-context?memory=928586f9-9432-435e-a385-026fa49318a2))

**Learnt From**
[scaleapi/scaleapi#126958](https://github.com/scaleapi/scaleapi/pull/126958)


Reviews (2): Last reviewed commit: "fix(vllm_batch): filter serving-only arg..."

Commits

- Add `reasoning_parser` field to VLLMModelConfig
- Pass it to `structured_outputs_config` in init_engine (async path)
- Filter it from AsyncEngineArgs (it's a serving-layer arg, not engine arg)
- init_vllm subprocess path already handles it via CLI arg conversion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tool_call_parser, enable_auto_tool_choice, chat_template, and
reasoning_parser are app-state args, not AsyncEngineArgs params.
Pass them via app_state_args only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>