
Fix the incompatibility issue caused by top_p=0 when using vllm to inference (#1265)#1277

Merged
kcz358 merged 2 commits into EvolvingLMMs-Lab:main from akawincent:fix/1265_vllm_top_p
Apr 10, 2026

Conversation

@akawincent
Contributor

Fixes #1265.

Summary

Fix a vLLM backend compatibility issue where task configs that set top_p: 0 cause a crash during request construction.

This change normalizes top_p=0 to top_p=1.0 before building vLLM SamplingParams, which preserves the intended greedy decoding behavior while satisfying vLLM's validation rules.

Root Cause

Several task configs in lmms-eval use:

  • temperature: 0
  • top_p: 0
  • do_sample: false

This pattern is commonly used to express greedy decoding and works in Hugging Face-style generation paths.

However, the vLLM wrappers were forwarding top_p directly into SamplingParams, and vLLM requires top_p to be in (0, 1]. As a result, tasks using top_p: 0 failed before generation started.
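The failure mode can be illustrated with a small stand-in for vLLM's check (the real rule lives inside vLLM's SamplingParams; check_top_p below is a hypothetical mock written for this explanation, not vLLM code):

```python
def check_top_p(top_p: float) -> None:
    # Stand-in for vLLM's SamplingParams rule: top_p must be in (0, 1].
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}.")

check_top_p(1.0)  # passes: 1.0 is the greedy-compatible upper bound

try:
    check_top_p(0.0)  # the value forwarded verbatim from task YAML
except ValueError as err:
    print(err)  # the task aborts here, before generation even starts
```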

What Changed

  • Added a small normalization step in the shared vLLM wrapper to convert top_p=0 to 1.0
  • Reused that logic in:
    • vllm
    • vllm_chat
    • vllm_generate
  • Added a lightweight regression test covering the normalization path
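A minimal sketch of such a normalization step (the function name and call site are illustrative; the actual helper in this PR may differ):

```python
def normalize_top_p(top_p: float) -> float:
    """Map top_p=0 (a common greedy-decoding shorthand) to 1.0.

    vLLM requires top_p in (0, 1]. With temperature=0, decoding is greedy
    anyway, so top_p has no effect and substituting 1.0 preserves behavior.
    """
    return 1.0 if top_p == 0 else top_p

# The normalized value is then safe to forward, e.g.:
#   SamplingParams(temperature=0, top_p=normalize_top_p(cfg_top_p))
```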

Impact

Tasks that rely on greedy-style generation settings with top_p: 0 can now run on vLLM without crashing.

This keeps task YAML unchanged and limits the fix to the vLLM compatibility layer.

Validation

Ran:

  • python -m unittest test.models.test_vllm_sampling_params -v
  • python -m unittest test.models.test_model_registry_v2 -v

@akawincent akawincent changed the title to Fix the incompatibility issue caused by top_p=0 when using vllm to inference (#1265) Mar 27, 2026
@akawincent
Contributor Author

Hi @Luodian

Could you review this PR please?

The code change is minimal: it converts top_p=0 to top_p=1 when calling the vLLM backend, so that greedy decoding can run without tripping vLLM's validation.

Additionally, I added a regression test.

@kcz358 kcz358 merged commit 17c4461 into EvolvingLMMs-Lab:main Apr 10, 2026
4 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug] vLLM backend crashes with "top_p must be in (0, 1]" when task config sets top_p=0
