Fix the incompatibility issue caused by top_p=0 when using vLLM for inference (#1265) #1277
Merged
kcz358 merged 2 commits into EvolvingLMMs-Lab:main on Apr 10, 2026
Conversation
top_p=0 when using vLLM for inference (#1265)
Contributor
Author
Hi @Luodian Could you review this PR, please? The code changes are minimal: it mainly converts top_p=0 to top_p=1 when calling the vLLM backend, so that vLLM's greedy decoding can start. Additionally, I added a regression test.
kcz358 reviewed Apr 9, 2026
kcz358 approved these changes Apr 10, 2026
Fixes #1265.
Summary
Fix a vLLM backend compatibility issue where task configs that set `top_p: 0` crash during request construction. This change normalizes `top_p=0` to `top_p=1.0` before building vLLM `SamplingParams`, which preserves the intended greedy decoding behavior while satisfying vLLM's validation rules.

Root Cause
Several task configs in `lmms-eval` use:

- `temperature: 0`
- `top_p: 0`
- `do_sample: false`

This pattern is commonly used to express greedy decoding and works in Hugging Face-style generation paths.
However, the vLLM wrappers were forwarding `top_p` directly into `SamplingParams`, and vLLM requires `top_p` to be in `(0, 1]`. As a result, tasks using `top_p: 0` failed before generation started.

What Changed
Normalize `top_p=0` to `1.0` in the vLLM wrappers: `vllm`, `vllm_chat`, and `vllm_generate`.

Impact
Tasks that rely on greedy-style generation settings with `top_p: 0` can now run on vLLM without crashing. This keeps task YAML unchanged and limits the fix to the vLLM compatibility layer.
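The normalization described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `build_sampling_kwargs` is a hypothetical helper name, and it assumes the generation kwargs arrive as a plain dict parsed from task YAML.

```python
def build_sampling_kwargs(gen_kwargs: dict) -> dict:
    """Sketch of the vLLM compatibility-layer fix (hypothetical helper).

    vLLM requires top_p in (0, 1], but task configs use top_p: 0 as a
    Hugging Face-style greedy-decoding convention. Mapping it to 1.0 is
    behavior-preserving: with temperature=0, decoding is greedy either way.
    """
    kwargs = dict(gen_kwargs)  # avoid mutating the caller's config
    if kwargs.get("top_p", 1.0) == 0:
        kwargs["top_p"] = 1.0
    return kwargs
```

The sanitized dict would then be passed to the real `SamplingParams` constructor, so validation succeeds without touching any task YAML.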
Validation
Ran:
- `python -m unittest test.models.test_vllm_sampling_params -v`
- `python -m unittest test.models.test_model_registry_v2 -v`
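A regression test for this behavior could look roughly like the sketch below. The names here are hypothetical (the PR's actual test lives in `test.models.test_vllm_sampling_params`), and `normalize_top_p` is an inline stand-in for the wrapper-side fix.

```python
import unittest


def normalize_top_p(top_p: float) -> float:
    # Stand-in for the PR's fix: map the greedy-decoding convention
    # top_p=0 to 1.0, which vLLM's SamplingParams accepts.
    return 1.0 if top_p == 0 else top_p


class TestVllmTopPNormalization(unittest.TestCase):
    def test_zero_is_normalized_to_one(self):
        self.assertEqual(normalize_top_p(0), 1.0)

    def test_valid_values_pass_through(self):
        self.assertEqual(normalize_top_p(0.9), 0.9)
        self.assertEqual(normalize_top_p(1.0), 1.0)
```

Run with `python -m unittest <module> -v`, matching the validation commands above.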