Server returns empty answer when too many tokens requested #652

@marta-sd

Describe the bug

When "max_tokens" in the payload is higher than the --inference_max_seq_length value passed to the server (in my case, 8192 vs. 4096), the server responds with an empty assistant message instead of an error.

Steps/Code to reproduce bug

Deployment snippet (Eos cluster):

python \
  /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
  --megatron_checkpoint /lustre/fsw/coreai_dlalgo_ci/nemo_export_deploy_eval_checkpoints/mbridge/meta-llama/Llama-3.1-8B-Instruct/iter_0000000/ \
  --model_id megatron_model \
  --port 8886 \
  --host 0.0.0.0 \
  --num_gpus 8 \
  --tensor_model_parallel_size 1 \
  --pipeline_model_parallel_size 1 \
  --expert_model_parallel_size 1 \
  --max_batch_size 2 \
  --num_replicas 8 \
  --inference_max_seq_length 4096 \
  --runtime_env '{"py_executable": "/opt/venv/bin/python"}' &

Then send a request to the model:

import requests
model_name = "megatron_model"
endpoint_url = "http://0.0.0.0:8886/v1/chat/completions"

payload = {"model": model_name, "max_tokens": 8192, "top_p": 0.9999999, "temperature": 1e-07, "messages": [{"role": "user", "content": "## Instruction:\n\nPlease answer this question by first reasoning and then selecting the correct choice.\nPresent your reasoning and solution in the following json format.\nPlease show your choice in the `answer` field with only the choice letter, e.g.,`\"answer\": \"C\"`.\n\n```json\n{\n    \"reasoning\": \"___\",\n    \"answer\": \"___\"\n}\n```\n\n## Question:\n\nWhich of the following is a disorder characterized by uncontrollable episodes of falling asleep during the day?\n\n## Choices:\n\n- (A) Dyslexia\n- (B) Epilepsy\n- (C) Hydrocephalus\n- (D) Narcolepsy\n\n## Answer:"}]}

response = requests.post(endpoint_url, json=payload)
# Print the parsed body so the empty assistant message is visible when run as a script
print(response.json())
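
To make the failure easier to spot on the client side, the check below may help. It is only a sketch: it assumes the response body follows the OpenAI-style chat completions layout (`choices[0].message.content`), which this report does not show, and the helper names `assistant_text` and `is_empty_answer` are hypothetical.

```python
def assistant_text(body: dict) -> str:
    """Extract the assistant text from a chat completions response body.

    Assumption: the body uses the OpenAI-style layout
    body["choices"][0]["message"]["content"]. Missing keys yield "".
    """
    choices = body.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return (message.get("content") or "").strip()


def is_empty_answer(body: dict) -> bool:
    """True when the server returned no usable assistant text."""
    return assistant_text(body) == ""
```

With the reproduction above, `is_empty_answer(response.json())` would flag the response if the body has the assumed shape.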

Expected behavior

The server should respond with a descriptive error.
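
One possible shape for such a check is sketched below. This is not the server's actual code: the function name `check_token_budget`, the error body layout, and the assumption that the limit covers prompt tokens plus generated tokens are all hypothetical and would need to be matched to how --inference_max_seq_length is really enforced.

```python
from typing import Optional


def check_token_budget(
    max_tokens: int, prompt_tokens: int, max_seq_length: int
) -> Optional[dict]:
    """Return an error body if the request cannot fit, otherwise None.

    Assumption: max_seq_length bounds prompt tokens plus generated tokens.
    """
    if max_tokens > max_seq_length:
        reason = (
            f"max_tokens ({max_tokens}) exceeds the server limit "
            f"inference_max_seq_length ({max_seq_length})."
        )
    elif prompt_tokens + max_tokens > max_seq_length:
        reason = (
            f"prompt tokens ({prompt_tokens}) plus max_tokens ({max_tokens}) "
            f"exceed inference_max_seq_length ({max_seq_length})."
        )
    else:
        return None
    # Hypothetical error layout; a real server would likely pair this with a 4xx status.
    return {"error": {"type": "invalid_request_error", "message": reason}}
```

For the request in this report, `check_token_budget(8192, prompt_tokens, 4096)` would return an error body for any prompt length, which the server could send back in place of an empty message.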

Labels: bug
