Skip to content

Incorrect handling of reasoning traces for nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 in MBridge format #682

@marta-sd

Description

@marta-sd

Describe the bug

When serving the model in-framework, we return incorrect responses for the chat endpoint. The response clearly contain reasoning, but it's not wrapped with special tokens. It looks like special tokens were removed without stripping the thinking part first and it's mixed up with model's response.

Example:

request:

{'messages': [{'role': 'user', 'content': 'Write a resume for a fresh high school graduate who is seeking their first job. Make sure to include at least 12 placeholder represented by square brackets, such as [address], [name].'}], 'model': 'megatron_model', 'max_tokens': 1280, 'temperature': 1e-07, 'stop': [], 'stream': False, 'seed': 1234, 'top_p': 0.9999999}

response:

[{'message': {'role': 'assistant', 'content': "We need to write a resume for a fresh high school graduate seeking first job. Include at least 12 placeholders represented by square brackets, such as [address], [name]. So we need placeholders like [Name], [Address], [Phone], [Email], [Objective], [Education], [Skills], [Experience], [Volunteer], [Certifications], [Awards], [Languages], [Interests] etc. At least 12 placeholders. Use square brackets exactly. Provide a resume format. Ensure placeholders are at least 12 distinct ones. Provide them in the resume. Let's count: [Full Name], [Address], [Phone Number], [Email Address], [Objective], [Education], [High School Name], [Graduation Year], [Relevant Coursework], [Skills], [Volunteer Experience], [Community Service], [Certifications], [Awards], [Languages], [Interests]. That's more than 12. Ensure each placeholder is in square brackets. Provide a resume with sections. Let's produce.\n</think>\n**[Full Name]**  \n[Address] • [Phone Number] • [Email Address] • [LinkedIn URL] (optional)\n\n---\n\n### Objective\nEnthusiastic and motivated high‑school graduate seeking a part‑time position where I can apply strong communication skills, a solid work ethic, and a passion for [Industry/Field] to contribute to a dynamic team.\n\n---\n\n### Education\n**[High School Name]** – [City, State]  \n*Graduation Year:* **[Graduation Year]**  \n- **GPA:** [GPA] (if 3.5 or higher)  \n- **Relevant Coursework:** [Relevant Coursework]  \n- **Honors/Distinctions:** [Awards/Honors]  \n\n---\n\n### Skills\n- **Technical:** [Skill 1], [Skill 2], [Skill 3]  \n- **Soft Skills:** Communication, Teamwork, Problem‑Solving, Time Management  \n- **Languages:** [Language 1] (Proficiency: [Level]), [Language 2] (Proficiency: [Level])  \n\n---\n\n### Volunteer & Community Service\n- **[Role/Title]** – [Organization Name], [City, State] – [Month Year] – Present  \n  - *Description:* Brief bullet points highlighting responsibilities and achievements.  \n\n- **[Role/Title]** – [Organization Name], [City, State] – [Month Year] – [Month Year]  \n  - *Description:* Brief bullet points highlighting responsibilities and achievements.  \n\n---\n\n### Work Experience\n**[Job Title]** – [Company Name], [City, State] – [Month Year] – [Month Year]  \n- *Responsibility/Task:* Brief description of duties and measurable outcomes.  \n\n*(Add additional positions as needed, using the same format.)*\n\n---\n\n### Certifications & Training\n- **[Certification Name]** – Issuing Organization, [Year]  \n- **[Certification Name]** – Issuing Organization, [Year]  \n\n---\n\n### Interests\n- [Interest 1], [Interest 2], [Interest 3]  \n\n---\n\n*References available upon request.*"}, 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 34, 'completion_tokens': 340, 'total_tokens': 374}}

Steps/Code to reproduce bug

The issue was discovered when running e2e deployment + eval script from Evaluator (https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py)

uv run --active --no-sync python scripts/evaluation_with_nemo_run.py --megatron_checkpoint /lustre/fsw/coreai_dlalgo_ci/nemo_export_deploy_eval_checkpoints/mbridge/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/iter_0000000/ --serving_backend ray --server_port 1235 --num_replicas 1 --endpoint_type chat --tensor_parallelism_size 2 --expert_model_parallel_size 2 --batch_size 4 --eval_task ifeval --parallel_requests 4 --request_timeout 6400 --tag ray_with_metadata --evaluation_result_dir /lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/ --slurm --nodes 1 --devices 8 --container_image nvcr.io/nvidian/nemo:dgxctestingtemp-nemofw-nightly.52075726 --account coreai_dlalgo_ci --partition batch --time_limit 02:00:00 --local_tunnel --custom_mounts /lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/:/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/,/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227:/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227,/lustre:/lustre

Expected behavior

There are two ways to properly return reasoning traces:

  1. As part of the content, wrapped with thinking tokens (for Nemotron they are <think> ... </think>)
  2. As separate reasoning_content field, without special tokens included.

For option 1. we just need to keep the special tokens in model's response. For option 2 we need to extract part inside thinking tokens and return it alongside content in a separate reasoning_content field. I recommend option 2 as it is more user-friendly (no postprocessing on the client side needed).

Additional context

Evaluator docs on reasoning output formats: https://docs.nvidia.com/nemo/evaluator/latest/evaluation/run-evals/reasoning.html#reasoning-output-formats

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions