Incorrect handling of reasoning traces for nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 in MBridge format

**Describe the bug**

When serving the model in-framework, we return incorrect responses for the chat endpoint. The response clearly contain reasoning, but it's not wrapped with special tokens. It looks like special tokens were removed without stripping the thinking part first and it's mixed up with model's response.

Example:

request:
```
{'messages': [{'role': 'user', 'content': 'Write a resume for a fresh high school graduate who is seeking their first job. Make sure to include at least 12 placeholder represented by square brackets, such as [address], [name].'}], 'model': 'megatron_model', 'max_tokens': 1280, 'temperature': 1e-07, 'stop': [], 'stream': False, 'seed': 1234, 'top_p': 0.9999999}
```

response:
```
[{'message': {'role': 'assistant', 'content': "We need to write a resume for a fresh high school graduate seeking first job. Include at least 12 placeholders represented by square brackets, such as [address], [name]. So we need placeholders like [Name], [Address], [Phone], [Email], [Objective], [Education], [Skills], [Experience], [Volunteer], [Certifications], [Awards], [Languages], [Interests] etc. At least 12 placeholders. Use square brackets exactly. Provide a resume format. Ensure placeholders are at least 12 distinct ones. Provide them in the resume. Let's count: [Full Name], [Address], [Phone Number], [Email Address], [Objective], [Education], [High School Name], [Graduation Year], [Relevant Coursework], [Skills], [Volunteer Experience], [Community Service], [Certifications], [Awards], [Languages], [Interests]. That's more than 12. Ensure each placeholder is in square brackets. Provide a resume with sections. Let's produce.\n</think>\n**[Full Name]**  \n[Address] • [Phone Number] • [Email Address] • [LinkedIn URL] (optional)\n\n---\n\n### Objective\nEnthusiastic and motivated high‑school graduate seeking a part‑time position where I can apply strong communication skills, a solid work ethic, and a passion for [Industry/Field] to contribute to a dynamic team.\n\n---\n\n### Education\n**[High School Name]** – [City, State]  \n*Graduation Year:* **[Graduation Year]**  \n- **GPA:** [GPA] (if 3.5 or higher)  \n- **Relevant Coursework:** [Relevant Coursework]  \n- **Honors/Distinctions:** [Awards/Honors]  \n\n---\n\n### Skills\n- **Technical:** [Skill 1], [Skill 2], [Skill 3]  \n- **Soft Skills:** Communication, Teamwork, Problem‑Solving, Time Management  \n- **Languages:** [Language 1] (Proficiency: [Level]), [Language 2] (Proficiency: [Level])  \n\n---\n\n### Volunteer & Community Service\n- **[Role/Title]** – [Organization Name], [City, State] – [Month Year] – Present  \n  - *Description:* Brief bullet points highlighting responsibilities and achievements.  \n\n- **[Role/Title]** – [Organization Name], [City, State] – [Month Year] – [Month Year]  \n  - *Description:* Brief bullet points highlighting responsibilities and achievements.  \n\n---\n\n### Work Experience\n**[Job Title]** – [Company Name], [City, State] – [Month Year] – [Month Year]  \n- *Responsibility/Task:* Brief description of duties and measurable outcomes.  \n\n*(Add additional positions as needed, using the same format.)*\n\n---\n\n### Certifications & Training\n- **[Certification Name]** – Issuing Organization, [Year]  \n- **[Certification Name]** – Issuing Organization, [Year]  \n\n---\n\n### Interests\n- [Interest 1], [Interest 2], [Interest 3]  \n\n---\n\n*References available upon request.*"}, 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 34, 'completion_tokens': 340, 'total_tokens': 374}}
```

**Steps/Code to reproduce bug**

The issue was discovered when running e2e deployment + eval script from Evaluator (https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py)

```
uv run --active --no-sync python scripts/evaluation_with_nemo_run.py --megatron_checkpoint /lustre/fsw/coreai_dlalgo_ci/nemo_export_deploy_eval_checkpoints/mbridge/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/iter_0000000/ --serving_backend ray --server_port 1235 --num_replicas 1 --endpoint_type chat --tensor_parallelism_size 2 --expert_model_parallel_size 2 --batch_size 4 --eval_task ifeval --parallel_requests 4 --request_timeout 6400 --tag ray_with_metadata --evaluation_result_dir /lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/ --slurm --nodes 1 --devices 8 --container_image nvcr.io/nvidian/nemo:dgxctestingtemp-nemofw-nightly.52075726 --account coreai_dlalgo_ci --partition batch --time_limit 02:00:00 --local_tunnel --custom_mounts /lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/:/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227/evaluation_results/,/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227:/lustre/fsw/coreai_dlalgo_ci/evaluator_ci/323325227,/lustre:/lustre
```


**Expected behavior**

There are two ways to properly return reasoning traces:

1. As part of the `content`, wrapped with thinking tokens (for Nemotron they are `<think> ... </think>`)
2. As separate `reasoning_content` field, without special tokens included.

For option 1. we just need to keep the special tokens in model's response. For option 2 we need to extract part inside thinking tokens and return it alongside `content` in a separate `reasoning_content` field. **I recommend option 2 as it is more user-friendly** (no postprocessing on the client side needed).

**Additional context**

Evaluator docs on reasoning output formats: https://docs.nvidia.com/nemo/evaluator/latest/evaluation/run-evals/reasoning.html#reasoning-output-formats


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect handling of reasoning traces for nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 in MBridge format #682

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Incorrect handling of reasoning traces for nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 in MBridge format #682

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions