
fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel#45074

Open
harshaljanjani wants to merge 2 commits into huggingface:main from harshaljanjani:fix/switch-transformers-timm-wrapper-bf16-dtype
Conversation

@harshaljanjani
Contributor

@harshaljanjani harshaljanjani commented Mar 27, 2026

What does this PR do?

The following dtype mismatch use cases were identified and fixed in this PR:

Switch Transformers: 7938e91fa refactored all MoE models for vLLM compatibility; in that refactor, the _cast_classifier() method was removed from SwitchTransformersTop1Router without adding a replacement dtype cast. Casting hidden_states to classifier.weight.dtype before the linear call fixes that!
TimmWrapper: 6217adc6c8 changed the default dtype behavior to "auto"; in that commit, pixel_values.to(self.device, self.dtype) was reduced to pixel_values.to(self.device), dropping the dtype cast. I'm not sure why it was dropped, but restoring it fixes the mismatch.
→ For more details on reproducing the bug and the output screenshots, please visit the linked issue!
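The two casts can be illustrated with a minimal, self-contained torch sketch. The modules and tensors below are stand-ins for illustration only, not the actual transformers internals:

```python
import torch
import torch.nn as nn

# Stand-in for SwitchTransformersTop1Router.classifier: router weights are
# kept in float32 for stable routing while activations run in bfloat16.
classifier = nn.Linear(8, 4, bias=False)  # float32 weights
hidden_states = torch.randn(2, 8, dtype=torch.bfloat16)

# Without a cast, the linear call raises a dtype-mismatch RuntimeError.
# The fix: cast the activations to the classifier's weight dtype first.
router_logits = classifier(hidden_states.to(classifier.weight.dtype))
print(router_logits.dtype)  # torch.float32

# Stand-in for the TimmWrapper fix: move inputs to the model's device AND
# dtype, mirroring the restored `pixel_values.to(self.device, self.dtype)`.
model = nn.Conv2d(3, 16, 3).to(torch.bfloat16)
pixel_values = torch.randn(1, 3, 32, 32)  # float32 input
out = model(pixel_values.to(model.weight.device, model.weight.dtype))
print(out.dtype)  # torch.bfloat16
```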

cc: @Rocketknight1

Fixes #45072

CI tests covering this behavior (as suggested by @ydshieh):

SwitchTransformers:
test_modeling_switch_transformers.py::SwitchTransformersModelTest::test_generate_with_past_key_values
test_modeling_switch_transformers.py::SwitchTransformersModelTest::test_model_fp16_forward
test_modeling_switch_transformers.py::SwitchTransformerModelIntegrationTests::test_small_logits
TimmWrapper:
TimmWrapperModelTest did not have an explicit bfloat16 forward-pass test; this PR adds one for complete coverage.
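The shape of the added test follows the usual dtype-specific forward-pass pattern. A hedged sketch with a stand-in module (the real test instantiates TimmWrapperModel from a small timm checkpoint rather than this toy network):

```python
import torch
import torch.nn as nn

def test_bfloat16_forward():
    # Stand-in for a TimmWrapperModel loaded in bfloat16; the actual test
    # would pass torch_dtype=torch.bfloat16 to from_pretrained.
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).to(torch.bfloat16)
    model.eval()

    # Image processors produce float32 pixel values by default.
    pixel_values = torch.randn(1, 3, 16, 16)

    with torch.no_grad():
        # The fix casts inputs to the model dtype before the forward pass,
        # so a float32 input no longer crashes a bf16 model.
        out = model(pixel_values.to(torch.bfloat16))

    assert out.dtype == torch.bfloat16

test_bfloat16_forward()
```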

Repro output after the fixes (feel free to cross-check):


Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you fix any necessary existing tests?

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: switch_transformers, timm_wrapper

@harshaljanjani harshaljanjani marked this pull request as ready for review March 27, 2026 20:19


Development

Successfully merging this pull request may close these issues.

[BUG][CI] SwitchTransformers and TimmWrapperModel dtype mismatches in bfloat16 inference

1 participant