
[megatron] support gemma4 megatron #9296

Open
Jintao-Huang wants to merge 6 commits into modelscope:main from Jintao-Huang:support_gemma4_megatron

Conversation

@Jintao-Huang (Collaborator)

No description provided.

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request updates the documentation to reflect support for Gemma 4 models. It also refactors the embedding handling in the Megatron utilities so that device conversion supports multiple modules. For swift/model/models/gemma.py, the review suggests moving the pad_embedding tensor to inputs_embeds.device rather than multimodal_mask.device. This is more robust and keeps all operands of the subsequent torch.where call on the same device.

Comment thread on swift/model/models/gemma.py (outdated)

if self.config.get_text_config().hidden_size_per_layer_input:
    pad_embedding = self.language_model.embed_tokens.weight[self.config.text_config.pad_token_id, :]
    pad_embedding = pad_embedding.to(multimodal_mask.device)
Contributor


Priority: medium

Using inputs_embeds.device as the target for the .to() call is generally more robust than multimodal_mask.device. Since inputs_embeds is the primary tensor representing the hidden states in this operation, it serves as the most reliable reference for the execution device, ensuring consistency across all operands in the subsequent torch.where call.

Suggested change
-    pad_embedding = pad_embedding.to(multimodal_mask.device)
+    pad_embedding = pad_embedding.to(inputs_embeds.device)
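The device-placement point in this suggestion can be illustrated with a small, CPU-only sketch. The thread does not show the actual torch.where call in gemma.py, so the helper below (replace_multimodal_with_pad) is hypothetical. It assumes the call writes the pad-token embedding at positions flagged by multimodal_mask and keeps inputs_embeds everywhere else. The dtype cast is an extra assumption and is not part of the suggested change.

```python
import torch


def replace_multimodal_with_pad(
    inputs_embeds: torch.Tensor,    # [batch, seq, hidden] hidden states
    multimodal_mask: torch.Tensor,  # [batch, seq] bool, True at multimodal positions
    embed_weight: torch.Tensor,     # [vocab, hidden] token-embedding table
    pad_token_id: int,
) -> torch.Tensor:
    """Hypothetical helper: place the pad-token embedding at multimodal positions."""
    pad_embedding = embed_weight[pad_token_id, :]
    # Anchor on inputs_embeds, the tensor carrying the hidden states, as the review
    # suggests. Matching the dtype as well is an assumption added in this sketch.
    pad_embedding = pad_embedding.to(device=inputs_embeds.device, dtype=inputs_embeds.dtype)
    mask = multimodal_mask.to(inputs_embeds.device).unsqueeze(-1)  # [batch, seq, 1]
    # torch.where broadcasts mask [B, S, 1], pad_embedding [H] and inputs_embeds [B, S, H].
    return torch.where(mask, pad_embedding, inputs_embeds)


# Small usage example with made-up sizes (vocab=4, hidden=3).
embed_weight = torch.arange(12, dtype=torch.float32).reshape(4, 3)
inputs_embeds = torch.ones(1, 3, 3)
multimodal_mask = torch.tensor([[False, True, False]])
out = replace_multimodal_with_pad(inputs_embeds, multimodal_mask, embed_weight, pad_token_id=0)
print(out[0, 1])  # tensor([0., 1., 2.]), i.e. row 0 of embed_weight
```

Using inputs_embeds as the reference means the result stays on the same device as the hidden states, even if the mask or the embedding table was created elsewhere.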


Labels

None yet

Projects

None yet


2 participants