Fix resized LM head weights being overwritten by post_init #45079

Merged
ArthurZucker merged 1 commit into huggingface:main from javierdejesusda:fix/resize-embeddings-hf-initialized
Apr 2, 2026
Conversation

@javierdejesusda
Contributor

What does this PR do?

Fixes #35141

When tie_word_embeddings=False, calling resize_token_embeddings() then post_init() overwrites
the LM head weights with random values. This happens because _get_resized_lm_head() returns a new
nn.Linear without setting _is_hf_initialized, so post_init treats it as uninitialized.

The fix adds new_lm_head._is_hf_initialized = True at the end of _get_resized_lm_head(), after
all weight copying is done. _get_resized_embeddings() doesn't need this because it reuses the
original module object.
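
The mechanism can be illustrated with a toy stand-in (not the actual transformers internals): a `post_init`-style pass reinitializes every module unless it carries the `_is_hf_initialized` flag, so a flagged module keeps its weights while an unflagged one is wiped.

```python
import torch
import torch.nn as nn

def toy_post_init(model: nn.Module) -> None:
    # Hypothetical stand-in for PreTrainedModel.post_init: reinitialize
    # every Linear unless it is flagged as already initialized.
    for m in model.modules():
        if isinstance(m, nn.Linear) and not getattr(m, "_is_hf_initialized", False):
            nn.init.zeros_(m.weight)  # zeros make an overwrite easy to spot

flagged = nn.Linear(4, 6, bias=False)
flagged._is_hf_initialized = True
unflagged = nn.Linear(4, 6, bias=False)

toy_post_init(flagged)    # skipped: weights untouched
toy_post_init(unflagged)  # not flagged: weights zeroed
```

Before the fix, the freshly built LM head behaved like `unflagged` above.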

Test added to ModelTesterMixin in test_modeling_common.py, following the existing
test_resize_embeddings_untied pattern. Verified on GPT2, LLaMA, and OPT.
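
The invariant the test checks can be sketched with toy helpers (the real test exercises actual models; all names here are hypothetical): resize the head with the flag set, run a `post_init`-style pass, and assert the copied rows survive.

```python
import torch
import torch.nn as nn

def toy_post_init(module: nn.Module) -> None:
    # Hypothetical stand-in for post_init's weight-init pass.
    for m in module.modules():
        if isinstance(m, nn.Linear) and not getattr(m, "_is_hf_initialized", False):
            nn.init.zeros_(m.weight)

def resize_lm_head(old: nn.Linear, new_num_tokens: int) -> nn.Linear:
    # Toy version of _get_resized_lm_head with the fix applied.
    new = nn.Linear(old.in_features, new_num_tokens, bias=False)
    num_to_copy = min(old.out_features, new_num_tokens)
    with torch.no_grad():
        new.weight[:num_to_copy] = old.weight[:num_to_copy]
    new._is_hf_initialized = True  # the one-line fix
    return new

old_head = nn.Linear(4, 6, bias=False)
new_head = resize_lm_head(old_head, 8)
toy_post_init(new_head)
```

With the flag set, `new_head.weight[:6]` still equals `old_head.weight` after the pass; dropping the flag line reproduces the bug.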

Supersedes #36221 (stale since Feb 2025, reviewer feedback never addressed).

  • I confirm that this is not a pure code agent PR.
  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@Rocketknight1 @ArthurZucker

When `tie_word_embeddings=False`, `_get_resized_lm_head()` creates a new
`nn.Linear` without `_is_hf_initialized`, causing `post_init()` to
reinitialize its weights. Set the flag after weight copying is done.

Fixes huggingface#35141
@javierdejesusda javierdejesusda force-pushed the fix/resize-embeddings-hf-initialized branch from 7b10fd4 to e59da7d on March 28, 2026 01:23
Member

@Rocketknight1 Rocketknight1 left a comment


Yep, LGTM!

@Rocketknight1 Rocketknight1 enabled auto-merge March 30, 2026 14:23
@Rocketknight1 Rocketknight1 added this pull request to the merge queue Mar 30, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
Collaborator

@ArthurZucker ArthurZucker left a comment


Good catch! thanks 🤗

@ArthurZucker ArthurZucker added this pull request to the merge queue Mar 31, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Mar 31, 2026
@ArthurZucker ArthurZucker merged commit 4932e97 into huggingface:main Apr 2, 2026
29 checks passed
marvinzh pushed a commit to marvinzh/transformers that referenced this pull request Apr 3, 2026
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Apr 4, 2026
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
Development

Successfully merging this pull request may close these issues.

resizing token embeddings causes output embedding to be reinitialized in post_init when tie_word_embedding is False

4 participants