
Commit 2eaef42

Disable dropout to work around PyTorch 2.11 checkpoint recomputation bug

Disable dropout (resid_pdrop=0, attn_pdrop=0, embd_pdrop=0) in the run_training_ac function to avoid a SystemError raised when _VF.dropout returns NULL during backward recomputation of GPT2Block. Dropout is irrelevant to the memory-profiling purpose of this tutorial. Issue: #3774
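For context, here is a minimal sketch of the code path involved (an assumed setup for illustration, not part of this commit): with gradient checkpointing enabled and the model in training mode, backward() re-runs each GPT2Block forward, dropout included, which is where the failing _VF.dropout call occurs.

    # Sketch of the recomputation path (assumed setup, not from this commit).
    # With gradient checkpointing on and dropout active, backward() re-runs
    # each checkpointed GPT2Block forward, re-invoking dropout under the hood.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.gradient_checkpointing_enable()
    model.train()  # dropout stays active, so it executes again on recomputation

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    batch = tokenizer("memory profiling example", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()  # recomputes checkpointed blocks; the bug in #3774 surfaces here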
1 parent f92d01b · commit 2eaef42

1 file changed: beginner_source/mosaic_memory_profiling_tutorial.py (7 additions, 1 deletion)
@@ -306,7 +306,13 @@ def run_training_ac(
 
     # Load model
     print(f"Loading GPT-2 (activation_checkpointing={activation_checkpointing})...")
-    model = GPT2LMHeadModel.from_pretrained("gpt2")
+    # Disable dropout to avoid PyTorch 2.11 checkpoint recomputation bug (#3774).
+    # _VF.dropout returns NULL without setting an exception during backward
+    # recomputation of GPT2Block. Dropout is irrelevant to memory profiling.
+    # Original: model = GPT2LMHeadModel.from_pretrained("gpt2")
+    model = GPT2LMHeadModel.from_pretrained(
+        "gpt2", resid_pdrop=0, attn_pdrop=0, embd_pdrop=0
+    )
 
     if activation_checkpointing:
         model.gradient_checkpointing_enable()
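As a quick sanity check (a sketch under the same assumptions, not part of the commit), the config overrides passed to from_pretrained() should leave every nn.Dropout module in the loaded model with p == 0:

    # Sketch: confirm the zeroed config overrides actually propagate to every
    # nn.Dropout module inside the loaded GPT-2 model.
    import torch
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained(
        "gpt2", resid_pdrop=0, attn_pdrop=0, embd_pdrop=0
    )
    dropouts = [m for m in model.modules() if isinstance(m, torch.nn.Dropout)]
    assert dropouts and all(m.p == 0 for m in dropouts)
    print(f"All {len(dropouts)} dropout modules have p=0")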
