Remove AutoEP backward loss multiplier #23

Merged
tohtana merged 1 commit into tohtana/add_autoep from tohtana/remove-autoep-loss-multiplier
May 14, 2026

@tohtana commented May 14, 2026

Summary

  • Removes the AutoEP-only ep_size loss multiplier from DeepSpeedEngine.scale() and DeepSpeedEngine.backward().
  • Removes the now-unused AutoEP backward scaling state/helper.
  • Updates the AutoEP ZeRO-2 gradient parity test so all-gathered expert gradients are no longer divided by ep_size.
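
To illustrate why the test change follows from the engine change, here is a minimal sketch on a toy scalar model. It is not DeepSpeed code: `grad_of_scaled_loss`, the toy loss, and the value of `ep_size` are all hypothetical, and it assumes the removed multiplier simply scaled the loss by `ep_size` before backward. Under that assumption, gradients scale linearly with the loss, so the old parity check needed a division by `ep_size` and the new one does not.

```python
# Illustrative toy example only; none of these names come from DeepSpeed.

def grad_of_scaled_loss(w, x, y, loss_multiplier=1.0):
    """Analytic d/dw of loss_multiplier * 0.5 * (w * x - y) ** 2."""
    return loss_multiplier * (w * x - y) * x


ep_size = 4  # hypothetical expert-parallel group size
w, x, y = 1.5, 2.0, 1.0

# Reference gradient from an unscaled loss.
reference = grad_of_scaled_loss(w, x, y)

# Assumed old behaviour: loss multiplied by ep_size before backward.
with_multiplier = grad_of_scaled_loss(w, x, y, loss_multiplier=ep_size)

# Behaviour after removing the multiplier.
without_multiplier = grad_of_scaled_loss(w, x, y)

# With the multiplier, parity requires dividing by ep_size first.
assert abs(with_multiplier / ep_size - reference) < 1e-12

# Without the multiplier, gradients can be compared directly.
assert abs(without_multiplier - reference) < 1e-12
```

The actual parity test compares all-gathered expert gradients across ranks; the sketch only shows the linear scaling argument behind dropping the division.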

Context

Addresses a Codex bot review comment: deepspeedai#7938 (comment)

This PR is intended to be merged into tohtana/add_autoep, the branch behind upstream PR deepspeedai#7938.

Testing

Environment: /mnt/local_storage/autoep_transformers_matrix_20260513_fixed/venv-5.8.1

Transformers version:

5.8.1

Commands run:

/mnt/local_storage/autoep_transformers_matrix_20260513_fixed/venv-5.8.1/bin/python -m compileall -q deepspeed/runtime/engine.py tests/unit/moe/test_autoep_grad_parity.py
flock /tmp/ds_gpu_test.lock -c 'PYTEST_XDIST_WORKER=gw1 DS_DISABLE_REUSE_DIST_ENV=1 CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/local_storage/autoep_transformers_matrix_20260513_fixed/venv-5.8.1/bin/python -m pytest -q tests/unit/moe/test_autoep_grad_parity.py'
flock /tmp/ds_gpu_test.lock -c 'PYTEST_XDIST_WORKER=gw1 DS_DISABLE_REUSE_DIST_ENV=1 CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/local_storage/autoep_transformers_matrix_20260513_fixed/venv-5.8.1/bin/python -m pytest -q tests/unit/moe/test_autoep_unit.py'
git diff --check

Results:

  • test_autoep_grad_parity.py: 1 passed
  • test_autoep_unit.py: 32 passed
  • git diff --check: passed

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

tohtana merged commit 045b061 into tohtana/add_autoep on May 14, 2026