fix: Fix AutoEP DeepSeekV3 preset for split ModuleList experts by nathon-lee · Pull Request #7 · tohtana/DeepSpeed

nathon-lee · 2026-04-17T06:39:47Z

Hi @tohtana, could you please take a look at this PR when you have a chance? I updated the DeepSeekV3 AutoEP preset to match the actual HuggingFace expert layout and added regression coverage for the detection and repacking paths.

I found the mismatch by instantiating the HuggingFace DeepSeekV3 model skeleton from config and observing that its experts are ModuleList entries with split gate_proj, up_proj, and down_proj layers rather than fused gate_up_proj weights.

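# Override is_torch_fx_available to return False before the remote model code is loaded.
import transformers.utils.import_utils
transformers.utils.import_utils.is_torch_fx_available = lambda: False

from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

# Build only the module skeleton; init_empty_weights avoids allocating real weights.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Printing the model shows the expert ModuleList and its split projection layers.
print(model)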

Title:

Fix DeepSeekV3 AutoEP preset mismatch

Body:

DeepSeekV3 experts in HuggingFace are stored as ModuleList entries with split gate_proj, up_proj, and down_proj layers, but the previous AutoEP preset assumed fused gate_up_proj expert weights. This mismatch could cause AutoEP to fail MoE layer detection.
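
To make the mismatch concrete, here is a minimal sketch that tells the two layouts apart from parameter names alone. It is not the AutoEP detection code: `classify_expert_layout`, both regular expressions, and the example parameter names are hypothetical and only illustrate why a preset looking for a fused `gate_up_proj` finds nothing on a split ModuleList layout.

```python
import re

# Hypothetical patterns for illustration; the real preset fields may differ.
SPLIT = re.compile(r"\.experts\.\d+\.(gate_proj|up_proj|down_proj)\.weight$")
FUSED = re.compile(r"\.experts\.gate_up_proj$")


def classify_expert_layout(param_names):
    """Return 'fused', 'split', or 'none' based on expert parameter names."""
    split_projections = set()
    for name in param_names:
        if FUSED.search(name):
            return "fused"
        match = SPLIT.search(name)
        if match:
            split_projections.add(match.group(1))
    # A split layout needs all three projections on the indexed experts.
    if split_projections == {"gate_proj", "up_proj", "down_proj"}:
        return "split"
    return "none"


# Names shaped like the ModuleList layout described above (assumed, illustrative).
split_names = [
    f"model.layers.3.mlp.experts.{expert}.{proj}.weight"
    for expert in range(2)
    for proj in ("gate_proj", "up_proj", "down_proj")
]
print(classify_expert_layout(split_names))
```

A preset that only searches for the fused name would fall through to `'none'` on these names, which matches the detection failure described above.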

This PR updates the deepseek_v3 preset to match the real model structure and adds regression tests for:

DeepSeekV3-style MoE detection
ModuleList expert repacking
the old fused expert assumption failing on the same layout
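
As a rough illustration of what repacking ModuleList experts involves, the sketch below fuses each expert's gate and up weights and collects the results across experts. It uses plain nested lists instead of tensors so that it runs without torch. The function name `repack_split_experts`, the gate-before-up ordering, and the `[out_features, in_features]` weight shape are assumptions for this example, not the layout AutoEP actually produces.

```python
def repack_split_experts(experts):
    """Fuse gate/up weights per expert and collect them across experts.

    `experts` is a list of dicts with 'gate_proj', 'up_proj', and 'down_proj',
    each a matrix stored as a list of rows with shape [out_features, in_features].
    """
    # Concatenate along the output dimension (rows): gate rows first, then up rows.
    gate_up = [expert["gate_proj"] + expert["up_proj"] for expert in experts]
    down = [expert["down_proj"] for expert in experts]
    return gate_up, down


def make_matrix(rows, cols, value):
    """Build a rows x cols matrix filled with a constant, for demonstration."""
    return [[value] * cols for _ in range(rows)]


hidden, inter = 2, 3  # toy sizes, not DeepSeekV3's real dimensions
experts = [
    {
        "gate_proj": make_matrix(inter, hidden, 1.0 + e),
        "up_proj": make_matrix(inter, hidden, 10.0 + e),
        "down_proj": make_matrix(hidden, inter, 100.0 + e),
    }
    for e in range(2)
]

gate_up, down = repack_split_experts(experts)
# Shape is [num_experts, 2 * inter, hidden].
print(len(gate_up), len(gate_up[0]), len(gate_up[0][0]))  # 2 6 2
```

With real weights the same idea would be a concatenation followed by a stack over the expert dimension.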

Test Result

root@b949db7a21db:/workspace/DeepSpeed_woo# python3.13 -m pytest tests/unit/moe/test_autoep_unit.py -k "test_autodetect_deepseek_v3_module_list_experts or test_repack_module_list_separate_gate_and_up"
=================================================================== test session starts ===================================================================
platform linux -- Python 3.13.5, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3.13
cachedir: .pytest_cache
rootdir: /workspace/DeepSpeed_woo/tests
configfile: pytest.ini
plugins: anyio-4.13.0
collected 67 items / 65 deselected / 2 selected

tests/unit/moe/test_autoep_unit.py::TestMoEDetection::test_autodetect_deepseek_v3_module_list_experts PASSED                                        [ 50%]
tests/unit/moe/test_autoep_unit.py::TestWeightRepacking::test_repack_module_list_separate_gate_and_up PASSED                                        [100%]

==================================================================== warnings summary =====================================================================
../../usr/local/lib/python3.13/dist-packages/torch/jit/_script.py:365: 14 warnings
  /usr/local/lib/python3.13/dist-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

unit/moe/test_autoep_unit.py::TestMoEDetection::test_autodetect_deepseek_v3_module_list_experts
  /workspace/DeepSpeed_woo/tests/conftest.py:47: UserWarning: Running test without verifying torch version, please provide an expected torch version with --torch_ver
    warnings.warn(

unit/moe/test_autoep_unit.py::TestMoEDetection::test_autodetect_deepseek_v3_module_list_experts
  /workspace/DeepSpeed_woo/tests/conftest.py:54: UserWarning: Running test without verifying cuda version, please provide an expected cuda version with --cuda_ver
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================== slowest durations ====================================================================

(6 durations < 1s hidden.)
====================================================== 2 passed, 65 deselected, 16 warnings in 1.67s ======================================================
root@b949db7a21db:/workspace/DeepSpeed_woo#

root@b949db7a21db:/workspace/DeepSpeed_woo# python3.13 -m pytest tests/unit/moe/test_autoep_unit.py --collect-only -k "test_deepseek_v3_old_fused_preset_assumption_fails or test_deepseek_v3_split_expert_names_detect_successfully"
=================================================================== test session starts ===================================================================
platform linux -- Python 3.13.5, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3.13
cachedir: .pytest_cache
rootdir: /workspace/DeepSpeed_woo/tests
configfile: pytest.ini
plugins: anyio-4.13.0
collected 67 items / 65 deselected / 2 selected

<Dir tests>
  <Package unit>
    <Dir moe>
      <Module test_autoep_unit.py>
        Unit tests for AutoEP feature (all phases append test classes here).
        <Class TestMoEDetection>
          Phase 3 tests for MoE layer detection.
          <Function test_deepseek_v3_old_fused_preset_assumption_fails>
            A fused gate_up_proj assumption cannot detect DeepSeekV3 ModuleList experts.
          <Function test_deepseek_v3_split_expert_names_detect_successfully>
            Using split gate/up names detects the same DeepSeekV3 ModuleList experts.

==================================================================== warnings summary =====================================================================
../../usr/local/lib/python3.13/dist-packages/torch/jit/_script.py:365: 14 warnings
  /usr/local/lib/python3.13/dist-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================== 2/67 tests collected (65 deselected) in 1.84s ======================================================
root@b949db7a21db:/workspace/DeepSpeed_woo#

Signed-off-by: nathon <leejianwoo@gmail.com>

tohtana · 2026-05-14T01:51:09Z

Hi @nathon-lee,
Thank you for catching this issue! Sorry I missed this PR for so long. Because I had made a lot of changes, I opened and merged this fix as a separate PR.
I think the main AutoEP PR is almost ready.

fix: Fix AutoEP DeepSeekV3 preset for split ModuleList experts

c95c856

Signed-off-by: nathon <leejianwoo@gmail.com>

nathon-lee requested a review from tohtana as a code owner April 17, 2026 06:39

nathon-lee wants to merge 1 commit into tohtana:tohtana/add_autoep from nathon-lee:nathon_autoep_feature1
2 participants