
Commit 7029d07

Expand model support (#33)
* extend model support
* test
* update doc
* add filter control
1 parent c1f4bca commit 7029d07

16 files changed

Lines changed: 2545 additions & 35 deletions

README.md

Lines changed: 49 additions & 13 deletions
@@ -33,24 +33,46 @@ from defuser import convert_model, replace_fused_blocks
 ```

 - `replace_fused_blocks(model_type)` patches supported HF model classes before `from_pretrained()` or direct model construction.
-- `convert_model(model, cleanup_original=True, max_layers=None)` converts an already loaded model in place. This is the runtime defusion path used for `qwen3_5_moe` style checkpoints.
+- `convert_model(model, cleanup_original=True, max_layers=None, filter=None)` converts an already loaded model in place. This is the runtime defusion path for supported post-load expert and MLP conversions, including `qwen3_5_moe` style checkpoints.
 - Defuser is designed and CI-tested for `transformers>=5.3.0`, and support is only offered for that version range. Older versions log a warning on these public APIs and are skipped as unsupported.

+`filter` is an optional list of PCRE regex rules evaluated against full module paths such as `model.layers.0.mlp.experts`:
+
+- `+:regex` explicitly includes matching candidate module paths
+- `-:regex` explicitly excludes matching candidate module paths
+- `regex` is shorthand for `+:regex`
+- negative rules take priority over positive rules
+- when `filter` is provided, a candidate module is defused only if it matches at least one positive rule and no negative rules
+
 ## Supported Models

-| Model type | Recommended entrypoint | Defused op performed |
+Defuser currently supports the following `transformers==5.3.0` `model_type` values.
+
+### `replace_fused_blocks(model_type)` before load
+
+| Model type | Defused op performed |
+| --- | --- |
+| `glm4_moe` | Replaces `Glm4MoeMoE` with a defused per-expert linear MoE block. |
+| `glm4v` | Replaces the fused text MLP with split `gate_proj`, `up_proj`, and `down_proj` layers. Also splits fused checkpoint `mlp.gate_up_proj.weight` into `mlp.gate_proj.weight` + `mlp.up_proj.weight`. |
+| `mixtral` | Replaces `MixtralSparseMoeBlock` with `LinearMixtralSparseMoeBlock`. Also remaps legacy Mixtral checkpoint keys and splits fused expert `gate_up_proj` tensors into per-expert `gate_proj` and `up_proj`, plus per-expert `down_proj`. |
+| `qwen2_moe` | Replaces `Qwen2MoeSparseMoeBlock` with a defused per-expert linear MoE block. |
+| `qwen3_moe` | Replaces `Qwen3MoeSparseMoeBlock` with a defused per-expert linear MoE block. |
+| `qwen3_next` | Replaces `Qwen3NextSparseMoeBlock` with a defused per-expert linear MoE block. |
+| `qwen3_omni_moe` | Replaces both thinker and talker text sparse MoE blocks with defused per-expert linear blocks and applies small runtime compatibility patches for text `forward()` and `generate()`. |
+
+### `convert_model(model)` after load
+
+| Pattern | Supported model types | Defused op performed |
 | --- | --- | --- |
-| `mixtral` | `replace_fused_blocks("mixtral")` before load | Replaces `MixtralSparseMoeBlock` with `LinearMixtralSparseMoeBlock`. Also remaps legacy Mixtral checkpoint keys and splits fused expert `gate_up_proj` tensors into per-expert `gate_proj` and `up_proj`, plus per-expert `down_proj`. |
-| `qwen2_moe` | `replace_fused_blocks("qwen2_moe")` before load | Replaces `Qwen2MoeSparseMoeBlock` with a defused per-expert linear MoE block. |
-| `qwen3_moe` | `replace_fused_blocks("qwen3_moe")` before load | Replaces `Qwen3MoeSparseMoeBlock` with a defused per-expert linear MoE block. |
-| `qwen3_5_moe` | `convert_model(model)` after load | Runtime expert tensor defusion. Splits fused `gate_up_proj` into `gate_proj` + `up_proj` and converts 3D expert tensors into numbered expert `nn.Linear` modules. |
-| `qwen3_5_moe_text` | `convert_model(model)` after load | Same runtime expert tensor defusion path as `qwen3_5_moe`, applied to the text-only backbone. |
-| `qwen3_next` | `replace_fused_blocks("qwen3_next")` before load | Replaces `Qwen3NextSparseMoeBlock` with a defused per-expert linear MoE block. |
-| `qwen3_omni_moe` | `replace_fused_blocks("qwen3_omni_moe")` before load | Replaces the thinker text sparse MoE block with a defused per-expert linear block and applies small runtime compatibility patches for text `forward()` and `generate()`. |
-| `glm4_moe` | `replace_fused_blocks("glm4_moe")` before load | Replaces `Glm4MoeMoE` with a defused per-expert linear MoE block. |
-| `glm4v` | `replace_fused_blocks("glm4v")` before load | Replaces the fused text MLP with split `gate_proj`, `up_proj`, and `down_proj` layers. Also splits fused checkpoint `mlp.gate_up_proj.weight` into `mlp.gate_proj.weight` + `mlp.up_proj.weight`. |
-| `gpt_oss` | `convert_model(model)` after load | Runtime expert tensor defusion. Splits fused transposed expert `gate_up_proj` into per-expert `gate_proj` + `up_proj`, carries over expert biases, and converts fused expert tensors into numbered expert `nn.Linear` modules. |
-| `llama4` | `convert_model(model)` after load | Runtime expert tensor defusion. Splits fused transposed expert `gate_up_proj` into per-expert `gate_proj` + `up_proj`, converts fused expert tensors into numbered expert `nn.Linear` modules, and preserves the llama4 batched expert-input execution contract. |
+| Standard routed expert tensors | `deepseek_v2`, `dots1`, `ernie4_5_moe`, `ernie4_5_vl_moe`, `exaone_moe`, `flex_olmo`, `glm4_moe_lite`, `glm4v_moe`, `hunyuan_v1_moe`, `jamba`, `lfm2_moe`, `minimax`, `minimax_m2`, `olmoe`, `qwen3_vl_moe`, `solar_open` | Splits fused expert tensors into numbered expert `nn.Linear` modules with per-expert `gate_proj`, `up_proj`, and `down_proj`. |
+| Mixed sparse and shared experts | `deepseek_v3`, `glm_moe_dsa`, `qwen3_5_moe`, `qwen3_5_moe_text` | Runtime expert tensor defusion for routed experts while preserving the model's shared-expert path. |
+| Transposed or packed expert tensors | `gpt_oss`, `phimoe` | Splits transposed fused expert `gate_up_proj` tensors into per-expert `gate_proj` + `up_proj`, preserves expert bias when present, and converts expert tensors into numbered expert `nn.Linear` modules. |
+| Flattened expert layout | `dbrx` | Rebuilds the flattened DBRX expert FFN weights into numbered expert `gate_proj`, `up_proj`, and `down_proj` `nn.Linear` modules. |
+| Batched expert-input execution | `llama4` | Runtime expert tensor defusion plus preservation of the llama4 batched expert-input execution contract. |
+| Non-gated expert MLPs | `nemotron_h` | Converts routed expert tensors into numbered `up_proj` and `down_proj` `nn.Linear` modules for non-gated experts. |
+| Parallel expert blocks | `granitemoe`, `granitemoehybrid`, `granitemoeshared`, `jetmoe` | Converts packed expert weight tensors into numbered expert `linear` modules while keeping grouped expert execution intact. |
+| Routed experts with identity experts | `longcat_flash` | Defuses routed experts into numbered `gate_proj`, `up_proj`, and `down_proj` modules and preserves zero or identity experts. |
+| Fused dense `gate_up_proj` MLPs | `dia`, `glm`, `glm4`, `glm_image`, `glm_ocr`, `phi3`, `phi4_multimodal`, `zamba2` | Splits fused dense `gate_up_proj` layers into `gate_proj` + `up_proj` and updates the block `forward()` to preserve the original MLP math. |

 ## Workflow Summary

@@ -77,6 +99,20 @@ converted = convert_model(model)
 print(converted) # True when runtime defusion happened
 ```

+Use `filter` when only specific blocks should be defused:
+
+```python
+from defuser import convert_model
+
+convert_model(
+    model,
+    filter=[
+        r"+:^model\.layers\.0\.mlp\.experts$",
+        r"-:^model\.layers\.0\.mlp\.experts\.shared_",
+    ],
+)
+```
+
 ## Real Qwen3.5 MoE Example

 The example below is written for the `transformers==5.3.0` public API surface and uses the real Hugging Face model `Qwen/Qwen3.5-35B-A3B-Instruct`. Defuser supports `transformers>=5.3.0`.
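The `+:`/`-:` filter semantics documented above map directly onto a small rule evaluator. The sketch below is illustrative only — it is not Defuser's implementation, and it uses Python's `re` module as a stand-in for the PCRE engine the README mentions — but it follows the documented rules: bare patterns count as includes, and negative rules win over positive ones.

```python
import re


def path_passes_filter(path: str, rules: list[str] | None) -> bool:
    """Illustrative +:/-: rule evaluation (not Defuser's actual implementation)."""
    if not rules:
        # No filter given: every candidate module stays eligible.
        return True
    positives, negatives = [], []
    for rule in rules:
        if rule.startswith("-:"):
            negatives.append(rule[2:])
        elif rule.startswith("+:"):
            positives.append(rule[2:])
        else:
            # A bare regex is shorthand for an include rule.
            positives.append(rule)
    # Negative rules take priority over positive rules.
    if any(re.search(pattern, path) for pattern in negatives):
        return False
    return any(re.search(pattern, path) for pattern in positives)


# Illustrative rules, not taken from the repo:
rules = [
    r"^model\.layers\.\d+\.mlp\.experts$",  # bare pattern == include rule
    r"-:^model\.layers\.1\.",               # exclude anything under layer 1
]
print(path_passes_filter("model.layers.0.mlp.experts", rules))  # True
print(path_passes_filter("model.layers.1.mlp.experts", rules))  # False: negative rule wins
print(path_passes_filter("model.layers.0.self_attn", rules))    # False: no positive match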

defuser/defuser.py

Lines changed: 3 additions & 1 deletion
@@ -117,6 +117,7 @@ def convert_model(
     model: nn.Module,
     cleanup_original: bool = False,
     max_layers: int | None = None,
+    filter: list[str] | None = None,
 ) -> bool:
     """Convert one loaded model in place from fused experts to defused modules."""
     if warn_if_public_api_transformers_unsupported("convert_model()", logger):
@@ -200,7 +201,7 @@ def convert_model(
     if not check_model_compatibility(model):
         return False

-    apply_model_patches(model)
+    apply_model_patches(model, max_layers=max_layers, filter_rules=filter)

     # If fused blocks have already been structurally replaced at model load time,
     # there is no need to perform runtime defusing again
@@ -214,6 +215,7 @@ def convert_model(
         model,
         cleanup_original=cleanup_original,
        max_layers=max_layers,
+        filter_rules=filter,
     )

     return True
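For context, a hypothetical call site for the extended `convert_model()` signature above. The checkpoint id is a placeholder, the `max_layers` comment states an assumption rather than documented behaviour, and the filter rule follows the README's rule syntax.

```python
# Hypothetical usage sketch of the extended convert_model() signature.
# "org/some-moe-checkpoint" is a placeholder, not a real model id.
from transformers import AutoModelForCausalLM

from defuser import convert_model

model = AutoModelForCausalLM.from_pretrained("org/some-moe-checkpoint")

converted = convert_model(
    model,
    cleanup_original=True,            # drop the fused originals after defusion
    max_layers=2,                     # assumed to cap how many layers get converted
    filter=[r"+:\.mlp\.experts$"],    # only defuse routed expert blocks
)
print(converted)  # True when runtime defusion happened
```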

defuser/model_registry.py

Lines changed: 103 additions & 0 deletions
@@ -16,6 +16,39 @@ class PATCH(str, Enum):


 MODEL_CONFIG = {
+    "dbrx": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "deepseek_v2": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "deepseek_v3": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "dia": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "dots1": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "ernie4_5_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "ernie4_5_vl_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "exaone_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "flex_olmo": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "glm": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "glm4": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
     "mixtral": {
         "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
         PATCH.REPLACE_MODULE: [
@@ -84,6 +117,10 @@ class PATCH(str, Enum):
             (
                 "transformers.models.qwen3_omni_moe.modeling_qwen3_omni_moe.Qwen3OmniMoeThinkerTextSparseMoeBlock",
                 "defuser.modeling.unfused_moe.qwen3_omni_moe.LinearQwen3OmniMoeThinkerTextSparseMoeBlock",
+            ),
+            (
+                "transformers.models.qwen3_omni_moe.modeling_qwen3_omni_moe.Qwen3OmniMoeTalkerTextSparseMoeBlock",
+                "defuser.modeling.unfused_moe.qwen3_omni_moe.LinearQwen3OmniMoeTalkerTextSparseMoeBlock",
             )
         ],
     },
@@ -96,6 +133,9 @@ class PATCH(str, Enum):
             )
         ],
     },
+    "glm4_moe_lite": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
     "glm4v": {
         "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
         PATCH.REPLACE_MODULE: [
@@ -116,9 +156,39 @@ class PATCH(str, Enum):
             ),
         ],
     },
+    "glm4v_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "glm_image": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "glm_moe_dsa": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "glm_ocr": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
     "gpt_oss": {
         "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
     },
+    "granitemoe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "granitemoehybrid": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "granitemoeshared": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "hunyuan_v1_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "jamba": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "jetmoe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
     "llama4": {
         "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
         PATCH.EXPERTS_DEFUSE: [
@@ -128,7 +198,40 @@ class PATCH(str, Enum):
             }
         ],
     },
+    "lfm2_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "longcat_flash": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "minimax": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "minimax_m2": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "nemotron_h": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "olmoe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "phi3": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "phi4_multimodal": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
     "phimoe": {
         "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
     },
+    "qwen3_vl_moe": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "solar_open": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
+    "zamba2": {
+        "min_transformers_version": MIN_SUPPORTED_TRANSFORMERS_VERSION,
+    },
 }
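Most of the new registry entries carry only a `min_transformers_version`, while entries such as `mixtral` and `qwen3_omni_moe` also list `PATCH.REPLACE_MODULE` pairs of dotted class paths (fused class, defused replacement). As a rough sketch of the mechanism those pairs imply — not the repo's actual `replace_fused_blocks()` code, which this commit does not show — each pair can be resolved by import path and the fused class rebound to its defused counterpart before the model is constructed:

```python
# Illustrative sketch only: resolve (fused_path, defused_path) pairs like the
# ones registered under PATCH.REPLACE_MODULE and rebind the fused class on its
# defining module, so a later from_pretrained() builds the defused block instead.
import importlib


def _resolve(dotted_path: str):
    """Split 'pkg.module.ClassName' into (imported module, 'ClassName')."""
    module_path, _, attr = dotted_path.rpartition(".")
    return importlib.import_module(module_path), attr


def swap_fused_classes(pairs: list[tuple[str, str]]) -> None:
    for fused_path, defused_path in pairs:
        fused_module, fused_name = _resolve(fused_path)
        defused_module, defused_name = _resolve(defused_path)
        setattr(fused_module, fused_name, getattr(defused_module, defused_name))
```

Called with, say, the `mixtral` pairs before `from_pretrained()`, this would have the structural effect the README's before-load table describes; the real implementation presumably also handles checkpoint key remapping and the version checks mentioned above.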
