fix(moe): route SwiGLU MXFP4 unshuffled weights to CK-Tile instead of CK2stages #3518
Open
srinivamd wants to merge 4 commits into
… CK2stages

The old CK2stages codegen (gen_instances.py) only supports silu/gelu activations. Passing swiglu causes it to never generate gemm_moe_ck2stages_lookup.h, crashing with a hipcc fatal error.

When serving gpt-oss-20b-w-mxfp4-a-bf16 (SwiGLU + MXFP4) with unshuffled HuggingFace weights (is_shuffled=False), the dispatch in fused_moe.py falls through three guards:

1. FlyDSL guard requires is_shuffled=True - fails
2. CK-Tile heuristic excludes Swiglu+fp4x2 activations - skipped
3. CK2stages fallthrough accepts fp4x2 weights - matches then crashes

Insert a catch-all guard before the CK2stages fallthrough that routes SwiGLU + MXFP4 combinations to CK-Tile, which already supports swiglu (act_dict["swiglu"] = 2 in CK Tile gen_instances.py).

Fixes: ROCM-25478
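A minimal sketch of why an unsupported activation leaves no lookup header behind, assuming (as the PR description states) that the old generator restricts `--activation` through argparse `choices`. The names `build_parser` and `try_generate` are hypothetical stand-ins for illustration and are not taken from AITER's `gen_instances.py`.

```python
import argparse


def build_parser(choices):
    """Hypothetical stand-in for a codegen CLI; not AITER's actual parser."""
    parser = argparse.ArgumentParser(prog="gen_instances_sketch")
    parser.add_argument("--activation", choices=choices, default="silu")
    return parser


def try_generate(choices, activation):
    """Return True if argument parsing succeeds, i.e. codegen could proceed.

    On an invalid choice argparse prints a usage error and raises
    SystemExit, so nothing after parsing (such as writing a header) runs.
    """
    try:
        build_parser(choices).parse_args(["--activation", activation])
    except SystemExit:
        return False
    return True


# Activation lists as quoted in the PR description.
OLD_CK_CHOICES = ["silu", "gelu"]
CK_TILE_CHOICES = ["silu", "gelu", "swiglu"]
```

Under these assumptions, `try_generate(OLD_CK_CHOICES, "swiglu")` fails at the parsing step, which matches the reported symptom: the generator exits before emitting `gemm_moe_ck2stages_lookup.h`, and the later hipcc compile cannot find it.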
The previous commit accidentally deleted the CK2stages if-block body (condition continuation, flydsl/cktile/ck2stages stage2 dispatch, and return) when inserting the new SwiGLU guard above it. This left a truncated `if` expression with `return` inside the condition — a syntax error caught by ruff CI. Restore the full CK2stages block from main so non-SwiGLU MXFP4 paths (and all other CK2stages-eligible configurations) continue to work.
Author
Fixed in the latest push: the prior commit accidentally truncated the CK2stages
Author
@ROCm/team_aiter please review and merge if it is ok
Summary
Fixes ROCM-25478: AITER JIT build crash (`fatal error: 'gemm_moe_ck2stages_lookup.h' file not found`) when serving `amd/gpt-oss-20b-w-mxfp4-a-bf16` with `VLLM_ROCM_USE_AITER=1` on MI355 (gfx950).

Problem
AITER has two CK MoE 2-stages codegen systems:
- `csrc/ck_gemm_moe_2stages_codegen/` (`choices=["silu", "gelu"]`)
- `csrc/ck_tile_gemm_moe_2stages/` (`choices=["silu", "gelu", "swiglu"]`)

When serving gpt-oss-20b (SwiGLU + MXFP4) with unshuffled HuggingFace weights (`is_shuffled=False`), the dispatch in `fused_moe.py` falls through three guards:
1. FlyDSL guard requires `is_shuffled=True` → fails for unshuffled weights
2. CK-Tile heuristic excludes `Swiglu + fp4x2` activations → skipped
3. CK2stages fallthrough accepts `fp4x2` weights → matches → invokes old CK codegen with `--activation swiglu` → argparse rejects → no `gemm_moe_ck2stages_lookup.h` generated → hipcc crash

PRs #2972, #3123, #3153 added FlyDSL SwiGLU MXFP4 support, but the FlyDSL guard requires `is_shuffled=True`, so the fix is unreachable for unshuffled gpt-oss weights.

Fix
Insert a new dispatch guard between the CK-Tile heuristic and the CK2stages fallthrough that catches SwiGLU + MXFP4 combinations and routes them to CK Tile (which already supports swiglu via `act_dict["swiglu"] = 2` in `csrc/ck_tile_gemm_moe_2stages/gen_instances.py`) instead of falling through to the old CK codegen (which structurally cannot handle swiglu).

The guard matches: `activation == Swiglu AND q_dtype_w == fp4x2 AND q_type == per_1x32 AND dtype in [bf16, fp16] AND no explicit kernelName1`.

Test plan
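The guard condition listed under Fix can be sketched as a standalone predicate, which may also be a convenient starting point for a unit test. This is a simplified model, not the code in `fused_moe.py`: `MoeConfig`, `swiglu_mxfp4_guard`, and `pick_backend` are hypothetical names, dtypes are plain strings rather than AITER's enums, the FlyDSL guard is reduced to the `is_shuffled` check, and the CK-Tile heuristic is omitted because the description says it skips this combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoeConfig:
    """Hypothetical container for the fields the guard inspects."""
    activation: str            # "silu", "gelu" or "swiglu"
    q_dtype_w: str             # weight quant dtype, e.g. "fp4x2"
    q_type: str                # quant granularity, e.g. "per_1x32"
    dtype: str                 # activation dtype, e.g. "bf16"
    is_shuffled: bool
    kernel_name1: Optional[str] = None


def swiglu_mxfp4_guard(cfg: MoeConfig) -> bool:
    """Mirror of the condition quoted in the PR description."""
    return (
        cfg.activation == "swiglu"
        and cfg.q_dtype_w == "fp4x2"
        and cfg.q_type == "per_1x32"
        and cfg.dtype in ("bf16", "fp16")
        and not cfg.kernel_name1  # no explicit kernelName1
    )


def pick_backend(cfg: MoeConfig, with_fix: bool) -> str:
    """Simplified dispatch order; real guards check more than this."""
    if cfg.is_shuffled:
        return "flydsl"        # assumption: shuffled weights take FlyDSL
    # CK-Tile heuristic omitted: it skips Swiglu + fp4x2 per the PR.
    if with_fix and swiglu_mxfp4_guard(cfg):
        return "ck_tile"       # new guard added by this PR
    if cfg.q_dtype_w == "fp4x2":
        return "ck2stages"     # old fallthrough
    return "other"


# The failing case from the report: SwiGLU + MXFP4, unshuffled weights.
GPT_OSS = MoeConfig("swiglu", "fp4x2", "per_1x32", "bf16", is_shuffled=False)
```

In this model, `pick_backend(GPT_OSS, with_fix=False)` reaches the CK2stages fallthrough, while `with_fix=True` routes the same configuration to CK Tile. A silu configuration with the same quantization still falls through to CK2stages, which is the behaviour the second commit restores.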
Related
is_shuffled=True)