ck_tile grouped gemm: more padding by matthiasdiener · Pull Request #574 · ROCm/TransformerEngine

matthiasdiener · 2026-05-05T00:15:07Z

Description

Always enabling padding causes a significant (~15%) slowdown, so enable it only when necessary.
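As a rough illustration of "only when necessary", the sketch below decides per dimension whether a grouped GEMM problem needs the padded path. It is not the PR's implementation (the real dispatch lives in the C++ ck_tile code). The names `TileShape`, `padding_flags`, and `needs_padding` are hypothetical. The M and N tile sizes (256 and 128) are taken from the test comments quoted later in this thread, and the K tile size of 64 is purely an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class TileShape:
    """Hypothetical tile sizes; M and N follow the test comments, K is assumed."""
    m: int = 256
    n: int = 128
    k: int = 64  # assumption: not stated anywhere in the PR discussion


def padding_flags(m_per_group: Sequence[int], n: int, k: int,
                  tile: TileShape = TileShape()) -> Dict[str, bool]:
    """Report, per dimension, whether any group is not a multiple of the tile."""
    return {
        # M varies per group, so a single unaligned group forces M padding.
        "M": any(m % tile.m != 0 for m in m_per_group),
        "N": n % tile.n != 0,
        "K": k % tile.k != 0,
    }


def needs_padding(m_per_group: Sequence[int], n: int, k: int,
                  tile: TileShape = TileShape()) -> bool:
    """True only if at least one dimension is unaligned."""
    return any(padding_flags(m_per_group, n, k, tile).values())
```

With this kind of check, fully aligned shapes would keep the faster unpadded kernel and only unaligned shapes would pay the padding cost.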

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

aris134

LGTM!

aris134 · 2026-05-27T03:18:47Z

Quick follow-up question: are there certain padding cases/shapes where we should prefer fallback due to the performance penalty of the padded path?

matthiasdiener · 2026-05-27T15:40:53Z

> Quick follow-up question: are there certain padding cases/shapes where we should prefer fallback due to the performance penalty of the padded path?

I looked at this briefly but could not find a config where this would be profitable, at least for bf16.

ipanfilo · 2026-05-29T15:20:58Z

+        )
+
+        for o, o_ref in zip(out, out_ref):
+            if IS_HIP_EXTENSION and accumulate and dtype == torch.bfloat16 and get_device_compute_capability() == (9, 4):


The test itself is IS_HIP_EXTENSION-only, so the IS_HIP_EXTENSION check in this condition is redundant.

ipanfilo · 2026-05-30T04:39:40Z

+        n_val = unaligned_n if "N" in pad_dim else n_aligned
+
+        total_m = sum(m_vals)
+        os.environ["NVTE_USE_CUTLASS_GROUPED_GEMM"] = "1"


nit: better to use monkeypatch to make sure the env vars are cleared if the test fails
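A minimal sketch of what this suggestion could look like, using pytest's built-in `monkeypatch` fixture and its `setenv` method. The test name and the helper `run_grouped_gemm_case` are hypothetical stand-ins for the real test body. Only the environment variable name comes from the diff above.

```python
import os


def run_grouped_gemm_case():
    """Hypothetical stand-in for the real test body; it only reads the env var."""
    return os.environ.get("NVTE_USE_CUTLASS_GROUPED_GEMM")


def test_grouped_gemm_padding(monkeypatch):
    # monkeypatch.setenv records the previous value and restores it at
    # teardown, even if an assertion below fails or an exception is raised.
    monkeypatch.setenv("NVTE_USE_CUTLASS_GROUPED_GEMM", "1")
    assert run_grouped_gemm_case() == "1"
```

Compared with assigning to `os.environ` directly, this avoids leaking the setting into later tests when the test aborts before a manual cleanup step runs.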

ipanfilo · 2026-05-30T04:42:15Z

+        # M: not multiples of tile (256), varies per group.
+        # N: multiple of 16 but not multiple of tile (128).
+        unaligned_k = 2016
+        unaligned_m = [100, 300, 150, 200, 50, 350, 250, 180]


I think z should be derived as len(unaligned_m), or it should be asserted that they are equal.
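Both options from this comment can be sketched as follows. The list is copied from the diff above. Treating `z` as the number of groups is my reading of the comment, and `TILE_M` and `check_group_count` are hypothetical names used only for illustration.

```python
# Per-group M sizes, copied from the diff under review.
unaligned_m = [100, 300, 150, 200, 50, 350, 250, 180]

# Option 1: derive the group count from the list so the two cannot drift apart.
z = len(unaligned_m)

# M tile size as stated in the test comment ("not multiples of tile (256)").
TILE_M = 256
assert all(m % TILE_M != 0 for m in unaligned_m), "every group should be unaligned in M"


def check_group_count(z_param, m_vals):
    """Option 2: keep z as an independent parameter and assert consistency."""
    assert z_param == len(m_vals), (
        f"z={z_param} does not match len(m_vals)={len(m_vals)}"
    )


check_group_count(z, unaligned_m)
total_m = sum(unaligned_m)
```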

ipanfilo · 2026-05-30T04:52:38Z

But why are separate translation units needed for every ck_tile_grouped_gemm_fp16_dispatch_*, and why are the methods themselves needed instead of directly calling ck_tile_grouped_gemm_fp16_dispatch_layout<>()?

ck_tile grouped gemm: more padding

95f984c

matthiasdiener requested a review from sudhu2k May 5, 2026 00:15

matthiasdiener self-assigned this May 5, 2026

matthiasdiener requested review from aris134 May 5, 2026 15:36

aris134 reviewed May 6, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16.cpp

aris134 reviewed May 6, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_common.h Outdated

aris134 reviewed May 6, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 reviewed May 6, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 requested changes May 6, 2026

matthiasdiener added the "ci-level 1" (CI test level 1) label May 15, 2026

matthiasdiener added 3 commits May 15, 2026 19:57

Merge branch 'dev' into mdiener/cktile-grouped-gemm-padding

cfbc537

address review comments

225c3dc

NT workaround, split, address review comments

2939017

matthiasdiener marked this pull request as ready for review May 15, 2026 22:24

matthiasdiener requested review from ipanfilo, wangye805 and wenchenvincent as code owners May 15, 2026 22:24

matthiasdiener requested a review from aris134 May 15, 2026 22:24

aris134 reviewed May 19, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16_nn.cpp

aris134 requested changes May 19, 2026

matthiasdiener added 2 commits May 19, 2026 15:51

Merge remote-tracking branch 'origin/dev' into mdiener/cktile-grouped-gemm-padding

fa87ccc

factor out templating

01f62d0

matthiasdiener requested a review from aris134 May 19, 2026 19:13

aris134 reviewed May 20, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 reviewed May 20, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 reviewed May 20, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 reviewed May 20, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16.cpp Outdated

aris134 reviewed May 20, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 requested changes May 20, 2026

matthiasdiener added 2 commits May 20, 2026 16:59

address review comments, capture fallbacks

aee2c4c

Merge remote-tracking branch 'origin/dev' into mdiener/cktile-grouped-gemm-padding

f830b89

matthiasdiener requested a review from aris134 May 20, 2026 22:06

aris134 approved these changes May 21, 2026

Merge remote-tracking branch 'origin' into mdiener/cktile-grouped-gemm-padding

2751b2a

matthiasdiener added the "ci-level 3" (CI test level 3) label and removed the "ci-level 1" (CI test level 1) label May 23, 2026

fix env reset

a59a4ae

ipanfilo requested changes May 30, 2026

matthiasdiener wants to merge 10 commits into dev from mdiener/cktile-grouped-gemm-padding

3 participants
