[inductor][rocm] make AMD MM matrix_instr_nonkdim configurable #3234
Open
reger-men wants to merge 1 commit into
Jenkins build for 78e99ac5b00742298bb83fe8d49b1a3d5991856c commit finished as FAILURE
78e99ac to e2926f2
Jenkins build for e2926f2d3ada9da02d55fc19a97bd0028fe6f1f5 commit finished as FAILURE
e2926f2 to f26feec
Adds torch._inductor.config.rocm.mfma_nonkdim, which reads the env var TORCHINDUCTOR_MFMA_NONKDIM. The AMD MM Triton template autotune sweep is now driven from this knob rather than being hard-coded to [0, 16]. Default behaviour is unchanged on ROCm; the knob is ignored on other backends.

Recognised values:

  unset                upstream default ([0, 16] sweep, ROCmGemmConfig default 16)
  "0" / "16" / "32"    force a single value; sweep collapses to [value]
  "auto"               extend the autotune sweep to [0, 16, 32]; default stays 16

mfma_32x32x*_bf16 is only emitted when 32 is in the sweep, so "auto" is the safe opt-in for shapes where the mfma_32 path might win.

The test under test/inductor/test_amd_mfma_nonkdim_config.py covers all modes (unset / forced int / "auto" / garbage) by patching the config attribute in-process via torch._inductor.config.patch, plus a subprocess probe that spawns a fresh Python with the env var set to exercise the import-time env parser.
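The mapping from env var value to autotune sweep described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `parse_mfma_nonkdim` and `nonkdim_sweep` are hypothetical, and the handling of garbage, empty, and upper-case values (fall back to the default, treat "AUTO" like "auto") is an assumption, since the description only says those inputs are tested.

```python
import os
from typing import List, Optional, Union

ENV_VAR = "TORCHINDUCTOR_MFMA_NONKDIM"
DEFAULT_SWEEP = [0, 16]   # upstream default sweep, per the description
AUTO_SWEEP = [0, 16, 32]  # sweep used for "auto", per the description
ALLOWED = (0, 16, 32)     # values that can be forced

# None means unset, an int means a forced value, "auto" widens the sweep.
Setting = Optional[Union[int, str]]


def parse_mfma_nonkdim(raw: Optional[str]) -> Setting:
    """Hypothetical parser for the raw env var string."""
    if raw is None:
        return None
    text = raw.strip().lower()  # assumption: matching is case-insensitive
    if text == "auto":
        return "auto"
    try:
        value = int(text)
    except ValueError:
        return None  # assumption: garbage or empty input falls back to default
    return value if value in ALLOWED else None


def nonkdim_sweep(setting: Setting) -> List[int]:
    """Return the matrix_instr_nonkdim values the autotuner would try."""
    if setting is None:
        return list(DEFAULT_SWEEP)
    if setting == "auto":
        return list(AUTO_SWEEP)
    return [setting]  # forced value: the sweep collapses to one entry


if __name__ == "__main__":
    print(nonkdim_sweep(parse_mfma_nonkdim(os.environ.get(ENV_VAR))))
```

Keeping the parse step separate from the sweep step mirrors the two test paths the description mentions: the sweep can be checked by patching the config in-process, while the parse step only runs when the env var is read.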
Jenkins build for f26feec7ab0bf69d6f3b70be5a08fd354f690cc8 commit finished as FAILURE
Adds `torch._inductor.config.rocm.mfma_nonkdim`, which reads the env var `TORCHINDUCTOR_MFMA_NONKDIM`. The AMD MM Triton template autotune sweep is now driven from this knob rather than being hard-coded to `[0, 16]`. Default behaviour is unchanged on ROCm; the knob is ignored on other backends.

Recognised values:

| Value | `matrix_instr_nonkdim` sweep |
| --- | --- |
| unset | `[0, 16]` |
| `0` / `16` / `32` | `[value]` |
| `auto` | `[0, 16, 32]` |

`mfma_32x32x*_bf16` is only emitted when 32 is in the sweep, so `auto` is the safe opt-in for shapes where the mfma_32 path might win, and `32` forces it on. This is a per-workload tuning knob; do not set it system-wide.

Test plan

- `test_amd_mfma_nonkdim_config.py` covers unset / forced 0 / forced 16 / forced 32 / `auto` / garbage via `torch._inductor.config.patch`.
- A subprocess probe exercises the import-time env parser with `0`, `16`, `32`, `auto`, `AUTO`, an empty string, and a non-integer.
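The subprocess probe is needed because an env var read at import time cannot be re-exercised by patching the config in an already-running process. A minimal sketch of that pattern is below. It is not the test from this PR: `probe` and `CHILD_CODE` are hypothetical names, and the child script is a stand-in that only echoes a normalised env var, where the real test would presumably import `torch._inductor.config` and print the parsed knob.

```python
import os
import subprocess
import sys
from typing import Optional

ENV_VAR = "TORCHINDUCTOR_MFMA_NONKDIM"

# Stand-in child program (assumption): it reads the env var at start-up and
# prints a normalised form. A real probe would import the config module here.
CHILD_CODE = (
    "import os\n"
    f"raw = os.environ.get({ENV_VAR!r})\n"
    "print('unset' if raw is None else raw.strip().lower())\n"
)


def probe(value: Optional[str]) -> str:
    """Run CHILD_CODE in a fresh interpreter with the env var set to value."""
    env = dict(os.environ)
    env.pop(ENV_VAR, None)  # start from a clean slate so "unset" is really unset
    if value is not None:
        env[ENV_VAR] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # surface a crash in the child as a test failure
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for candidate in (None, "0", "16", "32", "auto", "AUTO"):
        print(repr(candidate), "->", probe(candidate))
```

Each call starts a new interpreter, so every value is seen by the start-up code exactly as a user-set env var would be, independent of whatever the parent process has already imported.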