[AIROCMLIR-80] Utilize amd_arch_db python bindings where applicable#2352
[AIROCMLIR-80] Utilize amd_arch_db python bindings where applicable#2352mirza-halilcevic wants to merge 32 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reduces duplicated/hardcoded GPU architecture capability logic in Python utilities by switching to the amd_arch_db pybind module and extends the underlying arch database with additional exposed properties.
Changes:
- Updated performance sweep/analysis scripts to query arch capabilities via
amd_arch_db(MFMA/WMMA/FP8/FP4, etc.) instead of hardcoded chip lists/prefix checks. - Added/extended
amd_arch_dbPython bindings (enum arithmetic + additional feature flags/fields) and wired the module into build/test dependencies. - Updated lit/common Python utilities to derive feature strings and booleans from
amd_arch_db.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| mlir/utils/performance/tests/test_perfRunner.py | Removes unit tests related to chiplet counting logic. |
| mlir/utils/performance/tests/mock_hip.py | Extends CI mocks to include an amd_arch_db stub module. |
| mlir/utils/performance/perfRunner.py | Switches dtype/feature gating and topology queries to amd_arch_db. |
| mlir/utils/performance/parameterSweeps.py | Infers codepaths/flags via amd_arch_db feature bits (adds has_feature). |
| mlir/utils/performance/attentionSweeps.py | Uses amd_arch_db features to pick MFMA vs WMMA for attention sweeps. |
| mlir/utils/performance/analysis/testing-metrics.py | Derives EU-per-CU via amd_arch_db instead of hardcoded value. |
| mlir/utils/performance/analysis/quickTuningGen.py | Uses amd_arch_db features to classify instruction type. |
| mlir/utils/performance/CMakeLists.txt | Adds amd_arch_db as a dependency of the copied performance scripts target. |
| mlir/unittests/Dialect/Rock/AmdArchDbTests.cpp | Extends native-vs-preset comparisons to include new fields. |
| mlir/test/lit.site.cfg.py.in | Adds tools dir to sys.path so amd_arch_db can be imported in lit utils. |
| mlir/test/fusion/e2e/lit.site.cfg.py.in | Uses shared feature detection helper instead of hardcoded arch checks. |
| mlir/test/e2e/lit.site.cfg.py.in | Adds tools dir to sys.path for amd_arch_db import. |
| mlir/test/common_utils/common.py | Replaces hardcoded feature decoding with amd_arch_db lookups. |
| mlir/test/CMakeLists.txt | Adds amd_arch_db to test target dependencies. |
| mlir/lib/Dialect/Rock/utility/Bindings/CMakeLists.txt | Builds amd_arch_db and attempts to make it globally importable via .pth. |
| mlir/lib/Dialect/Rock/utility/Bindings/AmdArchDbBindings.cpp | Enables enum arithmetic and binds additional feature bits/fields. |
| mlir/lib/Dialect/Rock/IR/AmdArchDb.cpp | Extends arch presets with hasFp4 and updates related comments. |
| mlir/include/mlir/Dialect/Rock/IR/RockAttrDefs.td | Documents new “keep in sync” relationship for feature name mapping. |
| mlir/include/mlir/Dialect/Rock/IR/AmdArchDb.h | Adds hasFp4 field to AmdArchInfo and updates constructor. |
Comments suppressed due to low confidence (2)
mlir/utils/performance/tests/test_perfRunner.py:92
- The tests for
get_num_chipletswere removed, but the function implementation was also changed substantially in this PR. Since CI now mocksamd_arch_db, consider reintroducing unit tests that validate the new behavior (e.g., by adjusting the mocklookup_arch_info/_DEFAULT_MOCK_INFO.max_num_xccand assertingget_num_chipletsandget_num_cureturn expected values).
def test_read_nonexistent_returns_none(self):
db = perfRunner.read_tuning_db("/nonexistent/path.tsv")
assert db is None
class TestParseDataTypes:
"""Tests for parse_data_types (gemm data types)."""
def test_empty_returns_defaults(self):
dtypes, out_map = perfRunner.parse_data_types(None)
mlir/utils/performance/parameterSweeps.py:122
infer_codegen_flags_from_archno longer returns('unknown', [])(it defaults to'vanilla'), but the docstring still claims it can return'unknown'and callers (e.g.main()) still branch oncodepath == 'unknown'. Either restore the'unknown'sentinel when inference fails, or update the docstring + remove/adjust the dead'unknown'handling so behavior is consistent.
def infer_codegen_flags_from_arch(arch: str,
requested_codepath: str = 'none') -> tuple[str, list[str]]:
"""Infers codepath and optional rocmlir-gen flags from architecture.
The inferred codepath is used to pick the perf_config family. By default we
rely on rocmlir-gen arch auto-detection and return no explicit feature
flags; flags are only emitted when a codepath override is explicitly
requested.
Returns ('unknown', []) when inference fails.
"""
supported_codepath = ['mfma', 'vanilla', 'wmma']
codepath = requested_codepath
if codepath not in supported_codepath:
features = lookup_arch_info(arch).default_features
if has_feature(features, GemmFeatures.MFMA):
codepath = 'mfma'
elif has_feature(features, GemmFeatures.WMMA):
codepath = 'wmma'
else:
codepath = 'vanilla'
if requested_codepath in supported_codepath:
return (codepath, get_codegen_flags_for_codepath(arch, codepath))
return (codepath, [])
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
mlir/utils/performance/perfRunner.py:2316
get_num_cu()unconditionally useslookup_arch_info("native:<id>"). In builds whereROCMLIR_ENABLE_NATIVE_ARCHis disabled, the Python binding raisesValueErrorfor anynative:*lookup, so this will crash even if the caller only needs preset arch info.
Please catch that error and fall back to a non-native CU query (e.g. HIP hipGetDeviceProperties(...).multiProcessorCount, or the previous rocminfo-based approach) so perfRunner.py remains usable when native arch support is not compiled in.
def get_num_cu(device_id: int = 0):
# In native mode, min_num_cu contains the actual number of CUs instead of the minimum
return lookup_arch_info(f"native:{device_id}").min_num_cu
mlir/utils/performance/analysis/testing-metrics.py:34
assign_num_cu()unconditionally callslookup_arch_info("native:0"). WhenROCMLIR_ENABLE_NATIVE_ARCHis OFF, the binding raisesValueErrorfor native lookups, so the script will fail even on systems with ROCm installed.
Please catch that error and fall back to another CU count source (e.g. HIP multiProcessorCount, or preset arch info with a warning about accuracy).
def assign_num_cu():
if args.c:
return int(args.c)
print(
"Using info from GPU 0 in your system, the data should have be obtained from the same GPU.")
return lookup_arch_info("native:0").min_num_cu
There was a problem hiding this comment.
Verdict: APPROVE -- submitted as COMMENT (automated reviews are advisory) · Findings: 1 (0 Critical, 0 Major, 1 Minor)
Scope
Replaces hardcoded arch-feature/CU/XCC lookups across the perf scripts, mlir/test/common_utils/common.py, and lit site configs with calls into the new amd_arch_db pybind module; adds a hasFp4 field to AmdArchInfo (and updates every preset + unittest expectation), wires the bindings to raise py::value_error for native:* when ROCMLIR_ENABLE_NATIVE_ARCH=OFF, and gates add_dependencies(... amd_arch_db) on TARGET amd_arch_db so non-WIN32-without-Python builds still work. A mock_hip.py-style mock of the C++ extension is added so test_perfRunner can import without ROCm.
Findings
- Minor: comment block inside the
lookup_arch_infolambda is under-indented (mlir/lib/Dialect/Rock/utility/Bindings/AmdArchDbBindings.cpp:67).
Notes
features_to_stringinmlir/test/common_utils/common.pynow emits feature names in the bindings'.value(...)insertion order, which differs from the old hardcoded spellings (e.g.atomic_add_bf16|atomic_add_f16rather thanatomic_add_f16|atomic_add_bf16for gfx950). The author has flagged the constraint in a sync comment; this is fine as long as lit tests that match on the exact string have been updated — worth one more spot-check before merge.throw py::value_error(...)in the binding is the idiomatic pybind11 way to surface a Python-levelValueErrorand is not the kind of C++ exception the rocMLIR coding rules forbid; no change needed.- Mock
_MockGemmFeaturesbit positions inmlir/utils/performance/tests/mock_hip.pymust stay in sync withGemmFeaturesinRockAttrDefs.td(already called out in a comment); if the enum gets a new bit, the mock and the binding's.value(...)chain both need updates.
CI status
No non-pipeline failing checks in /tmp/pr/checks.json — Jenkins, Python performance script tests, Python format/lint, and the Windows component build are all green; the remaining PENDING entries are heavy downstream jobs.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #2352 +/- ##
===========================================
+ Coverage 79.50% 82.03% +2.52%
===========================================
Files 100 119 +19
Lines 31016 42665 +11649
Branches 4819 7082 +2263
===========================================
+ Hits 24659 34997 +10338
- Misses 4245 5055 +810
- Partials 2112 2613 +501
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Motivation
Arch information is usually hardcoded and duplicated across python scripts.
Technical Details
Take advantage of the amd_arch_db pybind11 module.
Test Plan
CI passes.
Test Result
Submission Checklist