docs: refresh CLAUDE.md for 2 months of changes + expr-neutral / helper-placement conventions #659
Open
jhinpan wants to merge 4 commits into
Agent guidance was last updated 2026-04-30 (ROCm#463); ~187 PRs have landed since. Refresh every section against the current tree and add two maintainer conventions. Every cited path was verified file-by-file against upstream/main HEAD.

- Repository Layout: expr/ direct children are target-neutral; add expr/rocdl/ package (shadows legacy expr/rocdl.py), lib/Dialect/FlyROCDL/ per-subtarget split, .claude/skills/, SmemAllocator->SharedAllocator note, fix tests/python/examples comment.
- Kernel Entry Points: add FP8 GEMM (4wave/8wave + utils), RDNA GEMM (gfx11*/gfx120*), AITER HGEMM (splitk/small_m), MoE routing (sorting/topk_gating + moe_common), fused-quant (qk_norm_rope_quant, silu_and_mul_fq), pa_decode_swa, communication shim.
- GPU Architecture Support: add gfx11* row; document the gfx1250 is_rdna_arch()==False / get_warp_size()==64 wave32 footgun.
- Build & Test / Code Style: run_benchmark subset+CSV+compare_benchmark, check_python_style.sh, the pre-checks.yaml Python+C++ CI style gate.
- Environment Variables: FLYDSL_RUNTIME_RUN_ONLY, FLYDSL_COMPILE_LLVM_DIR.
- Testing Notes: pytest.ini markers (multi_gpu/benchmark) + 8-GPU CI lane; drop the removed torch_mha_extend2().
- Documentation Map: external bitcode integration guide.

Maintainer conventions (Kernel Authoring Conventions):

- expr/ is target-neutral: direct children must not import ROCDL/HIP bindings (enforced by tests/unit/test_expr_optional_rocdl.py); new target code goes in expr/rocdl/ and is lazy-loaded (ROCm#521).
- Helper placement: reuse-first; kernels_common.py vs topical *_common/_utils vs expr/utils vs flydsl/utils -- don't scatter or duplicate helpers (ROCm#388, ROCm#448).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
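The lazy-loading convention above can be illustrated with a module-level __getattr__ hook (PEP 562). This is a minimal sketch, not FlyDSL's actual expr/__init__.py, which this page does not show. The PR description names _LAZY_MODULES, but its shape here is an assumption. The make_lazy_package helper and the demo_expr package are hypothetical, and the stdlib json module is only a stand-in for a target-specific package.

```python
import importlib
import types

# Hypothetical mapping: attribute name -> module imported on first access.
# The stdlib "json" module stands in for a target-specific package.
_LAZY_MODULES = {"rocdl": "json"}


def make_lazy_package(name, lazy_modules):
    """Build a module object that imports selected attributes on demand."""
    pkg = types.ModuleType(name)

    def _lazy_getattr(attr):
        # Called only when normal attribute lookup on the module fails.
        target = lazy_modules.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(target)
        setattr(pkg, attr, module)  # cache so later lookups skip this hook
        return module

    pkg.__getattr__ = _lazy_getattr  # PEP 562 module-level hook
    return pkg


expr = make_lazy_package("demo_expr", _LAZY_MODULES)
loaded_before = "rocdl" in vars(expr)  # nothing imported into the package yet
backend = expr.rocdl                   # first access triggers the import
loaded_after = "rocdl" in vars(expr)   # now cached on the package
```

With this pattern, importing the package itself never touches the target-specific module. Only code that actually reads the lazy attribute pays for the import, or fails if the module is unavailable.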
15 tasks
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates CLAUDE.md to better document FlyDSL’s current repo layout, kernel/arch conventions, CI/style workflows, environment variables, and benchmarking/testing guidance.
Changes:
- Expanded repo tree and kernel authoring conventions (SharedAllocator, helper placement, ROCDL lazy-loading rules).
- Added documentation pointers for extern bitcode integration, benchmarking workflows, CI style gate reproduction, and new env vars.
- Clarified arch-specific behavior (RDNA vs CDNA, gfx1250 caveats) and enriched testing notes (markers, multi-GPU gating).
  ├── python/
  │ ├── flydsl/ # Python DSL core
- │ │ ├── expr/ # DSL expression API: primitive, typing, arith, vector, gpu, math, rocdl, buffer_ops
+ │ │ ├── expr/ # DSL expression API; direct children are TARGET-NEUTRAL (typing, primitive, gpu, derived, struct, numeric, math, vector, arith, meta, extern; + utils/)
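One way a target-neutrality rule like this could be checked statically is to scan a module's source for imports that mention backend packages. The sketch below is only an illustration under stated assumptions: it is not the contents of tests/unit/test_expr_optional_rocdl.py, which may work differently (for example, by importing the package without the bindings installed). The deny-list and the mypkg module names are hypothetical.

```python
import ast

# Hypothetical deny-list of dotted-name components; the real check may use
# different names or a different mechanism entirely.
FORBIDDEN_COMPONENTS = ("rocdl", "hip")


def forbidden_imports(source, forbidden=FORBIDDEN_COMPONENTS):
    """Return imported names that contain a deny-listed dotted component."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""  # None for "from . import x"
            names = [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if any(part in forbidden for part in name.split(".")):
                hits.append(name)
    return hits


neutral_src = "import math\nfrom typing import Any\n"
target_src = "import mypkg.rocdl.buffer\nfrom mypkg import hip\n"

neutral_hits = forbidden_imports(neutral_src)
target_hits = sorted(forbidden_imports(target_src))
```

A static scan like this catches imports inside functions as well as at module top level, but it cannot see imports built dynamically at runtime, so it complements rather than replaces an import-time test.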
bash scripts/run_tests.sh # Pytest + examples + MLIR FileCheck
RUN_TESTS_FULL=1 bash scripts/run_tests.sh # Include large_shape tests
bash scripts/run_benchmark.sh # Performance benchmarks
RUN_TESTS_FULL=1 bash scripts/run_tests.sh # Include large_shape tests; this is the CI invocation (flydsl.yaml, test-whl.yaml)
sjfeng1999
previously approved these changes
Jun 5, 2026
What
CLAUDE.md (agent guidance) was last refreshed on 2026-04-30 (#463); ~187 PRs have merged since. This updates the file against the current tree while keeping it scoped as agent instructions, not a repo index that needs constant kernel-by-kernel synchronization.

The refresh keeps durable guidance in CLAUDE.md: commands, maintenance guardrails, architecture footguns, and routing hints that prevent likely agent mistakes. Detailed kernel catalogs stay in the docs and the current source tree.

Two maintainer conventions added (Kernel Authoring Conventions)
- expr/ is target-neutral. The direct child modules of python/flydsl/expr/ must stay backend-agnostic (no ROCDL/HIP imports) so that import flydsl.expr works without the FlyROCDL bindings (enforced in CI by tests/unit/test_expr_optional_rocdl.py). New target-specific code goes in the expr/rocdl/ package and is lazy-loaded from expr/__init__.py (_LAZY_MODULES), never in a new top-level expr/*.py. (PR #521, "fix(python): lazily load ROCDL expr modules")
- Helper placement: reuse-first; don't scatter or duplicate helpers. By scope: kernels/kernels_common.py; domain-shared → the topical module (moe_common.py, layout_utils.py, fp8_gemm_utils.py, …); DSL numeric/type helpers → expr/utils/ & expr/numeric.py; compiler/runtime-wide → flydsl/utils/. (PRs #388, "[Perf] Port mixed_moe kernel optimizations for stage1/stage2", and #448, "Reduce redundant FlyDSL numeric wrappers")

Staleness fixed
- Repository Layout: expr/ direct-children-are-neutral note + expr/rocdl/ package (shadows legacy expr/rocdl.py); lib/Dialect/FlyROCDL/{CDNA3,CDNA4,GFX11,GFX1250}/ per-subtarget split; .claude/skills/; SmemAllocator → SharedAllocator; corrected tests/python/examples/ description.
- … docs/prebuilt_kernels_guide.md, and when a rule belongs in CLAUDE.md at all.
- GPU Architecture Support: added gfx11* (RDNA3/3.5) row; documented the gfx1250 footgun: is_rdna_arch() returns False and get_warp_size() returns 64, so those kernels hardcode WAVE_SIZE = 32.
- Build & Test / Code Style: run_benchmark.sh subset/--list/--output_csv + compare_benchmark.py; check_python_style.sh; the pre-checks.yaml Python (black + ruff) + C++ (clang-format-18) CI gate; RUN_TESTS_FULL=1 is the CI invocation.
- Environment Variables: FLYDSL_RUNTIME_RUN_ONLY (AOT-cache-only) and FLYDSL_COMPILE_LLVM_DIR (external LLVM codegen).
- Testing Notes: pytest.ini markers (multi_gpu, benchmark) + the label-gated 8-GPU CI lane; clarified the 2-GPU shmem, 4-GPU allreduce accuracy, and 8-GPU allreduce accuracy/benchmark split; removed the deleted torch_mha_extend2() reference.

181 → 216 lines after replacing the kernel file list with routing guidance.
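To see why the gfx1250 wave-size mismatch described above matters, consider how a kernel might derive its wave count from the wave size. The sketch below is plain arithmetic under stated assumptions, not FlyDSL code. The reported value of 64 and the hardcoded WAVE_SIZE = 32 come from the description, while the waves_per_block helper and the 256-thread block size are hypothetical.

```python
# Value the PR description says get_warp_size() reports for gfx1250;
# modelled as plain data here, not queried from FlyDSL.
REPORTED_WARP_SIZE = {"gfx1250": 64}

# Kernel-side constant, hardcoded as the description says.
WAVE_SIZE = 32


def waves_per_block(block_threads, wave_size):
    """Number of waves needed to cover a thread block (ceiling division)."""
    return (block_threads + wave_size - 1) // wave_size


block_threads = 256  # hypothetical block size
from_query = waves_per_block(block_threads, REPORTED_WARP_SIZE["gfx1250"])
from_constant = waves_per_block(block_threads, WAVE_SIZE)
print(from_query, from_constant)  # prints "4 8"
```

A kernel that trusted the queried value would plan for half as many waves as the hardcoded constant implies, which is the kind of silent indexing error the hardcoded constant avoids.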
🤖 Generated with Claude Code