[Docs] Onboarding notebooks (1/n): expr foundation #635
Open
jhinpan wants to merge 7 commits into
Interactive, bottom-up onboarding notebooks for the flydsl.expr foundation,
bridging a newcomer to the existing examples/. This first set (1/n) covers:
- 00_hello_flydsl - the @flyc.kernel / @flyc.jit trace model; reading dumped IR
- 01_numeric_types - the scalar type system (ints, floats, bf16/fp8), casts,
promotion, and Constexpr vs runtime values
- 02_struct - @fx.struct aggregate value types and their memory layout
- 03_universal_ops - the target-agnostic Universal* atoms, with a fully-universal
vector-add capstone validated against torch
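As a rough illustration of the `bf16` entry listed for `01_numeric_types`, the sketch below converts a float32 value to bfloat16 bits in plain Python. It relies only on the fact that bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE float32. It does not use FlyDSL: the helper names are hypothetical, and round-to-nearest-even is an assumption, since the rounding mode of the notebooks' casts is not stated here.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x stored as an IEEE float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE float32."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bf16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 pattern, rounding to nearest even.

    Assumption for this sketch: NaN and infinity are not handled specially.
    """
    bits = f32_bits(x)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_round_trip(x: float) -> float:
    """Cast to bf16 and back, exposing the precision that is lost."""
    return bits_f32(to_bf16_bits(x) << 16)


print(hex(to_bf16_bits(1.0)))          # 0x3f80
print(bf16_round_trip(1.0 + 2 ** -8))  # halfway case rounds to the even neighbour
```

Values that need more than 7 mantissa bits change under the round trip, which is the kind of effect a cast or promotion cell would surface.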
Layout algebra (make_layout / logical_divide / tiled copy / MMA) is intentionally
deferred to a follow-up series. All cells were run-verified on an MI350X (gfx950)
and committed with outputs cleared. A short README indexes the series.
Refs ROCm#573.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rity)
Sharpen the series for agent consumers (fast ramp, fewer source lookups):
- README: add a flydsl.expr API cheat-sheet (kernel/jit/launch, scalars, structs,
copy atoms + register tensors) and the three printf/wurlitzer/Constexpr gotchas,
so the whole foundation is reachable in one place.
- 03_universal_ops:
- explain the host->device tensor handoff (raw torch tensor vs from_dlpack +
mark_layout_dynamic) and annotate the jit C param as fx.Tensor for consistency;
- bridge nb01's '+' operator to nb03's register-tensor
memref_load_vec / arith.addf / memref_store_vec compute;
- stop calling UniversalFMA 'the MMA atom' (it has no real usage; MMA lowers via
rocdl.MFMA) in the atom-family list and the Recap;
- fix the pass name convert_fly_to_rocdl -> convert-fly-to-rocdl.
Re-ran all four notebooks on a current gfx950 build: 00/01/02/03 clean, vadd matches
torch, IR shows universal_copy<32>. Outputs committed cleared.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…c metadata)
Acted on the substantive Copilot comments:
- Close the IR-dump file reads with `with open(...)` in nb00 and nb03 (C6).
- Pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR after the dump cell so the env we set
doesn't linger for later cells (C4/C5) -- without the suggested try/finally, which
would be defensive noise for a teaching cell.
- Strip per-cell `metadata.execution` timestamps from all four notebooks so re-running
doesn't churn the diff (C7); matches the outputs-cleared convention.
Did not act on the false positives: the README table uses single-pipe rows (not '||');
`block=[...]` and a dynamic `n: fx.Int32` launcher arg both match the canonical
examples/01-vectorAdd.py and run clean on gfx950.
Re-ran all four on a current build: 00/01/02/03 clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
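The dump-cell pattern described in this commit (set the two environment variables, run, read the dumped files inside `with open(...)`, then pop both variables) can be sketched without FlyDSL installed. Only the names `FLYDSL_DUMP_IR` and `FLYDSL_DUMP_DIR` come from the commit message. The value `"1"`, the file name `vadd.mlir`, and the stand-in `fake_kernel_run` are assumptions made so the sketch runs on its own.

```python
import os
import tempfile
from pathlib import Path


def dump_and_read(run_kernel, dump_dir):
    """Run `run_kernel` with IR dumping enabled and return {file name: text}."""
    os.environ["FLYDSL_DUMP_IR"] = "1"           # assumed value
    os.environ["FLYDSL_DUMP_DIR"] = str(dump_dir)

    run_kernel()

    texts = {}
    for path in sorted(Path(dump_dir).rglob("*")):
        if path.is_file():
            # The context manager closes each file handle right after reading.
            with open(path) as f:
                texts[path.name] = f.read()

    # Pop both variables so they do not linger for later cells.
    # No try/finally, matching the choice described in the commit message.
    os.environ.pop("FLYDSL_DUMP_IR", None)
    os.environ.pop("FLYDSL_DUMP_DIR", None)
    return texts


def fake_kernel_run():
    """Hypothetical stand-in for calling a jitted function that dumps IR."""
    out = Path(os.environ["FLYDSL_DUMP_DIR"]) / "vadd.mlir"
    out.write_text("!fly.universal_copy<32>\n")


with tempfile.TemporaryDirectory() as tmp:
    dumped = dump_and_read(fake_kernel_run, tmp)

print(sorted(dumped))
```

In a real notebook the stand-in would be replaced by the actual jitted call; the directory layout of the real dump is not shown in this PR.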
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a set of onboarding notebooks introducing the flydsl.expr foundation and a README to guide users through running them.
Changes:
- Added four onboarding Jupyter notebooks (00–03) covering kernels/JIT, numeric types, structs, and universal atoms.
- Added `examples/notebooks/README.md` with an ordered notebook index, an API cheat-sheet, and runtime instructions.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| examples/notebooks/README.md | Introduces the notebook series, provides a compact API cheat-sheet, and documents how to run notebooks. |
| examples/notebooks/00_hello_flydsl.ipynb | Notebook 00: explains @flyc.kernel/@flyc.jit and how to dump/inspect IR. |
| examples/notebooks/01_numeric_types.ipynb | Notebook 01: documents scalar type families, operations, casts, and Constexpr. |
| examples/notebooks/02_struct.ipynb | Notebook 02: introduces @fx.struct, layout queries, and struct usage patterns. |
| examples/notebooks/03_universal_ops.ipynb | Notebook 03: explains Universal* atoms and demonstrates a vector-add capstone. |
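For the struct layout queries mentioned for notebook 02, the general idea of a C-style layout can be previewed with Python's standard `ctypes`, independent of FlyDSL. This is a comparison sketch only: it assumes `@fx.struct` follows the usual natural-alignment rules, which this PR describes only as "C-style", and the `Pair` type and its fields are hypothetical.

```python
import ctypes


class Pair(ctypes.Structure):
    """Hypothetical two-field struct: a 1-byte flag followed by a 4-byte value."""

    _fields_ = [
        ("flag", ctypes.c_int8),
        ("value", ctypes.c_int32),
    ]


# Natural alignment pads `flag` so that `value` starts on a 4-byte boundary.
offsets = {name: getattr(Pair, name).offset for name, _ in Pair._fields_}
total_size = ctypes.sizeof(Pair)
alignment = ctypes.alignment(Pair)

print(offsets)      # {'flag': 0, 'value': 4}
print(total_size)   # 8
print(alignment)    # 4
```

The three bytes of padding after `flag` are the detail a layout query would reveal, and the reason field order can matter for struct size.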
Comment on lines +10 to +13
| # | Notebook | Topic |
|---|----------|-------|
| 00 | [`00_hello_flydsl.ipynb`](00_hello_flydsl.ipynb) | the `@flyc.kernel` / `@flyc.jit` model; reading dumped IR |
| 01 | [`01_numeric_types.ipynb`](01_numeric_types.ipynb) | scalar types: ints, floats, `bf16`/`fp8`, casts, `Constexpr` |
Comment on lines +151 to +152
"A = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",
"B = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",
Comment on lines +113 to +115
"def vadd_kernel(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, block_dim: fx.Constexpr[int]):\n",
" bid = fx.block_idx.x\n",
" tid = fx.thread_idx.x\n",
Comment on lines +119 to +121
" tA = fx.logical_divide(A, fx.make_layout(block_dim, 1))\n",
" tB = fx.logical_divide(B, fx.make_layout(block_dim, 1))\n",
" tC = fx.logical_divide(C, fx.make_layout(block_dim, 1))\n",
Comment on lines +133 to +134
" fx.copy_atom_call(copy, fx.slice(tA, (None, tid)), rA) # global -> register\n",
" fx.copy_atom_call(copy, fx.slice(tB, (None, tid)), rB)\n",

" vC = fx.arith.addf(fx.memref_load_vec(rA), fx.memref_load_vec(rB))\n",
" fx.memref_store_vec(vC, rC)\n",
"\n",
" fx.copy_atom_call(copy, rC, fx.slice(tC, (None, tid))) # register -> global\n",
Comment on lines +145 to +146
" grid_x = (n + block_dim - 1) // block_dim\n",
" vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",

"def vadd(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, n: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
" block_dim = 64\n",
" grid_x = (n + block_dim - 1) // block_dim\n",
" vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",
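The launcher's `grid_x = (n + block_dim - 1) // block_dim` is integer ceiling division: it picks the smallest number of blocks whose threads cover all `n` elements. The plain-Python sketch below checks that arithmetic and emulates a blocked vector add on the host. The index mapping `bid * block_dim + tid` and the `i < n` tail guard are assumptions made for this sketch; the notebook's actual tail handling is not visible in the quoted lines.

```python
def ceil_div(n: int, block_dim: int) -> int:
    """Smallest number of blocks of size block_dim that covers n elements."""
    return (n + block_dim - 1) // block_dim


def emulate_vadd(a, b, block_dim=64):
    """Host-side emulation: one (bid, tid) pair handles one element."""
    n = len(a)
    c = [0.0] * n
    for bid in range(ceil_div(n, block_dim)):
        for tid in range(block_dim):
            i = bid * block_dim + tid
            if i < n:  # assumed guard for the partially filled last block
                c[i] = a[i] + b[i]
    return c


print(ceil_div(128, 64))  # 2: exact multiple, no spare threads
print(ceil_div(129, 64))  # 3: one extra block for the single tail element
print(emulate_vadd([1.0] * 130, [2.0] * 130) == [3.0] * 130)  # True
```

With `block_dim = 64` and `n = 129`, the last block has 63 threads with no element to process, which is why a kernel of this shape typically either guards the tail or requires `n` to be a multiple of the block size.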
Summary
First PR (1/n) toward the onboarding notebook series requested in #573. Rather than jumping straight to vector-add + Layout 101, this set builds the `flydsl.expr` foundation bottom-up and stops before layout algebra (deferred to a follow-up series), so the later layout material rests on solid primitives.

Four notebooks in `examples/notebooks/`:

| Notebook | Topic |
|---|---|
| `00_hello_flydsl` | `@flyc.kernel` / `@flyc.jit` trace model; reading dumped IR (`FLYDSL_DUMP_IR`) |
| `01_numeric_types` | scalar type system (ints, floats, `bf16`/`fp8`), casts, promotion, `Constexpr` vs runtime |
| `02_struct` | `@fx.struct` aggregate value types and their C-style memory layout |
| `03_universal_ops` | `Universal*` atoms + a fully-universal vector-add capstone (validated vs torch) |

Emphasis throughout is arch-neutrality: the capstone moves data with `UniversalCopy32b` (no `rocdl`/CDNA-specific atoms), and the IR peek shows the `!fly.universal_copy<32>` op before it specializes in `convert-fly-to-rocdl`.

Notes
- … `wurlitzer` (`pip install jupyter wurlitzer`) to show GPU `printf` inline; Jupyter doesn't capture device stdout on its own. See `examples/notebooks/README.md`.
- … `nbsphinx` docs rendering, an `nbmake` execute-check CI job, and the layout/MMA notebook series.

Test plan
- … `torch` (`torch.allclose`)
- … `examples/notebooks/`) and whether to wire `nbmake` CI now or in the follow-up

Refs #573.
🤖 Generated with Claude Code
Recreated from #584 after the original head repository was accidentally deleted; this branch is restored at the old PR head commit fad0bc7.