[Docs] Onboarding notebooks (1/n): expr foundation by jhinpan · Pull Request #635 · ROCm/FlyDSL

jhinpan · 2026-06-03T08:04:15Z

Summary

First PR (1/n) toward the onboarding notebook series requested in #573. Rather than jumping straight to vector-add + Layout 101, this set builds the flydsl.expr foundation bottom-up and stops before layout algebra (deferred to a follow-up series), so the later layout material rests on solid primitives.

Four notebooks in examples/notebooks/:

| # | Notebook | Topic |
|---|----------|-------|
| 00 | `00_hello_flydsl` | the `@flyc.kernel` / `@flyc.jit` trace model; reading dumped IR (`FLYDSL_DUMP_IR`) |
| 01 | `01_numeric_types` | scalar type system (ints, floats, `bf16`/`fp8`), casts, promotion, `Constexpr` vs runtime |
| 02 | `02_struct` | `@fx.struct` aggregate value types and their C-style memory layout |
| 03 | `03_universal_ops` | target-agnostic `Universal*` atoms + a fully-universal vector-add capstone (validated vs torch) |
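What "C-style memory layout" means for notebook 02 can be previewed without a GPU. The sketch below is an analogy built on Python's standard `ctypes` module, not the `@fx.struct` API; the `Sample` type and its field names are hypothetical, and whether `@fx.struct` pads fields the same way is an assumption to check against the notebook itself.

```python
import ctypes


class Sample(ctypes.Structure):
    # Hypothetical aggregate: fields are laid out in declaration order,
    # with padding inserted so each field meets its natural alignment.
    _fields_ = [
        ("flag", ctypes.c_uint8),    # 1 byte, followed by 3 padding bytes
        ("value", ctypes.c_float),   # 4 bytes, must start on a 4-byte boundary
        ("count", ctypes.c_uint16),  # 2 bytes, followed by 2 tail-padding bytes
    ]


def describe_layout(struct_type):
    """Map each field name to its byte offset inside the struct."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


offsets = describe_layout(Sample)
size = ctypes.sizeof(Sample)
align = ctypes.alignment(Sample)
print(offsets, size, align)
```

The payload is only 7 bytes, yet the struct occupies 12 because of alignment padding. That gap between payload size and total size is what layout queries on an aggregate type report.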

Emphasis throughout is arch-neutrality: the capstone moves data with `UniversalCopy32b` (no rocdl/CDNA-specific atoms), and the IR peek shows the `!fly.universal_copy<32>` op before it is specialized by the `convert-fly-to-rocdl` pass.
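The IR peek follows a set, run, read, unset pattern around the dump environment variables. The sketch below shows only that host-side pattern. `FLYDSL_DUMP_IR` and `FLYDSL_DUMP_DIR` are the names used in this PR, but the value `"1"`, the `.mlir` file name, and the placeholder text written in place of a real kernel launch are all assumptions for illustration.

```python
import os
import tempfile
from pathlib import Path


def read_dumped_ir(dump_dir):
    """Return {file name: text} for every file found under dump_dir."""
    dumps = {}
    for path in sorted(Path(dump_dir).rglob("*")):
        if path.is_file():
            # `with` closes the handle even if reading fails.
            with open(path, encoding="utf-8") as handle:
                dumps[path.name] = handle.read()
    return dumps


dump_dir = tempfile.mkdtemp(prefix="flydsl_ir_")
os.environ["FLYDSL_DUMP_IR"] = "1"       # assumed value; check the FlyDSL docs
os.environ["FLYDSL_DUMP_DIR"] = dump_dir

# Stand-in for compiling and launching a kernel. A real run would have FlyDSL
# write the files; this placeholder text is not real compiler output.
Path(dump_dir, "example_stage.mlir").write_text(
    "!fly.universal_copy<32>\n", encoding="utf-8")

dumps = read_dumped_ir(dump_dir)

# Remove both variables so later cells are not affected by the dump settings.
os.environ.pop("FLYDSL_DUMP_IR", None)
os.environ.pop("FLYDSL_DUMP_DIR", None)

has_universal_copy = any("universal_copy<32>" in text for text in dumps.values())
print(sorted(dumps), has_universal_copy)
```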

Notes

- Run-verified end-to-end on an MI350X (gfx950); committed with outputs cleared for clean diffs (re-run to populate).
- Notebooks need wurlitzer (`pip install jupyter wurlitzer`) to show GPU printf inline, because Jupyter doesn't capture device stdout on its own. See examples/notebooks/README.md.
- Deferred to follow-ups to keep this PR small and reviewable: nbsphinx docs rendering, an nbmake execute-check CI job, and the layout/MMA notebook series.

Test plan

- All four notebooks execute top-to-bottom with no errors on gfx950
- Capstone matches torch (`torch.allclose`)
- Reviewer: confirm location (examples/notebooks/) and whether to wire nbmake CI now or in the follow-up
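The capstone check compares the device result against a host reference element by element. The sketch below is a pure-Python illustration of the tolerance rule that `torch.allclose` documents (`|actual - expected| <= atol + rtol * |expected|`, with defaults `rtol=1e-5` and `atol=1e-8`). The `candidate` list is a stand-in for the kernel output, not data from an actual run.

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness test using the torch.allclose tolerance rule."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


n = 1000
a = [float(i % 10) for i in range(n)]
b = [float((3 * i) % 10) for i in range(n)]
reference = [x + y for x, y in zip(a, b)]   # host-side vector add

candidate = list(reference)                 # stand-in for the device result
ok = allclose(candidate, reference)

candidate[7] += 1.0                         # simulate one wrong element
corrupted_ok = allclose(candidate, reference)
print(ok, corrupted_ok)
```

With small integer-valued float inputs, as in the notebook's `torch.randint` cells, a float32 sum is exact, so any mismatch points to a real indexing or copy bug rather than rounding.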

Refs #573.

🤖 Generated with Claude Code

Recreated from #584 after the original head repository was accidentally deleted; this branch is restored at the old PR head commit fad0bc7.


Interactive, bottom-up onboarding notebooks for the flydsl.expr foundation, bridging a newcomer to the existing examples/. This first set (1/n) covers:

- 00_hello_flydsl - the @flyc.kernel / @flyc.jit trace model; reading dumped IR
- 01_numeric_types - the scalar type system (ints, floats, bf16/fp8), casts, promotion, and Constexpr vs runtime values
- 02_struct - @fx.struct aggregate value types and their memory layout
- 03_universal_ops - the target-agnostic Universal* atoms, with a fully-universal vector-add capstone validated against torch

Layout algebra (make_layout / logical_divide / tiled copy / MMA) is intentionally deferred to a follow-up series. All cells were run-verified on an MI350X (gfx950) and committed with outputs cleared. A short README indexes the series.

Refs ROCm#573.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rity)

Sharpen the series for agent consumers (fast ramp, fewer source lookups):

- README: add a flydsl.expr API cheat-sheet (kernel/jit/launch, scalars, structs, copy atoms + register tensors) and the three printf/wurlitzer/Constexpr gotchas, so the whole foundation is reachable in one place.
- 03_universal_ops:
  - explain the host->device tensor handoff (raw torch tensor vs from_dlpack + mark_layout_dynamic) and annotate the jit C param as fx.Tensor for consistency;
  - bridge nb01's '+' operator to nb03's register-tensor memref_load_vec / arith.addf / memref_store_vec compute;
  - stop calling UniversalFMA 'the MMA atom' (it has no real usage; MMA lowers via rocdl.MFMA) in the atom-family list and the Recap;
  - fix the pass name convert_fly_to_rocdl -> convert-fly-to-rocdl.

Re-ran all four notebooks on a current gfx950 build: 00/01/02/03 clean, vadd matches torch, IR shows universal_copy<32>. Outputs committed cleared.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…c metadata)

Acted on the substantive Copilot comments:

- Close the IR-dump file reads with `with open(...)` in nb00 and nb03 (C6).
- Pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR after the dump cell so the env we set doesn't linger for later cells (C4/C5) -- without the suggested try/finally, which would be defensive noise for a teaching cell.
- Strip per-cell `metadata.execution` timestamps from all four notebooks so re-running doesn't churn the diff (C7); matches the outputs-cleared convention.

Did not act on the false positives: the README table uses single-pipe rows (not '||'); `block=[...]` and a dynamic `n: fx.Int32` launcher arg both match the canonical examples/01-vectorAdd.py and run clean on gfx950.

Re-ran all four on a current build: 00/01/02/03 clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
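Stripping the per-cell `metadata.execution` timestamps can be scripted with the standard `json` module, since a notebook is a JSON document. The sketch below is one possible approach, not necessarily the tooling used for this commit; the `strip_execution_metadata` helper and the inline sample notebook are hypothetical.

```python
import json


def strip_execution_metadata(notebook):
    """Drop per-cell execution timestamps and clear code-cell outputs in place.

    Returns the number of cells that carried a `metadata.execution` entry.
    """
    removed = 0
    for cell in notebook.get("cells", []):
        if cell.get("metadata", {}).pop("execution", None) is not None:
            removed += 1
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return removed


# Hypothetical two-cell notebook, shaped like an .ipynb file after a run.
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code",
         "metadata": {"execution": {"iopub.execute_input": "2026-05-28T07:41:00Z"}},
         "execution_count": 3,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
         "source": ["print('hi')"]},
    ],
})

notebook = json.loads(raw)
removed = strip_execution_metadata(notebook)
cleaned = json.dumps(notebook, indent=1)
print(removed)
```

To apply this to a file, load it with `json.load`, call the helper, and write the result back with `json.dump`. Running the helper a second time is a no-op, which keeps re-runs from churning the diff.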

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a set of onboarding notebooks introducing the flydsl.expr foundation and a README to guide users through running them.

Changes:

Added four onboarding Jupyter notebooks (00–03) covering kernels/JIT, numeric types, structs, and universal atoms.
Added examples/notebooks/README.md with an ordered notebook index, an API cheat-sheet, and runtime instructions.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.


| File | Description |
|------|-------------|
| examples/notebooks/README.md | Introduces the notebook series, provides a compact API cheat-sheet, and documents how to run notebooks. |
| examples/notebooks/00_hello_flydsl.ipynb | Notebook 00: explains `@flyc.kernel`/`@flyc.jit` and how to dump/inspect IR. |
| examples/notebooks/01_numeric_types.ipynb | Notebook 01: documents scalar type families, operations, casts, and `Constexpr`. |
| examples/notebooks/02_struct.ipynb | Notebook 02: introduces `@fx.struct`, layout queries, and struct usage patterns. |
| examples/notebooks/03_universal_ops.ipynb | Notebook 03: explains `Universal*` atoms and demonstrates a vector-add capstone. |


+| # | Notebook | Topic |
+|---|----------|-------|
+| 00 | [`00_hello_flydsl.ipynb`](00_hello_flydsl.ipynb) | the `@flyc.kernel` / `@flyc.jit` model; reading dumped IR |
+| 01 | [`01_numeric_types.ipynb`](01_numeric_types.ipynb) | scalar types: ints, floats, `bf16`/`fp8`, casts, `Constexpr` |


+    "A = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",
+    "B = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",


+    "def vadd_kernel(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, block_dim: fx.Constexpr[int]):\n",
+    "    bid = fx.block_idx.x\n",
+    "    tid = fx.thread_idx.x\n",


+    "    tA = fx.logical_divide(A, fx.make_layout(block_dim, 1))\n",
+    "    tB = fx.logical_divide(B, fx.make_layout(block_dim, 1))\n",
+    "    tC = fx.logical_divide(C, fx.make_layout(block_dim, 1))\n",


+    "    fx.copy_atom_call(copy, fx.slice(tA, (None, tid)), rA)   # global -> register\n",
+    "    fx.copy_atom_call(copy, fx.slice(tB, (None, tid)), rB)\n",


+    "    vC = fx.arith.addf(fx.memref_load_vec(rA), fx.memref_load_vec(rB))\n",
+    "    fx.memref_store_vec(vC, rC)\n",
+    "\n",
+    "    fx.copy_atom_call(copy, rC, fx.slice(tC, (None, tid)))    # register -> global\n",


+    "def vadd(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, n: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
+    "    block_dim = 64\n",
+    "    grid_x = (n + block_dim - 1) // block_dim\n",
+    "    vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",
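The `grid_x` expression in the launcher is a ceiling division: it picks the smallest block count whose threads cover all `n` elements. The sketch below reproduces that arithmetic on plain Python integers. In the notebook `n` is a traced `fx.Int32`, and the `ceil_div` and `launch_shape` helper names are illustrative, not part of FlyDSL.

```python
def ceil_div(n, block_dim):
    """Smallest number of blocks of size block_dim that covers n elements."""
    return (n + block_dim - 1) // block_dim


def launch_shape(n, block_dim=64):
    """Return (grid, block) tuples shaped like the launcher's arguments."""
    grid_x = ceil_div(n, block_dim)
    return (grid_x, 1, 1), (block_dim, 1, 1)


shapes = {n: launch_shape(n) for n in (1, 64, 65, 1000)}
for n, (grid, block) in shapes.items():
    covered = grid[0] * block[0]
    print(f"n={n}: grid={grid} block={block} threads={covered}")
```

When `n` is not a multiple of `block_dim`, the last block contains threads past the end of the data (for `n=1000`, 1024 threads are launched). The fragments shown here do not make clear how the kernel treats those threads, so that is worth checking in the notebook.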


jhinpan and others added 6 commits May 28, 2026 07:41

Merge branch 'main' into docs/notebooks-573-expr-foundation

35eb36f

Merge branch 'main' into docs/notebooks-573-expr-foundation

c9122fb

Merge branch 'main' into docs/notebooks-573-expr-foundation

fad0bc7

Copilot AI review requested due to automatic review settings June 3, 2026 08:04

jhinpan mentioned this pull request Jun 3, 2026

[Docs] Onboarding notebooks (1/n): expr foundation #584

Closed

3 tasks

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Merge branch 'main' into docs/notebooks-573-expr-foundation

c8a7b2c

jhinpan mentioned this pull request Jun 4, 2026

📋 FlyDSL upstream tracker — jhinpan issues & PRs jhinpan/flydsl-kernel-profiling#7

Open

14 tasks

jhinpan wants to merge 7 commits into ROCm:main from jhinpan:docs/notebooks-573-expr-foundation
