[Docs] Onboarding notebooks (1/n): expr foundation #635

Open
jhinpan wants to merge 7 commits into ROCm:main from jhinpan:docs/notebooks-573-expr-foundation

Contributor

jhinpan commented Jun 3, 2026

Summary

First PR (1/n) toward the onboarding notebook series requested in #573. Rather than jumping straight to vector-add + Layout 101, this set builds the flydsl.expr foundation bottom-up and stops before layout algebra (deferred to a follow-up series), so the later layout material rests on solid primitives.

Four notebooks in examples/notebooks/:

| # | Notebook | Topic |
|---|----------|-------|
| 00 | 00_hello_flydsl | the @flyc.kernel / @flyc.jit trace model; reading dumped IR (FLYDSL_DUMP_IR) |
| 01 | 01_numeric_types | scalar type system (ints, floats, bf16/fp8), casts, promotion, Constexpr vs runtime |
| 02 | 02_struct | @fx.struct aggregate value types and their C-style memory layout |
| 03 | 03_universal_ops | target-agnostic Universal* atoms + a fully-universal vector-add capstone (validated vs torch) |
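
For readers unfamiliar with what "C-style memory layout" implies, the sketch below uses Python's standard `ctypes` module as an analogy. It is not FlyDSL code, and whether `@fx.struct` applies the same padding rules is an assumption the notebook itself would have to confirm. The struct `Sample` and its field names are hypothetical, chosen only to force alignment padding.

```python
import ctypes


class Sample(ctypes.Structure):
    # Hypothetical fields chosen to force padding; not taken from the notebook.
    _fields_ = [
        ("flag", ctypes.c_uint8),    # 1 byte, followed by 3 bytes of padding
        ("count", ctypes.c_int32),   # must start on a 4-byte boundary
        ("tag", ctypes.c_uint16),    # 2 bytes, followed by 2 bytes of tail padding
    ]


def describe(struct_type):
    """Return (name, byte offset, byte size) per field, plus the total struct size."""
    rows = [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ctype in struct_type._fields_
    ]
    return rows, ctypes.sizeof(struct_type)


rows, total = describe(Sample)
for name, offset, size in rows:
    print(f"{name:>5}: offset={offset} size={size}")
print("sizeof:", total)
```

The fields add up to 7 bytes, but the struct occupies 12: each field is placed at a multiple of its own alignment, and the total is rounded up to the largest field alignment. Reordering the fields from widest to narrowest would shrink it to 8 bytes.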

Emphasis throughout is arch-neutrality: the capstone moves data with UniversalCopy32b (no rocdl/CDNA-specific atoms), and the IR peek shows the !fly.universal_copy<32> op before the convert-fly-to-rocdl pass specializes it.

Notes

  • Run-verified end-to-end on an MI350X (gfx950); committed with outputs cleared for clean diffs (re-run to populate).
  • Notebooks need wurlitzer (pip install jupyter wurlitzer) to show GPU printf inline — Jupyter doesn't capture device stdout on its own. See examples/notebooks/README.md.
  • Deferred to follow-ups to keep this PR small/reviewable: nbsphinx docs rendering, an nbmake execute-check CI job, and the layout/MMA notebook series.
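
The wurlitzer note is easier to follow with the underlying mechanism in view. The sketch below uses only the standard library and does not call wurlitzer or FlyDSL. It shows that a Python-level redirect misses bytes written straight to file descriptor 1, which is how C-level and device-side printf output typically arrives, and that redirecting the descriptor itself does catch them. The function names are hypothetical, and `os.write(1, ...)` is only a stand-in for a GPU printf.

```python
import contextlib
import io
import os
import sys
import tempfile


def capture_python_level(fn):
    """Capture only what goes through Python's sys.stdout object."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn()
    return buf.getvalue()


def capture_fd_level(fn):
    """Capture everything written to file descriptor 1 while fn runs."""
    sys.stdout.flush()
    saved_fd = os.dup(1)                  # keep a handle on the real stdout
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)          # point fd 1 at the temp file
        try:
            fn()
        finally:
            os.dup2(saved_fd, 1)          # always restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode()


def low_level_write():
    # Stand-in for a device-side printf: bypasses sys.stdout entirely.
    os.write(1, b"hello from fd 1\n")


missed = capture_python_level(low_level_write)   # the bytes go to the real stdout
caught = capture_fd_level(low_level_write)
print(repr(missed), repr(caught))
```

Descriptor-level redirection of this kind is the general approach that lets a notebook display output produced below the Python layer.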

Test plan

  • All four notebooks execute top-to-bottom with no errors on gfx950
  • Capstone matches torch (torch.allclose)
  • Reviewer: confirm location (examples/notebooks/) and whether to wire nbmake CI now or in the follow-up
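
As a rough, host-only illustration of what "capstone matches torch" checks, the sketch below simulates the launch in plain Python and compares it against a reference sum. The grid size uses the same ceiling division as the launcher excerpt quoted in the review, `(n + block_dim - 1) // block_dim`. The `i < n` tail guard is an assumption for the simulation, not something the quoted kernel lines show, and `allclose` here re-implements the documented `torch.allclose` rule with its default tolerances rather than calling torch.

```python
def ceil_div(n, block_dim):
    """Number of blocks needed to cover n elements."""
    return (n + block_dim - 1) // block_dim


def simulated_vadd(a, b, block_dim=64):
    """Host-side stand-in for the kernel: one element per (block, thread) pair."""
    n = len(a)
    c = [0.0] * n
    grid_x = ceil_div(n, block_dim)
    for bid in range(grid_x):
        for tid in range(block_dim):
            i = bid * block_dim + tid
            if i < n:                     # tail guard (assumption, see lead-in)
                c[i] = a[i] + b[i]
    return c, grid_x


def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Elementwise |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False
    return all(abs(x - e) <= atol + rtol * abs(e) for x, e in zip(actual, expected))


n = 130                                    # deliberately not a multiple of 64
a = [float(i % 10) for i in range(n)]      # small integer-valued floats
b = [float((3 * i) % 10) for i in range(n)]
reference = [x + y for x, y in zip(a, b)]

result, grid_x = simulated_vadd(a, b)
print("blocks:", grid_x, "match:", allclose(result, reference))
```

With integer-valued inputs like these the sums are exact, so a tolerance-based comparison passes trivially; the check mainly guards against indexing mistakes such as a dropped tail block.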

Refs #573.

🤖 Generated with Claude Code


Recreated from #584 after the original head repository was accidentally deleted; this branch is restored at the old PR head commit fad0bc7.

jhinpan and others added 6 commits May 28, 2026 07:41
Interactive, bottom-up onboarding notebooks for the flydsl.expr foundation,
bridging a newcomer to the existing examples/. This first set (1/n) covers:

- 00_hello_flydsl     - the @flyc.kernel / @flyc.jit trace model; reading dumped IR
- 01_numeric_types    - the scalar type system (ints, floats, bf16/fp8), casts,
                        promotion, and Constexpr vs runtime values
- 02_struct           - @fx.struct aggregate value types and their memory layout
- 03_universal_ops    - the target-agnostic Universal* atoms, with a fully-universal
                        vector-add capstone validated against torch

Layout algebra (make_layout / logical_divide / tiled copy / MMA) is intentionally
deferred to a follow-up series. All cells were run-verified on an MI350X (gfx950)
and committed with outputs cleared. A short README indexes the series.

Refs ROCm#573.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rity)

Sharpen the series for agent consumers (fast ramp, fewer source lookups):

- README: add a flydsl.expr API cheat-sheet (kernel/jit/launch, scalars, structs,
  copy atoms + register tensors) and the three printf/wurlitzer/Constexpr gotchas,
  so the whole foundation is reachable in one place.
- 03_universal_ops:
  - explain the host->device tensor handoff (raw torch tensor vs from_dlpack +
    mark_layout_dynamic) and annotate the jit C param as fx.Tensor for consistency;
  - bridge nb01's '+' operator to nb03's register-tensor
    memref_load_vec / arith.addf / memref_store_vec compute;
  - stop calling UniversalFMA 'the MMA atom' (it has no real usage; MMA lowers via
    rocdl.MFMA) in the atom-family list and the Recap;
  - fix the pass name convert_fly_to_rocdl -> convert-fly-to-rocdl.

Re-ran all four notebooks on a current gfx950 build: 00/01/02/03 clean, vadd matches
torch, IR shows universal_copy<32>. Outputs committed cleared.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…c metadata)

Acted on the substantive Copilot comments:
- Close the IR-dump file reads with `with open(...)` in nb00 and nb03 (C6).
- Pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR after the dump cell so the env we set
  doesn't linger for later cells (C4/C5) -- without the suggested try/finally, which
  would be defensive noise for a teaching cell.
- Strip per-cell `metadata.execution` timestamps from all four notebooks so re-running
  doesn't churn the diff (C7); matches the outputs-cleared convention.
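
A minimal sketch of the dump-cell pattern described in the bullets above: set both variables, run the work, pop both, then read the dumped files with `with open(...)`. The value `"1"` for `FLYDSL_DUMP_IR`, the file name `example.mlir`, and the helper `fake_compile_and_dump` are assumptions for illustration; the helper stands in for a real kernel launch so the snippet runs without a GPU.

```python
import os
import tempfile
from pathlib import Path


def fake_compile_and_dump():
    """Stand-in for a kernel launch: writes one IR file if dumping is enabled."""
    if os.environ.get("FLYDSL_DUMP_IR") == "1":
        out_dir = Path(os.environ["FLYDSL_DUMP_DIR"])
        (out_dir / "example.mlir").write_text("// placeholder IR\n")


def dump_cell(dump_dir):
    # Enable dumping for this cell only.
    os.environ["FLYDSL_DUMP_IR"] = "1"        # the value "1" is an assumption
    os.environ["FLYDSL_DUMP_DIR"] = str(dump_dir)
    fake_compile_and_dump()
    # Pop both variables so later cells do not keep dumping.
    os.environ.pop("FLYDSL_DUMP_IR", None)
    os.environ.pop("FLYDSL_DUMP_DIR", None)

    # Read each dumped file with a context manager so the handle is closed.
    texts = {}
    for path in sorted(Path(dump_dir).iterdir()):
        with open(path) as f:
            texts[path.name] = f.read()
    return texts


with tempfile.TemporaryDirectory() as tmp:
    dumped = dump_cell(tmp)
print(dumped)
```

If the launch raises, the two variables stay set in this version; that is the trade-off the commit message accepts by skipping try/finally in a teaching cell.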

Did not act on the false positives: the README table uses single-pipe rows (not
'||'); `block=[...]` and a dynamic `n: fx.Int32` launcher arg both match the
canonical examples/01-vectorAdd.py and run clean on gfx950.

Re-ran all four on a current build: 00/01/02/03 clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 08:04
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a set of onboarding notebooks introducing the flydsl.expr foundation and a README to guide users through running them.

Changes:

  • Added four onboarding Jupyter notebooks (00–03) covering kernels/JIT, numeric types, structs, and universal atoms.
  • Added examples/notebooks/README.md with an ordered notebook index, an API cheat-sheet, and runtime instructions.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| examples/notebooks/README.md | Introduces the notebook series, provides a compact API cheat-sheet, and documents how to run notebooks. |
| examples/notebooks/00_hello_flydsl.ipynb | Notebook 00: explains @flyc.kernel/@flyc.jit and how to dump/inspect IR. |
| examples/notebooks/01_numeric_types.ipynb | Notebook 01: documents scalar type families, operations, casts, and Constexpr. |
| examples/notebooks/02_struct.ipynb | Notebook 02: introduces @fx.struct, layout queries, and struct usage patterns. |
| examples/notebooks/03_universal_ops.ipynb | Notebook 03: explains Universal* atoms and demonstrates a vector-add capstone. |

Comment on lines +10 to +13
| # | Notebook | Topic |
|---|----------|-------|
| 00 | [`00_hello_flydsl.ipynb`](00_hello_flydsl.ipynb) | the `@flyc.kernel` / `@flyc.jit` model; reading dumped IR |
| 01 | [`01_numeric_types.ipynb`](01_numeric_types.ipynb) | scalar types: ints, floats, `bf16`/`fp8`, casts, `Constexpr` |
Comment on lines +151 to +152
"A = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",
"B = torch.randint(0, 10, (n,), dtype=torch.float32).cuda()\n",
Comment on lines +113 to +115
"def vadd_kernel(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, block_dim: fx.Constexpr[int]):\n",
" bid = fx.block_idx.x\n",
" tid = fx.thread_idx.x\n",
Comment on lines +119 to +121
" tA = fx.logical_divide(A, fx.make_layout(block_dim, 1))\n",
" tB = fx.logical_divide(B, fx.make_layout(block_dim, 1))\n",
" tC = fx.logical_divide(C, fx.make_layout(block_dim, 1))\n",
Comment on lines +133 to +134
" fx.copy_atom_call(copy, fx.slice(tA, (None, tid)), rA) # global -> register\n",
" fx.copy_atom_call(copy, fx.slice(tB, (None, tid)), rB)\n",
" vC = fx.arith.addf(fx.memref_load_vec(rA), fx.memref_load_vec(rB))\n",
" fx.memref_store_vec(vC, rC)\n",
"\n",
" fx.copy_atom_call(copy, rC, fx.slice(tC, (None, tid))) # register -> global\n",
Comment on lines +145 to +146
" grid_x = (n + block_dim - 1) // block_dim\n",
" vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",
"def vadd(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, n: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
" block_dim = 64\n",
" grid_x = (n + block_dim - 1) // block_dim\n",
" vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",