[Frontend] Remove the dead convert_index floor/mod codegen path by YWHyuk · Pull Request #271 · PSAL-POSTECH/PyTorchSim

YWHyuk · 2026-06-24T12:22:32Z

Summary

The load/store affine-index path (convert_index and _convert_sympy_to_mlir_expr in PyTorchSimFrontend/mlir/mlir_codegen_backend.py) used to lower FloorDiv/ModularIndexing sub-expressions (view/reshape/transpose indices) into affine.apply maps with a constant divisor and a single free symbol.

This is superseded by axis-split's affine-only contract: axis_split.py strips FloorDiv/ModularIndexing from index expressions upstream at the Inductor scheduling layer (see docs/axis-split-scheduling.md), so MLIR codegen now only ever receives pure affine, constant-stride indices. The DMA-index path already asserts this.
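
As an illustration of why a floor/mod index with a constant divisor can be replaced by a pure affine one, the sketch below compares the two forms on a small example. It is not code from axis_split.py; the function names and strides are hypothetical, and it only shows the arithmetic identity i = outer_i * inner + inner_i that splitting an axis relies on.

```python
from itertools import product


def floor_mod_index(i, inner, stride_outer, stride_inner):
    """Address from one flat index using floor/mod (the form removed upstream)."""
    return (i // inner) * stride_outer + (i % inner) * stride_inner


def split_axis_index(outer_i, inner_i, stride_outer, stride_inner):
    """Same address as a pure affine function of two split axes."""
    return outer_i * stride_outer + inner_i * stride_inner


def addresses_match(outer, inner, stride_outer, stride_inner):
    """Check both forms visit identical addresses in identical order."""
    flat = [
        floor_mod_index(i, inner, stride_outer, stride_inner)
        for i in range(outer * inner)
    ]
    # product() varies the last axis fastest, matching i = outer_i * inner + inner_i.
    split = [
        split_axis_index(o, n, stride_outer, stride_inner)
        for o, n in product(range(outer), range(inner))
    ]
    return flat == split
```

For example, `addresses_match(4, 3, 1, 4)` checks a transpose-like access in which the inner axis carries the larger stride.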

Verification

A temporary tripwire (raise RuntimeError) placed at the entry of both floor/mod handlers (the ModularIndexing and // branches of convert_index, and the ModularIndexing/FloorDiv branches of _convert_sympy_to_mlir_expr) never fired across:

View/reshape/transpose suite: test_floormod_axis_split (group_norm c//(C/G), repeat mod, repeat_interleave floor), test_transpose2D, test_transpose3D, test_view3D_2D, test_cat
Broad sanity set: test_add, test_matmul, test_reduce, test_softmax, test_layernorm, test_batchnorm, test_conv2d

confirming the floor/mod codegen path is dead.
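
The tripwire technique can be sketched in miniature as follows. This is a hypothetical stand-in, not the code in mlir_codegen_backend.py: `lower_index` and `tripwire_fired` are invented names, and index expressions are plain strings rather than sympy expressions.

```python
def lower_index(expr: str) -> str:
    """Hypothetical lowering routine with a suspect floor/mod branch."""
    if "//" in expr or "%" in expr:
        # Temporary tripwire at the entry of the branch believed to be dead.
        raise RuntimeError(f"tripwire: floor/mod branch reached for {expr!r}")
    return expr.strip()


def tripwire_fired(index_exprs) -> bool:
    """Run the routine over a workload and report whether the tripwire fired."""
    for expr in index_exprs:
        try:
            lower_index(expr)
        except RuntimeError:
            return True
    return False
```

If the tripwire stays silent over a workload that is known to exercise view/reshape/transpose indices, that is evidence (not proof) that the branch is unreachable for that workload.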

Change

Remove the floor/mod lowering branches. Mirroring the existing DMA-index assert, replace them with a clear NotImplementedError so any residual floor/mod that escapes axis-split fails loudly instead of being silently mis-lowered. The now-unused re import is dropped.
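
A minimal sketch of such a fail-loud guard is shown below. It makes two assumptions that are not in the PR: index expressions are Python-syntax strings, and the standard-library `ast` module stands in for the sympy FloorDiv/ModularIndexing nodes the real backend inspects. The name `reject_floor_mod` is hypothetical.

```python
import ast


def reject_floor_mod(index_expr: str) -> None:
    """Raise NotImplementedError if the expression contains // or %.

    Stand-in for a whole-expression guard; it does not prove the
    expression is affine, it only rejects floor/mod sub-expressions.
    """
    tree = ast.parse(index_expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.FloorDiv, ast.Mod)):
            raise NotImplementedError(
                f"floor/mod index reached codegen: {index_expr!r}; "
                "expected it to be removed upstream"
            )
```

Checking the whole expression once, rather than guarding each branch, means a floor/mod term nested anywhere inside the index is caught at the same place.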

The full view/op test suite passes unchanged (allclose) after removal.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

… feed Skeleton + EmitC + cost/dep analysis on the frontend; the trace runtime, loader, bridge, and Core feed on the simulator; shared MLIR pass helpers and the pipeline tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

Per-record tag key in the bridge plus per-iteration tag alloc in dma-fine-grained so multi-tile-K and conv loads do not collide; strip the reduction accum marker from the memory_barrier slot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

togsim_dispatch with TILE_BEGIN/TILE_END; outline each work-item into togsim_kernel_tile. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

DMA-capacity throttle and frozen-state guard, per-core VMEM in the configs, and the SA weight-buffer throttle. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

trace_timeline.py with per-work-item grouping and resource-centric DMA lanes; the trace logs the first DRAM response and the assigned systolic array, and scopes the compute barrier to its dispatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

Default to the trace path; fix uninitialized Instruction fields, the matmul accumulator wedge, fused-subtile dedup, nested/fused epilogue dataflow, and dma_wait fusion; bound concurrent dispatches to the spad, round-robin work-items within a partition, benchmark autotune and run the multi-tenant scheduler through the trace path, and emit trace.so for pooling/reduction. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

Carry simulator headers through the wrapper for cache-safe replay; drop verbose [P3-trace] logs; fix the key.mlir compile race in load(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

… runtime model

Replace the trace bridge's accumulated special cases with one dataflow rule and clean up the runtime that consumes it.

Dependency rule: per SRAM buffer keep a writers SET; a reader depends on all current writers (occupancy=ISSUE when both are systolic-array ops, else latency=DONE); a writer REPLACEs the set. The only exception is is_mm_accum (a matmul that reads and writes the same buffer = a commutative accumulator): skip its read edge and UNION its write, waiting only the non-matmul init seed and not ordering co-matmuls. This drops the matmul-accumulator chain that deadlocked the SA weight-slot pipeline while keeping the init->matmul edge, and lets a vector epilogue or the store wait every K matmul (fixes the pure-vector store that an empty COMPUTE_BAR let slip).

Remove COMPUTE_BAR entirely: a matmul is its own DONE-handle (finish == SA drain), so the store JOINs the matmul writers directly. The whole emit/loader chain is gone -- build_skeleton, lower_to_emitc, togsim.compute_barrier, the runtime symbol, the Opcode/case/_fence_finish, and TraceRec::COMPUTE_BAR -- so a stale producer fails loudly instead of emitting records the bridge would drop. Only MEMORY_BAR remains (an async load's DONE is its data arrival, not issue).

Model compute-output spad footprint in the SRAM version/capacity machinery so buffer reuse (WAR) is capacity-modeled, not a hard edge. The output size comes from the DMA records that touch the same buffer (a buf_bytes pre-pass); an in-place buffer (accumulator, relu) is version-transparent so footprint is not double-counted. The occupy gate and version release sit in the MOVIN/MOVOUT/COMP issue points (release before the COMP skip path so a skipped matmul still frees).

Runtime: collapse child_inst / _pipeline_children into one event-indexed _deps[ISSUE|DONE] with add_dep(c, on) and fire(e); collapse the weight-slot release queue and the async-load wakeup into one _due_events timed-effect table drained by process_due_events. Both are behavior-preserving (byte-identical).

Require the weight-slot model: sa_weight_buffer_depth must be > 0 (errors at init), and the round-robin disable mode is removed. Degenerate traces (a consumer-less preload, an unpinned matmul) hit explicit error+exit guards rather than asserts that vanish under NDEBUG.

Mark the legacy ONNX TOG path deprecated: it is superseded by the trace path, so TileGraphParser logs a deprecation warning and the TORCHSIM_LEGACY_TOG=1 opt-in warns at command build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno
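
The writers-set dependency rule in this commit message can be sketched as below. This is an illustrative reading of the message, not the bridge's code: `Op`, `build_edges`, and the buffer names are hypothetical, and the sketch assumes that "systolic-array op" and "matmul" coincide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    reads: tuple = ()
    writes: tuple = ()
    is_matmul: bool = False  # assumption: matmul == systolic-array op


def build_edges(ops):
    """Return (consumer, producer, kind) edges under the writers-set rule."""
    writers = defaultdict(set)  # SRAM buffer -> set of current writer ops
    edges = []

    def add_edge(consumer, producer):
        both_sa = consumer.is_matmul and producer.is_matmul
        edges.append((consumer.name, producer.name, "ISSUE" if both_sa else "DONE"))

    for op in ops:
        # A matmul reading and writing the same buffer is a commutative accumulator.
        accum = set(op.reads) & set(op.writes) if op.is_matmul else set()
        for buf in op.reads:
            if buf in accum:
                continue  # skip the accumulator's read edge
            for w in sorted(writers[buf], key=lambda o: o.name):
                add_edge(op, w)  # a reader depends on all current writers
        for buf in op.writes:
            if buf in accum:
                # Wait only on the non-matmul init seed; co-matmuls stay unordered.
                for w in sorted(writers[buf], key=lambda o: o.name):
                    if not w.is_matmul:
                        add_edge(op, w)
                writers[buf].add(op)  # UNION
            else:
                writers[buf] = {op}  # REPLACE
    return edges
```

With an init op, two accumulating matmuls, and a store on the same buffer, each matmul gets an edge only to the init, and the store gets edges to the init and both matmuls, so no matmul-to-matmul chain is created.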

axis-split's affine-only contract (docs/axis-split-scheduling.md) linearizes aligned FloorDiv/ModularIndexing into per-axis affine indices upstream, so the load/store index reaching codegen is already pure affine. The convert_index method (which lowered floor/mod view indices to affine.apply with a constant divisor and a single free symbol) and the helper that guarded it are therefore dead -- a tripwire at every floor/mod branch never fired across the view/op test suite.

Remove convert_index entirely and inline its only live behaviour at the call sites in _convert_sympy_to_mlir_expr and parse_index_list. A residual FloorDiv/ModularIndexing now fails loudly via the whole-expression guard in each of those functions (mirroring the DMA-index assert) instead of silently mis-lowering.

Also drop the assumption-stripping Symbol(str(...)) + expr.replace round-trip in the affine-map builder: it only mattered when convert_index transformed the term, and is now a verified no-op (indices and the affine string depend only on symbol names). Drop the now-unused re import.

Verified: the full view/reshape/transpose suite (test_floormod_axis_split incl. group_norm/repeat/repeat_interleave/mixed-radix/pixel_shuffle, transpose2D/3D, view3D_2D, cat) plus add/matmul/reduce/softmax/layernorm/batchnorm pass unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfwHCV7TaX4s9xkn8i7anG

YWHyuk force-pushed the cleanup/remove-convert-index branch from 95f127e to f4ac374 Compare June 24, 2026 12:58

YWHyuk force-pushed the feature/togsim-cpp-trace branch from 9b913d4 to 4767e8a Compare June 24, 2026 13:16

YWHyuk force-pushed the cleanup/remove-convert-index branch from f4ac374 to b16b2d8 Compare June 24, 2026 13:29

YWHyuk and others added 10 commits June 24, 2026 22:35

[Docs] C++ trace pipeline design (runtime-tag pairing, ABI)

fd152bf

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

[TOGSim] Work-item outlining and ABI v12 dispatch

b189df4

togsim_dispatch with TILE_BEGIN/TILE_END; outline each work-item into togsim_kernel_tile. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

[TOGSim] Make the trace runtime test self-contained

76a2862

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno

YWHyuk force-pushed the feature/togsim-cpp-trace branch from b7c1ec4 to 9033945 Compare June 24, 2026 13:37

YWHyuk force-pushed the cleanup/remove-convert-index branch from b16b2d8 to aae710a Compare June 24, 2026 13:47

YWHyuk mentioned this pull request Jun 24, 2026

[Frontend] Simplify _convert_sympy_to_mlir_expr and retire the deprecated old-tog negative-coefficient affine formatting #273

Open

YWHyuk force-pushed the feature/togsim-cpp-trace branch from c166abd to ed5c747 Compare June 25, 2026 07:29

YWHyuk wants to merge 11 commits into feature/togsim-cpp-trace from cleanup/remove-convert-index
