cat > /tmp/issue_tls_arch_tsel_i8_min.pto <<"EOF"
// Minimal reproducer: parser-time arch fallback reads non-A5 and mis-verifies pto.tsel(i8) as A2/A3.
//
// Repro command:
// build/tools/ptoas/ptoas --pto-arch=a5 test/repro/issue_tls_arch_tsel_i8_min.pto -o -
//
// Actual (unexpected on A5):
// error: 'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
//
// Control:
// Add module attr: module attributes {"pto.target_arch" = "a5"} { ... }
// Then the above verifier error disappears.
module {
func.func @f0() {
%mask = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%src0 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%src1 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%tmp = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%dst = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
pto.tsel ins(%mask, %src0, %src1, %tmp : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
outs(%dst : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
return
}
// Keep >=2 functions to trigger verifier's IsolatedFromAbove parallel branch.
func.func @f1() {
%mask = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%src0 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%src1 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%tmp = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
%dst = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
pto.tsel ins(%mask, %src0, %src1, %tmp : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
outs(%dst : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
return
}
}
EOF
build/tools/ptoas/ptoas --pto-arch=a5 /tmp/issue_tls_arch_tsel_i8_min.pto -o -
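The control experiment mentioned in the reproducer's header comment can be derived from the same file instead of being written by hand. The following is a minimal sketch, not part of PTOAS: `add_target_arch_attr` is a hypothetical helper, and it assumes the reproducer was written to `/tmp` as above and that its top-level line is exactly `module {`.

```shell
# Hypothetical helper (not part of PTOAS): copy a .pto file and add the
# module-level target-arch attribute to a bare top-level `module {` line.
add_target_arch_attr() {
  in_file=$1
  out_file=$2
  arch=${3:-a5}
  sed "s/^module {\$/module attributes {\"pto.target_arch\" = \"${arch}\"} {/" \
    "$in_file" > "$out_file"
}

src=/tmp/issue_tls_arch_tsel_i8_min.pto
ctl=/tmp/issue_tls_arch_tsel_i8_min_attr.pto

# Only run when both the reproducer and a local ptoas build are present.
if [ -f "$src" ] && [ -x build/tools/ptoas/ptoas ]; then
  add_target_arch_attr "$src" "$ctl" a5
  # No --pto-arch flag here: the arch is taken from the module attribute.
  build/tools/ptoas/ptoas "$ctl" -o -
fi
```

If the input has no line that is exactly `module {`, the `sed` expression matches nothing and the copy is identical to the original, so it is worth checking the first line of the generated file before drawing conclusions from the control run.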
Component
PTO IR verifier / parser arch dispatch
Description
When compiling textual .pto without a module-level pto.target_arch attribute, using --pto-arch=a5 can still mis-verify pto.tsel(i8) as A2/A3 during parse-time verification:
'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
A5 is expected to accept i8 for pto.tsel.
I built a minimal reproducer with 2 func.func ops, each containing only alloc_tile + pto.tsel(i8).
Reproduction (minimal)
Reproducer file:
test/repro/issue_tls_arch_tsel_i8_min.pto
Command:
build/tools/ptoas/ptoas --pto-arch=a5 test/repro/issue_tls_arch_tsel_i8_min.pto -o -
Expected behavior
pto.tsel(i8) should pass the verifier on A5.
Actual behavior / error logs
error: 'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
Control experiment
If I add module attr explicitly:
module attributes {"pto.target_arch" = "a5"} { ... }
Then run:
build/tools/ptoas/ptoas test/repro/issue_tls_arch_tsel_i8_min_attr.pto -o -
the A2/A3 tsel verifier error disappears.
Suspected root cause
getVerifierTargetArch() falls back to getPTOParserTargetArch() when module attr is not yet visible. getPTOParserTargetArch() uses thread-local storage. MLIR parser verifies immediately after parse (verifyAfterParse=true) and verifier parallelizes IsolatedFromAbove children. Since func.func is IsolatedFromAbove, some op verifiers run on worker threads and may read default TLS arch (Unspecified/A3), causing wrong A2/A3 dispatch.
Environment
hw-native-sys/PTOAS
Inline minimal test case
Copy-paste repro commands