
[Bug] --pto-arch=a5 may mis-verify pto.tsel(i8) as A2/A3 during parse-time verifier #487

@Zhendong404


Component

PTO IR verifier / parser arch dispatch

Description

When compiling a textual .pto file that has no module-level pto.target_arch attribute, passing --pto-arch=a5 can still cause pto.tsel(i8) to be checked against the A2/A3 rules during parse-time verification:

'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32

A5 is expected to accept i8 for pto.tsel.

I built a minimal reproducer with 2 func.func ops, each containing only alloc_tile + pto.tsel(i8).

Reproduction (minimal)

Reproducer file:

test/repro/issue_tls_arch_tsel_i8_min.pto

Command:

build/tools/ptoas/ptoas --pto-arch=a5 test/repro/issue_tls_arch_tsel_i8_min.pto -o -

Expected behavior

pto.tsel(i8) should pass the verifier on A5.

Actual behavior / error logs

loc("test/repro/issue_tls_arch_tsel_i8_min.pto":21:5): error: 'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
loc("test/repro/issue_tls_arch_tsel_i8_min.pto":34:5): error: 'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
Error: Failed to parse MLIR.

Control experiment

If I add the module attribute explicitly:

module attributes {"pto.target_arch" = "a5"} { ... }

Then run:

build/tools/ptoas/ptoas test/repro/issue_tls_arch_tsel_i8_min_attr.pto -o -

the A2/A3 tsel verifier error disappears.
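
The control file can be derived mechanically from the reproducer. The sketch below is only an illustration: the helper name add_target_arch_attr is hypothetical, and it assumes the input contains a bare `module {` line, as the inline test case in this issue does.

```python
def add_target_arch_attr(pto_text, arch="a5"):
    """Rewrite the first bare `module {` line so it carries pto.target_arch.

    Hypothetical helper for building the control file; not part of PTOAS.
    """
    lines = pto_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "module {":
            # Preserve the original indentation and line ending.
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\n" if line.endswith("\n") else ""
            lines[i] = f'{indent}module attributes {{"pto.target_arch" = "{arch}"}} {{{newline}'
            return "".join(lines)
    raise ValueError("no bare `module {` line found")


# Tiny stand-in for the reproducer body; the real file has two func.func ops.
sample = "module {\n  func.func @f0() {\n    return\n  }\n}\n"
print(add_target_arch_attr(sample), end="")
```

Writing the result to test/repro/issue_tls_arch_tsel_i8_min_attr.pto and running ptoas without --pto-arch matches the control run above.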

Suspected root cause

getVerifierTargetArch() falls back to getPTOParserTargetArch() when the module attribute is not yet visible.

getPTOParserTargetArch() reads thread-local storage (TLS). The MLIR parser verifies immediately after parsing (verifyAfterParse=true), and the verifier processes IsolatedFromAbove children in parallel. Since func.func is IsolatedFromAbove, some op verifiers run on worker threads, where the TLS arch may never have been set. Those threads may read the default TLS arch (Unspecified/A3) instead of A5, which causes the wrong A2/A3 dispatch.
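
The suspected mechanism can be modeled outside of PTOAS. The following is a minimal sketch, not the actual implementation: Python's threading.local stands in for the C++ thread-local storage, every function name is a hypothetical analogue of the C++ functions named above, and the A5 branch simply accepts everything because this issue only states that A5 accepts i8.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the C++ pieces named in this issue.
_parser_tls = threading.local()                   # models the thread-local parser arch
A2A3_TSEL_DTYPES = {"i16", "i32", "f16", "f32"}   # taken from the error message


def set_parser_target_arch(arch):
    """Models the driver recording --pto-arch on the calling thread only."""
    _parser_tls.arch = arch


def get_parser_target_arch():
    """Models getPTOParserTargetArch(): threads that never set it see the default."""
    return getattr(_parser_tls, "arch", "unspecified")


def get_verifier_target_arch(module_arch=None):
    """Models getVerifierTargetArch(): module attribute first, then the TLS fallback."""
    return module_arch if module_arch is not None else get_parser_target_arch()


def verify_tsel(dtype, module_arch=None):
    """Returns None on success, or an error string on failure."""
    arch = get_verifier_target_arch(module_arch)
    if arch == "a5":
        return None  # assumption: other A5 restrictions are not modeled here
    if dtype not in A2A3_TSEL_DTYPES:
        return "expects A2/A3 tsel element type to be i16/i32/f16/f32"
    return None


def verify_on_worker(dtype, module_arch=None):
    """Runs the check on a fresh worker thread, like a parallel func.func verification."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(verify_tsel, dtype, module_arch).result()


set_parser_target_arch("a5")                      # main thread, as with --pto-arch=a5
print(verify_tsel("i8"))                          # main thread reads a5 from its own TLS
print(verify_on_worker("i8"))                     # worker thread reads the default arch
print(verify_on_worker("i8", module_arch="a5"))   # explicit module attribute bypasses TLS
```

In this model only the worker-thread call without a module attribute reports the A2/A3 error, which lines up with the failing run and the control experiment described in this issue.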

Environment

  • Repo: hw-native-sys/PTOAS
  • Local branch: feature-vpto-backend
  • Local commit: f4c679a
  • Tool: gh version 2.45.0 (2025-07-18 Ubuntu 2.45.0-1ubuntu0.3)

Inline minimal test case

Copy-paste repro commands

cat > /tmp/issue_tls_arch_tsel_i8_min.pto <<"EOF"
// Minimal reproducer: parser-time arch fallback reads non-A5 and mis-verifies pto.tsel(i8) as A2/A3.
//
// Repro command:
//   build/tools/ptoas/ptoas --pto-arch=a5 test/repro/issue_tls_arch_tsel_i8_min.pto -o -
//
// Actual (unexpected on A5):
//   error: 'pto.tsel' op expects A2/A3 tsel src0, src1, and dst element type to be i16/i32/f16/f32
//
// Control:
//   Add module attr: module attributes {"pto.target_arch" = "a5"} { ... }
//   Then the above verifier error disappears.

module {
  func.func @f0() {
    %mask = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %src0 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %src1 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %tmp  = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %dst  = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>

    pto.tsel ins(%mask, %src0, %src1, %tmp : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
             outs(%dst : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
    return
  }

  // Keep >=2 functions to trigger verifier's IsolatedFromAbove parallel branch.
  func.func @f1() {
    %mask = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %src0 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %src1 = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %tmp  = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
    %dst  = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>

    pto.tsel ins(%mask, %src0, %src1, %tmp : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=i8, rows=1, cols=64, v_row=1, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
             outs(%dst : !pto.tile_buf<loc=vec, dtype=i8, rows=2, cols=128, v_row=2, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
    return
  }
}
EOF
build/tools/ptoas/ptoas --pto-arch=a5 /tmp/issue_tls_arch_tsel_i8_min.pto -o -
