
cuTile.jl-related crash in tileiras #17

@maleadt

I have a Tile IR snippet that crashes tileiras from CTK 13.2 at -O1 and higher, while -O0 compiles it fine. cuda-tile-translate processes it without complaint, so the IR at least looks well-formed on the surface.

Julia MWE (using cuTile.jl#main):

```julia
using CUDA
import cuTile as ct

function mwe_kernel(A::ct.TileArray{Float16, 2},
                    B::ct.TileArray{Float16, 2},
                    C::ct.TileArray{Float16, 2},
                    indices::ct.TileArray{Int32, 1},
                    TM::Int, TN::Int, TK::Int)
    bid = ct.bid(1)
    # Rows of A (and of C) are selected indirectly through `indices`.
    row_indices = ct.gather(indices, ct.arange(TM))

    acc = zeros(Float32, TM, TN)            # Float32 accumulator
    num_k = cld(size(A, 2), Int32(TK))

    # Loop over the K dimension in TK-wide tiles.
    k = Int32(1)
    while k <= num_k
        col_indices = (k - Int32(1)) * Int32(TK) .+ ct.arange(TK)
        # Gathered (TM, TK) tile of A.
        a = ct.gather(A, (reshape(row_indices, (TM, 1)),
                          reshape(col_indices, (1, TK))))
        # Regular (TK, TN) tile load of B, zero-padded out of bounds.
        b = ct.load(B; index=(k, bid), shape=(TK, TN),
                    padding_mode=ct.PaddingMode.Zero)
        acc = muladd(a, b, acc)
        k += Int32(1)
    end

    # Scatter the Float16 result back into the gathered rows of C.
    c_col_indices = (bid - Int32(1)) * Int32(TN) .+ ct.arange(TN)
    ct.scatter(C, (reshape(row_indices, (TM, 1)),
                   reshape(c_col_indices, (1, TN))),
               convert(ct.Tile{Float16}, acc))
    return nothing
end

M, K, N = 128, 512, 128
A = CUDA.rand(Float16, M, K)
B = CUDA.rand(Float16, K, N)
C = CUDA.zeros(Float16, M, N)
indices = CuArray(Int32.(1:128))

ct.launch(mwe_kernel, cld(N, 128), A, B, C, indices,
          ct.Constant(128), ct.Constant(128), ct.Constant(64))
```
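Since -O0 compiles, a plain-Julia CPU reference of what `mwe_kernel` computes may help check that build's output. This is a hypothetical sketch: `mwe_reference` is not part of cuTile.jl, and it assumes that `ct.load(B; index=(k, bid), shape=(TK, TN))` addresses 1-based tile coordinates, that the grid has `cld(N, TN)` blocks, and that K and N are multiples of TK and TN (so the kernel's zero padding never applies).

```julia
using LinearAlgebra  # for the dense Float32 matrix product

# Hypothetical CPU reference for `mwe_kernel`; not part of cuTile.jl.
# Assumptions: tile indices in `ct.load` are 1-based tile coordinates, the
# launch grid is cld(N, TN) blocks, and K and N are multiples of TK and TN
# (so the kernel's zero padding never applies).
function mwe_reference(A::AbstractMatrix{Float16}, B::AbstractMatrix{Float16},
                       indices::AbstractVector{<:Integer},
                       TM::Int, TN::Int, TK::Int)
    K, N = size(A, 2), size(B, 2)
    @assert size(B, 1) == K "inner dimensions of A and B must match"
    @assert K % TK == 0 && N % TN == 0 "sizes must be multiples of the tile sizes"
    @assert length(indices) >= TM "need at least TM row indices"

    C = zeros(Float16, size(A, 1), N)
    rows = indices[1:TM]                    # ct.gather(indices, ct.arange(TM))
    for bid in 1:cld(N, TN)                 # one iteration per launched block
        cols = (bid - 1) * TN .+ (1:TN)     # c_col_indices
        acc = zeros(Float32, TM, TN)        # Float32 accumulator
        for k in 1:cld(K, TK)
            ks = (k - 1) * TK .+ (1:TK)     # col_indices of A, rows of the B tile
            acc .+= Float32.(A[rows, ks]) * Float32.(B[ks, cols])
        end
        C[rows, cols] .= Float16.(acc)      # ct.scatter into C
    end
    return C
end
```

After running the -O0 build, something like `isapprox(Float32.(Array(C)), Float32.(mwe_reference(Array(A), Array(B), Array(indices), 128, 128, 64)); rtol=1e-2)` should hold if the kernel is correct; the loose tolerance allows for Float16 rounding and a different summation order on the GPU.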

Is there anything invalid in my IR? Compared with the IR that cuTile Python generates, ours has two token iter_values that are joined outside of the loop, whereas Python's has none. However, changing our codegen to avoid that does not work around the issue.

The full IR is attached: mwe.zip
It fails as follows:

```
❯ cuda-tile-translate --cudatilebc-to-mlir mwe.tile
cuda_tile.module @kernels {
    # works
}

❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O1
error: failed to compile Tile IR program

❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O0
# works
```

For future reference: is there a way to debug this on my end, or to better validate the IR I generate?
