Test for intercepting ptxas invocations and capturing PTX inputs across multiple CUDA frameworks.
git clone https://github.com/dsl-learn/cuda-magic.git
cd cuda-magic
pip install cuda-tile[tileiras]
# pip install cuda-tile
pip install nvidia-cutlass-dsl
pip install triton

For a brief overview of ptxas_wrapper.py, see the PTXAS Wrapper section in the repository root README.
In this test directory, the wrapper is used in two ways:
- triton and cutedsl run the target Python script with framework-specific dump settings and collect the generated .ptx files.
- install swaps the discovered ptxas with a shim, which is useful for flows that call ptxas directly, such as nvcc.
Captured PTX files are written to ./ptx_dumps/ by default, or to PTX_DUMP_DIR when that environment variable is set.
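The dump-directory rule above can be sketched in a few lines of Python. This is an illustration only, not code taken from ptxas_wrapper.py: the helper names resolve_dump_dir and list_captured_ptx are hypothetical, and treating an empty PTX_DUMP_DIR as unset and looking for .ptx files directly inside the directory are both assumptions.

```python
import os
from pathlib import Path


def resolve_dump_dir(env=None):
    """Return PTX_DUMP_DIR when set and non-empty, else ./ptx_dumps.

    Treating an empty value as unset is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return Path(env.get("PTX_DUMP_DIR") or "./ptx_dumps")


def list_captured_ptx(dump_dir):
    """Return the .ptx files directly inside dump_dir, sorted by path.

    Returns an empty list when the directory does not exist yet.
    """
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []
    return sorted(dump_dir.glob("*.ptx"))


if __name__ == "__main__":
    # Print whatever has been captured so far in the resolved directory.
    for path in list_captured_ptx(resolve_dump_dir()):
        print(path)
```

Running this after one of the commands below should be a quick way to check that a capture actually produced files.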
For nvcc-based flows, install requires the ptxas path explicitly (e.g. /usr/local/cuda/bin/ptxas) since nvcc may use a different ptxas than the one auto-detected.
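To make the shim idea concrete, here is a minimal sketch of what a ptxas stand-in could do: copy every .ptx argument into the dump directory, then hand the unchanged command line to the real assembler. It is not the implementation in ptxas_wrapper.py. The REAL_PTXAS path, the function names, and the assumption that the PTX input arrives as a plain file argument are all hypothetical.

```python
import os
import shutil
from pathlib import Path

# Hypothetical location of the original binary after the shim took its place.
REAL_PTXAS = "/usr/local/cuda/bin/ptxas.real"


def capture_ptx_inputs(argv, dump_dir):
    """Copy each existing *.ptx argument into dump_dir and return the copies."""
    dump_dir = Path(dump_dir)
    dump_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for arg in argv:
        src = Path(arg)
        # Options such as -arch=sm_75 and outputs such as x.cubin are skipped.
        if src.suffix == ".ptx" and src.is_file():
            dst = dump_dir / src.name
            shutil.copy2(src, dst)
            copies.append(dst)
    return copies


def main(args):
    """Capture PTX inputs, then replace this process with the real ptxas."""
    capture_ptx_inputs(args, os.environ.get("PTX_DUMP_DIR") or "./ptx_dumps")
    # execv never returns on success, so the caller (for example nvcc)
    # sees the real tool's exit status and output.
    os.execv(REAL_PTXAS, [REAL_PTXAS, *args])


# A real shim script would end with: main(sys.argv[1:]) after importing sys.
```

Copying by file name means two inputs with the same name overwrite each other. A real wrapper may well disambiguate them, for instance with a counter or a content hash.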
cuda.tile
CUDA_TILE_CACHE_DIR=0 python3 test/ptx_warpper/vec_add_cutile.py

CuteDSL (cutlass DSL)
python3 ptxas_wrapper.py cutedsl test/ptx_warpper/vec_add_cutedsl.py

Triton
python3 ptxas_wrapper.py triton test/ptx_warpper/vec_add_triton.py

CUDA C++ (via nvcc + ptxas wrapper)
sudo python3 ptxas_wrapper.py install
nvcc test/ptx_warpper/vec_add_cuda.cu -o /tmp/vec_add_cuda -arch=sm_75
sudo python3 ptxas_wrapper.py uninstall

CUTLASS C++
Requires the CUTLASS source tree:
git clone https://github.com/NVIDIA/cutlass.git
sudo python3 ptxas_wrapper.py install
nvcc test/ptx_warpper/vec_add_cutlass.cu -o /tmp/vec_add_cutlass \
-I./cutlass/include \
-I./cutlass/tools/util/include
sudo python3 ptxas_wrapper.py uninstall