A systolic array-based GEMM implementation for AMD Versal ACAP.
- CMake 3.23+
- GNU Make
- Vitis 2023.1
- C++17
- (For solver) amplpy with Gurobi, numpy
- (For figures) matplotlib, seaborn
Files to edit in $SRC_DIR:
- parameters.hh: Set design configuration
- xsa.cfg: Comment/uncomment connectivity based on array dimensions
- gemm.hh: Comment/uncomment inlining for AIE simulation profiling
- CMakeLists.txt: Set desired frequency
mkdir build && cd build
cmake .. [-DVPP_JOBS=<n>] [-DVPP_OPTIMIZE=0..3] [-DXILINX_TARGET=hw|hw_emu|sw_emu] # cmake/xilinx-setup.cmake
make -j [VERBOSE=1] gemm
[XCL_EMULATION_MODE=sw_emu|hw_emu] ./bin/gemm ./<xclbin> [DEV_IDX]
To generate simulation data, run the following script:
cd $SRC_DIR
./generate_gemm_data.py [--help] -d <PM,PK,PN> -t <AM,AK,AN> -a <R,C>
Currently, this script only supports a single PL tile, so set (M, K, N) = (PL_M, PL_K, PL_N) in parameters.hh.
DEF_PARTS has no effect on simulation behavior, but its value must still satisfy the static asserts on the systolic array dimensions.
xsa.cfg does not need to be updated for simulation.
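As a rough illustration of the data generation step above, the sketch below parses a comma-separated dimension argument and builds a small GEMM test case with a golden result. It is not the implementation of generate_gemm_data.py: the helper names (parse_dims, reference_gemm, make_case), the integer value range, and the in-memory list layout are all assumptions made for this example.

```python
import random


def parse_dims(text, count):
    """Parse a comma-separated string such as "64,32,16" into `count` positive ints."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != count or any(p <= 0 for p in parts):
        raise ValueError(f"expected {count} positive integers, got {text!r}")
    return tuple(parts)


def reference_gemm(a, b):
    """Naive C = A x B on lists of lists; A is M x K, B is K x N."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def make_case(m, k, n, seed=0):
    """Build random integer inputs A, B and the golden output C (hypothetical format)."""
    rng = random.Random(seed)  # fixed seed keeps the generated data reproducible
    a = [[rng.randint(-8, 8) for _ in range(k)] for _ in range(m)]
    b = [[rng.randint(-8, 8) for _ in range(n)] for _ in range(k)]
    return a, b, reference_gemm(a, b)
```

For instance, `make_case(*parse_dims("8,4,2", 3))` returns an 8x4 input A, a 4x2 input B, and the 8x2 golden product that simulation output could be compared against.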
To run the simulation:
cd $BUILD_DIR
make -j [VERBOSE=1] gemm-x86sim|gemm-aiesim
./scripts/model.py ./scripts/model.mod [--help] -s <M,K,N>
./scripts/parse_profile.py [--help] -i <profile_instr> [-f <func>] [--no-stalls]
./scripts/monitor_power.py [--help] -d <bdf>
./scripts/heatmap.py [--help] -d <aiesimulator_output,...> -f <func,...> [-o heatmap.pdf]
./scripts/plot_bar.py
./scripts/plot_misc.py
./scripts/plot_scaling.py
Instructions for reproducing Tables III-VII and Figures 6-9 are available in REPRODUCING.md.