Q8 Matmul MMA+ Kernels Implementation #23

Open
shalinib-ibm wants to merge 5 commits into master from p12_mma_plus

@shalinib-ibm (Owner) commented Apr 22, 2026

Overview

This branch contains matmul kernels for P12 MMA+ for the Q8 data type.
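For context, here is a minimal C sketch of what "Q8" means at the block level. It assumes the kernels consume ggml's Q8_0 format, where each block holds 32 signed 8-bit quants that share one scale. The struct and function names are illustrative and do not come from the PR. ggml stores the scale as fp16, but this sketch uses a plain float. The point is that the inner sum is pure INT8 x INT8 with INT32 accumulation, which is the operation an INT8 MMA unit accelerates. The scales are applied once per block afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Same block size as ggml's QK8_0 (32 quants per block). */
#define QK8_0 32

/* Illustrative stand-in for ggml's block_q8_0; the real struct stores
 * the scale as fp16 (ggml_half), a float is used here for simplicity. */
typedef struct {
    float  d;            /* per-block scale */
    int8_t qs[QK8_0];    /* signed 8-bit quantized values */
} block_q8_0_sketch;

/* Dot product of two Q8_0-style blocks: integer multiply-accumulate
 * first (the INT8 MMA-friendly part), then one float scaling step. */
static float dot_q8_0_block(const block_q8_0_sketch *a,
                            const block_q8_0_sketch *b) {
    assert(a && b);
    int32_t isum = 0;
    for (int i = 0; i < QK8_0; ++i) {
        isum += (int32_t)a->qs[i] * (int32_t)b->qs[i];
    }
    return a->d * b->d * (float)isum;
}
```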

Additional information

The following kernel sizes have been implemented and tested for functional correctness:
8x4
8x8
8x16
16x8
16x16
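A scalar reference is useful for checking the kernels listed above. The sketch below computes one RM x RN output tile of an INT8 matmul with INT32 accumulation, for example 8x4 or 16x16. It assumes the first number in each kernel size is the row count. The memory layouts are also assumptions and are not taken from the PR: A is M x K row-major, B is stored as N x K so both operands are read along K, and C is M x N row-major. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference for one rm x rn tile of C = A * B^T starting at (i0, j0).
 * Each |a*b| <= 16384, so the int32 sum cannot overflow while k < 131072. */
static void ref_tile_i8(const int8_t *A, size_t lda,
                        const int8_t *B, size_t ldb,
                        int32_t *C, size_t ldc,
                        size_t i0, size_t j0,
                        size_t rm, size_t rn, size_t k) {
    assert(A && B && C);
    for (size_t i = i0; i < i0 + rm; ++i) {
        for (size_t j = j0; j < j0 + rn; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)A[i * lda + p] * (int32_t)B[j * ldb + p];
            C[i * ldc + j] = acc;
        }
    }
}

/* Full M x N result covered by rm x rn tiles (e.g. rm = 16, rn = 8).
 * For simplicity, m and n must be multiples of the tile size. */
static void ref_matmul_tiled(const int8_t *A, const int8_t *B, int32_t *C,
                             size_t m, size_t n, size_t k,
                             size_t rm, size_t rn) {
    assert(rm > 0 && rn > 0 && m % rm == 0 && n % rn == 0);
    for (size_t i0 = 0; i0 < m; i0 += rm)
        for (size_t j0 = 0; j0 < n; j0 += rn)
            ref_tile_i8(A, k, B, k, C, n, i0, j0, rm, rn, k);
}
```

An MMA+ kernel's output for the same inputs can then be compared element by element against this reference.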

Steps to run and verify the kernels

Building llama.cpp with the MMA+ GCC compiler

GCC needs to be built from the ibm/mmaplus branch, and llama.cpp needs to be built with -O0 (building with -O3 using the MMA+ compiler gives functionally incorrect matrix multiplication results).

export PATH=/home/shalini/gcc-mma+/gcc/install/mmaplus-ppc64le/bin:$PATH
cd /home/shalini/llama_5_3_26/llama.cpp/
cmake -B build-gcc-mma+ -DCMAKE_C_COMPILER=/home/shalini/gcc-mma+/gcc/install/mmaplus-ppc64le/bin/gcc -DCMAKE_CXX_COMPILER=/home/shalini/gcc-mma+/gcc/install/mmaplus-ppc64le/bin/g++ -DCMAKE_C_FLAGS="-O0" -DCMAKE_CXX_FLAGS="-O0"
vim ggml/src/ggml-cpu/CMakeLists.txt -> change -mcpu=power10 to -mcpu=future
export LD_LIBRARY_PATH=/home/shalini/gcc-mma+/gcc/install/mmaplus-ppc64le/bin/:$LD_LIBRARY_PATH
cmake --build build-gcc-mma+/ -j10

Follow the above steps on the trout-lp1 machine, then copy the libraries to the bohr machine and run the binary from a Tcl script to simulate it in the Mambo P12 environment.
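GCC uses the last -O option it sees, so a later flag overrides an earlier -O0. CMake can add such a flag through the default per-configuration flags for a build type like Release (-O3 -DNDEBUG with GCC). It may therefore be worth confirming that the test binary really was built at -O0. The minimal sketch below uses the `__OPTIMIZE__` macro, which GCC predefines in optimizing compilations and leaves undefined at -O0. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with optimization
 * enabled (-O1 or higher), 0 if it was compiled at -O0. */
static int built_with_optimization(void) {
#if defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}

/* Print the effective optimization mode, e.g. at the start of a test run. */
static void report_build_mode(FILE *out) {
    assert(out != NULL);
    fprintf(out, "optimization: %s\n",
            built_with_optimization() ? "enabled" : "disabled (-O0)");
}
```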

root and others added 3 commits April 22, 2026 07:46
add -mcpu=future under P9 condition in CMakeLists.txt
Add 16x16 kernel with P12 mma+ builtins.
Modified 8x8 kernel to use P12 mma+ builtins.
Modified 8x4 kernel to use P12 mma+ builtins

TODO:
Create a separate class for Q8, as P12 MMA+ supports only
INT8 and not INT4.

Signed-off-by: root <root@trout-lp1.rch.stglabs.ibm.com>
Also, print input matrices for 8x4 kernel.
Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
8x8
8x16
16x8
16x16

Debug: Print accumulator content
Testcase: Make m n k cli arguments

Signed-off-by: root <root@trout-lp1.rch.stglabs.ibm.com>
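The debug and testcase commits above print accumulator contents and take m n k on the command line. Helpers for that could look roughly like the sketch below. All three function names are hypothetical. The exact-comparison helper relies on the fact that INT8 x INT8 products accumulated in INT32 involve no rounding, so any difference from a scalar reference is a real error and not floating-point noise. This holds as long as K stays below 131072, so the INT32 sum cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "m n k" from argv[1..3] as positive integers.
 * Returns 0 on success, -1 on a missing or malformed argument. */
static int parse_mnk(int argc, char **argv, long *m, long *n, long *k) {
    long *out[3] = { m, n, k };
    if (argc != 4) return -1;
    for (int i = 0; i < 3; ++i) {
        char *end = NULL;
        errno = 0;
        long v = strtol(argv[i + 1], &end, 10);
        if (errno != 0 || end == argv[i + 1] || *end != '\0' || v <= 0)
            return -1;
        *out[i] = v;
    }
    return 0;
}

/* Print a rows x cols int32 matrix (e.g. accumulator contents) row by row. */
static void print_i32(FILE *f, const char *name, const int32_t *x,
                      long rows, long cols) {
    assert(f && name && x);
    fprintf(f, "%s (%ld x %ld):\n", name, rows, cols);
    for (long i = 0; i < rows; ++i) {
        for (long j = 0; j < cols; ++j)
            fprintf(f, " %6" PRId32, x[i * cols + j]);
        fputc('\n', f);
    }
}

/* Exact comparison against a reference result.
 * Returns the index of the first mismatch, or -1 if all elements match. */
static long first_mismatch(const int32_t *got, const int32_t *want, long count) {
    for (long i = 0; i < count; ++i)
        if (got[i] != want[i]) return i;
    return -1;
}
```

A test driver could call `parse_mnk(argc, argv, &m, &n, &k)` from `main`, fill the INT8 inputs, run the kernel and a scalar reference, and report the first differing element with `first_mismatch`.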
@shalinib-ibm shalinib-ibm changed the title Q8 Matmul MMA Kernels Implementation Q8 Matmul MMA+ Kernels Implementation Apr 29, 2026