
Prepack all k blocks of Matrices A and B #20

Open
shalinib-ibm wants to merge 1 commit into master from q4_gemm_prepack_kblocks

Conversation

@shalinib-ibm
Owner

Inside the 8x8 kernel, isolate the packing from the MMA computation.

Little performance difference: 4.3 t/s before, 4.1 t/s after (llama-bench, Q4 model, -p 128 -n 1 -t 1).
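
For readers unfamiliar with the idea, here is a minimal sketch of what separating packing from computation can look like for one 8x8 output tile. It is not the code in this PR: it uses plain `float` data instead of Q4 blocks, scalar loops stand in for the MMA instructions, and every name (`pack_a`, `pack_b`, `compute_tile`, `gemm_prepacked`, `TILE`, `KB`) and the packed layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes for illustration only; the real kernel works on Q4 blocks.
constexpr std::size_t TILE = 8;  // 8x8 output tile
constexpr std::size_t KB   = 4;  // elements per k block (assumed)

// Phase 1a: pack ALL k blocks of an 8-row strip of A (row-major, M x K)
// into one contiguous buffer laid out as [k block][row][k within block].
std::vector<float> pack_a(const float* A, std::size_t K, std::size_t row0) {
    assert(K % KB == 0);
    std::vector<float> packed(TILE * K);
    std::size_t out = 0;
    for (std::size_t kb = 0; kb < K; kb += KB)
        for (std::size_t r = 0; r < TILE; ++r)
            for (std::size_t k = 0; k < KB; ++k)
                packed[out++] = A[(row0 + r) * K + kb + k];
    return packed;
}

// Phase 1b: pack ALL k blocks of an 8-column strip of B (row-major, K x N)
// laid out as [k block][k within block][column].
std::vector<float> pack_b(const float* B, std::size_t K, std::size_t N,
                          std::size_t col0) {
    assert(K % KB == 0);
    std::vector<float> packed(TILE * K);
    std::size_t out = 0;
    for (std::size_t kb = 0; kb < K; kb += KB)
        for (std::size_t k = 0; k < KB; ++k)
            for (std::size_t c = 0; c < TILE; ++c)
                packed[out++] = B[(kb + k) * N + col0 + c];
    return packed;
}

// Phase 2: compute one 8x8 tile from the packed buffers only.
// The inner scalar loops are a stand-in for MMA accumulate instructions.
void compute_tile(const std::vector<float>& pa, const std::vector<float>& pb,
                  std::size_t K, float* C, std::size_t N,
                  std::size_t row0, std::size_t col0) {
    float acc[TILE][TILE] = {};
    for (std::size_t blk = 0; blk < K / KB; ++blk) {
        const float* a = pa.data() + blk * TILE * KB;
        const float* b = pb.data() + blk * KB * TILE;
        for (std::size_t r = 0; r < TILE; ++r)
            for (std::size_t k = 0; k < KB; ++k)
                for (std::size_t c = 0; c < TILE; ++c)
                    acc[r][c] += a[r * KB + k] * b[k * TILE + c];
    }
    for (std::size_t r = 0; r < TILE; ++r)
        for (std::size_t c = 0; c < TILE; ++c)
            C[(row0 + r) * N + col0 + c] = acc[r][c];
}

// Driver: packing and computation are separate steps for every tile.
void gemm_prepacked(const float* A, const float* B, float* C,
                    std::size_t M, std::size_t N, std::size_t K) {
    assert(M % TILE == 0 && N % TILE == 0);
    for (std::size_t row0 = 0; row0 < M; row0 += TILE) {
        const std::vector<float> pa = pack_a(A, K, row0);
        for (std::size_t col0 = 0; col0 < N; col0 += TILE) {
            const std::vector<float> pb = pack_b(B, K, N, col0);
            compute_tile(pa, pb, K, C, N, row0, col0);
        }
    }
}
```

In this sketch the compute phase never touches the original matrices, so the packing cost and the compute cost can be measured or tuned separately.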


Inside the 8x8 kernel, isolate the packing from the MMA computation.

Little performance difference: 4.3 t/s before, 4.1 t/s after
(llama-bench, Q4 model, -p 128 -n 1 -t 1).

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
github-actions bot added the ggml label on Oct 8, 2025
shalinib-ibm pushed a commit that referenced this pull request Mar 20, 2026
…better shader parameter handling (ggml-org#20173)

* K quant speedup (#20)

* Basic JIT compilation for mul_mat, get_rows, and scale (#17)

* scale jit working

* preliminary working jit for getrows and mulmat, needs refining

* simplified mul_mat preprocessing switch statement

* get_rows fixes, mul_mat refinement

* formatted + last edits

* removed some extraneous prints

* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish

* small fix

* some changes, working

* get_rows and mul_mat jit fixed and working

* Update formatting

* formatting

* Add header

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on all-encompassing shader library

* refactor argmax, set_rows

* Refactor all but flashattention, mat mul

* no gibberish, all k quants added, merged

* vec memory fix

* q6_k matching metal on my machine, tests passing

* Set tile size for q6_k separately

* Separate out fast shaders

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

* Move towards writeBuffer for params

* Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups

* Remove extra file

* Formatting

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>