Closed
832 commits
cf6e65b
ggml : bump version to 0.11.1 (ggml/1484)
ggerganov May 10, 2026
4730e76
sync : ggml
ggerganov May 10, 2026
54ecc9d
talk-llama : sync llama.cpp
ggerganov May 10, 2026
f6f32a7
try to fix window cublas CI failure
danbev May 11, 2026
1665885
Revert "try to fix window cublas CI failure"
danbev May 11, 2026
e0bfd3a
try using CCCL 12.4.127 with cuda 11.8.0 to fix CI failure
danbev May 11, 2026
5b2d4af
Revert "try using CCCL 12.4.127 with cuda 11.8.0 to fix CI failure"
danbev May 11, 2026
633de7f
devops : add spirv-headers to vulkan dockerfile
danbev May 12, 2026
b1ebddf
ggml-cuda : add explicit casts to -INFINITY for float and half2 types
danbev May 12, 2026
b6a4b32
ggml-cuda : add ar_add() to avoid ambiguous operator+ for half/bfloat…
danbev May 12, 2026
d04a1fa
ci : update ONEAPI version to 2025.3.3-0-devel-ubuntu24.04
danbev May 12, 2026
ea29be5
squash! ci : update ONEAPI version to 2025.3.3-0-devel-ubuntu24.04
danbev May 12, 2026
db7bcdb
Revert "ggml-cuda : add ar_add() to avoid ambiguous operator+ for hal…
danbev May 14, 2026
5a24c75
Revert "ggml-cuda : add explicit casts to -INFINITY for float and hal…
danbev May 14, 2026
dd70679
ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)
robUx4 May 10, 2026
5f08683
metal : tighten input-position loop in kernel_conv_transpose_1d (ggml…
CrispStrobe May 10, 2026
73f63f5
ggml-virtgpu : include missing mutex header (llama/22810)
olliewalsh May 10, 2026
4db2f45
Add OP im2col_3d (llama/22903)
arthw May 11, 2026
0077a6d
CUDA: directly include cuda/iterator (llama/22936)
ORippler May 11, 2026
c0c1f99
vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (llama/22589)
jeffbolznv May 11, 2026
449b33f
Ggml/cuda snake fusion hardening (llama/22912)
ServeurpersoCom May 11, 2026
287f637
CUDA: handle OW > 65535 in im2col (2D and 3D) (llama/22944)
CrispStrobe May 11, 2026
ea4652c
opencl: add q4_1 MoE for Adreno (llama/22856)
shawngu-quic May 11, 2026
8ec91c9
metal : promote mul_mv/mul_mm batch divisors to function constants (l…
guyfischman May 12, 2026
20895ab
vulkan: Check shared memory size for mmq shaders (llama/22693)
jeffbolznv May 12, 2026
be5a35c
vulkan: Fix Windows performance regression on Intel GPU BF16 workload…
rillomas May 12, 2026
a9bcbf5
ggml-webgpu: address precision issues for multimodal (llama/22808)
Constannnnnt May 12, 2026
e8a7cd3
ggml-webgpu: Enables running gpt-oss-20b (llama/22906)
yomaytk May 12, 2026
1caed1d
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (llama/22755)
happyyzy May 12, 2026
bcaf449
hexagon: eliminate scalar VTCM loads via HVX splat helpers (llama/22993)
trivikram-reddy1 May 13, 2026
8b288f5
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes …
z-sachin May 13, 2026
cb7d38b
hexagon: add unary tanh op (llama/22999)
max-krasnyansky May 13, 2026
1cbbd0b
flush the gpu profile timestamp before the queryset is overflowed (ll…
yomaytk May 13, 2026
b19beb6
opencl: fix crash when warming up MoE on Adreno (llama/22876)
lhez May 13, 2026
d4a4d87
opencl: add q5_0 and q5_1 MoE for Adreno (llama/22985)
shaofeiqi May 13, 2026
97371e9
Fix for issue #22974. Cast intermediate results to float before addin…
scutler-nv May 13, 2026
e4ce42e
ggml-webgpu: only use subgroup-matrix path when head dims are divisib…
ArberSephirotheca May 13, 2026
69500f5
sync : ggml
ggerganov May 14, 2026
46ca43d
talk-llama : sync llama.cpp
ggerganov May 14, 2026
968eebe
server: add support for carry_initial_prompt (#3781)
alubbe May 15, 2026
6227a0e
server : Return speaker information in JSON (#3782)
alubbe May 18, 2026
47b9eb3
examples : fix memory leak in read_audio_data (#3810)
petterreinholdtsen May 18, 2026
afa2ea5
whisper : set bench data for each iteration (#3812)
danbev May 19, 2026
8443cf0
ci : use github ubuntu-22.04-arm runner instead of qemu (#3815)
danbev May 21, 2026
0ccd896
common : fix server /inference fails to decode in-memory audio (regre…
ServeurpersoCom May 22, 2026
b3877e1
fix: in bindings/ruby/test/jfk_reader/jfk_reader in jfk_reader.c (#3756)
orbisai0security May 25, 2026
e414ecf
cmake : add CMakePresets.json [no ci] (#3808)
danbev May 25, 2026
06cfc36
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocat…
PMZFX May 14, 2026
97ba443
vulkan: fix matmul integer pipeline selection (llama/23005)
0cc4m May 14, 2026
f022390
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (llam…
alex-spacemit May 14, 2026
592a8cd
logs : reduce (llama/23021)
ggerganov May 14, 2026
13133ab
ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040)
ArberSephirotheca May 14, 2026
e62d589
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (llama/22880)
JohannesGaessler May 14, 2026
18a61f4
ggml-hexagon: cpy: add contiguous fast-path in reshape copy (llama/23…
pdhinaka May 14, 2026
23f956d
llama + spec: MTP Support (llama/22673)
am17an May 16, 2026
587dca0
ggml : bump version to 0.12.0 (ggml/1494)
ggerganov May 16, 2026
3583e35
ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (…
Dev-X25874 May 21, 2026
e78e693
ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)
OriPekelman May 21, 2026
ef5ddec
vulkan: removed duplicate #include <memory> in headers (llama/23144)
winstonma May 16, 2026
c7dd64c
vulkan: fuse SSM_CONV + BIAS + SILU (llama/22653)
jeffbolznv May 17, 2026
e417ce7
vulkan: Support unaligned tensors for ROPE (llama/22637)
jeffbolznv May 17, 2026
50482cb
vulkan: add cpy bf16 -> f32 pipelines (llama/22677)
ServeurpersoCom May 17, 2026
9e96e0e
ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (llama/22009)
jeeb May 17, 2026
53736a3
CUDA: Continue directly including cuda/iterator (llama/23102)
ORippler May 17, 2026
4fb3cca
feat: Support d_conv=15 for ssm-conv.cu (llama/23017)
gabe-l-hart May 17, 2026
619262a
sycl: route small f32 matmuls to oneMKL, bypass oneDNN (llama/22150)
aicss-genai May 18, 2026
c65b082
sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (llama/22156)
aicss-genai May 18, 2026
0a11c9f
ggml-hexagon: add PAD op HVX kernel (llama/23078)
pdhinaka May 18, 2026
eb558f2
hexagon: add support for TRI op (llama/22822)
pdhinaka May 18, 2026
3477fdb
rpc : keep last_graph_uid in the device context (llama/23273)
rgerganov May 19, 2026
28edd0c
sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (llama/22153)
aicss-genai May 19, 2026
6090f39
ggml-webgpu : extend GDN for K>1 (llama/23299)
reeselevine May 19, 2026
459ff07
hexagon: enable support for NORM op (llama/23319)
aparmp-quic May 19, 2026
aca63e7
hexagon: add MROPE and IMROPE support in HTP rope op (llama/23317)
aparmp-quic May 19, 2026
37f1720
opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (llama/23303)
shaofeiqi May 19, 2026
0a0a342
ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (llama/23349)
ravel7524 May 20, 2026
c58fc46
metal : optimize pad + cpy (llama/23354)
ggerganov May 20, 2026
3fa1955
Programmatic Dependent Launch (PDL) for more performance on newer NVI…
aendk May 20, 2026
b93a5ba
hexagon: HMX quantized matmul rework (llama/23368)
max-krasnyansky May 20, 2026
ad717a6
vulkan: optimize operations in the IM2COL shader (llama/22685)
daniandtheweb May 20, 2026
896718e
opencl: refactor backend initilization (llama/23318)
lhez May 20, 2026
6d1d66d
hexagon: ssm-conv fix for large prompts (llama/23307)
tboinovski1 May 21, 2026
03da9f1
ggml : Check the right iface method before using the fallback 2d get …
TheBlueMatt May 21, 2026
158d93c
metal : optimize concat kernel and fix set kernel threads (llama/23411)
ggerganov May 21, 2026
c436f14
fix(flash-attn): replace f32 with kv_type and q_type (llama/23372)
Constannnnnt May 21, 2026
8402c36
vulkan: fuse snake activation (mul, sin, sqr, mul, add) (llama/22855)
ServeurpersoCom May 21, 2026
ec18355
CUDA: fix PDL CC check for JIT compilation (llama/23471)
JohannesGaessler May 21, 2026
2d62953
ggml-zendnn : add Q8_0 quantization support (llama/23414)
z-sachin May 22, 2026
6fb7f1a
SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (lla…
PMZFX May 22, 2026
0416fee
SYCL : gated_delta_net K>1 (llama/23174)
karavayev May 22, 2026
21c65a7
sycl : Level Zero detection in ggml_sycl_init (llama/23097)
sanmai May 22, 2026
b0c9f90
SYCL: improve MoE prefill throughput (llama/23142)
sanmai May 22, 2026
aefffa1
opencl: generalize Adreno MoE kernels on M (llama/23449)
shawngu-quic May 23, 2026
6b85d73
vulkan: fix windows find_package of SPIRV-Headers (llama/23215)
jeffbolznv May 23, 2026
511f860
ggml : Check the right iface method before using the fallback 2d get …
dskwe May 23, 2026
b84d034
hexagon: apply repl optimization in flash attn softmax as #22993 (lla…
njsyw1997 May 24, 2026
1435988
opencl: batch profiling to improve speed and prevent memory leaks (ll…
shaofeiqi May 24, 2026
3306af6
TP: fix entirely zero-sized slices per device (llama/23525)
JohannesGaessler May 24, 2026
a369b39
ggml : Parallelize quant LUT init (llama/23595)
jeffbolznv May 25, 2026
946d681
ggml : bump version to 0.12.1 (ggml/1508)
ggerganov May 25, 2026
0a62a57
sync : ggml
ggerganov May 25, 2026
865ec17
talk-llama : sync llama.cpp
ggerganov May 25, 2026
44a50ca
readme : add AMD ROCm/HIP GPU build instructions (#3823)
Kaihui-AMD May 25, 2026
2979e5f
ggml: `gguf_init_from_callback` and `gguf_init_from_buffer` (llama/22…
giladgd May 25, 2026
bcff515
TP: fix ggml context size calculation (llama/22616)
JohannesGaessler May 25, 2026
1cf8e3a
ggml : bump version to 0.13.0 (ggml/1510)
ggerganov May 25, 2026
f14ae77
sync : ggml
ggerganov May 25, 2026
c245b3e
benches : update
ggerganov May 25, 2026
e0fd1f6
release : v1.8.5
ggerganov May 25, 2026
27101c0
cli : merge tokens split across UTF-8 boundaries in JSON output (#3751)
texasich May 26, 2026
ee540bf
docs : add AGENTS.md and CONTRIBUTING.md [no ci] (#3826)
danbev May 27, 2026
6dcdd65
ci : only run docker jobs when pushed to master [no ci] (#3828)
danbev May 27, 2026
f6e617b
ci : set GGML_NATIVE=OFF for bindings-java (#3830)
danbev May 28, 2026
9186e24
ci : renable arm64 docker builds (#3832)
danbev May 28, 2026
f41562b
ci : add on push/pull_request paths ruby job (#3833)
danbev May 28, 2026
e47a3ee
ci : fix include paths for bindings-go job [no ci] (#3835)
danbev May 28, 2026
c932729
ci : add ignore for bindings/{ruby, go} in build.yml [no ci] (#3837)
danbev May 28, 2026
205ee5a
CUDA: add fast walsh-hadamard transform (llama/23615)
am17an May 25, 2026
1c477d4
metal : add apple device id (llama/23566)
forforever73 May 25, 2026
2307712
CUDA: missing PDL sync for FWHT, better fallback (llama/23690)
JohannesGaessler May 26, 2026
bc77933
Check batch_compute_passes before sending passes when not doing GPU p…
nikhilJain17 May 26, 2026
00a5110
ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MU…
yomaytk May 26, 2026
049f0af
SYCL: implement ggml_sycl_pool_vmm (llama/22862)
sanmai May 26, 2026
f8df28d
hexagon: add support for CONCAT op (llama/23648)
max-krasnyansky May 26, 2026
a0efd13
vulkan: optimize conv2d and implement coopmat1 support (llama/22620)
jeffbolznv May 26, 2026
6a249cd
ggml-zendnn : fixed naming of matmul function (llama/20964)
truecoder34 May 26, 2026
80e87ec
vulkan: avoid preferring transfer queue on AMD UMA devices (llama/22455)
winstonma May 27, 2026
98c6722
CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (llama/23742)
ORippler May 27, 2026
c5cde8c
vulkan: add REPEAT op support for f16 to f16. (llama/23298)
l8bloom May 27, 2026
1b590bb
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul …
jeffbolznv May 27, 2026
8bce478
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (llama/22887)
TheBlueMatt May 27, 2026
a52bd38
ggml-webgpu: Fix how to dispatch WG to some ops (llama/23750)
yomaytk May 27, 2026
3bbe933
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (llama/23647)
max-krasnyansky May 27, 2026
8c8f213
ggml-webgpu: remove legacy constants (llama/23672)
reeselevine May 27, 2026
7e843a8
opencl: OP_GATED_DELTA_NET (llama/23312)
ymcki May 28, 2026
d284e1c
Hexagon: OP_GATED_DELTA_NET K>1 support (llama/23531)
ymcki May 28, 2026
8e40325
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (llama/22841)
martin-klacer-arm May 28, 2026
60e420f
cuda : fix KQ mask offset integer overflow in fattn MMA kernel (llama…
fairydreaming May 28, 2026
5db94ba
vulkan: Fix memory logger unsafe iterator access (llama/23667)
winstonma May 28, 2026
816c302
vulkan: fix wrong index variable in inner loop (llama/23665)
winstonma May 28, 2026
b896e91
vulkan: fast path for walsh-hadamard transform (llama/23687)
jeffbolznv May 28, 2026
1b241b8
hexagon: minor refresh for HMX FA and MM (llama/23796)
max-krasnyansky May 28, 2026
04795e6
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ll…
jadenmach2 May 28, 2026
4e8af44
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for ……
yaohengxu May 28, 2026
e1faa7c
ggml: auto apply iGPU flag CUDA/HIP if integrated device (llama/23007)
fl0rianr May 28, 2026
94922ce
opencl: move backend info printing into its own function (llama/23702)
lhez May 28, 2026
442be17
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (lla…
max-krasnyansky May 28, 2026
f1b687d
meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear …
TheBlueMatt May 29, 2026
e90501e
cuda : disables launch_fattn PDL enrollment due to compiler bug (llam…
aendk May 29, 2026
cc65eb1
sync : ggml
ggerganov May 29, 2026
5828fba
talk-llama : sync llama.cpp
ggerganov May 29, 2026
92fc3f2
ggml : bump version to 0.13.1 (ggml/1523)
ggerganov May 29, 2026
f24588a
sync : ggml
ggerganov May 29, 2026
f39cc71
common : re-implement `ffmpeg-transcode.cpp` + clarify ffmpeg usage (…
ggerganov May 31, 2026
6c343e7
common : pass sample rate to `ffmpeg_decode_audio()`
ggerganov May 31, 2026
2e045a9
ci : remove obsolete self-hosted label
ggerganov May 31, 2026
099af1c
pi : add config
ggerganov May 31, 2026
fe69461
ci : fix self-hosted paths to mnt
ggerganov May 31, 2026
0dff274
ci : fix path to whisper.h in examples.yml [no ci] (#3842)
danbev Jun 1, 2026
23ee035
release : v1.8.6
ggerganov Jun 1, 2026
ef24de1
cmake : do not assume /usr/lib library installation. (#3693)
waptaff Jun 2, 2026
e5d4412
server : merge split utf-8 token text in verbose json (#3850)
lyonsno Jun 2, 2026
610e664
whisper : catch C++ exceptions in whisper_init_with_params_no_state (…
danscMax Jun 2, 2026
02d5316
ci : refactor + optimize (#3847)
ggerganov Jun 4, 2026
12d1828
ci : only publish/push docker images daily (#3854)
danbev Jun 4, 2026
9302c06
ci : use ccache instead of sccache for windows-cublas [no ci] (#3855)
danbev Jun 4, 2026
7ecb08f
ci : pin github actions to commit SHAs (#3856)
danbev Jun 4, 2026
ad17783
ci : use emscripten-core and pin version (#3857)
danbev Jun 4, 2026
99613cb
ci: build-windows action slimming (#3858)
danbev Jun 4, 2026
574fc0d
ci : add ccache to quantize, vad, and wasm jobs (#3860)
danbev Jun 6, 2026
a8ec021
ci : add HF_TOKEN to docker.yml workflow [no ci] (#3861)
danbev Jun 6, 2026
e1da83d
ci : add ccache to build-sycl [no ci] (#3859)
danbev Jun 8, 2026
c50e951
model : support for DeepseekV32ForCausalLM with generic DeepSeek Spar…
fairydreaming May 29, 2026
f7aad4e
CUDA: Check PTX version on host side to guard PDL dispatch (llama/23530)
ORippler May 29, 2026
acd91d2
ggml-webgpu: add q4_0/q8_0 SET_ROWS (llama/23760)
reeselevine May 29, 2026
9147a96
ggml-webgpu: Check earlier for WebGPU required features (llama/23879)
reeselevine May 29, 2026
4317ddb
vulkan: add Flash Attention support for BFloat16 KV cache (llama/23420)
0cc4m May 30, 2026
64b0d6b
ggml : add some lsx support (llama/23798)
MQ-mengqing May 30, 2026
bf74b55
metal : restore im2col implementation for large kernels (llama/23901)
ggerganov May 30, 2026
1c0d1f0
opencl: support bf16 by converting to f16 (llama/23839)
lhez May 30, 2026
687fbcb
sycl : Optimize Q3_K mul_mat by reorder (llama/23725)
arthw Jun 1, 2026
20323e4
Add more types in GET_ROWS OP (llama/23710)
arthw Jun 1, 2026
ec0c661
Support Q4_1, Q5_0, Q5_1 in Flash-attention (llama/23812)
arthw Jun 1, 2026
aea93ad
vulkan: Removed unused functions (llama/23175)
winstonma Jun 1, 2026
982533f
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (lla…
TheBlueMatt Jun 1, 2026
e815b26
TP: quantized KV cache support (llama/23792)
JohannesGaessler Jun 1, 2026
c471bcc
vulkan: reduce host memory lock contention (llama/23376)
winstonma Jun 1, 2026
71d80aa
vulkan: don't hold the device mutex while compiling pipelines (llama/…
jeffbolznv Jun 1, 2026
050b856
metal: template GLU kernels to support f16/f32 (llama/23882)
shrivasshankar Jun 1, 2026
e728bae
opencl: add basic support for q5_0 and q5_1 (llama/23548)
shaofeiqi Jun 1, 2026
db2a395
revert to using global_invocation_id for cpy shader (llama/23955)
yomaytk Jun 1, 2026
9a0265d
opencl: fix compiler warnings for non-adreno path (llama/23922)
lhez Jun 2, 2026
7922370
clean up unused variables warnings (llama/23975)
anavp-nvidia Jun 2, 2026
754247f
hexagon: add gelu_quick (llama/24007)
tboinovski1 Jun 2, 2026
8d61a9e
hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimiza…
max-krasnyansky Jun 2, 2026
d31cb20
hexagon: profiler output fix and script updates (llama/24042)
max-krasnyansky Jun 2, 2026
f110ff5
opencl: use flat variants of q4_K and q6_K gemv for very large M (lla…
lhez Jun 2, 2026
d5a49eb
cuda: reserve space for quantize kv-cache at startup (llama/23907)
am17an Jun 3, 2026
750fa4c
ggml-cpu: use runtime SVE width in FWHT (llama/24059)
chaxu01 Jun 3, 2026
00a9728
Avoid PDL race conditions by disabling __restrict__ when PDL is used …
aendk Jun 3, 2026
a1a3186
ggml-cpu: extend RVV quantization vec dot to higher VLENs (llama/22754)
rehan-10xengineer Jun 4, 2026
e9dbd0c
ggml-webgpu: FlashAttention refactor + standardize quantization suppo…
reeselevine Jun 4, 2026
9d6e561
metal : reduce rset heartbeat from 500ms -> 5ms (llama/24074)
ggerganov Jun 4, 2026
991b5a8
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)
sirohikartik Jun 4, 2026
4ecede8
sycl : port multi-column MMVQ from CUDA backend (llama/21845)
masonmilby Jun 5, 2026
4fa1e06
CUDA: enroll mul_mat_vec_q_moe into pdl (llama/24087)
ORippler Jun 5, 2026
facb02c
kleidiai : dynamic chunck-based scheduling for hybrid execution (llam…
chaxu01 Jun 5, 2026
5a1feed
vulkan: add fwht support for Intel with shmem reduction (llama/23964)
0cc4m Jun 5, 2026
a87e950
opencl: improve get_rows, cpy, concat and q6_k flat gemv (llama/24160)
lhez Jun 5, 2026
1777def
vulkan: check coopmat2 features before reporting support (llama/24186)
0cc4m Jun 6, 2026
2c139c2
metal : fix im2col 1D case (audio models) (llama/24220)
ngxson Jun 8, 2026
4669631
HIP: add gfx1152 and gfx1153 to RDNA3.5 (llama/24129)
harkgill-amd Jun 8, 2026
b932ec5
sync : ggml
ggerganov Jun 8, 2026
b31466b
ggml : bump version to 0.14.0 (ggml/1533)
ggerganov Jun 8, 2026
4df9a57
sync : ggml
ggerganov Jun 8, 2026
84bd03a
talk-llama : sync llama.cpp
ggerganov Jun 8, 2026
ba57392
coreml : fix --quantize crash for mlprogram format; fix --optimize-an…
krystophny Jun 9, 2026
df7638d
ci : pin github actions to commit sha's (#3865)
danbev Jun 9, 2026
782f122
cuda: reset cuda context after reading memory size (llama/23935)
0cc4m Jun 8, 2026
fbf720d
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (llama/23…
jeffbolznv Jun 8, 2026
490e500
Implement 2D workgroups for scale, binary, and unary ops (llama/24044)
nikhilJain17 Jun 8, 2026
15e5d40
Handle buffer overlap / buffer aliasing for concat operator (llama/24…
nikhilJain17 Jun 8, 2026
aa42b48
ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul fo…
yomaytk Jun 8, 2026
e69e513
ggml-webgpu: Add clang-format job (llama/24308)
reeselevine Jun 9, 2026
72894aa
Remove case for GGML_TYPE_Q4_K in mvvq.cu (llama/23528)
ravel7524 Jun 9, 2026
2d68a30
ggml-cpu : fix rms_norm_back wrong output under in-place aliasing (ll…
devYRPauli Jun 9, 2026
28c7ed3
ggml : add GGML_OP_COL2IM_1D (llama/24206)
ServeurpersoCom Jun 9, 2026
686bc80
vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication …
0cc4m Jun 9, 2026
dc79430
vulkan: reduce iq1 shared memory usage for mul_mm (llama/24287)
jeffbolznv Jun 9, 2026
ef85b26
CUDA: Fix ssm_scan_f32 data-races (llama/24360)
ORippler Jun 10, 2026
1a1900f
Remove padding and multiple D2D copies for MTP (llama/24086)
gaugarg-nv Jun 10, 2026
a512e4c
vulkan: use medium matmul tile on Asahi Linux (llama/24306)
xingjianll Jun 11, 2026
6870cfd
vulkan: add fast path for contiguous buffer transfers (llama/23973)
winstonma Jun 11, 2026
b04008f
ggml : bump version to 0.15.0 (ggml/1539)
ggerganov Jun 11, 2026
afd5592
vulkan: ifdef eMesaHoneykrisp (build fix) (llama/24479)
jeffbolznv Jun 11, 2026
2dcfd49
opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (llama/24319)
shaofeiqi Jun 12, 2026
882736f
ggml: support concat for scalar types at cuda backend (llama/24011)
zihaomu Jun 12, 2026
f35f47b
ggml : bump version to 0.15.1 (ggml/1541)
ggerganov Jun 12, 2026
0a3fa9c
sync : ggml
ggerganov Jun 15, 2026
0ec0845
talk-llama : sync llama.cpp
ggerganov Jun 15, 2026
db5a84b
cli : add --version flag (#3878)
rumitvn Jun 16, 2026
48f628a
release : v1.8.7 (#3881)
danbev Jun 16, 2026
3805e60
ci : only trigger release jobs for tags (#3883)
danbev Jun 16, 2026
9efddaf
parakeet : add support for NVIDIA Parakeet (#3735)
danbev Jun 16, 2026
0d14756
ruby : add support for Parakeet (#3885)
KitaitiMakoto Jun 17, 2026
86c40c3
release : v1.9.0 (#3886)
danbev Jun 17, 2026
200b119
ci : add GGML_NATIVE=OFF and GGML_BMI2=OFF to windows-blas (#3891)
danbev Jun 18, 2026
f049fff
release : v1.9.1 (#3892)
danbev Jun 19, 2026
f9b9eff
Merge tag 'v1.9.1' into sync-upstream-v1.9.1
xAlcahest Jun 23, 2026
0fe8393
ci: build CUDA for Blackwell sm_120 (RTX 50) on CUDA 12.9
xAlcahest Jun 24, 2026
2 changes: 1 addition & 1 deletion .devops/main-cuda.Dockerfile
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ ENV LD_LIBRARY_PATH /usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH

 COPY .. .
 # Enable cuBLAS
-RUN make base.en CMAKE_ARGS="-DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES='75;80;86;90'"
+RUN --mount=type=secret,id=HF_TOKEN,required=false,env=HF_TOKEN make base.en CMAKE_ARGS="-DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES='75;80;86;90'"

 RUN find /app/build -name "*.o" -delete && \
     find /app/build -name "*.a" -delete && \
9 changes: 5 additions & 4 deletions .devops/main-intel.Dockerfile
@@ -1,6 +1,6 @@
-ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04

-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build
 WORKDIR /app

 RUN apt-get update && \
@@ -10,13 +10,14 @@ RUN apt-get update && \
 COPY .. .
 # Enable SYCL
 ARG GGML_SYCL_F16=OFF
-RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
+RUN --mount=type=secret,id=HF_TOKEN,required=false,env=HF_TOKEN \
+    if [ "${GGML_SYCL_F16}" = "ON" ]; then \
         echo "GGML_SYCL_F16 is set" \
         && export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
     make base.en CMAKE_ARGS="-DGGML_SYCL=1 -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16}"

-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS runtime
 WORKDIR /app

 RUN apt-get update && \
2 changes: 1 addition & 1 deletion .devops/main-musa.Dockerfile
@@ -16,7 +16,7 @@ RUN apt-get update && \

 COPY .. .
 # Enable muBLAS
-RUN make base.en CMAKE_ARGS="-DGGML_MUSA=1"
+RUN --mount=type=secret,id=HF_TOKEN,required=false,env=HF_TOKEN make base.en CMAKE_ARGS="-DGGML_MUSA=1"

 RUN find /app/build -name "*.o" -delete && \
     find /app/build -name "*.a" -delete && \
20 changes: 20 additions & 0 deletions .devops/main-vulkan.Dockerfile
@@ -0,0 +1,20 @@
+FROM ubuntu:24.04 AS build
+WORKDIR /app
+
+RUN apt-get update && \
+    apt-get install -y build-essential wget cmake git libvulkan-dev spirv-headers glslc \
+    && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
+
+COPY .. .
+RUN --mount=type=secret,id=HF_TOKEN,required=false,env=HF_TOKEN make base.en CMAKE_ARGS="-DGGML_VULKAN=1"
+
+FROM ubuntu:24.04 AS runtime
+WORKDIR /app
+
+RUN apt-get update && \
+    apt-get install -y curl ffmpeg libsdl2-dev wget cmake git libvulkan1 mesa-vulkan-drivers \
+    && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
+
+COPY --from=build /app /app
+ENV PATH=/app/build/bin:$PATH
+ENTRYPOINT [ "bash", "-c" ]
2 changes: 1 addition & 1 deletion .devops/main.Dockerfile
@@ -6,7 +6,7 @@ RUN apt-get update && \
     && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

 COPY .. .
-RUN make base.en
+RUN --mount=type=secret,id=HF_TOKEN,required=false,env=HF_TOKEN make base.en

 FROM ubuntu:22.04 AS runtime
 WORKDIR /app
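
The Dockerfile changes above mount `HF_TOKEN` as a BuildKit secret with `required=false`, so the build is expected to work whether or not the token is supplied. A minimal sketch of how a caller might assemble the matching `docker build` command; `build_cmd` is a hypothetical helper that is not part of this PR, and it only prints the command rather than running it:

```shell
# Hypothetical helper: prints a docker build command that forwards HF_TOKEN
# as a BuildKit secret only when the variable is set and non-empty.
build_cmd() {
    dockerfile=$1
    secret_args=""
    # The RUN --mount in the Dockerfile uses required=false, so leaving the
    # secret out is assumed to be acceptable.
    if [ -n "${HF_TOKEN:-}" ]; then
        # env=HF_TOKEN reads the value from the caller's environment, so the
        # token itself never appears on the command line.
        secret_args=" --secret id=HF_TOKEN,env=HF_TOKEN"
    fi
    echo "docker build -f ${dockerfile}${secret_args} ."
}

HF_TOKEN="" build_cmd .devops/main.Dockerfile
HF_TOKEN="hf_example" build_cmd .devops/main.Dockerfile
```

Passing the secret by `id` and `env` keeps the token out of image layers and build logs, which is presumably the reason for using a secret mount instead of a build argument.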
22 changes: 22 additions & 0 deletions .github/actions/ccache-clear/action.yml
@@ -0,0 +1,22 @@
+name: "ccache-clear"
+description: "Delete all GitHub Actions caches matching a key prefix"
+inputs:
+  key:
+    description: "Cache key prefix to match and delete"
+    required: true
+
+runs:
+  using: "composite"
+  steps:
+    - name: Clear caches
+      shell: bash
+      run: |
+        CACHES=$(gh cache list --key "ccache-${{ inputs.key }}" --json id,key --jq '.[] | "\(.id) \(.key)"' 2>/dev/null)
+        if [ -z "$CACHES" ]; then
+          echo "No caches found with key prefix: ${{ inputs.key }}"
+          exit 0
+        fi
+        while read -r id key; do
+          echo "Deleting cache: $id ($key)"
+          gh cache delete "$id"
+        done <<< "$CACHES"
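
The loop in this action relies on `read -r id key` splitting each line at the first run of whitespace, so the cache id and the cache key land in separate variables. A small dry-run sketch of that parsing; the sample ids and keys are made up and stand in for real `gh cache list` output, and nothing is deleted:

```shell
# Hypothetical stand-in for the "<id> <key>" lines produced by the jq filter.
sample_caches='101 ccache-ubuntu-22-abc
102 ccache-ubuntu-22-def'

# Report what would be deleted instead of calling `gh cache delete`.
plan_deletes() {
    printf '%s\n' "$1" | while read -r id key; do
        echo "would delete $id ($key)"
    done
}

plan_deletes "$sample_caches"
```

When the action runs in a workflow, the `gh` CLI generally needs a token in the environment (for example `GH_TOKEN`); the action as shown does not set one, so the calling job is assumed to provide it.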
8 changes: 4 additions & 4 deletions .github/workflows/bindings-go.yml
@@ -3,20 +3,20 @@ on:
   push:
     paths:
       - bindings/go/**
-      - whisper.h
+      - include/whisper.h
   pull_request:
     paths:
       - bindings/go/**
-      - whisper.h
+      - include/whisper.h

 jobs:
   ubuntu-22:
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/setup-go@v5
+      - uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6
         with:
           go-version: '^1.23'
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
       - run: |
           cd bindings/go
           make test
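
Both `uses:` lines in this workflow now reference a full commit SHA, with the tag kept as a trailing comment. A rough sketch of how one might check that an action reference is pinned this way; `is_pinned` is a hypothetical helper, not something this PR adds:

```shell
# Hypothetical check: succeeds when the ref after '@' is exactly
# 40 lowercase hexadecimal characters (a full commit SHA).
is_pinned() {
    printf '%s\n' "$1" | grep -Eq '@[0-9a-f]{40}$'
}

for ref in "actions/checkout@v4" \
           "actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10"; do
    if is_pinned "$ref"; then
        echo "pinned:   $ref"
    else
        echo "unpinned: $ref"
    fi
done
```

The helper expects only the value of `uses:`; a trailing `# v6` comment would need to be stripped before the check.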
17 changes: 14 additions & 3 deletions .github/workflows/bindings-ruby.yml
@@ -4,8 +4,19 @@ on:
   push:
     branches:
       - master
+    paths:
+      - bindings/ruby/**
+      - include/whisper.h
+      - examples/common-whisper.h
+      - ggml/include/ggml.h
+
   pull_request:
     types: [opened, synchronize, reopened]
+    paths:
+      - bindings/ruby/**
+      - include/whisper.h
+      - examples/common-whisper.h
+      - ggml/include/ggml.h

 jobs:
   ubuntu-22:
@@ -14,8 +25,8 @@ jobs:
       run:
         working-directory: bindings/ruby
     steps:
-      - uses: ruby/setup-ruby@v1
+      - uses: ruby/setup-ruby@afeafc3d1ab54a631816aba4c914a0081c12ff2f # v1.310.0
         with:
-          ruby-version: '3.2'
+          ruby-version: '3.3'
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
       - run: rake test
80 changes: 80 additions & 0 deletions .github/workflows/build-android.yml
@@ -0,0 +1,80 @@
+name: CI (android)
+
+on:
+  workflow_dispatch: # allows manual triggering
+  push:
+    branches:
+      - master
+    paths: ['.github/workflows/build-android.yml',
+            '**/CMakeLists.txt',
+            '**/*.h',
+            '**/*.hpp',
+            '**/*.c',
+            '**/*.cpp',
+            '**/*.java']
+
+  pull_request:
+    types: [opened, synchronize, reopened]
+    paths-ignore:
+      - 'bindings/ruby/**' # handled by bindings-ruby.yml
+      - 'bindings/go/**' # handled by bindings-go.yml
+      - 'examples/addon.node/**' # handled by examples.yml
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  android:
+    runs-on: ubuntu-22.04
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
+        with:
+          path: whisper
+
+      - name: Install Java
+        uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5
+        with:
+          distribution: zulu
+          java-version: 21
+
+      - name: Setup Android SDK
+        uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
+
+      - name: Build
+        run: |
+          cd whisper/examples/whisper.android
+          ./gradlew assembleRelease --no-daemon
+
+      - name: Build with external ggml
+        run: |
+          export PATH_TO_GGML=$PWD/ggml
+          cd whisper/examples/whisper.android
+          ./gradlew assembleRelease --no-daemon
+
+  android_java:
+    runs-on: ubuntu-22.04
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
+
+      - name: set up JDK 11
+        uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5
+        with:
+          java-version: '11'
+          distribution: 'temurin'
+          cache: gradle
+
+      - name: Setup Android SDK
+        uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
+        with:
+          cmdline-tools-version: 9.0
+
+      - name: Build
+        run: |
+          cd examples/whisper.android.java
+          chmod +x ./gradlew
+          ./gradlew assembleRelease
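
The `concurrency.group` expression in this workflow uses the `a && b || c` idiom: when `github.head_ref` is non-empty (pull requests) the group is keyed on `github.ref`, otherwise on the unique `github.run_id`. A sketch of that selection logic in shell; the function name and the sample values are hypothetical:

```shell
# Hypothetical emulation of:
#   ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
concurrency_group() {
    workflow=$1 head_ref=$2 ref=$3 run_id=$4
    if [ -n "$head_ref" ]; then
        # Pull request: runs for the same PR ref share one group.
        echo "${workflow}-${ref}"
    else
        # Push or manual run: the run id makes the group unique.
        echo "${workflow}-${run_id}"
    fi
}

concurrency_group "CI (android)" "feature-x" "refs/pull/42/merge" "1001"
concurrency_group "CI (android)" "" "refs/heads/master" "1002"
```

Combined with `cancel-in-progress: true`, this means a new push to a pull request cancels the older run for that PR, while each run on `master` gets its own group and is left to finish.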
43 changes: 22 additions & 21 deletions .github/workflows/build-binaries.yml
@@ -15,7 +15,8 @@ permissions:
   contents: write

 env:
-  CUDA_ARCHITECTURES: "75;80;86;89"
+  # RTX 20-50 (Turing through Blackwell). sm_120 requires CUDA Toolkit >= 12.8.
+  CUDA_ARCHITECTURES: "75;80;86;89;120"

 jobs:
   build-macos-arm64:
@@ -190,21 +191,21 @@ jobs:
       - name: Install Ninja
         run: choco install ninja -y

-      - name: Install CUDA Toolkit 12.4.0
+      - name: Install CUDA Toolkit 12.9.1
         run: |
-          $CUDA_VERSION = "12.4.0"
+          $CUDA_VERSION = "12.9.1"
           $CUDA_TOOLKIT_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v$CUDA_VERSION"
           $CUDA_DOWNLOAD = "https://developer.download.nvidia.com/compute/cuda/redist"

-          # Component versions for CUDA 12.4.0
-          $CUDART_VER = "12.4.127"
-          $NVCC_VER = "12.4.131"
-          $NVRTC_VER = "12.4.127"
-          $CUBLAS_VER = "12.4.5.8"
-          $NVTX_VER = "12.4.127"
-          $PROFILER_VER = "12.4.127"
-          $VS_VER = "12.4.127"
-          $CCCL_VER = "12.4.127"
+          # Component versions for CUDA 12.9.1
+          $CUDART_VER = "12.9.79"
+          $NVCC_VER = "12.9.86"
+          $NVRTC_VER = "12.9.86"
+          $CUBLAS_VER = "12.9.1.4"
+          $NVTX_VER = "12.9.79"
+          $PROFILER_VER = "12.9.79"
+          $VS_VER = "12.9.79"
+          $CCCL_VER = "12.9.27"

           # Create CUDA toolkit directory
           New-Item -ItemType Directory -Force -Path $CUDA_TOOLKIT_DIR
@@ -400,24 +401,24 @@ jobs:
           sudo apt-get update
           sudo apt-get install -y build-essential cmake wget

-      - name: Install CUDA Toolkit 12.4
+      - name: Install CUDA Toolkit 12.9
         run: |
           # Download and install CUDA keyring
           wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
           sudo dpkg -i cuda-keyring_1.1-1_all.deb
           sudo apt-get update

           # Install minimal CUDA toolkit (compiler and libraries only, no driver)
-          sudo apt-get install -y cuda-toolkit-12-4
+          sudo apt-get install -y cuda-toolkit-12-9

           # Set environment variables
-          echo "/usr/local/cuda-12.4/bin" >> $GITHUB_PATH
-          echo "CUDA_PATH=/usr/local/cuda-12.4" >> $GITHUB_ENV
-          echo "LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH" >> $GITHUB_ENV
+          echo "/usr/local/cuda-12.9/bin" >> $GITHUB_PATH
+          echo "CUDA_PATH=/usr/local/cuda-12.9" >> $GITHUB_ENV
+          echo "LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH" >> $GITHUB_ENV

       - name: Verify CUDA installation
         run: |
-          export PATH=/usr/local/cuda-12.4/bin:$PATH
+          export PATH=/usr/local/cuda-12.9/bin:$PATH
           nvcc --version

       - name: Setup ccache
@@ -427,16 +428,16 @@ jobs:

       - name: Build whisper.cpp with CUDA
         run: |
-          export PATH=/usr/local/cuda-12.4/bin:$PATH
-          export CUDA_PATH=/usr/local/cuda-12.4
+          export PATH=/usr/local/cuda-12.9/bin:$PATH
+          export CUDA_PATH=/usr/local/cuda-12.9
           cmake -B build \
             -DCMAKE_BUILD_TYPE=Release \
             -DCMAKE_C_COMPILER_LAUNCHER=ccache \
             -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
             -DBUILD_SHARED_LIBS=OFF \
             -DGGML_NATIVE=OFF \
             -DGGML_CUDA=ON \
-            -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc \
+            -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.9/bin/nvcc \
             -DCMAKE_CUDA_ARCHITECTURES="${{ env.CUDA_ARCHITECTURES }}"
           cmake --build build --config Release -j $(nproc)
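
The new comment in this workflow notes that `sm_120` requires CUDA Toolkit >= 12.8, which is consistent with the toolkit moving from 12.4 to 12.9 alongside the architecture list change. A sketch of a guard that a build script could use to check this before adding `120` to the list; `version_ge` is a hypothetical helper and assumes a `sort` that supports `-V` (as in GNU coreutils):

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
version_ge() {
    # After a version sort, the smaller of the two comes first; if that is
    # $2, then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for toolkit in 12.4.0 12.9.1; do
    if version_ge "$toolkit" 12.8; then
        echo "CUDA $toolkit: new enough for sm_120"
    else
        echo "CUDA $toolkit: too old for sm_120"
    fi
done
```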
