From ab1811ef8208c3b939d0920dee98b90273f3a5db Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 31 May 2026 11:34:44 +0000
Subject: [PATCH] Upgrade llama.cpp from b9437 to b9442

---
 CLAUDE.md                                  | 2 +-
 CMakeLists.txt                             | 2 +-
 README.md                                  | 2 +-
 docs/history/llama-cpp-breaking-changes.md | 4 ++++
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 848657ee..b6bec371 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9437**
+Current llama.cpp pinned version: **b9442**

 ## Upgrading CUDA Version

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 090e5150..d5fe64cf 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9437
+GIT_TAG b9442
 )
 FetchContent_MakeAvailable(llama.cpp)

diff --git a/README.md b/README.md
index 9edbd395..b3a8b110 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
 [![Lincheck](https://img.shields.io/badge/tested%20with-Lincheck-7F52FF)](https://github.com/JetBrains/lincheck)
 [![vmlens](https://img.shields.io/badge/tested%20with-vmlens-ff6f00)](https://vmlens.com)
 [![JMH](https://img.shields.io/badge/benchmarked%20with-JMH-25A162)](https://openjdk.org/projects/code-tools/jmh/)
-[![llama.cpp b9437](https://img.shields.io/badge/llama.cpp-%23b9437-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9437)
+[![llama.cpp b9442](https://img.shields.io/badge/llama.cpp-%23b9442-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9442)
 [![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
 [![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)

diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
index 7775e1f6..473dbc71 100644
--- a/docs/history/llama-cpp-breaking-changes.md
+++ b/docs/history/llama-cpp-breaking-changes.md
@@ -279,3 +279,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9354–b9437 | `vendor/cpp-httplib/` | Bumped to v0.46.0: adds `Client::set_no_proxy(std::vector<std::string>)` with full hostname-suffix and IPv4/IPv6 CIDR matching; `Server::ThreadPool` constructor is exception-safe (already in v0.45.0); `Client::set_proxy()` now disconnects the held socket immediately so a later proxy change cannot reuse the old TLS session. Compiled automatically, no project changes required |
 | ~b9354–b9437 | `common/arg.cpp` (additive flags) | New `--spec-draft-backend-sampling` / `--no-spec-draft-backend-sampling` (env `LLAMA_ARG_SPEC_DRAFT_BACKEND_SAMPLING`) and `--skip-download` (mapped to `common_params::skip_download`). Both default-on / default-off in a way that preserves current Java behaviour. Consider exposing as `ModelParameters.setSpecDraftBackendSampling(boolean)` and `setSkipDownload(boolean)` in a follow-up — tracked under Open TODOs |
 | ~b9354–b9437 | `ggml/src/ggml-cuda/common.cuh` | `GGML_CUDA_USE_PDL` gating tightened: for MSVC, now requires CTK ≥ 12.3 (was 11.8) due to a compiler bug in the older Windows CUDA toolchains. Project's only CUDA build is Linux (dockcross, CUDA 13.2) so the MSVC gate has no CI impact; Windows CI builds CPU-only |
+| ~b9437–b9442 | `src/llama-vocab.{h,cpp}` + `src/llama-arch.{h,cpp}` | New `LLAMA_VOCAB_PRE_TYPE_WHITESPACE = 53` and `llm_tokenizer_whitespace_session` (used by jina-v2-base-zh embeddings); new "whitespace" tokenizer_model routed as `LLAMA_VOCAB_TYPE_BPE`; new `LLM_KV_TOKENIZER_NORMALIZER_LOWERCASE` key (`tokenizer.ggml.normalizer.lowercase`) read into `llama_vocab::impl::normalizer_lowercase`; new public accessor `llama_vocab::get_normalizer_lowercase()`. All additive — existing tokenizers untouched; new whitespace + lowercase normalizer is consumed automatically when loading a GGUF that sets these vocabulary keys, no project source or Java API changes required |
+| ~b9437–b9442 | `src/llama.cpp` | `llama_prepare_model_devices()` iGPU collection now appends only the FIRST `GGML_BACKEND_DEVICE_TYPE_IGPU` device (prevents duplicate iGPU registration on multi-iGPU hosts). Behavioural fix, single-line caller in `jllama.cpp` unchanged, no project source changes required |
+| ~b9437–b9442 | `tools/ui/embed.cpp` + `tools/ui/src/...` (Svelte) | Webasset embedder tightened printf format specifiers (`%lu` → `%zu` and `PRIx64`); UI settings split `custom` into `customJson` + `customCss`; runtime CSS injection via ``. Project does not ship the upstream UI, no impact |
+| ~b9437–b9442 | `gguf-py/`, `conversion/` (Python) | New `_set_vocab_whitespace()` helper and `add_normalizer_lowercase()` GGUF writer for the new whitespace tokenizer + lowercase normalizer keys (mirrors the vocab additions above); jina-v2 Roberta-tokenizer path now branches to whitespace when `tokenizer.json` declares a `Whitespace` pre-tokenizer. Python-side only, no impact on the Java/JNI build |