Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9437**
+Current llama.cpp pinned version: **b9442**

## Upgrading CUDA Version

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
llama.cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9437
+GIT_TAG b9442
)
FetchContent_MakeAvailable(llama.cpp)

2 changes: 1 addition & 1 deletion README.md
@@ -8,7 +8,7 @@
[![Lincheck](https://img.shields.io/badge/tested%20with-Lincheck-7F52FF)](https://github.com/JetBrains/lincheck)
[![vmlens](https://img.shields.io/badge/tested%20with-vmlens-ff6f00)](https://vmlens.com)
[![JMH](https://img.shields.io/badge/benchmarked%20with-JMH-25A162)](https://openjdk.org/projects/code-tools/jmh/)
-[![llama.cpp b9437](https://img.shields.io/badge/llama.cpp-%23b9437-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9437)
+[![llama.cpp b9442](https://img.shields.io/badge/llama.cpp-%23b9442-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9442)
[![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
[![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)

4 changes: 4 additions & 0 deletions docs/history/llama-cpp-breaking-changes.md
@@ -279,3 +279,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| ~b9354–b9437 | `vendor/cpp-httplib/` | Bumped to v0.46.0: adds `Client::set_no_proxy(std::vector<std::string>)` with full hostname-suffix and IPv4/IPv6 CIDR matching; `Server::ThreadPool` constructor is exception-safe (already in v0.45.0); `Client::set_proxy()` now disconnects the held socket immediately so a later proxy change cannot reuse the old TLS session. Compiled automatically, no project changes required |
| ~b9354–b9437 | `common/arg.cpp` (additive flags) | New `--spec-draft-backend-sampling` / `--no-spec-draft-backend-sampling` (env `LLAMA_ARG_SPEC_DRAFT_BACKEND_SAMPLING`) and `--skip-download` (mapped to `common_params::skip_download`). Both default-on / default-off in a way that preserves current Java behaviour. Consider exposing as `ModelParameters.setSpecDraftBackendSampling(boolean)` and `setSkipDownload(boolean)` in a follow-up — tracked under Open TODOs |
| ~b9354–b9437 | `ggml/src/ggml-cuda/common.cuh` | `GGML_CUDA_USE_PDL` gating tightened: for MSVC, now requires CTK ≥ 12.3 (was 11.8) due to a compiler bug in the older Windows CUDA toolchains. Project's only CUDA build is Linux (dockcross, CUDA 13.2) so the MSVC gate has no CI impact; Windows CI builds CPU-only |
+| ~b9437–b9442 | `src/llama-vocab.{h,cpp}` + `src/llama-arch.{h,cpp}` | New `LLAMA_VOCAB_PRE_TYPE_WHITESPACE = 53` and `llm_tokenizer_whitespace_session` (used by jina-v2-base-zh embeddings); new "whitespace" tokenizer_model routed as `LLAMA_VOCAB_TYPE_BPE`; new `LLM_KV_TOKENIZER_NORMALIZER_LOWERCASE` key (`tokenizer.ggml.normalizer.lowercase`) read into `llama_vocab::impl::normalizer_lowercase`; new public accessor `llama_vocab::get_normalizer_lowercase()`. All additive — existing tokenizers untouched; new whitespace + lowercase normalizer is consumed automatically when loading a GGUF that sets these vocabulary keys, no project source or Java API changes required |
+| ~b9437–b9442 | `src/llama.cpp` | `llama_prepare_model_devices()` iGPU collection now appends only the FIRST `GGML_BACKEND_DEVICE_TYPE_IGPU` device (prevents duplicate iGPU registration on multi-iGPU hosts). Behavioural fix, single-line caller in `jllama.cpp` unchanged, no project source changes required |
+| ~b9437–b9442 | `tools/ui/embed.cpp` + `tools/ui/src/...` (Svelte) | Webasset embedder tightened printf format specifiers (`%lu` → `%zu` and `PRIx64`); UI settings split `custom` into `customJson` + `customCss`; runtime CSS injection via `<svelte:head>`. Project does not ship the upstream UI, no impact |
+| ~b9437–b9442 | `gguf-py/`, `conversion/` (Python) | New `_set_vocab_whitespace()` helper and `add_normalizer_lowercase()` GGUF writer for the new whitespace tokenizer + lowercase normalizer keys (mirrors the vocab additions above); jina-v2 Roberta-tokenizer path now branches to whitespace when `tokenizer.json` declares a `Whitespace` pre-tokenizer. Python-side only, no impact on the Java/JNI build |
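
For reviewers unfamiliar with the whitespace tokenizer and lowercase normalizer mentioned in the b9437–b9442 rows, the following is a rough conceptual sketch in Java of what "lowercase, then split on whitespace" means. It is not the upstream C++ implementation and not part of this project's API: the class name `WhitespaceTokenizerSketch` and the method `preTokenize` are hypothetical, and the sketch stops at splitting. The upstream rows state that the model is routed as `LLAMA_VOCAB_TYPE_BPE`, so any further processing of each piece is out of scope here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; llama.cpp does this natively in C++.
public final class WhitespaceTokenizerSketch {

    // Optionally lowercases the input (standing in for the
    // tokenizer.ggml.normalizer.lowercase flag), then splits on runs
    // of whitespace and drops empty pieces.
    static List<String> preTokenize(String text, boolean lowercase) {
        String normalized = lowercase ? text.toLowerCase(Locale.ROOT) : text;
        List<String> pieces = new ArrayList<>();
        for (String piece : normalized.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // prints [hello, jina, embeddings]
        System.out.println(preTokenize("  Hello   Jina Embeddings ", true));
    }
}
```

In the real bindings none of this is needed on the Java side: per the rows above, the behaviour is picked up automatically when a GGUF sets the corresponding vocabulary keys.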