Commit 95ccb19

SanjanaB123 and abetlen authored
fix(embedding): set kv_unified=True when embedding=True to enable batch processing (abetlen#2217)
* fix(embedding): set kv_unified=True when embedding=True to enable batch processing
* chore: update changelog for batch embedding fix

Co-authored-by: abetlen <abetlen@gmail.com>
1 parent 4a1a8ec commit 95ccb19

2 files changed: 3 additions & 1 deletion

File tree

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+- fix: Enable unified KV cache for embedding contexts to preserve full per-sequence context in batch embedding calls by @SanjanaB123 in #2217
+
 ## [0.3.23]
 
 - feat: Update llama.cpp to ggerganov/llama.cpp@7d442abf

llama_cpp/llama.py

Lines changed: 1 addition & 1 deletion
@@ -402,7 +402,7 @@ def __init__(
             self.n_batch,
             llama_cpp.llama_max_parallel_sequences(),
         )
-
+        self.context_params.kv_unified = True
         self._ctx = self._stack.enter_context(
             contextlib.closing(
                 internals.LlamaContext(
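The one-line change above can be illustrated with a small, self-contained sketch of the constructor logic: when embedding mode is requested, the context is switched to a unified KV cache so each sequence in a batch keeps its full per-sequence context. The class and attribute names below are hypothetical stand-ins for illustration, not the actual llama-cpp-python API.

```python
from dataclasses import dataclass


@dataclass
class ContextParams:
    """Hypothetical stand-in for llama.cpp context parameters."""
    n_ctx: int = 2048
    kv_unified: bool = False  # per-sequence (split) KV cache by default


class EmbeddingContext:
    """Hypothetical stand-in for the Llama constructor path touched here."""

    def __init__(self, embedding: bool = False, n_ctx: int = 2048):
        self.context_params = ContextParams(n_ctx=n_ctx)
        if embedding:
            # Mirrors the fix: batch embedding calls need the unified KV
            # cache so every sequence can use the full context window.
            self.context_params.kv_unified = True


# Embedding mode enables the unified cache; generation mode leaves it off.
embed_ctx = EmbeddingContext(embedding=True)
gen_ctx = EmbeddingContext(embedding=False)
print(embed_ctx.context_params.kv_unified, gen_ctx.context_params.kv_unified)
```

This keeps the behavior change scoped to embedding contexts, leaving the default split-cache behavior of generation contexts untouched.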
