fix(embeddings): make CUDA batch size configurable for small GPUs by ASuresh0524 · Pull Request #368 · StarTrail-org/LEANN

ASuresh0524 · 2026-06-06T22:01:28Z

Fixes #344. Adds the LEANN_CUDA_BATCH_SIZE, LEANN_MPS_BATCH_SIZE, and LEANN_CPU_THREADS env vars and an --embedding-batch-size CLI flag, all of which override the previously hardcoded defaults. Also caps the CUDA batch size by free VRAM after model load and halves it on OOM retry.
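For readers skimming the thread, here is a minimal sketch of how an override chain like this could be resolved. It is not the code in this PR: `resolve_batch_size` and its parameters are hypothetical names, and the precedence (CLI flag, then env var, then default) is an assumption. Only the env var name, the 256 default, and the warn-and-fall-back behavior come from this conversation.

```python
import logging
import os

logger = logging.getLogger(__name__)

DEFAULT_CUDA_BATCH_SIZE = 256  # the previously hardcoded default, per issue #344


def resolve_batch_size(cli_value=None, env_name="LEANN_CUDA_BATCH_SIZE",
                       default=DEFAULT_CUDA_BATCH_SIZE, environ=None):
    """Hypothetical helper: pick a batch size from CLI flag, env var, or default.

    Assumed precedence: explicit CLI value > environment variable > default.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    raw = environ.get(env_name)
    if raw is None:
        return default
    try:
        value = int(raw)
        if value <= 0:
            raise ValueError("batch size must be positive")
    except ValueError:
        # Invalid values warn and fall back instead of aborting the build.
        logger.warning("Ignoring invalid %s=%r; using default %d",
                       env_name, raw, default)
        return default
    return value
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.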

Checklist

Tests pass (uv run pytest)
Code formatted (ruff format and ruff check)
Pre-commit hooks pass (pre-commit run --all-files)

Fixes #344. LEANN_CUDA_BATCH_SIZE / LEANN_MPS_BATCH_SIZE / LEANN_CPU_THREADS env vars and --embedding-batch-size CLI override hardcoded defaults; cap CUDA batch by free VRAM after model load and halve on OOM retry. Co-authored-by: Cursor <cursoragent@cursor.com>

ArtifexSystems · 2026-06-08T16:50:18Z

Tested this on the exact small-GPU setup from #344 and it works — it turns the OOM into a clean build. 👍

Setup: NVIDIA RTX A1000 Laptop GPU (4 GB), WSL2, leann-core 0.3.7, torch 2.12.0+cu130, sentence-transformers 5.5.1, model BAAI/bge-base-en-v1.5. Ran the PR head (80e83da) against a realistic corpus of 384 × 512-token chunks.

Results

Env / CLI override — unset → 256 (prior default preserved); LEANN_CUDA_BATCH_SIZE=24 → 24; an invalid value warns and falls back; LEANN_CPU_THREADS=16 honored. ✅
VRAM cap — at ~3.2 GiB free, _cap_cuda_batch_by_vram(256) → 192; LEANN_CUDA_AUTO_BATCH=0 opts out → 256. ✅
End-to-end default path (compute_embeddings(..., is_build=True), no env set): auto-capped 256 → 153, that still OOM'd, and the halving retry dropped to 76 → succeeded (384, 768). A hard torch.OutOfMemoryError became a clean build with zero manual tuning. ✅
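The halving retry described in the end-to-end result can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: `encode_with_oom_backoff` is a hypothetical name, and `MemoryError` stands in for `torch.OutOfMemoryError` so the snippet runs without a GPU.

```python
def encode_with_oom_backoff(encode, texts, batch_size,
                            oom_error=MemoryError, min_batch_size=1):
    """Hypothetical helper: retry `encode` with a halved batch size on OOM.

    `encode(texts, batch_size)` is any callable that may raise `oom_error`.
    With PyTorch one would pass oom_error=torch.OutOfMemoryError and likely
    call torch.cuda.empty_cache() before each retry (not done here).
    Returns (result, batch_size_that_worked).
    """
    while True:
        try:
            return encode(texts, batch_size), batch_size
        except oom_error:
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            batch_size = max(min_batch_size, batch_size // 2)


# Stand-in encoder that "runs out of memory" above 100 sequences per batch.
def fake_encode(texts, batch_size):
    if batch_size > 100:
        raise MemoryError("simulated CUDA OOM")
    return [len(str(t)) for t in texts]


result, used = encode_with_oom_backoff(fake_encode, list(range(384)), 153)
print(used)  # 153 // 2 = 76, the same step reported in the run above
```

The simulated threshold of 100 is arbitrary; it only exists to reproduce the 153 to 76 step from the test run.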

Two small notes from the run, in case they're useful:

The VRAM heuristic is a touch optimistic for ≤4 GB. 0.35 × free ÷ ~6 MB/seq lands around 150–210 on a 4 GB card, which still OOMs here — the halving retry is what actually makes it land (at 76). The retry makes it robust regardless, but a more conservative factor (or sizing against the eager-attention batch × heads × seq² peak, which dominates the footprint) would let the common path succeed without leaning on the retry.
The cap runs even when the batch size was set explicitly. _cap_cuda_batch_by_vram applies whenever device == "cuda", so a user who deliberately passes --embedding-batch-size N (which sets adaptive_optimization=False) still gets it reduced unless they also set LEANN_CUDA_AUTO_BATCH=0. Might be friendlier to skip the auto-cap when the value came from an explicit flag.
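To make the two notes concrete, here is a rough sketch of the arithmetic and of the suggested gating. The function names, the `explicit` flag, and the reading of "~6 MB" as 6 MiB are all assumptions; the real `_cap_cuda_batch_by_vram` may round or size differently (it reported 192 at ~3.2 GiB free, whereas this sketch yields 191).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def cap_batch_by_free_vram(requested, free_bytes,
                           fraction=0.35, bytes_per_seq=6 * MIB):
    """Hypothetical version of the heuristic: fraction * free / per-seq cost.

    In real code, free_bytes would come from torch.cuda.mem_get_info()[0].
    """
    budget = fraction * free_bytes
    fit = max(1, int(budget // bytes_per_seq))
    return min(requested, fit)


def effective_batch_size(requested, free_bytes, explicit, auto_batch=True):
    """Apply the cap only to adaptive defaults (the second note's suggestion).

    `explicit` is True when the user passed --embedding-batch-size;
    `auto_batch` is False when LEANN_CUDA_AUTO_BATCH=0.
    """
    if explicit or not auto_batch:
        return requested
    return cap_batch_by_free_vram(requested, free_bytes)


free = int(3.2 * GIB)
print(cap_batch_by_free_vram(256, free))               # capped well below 256
print(effective_batch_size(24, free, explicit=True))   # 24, left untouched
```

With this gating, an explicitly chosen value is never silently reduced, while the OOM retry remains available as a safety net.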

Neither blocks anything — this is a complete fix for #344: env/CLI override and VRAM autoscale + OOM retry, with no new dependency. Once it merges I'll drop our local monkey-patch in favor of these knobs. Thanks @ASuresh0524!

ASuresh0524 · 2026-06-08T19:20:50Z

Awesome, thanks @ArtifexSystems. That helps a lot! Will incorporate these two small changes.

Address PR #368 review: use a more conservative seq^2-based VRAM estimate so small GPUs land near batch 76 without relying on OOM retry, and only auto-cap when batch size comes from adaptive defaults (not --embedding-batch-size). Co-authored-by: Cursor <cursoragent@cursor.com>
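As a back-of-the-envelope illustration of a seq^2-based estimate, the sketch below sizes one layer's eager-attention score tensor per sequence (heads × seq² × bytes per element). It is not the estimate merged in this commit. The function names are hypothetical, and fp32 scores, 12 attention heads (BERT-base, which bge-base-en-v1.5 is built on), and the 0.35 budget fraction are assumptions.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def attention_scores_bytes_per_seq(num_heads, seq_len, dtype_bytes=4):
    """Bytes for one layer's attention score matrix for a single sequence."""
    return num_heads * seq_len * seq_len * dtype_bytes


def batch_fitting_attention_peak(free_bytes, num_heads, seq_len,
                                 fraction=0.35, dtype_bytes=4):
    """Hypothetical estimate: how many sequences fit in fraction * free VRAM.

    Ignores other activations and allocator overhead, so treat the result
    as an upper bound rather than a guarantee.
    """
    per_seq = attention_scores_bytes_per_seq(num_heads, seq_len, dtype_bytes)
    return max(1, int(fraction * free_bytes // per_seq))


per_seq = attention_scores_bytes_per_seq(num_heads=12, seq_len=512)
print(per_seq // MIB)  # 12 MiB per 512-token sequence
print(batch_fitting_attention_peak(int(3.2 * GIB), num_heads=12, seq_len=512))
```

Because 12 MiB is twice the ~6 MB/seq figure discussed in the review, this estimate gives about half the batch size for the same free memory, which is the direction the review asked for.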

ASuresh0524 requested a review from yichuan-w June 8, 2026 05:37

Tobi-Adesoye mentioned this pull request Jun 8, 2026

CUDA embedding batch size hardcoded to 256 — OOMs on small GPUs with no override #344

Closed

ASuresh0524 removed the request for review from yichuan-w June 8, 2026 19:26

ASuresh0524 merged commit 0ddc20f into main Jun 8, 2026
31 checks passed

ASuresh0524 merged 2 commits into main from fix/cuda-embedding-batch-size-344
