Commit 194d5ff

abhinadduri and claude committed
fix: handle control pool smaller than n_samples in _sample_consecutive_controls
When n_samples > pool_size (e.g., observational data with rare cell types having only 2 control cells but sentence_len=64), the old tail+head wrap only wrapped once, returning fewer elements than requested. This caused IndexError in __getitems__ during multi-worker DataLoader training.

Use modular arithmetic to wrap around the pool as many times as needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 28b7bb3 commit 194d5ff

1 file changed

Lines changed: 4 additions & 3 deletions

src/cell_load/mapping_strategies/random.py

@@ -276,6 +276,7 @@ def _sample_consecutive_controls(
     if start + n_samples <= pool_size:
         return np.array(pool[start : start + n_samples], dtype=np.int64)
 
-    tail = pool[start:]
-    head = pool[: n_samples - len(tail)]
-    return np.array(tail + head, dtype=np.int64)
+    # Wrap around the pool as many times as needed
+    pool_arr = np.array(pool, dtype=np.int64)
+    indices = np.arange(start, start + n_samples) % pool_size
+    return pool_arr[indices]
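
The difference between the two wrap strategies can be reproduced in isolation. The sketch below is not the repository's code: `wrap_once` and `wrap_modular` are hypothetical standalone helpers that mirror the removed and added lines of the hunk, and the two-element pool is made-up data standing in for a rare cell type with only 2 control cells.

```python
import numpy as np


def wrap_once(pool, start, n_samples):
    """Mirror of the removed lines: pool tail plus one pass over its head."""
    tail = pool[start:]
    head = pool[: n_samples - len(tail)]  # slicing silently caps at len(pool)
    return np.array(tail + head, dtype=np.int64)


def wrap_modular(pool, start, n_samples):
    """Mirror of the added lines: positions taken modulo the pool size."""
    pool_arr = np.array(pool, dtype=np.int64)
    indices = np.arange(start, start + n_samples) % len(pool)
    return pool_arr[indices]


pool = [10, 11]  # hypothetical control pool, smaller than n_samples
old = wrap_once(pool, start=1, n_samples=64)
new = wrap_modular(pool, start=1, n_samples=64)

print(len(old))  # 3: tail [11] plus head [10, 11], short of the 64 requested
print(len(new))  # 64
print(new[:4])   # [11 10 11 10]
```

When the request wraps at most once, both helpers return the same elements, so the modular version should only change behavior in the n_samples > pool_size case described in the commit message.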
