feat: Hexagonal HNSW — diversity heuristic, A2 codebook, sub-centroid signs, multi-segment fix (#36)
Conversation
Caution: Review failed. The pull request is closed.
📝 Walkthrough

Introduces TurboQuant4A2 (A2 hexagonal paired-dimension quantization), heuristic HNSW neighbor selection, MVCC global_id/key_hash tracking with on-disk v2 headers, compaction changes that preserve global IDs, many TurboQuant encoder/decoder additions, and new mixed insert/search benchmark scripts.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Review Summary by Qodo

Hexagonal HNSW with diversity heuristic, A2 lattice quantization, and critical bug fixes
Description

- **Diversity heuristic for HNSW neighbor selection** — Implements Algorithm 4 from Malkov & Yashunin 2018 to replace simple nearest-M selection, accepting candidates only if closer to query than to already-selected neighbors
- **A2 hexagonal lattice quantization (TQ4A2)** — New 16-cell hexagonal quantizer with density-adapted 1+6+6+3 ring centroid layout for paired-dimension encoding, achieving 2x compression vs scalar TQ4
- **Sub-centroid signs at insert time** — Computes sign bits from pre-quantization FWHT values during encoding, doubling effective quantization resolution from 16 to 32 levels in HNSW beam search, now working in Light mode
- **Global vector ID collision fix** — Critical bug: each compacted segment independently assigned VectorIds 0..N-1, causing collisions across compaction cycles. Fixed by adding global_id and key_hash to MvccHeader with base offset tracking (without fix: recall drops to 9.7%)
- **TQ-ADC graph construction fix** — Major bug: Light mode built HNSW graphs using asymmetric TQ-ADC pairwise distance instead of decoded-centroid symmetric L2, causing poor topology at 384d+ (without fix: 41% recall)
- **Sub-centroid signs in Light mode fix** — Major bug: Light mode couldn't compute signs due to lack of raw f32 vectors. Fixed by computing signs at insert time from pre-quantization FWHT values (without fix: 92.7% vs 96.9% recall)
- **Parallel compaction infrastructure** — Implemented cell-based parallel HNSW graph construction with spatial grid partitioning and cross-cell boundary stitching (currently disabled pending PCA-based partitioning)
- **Benchmark improvements** — Recall@10 improved from 92.5% to 96.9% (+4.4%), insert throughput increased 2.4x to 71,351 v/s, compact time reduced 40% to 3.9s
- **Mixed workload benchmarks** — Added bench-mixed-workload.py (5-phase simulation) and bench-mixed-1k-compact.py (multi-compaction stress test) for comprehensive performance validation

Diagram

```mermaid
flowchart LR
    A["Insert Vector"] -->|"compute signs at insert time"| B["Sub-centroid Signs"]
    A -->|"paired-dimension encoding"| C["A2 Lattice Quantization"]
    B --> D["HNSW Graph Construction"]
    C --> D
    D -->|"diversity heuristic"| E["Neighbor Selection"]
    E --> F["HNSW Index"]
    G["Compaction Cycle"] -->|"assign global_id_base"| H["Global ID Tracking"]
    H --> I["Immutable Segment"]
    I -->|"remap to global IDs"| J["Search Results"]
    F --> J
```
File Changes

1. src/vector/segment/compaction.rs
Code Review by Qodo
@coderabbitai review
Phase 74: Hexagonal HNSW improvements to vector engine

- Plan 01 (W1): Diversity heuristic neighbor selection (Algorithm 4)
- Plan 02 (W1): A2 hexagonal lattice codebook + quantization
- Plan 03 (W2): Cell-parallel compaction with stitching
- Plan 04 (W3): TQ4A2 config wiring + backward compat
…Quant

- 16 density-optimized hex centroids for bivariate N(0, sigma^2)
- quantize_pair/decode_pair with brute-force nearest-of-16 (fast at 16 comparisons)
- Lloyd refinement for runtime centroid optimization
- Distortion test proves A2 MSE < scalar 2-bit MSE at the same 4-bit budget
- Byte layout: padded_dim/4 bytes per code (one nibble per dimension pair)
- centroids_x/centroids_y accessors for ADC lookup tables
- 9 tests: roundtrip, distortion, scaling, cell reachability, convergence
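The quantize_pair/decode_pair flow above can be sketched in Python. The 1+6+6+3 ring layout below (origin, two hexagonal rings, three outer points) is an illustrative placeholder with guessed radii and phases, not the engine's density-optimized centroids:

```python
import math

def build_codebook(sigma=1.0):
    # Illustrative 1 + 6 + 6 + 3 ring layout. Radii and phases here are
    # guesses for demonstration, not the tuned A2 centroids.
    cells = [(0.0, 0.0)]
    for radius, count, phase in [(1.0, 6, 0.0), (2.0, 6, math.pi / 6), (3.0, 3, 0.0)]:
        for i in range(count):
            a = phase + 2.0 * math.pi * i / count
            cells.append((sigma * radius * math.cos(a), sigma * radius * math.sin(a)))
    return cells  # 16 centroids -> one 4-bit code per dimension pair

def quantize_pair(x, y, codebook):
    # Brute-force nearest of 16: cheap, only 16 squared-distance comparisons.
    return min(range(16), key=lambda c: (x - codebook[c][0]) ** 2 + (y - codebook[c][1]) ** 2)

def decode_pair(code, codebook):
    return codebook[code]

cb = build_codebook()
```

Two consecutive dimensions share one nibble, which is where the 2x compression over scalar TQ4 comes from.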
Replace select_neighbors_simple with select_neighbors_heuristic from Malkov & Yashunin 2018. Candidates are accepted only if closer to the query than to all already-selected neighbors, ensuring angular diversity.

- select_neighbors_heuristic with keepPrunedConnections
- Updated insert() and add_neighbor_with_prune() to use the heuristic
- SmallVec for a zero-alloc combined buffer in the prune path
- brute_force_knn oracle for recall testing
- Tests: collinear rejection, diverse selection, keepPruned fill, BFS connectivity (1000 nodes), recall@10 >= 0.85 on 4-cluster data
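A minimal Python sketch of Algorithm 4's acceptance rule as described above (a candidate is kept only if it is closer to the query than to every already-selected neighbor, with a keepPrunedConnections back-fill); the real implementation works over HNSW candidate heaps rather than plain lists:

```python
import math

def select_neighbors_heuristic(query, candidates, m, dist, keep_pruned=True):
    # Visit candidates in order of increasing distance to the query.
    work = sorted(candidates, key=lambda c: dist(query, c))
    selected, pruned = [], []
    for cand in work:
        if len(selected) >= m:
            break
        d_q = dist(query, cand)
        # Algorithm 4 acceptance rule: keep the candidate only if it is
        # closer to the query than to every neighbor selected so far.
        if all(d_q < dist(cand, s) for s in selected):
            selected.append(cand)
        else:
            pruned.append(cand)
    if keep_pruned:
        # keepPrunedConnections: back-fill up to m from the rejected pool.
        for cand in pruned:
            if len(selected) >= m:
                break
            selected.append(cand)
    return selected

def euclid(a, b):
    return math.dist(a, b)

# Collinear candidates: (2, 0) sits behind (1, 0), so it is rejected in
# favor of the angularly diverse (0, 1).
picked = select_neighbors_heuristic(
    (0.0, 0.0), [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0)], 2, euclid, keep_pruned=False
)
```

This is what the "collinear rejection" test above exercises: nearest-M would keep (1, 0) and (2, 0), while the heuristic prefers the diverse direction.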
- test_assign_to_cells_partitions_all_vectors: verifies cell assignment covers all indices
- test_parallel_compact_bfs_reaches_all: BFS connectivity on a 500-vector parallel graph
- test_compact_parallel_recall: recall@10 >= 0.90 with a brute-force L2 oracle
- test_stitch_cross_cell_edges: stitched graph fully connected across cells
- brute_force_knn helper for the recall oracle
- Stub functions: assign_to_cells, compact_parallel, stitch_subgraphs
…ning and stitching

- assign_to_cells: grid-based spatial partitioning on the first 2 f32 coordinates
- compact_parallel: std::thread::scope builds HNSW sub-graphs per cell in parallel
- stitch_subgraphs: merges sub-graphs with bidirectional cross-cell boundary edges, using centroid-based boundary detection and farthest-replacement insertion
- add_neighbor_to_flat: farthest-replacement for full neighbor lists ensures connectivity
- Dispatch in compact(): parallel path when has_raw && n >= PARALLEL_THRESHOLD (10K)
- BFS connectivity verified on a 500-vector test, recall@10 >= 0.90 on a 1000-vector test
- Zero unsafe blocks, no rayon dependency, zero clippy warnings
- All 364 vector tests pass, all 10 compaction tests pass
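The assign_to_cells idea (grid partitioning on the first two coordinates) can be sketched as follows; the split-point choice (per-axis quantile boundaries) is an assumption for illustration, not the actual implementation:

```python
def assign_to_cells(vectors, grid=2):
    # Bucket each vector into one of grid*grid cells by comparing its first
    # two coordinates against per-axis split points (quantiles here).
    xs = sorted(v[0] for v in vectors)
    ys = sorted(v[1] for v in vectors)
    x_splits = [xs[len(xs) * i // grid] for i in range(1, grid)]
    y_splits = [ys[len(ys) * i // grid] for i in range(1, grid)]

    def bucket(value, splits):
        return sum(value >= s for s in splits)

    cells = [[] for _ in range(grid * grid)]
    for idx, v in enumerate(vectors):
        cells[bucket(v[0], x_splits) * grid + bucket(v[1], y_splits)].append(idx)
    return cells

cells = assign_to_cells([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

The partition is exhaustive and disjoint (every index lands in exactly one cell), which is what test_assign_to_cells_partitions_all_vectors checks.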
- Add QuantizationConfig::TurboQuant4A2 variant (repr=6) for the hexagonal A2 lattice
- Implement encode_tq_mse_a2: paired-dimension A2 encoding via A2Codebook
- Implement decode_tq_mse_a2: inverse path for roundtrip verification
- Override code_bytes_per_vector for A2: padded_dim/4 (2x more compressed than TQ4)
- Update persistence serialization (segment_io.rs) for TQ4A2
- Update the FT.CREATE parser to accept the TQ4A2 quantization option
- Add 12 tests: byte length, roundtrip MSE, backward compat, norm preservation
- Update MutableSegment::append to dispatch A2 encoding for TQ4A2 collections
- Fix bytes_per_code to use collection.code_bytes_per_vector() (dimension-adaptive)
- Add an A2 decode path in compaction all_rotated (pair-based nibble decode)
- Add an A2 L2 fallback in the HNSW build when raw f32 is unavailable
- Update sub-centroid sign computation for the A2 pair layout
- Guard codebook_16() calls with an is_a2 check (placeholder for the A2 search path)
- Update QJL and residual norm recomputation for A2 decode
- All 376 vector tests pass, zero clippy warnings
Moon is 20.5x faster on insert (57K vs 2.8K vec/s) with higher recall than Redis (0.594 vs 0.462). The diversity heuristic adds +3.4% build cost for a +15-30% recall improvement, with zero search regression.
The diversity heuristic's inter-neighbor distance comparisons amplify TQ-ADC quantization noise, especially at high dimensions (768d+) where quantization error is proportionally larger. This caused the heuristic to over-reject good candidates, producing poorly connected graphs.

Fix: add a use_heuristic flag to HnswBuilder, set to false when building with the TQ-ADC fallback (Light mode). Exact mode (f32 L2) still uses the heuristic since its distance comparisons are accurate.

The root cause of low recall at 768d/10K is actually the codebook sigma mismatch (1/sqrt(padded_dim) vs 1/sqrt(original_dim)) and zero-padding overhead, a pre-existing issue not introduced by Phase 74.
…ning

Three changes for the recall regression in Light mode:

1. Replace TQ-ADC pairwise distance with decoded-centroid symmetric L2 for HNSW graph construction. TQ-ADC's asymmetric noise caused poor graph topology at 384d+ (0.41 recall → 0.71 random, 0.93 MiniLM).
2. Scale the ef_search auto formula with dimension: base = k*20, +50% at 384d, +100% at 768d. The higher beam width compensates for TQ distance noise.
3. Add a use_heuristic flag to HnswBuilder: the diversity heuristic is only used with accurate distances (f32 L2) and disabled for noisy distance functions.

Benchmark (10K MiniLM 384d, Light mode): Recall@10 0.9265 (was 0.59 with TQ-ADC build + random data) | Insert: 92K vec/s | Search: 1250 QPS | Compact: 3.8s
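The ef_search scaling in item 2 can be written as a small helper. The clamp bounds [200, 1000] and the dim_factor encoding follow the reviewer's note elsewhere on this PR; the integer arithmetic is illustrative, not lifted from the source:

```python
def auto_ef_search(k, dim):
    # base = k * 20; +50% at 384d+, +100% at 768d+; clamp to [200, 1000].
    base = k * 20
    if dim >= 768:
        factor = 4   # base * 4 / 2 -> +100%
    elif dim >= 384:
        factor = 3   # base * 3 / 2 -> +50%
    else:
        factor = 2   # base * 2 / 2 -> unchanged
    return min(max(base * factor // 2, 200), 1000)
```

So a k=10 query uses a beam of 200 at 128d, 300 at 384d, and 400 at 768d: wider beams at higher dimensions absorb the larger TQ distance noise.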
Compute sub-centroid sign bits during TQ encoding (at insert time) from the actual pre-quantization FWHT values. Previously, Light mode could only compute signs from decoded centroids (always 1, hence useless). The fix:

- encode_tq_mse_scaled_with_signs: new encoder that returns TqCodeWithSigns
- MutableSegment stores signs alongside TQ codes (128 bytes/vec at 384d)
- FrozenSegment carries signs to compaction
- Compaction remaps signs to BFS order for ImmutableSegment
- ImmutableSegment::search uses a 32-level LUT (was 16-level)

This doubles effective quantization resolution from 16 to 32 levels in the HNSW beam search, cutting distance error roughly in half.

Benchmark (10K MiniLM 384d, Light mode): Recall@10 0.9690 (was 0.9265, +4.25%) | Insert: 77K vec/s (was 92K, -16% from sign computation) | QPS: 1127 (was 1250, -10% from the 32-entry LUT) | Compact: 4.1s (was 3.8s)
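One way to picture the sub-centroid sign refinement: the sign bit records which side of the centroid the pre-quantization value fell on, so the decoder can nudge the reconstruction in that direction, splitting each of the 16 cells into two effective levels. The refinement offset below (a fraction of the local cell width) is a guess for a 1-D illustration, not the engine's actual 32-entry LUT:

```python
def refine_with_sign(centroids, code, sign_bit, offset_fraction=0.25):
    # Shift the decoded centroid toward the side the original value fell on.
    # offset_fraction of the half-cell width is an illustrative choice.
    c = centroids[code]
    lo = centroids[code - 1] if code > 0 else c
    hi = centroids[code + 1] if code + 1 < len(centroids) else c
    half_cell = (hi - lo) / 2
    step = offset_fraction * half_cell
    return c + step if sign_bit else c - step

centroids = [0.0, 1.0, 2.0, 3.0]
hi_val = refine_with_sign(centroids, 1, 1)  # value was above centroid 1
lo_val = refine_with_sign(centroids, 1, 0)  # value was below centroid 1
```

With 16 centroids plus one sign bit per value, the search-time lookup table has 32 entries, which is the resolution doubling the commit describes.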
5-phase workload simulating real production patterns:

- Phase 1: Bulk load 1K vectors
- Phase 2: Steady-state (insert 10 + search 5) × 900 turns
- Phase 3: Search-heavy (insert 1 + search 20)
- Phase 4: Burst insert + search
- Phase 5: Final recall measurement

Tests recall during active ingestion, compaction interference, and latency stability under mixed load.
Root cause: each compacted immutable segment assigned VectorIds 0..N-1 independently. When multiple segments exist (compact_threshold=1000 with 10K vectors = 10 segments), search results from different segments had colliding IDs (vec:42 in segment 0 ≠ vec:42 in segment 5). The merge picked the closest distance but returned the wrong document name.

Fix:

- MvccHeader gains global_id (a globally unique sequential number)
- MutableSegment tracks a global_id_base offset, incremented on compaction
- ImmutableSegment::remap_to_global_ids() converts per-segment IDs
- Mutable brute-force search also uses global_id_base + internal_id

Benchmark (10K MiniLM, compact_threshold=1000, 10 compaction cycles):
Before: steady recall 0.308, final 0.097 (IDs colliding across segments)
After: steady recall 0.972, final 0.969 (global IDs, correct merge)

Also adds the bench-mixed-1k-compact.py simulation script.
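The fix reduces to a base-offset remap before results from different segments are merged. This sketch uses hypothetical names (SegmentIds, merge_results) rather than the actual Rust types:

```python
class SegmentIds:
    # Each segment carries a global_id_base; per-segment internal IDs become
    # globally unique by adding the base before cross-segment merging.
    def __init__(self, base):
        self.global_id_base = base

    def to_global(self, internal_id):
        return self.global_id_base + internal_id

def merge_results(per_segment_hits):
    # per_segment_hits: list of (SegmentIds, [(internal_id, distance), ...])
    merged = []
    for seg, hits in per_segment_hits:
        merged.extend((seg.to_global(i), d) for i, d in hits)
    merged.sort(key=lambda t: t[1])
    return merged

seg0, seg1 = SegmentIds(0), SegmentIds(1000)
# Both segments report internal id 42; after remapping they no longer collide.
out = merge_results([(seg0, [(42, 0.5)]), (seg1, [(42, 0.3)])])
```

Without the base offset, both hits would carry id 42 and the merge would pick the closer distance but attribute it to the wrong document, exactly the bug described above.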
…call

Cell-parallel compaction partitions vectors by their first 2 coordinates, which carry <0.5% of the variance at 384d+. This produces essentially random partitions, and the 3-edge stitching cannot recover graph quality. Result: Exact mode at 10K vectors got 68.7% recall (should be 96.9%). Disabling compact_parallel restores Exact to 96.85%, matching Light.

The parallel compaction concept is sound but needs PCA-based or random partitioning instead of coordinate-based. Disabled until fixed.
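The variance claim is easy to check numerically: for an isotropic 384-d Gaussian, the first two coordinates carry about 2/384 ≈ 0.5% of the total variance, so cells built from them are close to random with respect to the full geometry (real embeddings with uneven per-dimension variance can make it even smaller):

```python
import random

random.seed(0)
dim, n = 384, 1000
vectors = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

def column_variance(j):
    mean = sum(v[j] for v in vectors) / n
    return sum((v[j] - mean) ** 2 for v in vectors) / n

total_var = sum(column_variance(j) for j in range(dim))
first_two = column_variance(0) + column_variance(1)
fraction = first_two / total_var  # close to 2/384 ~ 0.0052 for isotropic data
```

That fraction is why PCA-based partitioning (project onto the top principal components first) or random partitioning is the suggested replacement.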
1. Remove .expect() in compaction.rs: use match/if-let for codebook access and filter_map for thread join (no panics in library code)
2. Remove .expect() in mutable.rs: use match on the (is_a2, a2_cb, centroids) triple for decoded-path selection
3. Fix A2 search panic: add a decoded-vector symmetric L2 fallback for TQ4A2 brute-force search. Never calls query_state.unwrap() for A2. Builds the A2Codebook once per search, decodes pairs inline.
4. Fix filter bitmap ID mismatch: PayloadIndex now uses global_id (global_id_base + internal_id) instead of the per-segment internal_id. Mutable filtered search checks the bitmap against global_id.
5. Persist global_id and key_hash: the mvcc_headers.bin v2 format writes 32 bytes/header (was 20). A backward-compatible reader detects v1 by checking first byte != 2.
6. Fix sub-sign remap: use direct live_entries[orig_id] indexing instead of .position(|e| e.internal_id == orig_id), which can select the wrong entry when internal_ids don't match the builder sequence.
7. Fix prune buffer overflow: increase combined_buf from 65 to 129 elements (M0 can be up to 128 when M=64)
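Item 5's version detection can be sketched as below, including the plausibility guard a reviewer suggests later in this thread (a v1 file whose little-endian count has low byte 0x02 must not be misread as v2). The field layout beyond the version byte and entry count is assumed for illustration:

```python
import struct

HEADER_V1_SIZE, HEADER_V2_SIZE = 20, 32  # bytes per on-disk MvccHeader

def detect_mvcc_version(buf):
    # v2 files start with a version byte of 2 followed by a u32 LE count;
    # v1 files start directly with a u32 LE count. Only accept v2 when the
    # implied payload length is plausible for the buffer we actually have.
    if len(buf) >= 5 and buf[0] == 2:
        count = struct.unpack_from("<I", buf, 1)[0]
        if count > 0 and 5 + count * HEADER_V2_SIZE <= len(buf):
            return 2, count, 5  # (version, entry count, payload offset)
    count = struct.unpack_from("<I", buf, 0)[0]
    return 1, count, 4

# A v1 file with count=2 starts with byte 0x02, but its v2-interpreted
# count fails the plausibility check, so it correctly parses as v1.
v1_buf = struct.pack("<I", 2) + b"\x00" * (2 * HEADER_V1_SIZE)
v2_buf = bytes([2]) + struct.pack("<I", 1) + b"\x00" * HEADER_V2_SIZE
```

A bare first-byte check alone would misclassify such v1 files, which is exactly the ambiguity the length guard resolves.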
Force-pushed from 10b5cd9 to 3ddb7c5
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/vector/segment/mutable.rs (1)
458-458: ⚠️ Potential issue | 🟡 Minor

`unwrap()` used in library code violates coding guidelines. Line 458 calls `query_state.unwrap()`, which can panic. Per coding guidelines, library code should use pattern matching or `if let` instead of `unwrap()`. The surrounding logic already handles the `None` case via `use_tq_adc`, so this should be unreachable, but the explicit `unwrap()` is still a violation.

Proposed fix:

```diff
 let dist = if use_tq_adc {
     tq_l2_adc_scaled(&q_rotated, tq_code, entry.norm, centroids)
-} else {
-    let qs = query_state.unwrap();
+} else if let Some(qs) = query_state {
     let qjl_bpv = inner.qjl_bytes_per_vec;
     let qjl_offset = id * qjl_bpv;
     let qjl_signs = &inner.qjl_signs[qjl_offset..qjl_offset + qjl_bpv];
     let residual_norm = inner.residual_norms[id];
     let single_qjl_bpv = (dim + 7) / 8;
     crate::vector::turbo_quant::inner_product::score_l2_prod(
         qs, tq_code, entry.norm, qjl_signs, residual_norm,
         centroids, dim, single_qjl_bpv,
     )
+} else {
+    // Unreachable: non-A2 with Light mode always uses TQ-ADC
+    continue;
 };
```

Based on learnings: "No `unwrap()` or `expect()` in library code outside tests; use pattern matching or `if let`"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/vector/segment/mutable.rs` at line 458, Replace the direct unwrap() call on query_state with explicit pattern matching: instead of let qs = query_state.unwrap(); match on query_state (or use if let Some(qs) = query_state { ... } else { unreachable!("query_state must be Some when use_tq_adc is true") }) so the code does not call unwrap() in library code; reference the symbols query_state and use_tq_adc in your change to keep the existing control flow and document the unreachable branch with a clear message.
🧹 Nitpick comments (2)
src/vector/segment/compaction.rs (2)
501-502: Redundant variable re-declarations shadow earlier definitions.

`has_raw` and `dim` are re-declared on lines 501-502, shadowing the identical declarations on lines 450-451. While Rust allows this, it is confusing and suggests incomplete refactoring.

Suggested cleanup:

```diff
-let has_raw = !frozen.raw_f32.is_empty();
-let dim = frozen.dimension as usize;
 let live_f32: Vec<&[f32]> = if has_raw && need_cpu_build {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/vector/segment/compaction.rs` around lines 501 - 502, The two later let bindings `let has_raw = !frozen.raw_f32.is_empty();` and `let dim = frozen.dimension as usize;` are redundant and shadow earlier identical bindings; remove these re-declarations in src/vector/segment/compaction.rs and reuse the previously declared `has_raw` and `dim` variables (or, if different semantics were intended, rename the new bindings to a distinct name and adjust their usage accordingly), ensuring all uses of `has_raw` and `dim` in functions like the compaction logic reference the single, correctly-typed variables derived from `frozen.raw_f32` and `frozen.dimension`.
444-445: Unused variables `_codebook` and `_code_len` should be removed.

Lines 444-445 define `_codebook` and `_code_len` with underscore prefixes, but neither is used; they were likely left over from refactoring. `code_len` is re-declared correctly on line 499.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/vector/segment/compaction.rs` around lines 444 - 445, Remove the two unused temporary variables by deleting the declarations of `_codebook` (from collection.codebook_16()) and `_code_len` (computed as bytes_per_code - 4); these are vestigial after refactoring and duplicate the correctly declared `code_len` later, so simply remove the `_codebook` and `_code_len` lines to avoid unused-variable warnings and keep `collection.codebook_16()` unused unless actually needed elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/bench-mixed-1k-compact.py`:
- Around line 114-122: The current loop quietly swallows all exceptions when
parsing IDs from the Redis response (variables result, raw, ids), which can hide
format changes; replace the broad except Exception: pass with targeted exception
handling (e.g., except (ValueError, IndexError, UnicodeDecodeError,
AttributeError) as e:) and re-raise a new exception or call sys.exit with a
clear message that includes the offending payload (raw and/or result) and the
original exception to abort the run; apply the same change to the other
identical blocks handling FT.SEARCH/VSIM responses so any unparseable ID causes
the benchmark to fail loudly instead of silently dropping IDs.
- Around line 41-57: The benchmark reuses fixed Redis names ("idx", key prefix
"doc:", set "vecset", and keys like "vec:*") causing FT.CREATE conflicts and
leftover state between runs; update run_moon (and other routines that create the
index/keys) to generate a run-specific namespace (e.g., append a unique
run_id/timestamp to the index name, prefix, and set/key names) or perform an
explicit teardown before creating resources so every execution is isolated;
ensure all usages of "idx", "doc:", "vecset", and "vec:*" in the script are
replaced to consistently use the per-run namespace or are cleaned up prior to
FT.CREATE.
In `@scripts/bench-mixed-workload.py`:
- Around line 305-314: The variable phase2_compact_spikes is being incremented
for any backend when do_search() returns lat > 50ms but the final report labels
it as "Compaction spikes", which is only meaningful for Moon; update the logic
so either (a) only increment phase2_compact_spikes when adapter.name() == "Moon"
(i.e., gate the increment around adapter.name() check where
phase2_compact_spikes is updated) or (b) rename the metric and its report label
to a generic "Latency spikes (>50ms)" everywhere it is used (including the
report generation at the end) so it correctly reflects a backend-agnostic
tail-latency counter. Ensure you update both the increment sites (where
phase2_compact_spikes is modified) and the final reporting (the lines that print
"Compaction spikes") to keep names consistent.
- Around line 108-186: The issue is reuse of fixed namespaces (idx, doc:,
vecset, vec:) causing collisions between runs; fix by adding a per-run unique
namespace/id and using it in all commands or by tearing down existing state
before creation: generate a run-specific identifier in the adapter/connector
(e.g., set self.run_id or self.index_name in the class __init__ or
create_index), replace literal "idx", "doc:", "vecset", "vec:" and f"vec:{id}"
usages in RedisMoonAdapter.create_index / insert / insert_batch / search and
RedisAdapter.create_index / insert / insert_batch / search with names that
include that run-specific id, or call the appropriate teardown commands
(FT.DROPINDEX for the RediSearch adapter and DEL/UNLINK for vecset/vec keys and
VDEL if available) before creating/adding to avoid inheriting state.
- Around line 283-410: The script currently exhausts the cached 10K corpus
before Phase 3 because Phase1 (do_insert_batch(1000)) + Phase2 (900 turns × 10
inserts = 9000) use the full budget; add a guard before Phase 1 that checks n
and next_id against the planned inserts (1000 + 9000) and either (a) reduce
phase2 turns/inserts to fit the configured n (adjust the loop using the symbol
range(900) or compute a dynamic turns variable) or (b) fail fast with a clear
error asking to regenerate the cache with a larger corpus and update the cache
key. Also ensure any cache key/path change is applied when changing n so old 10K
artifacts don’t mask the fix; reference next_id, n, do_insert_batch, and the
Phase 2 loop (range(900)) when making the change.
In `@src/command/vector_search/mod.rs`:
- Around line 474-483: The dimension-boost mapping is inverted:
idx.meta.dimension uses dim_factor to scale beam but the high-d branch currently
gives no boost; update the dim_factor logic so that higher dimensions get larger
multipliers (use dim_factor = 4 for idx.meta.dimension >= 768, dim_factor = 3
for idx.meta.dimension >= 384, else dim_factor = 2) so the expression (base *
dim_factor / 2).clamp(200, 1000) yields +50% at >=384 and +100% at >=768 as the
comment describes; adjust the dim_factor assignment near the variables base and
dim_factor in this module accordingly.
In `@src/vector/persistence/segment_io.rs`:
- Around line 384-386: The current v1 fallback sets key_hash to 0 in
src/vector/persistence/segment_io.rs (the tuple branch returning (internal_id,
0u64)), which collides with mark_deleted_by_key_hash in
src/vector/segment/mutable.rs and can cause mass deletions; change the v1
fallback key_hash from 0u64 to a sentinel value (e.g. u64::MAX) unlikely to
match real hashes, and add a brief code comment documenting this decision and
that legacy v1 segments should be compacted or treated specially; also scan
mark_deleted_by_key_hash to ensure it will never intentionally target the
sentinel value (or explicitly ignore sentinel) so legacy entries are not
matched.
- Around line 349-360: The current version-detection using mvcc_bytes[0] == 2
can misclassify v1 files whose LE count low byte is 0x02; update the detection
in segment_io.rs to only treat it as v2 when mvcc_bytes[0] == 2 AND the parsed
v2 count (from mvcc_bytes[1..5]) is plausible — e.g., ensure mvcc_bytes.len() >=
5 and that pos + count * HEADER_SIZE (or the expected per-entry size) <=
mvcc_bytes.len(); if that validation fails, fall back to parsing as v1 (read
count from mvcc_bytes[0..4] and set pos=4). Use the same symbols mvcc_bytes,
mvcc_version, mvcc_count, and pos so callers remain unchanged.
In `@src/vector/segment/compaction.rs`:
- Line 159: The current join of worker threads uses
handles.into_iter().filter_map(|h| h.join().ok()).collect(), which silently
drops any panicked thread and loses its sub-graph; replace this with explicit
panic propagation or error reporting by iterating handles and calling join()
while handling the Err case: for each thread handle returned from the parallel
graph builder (the variable `handles`), call join() and if it Errs either
propagate the panic (e.g., unwrap/expect) or log the thread panic including
context (cell id or worker index) and return an error/Result upstream so the
caller knows a sub-graph was lost; ensure the calling function/sig for the
compaction/graph-building path (the function that owns `handles`) returns a
Result or panics consistently so incomplete graphs are not silently accepted.
In `@src/vector/segment/mutable.rs`:
- Around line 408-416: brute_force_search_mvcc currently sets use_tq_adc =
!is_a2 && ... and then assumes query_state is Some, causing a panic when is_a2
is true and query_state.is_none(); fix by adding the A2 decoded-L2 fallback path
similar to brute_force_search_filtered: when is_a2 && query_state.is_none()
decode A2 centroids into q_rotated (use the a2_placeholder/centroids logic
already present) and set use_tq_adc false or an equivalent flag so the search
uses decoded L2 distances; additionally add an A2 distance branch in the scoring
loop (the same branch used in brute_force_search_filtered) so the scorer
computes distances for A2-encoded vectors instead of calling
query_state.unwrap(), and ensure all references to query_state are guarded or
use the decoded path to avoid unwrap panics.
In `@src/vector/store.rs`:
- Around line 109-115: When attaching recovered segments in create_index(),
compute the current max global ID by iterating immutable segments' graph node
counts (sum or max of each immutable segment's graph.node_count()) and call
set_global_id_base(...) on the recovered mutable (the Arc<MutableSegment> named
new_mutable or recovered mutable) with that max value before swapping it into
the index; ensure you use MutableSegment::set_global_id_base on the recovered
mutable so new entries start after existing IDs and avoid collisions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 398b5521-b766-4162-ae04-dd2565d94c4e
📒 Files selected for processing (15)

- .planning
- scripts/bench-mixed-1k-compact.py
- scripts/bench-mixed-workload.py
- src/command/vector_search/mod.rs
- src/shard/spsc_handler.rs
- src/vector/hnsw/build.rs
- src/vector/persistence/segment_io.rs
- src/vector/segment/compaction.rs
- src/vector/segment/immutable.rs
- src/vector/segment/mutable.rs
- src/vector/store.rs
- src/vector/turbo_quant/a2_lattice.rs
- src/vector/turbo_quant/collection.rs
- src/vector/turbo_quant/encoder.rs
- src/vector/turbo_quant/mod.rs
```python
def run_moon(port, vectors, queries, gt_final, compact_threshold):
    import redis as redis_lib

    r = redis_lib.Redis(port=port, decode_responses=False, socket_timeout=600)
    r.ping()

    n, dim = vectors.shape

    # Create index with specified compact threshold
    r.execute_command(
        "FT.CREATE", "idx", "ON", "HASH",
        "PREFIX", "1", "doc:",
        "SCHEMA", "vec", "VECTOR", "HNSW", "10",
        "TYPE", "FLOAT32", "DIM", str(dim),
        "DISTANCE_METRIC", "L2", "QUANTIZATION", "TQ4",
        "COMPACT_THRESHOLD", str(compact_threshold),
    )
```
This benchmark also needs run-specific namespaces.
Lines 51/53/83/99/165/218/228/256 reuse fixed idx, doc:*, vecset, and vec:* names. On a rerun against the same ports, Moon can fail at FT.CREATE, and the baseline is not guaranteed to start from a clean namespace. Use a per-run prefix/set name or explicit teardown so repeated executions stay isolated.
Also applies to: 78-103, 214-228, 254-256
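A minimal sketch of the suggested per-run namespace, using hypothetical helper names; any scheme that makes the index, prefix, and set names unique per invocation (or an explicit teardown before FT.CREATE) would satisfy the comment:

```python
import time
import uuid

def make_run_namespace():
    # Derive run-specific names so reruns never collide with leftover
    # indexes/keys. Mirrors the script's fixed "idx"/"doc:"/"vecset" names
    # with a unique suffix appended.
    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    return {
        "index": f"idx-{run_id}",
        "doc_prefix": f"doc:{run_id}:",
        "vecset": f"vecset-{run_id}",
    }

ns = make_run_namespace()
# Then thread ns through every command, e.g.:
# r.execute_command("FT.CREATE", ns["index"], "ON", "HASH",
#                   "PREFIX", "1", ns["doc_prefix"], ...)
```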
```python
if isinstance(result, list) and len(result) > 1:
    for j in range(1, len(result), 2):
        try:
            raw = result[j]
            if isinstance(raw, bytes):
                raw = raw.decode()
            ids.append(int(raw.split(":")[-1]))
        except Exception:
            pass
```
Abort the run if ID parsing stops matching the response format.
These `except Exception: pass` blocks quietly drop unparseable IDs. If FT.SEARCH or VSIM changes shape, the benchmark will look like a recall regression instead of a broken parser. Narrow the exception and raise with the offending payload so the run fails loudly.
Also applies to: 173-181, 233-240, 259-266
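A sketch of the narrowed handler the comment asks for, applied to the FT.SEARCH-style reply shape shown above (parse_doc_ids is a hypothetical helper name, not part of the script):

```python
import sys

def parse_doc_ids(result):
    # Catch only the specific parsing failures; anything unparseable aborts
    # the run loudly with the offending payload instead of being dropped.
    ids = []
    if isinstance(result, list) and len(result) > 1:
        for j in range(1, len(result), 2):
            raw = result[j]
            try:
                if isinstance(raw, bytes):
                    raw = raw.decode()
                ids.append(int(raw.split(":")[-1]))
            except (ValueError, IndexError, UnicodeDecodeError, AttributeError) as e:
                sys.exit(f"unparseable ID {raw!r} in response {result!r}: {e}")
    return ids

ids = parse_doc_ids([2, b"doc:7", b"payload", b"doc:9"])
```

A format change now stops the benchmark immediately with the bad payload in the message, rather than silently deflating recall.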
🧰 Tools
🪛 Ruff (0.15.7)
[error] 121-122: try-except-pass detected, consider logging the exception (S110)
[warning] 121-121: Do not catch blind exception: Exception (BLE001)
```python
class RedisMoonAdapter:
    # (diff quote begins mid-class; __init__ and name() fall outside the quoted range)

    def create_index(self, dim):
        self.dim = dim
        if not self.created:
            self.r.execute_command(
                "FT.CREATE", "idx", "ON", "HASH",
                "PREFIX", "1", "doc:",
                "SCHEMA", "vec", "VECTOR", "HNSW", "10",
                "TYPE", "FLOAT32", "DIM", str(dim),
                "DISTANCE_METRIC", "L2", "QUANTIZATION", "TQ4",
                "COMPACT_THRESHOLD", "10000",
            )
            self.created = True

    def insert(self, doc_id, vector):
        self.r.execute_command("HSET", f"doc:{doc_id}", "vec", vector.tobytes())

    def insert_batch(self, start_id, vectors):
        pipe = self.r.pipeline(transaction=False)
        for i, v in enumerate(vectors):
            pipe.execute_command("HSET", f"doc:{start_id + i}", "vec", v.tobytes())
        pipe.execute()

    def search(self, query, k=10):
        t0 = time.perf_counter()
        result = self.r.execute_command(
            "FT.SEARCH", "idx",
            f"*=>[KNN {k} @vec $query]",
            "PARAMS", "2", "query", query.tobytes(),
        )
        latency = (time.perf_counter() - t0) * 1000
        ids = []
        if isinstance(result, list) and len(result) > 1:
            for j in range(1, len(result), 2):
                try:
                    raw = result[j]
                    if isinstance(raw, bytes):
                        raw = raw.decode()
                    ids.append(int(raw.split(":")[-1]))
                except Exception:
                    pass
        return ids, latency


class RedisAdapter:
    def __init__(self, port):
        import redis as redis_lib
        self.r = redis_lib.Redis(port=port, decode_responses=False, socket_timeout=600)
        self.port = port

    def name(self):
        return "Redis"

    def create_index(self, dim):
        pass  # VADD auto-creates

    def insert(self, doc_id, vector):
        self.r.execute_command("VADD", "vecset", "FP32", vector.tobytes(), f"vec:{doc_id}")

    def insert_batch(self, start_id, vectors):
        pipe = self.r.pipeline(transaction=False)
        for i, v in enumerate(vectors):
            pipe.execute_command("VADD", "vecset", "FP32", v.tobytes(), f"vec:{start_id + i}")
        pipe.execute()

    def search(self, query, k=10):
        t0 = time.perf_counter()
        result = self.r.execute_command(
            "VSIM", "vecset", "FP32", query.tobytes(), "COUNT", str(k)
        )
        latency = (time.perf_counter() - t0) * 1000
        ids = []
        if isinstance(result, list):
            for item in result:
                try:
                    raw = item.decode() if isinstance(item, bytes) else str(item)
                    ids.append(int(raw.split(":")[-1]))
                except Exception:
                    pass
        return ids, latency
```
Isolate each backend run instead of reusing fixed namespaces.
Line 112 always creates idx, and the Moon/Redis paths reuse the same doc:*, vecset, and vec:* namespaces on every invocation. That makes reruns non-reproducible: Moon will fail at FT.CREATE, and the Redis baseline can inherit old state if the dataset or ID range changes. Give each run a unique index/set/prefix or explicitly tear the old state down before loading data.
🧰 Tools
🪛 Ruff (0.15.7)
[error] 146-147: try-except-pass detected, consider logging the exception
(S110)
[warning] 146-146: Do not catch blind exception: Exception
(BLE001)
[error] 184-185: try-except-pass detected, consider logging the exception
(S110)
[warning] 184-184: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-mixed-workload.py` around lines 108 - 186, The issue is reuse
of fixed namespaces (idx, doc:, vecset, vec:) causing collisions between runs;
fix by adding a per-run unique namespace/id and using it in all commands or by
tearing down existing state before creation: generate a run-specific identifier
in the adapter/connector (e.g., set self.run_id or self.index_name in the class
__init__ or create_index), replace literal "idx", "doc:", "vecset", "vec:" and
f"vec:{id}" usages in RedisMoonAdapter.create_index / insert / insert_batch /
search and RedisAdapter.create_index / insert / insert_batch / search with names
that include that run-specific id, or call the appropriate teardown commands
(FT.DROPINDEX for the RediSearch adapter and DEL/UNLINK for vecset/vec keys and
VDEL if available) before creating/adding to avoid inheriting state.
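A sketch of the per-run isolation idea using a uuid-based suffix; the class and attribute names are illustrative:

```python
import uuid

class RunNamespace:
    """Per-run unique index/key names so reruns never inherit stale state."""

    def __init__(self):
        run_id = uuid.uuid4().hex[:8]
        self.index = f"idx_{run_id}"          # for FT.CREATE / FT.SEARCH
        self.doc_prefix = f"doc_{run_id}:"    # for HSET keys
        self.vecset = f"vecset_{run_id}"      # for VADD / VSIM

    def doc_key(self, doc_id):
        return f"{self.doc_prefix}{doc_id}"
```

Each adapter would build its command arguments from one shared `RunNamespace` instance instead of the literal `"idx"`, `"doc:"`, and `"vecset"` strings.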
```python
# ── Phase 1: Bulk load 1000 vectors ──────────────────────────────
print(f"  Phase 1: Bulk load 1000 vectors...")
t0 = time.time()
do_insert_batch(1000)
phase1_time = time.time() - t0

# One search to trigger compaction (Moon) / indexing
_, _ = do_search()
time.sleep(0.5)  # Let indexing settle

results["phases"].append({
    "name": "Bulk Load",
    "inserts": 1000,
    "time_s": phase1_time,
    "vps": 1000 / phase1_time,
})

# ── Phase 2: Steady state (insert 10, search 5) × 900 turns ─────
print(f"  Phase 2: Steady-state (insert 10 + search 5) × 900 turns...")
t0 = time.time()
phase2_lats = []
phase2_recalls = []
phase2_compact_spikes = 0

for turn in range(900):
    do_insert_batch(10)
    for _ in range(5):
        lat, recall = do_search()
        phase2_lats.append(lat)
        phase2_recalls.append(recall)
        if lat > 50:  # >50ms = likely compaction interference
            phase2_compact_spikes += 1

    if (turn + 1) % 300 == 0:
        print(f"    Turn {turn+1}/900: {next_id} vectors, "
              f"avg recall={np.mean(phase2_recalls[-100:]):.4f}, "
              f"p50={np.percentile(phase2_lats[-100:], 50):.1f}ms")

phase2_time = time.time() - t0
results["phases"].append({
    "name": "Steady State",
    "inserts": 9000,
    "searches": 4500,
    "turns": 900,
    "time_s": phase2_time,
    "avg_recall": float(np.mean(phase2_recalls)),
    "p50_ms": float(np.percentile(phase2_lats, 50)),
    "p99_ms": float(np.percentile(phase2_lats, 99)),
    "compact_spikes": phase2_compact_spikes,
})

# ── Phase 3: Search-heavy (insert 1, search 20) × 100 turns ─────
# Tests search quality after most data is loaded
# Remaining vectors may not be enough, cap at what we have
remaining = min(100, n - next_id)
print(f"  Phase 3: Search-heavy (insert 1 + search 20) × {remaining} turns...")
t0 = time.time()
phase3_lats = []
phase3_recalls = []

for turn in range(remaining):
    if next_id < n:
        do_insert_batch(1)
    for _ in range(20):
        lat, recall = do_search()
        phase3_lats.append(lat)
        phase3_recalls.append(recall)

phase3_time = time.time() - t0
results["phases"].append({
    "name": "Search Heavy",
    "inserts": remaining,
    "searches": remaining * 20,
    "time_s": phase3_time,
    "avg_recall": float(np.mean(phase3_recalls)) if phase3_recalls else 0,
    "p50_ms": float(np.percentile(phase3_lats, 50)) if phase3_lats else 0,
    "p99_ms": float(np.percentile(phase3_lats, 99)) if phase3_lats else 0,
})

# ── Phase 4: Burst insert (remaining vectors, then 50 searches) ──
burst_count = n - next_id
if burst_count > 0:
    print(f"  Phase 4: Burst insert ({burst_count} vectors, then 50 searches)...")
    t0 = time.time()
    do_insert_batch(burst_count)
    burst_insert_time = time.time() - t0

    # Wait for indexing
    time.sleep(1)

    phase4_lats = []
    phase4_recalls = []
    for _ in range(50):
        lat, recall = do_search()
        phase4_lats.append(lat)
        phase4_recalls.append(recall)

    results["phases"].append({
        "name": "Burst Insert",
        "inserts": burst_count,
        "searches": 50,
        "insert_time_s": burst_insert_time,
        "insert_vps": burst_count / burst_insert_time if burst_insert_time > 0 else 0,
        "avg_recall": float(np.mean(phase4_recalls)),
        "p50_ms": float(np.percentile(phase4_lats, 50)),
        "p99_ms": float(np.percentile(phase4_lats, 99)),
    })

# ── Phase 5: Final recall (200 queries against full dataset) ─────
print(f"  Phase 5: Final recall (200 queries against full dataset)...")
final_lats = []
final_recalls = []

for i, q in enumerate(queries):
    ids, lat = adapter.search(q)
    final_lats.append(lat)
    # Use pre-computed ground truth for full dataset
    recall = len(set(ids) & set(gt_final[i])) / 10
    final_recalls.append(recall)

results["phases"].append({
    "name": "Final Recall",
    "searches": len(queries),
    "recall": float(np.mean(final_recalls)),
    "p50_ms": float(np.percentile(final_lats, 50)),
    "p99_ms": float(np.percentile(final_lats, 99)),
    "qps": 1000 / np.mean(final_lats),
})
```
Configured phases consume the full 10K corpus before Phase 3 starts.
Line 286 inserts 1,000 vectors and the loop at Line 307 inserts 9,000 more, so by Line 337 remaining is always 0 with the cached 10K dataset. The advertised "Search Heavy" and "Burst Insert" phases therefore never execute, which means this script does not actually benchmark the workload described in the docstring/PR summary. Either reserve vector budget before Phase 2 or fail fast unless the cache is regenerated with a larger corpus. If you change the corpus size, the cache path/key should change too, otherwise old 10K artifacts will keep masking the fix.
🧮 Minimal guard to stop silently skipping the later phases

```diff
+    required_vectors = 1000 + 900 * 10 + 100 + 500
+    if n < required_vectors:
+        raise ValueError(
+            f"Configured 5-phase workload needs at least {required_vectors} vectors, got {n}"
+        )
+
     # ── Phase 1: Bulk load 1000 vectors ──────────────────────────────
     print(f"  Phase 1: Bulk load 1000 vectors...")
```

🧰 Tools
🪛 Ruff (0.15.7)
[error] 284-284: f-string without any placeholders
Remove extraneous f prefix
(F541)
[warning] 300-300: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF003)
[error] 301-301: f-string without any placeholders
Remove extraneous f prefix
(F541)
[warning] 301-301: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
[warning] 334-334: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF003)
[warning] 338-338: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
[warning] 343-343: Loop control variable turn not used within loop body
(B007)
[error] 392-392: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-mixed-workload.py` around lines 283 - 410, The script currently
exhausts the cached 10K corpus before Phase 3 because Phase1
(do_insert_batch(1000)) + Phase2 (900 turns × 10 inserts = 9000) use the full
budget; add a guard before Phase 1 that checks n and next_id against the planned
inserts (1000 + 9000) and either (a) reduce phase2 turns/inserts to fit the
configured n (adjust the loop using the symbol range(900) or compute a dynamic
turns variable) or (b) fail fast with a clear error asking to regenerate the
cache with a larger corpus and update the cache key. Also ensure any cache
key/path change is applied when changing n so old 10K artifacts don’t mask the
fix; reference next_id, n, do_insert_batch, and the Phase 2 loop (range(900))
when making the change.
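One way to implement option (a) is to compute a dynamic turn count that reserves insert budget for the later phases. This is a sketch: the function name is illustrative, and the 1000/10/100/500 constants mirror the script's configured phases:

```python
def plan_phases(n, bulk=1000, per_turn=10, reserve_phase3=100, reserve_burst=500):
    """Fit the steady-state phase into the corpus, reserving insert
    budget for the later search-heavy and burst phases."""
    budget = n - bulk - reserve_phase3 - reserve_burst
    if budget < 0:
        raise ValueError(f"corpus of {n} vectors too small for configured phases")
    return {
        "bulk": bulk,
        "turns": budget // per_turn,   # phase 2 turns, now dynamic
        "phase3": reserve_phase3,      # 1 insert per search-heavy turn
        "burst": reserve_burst,        # left for the burst phase
    }

plan = plan_phases(10_000)  # with the cached 10K corpus
```

With the cached 10K corpus this yields 840 steady-state turns instead of 900, and Phases 3 and 4 actually run.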
```python
phase2_compact_spikes = 0

for turn in range(900):
    do_insert_batch(10)
    for _ in range(5):
        lat, recall = do_search()
        phase2_lats.append(lat)
        phase2_recalls.append(recall)
        if lat > 50:  # >50ms = likely compaction interference
            phase2_compact_spikes += 1
```
compact_spikes is only meaningful for Moon in this script.
Lines 313-314 increment this counter for every backend, but the report at Lines 476-478 labels it as "Compaction spikes". For Redis and Qdrant this is just a generic >50ms tail-latency counter, so the current name/reporting is misleading. Either gate it on adapter.name() == "Moon" or rename it to a backend-agnostic latency-spike metric.
Also applies to: 475-478
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-mixed-workload.py` around lines 305 - 314, The variable
phase2_compact_spikes is being incremented for any backend when do_search()
returns lat > 50ms but the final report labels it as "Compaction spikes", which
is only meaningful for Moon; update the logic so either (a) only increment
phase2_compact_spikes when adapter.name() == "Moon" (i.e., gate the increment
around adapter.name() check where phase2_compact_spikes is updated) or (b)
rename the metric and its report label to a generic "Latency spikes (>50ms)"
everywhere it is used (including the report generation at the end) so it
correctly reflects a backend-agnostic tail-latency counter. Ensure you update
both the increment sites (where phase2_compact_spikes is modified) and the final
reporting (the lines that print "Compaction spikes") to keep names consistent.
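Options (a) and (b) can be combined in a small helper that labels spikes per backend; the function name and 50ms threshold are illustrative:

```python
def classify_spike(adapter_name, latency_ms, threshold_ms=50.0):
    """Backend-aware labelling of tail-latency events: only Moon spikes
    can plausibly be attributed to compaction; others stay generic."""
    if latency_ms <= threshold_ms:
        return None
    return "compaction_spike" if adapter_name == "Moon" else "latency_spike"
```

The report would then count and label the two kinds separately instead of calling every >50ms search a "compaction spike".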
```rust
} else {
    (internal_id, 0u64) // v1 fallback: global_id = internal_id
};
```
v1 fallback key_hash=0 causes mass deletion when mark_deleted_by_key_hash is called.
Per src/vector/segment/mutable.rs:575-586, mark_deleted_by_key_hash(key_hash, lsn) marks ALL entries where entry.key_hash == key_hash. If v1 segments set key_hash=0 for all entries, any DEL/UNLINK on a key that hashes to 0 (or any delete operation on v1 data) could inadvertently mark unrelated entries.
This is a latent correctness issue for v1→v2 upgrades. Consider:
- Documenting that v1 segments should be recompacted before enabling delete operations, or
- Using a sentinel value like `u64::MAX` that's unlikely to match real key hashes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/vector/persistence/segment_io.rs` around lines 384 - 386, The current v1
fallback sets key_hash to 0 in src/vector/persistence/segment_io.rs (the tuple
branch returning (internal_id, 0u64)), which collides with
mark_deleted_by_key_hash in src/vector/segment/mutable.rs and can cause mass
deletions; change the v1 fallback key_hash from 0u64 to a sentinel value (e.g.
u64::MAX) unlikely to match real hashes, and add a brief code comment
documenting this decision and that legacy v1 segments should be compacted or
treated specially; also scan mark_deleted_by_key_hash to ensure it will never
intentionally target the sentinel value (or explicitly ignore sentinel) so
legacy entries are not matched.
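The sentinel semantics can be sketched in Python (the real code is Rust; the entry layout and function shape here are illustrative assumptions):

```python
V1_SENTINEL = 2**64 - 1  # u64::MAX: assumed never emitted by the real key hash

def mark_deleted_by_key_hash(entries, key_hash, lsn):
    """Mark live entries whose key_hash matches, refusing to match the
    sentinel so legacy v1 entries are never deleted wholesale."""
    if key_hash == V1_SENTINEL:
        return 0  # legacy v1 entries are explicitly not matched
    marked = 0
    for entry in entries:
        if entry["key_hash"] == key_hash and entry["deleted_lsn"] is None:
            entry["deleted_lsn"] = lsn
            marked += 1
    return marked
```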
```rust
        })
        .collect();

    handles.into_iter().filter_map(|h| h.join().ok()).collect()
```
Thread panics are silently ignored during parallel graph building.
filter_map(|h| h.join().ok()) discards any sub-graph from a thread that panicked. This silently loses vectors assigned to that cell, causing incomplete graphs and potential data loss without any error indication.
Consider propagating thread panics or at least logging when a sub-graph is lost:
Proposed fix
```diff
-    handles.into_iter().filter_map(|h| h.join().ok()).collect()
+    handles
+        .into_iter()
+        .filter_map(|h| match h.join() {
+            Ok(result) => Some(result),
+            Err(_) => {
+                // Log or track that a cell's sub-graph was lost due to panic
+                // For now, skip the cell (data loss risk)
+                None
+            }
+        })
+        .collect()
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```rust
handles
    .into_iter()
    .filter_map(|h| match h.join() {
        Ok(result) => Some(result),
        Err(_) => {
            // Log or track that a cell's sub-graph was lost due to panic
            // For now, skip the cell (data loss risk)
            None
        }
    })
    .collect()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/vector/segment/compaction.rs` at line 159, The current join of worker
threads uses handles.into_iter().filter_map(|h| h.join().ok()).collect(), which
silently drops any panicked thread and loses its sub-graph; replace this with
explicit panic propagation or error reporting by iterating handles and calling
join() while handling the Err case: for each thread handle returned from the
parallel graph builder (the variable `handles`), call join() and if it Errs
either propagate the panic (e.g., unwrap/expect) or log the thread panic
including context (cell id or worker index) and return an error/Result upstream
so the caller knows a sub-graph was lost; ensure the calling function/sig for
the compaction/graph-building path (the function that owns `handles`) returns a
Result or panics consistently so incomplete graphs are not silently accepted.
```rust
let new_mutable = Arc::new(
    crate::vector::segment::mutable::MutableSegment::new(
        self.meta.dimension,
        self.collection.clone(),
    ),
);
new_mutable.set_global_id_base(next_global);
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how recovered mutable segment's global_id_base is initialized
rg -nC5 'attach_recovered|global_id_base' src/vector/persistence/ src/vector/store.rs

Repository: pilotspace/moon
Length of output: 2529

🏁 Script executed:
# Find the recovery module and understand RecoveredState structure
fd -t f 'recovery' src/vector/persistence/

Repository: pilotspace/moon
Length of output: 93

🏁 Script executed:
# Check what RecoveredState contains and how it's used
rg -n 'struct RecoveredState|pub recovery' src/vector/persistence/ -A 10

Repository: pilotspace/moon
Length of output: 880

🏁 Script executed:
# Look for where mutable segments are created from recovered state
rg -n 'MutableSegment::new' src/vector/ --context 5

Repository: pilotspace/moon
Length of output: 10728

🏁 Script executed:
# Check if there's any logic that computes max global_id from immutable segments
rg -n 'max.*global|global.*max|next_global' src/vector/store.rs src/vector/segment/ -A 3 -B 3

Repository: pilotspace/moon
Length of output: 2148

🏁 Script executed:
# Look for FrozenSegment and ImmutableSegment structures to see if max_global_id is stored
rg -n 'struct.*Segment|max_global|max_id' src/vector/segment/ --type rs -A 5 | head -100

Repository: pilotspace/moon
Length of output: 85

🏁 Script executed:
# Check if immutable segment metadata contains global_id information
rg -n 'pub struct.*Meta|max_global|entries\.len' src/vector/segment/immutable.rs -A 3 -B 1 | head -80

Repository: pilotspace/moon
Length of output: 41

🏁 Script executed:
# Look at the recover function to see the full flow
sed -n '1,300p' src/vector/persistence/recovery.rs

Repository: pilotspace/moon
Length of output: 11223

🏁 Script executed:
# Continue reading recovery.rs to see the complete flow
sed -n '300,400p' src/vector/persistence/recovery.rs

Repository: pilotspace/moon
Length of output: 3449

🏁 Script executed:
# Look for FrozenSegment or ImmutableSegment structure to see available metadata
rg -n 'pub struct.*Segment|pub max_|pub entries' src/vector/segment/immutable.rs -B 2 -A 8 | head -100

Repository: pilotspace/moon
Length of output: 591

🏁 Script executed:
# Check if there's any post-recovery logic that computes max global_id
rg -n 'RecoveredCollection|attach_recovered' src/vector/store.rs -A 20

Repository: pilotspace/moon
Length of output: 1564

🏁 Script executed:
# Look at what FrozenSegment contains (freeze output)
rg -n 'struct Frozen|pub entries' src/vector/segment/mutable.rs -A 5 | head -60

Repository: pilotspace/moon
Length of output: 379

🏁 Script executed:
# Check if immutable segment entry count is available and how it's used
rg -n 'immutable.*len|entries.*len|num_entries' src/vector/segment/immutable.rs -A 2 -B 2

Repository: pilotspace/moon
Length of output: 41

🏁 Script executed:
# See complete attach_recovered and create_index to understand the flow
sed -n '169,250p' src/vector/store.rs

Repository: pilotspace/moon
Length of output: 2974

🏁 Script executed:
# Check if there's any logic that sets global_id_base on recovered mutable when attaching
rg -n 'set_global_id_base|attach_recovered' src/vector/store.rs src/vector/ -B 5 -A 15 | head -150

Repository: pilotspace/moon
Length of output: 6611

🏁 Script executed:
# Check if immutable segment has a len() method to get entry count
rg -n 'pub fn len|pub fn num_' src/vector/segment/immutable.rs -A 3

Repository: pilotspace/moon
Length of output: 41

🏁 Script executed:
# Check ImmutableSegment for entry count methods
rg -n 'impl.*ImmutableSegment|pub fn len|pub fn num_entries|pub fn graph' src/vector/segment/immutable.rs -A 5 | head -80

Repository: pilotspace/moon
Length of output: 433

🏁 Script executed:
# Look at what happens in create_index when attaching recovered segments - more context
sed -n '200,235p' src/vector/store.rs

Repository: pilotspace/moon
Length of output: 1354
Set global_id_base on recovered mutable segment when attaching to index.
In create_index() (lines 218–235), when recovered segments are attached to an index, the recovered mutable segment has global_id_base = 0. It must be initialized from the maximum global_id across immutable segments before swapping to prevent ID collisions with existing entries. Compute max_global_id from each immutable segment's graph node count and call set_global_id_base() on the recovered mutable to ensure new entries continue from the highest assigned ID.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/vector/store.rs` around lines 109 - 115, When attaching recovered
segments in create_index(), compute the current max global ID by iterating
immutable segments' graph node counts (sum or max of each immutable segment's
graph.node_count()) and call set_global_id_base(...) on the recovered mutable
(the Arc<MutableSegment> named new_mutable or recovered mutable) with that max
value before swapping it into the index; ensure you use
MutableSegment::set_global_id_base on the recovered mutable so new entries start
after existing IDs and avoid collisions.
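A sketch of the base computation in Python (the real code is Rust; this assumes each immutable segment's global IDs are dense and assigned sequentially across segments, so the next free ID is the total node count):

```python
def recovered_global_id_base(immutable_node_counts):
    """Next free global ID after recovery.

    Under the dense-ID assumption, the base passed to
    set_global_id_base() is simply the total node count across
    the recovered immutable segments' graphs.
    """
    return sum(immutable_node_counts)
```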
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Summary
Phase 74: Hexagonal HNSW — lattice-inspired improvements to the vector search engine, plus 4 critical bug fixes discovered during benchmarking.
Features
- Diversity heuristic for HNSW neighbor selection (Algorithm 4, Malkov & Yashunin 2018): a candidate is accepted only if it is closer to the query than to every already-selected neighbor; `keepPrunedConnections` fills remaining slots.
- A2 hexagonal lattice quantizer (`TQ4A2`) for paired-dimension encoding. Density-adapted 1+6+6+3 ring centroid layout with Lloyd refinement. Available via `FT.CREATE ... QUANTIZATION TQ4A2`.
- Mixed-workload benchmarks: `bench-mixed-workload.py` (5-phase simulation) and `bench-mixed-1k-compact.py` (multi-compaction stress test).
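The diversity rule (Algorithm 4) can be sketched on a 1-D toy example; this illustrates the published heuristic, not the Rust implementation, and the function name and signature are illustrative:

```python
def select_neighbors_heuristic(query, candidates, m, dist, keep_pruned=True):
    """Diversity-aware neighbor selection (Malkov & Yashunin 2018, Alg. 4).

    A candidate is kept only if it is closer to the query than to every
    neighbor already selected; keep_pruned backfills remaining slots
    from the rejected candidates (keepPrunedConnections).
    """
    ordered = sorted(candidates, key=lambda c: dist(query, c))
    selected, pruned = [], []
    for c in ordered:
        if len(selected) == m:
            break
        if all(dist(query, c) < dist(c, s) for s in selected):
            selected.append(c)
        else:
            pruned.append(c)
    if keep_pruned:
        selected.extend(pruned[: m - len(selected)])
    return selected
```

On points `[1.0, 1.1, 3.0]` with query `0.0` and `m=2`, the plain nearest-M rule would pick the redundant pair `1.0, 1.1`; the heuristic rejects `1.1` (closer to `1.0` than to the query) and only re-admits it via the backfill step.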
Bug Fixes

- Global vector ID collisions across compaction cycles: added `global_id` to MvccHeader with base offset tracking. Without fix: recall drops to 9.7%.
Before → After this PR
vs Competitors (all-MiniLM-L6-v2, 10K vectors, single-threaded)
Moon is 10.8x faster insert than both competitors. Recall gap narrowed to 1.4% (from 5.6%) while maintaining 6x memory compression via TQ-4bit.
Multi-compaction stress test (compact_threshold=1000, 10 cycles)
Files Changed
- `hnsw/build.rs`: `use_heuristic` flag
- `turbo_quant/a2_lattice.rs`: new
- `turbo_quant/encoder.rs`: `TqCodeWithSigns`, `encode_tq_mse_scaled_with_signs`
- `turbo_quant/collection.rs`: `TQ4A2` variant, A2 code bytes
- `segment/mutable.rs`
- `segment/immutable.rs`: `MvccHeader.global_id`
- `segment/compaction.rs`
- `store.rs`
- `command/vector_search.rs`
- `persistence/segment_io.rs`
- `scripts/bench-mixed-*.py`
- `cargo test --no-default-features --features runtime-tokio,jemalloc` — 1540 tests pass
- `cargo clippy -- -D warnings` — zero warnings
New Features
Improvements
Bug Fixes
Infrastructure