Non-blocking quality follow-ups from the #610 review:
Tune fusion. Hybrid currently uses Qdrant default RRF (k=60, equal leg weights). Evaluate weighted RRF (Qdrant ≥1.17) or DBSF (≥1.11) once there's a representative query set — no public nDCG benchmark exists, so this needs real-vault evaluation.
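For the evaluation, it may help to prototype fusion offline before touching the Qdrant query. Below is a minimal sketch of weighted RRF in plain Python, assuming 1-based ranks and the textbook formula score(d) = Σ w_i / (k + rank_i(d)). The function name `weighted_rrf` and the sample IDs are hypothetical, and Qdrant's internal rank offset and tie-breaking may differ.

```python
from collections import defaultdict


def weighted_rrf(legs, weights=None, k=60):
    """Fuse ranked ID lists (best first) with weighted reciprocal rank fusion."""
    if weights is None:
        weights = [1.0] * len(legs)  # equal leg weights, i.e. plain RRF
    if len(weights) != len(legs):
        raise ValueError("one weight per leg is required")
    scores = defaultdict(float)
    for ranked_ids, weight in zip(legs, weights):
        # Assumption: ranks start at 1, as in the original RRF formulation.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]    # hypothetical dense-leg ranking
sparse = ["c", "a", "d"]   # hypothetical sparse-leg ranking
equal = weighted_rrf([dense, sparse])
sparse_heavy = weighted_rrf([dense, sparse], weights=[1.0, 3.0])
print(equal)         # ['a', 'c', 'b', 'd']
print(sparse_heavy)  # ['c', 'a', 'd', 'b']
```

Running the same query set through several weight settings and scoring the fused lists against judged results would give the nDCG comparison this item asks for.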
Single-char CJK queries. The tokenizer emits character bigrams for CJK runs, so a single-character CJK query (猫) won't match multi-char CJK content (飼猫 emits only the bigram 飼猫). Consider also emitting CJK unigrams, or document the limitation.
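The mismatch can be reproduced with a toy tokenizer. This is a sketch, not the project's tokenizer: `is_cjk` and `cjk_tokens` are hypothetical names, only the CJK Unified Ideographs block is recognized, non-CJK text is ignored, and a lone CJK character is assumed to be emitted as a single token.

```python
def is_cjk(ch):
    # Simplification: only the CJK Unified Ideographs block (U+4E00..U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


def cjk_tokens(text, emit_unigrams=False):
    """Emit character bigrams for each CJK run, optionally with unigrams."""
    tokens, run = [], ""
    for ch in text + " ":  # trailing space flushes the final run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            tokens.append(run)  # assumption: a lone character is emitted as-is
        elif run:
            tokens += [run[i:i + 2] for i in range(len(run) - 1)]
            if emit_unigrams:
                tokens += list(run)
        run = ""
    return tokens


query = set(cjk_tokens("猫"))
bigram_only = set(cjk_tokens("飼猫"))
with_unigrams = set(cjk_tokens("飼猫", emit_unigrams=True))
print(query & bigram_only)    # empty: the single-char query finds nothing
print(query & with_unigrams)  # {'猫'}: unigrams restore the match
```

Emitting unigrams grows the per-document term count, so it also shifts document lengths and BM25 normalization; that cost is worth measuring before choosing between the two options.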
u32 collision noise at scale. HMAC dims are truncated to u32; per-user vocab in the low-millions starts accruing collisions (two terms → one bucket, weights summed — graceful ranking noise, not a correctness/security issue). Quantify the ranking-noise floor; widen the dim space if it bites.
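A first-order estimate of collision counts comes from the birthday bound. The sketch below assumes the truncated HMAC output is uniform over the 2^bits buckets; the function names are hypothetical. It counts colliding term pairs across the vocabulary and ignores term frequency, so it is a starting point for the ranking-noise floor rather than a measurement of it.

```python
import math


def expected_colliding_pairs(vocab_size, bits=32):
    """Expected number of term pairs sharing a bucket: n(n-1) / (2 * 2^bits)."""
    buckets = 2 ** bits
    return vocab_size * (vocab_size - 1) / (2 * buckets)


def prob_any_collision(vocab_size, bits=32):
    # Standard birthday approximation: 1 - exp(-n(n-1) / (2m)).
    return 1.0 - math.exp(-expected_colliding_pairs(vocab_size, bits))


for n in (100_000, 1_000_000, 3_000_000):
    pairs_32 = expected_colliding_pairs(n, bits=32)
    pairs_64 = expected_colliding_pairs(n, bits=64)
    print(f"vocab={n:>9,}  u32 pairs~{pairs_32:9.1f}  u64 pairs~{pairs_64:.2e}")
```

Under these assumptions a 1,000,000-term vocabulary yields roughly 116 colliding pairs in a u32 space, and widening to 64 bits makes collisions negligible at the same size.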
The ReindexKeyword stub recomputes against a drifting avgdl during the pass; the real impl should snapshot avgdl once and pass it via the existing Bm25 injected-param seam.
Refs #595, #605, #610
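To illustrate why the avgdl snapshot matters during a reindex pass, here is a small sketch. It is not the project's code: `bm25_tf_weight`, `reindex_snapshot`, and `reindex_drifting` are hypothetical names, the k1 and b defaults are the common textbook values, and drift is modeled as a running average over the documents processed so far, which is only one way it could arise.

```python
def bm25_tf_weight(tf, doc_len, avgdl, k1=1.2, b=0.75):
    """BM25 term-frequency component; avgdl is injected by the caller."""
    norm = 1.0 - b + b * (doc_len / avgdl)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def reindex_snapshot(docs):
    """docs: list of (term_frequency, doc_len). avgdl is computed once."""
    avgdl = sum(length for _, length in docs) / len(docs)
    return [bm25_tf_weight(tf, length, avgdl) for tf, length in docs]


def reindex_drifting(docs):
    """Contrast case: avgdl is recomputed from the documents seen so far."""
    weights, total_len = [], 0
    for seen, (tf, length) in enumerate(docs, start=1):
        total_len += length
        weights.append(bm25_tf_weight(tf, length, total_len / seen))
    return weights


docs = [(2, 100), (2, 300), (2, 100)]  # first and last documents are identical
snap = reindex_snapshot(docs)
drift = reindex_drifting(docs)
print(snap[0] == snap[2])    # True: identical documents get identical weights
print(drift[0] == drift[2])  # False: the weight depends on processing order
```

With a snapshot, the weights from a pass are mutually comparable; with drift, two identical documents can end up with different stored weights depending on when they were processed.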