| 1 | +# Semantic Search Manual Test Log |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Manual test session for semantic (vector) search on the main project. |
| 6 | +- Date: 2026-02-15 |
| 7 | +- Database: ~/.basic-memory/memory.db (SQLite) |
| 8 | +- Entities: 456 embedded, 2714 vector chunks |
| 9 | +- Search index: 2390 FTS entries |
| 10 | +- Embedding model: default (384-dim, sqlite-vec) |
| 11 | + |
| 12 | +## Test Plan |
| 13 | + |
| 14 | +1. **Search Type Routing** — verify vector/hybrid/text dispatch, invalid search_type handling |
| 15 | +2. **Conceptual Queries** — natural language where vector should beat FTS |
| 16 | +3. **Keyword Queries** — exact terms where FTS should be strong |
| 17 | +4. **Hybrid Ranking** — queries where both FTS and vector contribute |
| 18 | +5. **Result Types** — entities, observations, relations in vector results |
| 19 | +6. **Filters + Vector** — combine vector with types/entity_types/after_date |
| 20 | +7. **Edge Cases** — short queries, long queries, empty, special chars, no-match |
| 21 | +8. **Pagination** — page > 1, page_size respected |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Test Results |
| 26 | + |
| 27 | +### Test 1: Search Type Routing |
| 28 | + |
| 29 | +#### 1a: search_type="semantic" (invalid value) |
| 30 | +- **Input:** query="how does the knowledge graph work", search_type="semantic" |
| 31 | +- **Expected:** error or explicit fallback |
| 32 | +- **Actual:** Silently falls through to text search (else branch in search.py:430) |
| 33 | +- **Verdict:** BUG — should either be a recognized alias for "vector" or return an error |
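A minimal sketch of the two remedies named in the verdict (alias or error), assuming the dispatch can be fronted by a small validation step. `SearchType`, `ALIASES`, and `resolve_search_type` are hypothetical names for illustration; none of them are taken from search.py.

```python
from enum import Enum


class SearchType(str, Enum):
    """Hypothetical enum covering the three modes exercised in this log."""

    TEXT = "text"
    VECTOR = "vector"
    HYBRID = "hybrid"


# Hypothetical alias table. Whether "semantic" should map to "vector"
# is a product decision, not something this log settles.
ALIASES = {"semantic": SearchType.VECTOR}


def resolve_search_type(raw: str) -> SearchType:
    """Return a known search type, or raise instead of defaulting to text."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    try:
        return SearchType(key)
    except ValueError:
        valid = ", ".join(t.value for t in SearchType)
        raise ValueError(
            f"Unknown search_type {raw!r}; expected one of: {valid}"
        ) from None
```

With this shape, `resolve_search_type("semantic")` would route to vector search, while a typo such as `"vectr"` would surface as an explicit error rather than an empty text-search result.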
| 34 | + |
| 35 | +#### 1b: search_type="vector" |
| 36 | +- **Input:** query="keeping AI context between sessions", search_type="vector" |
| 37 | +- **Actual:** 5 results, scores ~0.58-0.59, found "Maintaining context across conversation boundaries" observation |
| 38 | +- **Verdict:** PASS |
| 39 | + |
| 40 | +#### 1c: search_type="text" with conceptual query |
| 41 | +- **Input:** query="keeping AI context between sessions", search_type="text" |
| 42 | +- **Actual:** 0 results (no exact keyword match) |
| 43 | +- **Verdict:** PASS (expected — FTS requires token overlap) |
| 44 | + |
| 45 | +#### 1d: search_type="hybrid" with conceptual query |
| 46 | +- **Input:** query="keeping AI context between sessions", search_type="hybrid" |
| 47 | +- **Actual:** 5 results, same ranking as vector (FTS contributed nothing here) |
| 48 | +- **Verdict:** PASS |
| 49 | + |
| 50 | +#### 1e: search_type="text" with keyword query |
| 51 | +- **Input:** query="OAuth authentication", search_type="text" |
| 52 | +- **Actual:** 3 results — AUTH.md Supabase OAuth, OAuth Rip-and-Replace, OAuth Integration Analysis |
| 53 | +- **Verdict:** PASS |
| 54 | + |
| 55 | +#### 1f: search_type="vector" with keyword query |
| 56 | +- **Input:** query="OAuth authentication", search_type="vector" |
| 57 | +- **Actual:** Same top results as text (keyword-rich content also scores well in vector space) |
| 58 | +- **Verdict:** PASS |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +### Test 2: Conceptual Queries (vector advantage) |
| 63 | + |
| 64 | +#### 2a: Natural language question |
| 65 | +- **Input:** query="why do AI assistants forget things", search_type="vector" |
| 66 | +- **Actual:** 5 results — Manual Testing Session, "Balance security and usability" observation, "Tools should match thought patterns" observation. Scores ~0.56-0.57 |
| 67 | +- **Vector advantage:** Found conceptually related content despite no exact keyword overlap |
| 68 | +- **Verdict:** PASS |
| 69 | + |
| 70 | +#### 2b: Same query, text search |
| 71 | +- **Input:** query="why do AI assistants forget things", search_type="text" |
| 72 | +- **Actual:** 1 result — "What is Basic Memory?" (likely matched on "AI" token) |
| 73 | +- **Verdict:** PASS (demonstrates vector advantage — text barely matched) |
| 74 | + |
| 75 | +#### 2c: Domain concept with no jargon |
| 76 | +- **Input:** query="pricing strategy for cloud product", search_type="vector" |
| 77 | +- **Actual:** 3 results — SPEC-16 MCP Cloud Service Consolidation, knowledge architecture observation, Visual Knowledge Spaces relation. Scores ~0.56-0.57 |
| 78 | +- **Verdict:** PASS (found cloud-related content conceptually) |
| 79 | + |
| 80 | +#### 2d: Technical concept, long query |
| 81 | +- **Input:** query="SQLite performance optimization WAL mode concurrent writes", search_type="vector" |
| 82 | +- **Actual:** 3 results — SPEC-11 API Performance Optimization, Real-Time Updates with WebSockets, marketing status update. Scores ~0.55-0.58 |
| 83 | +- **Verdict:** PASS (found performance-related content) |
| 84 | + |
| 85 | +--- |
| 86 | + |
| 87 | +### Test 3: Keyword Queries (FTS strength) |
| 88 | + |
| 89 | +#### 3a: Exact term match — "OAuth authentication" |
| 90 | +- **Text:** 3 results with high relevance (exact matches in titles) |
| 91 | +- **Vector:** Same top results (keyword overlap helps vector too) |
| 92 | +- **Verdict:** PASS — FTS and vector converge on keyword-rich queries |
| 93 | + |
| 94 | +#### 3b: "OAuth" single keyword, hybrid mode |
| 95 | +- **Input:** query="OAuth", search_type="hybrid" |
| 96 | +- **Actual:** 5 results — Basic Memory Coding Guide, AI Collaboration Examples, SPEC-18, daily note, Manual Testing Session. FTS + vector blended. Scores ~0.016-0.032 |
| 97 | +- **Note:** Top hybrid result is "Basic Memory Coding Guide", not an OAuth-specific doc, which suggests hybrid scoring may dilute strong FTS matches |
| 98 | +- **Verdict:** PASS but hybrid ranking questionable for single-keyword queries |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +### Test 4: Hybrid Ranking |
| 103 | + |
| 104 | +#### 4a: Hybrid vs vector on "OAuth authentication" |
| 105 | +- **Hybrid with entity_types=["entity"]:** 5 results — RLS Implementation Lessons, Cloud Readiness Assessment, AUTH.md OAuth, Core Service Implementation, OAuth Rip-and-Replace. Scores ~0.016-0.023 |
| 106 | +- **Vector with entity_types=["entity"]:** 5 results — Core Service Implementation, SPEC-13 CLI Auth, Coding Guide, Authentication Service, ADR Production Auth. Scores ~0.55-0.60 |
| 107 | +- **Observation:** Hybrid surfaces different top results than vector-only. Hybrid found RLS and Cloud Readiness docs that vector didn't prioritize. Different ranking is expected from RRF fusion. |
| 108 | +- **Verdict:** PASS — hybrid produces meaningfully different ranking |
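For reference, a generic sketch of reciprocal rank fusion (RRF), the technique named in the observation above. It is not the project's implementation: the function name `rrf_fuse` and the constant `k = 60` are assumptions. The hybrid scores in this log (~0.016-0.032) fall in the range this formula would give with k = 60 (1/61 ≈ 0.0164 for a single first-place hit, 2/61 ≈ 0.0328 for first place in both lists), but the log does not state the constant actually used.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative IDs only; these are not the actual result lists from the test.
fts_ranking = ["auth-md", "oauth-rip-and-replace"]
vector_ranking = ["core-service", "auth-md"]
fused = rrf_fuse([fts_ranking, vector_ranking])
```

Because RRF uses only rank positions, a document that appears in both lists can outrank one that is first in a single list. That is one plausible reason hybrid and vector-only rankings differ.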
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +### Test 5: Result Types |
| 113 | + |
| 114 | +#### 5a: Vector returns all result types |
| 115 | +- **Input:** query="keeping AI context between sessions", search_type="vector" |
| 116 | +- **Entities:** SPEC-18 AI Memory Management Tool (type=entity) |
| 117 | +- **Relations:** Prompt Builder integrates_with (type=relation) |
| 118 | +- **Observations:** "Translation layer is key" (type=observation), "Maintaining context across conversation boundaries" (type=observation) |
| 119 | +- **Verdict:** PASS — all three types appear in vector results |
| 120 | + |
| 121 | +#### 5b: Observations carry metadata |
| 122 | +- **Observation result:** category="challenge", content="Maintaining context across conversation boundaries", from_entity="research/ai-knowledge-management-research" |
| 123 | +- **Verdict:** PASS — category, content, from_entity, tags all present |
| 124 | + |
| 125 | +#### 5c: Relations carry link info |
| 126 | +- **Relation result:** relation_type="integrates_with", from_entity="development/features/prompt-builder...", to_entity present (truncated in some results) |
| 127 | +- **Verdict:** PASS — relation metadata present |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +### Test 6: Filters + Vector Search |
| 132 | + |
| 133 | +#### 6a: entity_types=["entity"] with vector |
| 134 | +- **Input:** query="OAuth authentication", search_type="vector", entity_types=["entity"] |
| 135 | +- **Actual:** 5 results, all type="entity" (Core Service Implementation, SPEC-13, Coding Guide, Authentication Service, ADR Auth) |
| 136 | +- **Verdict:** PASS — filter correctly restricts to entities only |
| 137 | + |
| 138 | +#### 6b: types=["note"] with vector |
| 139 | +- **Input:** query="OAuth authentication", search_type="vector", types=["note"] |
| 140 | +- **Actual:** Same 5 results (all have entity_type="note" in metadata) |
| 141 | +- **Verdict:** PASS — types filter works with vector search |
| 142 | + |
| 143 | +#### 6c: after_date with vector |
| 144 | +- **Input:** query="OAuth authentication", search_type="vector", after_date="2025-06-01" |
| 145 | +- **Actual:** 3 results — Core Service Implementation, Cloud Web App analysis observation, SPEC-13. Filtered out older OAuth docs. |
| 146 | +- **Verdict:** PASS — date filter applied correctly |
| 147 | + |
| 148 | +#### 6d: entity_types=["entity"] with hybrid |
| 149 | +- **Input:** query="OAuth authentication", search_type="hybrid", entity_types=["entity"] |
| 150 | +- **Actual:** 5 results, all type="entity" — RLS lessons, Cloud Readiness, AUTH.md OAuth, Core Service, OAuth Rip-and-Replace |
| 151 | +- **Verdict:** PASS — filter works with hybrid mode too |
| 152 | + |
| 153 | +#### 6e: types=["entity"] with vector (WRONG filter name) |
| 154 | +- **Input:** query="OAuth authentication", search_type="vector", types=["entity"] |
| 155 | +- **Actual:** 0 results |
| 156 | +- **Note:** `types` filters by entity_type metadata (e.g., "note", "person"), NOT by SearchItemType. Using types=["entity"] looks for entity_type="entity", which few or no notes have. This is a UX confusion point: the parameter names are ambiguous. |
| 157 | +- **Verdict:** PASS (correct behavior) but USABILITY ISSUE — easy to confuse types vs entity_types |
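The distinction can be illustrated with a toy model. This is a sketch, not the real search code: the `Hit` dataclass, its field names, and `apply_filters` are hypothetical. Only the mapping of `types` to entity_type metadata and `entity_types` to SearchItemType comes from the note above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical search hit carrying both kinds of 'type'."""

    title: str
    item_type: str    # SearchItemType: "entity", "observation", "relation"
    entity_type: str  # metadata on the note: "note", "person", ...


def apply_filters(
    hits: list[Hit],
    types: Optional[list[str]] = None,
    entity_types: Optional[list[str]] = None,
) -> list[Hit]:
    """Apply both filters the way the log describes them."""
    kept = []
    for hit in hits:
        # entity_types restricts the kind of search item.
        if entity_types is not None and hit.item_type not in entity_types:
            continue
        # types restricts the entity_type metadata.
        if types is not None and hit.entity_type not in types:
            continue
        kept.append(hit)
    return kept


hits = [
    Hit("AUTH.md OAuth", item_type="entity", entity_type="note"),
    Hit("Example Person", item_type="entity", entity_type="person"),
]
```

In this model `apply_filters(hits, entity_types=["entity"])` keeps both hits, `types=["note"]` keeps one, and `types=["entity"]` keeps none, which mirrors the 0-result outcome in 6e.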
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +### Test 7: Edge Cases |
| 162 | + |
| 163 | +#### 7a: Single character query |
| 164 | +- **Input:** query="x", search_type="vector" |
| 165 | +- **Actual:** 3 results — "Self-contained application bundle" observation, Non-Markdown File Support relation, quick-win-tools entity. Scores ~0.57-0.59 |
| 166 | +- **Note:** Single character still produces an embedding and returns results. Quality is low/random as expected. |
| 167 | +- **Verdict:** PASS (no crash, returns results) |
| 168 | + |
| 169 | +#### 7b: Whitespace-only query |
| 170 | +- **Input:** query=" ", search_type="vector" |
| 171 | +- **Actual:** 0 results |
| 172 | +- **Verdict:** PASS (handled gracefully — _check_vector_eligible strips and rejects empty) |
| 173 | + |
| 174 | +#### 7c: Query with no relevant content |
| 175 | +- **Input:** query="quantum computing blockchain", search_type="vector" |
| 176 | +- **Actual:** 3 results — Inter-Agent Communication relation, Self-contained bundle observation, JSON-LD interop observation. Scores ~0.54 |
| 177 | +- **Note:** Still returns results because vector search always finds nearest neighbors. Scores are lower (~0.54) than relevant queries (~0.58-0.60). No relevance threshold applied. |
| 178 | +- **Verdict:** PASS (expected behavior) but NOTE — no relevance cutoff means irrelevant queries always return something |
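If a cutoff were added, it could be as simple as the sketch below. The function name `filter_by_score` and the threshold value are hypothetical. Given the narrow score spread recorded in this log (for example, the single-character query in 7a scored ~0.57-0.59), a fixed threshold would likely need calibration before it could separate relevant from irrelevant results reliably.

```python
def filter_by_score(
    results: list[tuple[str, float]], min_score: float
) -> list[tuple[str, float]]:
    """Drop (id, score) pairs whose similarity score is below min_score."""
    return [(doc_id, score) for doc_id, score in results if score >= min_score]


# Illustrative scores in the ranges observed in this log; 0.55 is an assumed cutoff.
sample = [("relevant-observation", 0.587), ("nearest-but-unrelated", 0.54)]
kept = filter_by_score(sample, min_score=0.55)
```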
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +### Test 8: Pagination |
| 183 | + |
| 184 | +#### 8a: Vector search page 2 |
| 185 | +- **Input:** query="keeping AI context between sessions", search_type="vector", page=2, page_size=3 |
| 186 | +- **Actual:** 3 results on page 2, current_page=2. Different results from page 1. Top: "Maintaining context across conversation boundaries" observation (score 0.587) |
| 187 | +- **Note:** Page 2 contained a higher-scoring result than some page 1 results. This may indicate that pagination does not sort globally; it might be paginating within a pre-scored set. |
| 188 | +- **Verdict:** PASS (pagination works) but POSSIBLE ISSUE — result ordering across pages needs investigation |
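A small check like the following could support that investigation. It is a sketch under assumptions: results are modelled as `(id, score)` tuples with higher scores being better, and `paginate` and `pages_in_order` are hypothetical helpers, not functions from the codebase.

```python
def paginate(
    results: list[tuple[str, float]], page: int, page_size: int
) -> list[tuple[str, float]]:
    """Sort all results by score (descending) first, then slice out one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    start = (page - 1) * page_size
    return ordered[start : start + page_size]


def pages_in_order(
    earlier: list[tuple[str, float]], later: list[tuple[str, float]]
) -> bool:
    """True if no result on the later page outscores any result on the earlier page."""
    if not earlier or not later:
        return True
    return min(s for _, s in earlier) >= max(s for _, s in later)


# Illustrative scores only.
results = [("a", 0.57), ("b", 0.587), ("c", 0.55), ("d", 0.58), ("e", 0.56)]
page1 = paginate(results, page=1, page_size=3)
page2 = paginate(results, page=2, page_size=3)
```

Running `pages_in_order` on pages 1 and 2 as actually returned by the search tool would confirm or rule out the ordering concern. With a global sort before slicing, as in `paginate`, the check holds by construction.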
| 189 | + |
| 190 | +--- |
| 191 | + |
| 192 | +## Summary |
| 193 | + |
| 194 | +### Passing Tests: 24/25 |
| 195 | + |
| 196 | +### Bugs Found |
| 197 | +1. **search_type="semantic" silently falls through** (Test 1a) — Invalid search_type values fall to the `else` branch and default to text search without any warning. Should either alias "semantic" to "vector" or raise an error. |
| 198 | + |
| 199 | +### Usability Issues |
| 200 | +2. **types vs entity_types confusion** (Test 6e) — `types` filters by entity_type metadata (note, person, etc.) while `entity_types` filters by SearchItemType (entity, observation, relation). The naming is ambiguous and easy to mix up. |
| 201 | +3. **No relevance threshold** (Test 7c) — Vector search always returns nearest neighbors even for completely irrelevant queries. Consider adding a minimum score threshold or at least documenting expected score ranges. |
| 202 | +4. **Hybrid ranking for single keywords** (Test 3b) — Hybrid mode on simple keyword queries produced less intuitive rankings than pure FTS or pure vector. The RRF fusion may dilute strong FTS signals. |
| 203 | + |
| 204 | +### Observations |
| 205 | +- Vector search successfully finds conceptually related content that FTS misses entirely |
| 206 | +- Score ranges: relevant queries ~0.56-0.60, irrelevant queries ~0.54 (narrow spread) |
| 207 | +- All three result types (entity, observation, relation) appear correctly in vector results |
| 208 | +- Filters (entity_types, types, after_date) all work correctly with vector and hybrid modes |
| 209 | +- Pagination works but cross-page ordering may need investigation |