feat: RAG implementation and benchmark for Nabledge v6 (#383) by kiyotis · Pull Request #386 · nablarch/nabledge-dev

kiyotis · 2026-06-25T01:48:51Z

…v1.18.0 (#383)

- Add `--model` CLI arg (default: cohere.embed-multilingual-v3) for v4 swap without code changes
- Truncate texts to 2048 chars for v3 models (Bedrock rejects longer inputs)
- Bump Qdrant Docker image v1.13.4 → v1.18.0 to match qdrant-client 1.18.0
- Add _MODEL_VECTOR_SIZES dict and _MODEL_MAX_CHARS dict for per-model config
- Add 9 new unit tests (TestModelVectorSizes + TestEmbedTextsModelMaxChars): 36 total

complete task #1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

17 tests covering embed_query (search_query input_type), build_processing_type_filter (nablarch-batch+none OR filter), search_qdrant (query_points API), and format_results (path.json:sN section_ref format). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Embeds question with Cohere Embed (input_type=search_query) via Bedrock, builds processing_type OR "none" filter, queries Qdrant via query_points API, and returns QueryResult list with section_ref in path.json:sN format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
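As a rough illustration of the filter and reference format mentioned above, the sketch below builds the OR filter as a plain dict in Qdrant's REST filter shape and formats a `path.json:sN` reference. The function names echo those in the commit messages, but the bodies are assumptions; the real code may use `qdrant_client` model classes instead, and `section_index` is a hypothetical parameter name.

```python
def build_processing_type_filter(processing_type):
    """Match points whose processing_type is the requested one OR "none"."""
    values = [processing_type]
    if processing_type != "none":
        values.append("none")
    # "should" is Qdrant's OR clause: a point matches if any condition holds.
    return {
        "should": [
            {"key": "processing_type", "match": {"value": value}}
            for value in values
        ]
    }


def format_section_ref(path, section_index):
    """Build a section reference in the path.json:sN form."""
    return f"{path}:s{section_index}"
```

Including `"none"` in the filter keeps sections that are not tied to any processing type eligible for every query.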

…ied) Retrieves top-k sections via query.py, loads section content, builds RAG prompt with context, calls LLM (claude -p), parses e2e-prompt.md format output, saves workflow_details/answer/metrics/evaluation.json. Compatible with run_qa.py output structure. Verified on pre-01: scores 0.90 correctness, 0.92 relevancy, 1.0 faithfulness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…, harden error handling (#383)

- Fix A: remove dead boto3 try/except at module level; rename _DEFAULT_EMBED_MODEL_ID, _DEFAULT_TOP_K, _QDRANT_HOST, _QDRANT_PORT → public (no leading underscore)
- Fix B: query() now accepts optional qdrant_client param (for DI / testing)
- Fix C: run_rag_qa.py imports public constants and delegates to rag_query() instead of duplicating the 4-step embed→filter→search→format pipeline
- Fix D: call_llm() wraps json.loads and subprocess.run in try/except; raises RuntimeError with clear messages on TimeoutExpired and JSONDecodeError
- Fix E: format_results() skips hits with empty page_id (logs warning to stderr)
- Fix F (test_query.py): module-level import json, full-path assertions, value check for filter test, simplified test_passes_text_in_texts_array; add truncation tests, a TestQuery orchestration test, and test_empty_page_id_hit_is_skipped (17 → 21 tests, all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
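Fixes D and E can be sketched as follows. This is a minimal sketch under assumptions: the command is passed in as a list rather than hard-coded, the hits are plain dicts, and `drop_hits_without_page_id` is a hypothetical helper standing in for the skip logic inside `format_results()`.

```python
import json
import subprocess
import sys


def call_llm(cmd, prompt, timeout=300):
    """Run an LLM CLI, feed it the prompt on stdin, and parse its JSON output."""
    try:
        proc = subprocess.run(
            cmd, input=prompt, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"LLM call timed out after {timeout}s") from exc
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"LLM returned non-JSON output: {proc.stdout[:200]!r}"
        ) from exc


def drop_hits_without_page_id(hits):
    """Keep only hits with a non-empty page_id, warning on stderr for the rest."""
    kept = []
    for hit in hits:
        if not hit.get("page_id"):
            print("warning: skipping hit with empty page_id", file=sys.stderr)
            continue
        kept.append(hit)
    return kept
```

Converting both failure modes to `RuntimeError` gives the caller a single exception type to handle, with the original exception still attached as the cause.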

…py (#383)

- find_truncated_pages: wrap open+json.load in try/except (json.JSONDecodeError, OSError), print [WARN] to stderr and continue on corrupt files
- page_id_from_section_ref: warn to stderr when ref has no ':' separator, return path stripped of .json suffix (graceful degradation)
- main(): replace data["scenarios"] with data.get("scenarios") guarded by sys.exit(1) when key is missing
- _V3_MAX_CHARS: clarify comment: this is a quality threshold (truncation reduces retrieval accuracy), not an absence marker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
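The graceful-degradation behaviour described above might be sketched like this. The no-colon branch follows the commit message; stripping `.json` in the normal case as well is an assumption, and `load_json_or_none` is a hypothetical helper illustrating the try/except that `find_truncated_pages` is said to use.

```python
import json
import sys


def page_id_from_section_ref(section_ref):
    """Derive a page id from a 'path.json:sN' reference."""
    if ":" not in section_ref:
        # Malformed reference: warn, then fall back to the whole string.
        print(f"[WARN] section_ref has no ':' separator: {section_ref!r}",
              file=sys.stderr)
        path = section_ref
    else:
        path = section_ref.rsplit(":", 1)[0]
    # Assumption: the page id is the path without its .json suffix.
    return path[: -len(".json")] if path.endswith(".json") else path


def load_json_or_none(path):
    """Load a JSON file, returning None (with a warning) if it is unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (json.JSONDecodeError, OSError) as exc:
        print(f"[WARN] skipping {path}: {exc}", file=sys.stderr)
        return None
```

Returning `None` lets a caller loop over many files and simply `continue` on the corrupt ones.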

…rios.py (#383)

- Fix sys.path.insert to use repo-root approach (4x .parent hops) matching test_index.py, and import via tools.rag.scripts.select_scenarios
- Add TestPageIdFromSectionRef::test_ref_with_no_colon_returns_path_stripped_of_json_suffix to cover the warning + graceful-degradation path added to page_id_from_section_ref

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
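The repo-root approach with four `.parent` hops can be illustrated as below. The helper names are hypothetical, and the example directory layout (`tools/rag/tests/`) is an assumption inferred from the `tools.rag.scripts` import path rather than something the PR page states.

```python
import sys
from pathlib import Path


def repo_root_from(test_file, hops=4):
    """Walk `hops` parent directories up from a test file to reach the repo root."""
    path = Path(test_file).resolve()
    for _ in range(hops):
        path = path.parent
    return path


def ensure_on_sys_path(root):
    """Put the repo root first on sys.path so package-style imports resolve."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
```

In a test module this would typically be called as `ensure_on_sys_path(repo_root_from(__file__))` before importing `tools.rag.scripts.select_scenarios`.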

…dopt/reject report) v3 limited 8-scenario benchmark (#5-#6) is intermediate measurement only. Adopt/reject decision requires v4 full 34-scenario benchmark (#7→#8). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kiyotis and others added 30 commits June 25, 2026 10:48

chore: start session — issue-383

b913255

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: revise steering — split tasks into 6 incremental steps

54f956d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: implement Qdrant Docker env and Indexing pipeline (#383)

a82d40d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update steering — v3 first, v4 swap later; add task #7

b8476c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update steering - v3 for #1-#4, v4 for #5-#7; remove duplicate tasks

b5c42e3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wip: suspend session — task #1 impl done, --model arg needed for v3

7941c27

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update task-1 check — all criteria met, QA findings resolved

e237c4d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update steering — #1 complete, next: #2 full indexing

c97c645

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: strengthen task-1 evidence - category/processing_type both verified

de17732

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add vector search verification to task-1 check

defa272

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: batch upsert to Qdrant to avoid 33 MB payload limit (#383)

acca100

Single upsert of 9376 chunks (122 MB) exceeded Qdrant's limit. Batch into 500-point chunks per upsert call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
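A minimal sketch of the batching fix, assuming the upsert is passed in as a callable so the logic can be shown without a running Qdrant instance. `batched` and `upsert_in_batches` are hypothetical names; only the 500-point batch size and the 9376-point total come from the commit message.

```python
from itertools import islice


def batched(points, batch_size=500):
    """Yield successive lists of at most batch_size points."""
    iterator = iter(points)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(upsert, points, batch_size=500):
    """Call upsert(batch) once per batch and return the number of calls made."""
    calls = 0
    for batch in batched(points, batch_size):
        upsert(batch)
        calls += 1
    return calls
```

With qdrant-client the callable could be something like `lambda batch: client.upsert(collection_name="my_collection", points=batch)`, where the collection name is a placeholder. For 9376 points this makes 19 calls, the last carrying 376 points.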

docs: task-2 complete — 9376 points indexed, search verified

98d888d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: suspend session — #2 complete, next: #3 query engine

21d08cf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: reconcile steering — resume from #3

c4aa1dd

docs: strengthen completion criteria — verify intent not just execution

98ce3d0

wip: task-3 check file — reviews done, page_id bug found, fix pending

3d41f50

chore: suspend session — #3 impl done, page_id bug fix pending

a161598

chore: reconcile steering — resume from #3 (page_id bug fix)

47e0ec6

fix: use rel_path as page_id in index.py - fixes content unavailable fallback (#383)

48f0656

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: fix stale docstrings and document linked_pages basename-only design (#383)

cc4653c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: rename page_id local var to basename in extract_linked_pages (#383)

37e4342

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: complete task #3 - RAG query engine (page_id bug fixed, all reviews passed)

973828a

feat: complete task #3 - RAG query engine with page_id fix verified by user

fbe7329

docs: task-4 QA complete — 13 scenarios all pass, ready for user review

6255321

docs: revise #5-#6 - v3-eligible scenario selection + 1-run stabilize before 3-run

ffcc524

kiyotis and others added 20 commits June 25, 2026 16:59

feat: complete task #4 — 13 scenarios scale-up verified by user

767fff8

chore: suspend session - #4 complete, next: #5 v3-eligible scenario selection

6a32077

chore: reconcile steering - resume from #5 v3-eligible scenario selection

e03c0de

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: implement select_scenarios.py for v3-eligible scenario selection (#383)

421798a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: generate v3-eligible-scenarios.json - 8 eligible of 34 total (#383)

b03724f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: run-1 benchmark results for 8 v3-eligible scenarios (#383)

53485ca

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add unit tests for select_scenarios.py (#383)

c62b57c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: complete task #5 — v3-eligible scenario selection + run-1 verified

5e2d060

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: suspend session — #5 complete, next: #6 3-run benchmark

9a687a8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: reconcile steering — resume from #6 3-run benchmark

e1f1460

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add RAG benchmark run-2 results (rag-k10-filter, 8 scenarios)

5b13f0a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add RAG benchmark run-3 results (rag-k10-filter, 8 scenarios)

657a7c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add 3-run crossrun-summary and quality-report for RAG k=10 filter benchmark

7cd8965

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: complete task #6 - 3-run benchmark + crossrun-summary + quality-report

385a782

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: add intermediate benchmark runs from task #4 exploration

616f9a2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: suspend session - #6 complete, next: #7 v4 full benchmark (awaiting SCP unlock)

fdeaed6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: reconcile steering — resume from #6, #7 blocked on v4 SCP unlock

dd189f6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: RAG implementation and benchmark for Nabledge v6 (#383) #386
kiyotis wants to merge 50 commits into main from 383-rag-implementation

kiyotis commented Jun 25, 2026
