feat: RAG implementation and benchmark for Nabledge v6 (#383) #386

Draft
kiyotis wants to merge 50 commits into main from 383-rag-implementation

kiyotis (Contributor) commented Jun 25, 2026

See steering.

kiyotis and others added 30 commits June 25, 2026 10:48
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tasks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…v1.18.0 (#383)

- Add `--model` CLI arg (default: cohere.embed-multilingual-v3) for v4 swap without code changes
- Truncate texts to 2048 chars for v3 models (Bedrock rejects longer inputs)
- Bump Qdrant Docker image v1.13.4 → v1.18.0 to match qdrant-client 1.18.0
- Add _MODEL_VECTOR_SIZES dict and _MODEL_MAX_CHARS dict for per-model config
- Add 9 new unit tests (TestModelVectorSizes + TestEmbedTextsModelMaxChars): 36 total

complete task #1
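
The per-model configuration described in this commit could look roughly like the sketch below. It is not the PR's code: the helper names `truncate_for_model` and `parse_args` are hypothetical, the dict names drop the leading underscore for readability, and the 1024-dimension vector size comes from Cohere's published spec for Embed Multilingual v3 rather than from this commit. Only the default model ID and the 2048-character limit are taken from the message above.

```python
import argparse

DEFAULT_MODEL = "cohere.embed-multilingual-v3"

# Per-model settings. Only the v3 entry is shown; a v4 entry would be added
# here so that the swap needs no other code change.
MODEL_VECTOR_SIZES = {DEFAULT_MODEL: 1024}  # assumption: from Cohere's spec
MODEL_MAX_CHARS = {DEFAULT_MODEL: 2048}     # from the commit message


def truncate_for_model(texts, model):
    """Cut each text to the model's character limit, if one is configured."""
    max_chars = MODEL_MAX_CHARS.get(model)
    if max_chars is None:
        # No limit configured for this model: pass texts through unchanged.
        return list(texts)
    return [text[:max_chars] for text in texts]


def parse_args(argv=None):
    """Parse the CLI; --model selects the embedding model ID."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=DEFAULT_MODEL)
    return parser.parse_args(argv)
```

Keeping the limits in dicts keyed by model ID means an unknown model simply skips truncation instead of raising an error. Whether that is the behaviour the PR chose is not stated here.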

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fied

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Single upsert of 9376 chunks (122 MB) exceeded Qdrant's limit.
Batch into 500-point chunks per upsert call.
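
A minimal sketch of the batching fix, assuming the upsert call is passed in as a plain callable. `batched` and `upsert_in_batches` are hypothetical names; in the real script the callable would presumably wrap the Qdrant client's upsert for a given collection.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
BATCH_SIZE = 500  # batch size named in the commit message


def batched(items: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(
    points: Sequence[T],
    upsert: Callable[[Sequence[T]], None],
    size: int = BATCH_SIZE,
) -> int:
    """Call `upsert` once per batch and return the number of calls made."""
    calls = 0
    for batch in batched(points, size):
        upsert(batch)
        calls += 1
    return calls
```

With 9376 points and a batch size of 500 this makes 19 calls, the last carrying 376 points.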

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
17 tests covering embed_query (search_query input_type), build_processing_type_filter
(nablarch-batch+none OR filter), search_qdrant (query_points API), and format_results
(path.json:sN section_ref format).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Embeds question with Cohere Embed (input_type=search_query) via Bedrock,
builds processing_type OR "none" filter, queries Qdrant via query_points API,
and returns QueryResult list with section_ref in path.json:sN format.
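
The filter and reference format described above can be sketched with plain dictionaries. The dict mirrors the JSON shape of a Qdrant `should` (OR) filter. The payload key `processing_type` is taken from the commit text, while the function bodies and `format_section_ref` are hypothetical and may differ from the PR's implementation.

```python
def build_processing_type_filter(processing_type: str) -> dict:
    """OR filter: match the requested processing type or the generic "none"."""
    values = [processing_type, "none"]
    return {
        "should": [
            {"key": "processing_type", "match": {"value": value}}
            for value in values
        ]
    }


def format_section_ref(json_path: str, section_index: int) -> str:
    """Build a reference in the path.json:sN form.

    `json_path` is assumed to already end in ".json".
    """
    return f"{json_path}:s{section_index}"
```

For example, `build_processing_type_filter("nablarch-batch")` yields two `should` conditions, one for `nablarch-batch` and one for `none`, so sections not tied to any processing type still match.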

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ied)

Retrieves top-k sections via query.py, loads section content, builds RAG prompt
with context, calls LLM (claude -p), parses e2e-prompt.md format output, saves
workflow_details/answer/metrics/evaluation.json. Compatible with run_qa.py output
structure. Verified on pre-01: scores 0.90 correctness, 0.92 relevancy, 1.0 faithfulness.
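
As a rough illustration of the "builds RAG prompt with context" step, the sketch below joins retrieved sections into one prompt string. The function name `build_rag_prompt` and the prompt wording are assumptions; the actual template (the e2e-prompt.md format) is not shown in this PR conversation.

```python
def build_rag_prompt(question: str, sections: list[tuple[str, str]]) -> str:
    """Assemble a prompt from (section_ref, text) pairs and the question."""
    # Label each section with its reference so the answer can be traced back.
    context = "\n\n".join(f"[{ref}]\n{text}" for ref, text in sections)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```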

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, harden error handling (#383)

- Fix A: remove dead boto3 try/except at module level; rename _DEFAULT_EMBED_MODEL_ID,
  _DEFAULT_TOP_K, _QDRANT_HOST, _QDRANT_PORT → public (no leading underscore)
- Fix B: query() now accepts optional qdrant_client param (for DI / testing)
- Fix C: run_rag_qa.py imports public constants and delegates to rag_query() instead of
  duplicating the 4-step embed→filter→search→format pipeline
- Fix D: call_llm() wraps json.loads and subprocess.run in try/except; raises RuntimeError
  with clear messages on TimeoutExpired and JSONDecodeError
- Fix E: format_results() skips hits with empty page_id (logs warning to stderr)
- Fix F: test_query.py — module-level import json, full-path assertions, value check for
  filter test, simplified test_passes_text_in_texts_array; add truncation tests and
  TestQuery orchestration test and test_empty_page_id_hit_is_skipped
  (17 → 21 tests, all pass)
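
Fix D could be sketched as follows. This is an illustration under assumptions, not the PR's `call_llm`: the command is passed in as a list rather than hard-coding the `claude -p` invocation, the prompt is sent on stdin, and the non-zero exit-code check is an extra not mentioned in the commit.

```python
import json
import subprocess


def call_llm(command: list[str], prompt: str, timeout: float = 300.0) -> dict:
    """Run an LLM CLI, feed it the prompt, and parse its stdout as JSON."""
    try:
        proc = subprocess.run(
            command,
            input=prompt,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"LLM call timed out after {timeout}s") from exc

    # Assumption: treat a non-zero exit code as a failure as well.
    if proc.returncode != 0:
        raise RuntimeError(
            f"LLM command failed ({proc.returncode}): {proc.stderr.strip()}"
        )

    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"LLM output is not valid JSON: {exc}") from exc
```

Converting both failure modes to `RuntimeError` lets a caller handle them with a single `except` clause, and the original exception stays available via `__cause__`.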

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fallback (#383)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sign (#383)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#383)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kiyotis and others added 20 commits June 25, 2026 16:59
…tion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#383)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…py (#383)

- find_truncated_pages: wrap open+json.load in try/except (json.JSONDecodeError, OSError),
  print [WARN] to stderr and continue on corrupt files
- page_id_from_section_ref: warn to stderr when ref has no ':' separator, return
  path stripped of .json suffix (graceful degradation)
- main(): replace data["scenarios"] with data.get("scenarios") guarded by sys.exit(1)
  when key is missing
- _V3_MAX_CHARS: clarify comment — this is a quality threshold (truncation reduces
  retrieval accuracy), not an absence marker
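
Two of the hardening changes above can be sketched as below. The bodies are assumptions based only on the commit text. In particular, returning the part before `:` with `.json` stripped in the normal case is inferred from the fallback behaviour, and `get_scenarios` is a hypothetical helper standing in for the guard inside `main()`.

```python
import sys


def page_id_from_section_ref(section_ref: str) -> str:
    """Derive a page ID from a path.json:sN reference."""
    path, sep, _section = section_ref.partition(":")
    if not sep:
        # Graceful degradation: warn, then fall back to the whole string.
        print(
            f"[WARN] section_ref has no ':' separator: {section_ref!r}",
            file=sys.stderr,
        )
    return path.removesuffix(".json")


def get_scenarios(data: dict) -> list:
    """Return data["scenarios"], exiting with status 1 if the key is missing."""
    scenarios = data.get("scenarios")
    if scenarios is None:
        print("[ERROR] input has no 'scenarios' key", file=sys.stderr)
        sys.exit(1)
    return scenarios
```

`str.removesuffix` requires Python 3.9 or later.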

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rios.py (#383)

- Fix sys.path.insert to use repo-root approach (4x .parent hops) matching
  test_index.py, and import via tools.rag.scripts.select_scenarios
- Add TestPageIdFromSectionRef::test_ref_with_no_colon_returns_path_stripped_of_json_suffix
  to cover the warning + graceful-degradation path added to page_id_from_section_ref
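
The repo-root approach can be illustrated as follows. The directory layout in the usage note is hypothetical, and the helper names are not from the PR. The only detail taken from the commit is that the root sits four `.parent` hops above the test file.

```python
import sys
from pathlib import Path


def repo_root_for(test_file: Path, hops: int = 4) -> Path:
    """Return the directory `hops` levels above `test_file`."""
    # parents[0] is one .parent hop, so four hops is parents[3].
    return test_file.parents[hops - 1]


def add_repo_root_to_sys_path(test_file: Path) -> Path:
    """Put the repo root first on sys.path so package imports resolve."""
    root = repo_root_for(test_file)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

In a test module this would be called as `add_repo_root_to_sys_path(Path(__file__).resolve())` before `from tools.rag.scripts import select_scenarios`. For a file at the hypothetical path `/repo/tools/rag/tests/test_select_scenarios.py`, four hops resolve to `/repo`.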

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-report

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dopt/reject report)

The limited 8-scenario v3 benchmark (#5-#6) is an intermediate measurement only.
The adopt/reject decision requires the full 34-scenario v4 benchmark (#7-#8).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iting SCP unlock)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>