v0.4.0 -- per-edge LLM scoring + view_tree nested JSON by dimknaf · Pull Request #10 · dimknaf/braindb

dimknaf · 2026-06-03T01:00:50Z

v0.4.0 — per-edge LLM scoring + view_tree nested JSON

Headline: a focused pass on recall quality and the view_tree tool. The per-edge LLM judgment that was missing on create_relation is now wired through to graph scoring, and view_tree returns a nested JSON tree the agent can actually navigate (vs the depth-grouped text that silently clipped 70% of connections on popular wikis).

Changed

view_tree / GET /api/v1/memory/tree/<id> — nested JSON shape. Root keyed by entity_type, children arrays per node, multi-path first-wins by best accumulated path score, keyword + retired-wiki noise filtered by default, _truncated last-child marker when more remain. One shared builder (build_entity_tree in braindb/services/tree.py) for the HTTP endpoint and the agent tool. New optional query params: include_keywords (default false), top_k (default 40), min_path_score (default 0.0).
create_relation writes both edge scores. The importance_score column had been NULL for every agent-created row since day one; the parameter is now on the tool, the watcher's extraction prompt no longer dictates literal score values (the LLM judges per docstring), and the graph CTE multiplies relevance_score × COALESCE(importance_score, 0.5) × depth_penalty per hop. is_bidirectional is now ignored by graph traversal — every edge walks both ways.
Seed-similarity propagation in graph scoring. Recall hops carry the seed's similarity score forward through the graph; the depth multiplier is softened.
Prompts refreshed across system_prompt.md, both skill files, README.md, and BRAINDB_GUIDE.md for the new tree shape.
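As a rough illustration of the new optional query parameters on the tree endpoint, the sketch below builds a request URL with the defaults listed above. The base URL, the helper name `tree_url`, and the lowercase `true`/`false` serialization of the boolean are assumptions for illustration, not details confirmed by this PR.

```python
from urllib.parse import urlencode


def tree_url(base_url: str, entity_id: str, *, include_keywords: bool = False,
             top_k: int = 40, min_path_score: float = 0.0) -> str:
    """Build a GET URL for /api/v1/memory/tree/<id> (hypothetical helper).

    Defaults mirror the ones named in the changelog; how the server parses
    the boolean is an assumption.
    """
    query = urlencode({
        "include_keywords": str(include_keywords).lower(),
        "top_k": top_k,
        "min_path_score": min_path_score,
    })
    return f"{base_url.rstrip('/')}/api/v1/memory/tree/{entity_id}?{query}"


# Example with an assumed local host and a placeholder entity id.
print(tree_url("http://localhost:8000", "some-entity-id", top_k=10))
```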

Fixed

view_tree keyword noise through non-tagged_with edges.
view_tree duplicate retired-wiki siblings — tree CTE now skips wikis_ext.retired_at IS NOT NULL.
Test isolation in tests/test_ingest.py — content is uuid'd per run.
Missing test dep — added pytest-asyncio==0.23.7 to [dev].

Bench

Path A (Claude Code + curl skill): 5/5 PASS, view_tree usage 0 → 1-2 calls (the structured shape is now usable in practice).
Path B (Qwen 27B via /agent/query): 5/5 PASS, −25% wall-clock, −26% tool calls, zero delegate calls on the hardest question (was 2). Numbers in benchmarks/runs/round-2f_comparison.md.

Upgrading from v0.3.0

No DB migration. No env-var changes. The wiki maintainer's existing retired-wiki pipeline now also gates view_tree traversal. pyproject.toml version field was at 0.2.0 in v0.3.0's tagged release (the bump was missed); this release catches it up to 0.4.0.

Test plan

134/134 pytest pass (8 wiki_jobs_grouping host-only tests intentionally deselected when run via docker exec).
Path A bench 5/5 (pre-commit + post-commit + post-Micron-ingest).
Path B bench 5/5 (pre-commit + post-commit + post-Micron-ingest).
Live ingest of a 3100-word YouTube transcript: 29 facts extracted, 17 cross-fact relations, 1 synthesis thought, 1 wiki materialized end-to-end. Duplicate wiki created by maintainer race + auto-consolidated by the next maintainer tick — round-2f's retired-wiki filter exercised on real data.
Sensitive-content audit clean (no surname / no secrets / no internal paths / no email leaks across the 9-commit diff).

…_sql

- recall_memory docstring: prepend "Primary recall tool" framing.
- view_tree docstring: add explicit "when to use" line, named as the entity-driven default; positioned ahead of search_sql for any "what's around this entity" question.
- search_sql docstring: tightened to one assertive line: aggregates only, never for recall/discovery/understanding/neighbourhood.
- system_prompt.md TOOL PRIORITY block: promote view_tree from a buried bullet to a named slot (#2), making the entity-driven vs query-driven split explicit. SQL stays as #5 exception.
- skills/braindb-agent/SKILL.md: prose paragraph rewritten as a numbered priority list matching the other skill's shape.
- skills/braindb/SKILL.md: split tree out of the "structure lookups" bullet into its own #2 slot; SQL bullet explicitly forbids "around this entity" questions (those are tree's job).
- BRAINDB_GUIDE.md: added a top-level ⚠ TOOL PRIORITY block so the reference doc isn't the weakest spot in the guidance chain.

Net token impact on agent prompts: ~+20 tokens per call (one extra phrase in three tool docstrings). Skill markdowns and the guide are not loaded into prompts. No behaviour change in any code path; this is pure messaging.

…ultiplier

Two related fixes to the recall ranking, both confirmed against live data where every depth-1 graph result was pinning at the same literal 0.27 regardless of which seed it came from.

1. Propagate seed similarity through graph hops. The graph CTE now carries a `seed_origin_id` column from each seed down through every recursive row. In context.py, the score component for a graph-discovered (non-seed) entity is now the score of its origin seed, inherited via that column, instead of the literal `0.3` fallback that made every graph-only entity rank identically. Before this fix a perfect-match seed (sim=1.0) and a weak-match seed (sim=0.3) produced the same depth-1 neighbour rank. Worse: a weak seed (sim<0.27) was outranked by its own neighbours because the fallback floor was higher than the seed's real score. Both gone now.

2. Soften the depth multiplier. The hardcoded depth step in the recursive CTE goes from [1.0 / 0.6 / 0.3] to [1.0 / 0.8 / 0.6]. Deeper hops still decay but no longer collapse: depth-2 and depth-3 items can now reach final_rank values that exceed the min_relevance threshold and surface in results, instead of vanishing as they do today.

Net effect (for a seed with similarity 1.0):

  depth 0: 1.00 -> 1.00 (unchanged)
  depth 1: 0.27 -> 0.80
  depth 2: 0.12 -> 0.51
  depth 3: 0.05 -> 0.31

For a seed with similarity 0.5:

  depth 0: 0.50 -> 0.50 (unchanged)
  depth 1: 0.27 -> 0.40 (now correctly lower than seed)
  depth 2: 0.12 -> 0.26
  depth 3: 0.05 -> 0.15

No new tables, no migration, no config flags, no module reorganisation. Just two surgical edits: one extra column in the CTE, one Python lookup swap, two constants nudged up.
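The lookup swap in fix 1 can be pictured with a minimal sketch. This is not the code in context.py: the function name, the `seed_scores` dict, and the `inherit_seed` toggle are hypothetical, and only the `seed_origin_id` idea and the old `0.3` fallback come from the commit message.

```python
OLD_FALLBACK = 0.3  # literal floor previously given to every graph-only entity


def score_component(entity_id, seed_origin_id, seed_scores, inherit_seed=True):
    """Return the similarity component used when ranking an entity.

    seed_scores maps seed entity ids to their similarity scores
    (hypothetical structure). A graph-discovered entity inherits the score
    of the seed it was reached from; with inherit_seed=False the old
    behaviour (a flat 0.3) is reproduced for comparison.
    """
    if entity_id in seed_scores:
        return seed_scores[entity_id]       # seeds keep their own score
    if inherit_seed:
        return seed_scores[seed_origin_id]  # new: inherited via seed_origin_id
    return OLD_FALLBACK                     # old: identical for every neighbour


seeds = {"strong": 1.0, "weak": 0.2}
# Old behaviour: the weak seed's neighbour (0.3) outranks the weak seed itself (0.2).
print(score_component("n2", "weak", seeds, inherit_seed=False))
# New behaviour: the neighbour can no longer exceed its origin seed.
print(score_component("n2", "weak", seeds))
```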

Round-1 elevated view_tree under a category label ("default for entity-driven neighbourhood exploration"). Benchmarking showed both Claude (via the curl skill) and the in-house Qwen agent picked tree zero times across 5 questions x 2 paths = 10 runs. The category framing was too abstract.

This round describes WHAT tree does (capability) with a SUGGESTIVE "when" hint, not a rigid trigger. The agent keeps full judgment about whether to use it; we just make the value clearer:

  reveals an entity's connections in one call: relations + 1-N hop neighbours + edge scores. Especially useful when you have an entity ID (from a previous result) and want its graph context.

No "INSTEAD OF" commands. No decision-rule blocks. No examples. The shape mirrors what already worked for the search_sql demotion (capability + bounded use, agent decides).

Net token impact on the agent's system prompt: ~-20 to -50 tokens (this is a shrink, not a bloat).

Same edit applied across:

- braindb/agent/tools.py (view_tree docstring, in agent prompt)
- braindb/agent/prompts/system_prompt.md (TOOL PRIORITY block entry #2)
- skills/braindb/SKILL.md (user-facing skill)
- skills/braindb-agent/SKILL.md (user-facing skill)
- BRAINDB_GUIDE.md (reference guide top block)

No code changes. No behavioural change in any code path. Pure messaging. Verified separately on the same benchmark question set next.

…abels

Three minimal fixes to the agent's view_tree tool. Round-2a benchmarks showed Claude started using tree (0/5 -> 3/5) but the Qwen agent didn't, and even on Path A one of three tree calls (q4) didn't pay off. Looking at the actual implementation revealed why: tree was advertised with a max_depth argument but ignored it (single-hop SQL), so an agent asking view_tree(id, max_depth=2) only got depth-1 connections.

Fixes:

1. max_depth respected. Single-hop SQL replaced with a recursive CTE that walks bidirectionally (as the single-hop already did via the OR clause) and stops at the requested depth. A cycle-visited array prevents loops.
2. Depth grouping in output. "DEPTH N (count):" headers between sections. Within a depth, rows sorted by edge_score desc. Same line shape as before; only headers are new.
3. Wiki labels use canonical_name. The wiki:meta comment header was being truncated as if it were content body. Extract canonical_name via a small regex; everything else keeps the existing 80-char content truncation.

No graph.py change. No system_prompt change (already committed in round-2a). No schema change. ~35 lines net in tools.py. Verified separately on the same 5 benchmark questions next.
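The depth-bounded, bidirectional walk described in fixes 1 and 2 can be sketched in plain Python over an in-memory edge list. This is only an analogue of the recursive CTE, not the SQL in tools.py; the function name, the tuple layout, and the "out"/"in" labels are assumptions for illustration.

```python
from collections import defaultdict


def walk_tree(edges, root, max_depth):
    """Walk edges in both directions from root, stopping at max_depth.

    edges: iterable of (source, target, edge_score) tuples (hypothetical layout).
    Returns {depth: [(entity, edge_score, direction), ...]} with each depth's
    rows sorted by edge_score descending. A visited set prevents loops.
    """
    neighbours = defaultdict(list)
    for source, target, score in edges:
        neighbours[source].append((target, score, "out"))  # follow the edge forwards
        neighbours[target].append((source, score, "in"))   # and backwards
    visited = {root}
    frontier = [root]
    by_depth = {}
    for depth in range(1, max_depth + 1):
        rows = []
        for node in frontier:
            for other, score, direction in neighbours[node]:
                if other in visited:
                    continue  # already reached at this or a shallower depth
                visited.add(other)
                rows.append((other, score, direction))
        if not rows:
            break
        rows.sort(key=lambda row: row[1], reverse=True)
        by_depth[depth] = rows
        frontier = [row[0] for row in rows]
    return by_depth


# A small cyclic graph: a -> b -> d -> c -> a
edges = [("a", "b", 0.9), ("c", "a", 0.5), ("b", "d", 0.7), ("d", "c", 0.4)]
for depth, rows in walk_tree(edges, "a", max_depth=2).items():
    print(f"DEPTH {depth} ({len(rows)}):", rows)
```

With `max_depth=1` the same call returns only the first group, which is the behaviour the single-hop SQL gave regardless of the argument.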

… tool Round-2a benchmarks showed the Qwen agent (Path B) still picked tree 0/5 times despite the new wording. Hypothesis: "graph context" reads as a niche specialist feature to a smaller model, so it falls back to recall. The reframing makes tree sound like the GENERAL tool for the thing the agent actually wants to do once it has an entity in hand: explore around it. "Explore around this entity" is the everyday framing; "graph context" is the jargon framing. Same line count, same shape, no bloat. Just a verb change in the system_prompt.md TOOL PRIORITY block entry #2. Verified separately in the next benchmark.

…our drift)

Round-2c benchmarks revealed that "tree" was served by two divergent implementations: the HTTP endpoint (routers/memory.py::entity_tree) did a single-hop SQL that silently ignored max_depth, while the agent's view_tree tool ran a proper recursive CTE. Same name, same input, different behaviour. Path A (HTTP) and Path B (agent tool) were not looking at the same data.

This commit extracts one source of truth:

- braindb/services/tree.py NEW: build_entity_tree(conn, entity_id, max_depth) recursive CTE walks bidirectionally and respects max_depth. Returns {"root": {...}, "connections": [...]} with the same shape the HTTP endpoint always advertised, so the frontend Graph tab keeps reading it unchanged.
- routers/memory.py::entity_tree shrinks from ~60 lines to 8: just calls build_entity_tree.
- agent/tools.py::view_tree shrinks: drops its own recursive CTE (added in bccf2b4), calls build_entity_tree, keeps only the text rendering (depth headers, [out]/[in] arrows, _tree_label for wikis).

Behavioural effects:

- HTTP /memory/tree/<id>?max_depth=N now actually walks N hops. Quick spot check: tree on the "value-investing" keyword used to return ~20 depth-1 connections; now returns 156 connections (20 d1 + 136 d2).
- Frontend Graph tab: same field names, same direction values ("outgoing"/"incoming"), more nodes visible at depth 2. No JS change needed.
- Agent view_tree tool: returns the same text shape we shipped in bccf2b4; underlying data now comes from the shared service.

Tests: tests/test_search.py, all 6 tests pass (shape-agnostic check on /memory/tree was already there; refactor preserves the shape).

Net diff: +60 / -90 across 3 files. Code SHRINK.

…es both scores

* tools.py: create_relation gains importance_score parameter; INSERT writes both relevance_score and importance_score (column was NULL for all agent-created rows since day one).
* ingest_watcher.py: stripped dictated certainty/importance/relevance_score literals from chunk-extraction and central_review prompts -- LLM judges per the tool docstring. AGENT_TIMEOUT now env-overridable, default 1200.
* services/graph.py: per-hop multiplier now r.relevance_score * COALESCE(r.importance_score, 0.5) * depth_penalty; is_bidirectional dropped from the JOIN (always walks both directions, matches tree.py).
* system_prompt.md: importance_score added to create_relation param list.
* 4 new tests lock the behaviour: persistence, watcher-no-dictation, importance_score moves rank, unidirectional edges walk backwards.

All 142 tests pass. Path A bench 5/5 PASS in 14s (zero view_tree). Path B bench 5/5 PASS in 1090s at 1200s timeout. Variance verified on live ingest of the AI Dark Output article.
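The per-hop multiplier above translates directly into a few lines of Python. The sketch below mirrors the SQL expression, with `None` standing in for a NULL importance_score; the function name is hypothetical and the real computation lives inside the graph CTE.

```python
def hop_multiplier(relevance_score, importance_score, depth_penalty):
    """relevance_score * COALESCE(importance_score, 0.5) * depth_penalty."""
    importance = 0.5 if importance_score is None else importance_score
    return relevance_score * importance * depth_penalty


# Two edges with the same relevance: the one the LLM judged more important
# now contributes more, and a legacy row with NULL importance lands in between.
print(hop_multiplier(0.8, 0.9, 1.0))   # high importance
print(hop_multiplier(0.8, None, 1.0))  # legacy NULL row, treated as 0.5
print(hop_multiplier(0.8, 0.2, 1.0))   # low importance
```

The 0.5 default means rows created before this change are neither boosted nor buried relative to newly scored edges.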

* pyproject.toml: add pytest-asyncio==0.23.7 to [dev]. Existing tests use @pytest.mark.asyncio decorators (test_handoff_hooks, test_runhooks_countdown, test_final_answer_rename) but the plugin was not listed in deps, so `pip install -e ".[dev]"` left them skipped silently on a clean install.
* tests/test_ingest.py: the three datasource-ingest tests used fixed content strings, so a previous run's row in the DB caused dedup-by-hash to fire and the 201 assertion to fail on subsequent runs. Prepend a per-run uuid to the content so each invocation is genuinely fresh.

No production-code change. 134/134 pass (8 wiki_jobs_grouping deselected; those use the host port mapping and run from the host per tests/README.md, not from inside the api container).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
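The test-isolation problem can be reproduced with a toy stand-in for dedup-by-hash. In the sketch below, `DedupStore` and `fresh_content` are hypothetical names, and SHA-256 is an assumed hash; the real tests go through the ingest endpoint and the database rather than an in-memory set.

```python
import hashlib
import uuid


class DedupStore:
    """Toy stand-in for a store that rejects content it has already seen."""

    def __init__(self):
        self._hashes = set()

    def ingest(self, content: str) -> bool:
        """Return True if the content was stored, False if deduplicated."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._hashes:
            return False
        self._hashes.add(digest)
        return True


def fresh_content(base: str) -> str:
    """Prepend a per-run uuid so every invocation hashes differently."""
    return f"{uuid.uuid4()} {base}"


store = DedupStore()
print(store.ingest("fixed test content"))                 # first run: stored
print(store.ingest("fixed test content"))                 # second run: deduplicated
print(store.ingest(fresh_content("fixed test content")))  # uuid'd: stored again
```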

…ired wikis

* services/tree.py: one build_entity_tree function. Recursive CTE carries parent_id + accumulated_score (relevance × COALESCE(importance_score, 0.5) × depth_penalty, same formula as graph.py). DISTINCT ON (id) ORDER BY id, accumulated_score DESC -- multi-path first-wins by best score. Skip tagged_with edges + target.entity_type='keyword' by default; skip wikis_ext.retired_at IS NOT NULL. New shape: root keyed by entity_type, children arrays per node, _truncated last-child marker.
* routers/memory.py: /memory/tree/<id> returns the nested shape; new query params include_keywords, top_k (default 40), min_path_score.
* agent/tools.py::view_tree: returns json.dumps(tree) directly; _tree_label helper removed.
* system_prompt.md: view_tree blurb updated to describe nested JSON.

Path A 5/5 PASS, view_tree 0/5 -> 1/5 (the agent reaches for tree now that the shape is structured). Path B 5/5 PASS, 1090s -> 773s (-29%), 54 tool calls -> 40 (-26%), zero delegate calls on q4 (was 2).

Two latent bugs caught by the new shape and fixed in this commit: keyword children leaking through non-tagged_with edges; duplicate retired-wiki siblings.

Frontend Graph tab will be broken until graph.js consumes the new shape -- follow-up commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
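To make the "multi-path first-wins by best score" and `_truncated` ideas concrete, here is a small in-memory sketch of nesting flat rows into a tree. It is not build_entity_tree: the row layout, the per-node application of top_k, the `id`/`score` field names, and the marker's payload (a count of omitted children) are all assumptions; only the root keyed by entity_type, the `children` arrays, and the `_truncated` last-child marker come from the commit message.

```python
def nest_rows(root_id, root_type, rows, top_k=40):
    """Nest flat (id, parent_id, accumulated_score) rows under a root.

    rows may contain the same id several times (one row per path); the row
    with the best accumulated_score wins, like DISTINCT ON (id) ordered by
    score descending.
    """
    best = {}
    for row in rows:
        current = best.get(row["id"])
        if current is None or row["accumulated_score"] > current["accumulated_score"]:
            best[row["id"]] = row

    children_of = {}
    for row in best.values():
        children_of.setdefault(row["parent_id"], []).append(row)

    def build(node_id):
        kids = sorted(children_of.get(node_id, []),
                      key=lambda r: r["accumulated_score"], reverse=True)
        out = [{"id": k["id"], "score": k["accumulated_score"],
                "children": build(k["id"])} for k in kids[:top_k]]
        if len(kids) > top_k:
            out.append({"_truncated": len(kids) - top_k})  # assumed payload
        return out

    return {root_type: {"id": root_id, "children": build(root_id)}}


rows = [
    {"id": "b", "parent_id": "a", "accumulated_score": 0.8},
    {"id": "c", "parent_id": "a", "accumulated_score": 0.4},   # weaker path to c
    {"id": "c", "parent_id": "b", "accumulated_score": 0.64},  # best path to c
    {"id": "d", "parent_id": "a", "accumulated_score": 0.3},
]
print(nest_rows("a", "fact", rows, top_k=1))
```

In this example `c` appears once, under `b`, because that path carries the higher accumulated score, and the root's second child is replaced by the truncation marker.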

Bumps pyproject.toml 0.2.0 -> 0.4.0 (catches up from the v0.3.0 release which shipped without a pyproject bump). Adds the CHANGELOG entry. Refreshes user-facing docs (README, BRAINDB_GUIDE, both skills) for the new nested-JSON tree shape. No DB migration. No env-var changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

dimknaf and others added 10 commits May 31, 2026 16:11

dimknaf merged commit b08e9e9 into main Jun 3, 2026
1 check passed

dimknaf merged 10 commits into main from feat/edge-scoring-and-tree-shape
