
v0.4.0 -- per-edge LLM scoring + view_tree nested JSON #10

Merged
dimknaf merged 10 commits into main from feat/edge-scoring-and-tree-shape
Jun 3, 2026


@dimknaf dimknaf commented Jun 3, 2026

v0.4.0 — per-edge LLM scoring + view_tree nested JSON

Headline: a focused pass on recall quality and the view_tree tool. create_relation now records the per-edge LLM judgment it was missing, and graph scoring uses it. view_tree returns a nested JSON tree the agent can actually navigate, replacing the depth-grouped text that silently clipped 70% of connections on popular wikis.

Changed

  • view_tree / GET /api/v1/memory/tree/<id>: nested JSON shape. The root is keyed by entity_type and every node carries a children array. When a node is reachable by several paths, it is kept once, on the path with the best accumulated score. Keyword and retired-wiki noise is filtered by default, and a _truncated marker is appended as the last child when more children remain. One shared builder (build_entity_tree in braindb/services/tree.py) serves both the HTTP endpoint and the agent tool. New optional query params: include_keywords (default false), top_k (default 40), min_path_score (default 0.0).
  • create_relation writes both edge scores. The importance_score column had been NULL for every agent-created row since day one; the parameter is now on the tool, the watcher's extraction prompt no longer dictates literal score values (the LLM judges per docstring), and the graph CTE multiplies relevance_score × COALESCE(importance_score, 0.5) × depth_penalty per hop. is_bidirectional is now ignored by graph traversal — every edge walks both ways.
  • Seed-similarity propagation in graph scoring. Recall hops carry the seed's similarity score forward through the graph; the depth multiplier is softened.
  • Prompts refreshed across system_prompt.md, both skill files, README.md, and BRAINDB_GUIDE.md for the new tree shape.
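
As a rough illustration of the per-hop multiplier described in the list above, the sketch below applies relevance_score × COALESCE(importance_score, 0.5) × depth_penalty to each edge and multiplies the results along a path. The function names are hypothetical, the depth penalties are passed in rather than taken from the project, and treating the accumulated score as a plain product of hops is an assumption; the real computation lives in the SQL CTE.

```python
from typing import Optional, Sequence, Tuple

# Fallback used when importance_score is NULL, mirroring COALESCE(importance_score, 0.5).
DEFAULT_IMPORTANCE = 0.5


def hop_multiplier(relevance: float, importance: Optional[float], depth_penalty: float) -> float:
    """Score contribution of a single edge (hypothetical helper, not the project's API)."""
    effective_importance = DEFAULT_IMPORTANCE if importance is None else importance
    return relevance * effective_importance * depth_penalty


def path_score(hops: Sequence[Tuple[float, Optional[float], float]]) -> float:
    """Multiply the per-hop contributions along one path (assumed accumulation rule)."""
    score = 1.0
    for relevance, importance, depth_penalty in hops:
        score *= hop_multiplier(relevance, importance, depth_penalty)
    return score


# An edge with a judged importance of 0.9 outranks an otherwise identical
# edge whose importance_score is NULL (treated as 0.5).
judged = hop_multiplier(0.8, 0.9, 1.0)
unjudged = hop_multiplier(0.8, None, 1.0)
print(judged > unjudged)
```

The point of the fallback is that rows written before this release, where importance_score is NULL, still take part in ranking instead of zeroing out the path.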

Fixed

  • view_tree keyword noise through non-tagged_with edges.
  • view_tree duplicate retired-wiki siblings — tree CTE now skips wikis_ext.retired_at IS NOT NULL.
  • Test isolation in tests/test_ingest.py — content is uuid'd per run.
  • Missing test dep — added pytest-asyncio==0.23.7 to [dev].

Bench

  • Path A (Claude Code + curl skill): 5/5 PASS, view_tree usage 0 → 1-2 calls (the structured shape is now usable in practice).
  • Path B (Qwen 27B via /agent/query): 5/5 PASS, −25% wall-clock, −26% tool calls, zero delegate calls on the hardest question (was 2). Numbers in benchmarks/runs/round-2f_comparison.md.

Upgrading from v0.3.0

No DB migration. No env-var changes. The wiki maintainer's existing retired-wiki pipeline now also gates view_tree traversal. pyproject.toml version field was at 0.2.0 in v0.3.0's tagged release (the bump was missed); this release catches it up to 0.4.0.

Test plan

  • 134/134 pytest pass (8 wiki_jobs_grouping host-only tests intentionally deselected when run via docker exec).
  • Path A bench 5/5 (pre-commit + post-commit + post-Micron-ingest).
  • Path B bench 5/5 (pre-commit + post-commit + post-Micron-ingest).
  • Live ingest of a 3100-word YouTube transcript: 29 facts extracted, 17 cross-fact relations, 1 synthesis thought, 1 wiki materialized end-to-end. A maintainer race created a duplicate wiki, which the next maintainer tick auto-consolidated, so round-2f's retired-wiki filter was exercised on real data.
  • Sensitive-content audit clean (no surname / no secrets / no internal paths / no email leaks across the 9-commit diff).

dimknaf and others added 10 commits May 31, 2026 16:11
…_sql

- recall_memory docstring: prepend "Primary recall tool" framing.
- view_tree docstring: add explicit "when to use" line, named as the
  entity-driven default; positioned ahead of search_sql for any
  "what's around this entity" question.
- search_sql docstring: tightened to one assertive line — aggregates
  only, never for recall/discovery/understanding/neighbourhood.
- system_prompt.md TOOL PRIORITY block: promote view_tree from a
  buried bullet to a named slot (#2), making the entity-driven vs
  query-driven split explicit. SQL stays as #5 exception.
- skills/braindb-agent/SKILL.md: prose paragraph rewritten as a
  numbered priority list matching the other skill's shape.
- skills/braindb/SKILL.md: split tree out of the "structure lookups"
  bullet into its own #2 slot; SQL bullet explicitly forbids "around
  this entity" questions (those are tree's job).
- BRAINDB_GUIDE.md: added a top-level ⚠ TOOL PRIORITY block so the
  reference doc isn't the weakest spot in the guidance chain.

Net token impact on agent prompts: ~+20 tokens per call (one extra
phrase in three tool docstrings). Skill markdowns and the guide are
not loaded into prompts.

No behaviour change in any code path; this is pure messaging.
…ultiplier

Two related fixes to the recall ranking, both confirmed against live data
where every depth-1 graph result was pinning at the same literal 0.27
regardless of which seed it came from.

1. Propagate seed similarity through graph hops.

The graph CTE now carries a `seed_origin_id` column from each seed down
through every recursive row. In context.py, the score component for a
graph-discovered (non-seed) entity is now the score of its origin seed,
inherited via that column, instead of the literal `0.3` fallback that
made every graph-only entity rank identically.

Before this fix a perfect-match seed (sim=1.0) and a weak-match seed
(sim=0.3) produced the same depth-1 neighbour rank. Worse: a weak seed
(sim<0.27) was outranked by its own neighbours because the fallback
floor was higher than the seed's real score. Both gone now.

2. Soften the depth multiplier.

The hardcoded depth step in the recursive CTE goes from
[1.0 / 0.6 / 0.3] to [1.0 / 0.8 / 0.6]. Deeper hops still decay but
no longer collapse — depth-2 and depth-3 items can now reach
final_rank values that exceed the min_relevance threshold and surface
in results, instead of vanishing as they do today.

Net effect (for a seed with similarity 1.0):
  depth 0:  1.00  ->  1.00 (unchanged)
  depth 1:  0.27  ->  0.80
  depth 2:  0.12  ->  0.51
  depth 3:  0.05  ->  0.31

For a seed with similarity 0.5:
  depth 0:  0.50  ->  0.50 (unchanged)
  depth 1:  0.27  ->  0.40 (now correctly lower than seed)
  depth 2:  0.12  ->  0.26
  depth 3:  0.05  ->  0.15

No new tables, no migration, no config flags, no module reorganisation.
Just two surgical edits: one extra column in the CTE, one Python lookup
swap, two constants nudged up.
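
The lookup swap in fix 1 can be sketched roughly as follows. This is a minimal Python model, not the code in context.py: the function name and the two dictionaries are hypothetical stand-ins for the seed scores and the seed_origin_id column.

```python
from typing import Dict, Optional


def similarity_component(
    entity_id: str,
    seed_scores: Dict[str, float],
    seed_origin: Dict[str, str],
) -> Optional[float]:
    """Return the similarity component for an entity.

    Seeds use their own score. Graph-discovered entities inherit the score
    of the seed they were reached from (the role of seed_origin_id), rather
    than the literal 0.3 fallback used before this fix.
    """
    if entity_id in seed_scores:
        return seed_scores[entity_id]
    origin = seed_origin.get(entity_id)
    if origin is None:
        return None  # not reachable from any seed in this sketch
    return seed_scores[origin]


seed_scores = {"seed_strong": 1.0, "seed_weak": 0.2}
seed_origin = {"n1": "seed_strong", "n2": "seed_weak"}

# Neighbours of a strong seed and of a weak seed no longer share one value.
print(similarity_component("n1", seed_scores, seed_origin))
print(similarity_component("n2", seed_scores, seed_origin))
```

Because a neighbour starts from its seed's own score and every hop multiplier is at most 1, a neighbour can no longer outrank the seed it came from.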
Round-1 elevated view_tree under a category label ("default for entity-
driven neighbourhood exploration"). Benchmarking showed both Claude (via
the curl skill) and the in-house Qwen agent picked tree zero times across
5 questions x 2 paths = 10 runs. The category framing was too abstract.

This round describes WHAT tree does (capability) with a SUGGESTIVE "when"
hint, not a rigid trigger. The agent keeps full judgment about whether to
use it — we just make the value clearer:

  reveals an entity's connections in one call: relations + 1-N hop
  neighbours + edge scores. Especially useful when you have an entity ID
  (from a previous result) and want its graph context.

No "INSTEAD OF" commands. No decision-rule blocks. No examples. The
shape mirrors what already worked for the search_sql demotion (capability
+ bounded use, agent decides).

Net token impact on the agent's system prompt: ~-20 to -50 tokens (this
is a shrink, not a bloat). Same edit applied across:

- braindb/agent/tools.py            (view_tree docstring — in agent prompt)
- braindb/agent/prompts/system_prompt.md  (TOOL PRIORITY block entry #2)
- skills/braindb/SKILL.md           (user-facing skill)
- skills/braindb-agent/SKILL.md     (user-facing skill)
- BRAINDB_GUIDE.md                  (reference guide top block)

No code changes. No behavioural change in any code path. Pure messaging.

To be verified separately in the next run of the same benchmark question set.
…abels

Three minimal fixes to the agent's view_tree tool. Round-2a benchmarks
showed Claude started using tree (0/5 -> 3/5) but the Qwen agent didn't,
and even on Path A one of three tree calls (q4) didn't pay off. Looking
at the actual implementation revealed why: tree was advertised with a
max_depth argument but ignored it (single-hop SQL), so an agent asking
view_tree(id, max_depth=2) only got depth-1 connections.

Fixes:

1. max_depth respected. Single-hop SQL replaced with a recursive CTE
   that walks bidirectionally (as the single-hop already did via the
   OR clause) and stops at the requested depth. A cycle-visited array
   prevents loops.

2. Depth grouping in output. "DEPTH N (count):" headers between
   sections. Within a depth, rows sorted by edge_score desc. Same
   line shape as before; only headers are new.

3. Wiki labels use canonical_name. The wiki:meta comment header was
   being truncated as if it were content body. Extract canonical_name
   via a small regex; everything else keeps the existing 80-char
   content truncation.
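
The traversal in fix 1 can be pictured with a small Python analogue. This is only a sketch of the idea (bounded depth, edges walked in both directions, no revisits); the actual fix is a recursive SQL CTE, and the global visited set used here is a simplification of the CTE's cycle-visited array. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source_id, target_id), a directed relation


def walk(edges: List[Edge], root: str, max_depth: int) -> Dict[str, int]:
    """Return {entity_id: depth} for everything within max_depth hops of root."""
    # Build an undirected adjacency map so each edge can be walked both ways.
    neighbours: Dict[str, List[str]] = {}
    for source, target in edges:
        neighbours.setdefault(source, []).append(target)
        neighbours.setdefault(target, []).append(source)

    depths: Dict[str, int] = {}
    visited = {root}  # prevents loops, like the cycle-visited array
    frontier = [root]
    for depth in range(1, max_depth + 1):
        next_frontier: List[str] = []
        for node in frontier:
            for other in neighbours.get(node, []):
                if other not in visited:
                    visited.add(other)
                    depths[other] = depth
                    next_frontier.append(other)
        frontier = next_frontier
    return depths


edges = [("a", "b"), ("c", "b"), ("c", "d"), ("d", "a")]
print(walk(edges, "a", max_depth=1))  # {'b': 1, 'd': 1}
print(walk(edges, "a", max_depth=2))  # {'b': 1, 'd': 1, 'c': 2}
```

With max_depth=1 the walk stops at direct neighbours, which is what the old single-hop SQL returned regardless of the argument.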

No graph.py change. No system_prompt change (already committed in
round-2a). No schema change. ~35 lines net in tools.py.

To be verified separately in the next run of the same 5 benchmark questions.
… tool

Round-2a benchmarks showed the Qwen agent (Path B) still picked tree 0/5
times despite the new wording. Hypothesis: "graph context" reads as a
niche specialist feature to a smaller model, so it falls back to recall.

The reframing makes tree sound like the GENERAL tool for the thing the
agent actually wants to do once it has an entity in hand: explore
around it. "Explore around this entity" is the everyday framing; "graph
context" is the jargon framing.

Same line count, same shape, no bloat. Just a verb change in the
system_prompt.md TOOL PRIORITY block entry #2.

To be verified separately in the next benchmark.
…our drift)

Round-2c benchmarks revealed that "tree" was served by two divergent
implementations: the HTTP endpoint (routers/memory.py::entity_tree) did
a single-hop SQL that silently ignored max_depth, while the agent's
view_tree tool ran a proper recursive CTE. Same name, same input,
different behaviour. Path A (HTTP) and Path B (agent tool) were not
looking at the same data.

This commit extracts one source of truth:

- braindb/services/tree.py NEW: build_entity_tree(conn, entity_id,
  max_depth) recursive CTE walks bidirectionally and respects
  max_depth. Returns {"root": {...}, "connections": [...]} with the
  same shape the HTTP endpoint always advertised — the frontend Graph
  tab keeps reading it unchanged.

- routers/memory.py::entity_tree shrinks from ~60 lines to 8: just
  calls build_entity_tree.

- agent/tools.py::view_tree shrinks: drops its own recursive CTE
  (added in bccf2b4), calls build_entity_tree, keeps only the text
  rendering (depth headers, [out]/[in] arrows, _tree_label for wikis).

Behavioural effects:

- HTTP /memory/tree/<id>?max_depth=N now actually walks N hops. Quick
  spot check: tree on the "value-investing" keyword used to return ~20
  depth-1 connections; now returns 156 connections (20 d1 + 136 d2).
- Frontend Graph tab: same field names, same direction values
  ("outgoing"/"incoming"), more nodes visible at depth 2. No JS change
  needed.
- Agent view_tree tool: returns the same text shape we shipped in
  bccf2b4; underlying data now comes from the shared service.

Tests: tests/test_search.py — all 6 tests pass (shape-agnostic check
on /memory/tree was already there; refactor preserves the shape).

Net diff: +60 / -90 across 3 files. Code SHRINK.
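
To make the split concrete, here is a hedged sketch of what the agent-side text rendering could look like once the data comes from the shared service. The connection fields depth, edge_score and label are assumptions made for illustration; only the direction values ("outgoing"/"incoming"), the "DEPTH N (count):" headers and the [out]/[in] arrows come from the commit messages.

```python
from collections import defaultdict
from typing import Dict, List


def render_by_depth(connections: List[Dict]) -> str:
    """Group connections under "DEPTH N (count):" headers, best edge_score first."""
    by_depth: Dict[int, List[Dict]] = defaultdict(list)
    for conn in connections:
        by_depth[conn["depth"]].append(conn)

    lines: List[str] = []
    for depth in sorted(by_depth):
        rows = sorted(by_depth[depth], key=lambda c: c["edge_score"], reverse=True)
        lines.append(f"DEPTH {depth} ({len(rows)}):")
        for row in rows:
            arrow = "[out]" if row["direction"] == "outgoing" else "[in]"
            lines.append(f"  {arrow} {row['label']} ({row['edge_score']:.2f})")
    return "\n".join(lines)


# Hypothetical connections, shaped like the assumed output of the shared builder.
connections = [
    {"depth": 1, "edge_score": 0.4, "direction": "incoming", "label": "B"},
    {"depth": 1, "edge_score": 0.9, "direction": "outgoing", "label": "A"},
    {"depth": 2, "edge_score": 0.5, "direction": "outgoing", "label": "C"},
]
print(render_by_depth(connections))
```

The HTTP endpoint would return the same connections as JSON, so both paths look at identical data and differ only in presentation.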
…es both scores

* tools.py: create_relation gains importance_score parameter; INSERT writes
  both relevance_score and importance_score (column was NULL for all
  agent-created rows since day one).
* ingest_watcher.py: stripped dictated certainty/importance/relevance_score
  literals from chunk-extraction and central_review prompts -- LLM judges
  per the tool docstring. AGENT_TIMEOUT now env-overridable, default 1200.
* services/graph.py: per-hop multiplier now r.relevance_score *
  COALESCE(r.importance_score, 0.5) * depth_penalty; is_bidirectional
  dropped from the JOIN (always walks both directions, matches tree.py).
* system_prompt.md: importance_score added to create_relation param list.
* 4 new tests lock the behaviour: persistence, watcher-no-dictation,
  importance_score moves rank, unidirectional edges walk backwards.

All 142 tests pass. Path A bench 5/5 PASS in 14s (zero view_tree).
Path B bench 5/5 PASS in 1090s at 1200s timeout. Variance verified
on live ingest of the AI Dark Output article.
* pyproject.toml: add pytest-asyncio==0.23.7 to [dev]. Existing tests
  use @pytest.mark.asyncio decorators (test_handoff_hooks,
  test_runhooks_countdown, test_final_answer_rename) but the plugin
  was not listed in deps, so `pip install -e ".[dev]"` left them
  skipped silently on a clean install.
* tests/test_ingest.py: the three datasource-ingest tests used
  fixed content strings, so a previous run's row in the DB caused
  dedup-by-hash to fire and the 201 assertion to fail on subsequent
  runs. Prepend a per-run uuid to the content so each invocation is
  genuinely fresh.

No production-code change. 134/134 pass (8 wiki_jobs_grouping
deselected — those use the host port mapping; they run from the
host per tests/README.md, not from inside the api container).
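
The test-isolation fix can be sketched like this. The helper names are hypothetical, and SHA-256 is an assumption; the commit only says that dedup is by hash.

```python
import hashlib
import uuid


def unique_content(base: str) -> str:
    """Prefix the fixture text with a fresh uuid so its hash differs on every run."""
    return f"{uuid.uuid4()} {base}"


def content_hash(text: str) -> str:
    # SHA-256 is assumed here purely for illustration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = unique_content("fixed test document")
second = unique_content("fixed test document")

# Two runs over the same fixture text now produce different hashes,
# so a row left behind by an earlier run cannot trigger dedup.
print(content_hash(first) != content_hash(second))
```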

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ired wikis

* services/tree.py: one build_entity_tree function. Recursive CTE
  carries parent_id + accumulated_score (relevance × COALESCE(
  importance_score, 0.5) × depth_penalty, same formula as graph.py).
  DISTINCT ON (id) ORDER BY id, accumulated_score DESC --
  multi-path first-wins by best score. Skip tagged_with edges +
  target.entity_type='keyword' by default; skip wikis_ext.retired_at
  IS NOT NULL. New shape: root keyed by entity_type, children arrays
  per node, _truncated last-child marker.
* routers/memory.py: /memory/tree/<id> returns the nested shape;
  new query params include_keywords, top_k (default 40),
  min_path_score.
* agent/tools.py::view_tree: returns json.dumps(tree) directly;
  _tree_label helper removed.
* system_prompt.md: view_tree blurb updated to describe nested JSON.

Path A 5/5 PASS, view_tree 0/5 -> 1/5 (the agent reaches for tree
now that the shape is structured). Path B 5/5 PASS, 1090s -> 773s
(-29%), 54 tool calls -> 40 (-26%), zero delegate calls on q4
(was 2). Two latent bugs caught by the new shape and fixed in this
commit: keyword children leaking through non-tagged_with edges;
duplicate retired-wiki siblings.

Frontend Graph tab will be broken until graph.js consumes the new
shape -- follow-up commit.
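
A rough Python analogue of the nested shape follows, built from flat traversal rows. It is a sketch under stated assumptions, not build_entity_tree itself: the row fields, the per-node reading of top_k, and the exact form of the _truncated marker are all hypothetical.

```python
import json
from typing import Dict, List


def build_tree(root: Dict, rows: List[Dict], top_k: int = 40) -> Dict:
    """Nest flat rows of {"id", "parent_id", "entity_type", "score"} under the root."""
    # Multi-path resolution: keep only the best-scoring row per entity id,
    # the analogue of DISTINCT ON (id) ordered by accumulated score.
    best: Dict[str, Dict] = {}
    for row in rows:
        if row["id"] == root["id"]:
            continue  # never re-attach the root as its own descendant
        current = best.get(row["id"])
        if current is None or row["score"] > current["score"]:
            best[row["id"]] = row

    # Group the surviving rows by the parent they were reached from.
    by_parent: Dict[str, List[Dict]] = {}
    for row in best.values():
        by_parent.setdefault(row["parent_id"], []).append(row)

    def node(entity: Dict) -> Dict:
        kids = sorted(by_parent.get(entity["id"], []), key=lambda r: r["score"], reverse=True)
        children = [node(kid) for kid in kids[:top_k]]
        if len(kids) > top_k:
            # Assumed marker shape: a last child that counts what was cut.
            children.append({"_truncated": len(kids) - top_k})
        return {
            "id": entity["id"],
            "entity_type": entity["entity_type"],
            "score": entity.get("score"),
            "children": children,
        }

    # The root is keyed by its entity_type.
    return {root["entity_type"]: node(root)}


root = {"id": "w1", "entity_type": "wiki"}
rows = [
    {"id": "f1", "parent_id": "w1", "entity_type": "fact", "score": 0.8},
    {"id": "f2", "parent_id": "w1", "entity_type": "fact", "score": 0.6},
    {"id": "f2", "parent_id": "f1", "entity_type": "fact", "score": 0.7},
    {"id": "f3", "parent_id": "w1", "entity_type": "fact", "score": 0.3},
]
tree = build_tree(root, rows, top_k=1)
print(json.dumps(tree, indent=2))
```

In this example f2 is reachable both directly from the root (0.6) and through f1 (0.7), so it appears once, under f1. With top_k=1 the root keeps f1 and gains a trailing marker for the one child that was dropped.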

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps pyproject.toml 0.2.0 -> 0.4.0 (catches up from the v0.3.0 release
which shipped without a pyproject bump). Adds the CHANGELOG entry.
Refreshes user-facing docs (README, BRAINDB_GUIDE, both skills) for the
new nested-JSON tree shape.

No DB migration. No env-var changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dimknaf dimknaf merged commit b08e9e9 into main Jun 3, 2026
1 check passed