README.md (2 additions & 1 deletion)
@@ -206,7 +206,7 @@ For Docker deployments, set `ollama = "http://host.docker.internal:11434"` in `[
## Web UI (Lerim Cloud)
-The browser UI (sessions, memories, pipeline, settings) lives in **[lerim-cloud](https://github.com/lerim-dev/lerim-cloud)** and is served from **[lerim.dev](https://lerim.dev)**. The `lerim` daemon still exposes a **JSON API** on `http://localhost:8765` for the CLI and for Cloud to talk to your local runtime when connected.
+The web dashboard has moved to **[lerim.dev](https://lerim.dev)**. The local bundled dashboard has been removed as of v0.1.70 -- all UI features (sessions, memories, pipeline, settings) are now part of **[Lerim Cloud](https://lerim.dev)**. The `lerim` daemon still exposes a **JSON API** on `http://localhost:8765` for the CLI and for Cloud to talk to your local runtime when connected. Running `lerim dashboard` shows a transition message with CLI alternatives.
## CLI reference
@@ -231,6 +231,7 @@ lerim ask "Why did we choose this?" # query memories
Use your Read and search tools to investigate the files above. Do NOT load entire files into context -- read strategically:
1. **Read a few memory files** in `{memory_root}/decisions/` and `{memory_root}/learnings/` to understand the existing memory state.
2. **Compare predicted actions** against golden assertions to see where classifications diverge.
3. **Read the original trace** at `{trace_path}` to verify whether add/update/no_op decisions make sense given the session content.
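Step 2 of the checklist above can be sketched mechanically. This is a minimal, hypothetical comparison that assumes each payload is a list of `{candidate_id, action}` records -- the real `{predictions}` and `{golden}` schemas may differ:

```python
# Hypothetical sketch: find candidates whose predicted dedup action
# diverges from the golden assertion. The record shape is an assumption.
def diverging_classifications(predictions, golden):
    """Return candidate ids whose predicted action differs from the golden one."""
    predicted = {p["candidate_id"]: p["action"] for p in predictions}
    return sorted(
        g["candidate_id"]
        for g in golden
        if predicted.get(g["candidate_id"]) != g["action"]
    )

preds = [{"candidate_id": "c1", "action": "add"},
         {"candidate_id": "c2", "action": "no_op"}]
gold = [{"candidate_id": "c1", "action": "add"},
        {"candidate_id": "c2", "action": "update"}]
print(diverging_classifications(preds, gold))  # ['c2']
```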
## Predicted Actions
```json
{predictions}
```
## Golden Assertions
```json
{golden}
```
## Scoring (each 0.0 to 1.0)
- **completeness** (weight 0.25): Did dedup find all duplicate/overlapping candidates? Were all candidates in the golden set properly classified? 1.0 = no missed duplicates.
- **faithfulness** (weight 0.25): Are dedup decisions grounded in actual memory content? Are update decisions justified by real overlap between candidate and existing memory? 1.0 = all decisions evidence-based.
- **coherence** (weight 0.20): Is the reasoning behind dedup decisions clear and consistent? Do add/update/no_op classifications follow a coherent strategy? 1.0 = excellent reasoning.
- **precision** (weight 0.30): No false-positive duplicates? Items classified as no_op or update should genuinely overlap with existing memories. Penalize marking distinct candidates as duplicates. 1.0 = no incorrect dedup matches.
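The four weighted criteria above can be collapsed into one overall score. A minimal sketch of that weighted sum -- the weights come from the bullets above, but the linear combining rule itself is an assumption, not something this template specifies:

```python
# Combine the four per-criterion scores into one overall score.
# Weights match the Scoring bullets; the weighted sum is an assumed convention.
WEIGHTS = {
    "completeness": 0.25,
    "faithfulness": 0.25,
    "coherence": 0.20,
    "precision": 0.30,
}

def overall_score(scores: dict) -> float:
    """Weighted sum of the four criterion scores, each in [0.0, 1.0]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

With all four criteria at 1.0 the overall score is 1.0, since the weights sum to one.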
## Response Format
Return ONLY valid JSON (no markdown fences, no extra text):
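The exact response schema is not shown in this excerpt. For illustration only, a hypothetical shape consistent with the four score keys in the Scoring section (the `reasoning` field is an assumption) might be:

```json
{
  "completeness": 0.9,
  "faithfulness": 0.85,
  "coherence": 0.8,
  "precision": 1.0,
  "reasoning": "One borderline update classification diverged from the golden set; all other decisions were evidence-based."
}
```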
Use your Read and search tools to investigate the files above. Do NOT load entire files into context -- read strategically:
1. **List memory files** in `{memory_root}/decisions/` and `{memory_root}/learnings/` to see the post-maintain state.
2. **Read maintain_actions.json** in `{run_folder}` -- check what actions were taken (merge, archive, consolidate, unchanged).
3. **Check archived/** directory at `{memory_root}/archived/` for newly archived files. Cross-reference with should_archive list.
4. **Sample memory files** to verify merge decisions preserved important information from both sources.
5. **Compare against assertions**: Were should_archive items archived? Were should_merge items merged? Were should_keep items left untouched?
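Step 3 above can be sketched as a simple cross-reference. This is a hypothetical helper assuming archived memories are `.md` files placed directly in `{memory_root}/archived/` and that `should_archive` is a list of filenames -- the real layout may differ:

```python
# Hypothetical sketch: cross-reference the golden should_archive list
# with the files actually present in {memory_root}/archived/.
# The flat *.md layout and filename-based matching are assumptions.
from pathlib import Path

def archive_report(memory_root, should_archive):
    """Split should_archive into (archived, missed) by filename."""
    archived_names = {p.name for p in Path(memory_root, "archived").glob("*.md")}
    archived = [f for f in should_archive if f in archived_names]
    missed = [f for f in should_archive if f not in archived_names]
    return archived, missed
```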
## Scoring (each 0.0 to 1.0)
- **completeness** (weight 0.25): Did maintenance find all merge and archive opportunities listed in the golden assertions? Were all memory files reviewed? Were should_merge groups actually merged? 1.0 = no missed opportunities.
- **faithfulness** (weight 0.25): Are maintenance actions reasonable? Do merges preserve important information from both originals? Are archive decisions justified (not discarding valuable content)? 1.0 = all actions correct.
- **coherence** (weight 0.20): Is the final memory store well-organized after maintenance? Do merged memories read naturally? Is the maintain report well-structured with clear reasoning? 1.0 = excellent coherence.
- **precision** (weight 0.30): Did maintenance correctly avoid archiving should_keep items? Were no valuable memories incorrectly archived or merged away? Reward archiving genuinely low-quality memories. 1.0 = no incorrect maintenance actions.
## Response Format
Return ONLY valid JSON (no markdown fences, no extra text):
Use your Read and search tools to investigate the files above. Do NOT load entire files into context -- read strategically:
1. **Read the returned memory files** to verify they actually match the query intent.
2. **Read the known relevant memories** (by their file paths) to understand what should have been returned.
3. **Check ranking order**: Are the most relevant results ranked highest?
## Scoring (each 0.0 to 1.0)
- **completeness** (weight 0.25): Did the search find all known relevant memories within the top results? 1.0 = all relevant memories appeared in results.
- **faithfulness** (weight 0.25): Do the returned results actually match the query semantically? Are they genuinely about the topic being searched? 1.0 = all results are on-topic.
- **coherence** (weight 0.20): Is the ranking order reasonable? Are the most relevant results ranked first? 1.0 = perfect ranking.
- **precision** (weight 0.30): Are there irrelevant results in the top-5? Penalize results that do not relate to the query at all. 1.0 = no irrelevant results in top-5.
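The precision criterion above has a direct numeric analogue, precision@5: the fraction of the top-5 results that appear in the known-relevant set. A minimal sketch, assuming results and relevant memories are plain lists of file paths:

```python
# Minimal precision@5 sketch: share of the top-5 returned paths that are
# in the known-relevant set. Path-list inputs are an assumption.
def precision_at_5(returned_paths, relevant_paths):
    top5 = returned_paths[:5]
    if not top5:
        return 0.0
    relevant = set(relevant_paths)
    hits = sum(1 for p in top5 if p in relevant)
    return hits / len(top5)

# 2 of the 3 top results are relevant here.
score = precision_at_5(["m/a.md", "m/b.md", "m/x.md"], ["m/a.md", "m/b.md"])
```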
## Response Format
Return ONLY valid JSON (no markdown fences, no extra text):
You are evaluating whether the lerim agent selected the correct tools in the correct order during a sync or maintain run.
## Context
- **Agent trace**: `{agent_trace_path}` -- OpenAI Agents SDK run history with tool calls and results
- **Expected tool sequence**: see below
- **Forbidden tools**: see below
- **Actual tool calls**: see below
## Expected Sequence
```json
{expected_sequence}
```
## Forbidden Tools (must_not_call)
```json
{must_not_call}
```
## Actual Tool Calls
```json
{actual_calls}
```
## Instructions
Use your Read tool to examine the agent trace at `{agent_trace_path}` if needed for deeper context.
1. **Compare tool ordering**: Did the agent call tools in the expected order? Extract/summarize first, then dedup, then classify, then write.
2. **Check forbidden calls**: Were any must_not_call tools invoked? This is a hard penalty.
3. **Verify tool arguments**: Were the arguments passed to each tool reasonable for the task?
4. **Check for unnecessary calls**: Did the agent make redundant or wasted tool calls?
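Checks 1 and 2 above can be sketched as a subsequence-and-intersection test. This hypothetical helper assumes the sequences are plain lists of tool names; the real `{expected_sequence}` and `{actual_calls}` payloads may carry more structure:

```python
# Hypothetical sketch of checks 1 and 2: the expected tools must appear
# in order (as a subsequence of the actual calls), and no forbidden tool
# may be invoked. Plain name lists are an assumption.
def check_tool_calls(actual, expected_sequence, must_not_call):
    it = iter(actual)
    # "tool in it" consumes the iterator, so this is a subsequence check.
    in_order = all(tool in it for tool in expected_sequence)
    forbidden_used = sorted(set(actual) & set(must_not_call))
    return in_order, forbidden_used

calls = ["extract", "summarize", "dedup", "classify", "write"]
ok, forbidden = check_tool_calls(calls, ["extract", "dedup", "write"], ["delete_all"])
print(ok, forbidden)  # True []
```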
## Scoring (each 0.0 to 1.0)
- **completeness** (weight 0.25): Were all necessary tools called? Did the agent complete the full pipeline without skipping steps? 1.0 = all expected tools were called.
- **faithfulness** (weight 0.25): Were tool arguments correct and matched to the task? Did the agent pass appropriate data between tools? 1.0 = all arguments well-formed and task-appropriate.
- **coherence** (weight 0.20): Was the tool ordering logical? Did the agent follow the expected pipeline sequence? 1.0 = perfect ordering.
- **precision** (weight 0.30): Were there unnecessary tool calls or forbidden tool invocations? Penalize redundant calls and must_not_call violations heavily. 1.0 = no wasted or forbidden calls.
## Response Format
Return ONLY valid JSON (no markdown fences, no extra text):