fix: credibility hardening sprint#6
Conversation
…rve args, description, SQLite retries Closes #1 #2 #3 #4 #5 - benchmarks/retention_benchmark.py: capability matrix + honest diagnostics - memctrl/retriever.py: synonym expansion, layer boost, confidence multiplier, relative threshold, deduplication; docstring aligned with code - memctrl/cli.py: removed fake port/host args from serve command - pyproject.toml: aligned description with v1.2 positioning - memctrl/store.py: exponential backoff retry on all write paths; removed dead _with_retry decorator - tests/test_retriever.py: 4 new tests for scoring behavior - tests/test_store.py: concurrent write stress test
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4a12da397
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| deduped: List[Tuple[float, str, str, dict]] = [] | ||
| for item in sorted(scored_memories.values(), key=lambda x: x[0], reverse=True): | ||
| content = item[1] | ||
| if content not in seen_contents and item[0] >= relative_gate: |
There was a problem hiding this comment.
Preserve direct content matches below the branch gate
When a query strongly matches a project node title/summary but that node contains an unrelated memory, the branch metadata and layer boost can set relative_gate high enough that a directly matching lower-scored memory is dropped here even with top_k room. For example, a project node titled jwt auth with content unrelated billing note scores 8.5, while a session memory JWT refresh token bug scores 3.5 and is filtered by the 4.25 gate, so retrieval returns only the unrelated fact. The threshold should not compare branch-only matches against content matches this way, or it should require per-memory content overlap before suppressing other results.
Useful? React with 👍 / 👎.
| print(f"{'Feature':<40} {'Baseline':>10} {'MemCtrl':>10}") | ||
| print("-" * 60) | ||
| print( | ||
| f"{'Explainable retrieval trace':<40} {'no':>10} {'yes':>10}" |
There was a problem hiding this comment.
Report the actual capability check results
The benchmark now runs check_trace_explainability, check_secret_redaction, check_layer_enforcement, and check_lifetime_management, but the matrix still hard-codes MemCtrl as yes. If any of these checks regresses, benchmarks/retention_benchmark.py will continue to print a passing capability matrix, which makes the diagnostic harness misleading precisely for the credibility checks this change adds.
Useful? React with 👍 / 👎.
Summary
Fixes the five highest-impact credibility and durability issues blocking MemCtrl from being a trustworthy open-source tool.
Changes
Test results
Audit
Closes #1 #2 #3 #4 #5