
Phase E: cross-session memory + recall (WAL → lazy digest → recall)#157

Merged
azalio merged 12 commits into main from arroyo-switchback on Jun 3, 2026

azalio (Owner) commented Jun 3, 2026

Phase E — Cross-Session Memory + Recall

Gives the MAP Framework durable cross-session memory: cheap per-turn capture into a scratch WAL, lazy LLM "digest" finalization on the next session start, and recall injection of relevant past digests — so a later session knows why an approach was chosen.

Architecture: write-ahead-log → lazy checkpoint (NOT flush-on-SessionEnd). Stop is the only reliable durable-capture point; SessionStart carries finalize + recall. Works with zero SessionEnd dependency (HC-2) and survives compaction, where PreCompact does not fire reliably (especially with 1M-token contexts).

Built via /map-efficient (8 subtasks, sequential RESEARCH → ACTOR → MONITOR, per-subtask commits). Final-verifier: PASS.

What's in it

  • src/mapify_cli/memory/ (new pure-runtime package):
    • digest_schema.py — single-source field constants + redaction (sk-/gh_/base64/AKIA) + secret-path globs + control-char sanitizer (INV-7, Contract-First).
    • capture.py — LLM-free per-turn scratch WAL append (subprocess-free branch resolve via .git/HEAD), on_session_end best-effort marker.
    • finalize.py — finalize-if-dirty: claude -p (argv form, MAP_INVOKED_BY=memory-finalize recursion guard, hard timeout), atomic .md.tmp → os.replace → .finalized, per-branch flock, truncation-tolerant, empty→no-digest, memory-cost.log.
    • recall.py — current-branch keyword+recency ranking, MAP_MEMORY_RECALL_CAP (default 4000 chars) with whole-digest drops → recall-drop.log, sanitized additionalContext.
  • 4 REQUIRE_GUARD hook shims (map-memory-{capture,finalize,recall,endmark}.py) authored as .jinja, registered in settings.json (Stop / SessionStart finalize→recall / SessionEnd / UserPromptSubmit), lint-hooks.py, and both hook doc tables.
  • map-memory-now skill — on-demand finalize / --finalize-all, requires-cmd: [claude, git], host-gate pruned when claude absent (EC-4).
  • .gitignore — ignores the scratch WAL; documents the MAP_MEMORY_COMMIT_DIGESTS=0 opt-out.
  • Hook executable-bit fix — hooks now ship +x (the harness execs them via shebang); renderer force-sets +x for hook .py/.sh, plus CI guards that exec hooks the way the harness does (shebang, not python3 <path>).

Tests / gates

  • ~150 new unit tests across the memory modules + an end-to-end integration smoke (capture×2 → finalize → recall with a fake claude on PATH; asserts the digest surfaces in additionalContext and memory-cost.log is written).
  • make check (ruff + mypy + pyright + full pytest + check-render) green; full suite 2037 passed.

Notes

  • Token accounting into token_accounting.json is intentionally deferred (Decision 9); finalize writes memory-cost.log instead.
  • Single-source render invariant respected throughout (templates_src/**/*.jinja → make render-templates; check-render enforced).

🤖 Generated with Claude Code

azalio and others added 12 commits June 2, 2026 21:40
…ion + sanitizer)

Adds src/mapify_cli/memory/{__init__,digest_schema}.py as the ONE authority
(INV-7 / Phase-A Contract-First) for the memory subsystem:
- SCRATCH_TURN_FIELDS / SCRATCH_ENDED_FIELDS / DIGEST_FRONTMATTER_FIELDS
  (decisions/findings intentionally absent from scratch shapes, spec:118)
- REDACTION_PATTERNS + redact_text() (sk-/sk-ant-, gh[pousr]_, base64 blob, AKIA)
- SECRET_PATH_GLOBS + redact_secret_path() (.env*/*.pem/*.key/credentials*/secrets*)
- sanitize_value() matching the proven _sanitize_for_json control-char rule

57 unit tests (VC1-VC4). ruff/mypy/pyright clean.

Follow-up (LOW, out of ST-001 scope): github_pat_ fine-grained PAT prefix and
AWS STS ASIA key-id formats are not yet covered by redact_text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
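
The redaction pass described in the digest_schema commit above can be pictured roughly as follows. This is a minimal sketch, not the shipped code: the regexes, their length thresholds, and the `PLACEHOLDER` name are assumptions chosen for illustration, and the base64-blob pattern is left out entirely.

```python
import re

# Hypothetical patterns: the real REDACTION_PATTERNS in digest_schema.py may differ.
REDACTION_PATTERNS = [
    re.compile(r"\bsk-(?:ant-)?[A-Za-z0-9_-]{16,}"),  # sk- / sk-ant- style API keys
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"),      # GitHub token prefixes
    re.compile(r"\bAKIA[0-9A-Z]{16}"),                # AWS access key ids
]
PLACEHOLDER = "«redacted»"  # assumed placeholder text


def redact_text(text: str) -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    return text
```

The leading `\b` keeps ordinary words such as `task-...` from being mistaken for an `sk-` key; whether the real patterns guard against that the same way is not stated in the commit.
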
src/mapify_cli/memory/capture.py: LLM-free Stop-hook hot path.
- append_turn(stdin, project_dir): one redacted+sanitized JSONL turn record
  to .map/<branch>/sessions/scratch/<sid>.jsonl + maintains current-session pointer
- append_end_marker(stdin, project_dir): {event:ended,ts,session_id} (reused by ST-005)
- resolve_session_id: stdin session_id -> current-session pointer -> None (HC-1)
- branch resolved by reading .git/HEAD directly (dir + worktree-file + detached
  HEAD) — NO subprocess on the hot path (INV-1 proven by zero-subprocess test)
- turn counter = non-empty scratch line count (+1), resilient to truncated tail (INV-6)
- best-effort: never raises on malformed/empty stdin
- field names imported from digest_schema (INV-7)

30 unit tests. ruff/mypy/pyright clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
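
Resolving the branch by reading .git/HEAD directly, as the capture commit above describes, might look like the sketch below. The function name `resolve_branch` and the `detached-<hash>` label for a detached HEAD are assumptions; the commit only says that the directory, worktree-file, and detached-HEAD cases are handled without a subprocess.

```python
from pathlib import Path
from typing import Optional


def resolve_branch(project_dir: Path) -> Optional[str]:
    """Return the current branch name without spawning git (illustrative sketch)."""
    git_path = project_dir / ".git"
    try:
        if git_path.is_file():
            # Worktree checkout: .git is a file containing "gitdir: <path>".
            content = git_path.read_text(encoding="utf-8").strip()
            if not content.startswith("gitdir:"):
                return None
            git_dir = Path(content[len("gitdir:"):].strip())
            if not git_dir.is_absolute():
                git_dir = project_dir / git_dir
        elif git_path.is_dir():
            git_dir = git_path
        else:
            return None
        head = (git_dir / "HEAD").read_text(encoding="utf-8").strip()
    except OSError:
        return None  # best-effort: never raise on the hot path

    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    # Detached HEAD: the file holds a raw commit hash. The label format is an assumption.
    return f"detached-{head[:12]}" if head else None
```
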
…lock, timeout)

src/mapify_cli/memory/finalize.py: finalize_dirty(incoming_sid, project_dir, timeout=60).
Transactional unit (INV-4, load-bearing order):
  write <sid>.md.tmp -> os.replace -> .finalized marker -> cost log -> delete scratch.
- candidate = scratch/*.jsonl with sid != incoming_sid AND no .finalized (no SessionEnd dep, HC-2)
- per-branch flock (name sanitized to ^[a-zA-Z0-9_-]{1,64}$) + in-lock re-check =>
  idempotent + concurrent-safe => exactly one digest (VC3); LockTimeoutError -> skip
- claude -p in argv list form, env MAP_INVOKED_BY=memory-finalize (recursion guard),
  hard subprocess timeout; timeout/returncode!=0 -> scratch left unfinalized, tmp cleaned
- tolerates truncated trailing JSONL line (INV-6); empty scratch -> no digest but finalized+deleted
- digest redact_text + sanitize_value (defense-in-depth); cost -> sessions/memory-cost.log
  (token_accounting.json deferred)

20 unit tests incl. timeout, returncode!=0, idempotent, concurrent-in-lock, lock-timeout,
truncation, empty, redaction, incoming-skip, and the post-replace marker-touch-failure
retry-convergence path. ruff/mypy/pyright clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
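
The load-bearing order in the finalize commit above (tmp write, os.replace, marker, cost log, scratch delete) under a lock with an in-lock re-check can be sketched like this. It is a simplified illustration: the function name `finalize_one`, the lock-file name, the marker location, and the `<sid>.md` digest name are all assumptions, and the blocking `flock` here has no timeout, whereas the commit mentions a `LockTimeoutError` skip path. Unix-only because of `fcntl`.

```python
import fcntl
import os
from pathlib import Path


def finalize_one(sessions_dir: Path, sid: str, digest_text: str) -> bool:
    """Write one digest transactionally; return False if it was already finalized."""
    scratch_dir = sessions_dir / "scratch"
    scratch = scratch_dir / f"{sid}.jsonl"
    marker = scratch_dir / f"{sid}.finalized"   # hypothetical marker location
    final_path = sessions_dir / f"{sid}.md"     # real naming also involves a slug
    tmp_path = sessions_dir / f"{sid}.md.tmp"

    with open(sessions_dir / ".finalize.lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # released when the file is closed
        # In-lock re-check: a concurrent finalizer may have finished while we waited.
        if marker.exists():
            return False
        tmp_path.write_text(digest_text, encoding="utf-8")
        os.replace(tmp_path, final_path)        # atomic rename on the same filesystem
        marker.touch()
        with open(sessions_dir / "memory-cost.log", "a", encoding="utf-8") as log:
            log.write(f"{sid}\n")               # placeholder cost record
        scratch.unlink(missing_ok=True)
        return True
```

Deleting the scratch file last means a crash at any earlier step leaves enough state behind for a later run to retry or to see the marker and skip.
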
src/mapify_cli/memory/recall.py: build_recall(prompt, branch, project_dir) -> str.
- reads current-branch digests (.map/<branch>/sessions/*.md; cross-branch deferred, OQ-3 v1)
- parses YAML frontmatter (yaml.safe_load), fields via DIGEST_FRONTMATTER_FIELDS (INV-7)
- ranks by prompt keyword/ticket overlap + recency tiebreak; empty prompt -> recency
- caps assembled payload at MAP_MEMORY_RECALL_CAP (default 4000 chars); whole-digest
  drops only (never mid-digest, SC-1) logged to sessions/recall-drop.log
- sanitize_value + redact_text (defense-in-depth); returns "" when nothing to recall

Monitor follow-ups addressed: count the inter-block "\n" separator in the cap check so
the payload never exceeds the cap (was off by N-1); document the 500-char per-block body
bound; add a multi-block strict-cap regression test.

18 unit tests. ruff/mypy/pyright clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
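
The strict cap check from the recall commit above, with the "\n" separator counted and only whole digests dropped, could be sketched as below. The helper name `apply_recall_cap` is hypothetical, and the sketch stops at the first overflow, which matches the rank-monotonic behavior described in the later review-fix commit of this PR rather than necessarily this commit's original logic.

```python
from typing import List, Tuple


def apply_recall_cap(ranked_blocks: List[str], cap: int = 4000) -> Tuple[List[str], List[str]]:
    """Keep the highest-ranked blocks whose joined length stays within cap.

    ranked_blocks must already be sorted best-first. Returns (kept, dropped).
    """
    kept: List[str] = []
    dropped: List[str] = []
    used = 0
    for index, block in enumerate(ranked_blocks):
        # Every block after the first costs one extra char for the "\n" separator.
        cost = len(block) + (1 if kept else 0)
        if used + cost > cap:
            # Stop at the first overflow so a smaller, lower-ranked block
            # never jumps ahead of a dropped higher-ranked one.
            dropped = ranked_blocks[index:]
            break
        kept.append(block)
        used += cost
    return kept, dropped
```

With this accounting, `len("\n".join(kept))` never exceeds `cap`, which is the property the off-by-N-1 fix restores.
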
Adds capture.on_session_end(stdin, project_dir): thin SessionEnd entrypoint over
append_end_marker (ST-002). Appends ONLY {event:'ended',ts,session_id} — no finalize,
no LLM. Wraps the call in its own broad guard (swallow+log) so SessionEnd stays
fire-and-forget and NEVER raises (AC-4). Reason-agnostic: SessionEnd reason
(clear/resume/logout) is read for logging only and never enters the record (EC-6).

3 new tests (record-only, swallows-exception, reason-agnostic). Full suite 2021 passed.
ruff/mypy/pyright clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation

Authors 4 thin hook shims as templates_src/hooks/*.py.jinja (rendered to .claude/
and templates/ — claude-only; codex ships only workflow-gate.py):
- map-memory-capture.py   (Stop)         -> capture.append_turn
- map-memory-endmark.py   (SessionEnd)   -> capture.on_session_end
- map-memory-finalize.py  (SessionStart) -> finalize_dirty (MAP_MEMORY_FINALIZE_TIMEOUT, default 60)
- map-memory-recall.py    (SessionStart+UserPromptSubmit) -> build_recall, emits additionalContext

Each: recursion guard `if os.environ.get("MAP_INVOKED_BY"): sys.exit(0)` as the FIRST
main() statement (stops the finalizer's own claude -p from re-triggering memory hooks),
stdin parse, lazy import (src/ first, falls back to installed mapify_cli, ImportError->no-op),
single best-effort module call. Registered in scripts/lint-hooks.py REQUIRE_GUARD and both
doc tables (hooks/README.md + references/hook-patterns.md, all trees).

check-render green; lint-hooks 16 conform; test_hook_patterns 49 passed; full suite 2029 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
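
The shim shape described in the commit above (guard first, stdin parse, lazy import, one best-effort call) might look roughly like this. It is a sketch under assumptions: `run_hook` is a hypothetical testable wrapper, the src/-first import fallback is omitted, and passing a parsed dict to `capture.append_turn` is a guess at what its `stdin` argument expects.

```python
import json
import os
from typing import Mapping


def run_hook(raw_stdin: str, env: Mapping[str, str]) -> int:
    """Body of a thin memory hook shim; always returns exit code 0."""
    # Recursion guard comes first, so the finalizer's own `claude -p`
    # session cannot re-trigger the memory hooks.
    if env.get("MAP_INVOKED_BY"):
        return 0
    try:
        payload = json.loads(raw_stdin) if raw_stdin.strip() else {}
    except json.JSONDecodeError:
        payload = {}  # malformed stdin is tolerated
    try:
        # Lazy import: a missing package turns the hook into a no-op.
        from mapify_cli.memory import capture
    except ImportError:
        return 0
    try:
        # Argument shape is an assumption; see the lead-in.
        capture.append_turn(payload, env.get("CLAUDE_PROJECT_DIR", os.getcwd()))
    except Exception:
        pass  # best-effort: a hook failure must not break the session
    return 0
```

A real shim would end with something like `sys.exit(run_hook(sys.stdin.read(), os.environ))` under its `__main__` guard.
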
…gitignore

Wires the memory runtime surface:
- settings.json (.jinja + hand-maintained .claude copy, shipped-only): Stop->capture;
  SessionStart->finalize THEN recall (order load-bearing, INV-3); UserPromptSubmit->recall;
  new SessionEnd->endmark.
- New map-memory-now skill (SKILL.md.jinja): on-demand finalize / --finalize-all sweep
  via finalize_dirty(None). skill-rules entry: skillClass=task, requires-cmd=[claude,git],
  direct-invocation triggers. Host gate prunes it when claude absent (EC-4) — new
  TestMapMemoryNowHostGate covers it.
- .gitignore: Phase-E block ignoring .map/*/sessions/scratch/ + documented
  MAP_MEMORY_COMMIT_DIGESTS=0 opt-out; new templates_src/.gitignore.jinja rendered to
  templates/.gitignore (".gitignore" added to renderer _CLAUDE_SHIPPED_ONLY).
- Tests: skill-count 14->15; VC1/VC3 file_copier updated (map-state + map-memory-now both
  require git -> 2 skips); 4 memory-hook smoke cases.

render + check-render green; full suite 2036 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tests/test_memory_integration.py: drives the real hook BINARIES as subprocesses
(capture×2 → finalize[new sid, NO SessionEnd] → recall) with a fake `claude`
executable injected on PATH (mocks claude -p; unconditional, no skipif). Asserts:
2 turn records → exactly one digest .md containing the mocked body → .finalized
marker + scratch deleted → memory-cost.log with input_tokens (VC4) → recall
stdout additionalContext contains the digest body. Strips MAP_INVOKED_BY from the
subprocess env so the recursion guard cannot silently no-op the hooks (proves the
pipeline really ran). No token_accounting.json assertion (Decision 9 descope).

make check (lint + test + check-render) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
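
Injecting a fake `claude` executable on PATH, as the integration test above does, can be illustrated with the sketch below. The helper name `run_with_fake_claude` and the shell-script stub are assumptions; the real test drives the hook binaries rather than calling `claude` directly. Unix-only, since it relies on a shebang script.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path


def run_with_fake_claude(stdout_text: str) -> str:
    """Run `claude -p ...` against a stub that prints stdout_text."""
    with tempfile.TemporaryDirectory() as bin_dir:
        fake = Path(bin_dir) / "claude"
        # The stub ignores its arguments and prints a fixed body via a heredoc.
        fake.write_text("#!/bin/sh\ncat <<'EOF'\n" + stdout_text + "\nEOF\n", encoding="utf-8")
        fake.chmod(fake.stat().st_mode | stat.S_IXUSR)

        # Drop MAP_INVOKED_BY so a recursion guard could not silently no-op the run.
        env = {k: v for k, v in os.environ.items() if k != "MAP_INVOKED_BY"}
        env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")

        result = subprocess.run(
            ["claude", "-p", "summarize"],
            env=env, capture_output=True, text=True, timeout=10, check=True,
        )
        return result.stdout
```

Prepending the temporary directory to PATH means the stub wins the lookup whether or not a real `claude` is installed, which is what lets such a test run unconditionally.
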
Final-verifier flagged a wording inaccuracy in the map-memory-now SKILL.md Notes:
finalize_dirty never stages or commits in any mode (there is no git add/commit in
the memory modules). Reword the MAP_MEMORY_COMMIT_DIGESTS=0 note: digests are
committed only because they are not git-ignored; the opt-out is uncommenting the
.gitignore line. Re-rendered; check-render green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… shebang

The 4 map-memory-* hooks shipped without the executable bit, so Claude Code's
direct shebang invocation ("$CLAUDE_PROJECT_DIR"/.claude/hooks/<name>.py) failed
at runtime with "/bin/sh: ... Permission denied" (Stop/SessionStart/UserPromptSubmit).
The python-based smoke/integration tests invoked hooks as `python3 <path>`, which
masked the missing bit.

Root fix (defense in depth):
- chmod +x the 4 hook .jinja sources (matches the existing convention, e.g.
  context-meter.py.jinja; git tracks the bit and the renderer propagates it).
- Harden template_renderer._atomic_write_file: FORCE +x for .py/.sh under a
  managed hooks/ dir regardless of source bit, so a hook .jinja that forgets the
  executable bit still ships an executable hook (mirrors create_hook_files'
  unconditional chmod on the install path). Implements the learned "preserve
  executable bits after atomic temp-file writer" rule.

CI guards (test hooks the way the harness does):
- tests/test_hook_patterns.py::test_hook_is_executable — every .py/.sh hook in
  all four trees (.claude, .codex, templates, templates/codex) must be X_OK.
- tests/hooks/test_hook_inventory_smoke.py::test_every_configured_hook_execs_via_shebang
  — exec each settings.json-wired hook via its bare shebang path (no interpreter
  prefix) and assert no PermissionError / 126 / 127.
- tests/test_template_render.py: a hook .jinja without +x still renders executable;
  non-hook files are not force-marked.

Negative-proofed: removing +x makes both guard tests fail; re-render restores it.
make check (lint + test + check-render) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
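
The executable-bit guard and the renderer's force-set described in the commit above can be sketched as two small helpers. Both names are hypothetical, and the real tests walk four trees and exec each hook, which this sketch does not attempt.

```python
import os
import stat
from pathlib import Path
from typing import List


def non_executable_hooks(hooks_dir: Path) -> List[Path]:
    """Return the .py/.sh hooks in hooks_dir that lack the executable bit."""
    return sorted(
        path
        for path in hooks_dir.iterdir()
        if path.suffix in {".py", ".sh"} and not os.access(path, os.X_OK)
    )


def force_executable(path: Path) -> None:
    """Add +x for user, group, and other while keeping the other mode bits."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

A guard test would then assert that `non_executable_hooks(...)` is empty for each shipped hooks directory.
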
…de absent)

TestVC1MissingDepSkip and TestMapMemoryNowHostGate delegated non-target commands
to the REAL requires-cmd checker. map-memory-now requires-cmd:[claude, git]; on CI
runners `claude` is absent, so map-memory-now skipped on `claude` (not `git`),
flipping the skip message and failing the assertion — while passing locally where
`claude` is installed. Force the patched checker deterministically (target command
absent, all others present) instead of delegating to PATH. Verified with the whole
test file run under a claude-stripped PATH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…subsystem

Correctness:
- finalize: slug disambiguation reserved suffix room before 32-char truncation
  so os.replace can no longer clobber another session's digest (#1)
- finalize: derive slug from the `title` key and strip ```json fences in
  _parse_claude_output so decisions/findings survive fenced output (#3)
- finalize: per-field redaction before YAML escaping; identifier fields
  (session_id/branch/date/slug) excluded so a long session_id is no longer
  rewritten to «redacted», keeping owner-line dedup working (#4)
- recall: cap is rank-monotonic — break on first overflow so a lower-ranked
  smaller digest never jumps a dropped higher-ranked one (#5)
- digest_schema: redact fine-grained github_pat_ tokens (#6); stop
  over-redacting pure-hex git SHAs while still catching mixed-case secrets (#7)
- capture: advance the transcript <sid>.offset only AFTER the record write so a
  crash never skips a transcript range (#8)
- capture: derive a fallback session id from the transcript stem instead of a
  shared unknown.jsonl bucket (#9)

Behavior/perf:
- settings: restore the UserPromptSubmit map-memory-recall registration so
  prompt-relevance ranking actually runs (#2)
- capture: memoise _resolve_branch and tail-read the turn count (O(1)) on the
  hot path; finalize hook subprocess default timeout 60->50 so it stays below
  the 60s harness timeout and its cleanup runs (#10)

Adds tests/test_memory_review_fixes.py (11 regression tests, one per finding).
Full gate green: mypy, pyright src/ 0/0/0, lint-hooks, check-render, 2082 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
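
Finding #1 above (reserve suffix room before the 32-char truncation) can be illustrated with a small sketch. The function name `unique_slug`, the numeric `-2`, `-3` suffix scheme, and the slug character set are assumptions; the commit does not say how the real suffix is derived.

```python
import re
from typing import Set

MAX_SLUG_LEN = 32  # the 32-char limit named in the commit


def unique_slug(title: str, taken: Set[str]) -> str:
    """Build a slug of at most MAX_SLUG_LEN chars that is not already in taken."""
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "session"
    slug = base[:MAX_SLUG_LEN].rstrip("-")
    counter = 2
    while slug in taken:
        suffix = f"-{counter}"
        # Truncate the base first so the suffix survives the length limit.
        # Appending and then truncating could cut the suffix off again and
        # reproduce the colliding slug.
        slug = base[: MAX_SLUG_LEN - len(suffix)].rstrip("-") + suffix
        counter += 1
    return slug
```
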
azalio (Owner, Author) commented Jun 3, 2026

Code-review fixes pushed (commit 901d991)

Resolved all 10 findings from a high-effort review of the memory subsystem:

Correctness

  1. finalize: slug disambiguation now reserves suffix room before the 32-char truncation → os.replace can no longer clobber another session's digest.
  3. finalize: slug derived from the title key; _parse_claude_output strips ```json fences so decisions/findings survive.
  4. finalize: per-field redaction before YAML escaping; identifier fields (session_id/branch/date/slug) excluded → long session_id no longer becomes «redacted», owner-line dedup preserved.
  5. recall: cap is rank-monotonic: break on first overflow so a lower-ranked smaller digest never jumps a dropped higher-ranked one.
  6/7. digest_schema: redact fine-grained github_pat_ tokens; stop over-redacting pure-hex git SHAs (mixed-case secrets still caught).
  8. capture: advance the transcript <sid>.offset only AFTER the record write (crash-safety).
  9. capture: fallback session id from the transcript stem instead of a shared unknown.jsonl.

Behavior/perf

  2. settings: restored the UserPromptSubmit recall registration so prompt-relevance ranking actually runs.
  10. capture: memoised _resolve_branch + O(1) tail-read turn count; finalize hook subprocess timeout 60→50 (below the 60s harness timeout so cleanup runs).

Adds tests/test_memory_review_fixes.py (11 regression tests). Full gate green: mypy, pyright src/ 0/0/0, lint-hooks, check-render, 2082 tests passed.

🤖 Generated with Claude Code

@azalio azalio merged commit b0c3133 into main Jun 3, 2026
6 checks passed
@azalio azalio deleted the arroyo-switchback branch June 3, 2026 18:53