diff --git a/agents/session-navigator.md b/agents/session-navigator.md
index 19a4ba64..7bc35ee0 100644
--- a/agents/session-navigator.md
+++ b/agents/session-navigator.md
@@ -8,7 +8,7 @@ meta:
description: |
MUST NOT be invoked directly by external callers. ALWAYS delegated to by graph-analyst when the graph server is unreachable or returns 0 sessions.
- Local fallback agent for navigating session data via flat JSONL files using bash/jq/grep safe extraction patterns. Handles session discovery, event search, and session navigation across ~/.amplifier/projects/ when the context-intelligence graph server is unavailable.
+ Local fallback agent for navigating session data via flat JSONL files using bash/jq/grep safe extraction patterns. Handles session discovery, event search, and session navigation under the root resolved from `CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"` when the context-intelligence graph server is unavailable.
This agent is NOT called directly by external callers. It is only delegated to by graph-analyst when the graph server is unreachable or returns 0 sessions. External callers should use graph-analyst instead.
@@ -17,7 +17,7 @@ meta:
Context: Graph analyst delegating because server is unreachable
user: [graph-analyst delegates] 'Find tool errors in session abc123 — graph server is unreachable. Workspace: my-project'
- assistant: 'I will scope search to workspace my-project. I will look in ~/.amplifier/projects/my-project/sessions/ first, then filter by workspace field if needed. I will search for tool errors using safe jq extraction patterns.'
+ assistant: 'I will scope search to workspace my-project. I will first resolve CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}", then look in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/ first, then filter by workspace field if needed. I will search for tool errors using safe jq extraction patterns.'
session-navigator receives workspace from graph-analyst and uses it to scope all directory lookups and field filters. External callers should never invoke session-navigator directly.
@@ -113,13 +113,48 @@ You are `session-navigator` — the local JSONL fallback navigation agent for th
**No server tools:** You do NOT have `graph_query` or `blob_read` tools. You operate entirely on local filesystem files using bash/jq/grep safe extraction patterns. Never attempt to use server tools — they are not available in your tool set.
-**Storage path convention:** All session data lives at:
+**Root resolution — MANDATORY FIRST STEP before any discovery:**
+
+Resolve the root once at the start of every context_intelligence session navigation operation:
+
+```bash
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+```
+
+The on-disk layout is:
```
-~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/events.jsonl
-~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/metadata.json
+$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/events.jsonl
+$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/metadata.json
```
+> **⛔ MARKER RULE — the defect this fixes:** Every discovery glob MUST include the
+> `context-intelligence/` path segment and MUST NOT stop at `sessions/{session_id}/`:
+>
+> ```
+> CORRECT: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl
+> WRONG: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/metadata.json # catches Amplifier core's files
+> ```
+>
+> **Why:** Amplifier core writes `sessions/{session_id}/metadata.json` with NO `context-intelligence/`
+> segment. Globbing one level too shallow latches onto core's files and produces a confident
+> wrong count.
+>
+> **Canonical marker = `events.jsonl`.** The Python readers (`discover.py`, the workflow recipe)
+> treat `context-intelligence/events.jsonl` as the single discriminator of a real capture.
+> `metadata.json` is read here only for its fields; both files are written together, so either
+> glob keeps the `context-intelligence/` segment and avoids the false-positive. For a strict
+> capture count that matches the code, glob `events.jsonl`.
+
+> **⛔ RELOCATION RULE:** The root comes ONLY from `AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`
+> (unset → `$HOME/.amplifier/projects`). The hook's `config.base_path` moves the *writer* but
+> NOT these readers — if a capture seems missing, confirm the writer was relocated via the env
+> var, not `config.base_path` alone.
+
+> **⛔ FAIL-LOUD RULE:** When zero captures are found, say exactly `"looked in {path}, found 0"`,
+> with `{path}` replaced by the directory actually searched. Never report a confident count
+> from a shallower glob, and never silently fall back to a different path.
+
Every `events.jsonl` line and every `metadata.json` file contains a `workspace` field. The graph-analyst will pass the active workspace when it delegates to you. **Always scope your search to that workspace.**
### Workspace Scoping — Do This First
@@ -129,15 +164,23 @@ When a workspace is provided by the caller, apply it immediately before any othe
**Step 1 — Try directory-first lookup** (fast, covers the common case where workspace equals the project slug):
```bash
-ls ~/.amplifier/projects/{WORKSPACE}/sessions/ 2>/dev/null
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+ls "$CONTEXT_INTELLIGENCE_ROOT"/{WORKSPACE}/sessions/ 2>/dev/null
```
-If this directory exists and contains sessions, work within it exclusively.
+> **Guard:** A `sessions/{session_id}/` directory entry only counts as a context_intelligence capture
+> when `sessions/{session_id}/context-intelligence/` also exists. `ls sessions/` may list directories
+> from Amplifier core with no `context-intelligence/` subdir — do not count those as
+> context_intelligence sessions.
+
+If this directory exists and contains sessions with `context-intelligence/` subdirs, work within it exclusively.
**Step 2 — If that directory is empty or missing**, the workspace was set explicitly and differs from the project slug. Scan across all project directories and filter by the `workspace` field in `metadata.json`:
```bash
-for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.workspace == "{WORKSPACE}") | input_filename' "$f" 2>/dev/null
done
```
@@ -155,26 +198,35 @@ done
Find sessions by ID, project slug, date, or agent name, always scoped to the provided workspace.
```bash
+# Resolve root first (required before any snippet below)
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+
# List sessions in a workspace (directory-first path)
-for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r '[.session_id, .workspace, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null
done | sort -t$'\t' -k4
# List sessions scoped by workspace field (cross-project scan)
-for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.workspace == "my-project") | [.session_id, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null
done | sort -t$'\t' -k3
# Find a session by partial ID (within a workspace)
-find ~/.amplifier/projects/my-project/sessions -maxdepth 1 -name "*PARTIAL_ID*" -type d
+# NOTE: a sessions/{session_id}/ directory only counts as a context_intelligence capture when
+# sessions/{session_id}/context-intelligence/ also exists. Always confirm the subdir:
+find "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions -maxdepth 1 -name "*PARTIAL_ID*" -type d \
+ | while read -r d; do [ -d "$d/context-intelligence" ] && echo "$d"; done
# Find sessions by agent name within a workspace
-for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.agent_name == "TARGET_AGENT") | .session_id' "$f" 2>/dev/null
done
# Confirm the workspace of a specific session
-jq -r '.workspace' ~/.amplifier/projects/my-project/sessions/SESSION_ID/context-intelligence/metadata.json
+jq -r '.workspace' "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/SESSION_ID/context-intelligence/metadata.json
```
### Event Search
@@ -207,12 +259,16 @@ wc -l < events.jsonl
Trace parent-child chains via `parent_id`, trace delegation trees via `delegate:agent_spawned`/`delegate:agent_completed`.
```bash
+# Resolve root first (required before any snippet below)
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+
# Check if session is root or child, and confirm its workspace
jq -r '{parent_id, workspace, status}' metadata.json
# Find child sessions within a workspace
PARENT_ID="YOUR_SESSION_ID_HERE"
-for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r "select(.parent_id == \"$PARENT_ID\") | [.session_id, .agent_name // \"(root)\", .status, .workspace] | join(\"\t\")" "$f" 2>/dev/null
done
@@ -237,8 +293,9 @@ Since session-navigator is active when no server is configured, you must locate
2. Or read from bundle config YAML under `hook-context-intelligence.config`: `context_intelligence_server_url` and `context_intelligence_api_key`
```bash
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
context-intelligence-upload \
- --path ~/.amplifier/projects/my-project \
+ --path "$CONTEXT_INTELLIGENCE_ROOT"/my-project \
--server-url "https://your-server.example.com" \
--api-key "your-api-key"
```
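The cross-project scan in Step 2 of `agents/session-navigator.md` (glob on `events.jsonl`, read fields from the sibling `metadata.json`, filter on `workspace`) can also be written in Python. This is a minimal sketch for illustration only: `resolve_root` and `sessions_for_workspace` are hypothetical names that do not appear in the change, and `resolve_root` mirrors only the shell default idiom, not the absolute-path canonicalisation done by the package.

```python
import json
import os
from pathlib import Path


def resolve_root() -> Path:
    """Mirror ${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}.

    Hypothetical helper: an unset or empty variable falls back to the default,
    exactly as the shell ':-' form does. No relative-path check is done here.
    """
    raw = os.environ.get("AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH", "")
    return Path(raw) if raw else Path.home() / ".amplifier" / "projects"


def sessions_for_workspace(root: Path, workspace: str) -> list:
    """Return session directories whose metadata.json names the workspace.

    The glob keeps the context-intelligence/ segment, so a core-style
    sessions/{session_id}/metadata.json is never matched.
    """
    hits = []
    for ev in sorted(root.glob("*/sessions/*/context-intelligence/events.jsonl")):
        meta = ev.with_name("metadata.json")  # sibling of the canonical marker
        try:
            data = json.loads(meta.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable metadata, like `jq ... 2>/dev/null`
        if isinstance(data, dict) and data.get("workspace") == workspace:
            hits.append(ev.parent.parent)  # the sessions/{session_id} directory
    return hits
```

Calling `sessions_for_workspace(resolve_root(), "my-project")` would then list the same session directories the `jq` loop prints, under the stated assumptions.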
diff --git a/behaviors/context-intelligence-logging.yaml b/behaviors/context-intelligence-logging.yaml
index 58851f87..0fa45e78 100644
--- a/behaviors/context-intelligence-logging.yaml
+++ b/behaviors/context-intelligence-logging.yaml
@@ -40,7 +40,30 @@ hooks:
- delegate:agent_completed
- delegate:agent_cancelled
- delegate:error
- # base_path: ~/.amplifier/projects (auto-resolved; uncomment to override)
+ # base_path — relocation root for ALL captures.
+ #
+ # This binding is how relocation reaches the readers too: discover.py, the
+ # workflow recipe, and the navigation skills resolve their root ONLY from
+ # the AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH env var. Binding base_path to
+ # the same var keeps the writer and readers in lock-step.
+ #
+ # REQUIRES the host app to expand ${VAR:default} placeholders before mount
+ # (the hook does not self-expand — same contract as url/api_key above). The
+ # Amplifier app-cli does this. On a host that does NOT expand the ${VAR:}
+ # (colon-empty-default) form, the unexpanded literal falls back to the
+ # default SILENTLY, and the §C.3 consistency check in on_session_ready warns
+ # LOUD if the var was actually set — so a broken binding is never silent.
+ #
+ # ⚠ Do NOT relocate by hard-coding a path here instead of the env var:
+ # the writer would move but the env-only readers would not follow, and they
+ # would report "found 0". Always relocate via the env var.
+ #
+ # ⚠ Relocation granularity is PER-PROCESS, not per-session: every session in
+ # one process shares the one env value. A host that needs different roots per
+ # session must use separate processes. At session start the hook logs (INFO)
+ # the active capture root whenever relocation is in effect, so you can confirm
+ # it took effect.
+ base_path: "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}"
# project_slug: (auto-resolved from working directory; uncomment to override)
# exclude_events: [] (optional fnmatch patterns; uncomment and list events to suppress)
#
diff --git a/bundle.dot b/bundle.dot
index 2fdb4366..b9ddc67d 100644
--- a/bundle.dot
+++ b/bundle.dot
@@ -8,7 +8,7 @@ digraph context_intelligence {
nodesep=0.6
ranksep=0.7
bgcolor="white"
- source_hash="ff1d6a9692415633ceafa3389694c17aa82f8f3d8296b1f7c440c9fb815ea5d2"
+ source_hash="18e3046d9999d7c21d8a948dbb2f0e83ae35c8b72a73e2e2793227571f9c088c"
node [fontname="Helvetica", fontsize=11, style="filled,rounded"]
edge [fontname="Helvetica", fontsize=9]
@@ -23,8 +23,8 @@ digraph context_intelligence {
beh_context_intelligence_analysis_behavior [label="context-intelligence-analysis-behavior\n1 tools\n~864 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
beh_context_intelligence_design_behavior [label="context-intelligence-design-behavior\n1 tools\n~331 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
- beh_context_intelligence_logging_behavior [label="context-intelligence-logging-behavior\n1 tools\n~1184 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
- beh_context_intelligence_navigation_behavior [label="context-intelligence-navigation-behavior\n2 tools\n~561 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
+ beh_context_intelligence_logging_behavior [label="context-intelligence-logging-behavior\n1 tools\n~1661 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
+ beh_context_intelligence_navigation_behavior [label="context-intelligence-navigation-behavior\n2 tools\n~617 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
beh_context_intelligence_behavior [label="context-intelligence-behavior\n~236 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"]
}
@@ -37,7 +37,7 @@ digraph context_intelligence {
agt_context_intelligence_design_facilitator [label="context-intelligence-design-facilitator\n~187 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"]
agt_context_intelligence_tool_designer [label="context-intelligence-tool-designer\n~198 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"]
agt_graph_analyst [label="graph-analyst\n~543 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"]
- agt_session_navigator [label="session-navigator\n~366 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"]
+ agt_session_navigator [label="session-navigator\n~422 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"]
}
subgraph cluster_modules {
diff --git a/bundle.png b/bundle.png
index 9723a87d..64b0a5aa 100644
Binary files a/bundle.png and b/bundle.png differ
diff --git a/context_intelligence/__init__.py b/context_intelligence/__init__.py
index 007f706e..993a42fd 100644
--- a/context_intelligence/__init__.py
+++ b/context_intelligence/__init__.py
@@ -25,6 +25,7 @@
resolve_config,
)
from context_intelligence.reconstruct import (
+ DiskScanResult,
build_disk_only_metadata,
discover_sessions,
extract_events,
@@ -45,6 +46,7 @@
"extract_transcript",
"extract_metadata",
"build_disk_only_metadata",
+ "DiskScanResult",
"discover_sessions",
"workspace_slug",
"sessions_dir_for_project",
diff --git a/context_intelligence/config.py b/context_intelligence/config.py
index d53bdcb5..3efc4574 100644
--- a/context_intelligence/config.py
+++ b/context_intelligence/config.py
@@ -29,6 +29,10 @@
AMPLIFIER_DIR = Path.home() / ".amplifier"
SETTINGS_PATH = AMPLIFIER_DIR / "settings.yaml"
+#: The ONE canonical reader-side default root for context-intelligence captures.
+#: All readers fall back to this when AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH is unset.
+DEFAULT_BASE_PATH = Path.home() / ".amplifier" / "projects"
+
# ---------------------------------------------------------------------------
# Shared env-var helpers (used by HookConfigResolver and ToolConfigResolver)
@@ -50,6 +54,141 @@ def _env(suffix: str) -> str | None:
     return value if value else None
+def canonicalize_base_path(raw: str | Path | None) -> Path:
+    """Canonicalise a raw base-path value to a **guaranteed absolute** :class:`Path`.
+
+    Four rules applied in order (§D.2):
+
+    1. Convert to string and strip whitespace. ``None`` → empty string.
+    2. Empty string → :data:`DEFAULT_BASE_PATH` (never anchored to CWD).
+    3. Expand ``~`` via :meth:`~pathlib.Path.expanduser`.
+    4. If the result is still relative → warn and fall back to
+       :data:`DEFAULT_BASE_PATH`. Relative paths are invalid: each OS process
+       has its own CWD, so a relative root produces *different directories* for
+       different processes even when the string is byte-identical.
+
+    No ``os.path.normpath`` or CWD-anchoring — pathlib already drops trailing
+    slashes; absoluteness, not normalisation, is the load-bearing property.
+
+    .. important:: **Duplicated by design.** The fold gate forbids the hook's
+       ``config_resolver.py`` from importing this package, so the SAME rules are
+       inlined in ``HookConfigResolver.base_path``. The two copies MUST stay
+       byte-equivalent; ``tests/test_base_path_parity.py`` pins writer ≡ reader.
+       If you edit one, edit the other and the parity test.
+
+    Parameters
+    ----------
+    raw:
+        A raw string, :class:`~pathlib.Path`, or ``None``.
+
+    Returns
+    -------
+    Path
+        An absolute :class:`~pathlib.Path`. Never relative, never empty.
+    """
+    s = str(raw).strip() if raw is not None else ""
+    if not s:
+        return DEFAULT_BASE_PATH
+    p = Path(s).expanduser()
+    if not p.is_absolute():
+        log.warning(
+            "base_path %r is not absolute; using default %s",
+            s,
+            DEFAULT_BASE_PATH,
+        )
+        return DEFAULT_BASE_PATH
+    return p
+
+
+def context_intelligence_base_path() -> Path:
+    """Reader-side root for context-intelligence captures.
+
+    Reads ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` from the environment via
+    :func:`_env` (which returns ``None`` for both unset and empty) and passes the
+    result through :func:`canonicalize_base_path`, which guarantees an absolute
+    path and falls back to :data:`DEFAULT_BASE_PATH` for empty or relative values.
+
+    Mirrors the shell idiom::
+
+        ${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}
+
+    This helper is **gate-safe**: it lives in ``config.py``, which already imports
+    ``os``. It does **not** touch ``config_resolver.py`` (fold-discipline gate).
+    """
+    return canonicalize_base_path(_env("BASE_PATH"))
+
+
+def reader_writer_roots_disagree(
+    env_raw: str | None,
+    writer_base_path: str | Path,
+) -> tuple[bool, Path, Path]:
+    """Compare the reader root against the writer root (§C.3 consistency check).
+
+    Pure, side-effect-free core of the startup consistency check in the hook's
+    ``on_session_ready``. Extracted here (rather than left inline) so the
+    divergence condition is **unit-testable** without importing the hook package
+    (which needs ``amplifier_core`` at import time).
+
+    Both operands pass through the SAME :func:`canonicalize_base_path`, so the
+    comparison is symmetric: a relocated *writer* whose root the env-only readers
+    cannot see produces ``disagree=True``.
+
+    Parameters
+    ----------
+    env_raw:
+        The raw ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` value (``None`` when
+        unset/empty) — exactly what every reader resolves from.
+    writer_base_path:
+        The writer's resolved ``base_path`` (e.g. ``resolver.base_path``).
+
+    Returns
+    -------
+    tuple[bool, Path, Path]
+        ``(disagree, reader_root, writer_root)``. ``disagree`` is ``True`` when
+        the canonicalized roots differ — the caller should then warn LOUD.
+    """
+    reader_root = canonicalize_base_path(env_raw)
+    writer_root = canonicalize_base_path(str(writer_base_path))
+    return reader_root != writer_root, reader_root, writer_root
+
+
+# ---------------------------------------------------------------------------
+# Shared capture-path helpers (canonical capture definition — §D.1)
+# ---------------------------------------------------------------------------
+
+#: Fixed-shape glob (relative to a ``sessions/`` directory) that matches
+#: exactly the files the writer produces. One capture =
+#: ``{sessions_dir}/{session_id}/context-intelligence/events.jsonl``.
+#:
+#: The ``events.jsonl`` **file** is the discriminator — a bare
+#: ``context-intelligence/`` directory without the file is not a recoverable
+#: capture and must not be counted. Amplifier core's
+#: ``sessions/{session_id}/metadata.json`` has no ``context-intelligence/`` segment and
+#: is excluded by construction.
+CAPTURE_GLOB = "*/context-intelligence/events.jsonl"
+
+
+def capture_paths_under_sessions_dir(sessions_dir: Path) -> list[Path]:
+    """Return all capture paths under a project ``sessions/`` directory.
+
+    Uses the fixed-shape :data:`CAPTURE_GLOB` — **not** a recursive ``**``
+    glob — so only the writer's real output layout is matched.
+
+    Parameters
+    ----------
+    sessions_dir:
+        The ``{base_path}/{project-slug}/sessions`` directory to scan.
+
+    Returns
+    -------
+    list[Path]
+        Sorted list of ``events.jsonl`` file paths, one per qualifying session
+        (including subsessions, which are flat siblings under ``sessions/``).
+        ``session_id`` for any path ``p`` is ``p.parent.parent.name``.
+    """
+    return sorted(sessions_dir.glob(CAPTURE_GLOB))
+
+
# ---------------------------------------------------------------------------
# Shell-style placeholder expander (used by ToolConfigResolver)
# ---------------------------------------------------------------------------
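The four canonicalisation rules and the reader/writer comparison added to `context_intelligence/config.py` above can be exercised in isolation. What follows is a minimal standalone sketch, not the package code: `canonicalize`, `roots_disagree`, and `DEFAULT` are hypothetical stand-ins that restate the rules so the outcome for `None`, blank, relative, and `~`-prefixed inputs is visible without importing `context_intelligence`.

```python
import logging
from pathlib import Path
from typing import Optional, Union

log = logging.getLogger("base_path_sketch")

# Hypothetical stand-in for DEFAULT_BASE_PATH in context_intelligence.config.
DEFAULT = Path.home() / ".amplifier" / "projects"


def canonicalize(raw: Optional[Union[str, Path]]) -> Path:
    """Restate the rules: strip, default on empty, expand ~, reject relative."""
    s = str(raw).strip() if raw is not None else ""
    if not s:
        return DEFAULT
    p = Path(s).expanduser()
    if not p.is_absolute():
        # A relative root would resolve differently per process CWD.
        log.warning("base_path %r is not absolute; using default %s", s, DEFAULT)
        return DEFAULT
    return p


def roots_disagree(env_raw: Optional[str], writer_base_path: Union[str, Path]) -> bool:
    """True when the env-derived reader root and the writer root differ."""
    return canonicalize(env_raw) != canonicalize(str(writer_base_path))
```

Under these assumptions, a writer relocated to `~/captures` while the env var is unset yields `roots_disagree(None, "~/captures") == True`, which is the case the §C.3 check is meant to surface.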
diff --git a/context_intelligence/reconstruct/__init__.py b/context_intelligence/reconstruct/__init__.py
index e396d32c..51ece5d8 100644
--- a/context_intelligence/reconstruct/__init__.py
+++ b/context_intelligence/reconstruct/__init__.py
@@ -33,6 +33,7 @@
from __future__ import annotations
from context_intelligence.reconstruct.discover import (
+ DiskScanResult,
discover_sessions,
sessions_dir_for_project,
workspace_slug,
@@ -49,6 +50,7 @@
"extract_transcript",
"extract_metadata",
"build_disk_only_metadata",
+ "DiskScanResult",
"discover_sessions",
"workspace_slug",
"sessions_dir_for_project",
diff --git a/context_intelligence/reconstruct/discover.py b/context_intelligence/reconstruct/discover.py
index e0c73156..8337b062 100644
--- a/context_intelligence/reconstruct/discover.py
+++ b/context_intelligence/reconstruct/discover.py
@@ -3,7 +3,7 @@
Discovers sessions for a workspace by querying the context-intelligence graph
and scanning the local filesystem for sessions not yet in the graph.
-Level 2 — Network I/O (queries the CI graph server via CIClient).
+Level 2 — Network I/O (queries the context-intelligence graph server via CIClient).
Extracted from prototype scripts/ci-reconstruct-sessions.py (lines 104-111, 1051-1084).
"""
@@ -12,10 +12,14 @@
import logging
import os
+from dataclasses import dataclass, field
from pathlib import Path
from context_intelligence.client import CIClient
-from context_intelligence.config import AMPLIFIER_DIR
+from context_intelligence.config import (
+    capture_paths_under_sessions_dir,
+    context_intelligence_base_path,
+)
log = logging.getLogger("context_intelligence.reconstruct.discover")
@@ -51,9 +55,13 @@ def workspace_slug(project_dir: str) -> str:
 def sessions_dir_for_project(project_dir: str) -> Path:
     """Return the sessions directory for a project.
-    Builds the path: ``AMPLIFIER_DIR / 'projects' / slug / 'sessions'``,
+    Builds the path: ``context_intelligence_base_path() / slug / 'sessions'``,
     where *slug* is derived via :func:`workspace_slug`.
+    The base path honours ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` when
+    set (and guarantees an absolute result via the canonicalizer); falls back
+    to :data:`~context_intelligence.config.DEFAULT_BASE_PATH`.
+
     Parameters
     ----------
     project_dir:
@@ -65,18 +73,70 @@ def sessions_dir_for_project(project_dir: str) -> Path:
         Path to the sessions directory for the project.
     """
     slug = workspace_slug(project_dir)
-    return AMPLIFIER_DIR / "projects" / slug / "sessions"
+    return context_intelligence_base_path() / slug / "sessions"
+
+
+@dataclass(frozen=True)
+class DiskScanResult:
+    """Tagged result from a disk scan — prevents silent misreading of absent roots.
+
+    The two distinct fail-loud states ride the **return value**, not only the
+    log, so callers must branch on :attr:`root_exists` rather than treating an
+    empty :attr:`disk_only_ids` as "success".
+
+    Attributes
+    ----------
+    root:
+        The ``sessions/`` directory that was (or was attempted to be) scanned.
+    root_exists:
+        ``True`` when ``root`` existed at scan time; ``False`` when the directory
+        was absent or was a typo'd / missing relocated root. A ``False`` value
+        means **the scan was impossible** — it is NOT equivalent to
+        "found zero captures".
+    disk_only_ids:
+        Session IDs present on disk (via the canonical
+        ``events.jsonl`` marker) but absent from the graph.
+        Computed as ``candidate_ids − graph_ids``.
+        Empty list when ``root_exists`` is ``False``.
+    candidate_ids:
+        The full capture-candidate set — all session IDs that have an
+        ``events.jsonl`` file under ``root``, regardless of graph membership.
+        Subsessions (directories whose name begins with ``0000000000000000``)
+        ARE included; the ``events.jsonl`` marker is the sole discriminator.
+        Empty list when ``root_exists`` is ``False``.
+    """
+
+    root: Path
+    root_exists: bool
+    disk_only_ids: list[str] = field(default_factory=list)
+    candidate_ids: list[str] = field(default_factory=list)
 def discover_sessions(
     client: CIClient,
     workspace: str,
     sessions_dir: Path,
-) -> tuple[list[dict], list[str]]:
+) -> tuple[list[dict], DiskScanResult]:
     """Discover all sessions for a workspace from the graph and disk.
     Queries the context-intelligence graph for all sessions in *workspace*,
-    then scans *sessions_dir* for session directories not present in the graph.
+    then scans *sessions_dir* for captures (``events.jsonl`` files) not present
+    in the graph.
+
+    The canonical capture definition (§D.1) is
+    ``{sessions_dir}/{session_id}/context-intelligence/events.jsonl``, a
+    **fixed-shape** glob, **not** recursive. Subsessions (flat siblings whose
+    name begins with ``0000000000000000``) **are** included; they are real
+    captures that were silently dropped by the old ``0000…`` skip.
+
+    Two distinct fail-loud states ride the **return value**:
+
+    - ``root_exists=False`` — the sessions directory was absent or a typo'd
+      relocated root. ``disk_only_ids`` and ``candidate_ids`` are both empty.
+      This is **not** the same as "found zero captures" and the caller MUST
+      branch on it.
+    - ``root_exists=True, candidate_ids=[]`` — the directory exists but contains
+      no qualifying ``events.jsonl`` captures.
     Parameters
     ----------
@@ -89,15 +149,15 @@ def discover_sessions(
     Returns
     -------
-    tuple[list[dict], list[str]]
+    tuple[list[dict], DiskScanResult]
         A 2-tuple of:
         - **graph_sessions** — list of row dicts from the graph query,
           each with keys ``s.node_id``, ``s.status``, ``s.started_at``,
           ``s.ended_at``, ordered by ``s.started_at``.
-        - **disk_only_ids** — list of session directory names found on disk
-          but absent from the graph (subsession directories starting with
-          ``0000000000000000`` are excluded).
+        - **scan** — :class:`DiskScanResult` with tagged disk-scan outcome.
+          Callers must branch on ``scan.root_exists`` before interpreting
+          ``scan.disk_only_ids``.
     """
     rows = client.cypher(
         f'MATCH (s:Session) WHERE s.workspace = "{workspace}" '
@@ -106,21 +166,42 @@ def discover_sessions(
         workspace=workspace,
     )
-    # Collect graph session IDs
+    # Collect graph session IDs.
     graph_ids: set[str] = set()
     for row in rows:
         sid = row.get("s.node_id", "")
         if sid:
             graph_ids.add(sid)
-    # Scan disk for session directories not in the graph
-    disk_only_ids: list[str] = []
-    if sessions_dir.is_dir():
-        for entry in sorted(sessions_dir.iterdir()):
-            if entry.is_dir() and entry.name not in graph_ids:
-                # Skip subsession directories (start with 0000000000000000)
-                if entry.name.startswith("0000000000000000"):
-                    continue
-                disk_only_ids.append(entry.name)
-
-    return rows, disk_only_ids
+    # --- Absent-root guard (§D.3 FIX 4) ------------------------------------
+    if not sessions_dir.is_dir():
+        log.warning(
+            "context_intelligence root does not exist: %s",
+            sessions_dir,
+        )
+        return rows, DiskScanResult(
+            root=sessions_dir,
+            root_exists=False,
+        )
+
+    # --- Shared capture-candidate set (§D.1) --------------------------------
+    # Fixed-shape glob keyed on events.jsonl — the writer's real output.
+    # Subsessions are flat siblings under sessions/ and ARE counted.
+    # The events.jsonl marker excludes bare dirs AND Amplifier-core's
+    # sessions/{session_id}/metadata.json (no context-intelligence/ segment).
+    capture_paths = capture_paths_under_sessions_dir(sessions_dir)
+    candidate_ids: list[str] = [p.parent.parent.name for p in capture_paths]
+    disk_only_ids: list[str] = [sid for sid in candidate_ids if sid not in graph_ids]
+
+    if not candidate_ids:
+        log.warning(
+            "looked in %s, found 0 context_intelligence captures",
+            sessions_dir,
+        )
+
+    return rows, DiskScanResult(
+        root=sessions_dir,
+        root_exists=True,
+        disk_only_ids=disk_only_ids,
+        candidate_ids=candidate_ids,
+    )
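Because `discover_sessions` now returns a tagged scan result, callers have three outcomes to distinguish rather than one empty list. The sketch below is a hedged, self-contained illustration of that branching: `ScanSketch`, `scan_sessions`, `describe`, and `demo` are hypothetical names that mirror the fields and glob shown in the diff, cover only the disk half of discovery, and leave the graph query out of scope.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ScanSketch:
    """Hypothetical stand-in mirroring the DiskScanResult fields in the diff."""

    root: Path
    root_exists: bool
    disk_only_ids: list = field(default_factory=list)
    candidate_ids: list = field(default_factory=list)


def scan_sessions(sessions_dir: Path, graph_ids: set) -> ScanSketch:
    """Disk scan only; graph_ids is supplied by the caller in this sketch."""
    if not sessions_dir.is_dir():
        return ScanSketch(root=sessions_dir, root_exists=False)
    paths = sorted(sessions_dir.glob("*/context-intelligence/events.jsonl"))
    candidates = [p.parent.parent.name for p in paths]
    disk_only = [sid for sid in candidates if sid not in graph_ids]
    return ScanSketch(sessions_dir, True, disk_only, candidates)


def describe(scan: ScanSketch) -> str:
    """Branch on root_exists first so an absent root is never read as zero."""
    if not scan.root_exists:
        return f"root does not exist: {scan.root}"
    if not scan.candidate_ids:
        return f"looked in {scan.root}, found 0"
    return f"{len(scan.candidate_ids)} captures, {len(scan.disk_only_ids)} not in graph"


def demo() -> list:
    """Walk the three states: absent root, empty root, populated root."""
    with tempfile.TemporaryDirectory() as tmp:
        sessions = Path(tmp) / "proj" / "sessions"
        missing = describe(scan_sessions(sessions, set()))
        sessions.mkdir(parents=True)
        empty = describe(scan_sessions(sessions, set()))
        for sid in ("aaa", "0000000000000000-child"):
            capture = sessions / sid / "context-intelligence"
            capture.mkdir(parents=True)
            (capture / "events.jsonl").write_text("{}\n")
        # Core-style session: metadata.json with no context-intelligence/ segment.
        (sessions / "core-only").mkdir()
        (sessions / "core-only" / "metadata.json").write_text("{}")
        found = describe(scan_sessions(sessions, {"aaa"}))
    return [missing, empty, found]
```

In this sketch the subsession directory is counted and the core-only directory is not, which matches the capture definition stated in the docstrings above.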
diff --git a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py
index b71afe19..f012079e 100644
--- a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py
+++ b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py
@@ -165,6 +165,59 @@ async def on_session_ready(coordinator: Any) -> None:
unregister_fns = state["unregister_fns"]
destinations: dict[str, Destination] = state["destinations"]
+ # --- §C.3 mandatory startup consistency check (always-fire, read-only) ---
+ # Compare what the READERS will compute (canonicalized env var, defaulting
+ # when unset) against what the WRITER resolved (canonicalized base_path).
+ # When they disagree the writer and readers target different roots — a silent
+ # split that this check makes LOUD. Never writes os.environ (multiplexed-safe).
+ #
+ # This fires in BOTH directions, covering the two ways relocation can break:
+ # 1. env SET, writer at a different root — binding did not expand, or a
+ # config override fought the env var.
+ # 2. env UNSET, writer NOT at default — someone relocated via
+ # config.base_path / settings.yaml, which the env-only readers CANNOT
+ # see (relocation is reader-visible ONLY via the env var). The earlier
+ # `if _env_raw:` guard missed this case entirely.
+ import os # local import — only this path needs the process env
+
+ from context_intelligence.config import reader_writer_roots_disagree
+
+ _ENV_VAR = "AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH"
+ _env_raw = os.environ.get(_ENV_VAR) # may be None/empty → readers fall to default
+ # Pure, unit-tested divergence core (see tests/test_base_path_parity.py).
+ _disagree, _reader_root, _writer_root = reader_writer_roots_disagree(
+ _env_raw, resolver.base_path
+ )
+ if _disagree:
+ log.warning(
+ "context-intelligence: writer base_path (%s) and reader root (%s) disagree"
+ " — readers (discover, recipe, navigation skills) resolve the root ONLY from"
+ " %s, so captures written under %s will be invisible to them."
+ ' Relocate via the env var (or bind base_path: "${%s:}" in the hook config),'
+ " not via config.base_path alone.",
+ _writer_root,
+ _reader_root,
+ _ENV_VAR,
+ _writer_root,
+ _ENV_VAR,
+ )
+ else:
+ # Positive confirmation at the operator's surface (default level is INFO).
+ # Fires ONLY when relocation is actually in effect, so the operator who
+ # relocated can SEE it took effect — closing the "success and silent
+ # misconfiguration look identical at the moment of action" gap. Stays
+ # silent in the default (non-relocated) case so it adds no noise.
+ from context_intelligence.config import DEFAULT_BASE_PATH
+
+ if _writer_root != DEFAULT_BASE_PATH:
+ log.info(
+ "context-intelligence: capturing to %s"
+ " (readers resolve the same root from %s)."
+ " Relocation is per-process, not per-session.",
+ _writer_root,
+ _ENV_VAR,
+ )
+
# --- Destination selection (C2: working_dir capability ONLY, fail-loud) ---
active: dict[str, Destination] = {}
match_key: str = ""
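The two-direction divergence check described in the comment block above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the real `reader_writer_roots_disagree`: the name `roots_disagree_sketch`, the `DEFAULT_ROOT` constant, and the simplified reader-side resolution (strip, expand `~`, default when empty) are all hypothetical, and only the return shape `(disagree, reader_root, writer_root)` is taken from the call site.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: stands in for DEFAULT_BASE_PATH in context_intelligence.config.
DEFAULT_ROOT = Path("~/.amplifier/projects").expanduser()


def roots_disagree_sketch(
    env_raw: str | None, writer_root: Path
) -> tuple[bool, Path, Path]:
    """Return (disagree, reader_root, writer_root) for the startup check."""
    value = (env_raw or "").strip()
    # Readers fall back to the default when the variable is unset or empty.
    reader_root = Path(value).expanduser() if value else DEFAULT_ROOT
    return reader_root != writer_root, reader_root, writer_root


# Direction 1: env var set, writer resolved a different root.
case_env_set = roots_disagree_sketch("/a/one", Path("/b/two"))
# Direction 2: env var unset, writer relocated through config only.
case_env_unset = roots_disagree_sketch(None, Path("/data/ci"))
# Consistent: env var unset and writer at the default.
case_default = roots_disagree_sketch(None, DEFAULT_ROOT)

print(case_env_set[0], case_env_unset[0], case_default[0])
```

The function is pure (no logging, no writes to `os.environ`), which is what lets the hook call it on every session start and lets the parity tests pin it without a runtime.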
diff --git a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py
index 413031bd..4f6ed8e1 100644
--- a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py
+++ b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py
@@ -239,7 +239,32 @@ def base_path(self) -> Path:
"""Resolved base path for project storage.
Chain: config['base_path'] → coordinator.config['base_path'] → default.
- Tilde is expanded. Result is cached after first access.
+ Result is cached after first access.
+
+ Canonicalisation rules (§D.2 — identical to reader-side
+ ``canonicalize_base_path`` in ``context_intelligence.config``; inlined
+ here to keep zero hook→reader-package coupling and the fold gate green):
+
+ 0. **Unexpanded-placeholder guard.** A value that still looks like a
+ shell placeholder (``${...}``) means the host app did NOT expand the
+ ``base_path: "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}"`` binding
+ (the hook relies on the app layer to expand ``${VAR}`` before mount,
+ exactly as it does for ``url`` / ``api_key`` / ``exclude_events``).
+ Rather than treat the literal ``${...}`` as a bogus relative path and
+ warn on every session, fall back to the default **silently**. The
+ env-reading readers still relocate via the variable directly, and the
+ §C.3 startup consistency check in ``on_session_ready`` fires LOUD if
+ the variable was actually set (i.e. real relocation was intended but
+ the binding did not expand).
+ 1. Strip whitespace from the raw string.
+ 2. Empty string → ``_DEFAULT_BASE_PATH`` (never anchored to CWD).
+ 3. Expand ``~`` via :meth:`~pathlib.Path.expanduser`.
+ 4. If the result is still relative → warn, fall back to default.
+
+ Both the writer (this property) and the readers apply the same rules, so
+ canonicalized paths are byte-identical regardless of which side resolves
+ them. The §D.2 contract test drives the real property and asserts that
+ writer ≡ reader and that the result is always absolute.
"""
if self._base_path is None:
raw = (
@@ -247,7 +272,28 @@ def base_path(self) -> Path:
or self._coordinator_config_get("base_path")
or _DEFAULT_BASE_PATH
)
- self._base_path = Path(raw).expanduser()
+ # §D.2 canonicalizer, inlined here without using or importing os (fold-gate safe).
+ # DUPLICATED BY DESIGN: the byte-equivalent reader copy is
+ # `canonicalize_base_path` in `context_intelligence.config`. The fold
+ # gate forbids importing that package here, so the two MUST be kept in
+ # sync by hand; `tests/test_base_path_parity.py` pins writer ≡ reader.
+ # Edit one → edit the other and the parity test.
+ s = str(raw).strip()
+ if not s or s.startswith("${"):
+ # Empty, OR an unexpanded ${VAR} placeholder (host app did not
+ # expand the binding). Either way → default, no noise. §D.2 rule 0.
+ self._base_path = Path(_DEFAULT_BASE_PATH).expanduser()
+ else:
+ p = Path(s).expanduser()
+ if not p.is_absolute():
+ log.warning(
+ "base_path %r is not absolute; using default %s",
+ s,
+ _DEFAULT_BASE_PATH,
+ )
+ self._base_path = Path(_DEFAULT_BASE_PATH).expanduser()
+ else:
+ self._base_path = p
return self._base_path
@property
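The canonicalisation rules listed in the `base_path` docstring can be exercised as a standalone function. This is a sketch only: `canonicalize_sketch` and `DEFAULT_ROOT` are hypothetical names, and the default value `~/.amplifier/projects` is an assumption about what `_DEFAULT_BASE_PATH` holds.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Assumption: stands in for the module's _DEFAULT_BASE_PATH constant.
DEFAULT_ROOT = "~/.amplifier/projects"


def canonicalize_sketch(raw: object) -> Path:
    """Apply rules 0-4 from the docstring to a raw base_path value."""
    s = str(raw).strip()  # rule 1: strip whitespace
    if not s or s.startswith("${"):
        # rule 2 (empty) and rule 0 (unexpanded placeholder): silent default
        return Path(DEFAULT_ROOT).expanduser()
    p = Path(s).expanduser()  # rule 3: expand ~
    if not p.is_absolute():
        # rule 4: a relative path is never anchored to the CWD
        log.warning("base_path %r is not absolute; using default %s", s, DEFAULT_ROOT)
        return Path(DEFAULT_ROOT).expanduser()
    return p


samples = [
    "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}",
    "/tmp/relocated",
    "~/relocated",
    "relative/bad",
    "",
    "   ",
]
for value in samples:
    print(repr(value), "->", canonicalize_sketch(value))
```

Every input resolves to an absolute path, and only the relative input emits a warning; the placeholder and empty cases stay silent.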
diff --git a/modules/hook-context-intelligence/tests/test_base_path_consistency.py b/modules/hook-context-intelligence/tests/test_base_path_consistency.py
new file mode 100644
index 00000000..d8c8712c
--- /dev/null
+++ b/modules/hook-context-intelligence/tests/test_base_path_consistency.py
@@ -0,0 +1,101 @@
+# pyright: reportMissingImports=false
+# (pytest / amplifier_core are runtime/CI deps not visible to the static checker here.)
+"""End-to-end proof of the §C.3 base_path consistency warning (Restless-Old-Brian gate).
+
+These tests drive the REAL ``on_session_ready`` against the REAL ``amplifier_core``
+runtime (not the resolver in isolation) and assert the loud divergence warning
+actually fires — closing the gap that the writer/consistency branches were only
+ever verified by hand/parity.
+
+NOTE: like the rest of this module's tests, these require ``amplifier_core`` to be
+importable (it is in the Amplifier tool venv / CI, but NOT in the bundle's own
+isolated ``.venv``). They are skipped automatically if the runtime is unavailable.
+"""
+
+from __future__ import annotations
+
+import logging
+
+import pytest
+
+pytest.importorskip("amplifier_core", reason="amplifier_core runtime not installed in this venv")
+
+from tests.helpers import make_lifecycle_coordinator, mount_and_ready # noqa: E402
+
+_ENV_VAR = "AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH"
+
+
+def _messages(caplog) -> str:
+ return "\n".join(r.getMessage() for r in caplog.records)
+
+
+class TestConsistencyWarningEndToEnd:
+ """Real on_session_ready, real runtime — watch the warning fire (or stay silent)."""
+
+ async def test_warns_when_writer_relocated_but_env_unset(self, caplog, monkeypatch) -> None:
+ """Scenario A — the trap: relocate via config.base_path, env var UNSET.
+
+ Readers resolve the root only from the env var (-> default), the writer
+ resolved /tmp/relocated-ci-A. on_session_ready MUST warn LOUD.
+ """
+ monkeypatch.delenv(_ENV_VAR, raising=False)
+ coordinator = make_lifecycle_coordinator()
+ with caplog.at_level(logging.WARNING):
+ await mount_and_ready(coordinator, config={"base_path": "/tmp/relocated-ci-A"})
+ msgs = _messages(caplog)
+ assert "disagree" in msgs, f"expected a divergence warning; got:\n{msgs}"
+ assert "/tmp/relocated-ci-A" in msgs
+
+ async def test_silent_when_env_matches_writer(self, caplog, monkeypatch) -> None:
+ """Scenario C (positive control) — env set to the same root the writer uses → NO warning."""
+ monkeypatch.setenv(_ENV_VAR, "/tmp/relocated-ci-C")
+ coordinator = make_lifecycle_coordinator()
+ with caplog.at_level(logging.WARNING):
+ await mount_and_ready(coordinator, config={"base_path": "/tmp/relocated-ci-C"})
+ assert "disagree" not in _messages(caplog)
+
+ async def test_unexpanded_placeholder_is_silent_default(self, caplog, monkeypatch) -> None:
+ """Scenario B — host did NOT expand the ${VAR} binding, env unset.
+
+ Writer falls back to default SILENTLY (no 'not absolute' noise) and, since
+ readers also default, there is NO divergence warning either.
+ """
+ monkeypatch.delenv(_ENV_VAR, raising=False)
+ coordinator = make_lifecycle_coordinator()
+ with caplog.at_level(logging.WARNING):
+ await mount_and_ready(
+ coordinator,
+ config={"base_path": "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}"},
+ )
+ msgs = _messages(caplog)
+ assert "not absolute" not in msgs, f"unexpected noisy warning:\n{msgs}"
+ assert "disagree" not in msgs, f"unexpected divergence warning:\n{msgs}"
+
+
+class TestPositiveConfirmation:
+ """Operator-visible (default INFO) confirmation of the active capture root."""
+
+ async def test_confirmation_when_relocated_and_consistent(self, caplog, monkeypatch) -> None:
+ """Relocation in effect + reader matches writer → INFO 'capturing to ' fires.
+
+ log_level=INFO mirrors the behavior YAML default (``...:INFO``), i.e. the
+ level a composed-bundle operator actually runs at.
+ """
+ monkeypatch.setenv(_ENV_VAR, "/tmp/relocated-ci-confirm")
+ coordinator = make_lifecycle_coordinator()
+ with caplog.at_level(logging.INFO):
+ await mount_and_ready(
+ coordinator,
+ config={"base_path": "/tmp/relocated-ci-confirm", "log_level": "INFO"},
+ )
+ msgs = _messages(caplog)
+ assert "capturing to" in msgs, f"expected positive confirmation; got:\n{msgs}"
+ assert "/tmp/relocated-ci-confirm" in msgs
+
+ async def test_no_confirmation_in_default_case(self, caplog, monkeypatch) -> None:
+ """No relocation (default root) → stay silent even at INFO, no confirmation noise."""
+ monkeypatch.delenv(_ENV_VAR, raising=False)
+ coordinator = make_lifecycle_coordinator()
+ with caplog.at_level(logging.INFO):
+ await mount_and_ready(coordinator, config={"log_level": "INFO"})
+ assert "capturing to" not in _messages(caplog)
diff --git a/recipes/workflow-pattern-analysis.yaml b/recipes/workflow-pattern-analysis.yaml
index aa08001d..bde52092 100644
--- a/recipes/workflow-pattern-analysis.yaml
+++ b/recipes/workflow-pattern-analysis.yaml
@@ -96,15 +96,30 @@ stages:
condition: "{{prescope}} != ''"
command: |
python3 -c "
- import json, glob, os
- paths = glob.glob(
- os.path.expanduser('~/.amplifier/projects/**/context-intelligence/events.jsonl'),
- recursive=True
- )
+ import json, os
+ from pathlib import Path
+ root = Path(os.environ.get('AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH') or os.path.expanduser('~/.amplifier/projects'))
+ _raw = sorted(root.glob('*/sessions/*/context-intelligence/events.jsonl'))
workspace = '{{workspace}}'
prescope = '{{prescope}}'
+ def _ws_match(p, ws):
+ # Directory-first (fast): slug == workspace. Fall back to the
+ # workspace FIELD in events.jsonl when workspace was set explicitly
+ # and differs from the project slug (matches the navigation skills).
+ if p.parents[3].name == ws:
+ return True
+ try:
+ with open(p) as fh:
+ # Bound the read so a malformed newline-less file can't be slurped whole,
+ # but stay well above any real session:start line so a valid (long) first
+ # event is not truncated into invalid JSON and the session silently dropped.
+ obj = json.loads(fh.readline(1048576))
+ return isinstance(obj, dict) and obj.get('workspace') == ws
+ except Exception:
+ return False
if workspace:
- paths = [p for p in paths if workspace in p]
+ _raw = [p for p in _raw if _ws_match(p, workspace)]
+ paths = [str(p) for p in _raw]
candidates = []
total_scanned = len(paths)
for p in paths:
@@ -169,15 +184,30 @@ stages:
condition: "{{graph_probe.graph_available}} != 'true'"
command: |
python3 -c "
- import json, glob, os
- paths = glob.glob(
- os.path.expanduser('~/.amplifier/projects/**/context-intelligence/events.jsonl'),
- recursive=True
- )
+ import json, os
+ from pathlib import Path
+ root = Path(os.environ.get('AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH') or os.path.expanduser('~/.amplifier/projects'))
+ _raw = sorted(root.glob('*/sessions/*/context-intelligence/events.jsonl'))
workspace = '{{workspace}}'
max_sessions = {{max_sessions}}
+ def _ws_match(p, ws):
+ # Directory-first (fast): slug == workspace. Fall back to the
+ # workspace FIELD in events.jsonl when workspace was set explicitly
+ # and differs from the project slug (matches the navigation skills).
+ if p.parents[3].name == ws:
+ return True
+ try:
+ with open(p) as fh:
+ # Bound the read so a malformed newline-less file can't be slurped whole,
+ # but stay well above any real session:start line so a valid (long) first
+ # event is not truncated into invalid JSON and the session silently dropped.
+ obj = json.loads(fh.readline(1048576))
+ return isinstance(obj, dict) and obj.get('workspace') == ws
+ except Exception:
+ return False
if workspace:
- paths = [p for p in paths if workspace in p]
+ _raw = [p for p in _raw if _ws_match(p, workspace)]
+ paths = [str(p) for p in _raw]
session_ids_str = '{{session_ids}}'
try:
session_ids = json.loads(session_ids_str)
diff --git a/scripts/context-intelligence.py b/scripts/context-intelligence.py
index b297df68..eb00e09c 100644
--- a/scripts/context-intelligence.py
+++ b/scripts/context-intelligence.py
@@ -144,7 +144,14 @@ def cmd_reconstruct(args: argparse.Namespace) -> int:
# ── Discover sessions ─────────────────────────────────────────────────────
log.info("Discovering sessions for workspace %s ...", workspace)
- sessions, disk_only_ids = _ci_discover.discover_sessions(client, workspace, sessions_dir)
+ sessions, scan = _ci_discover.discover_sessions(client, workspace, sessions_dir)
+
+ # Absent root is a distinct failure — not "found zero" (§D.3)
+ if not scan.root_exists:
+ log.error("root not found: %s", scan.root)
+ return 1
+
+ disk_only_ids: list[str] = scan.disk_only_ids
# Filter to a specific session if requested
if args.session:
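The distinction between "root absent" and "root present, zero sessions" can be illustrated with a small stand-in for the scan result. This is a sketch under assumptions: the four field names match how `DiskScanResult` is constructed in `tests/test_cmd_reconstruct.py`, but `DiskScanResultSketch` and `exit_code_for` are hypothetical and the real class may carry more behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DiskScanResultSketch:
    """Hypothetical stand-in for the scan result returned by discover_sessions."""

    root: Path
    root_exists: bool
    disk_only_ids: list[str] = field(default_factory=list)
    candidate_ids: list[str] = field(default_factory=list)


def exit_code_for(scan: DiskScanResultSketch) -> int:
    """Hypothetical helper: a missing root is an error, an empty root is not."""
    if not scan.root_exists:
        print(f"root not found: {scan.root}")
        return 1
    return 0


missing = DiskScanResultSketch(root=Path("/nonexistent/root"), root_exists=False)
empty = DiskScanResultSketch(root=Path("/tmp/test-project/sessions"), root_exists=True)
print(exit_code_for(missing), exit_code_for(empty))
```

Carrying `root_exists` alongside the ID lists is what lets the caller fail loudly on a wrong root instead of reporting a plausible-looking empty result.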
diff --git a/skills/context-intelligence-session-navigation/SKILL.md b/skills/context-intelligence-session-navigation/SKILL.md
index 15ebe2af..0aa471ba 100644
--- a/skills/context-intelligence-session-navigation/SKILL.md
+++ b/skills/context-intelligence-session-navigation/SKILL.md
@@ -16,26 +16,54 @@ For ready-to-use jq/grep recipes, see `context/safe-extraction-patterns.md`.
## Disk Layout
+Resolve the root once before any discovery:
+
+```bash
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
```
-~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/
+
+```
+$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/
├── events.jsonl # one JSON object per line, append-only
└── metadata.json # session metadata, written on start, updated on end
```
-- `~/.amplifier/projects/` — default base path (configurable via `config.base_path`)
+- `$CONTEXT_INTELLIGENCE_ROOT` — resolved from the idiom `"${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"` (unset → legacy default). **Relocation is reader-visible ONLY via this env var.** The hook's `config.base_path` moves where the *writer* stores captures, but readers (this skill, `discover.py`, the recipe) resolve the root solely from the env var — so always relocate via `AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`, not `config.base_path` alone, or readers will look in the wrong place.
- `{project-slug}` — derived from the full working directory path (see Project Slug Algorithm below)
- `{session_id}` — unique session identifier (UUID or UUID with agent suffix for child sessions)
- `context-intelligence/` — subdirectory containing the session data files
- `events.jsonl` — append-only log of every event the kernel emits during the session
- `metadata.json` — compact session metadata for quick lookup without parsing the full event log
-Example paths:
+Example paths (with root resolved):
```
-~/.amplifier/projects/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/events.jsonl
-~/.amplifier/projects/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/metadata.json
+$CONTEXT_INTELLIGENCE_ROOT/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/events.jsonl
+$CONTEXT_INTELLIGENCE_ROOT/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/metadata.json
```
+> **⚠ MARKER RULE — the defect this prevents:** Every discovery glob MUST include the
+> `context-intelligence/` path segment and MUST NOT stop at `sessions/{session_id}/`:
+>
+> ```
+> CORRECT: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl
+> WRONG: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/metadata.json # catches Amplifier core's files
+> ```
+>
+> **Why:** Amplifier core writes `sessions/{session_id}/metadata.json` with NO `context-intelligence/`
+> segment. Globbing one level too shallow latches onto core's files and produces a confident
+> wrong count.
+>
+> **Canonical marker = `events.jsonl`.** The Python readers (`discover.py`, the workflow
+> recipe) treat `context-intelligence/events.jsonl` as the single discriminator of a real
+> capture. `metadata.json` is used here only to read fields (`workspace`, `status`, …); both
+> files are written together, so either glob includes the `context-intelligence/` segment and
+> avoids the false-positive. When you need a strict capture count that matches the code,
+> glob `events.jsonl`, not `metadata.json`.
+
+> **⚠ FAIL-LOUD RULE:** When zero captures are found, say exactly `"looked in {root}, found 0"`, where `{root}` is the resolved `$CONTEXT_INTELLIGENCE_ROOT`.
+> Never report a confident count from a shallower glob, and never silently fall back to a different path.
+
---
## Record Format
@@ -106,13 +134,13 @@ Example (child session with optional fields):
| Concept | Purpose | Where used |
|---------|---------|------------|
-| `project_slug` | Directory name under `~/.amplifier/projects/` | On-disk path |
+| `project_slug` | Directory name under `$CONTEXT_INTELLIGENCE_ROOT/` | On-disk path |
| `workspace` | Field in every record | Querying and filtering |
By **default** workspace equals `project_slug` (both derived from the working directory). They can differ when workspace is set explicitly via `settings.yaml` or env var — for example, workspace `"my-api"` while project_slug is `"-home-user-myapp"`.
**Consequence for navigation:**
-- **Directory-first lookup** — when workspace matches the project_slug, all sessions for that workspace live under `~/.amplifier/projects/{workspace}/sessions/`. This is fast and the common case.
+- **Directory-first lookup** — when workspace matches the project_slug, all sessions for that workspace live under `$CONTEXT_INTELLIGENCE_ROOT/{workspace}/sessions/`. This is fast and the common case.
- **Field-based filtering** — when workspace was set explicitly (overriding the slug default), scan across all project directories and filter records by `jq 'select(.workspace == "TARGET")'`.
Always check both: attempt directory lookup first, then fall back to cross-project field scan.
@@ -157,15 +185,21 @@ Session event files can contain lines with 100k+ tokens (e.g., `llm:response` wi
## Common Navigation Patterns
+Resolve the root once before all discovery snippets in this section:
+
+```bash
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+```
+
### Resolve workspace to a project directory
```bash
# If workspace == project_slug (common default), sessions are here:
-ls ~/.amplifier/projects/{workspace}/sessions/
+ls "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/
# If workspace was set explicitly and differs from project_slug,
# find all directories containing sessions tagged with this workspace:
-grep -rl '"workspace":"{workspace}"' ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json \
+grep -rl '"workspace":"{workspace}"' "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/metadata.json \
| sed 's|/context-intelligence/metadata.json||'
```
@@ -173,12 +207,14 @@ grep -rl '"workspace":"{workspace}"' ~/.amplifier/projects/*/sessions/*/context-
```bash
# Fast path: workspace matches directory name (default case)
-for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r '[.session_id, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null
done | sort -t$'\t' -k3
# Scoped path: workspace set explicitly — scan all projects, filter by field
-for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.workspace == "my-project") | [.session_id, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null
done | sort -t$'\t' -k3
```
@@ -195,7 +231,7 @@ head -1 events.jsonl | jq -r '.workspace'
```bash
# Count events per workspace across all sessions in a project directory:
-jq -r '.workspace' ~/.amplifier/projects/-home-user-myapp/sessions/*/context-intelligence/events.jsonl \
+jq -r '.workspace' "$CONTEXT_INTELLIGENCE_ROOT"/-home-user-myapp/sessions/*/context-intelligence/events.jsonl \
| sort | uniq -c | sort -rn
```
@@ -203,12 +239,14 @@ jq -r '.workspace' ~/.amplifier/projects/-home-user-myapp/sessions/*/context-int
```bash
# Within a single project directory:
-for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.status == "running") | .session_id' "$f" 2>/dev/null
done
# Cross-project, scoped to workspace:
-for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do
+for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do
+ f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling
jq -r 'select(.workspace == "my-project" and .status == "running") | .session_id' "$f" 2>/dev/null
done
```
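The marker rule and the fail-loud rule from the navigation skill can be combined in one short Python sketch. It assumes the reader-side root resolution is "env var if set, else `~/.amplifier/projects`"; `resolve_root` and `count_captures` are hypothetical names, and the report string follows the skill's "looked in ..., found 0" wording with the resolved root filled in.

```python
import os
import tempfile
from pathlib import Path

ENV_VAR = "AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH"


def resolve_root() -> Path:
    """Resolve the reader root from the env var, defaulting when unset or empty."""
    raw = os.environ.get(ENV_VAR)
    return Path(raw).expanduser() if raw else Path.home() / ".amplifier" / "projects"


def count_captures(root: Path) -> str:
    """Count real captures by the canonical marker and always name the root."""
    # The glob includes the context-intelligence/ segment, so Amplifier core's
    # sessions/{session_id}/metadata.json files are never matched.
    captures = sorted(root.glob("*/sessions/*/context-intelligence/events.jsonl"))
    return f"looked in {root}, found {len(captures)}"


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    # A core-only session: metadata.json one level too shallow to be a capture.
    core_dir = demo_root / "-home-user-myapp" / "sessions" / "core-only"
    core_dir.mkdir(parents=True)
    (core_dir / "metadata.json").write_text("{}")
    before = count_captures(demo_root)

    # A real capture: events.jsonl under context-intelligence/.
    ci_dir = demo_root / "-home-user-myapp" / "sessions" / "real" / "context-intelligence"
    ci_dir.mkdir(parents=True)
    (ci_dir / "events.jsonl").write_text('{"workspace": "my-project"}\n')
    after = count_captures(demo_root)

print(before)
print(after)
```

The core-only session contributes nothing to the count, which is the false positive the shallower `sessions/*/metadata.json` glob would have produced.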
diff --git a/skills/context-intelligence-session-reconstruction/SKILL.md b/skills/context-intelligence-session-reconstruction/SKILL.md
index 8ed3ecb4..4f6516f8 100644
--- a/skills/context-intelligence-session-reconstruction/SKILL.md
+++ b/skills/context-intelligence-session-reconstruction/SKILL.md
@@ -128,8 +128,9 @@ After reconstruction, verify the output files were written correctly:
**Check line counts for events and transcript:**
```bash
-wc -l ~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/events.jsonl
-wc -l ~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/transcript.jsonl
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+wc -l "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/events.jsonl
+wc -l "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/transcript.jsonl
```
A reconstructed `events.jsonl` should have at least a few lines (session start, prompts, tool calls). Zero lines indicates the session had no recoverable events.
@@ -137,7 +138,8 @@ A reconstructed `events.jsonl` should have at least a few lines (session start,
**Validate metadata JSON is well-formed:**
```bash
-cat ~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/metadata.json | python3 -m json.tool
+CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+cat "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/metadata.json | python3 -m json.tool
```
This confirms the file is valid JSON. Check that `bundle`, `model`, and `session_id` fields are populated.
diff --git a/skills/workflow-pattern-analysis/SKILL.md b/skills/workflow-pattern-analysis/SKILL.md
index 17f6d11e..d7ca08d7 100644
--- a/skills/workflow-pattern-analysis/SKILL.md
+++ b/skills/workflow-pattern-analysis/SKILL.md
@@ -70,7 +70,7 @@ read_file("${SKILL_DIR}/cypher-patterns.md") — Cypher query templates Q1
Also load from the bundle context when querying the graph or scanning JSONL:
```
-@context-intelligence:context/graph-model-reference.md — CI graph schema
+@context-intelligence:context/graph-model-reference.md — context-intelligence graph schema
@context-intelligence:context/jsonl-event-schema.md — on-disk JSONL event schema
@context-intelligence:context/safe-extraction-patterns.md — safe JSONL extraction patterns
```
@@ -129,9 +129,13 @@ Delegate to `context-intelligence:graph-analyst` to:
- **Disk scan is the explicit alternative** when the user needs comprehensive coverage of
ALL on-disk sessions regardless of graph indexing state. Disk scan entry:
```bash
- find ~/.amplifier/projects -path "*/context-intelligence/metadata.json" 2>/dev/null
+ CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"
+ # Enumerate captures by the canonical marker (events.jsonl), matching the Python
+ # readers; read the sibling metadata.json for the format/version filter.
+ find "$CONTEXT_INTELLIGENCE_ROOT" -path "*/context-intelligence/events.jsonl" 2>/dev/null \
+ | while IFS= read -r ev; do echo "${ev%/events.jsonl}/metadata.json"; done
```
- Filter to `format == "context-intelligence"` AND `version == "1.0.0"`.
+ Filter to `format == "context-intelligence"` AND `version == "1.0.0"` (from metadata.json).
Follow `@context-intelligence:context/safe-extraction-patterns.md` — never load full
`llm:request` lines.
- Validate coverage: compare graph session count against disk count and inform the user
diff --git a/tests/test_base_path_parity.py b/tests/test_base_path_parity.py
new file mode 100644
index 00000000..d7c698fe
--- /dev/null
+++ b/tests/test_base_path_parity.py
@@ -0,0 +1,106 @@
+# pyright: reportMissingImports=false
+# (pytest is a test-only dep; the hook module is resolved at runtime via the
+# sys.path insert below — neither is visible to the static type checker here.)
+"""Parity + consistency-check tests for relocation base_path (§D.2 / §C.3).
+
+Two **duplicated-by-design** canonicalizers exist because the fold gate forbids the
+hook's ``config_resolver`` from importing the reader package:
+
+ - reader: ``context_intelligence.config.canonicalize_base_path``
+ - writer: ``HookConfigResolver.base_path`` (byte-equivalent inline copy)
+
+These tests PIN ``writer ≡ reader`` so the hand-synced copies cannot drift silently,
+and PIN the §C.3 divergence condition (``reader_writer_roots_disagree``) that the
+hook's ``on_session_ready`` uses to decide whether to warn LOUD. Both were previously
+only verified by hand; this freezes them in CI.
+"""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+import pytest
+
+# The writer-side canonicalizer lives in the hook module package, which is not on
+# the default import path for the root test suite. Add it (the package __init__ is
+# import-clean — handlers / amplifier_core are imported lazily inside functions).
+REPO_ROOT = Path(__file__).parent.parent
+HOOK_MODULE_DIR = REPO_ROOT / "modules" / "hook-context-intelligence"
+if str(HOOK_MODULE_DIR) not in sys.path:
+ sys.path.insert(0, str(HOOK_MODULE_DIR))
+
+from amplifier_module_hook_context_intelligence.config_resolver import ( # noqa: E402
+ HookConfigResolver,
+)
+
+from context_intelligence.config import ( # noqa: E402
+ DEFAULT_BASE_PATH,
+ canonicalize_base_path,
+ reader_writer_roots_disagree,
+)
+
+
+def _writer_root(value: str) -> Path:
+ """Drive the REAL writer property with ``config['base_path'] = value``."""
+ return HookConfigResolver(config={"base_path": value}, coordinator=None).base_path
+
+
+# The input vector that was previously checked by hand — now pinned.
+PARITY_INPUTS = [
+ "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}", # unexpanded placeholder → default
+ "/tmp/relocated", # absolute → used as-is
+ "~/relocated", # tilde → expanded
+ "relative/bad", # relative → default
+ "", # empty → default
+ " ", # whitespace → default
+]
+
+
+class TestWriterReaderParity:
+ """writer ≡ reader for every input shape (the duplication's safety net)."""
+
+ @pytest.mark.parametrize("value", PARITY_INPUTS)
+ def test_writer_equals_reader(self, value: str) -> None:
+ writer = _writer_root(value)
+ reader = canonicalize_base_path(value)
+ assert writer == reader, f"writer/reader drift on {value!r}: {writer} != {reader}"
+
+ @pytest.mark.parametrize("value", PARITY_INPUTS)
+ def test_writer_always_absolute(self, value: str) -> None:
+ assert _writer_root(value).is_absolute()
+
+ def test_unexpanded_placeholder_is_default_silently(self, caplog) -> None:
+ """A literal ``${...}`` (host did not expand) → default, with NO noisy warning."""
+ with caplog.at_level("WARNING"):
+ root = _writer_root("${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}")
+ assert root == DEFAULT_BASE_PATH
+ assert not any("not absolute" in r.getMessage() for r in caplog.records)
+
+
+class TestConsistencyDivergence:
+ """Pins the §C.3 condition consumed by on_session_ready."""
+
+ def test_agree_when_env_matches_writer(self) -> None:
+ disagree, reader, writer = reader_writer_roots_disagree(
+ "/tmp/relocated", Path("/tmp/relocated")
+ )
+ assert disagree is False
+ assert reader == writer
+
+ def test_agree_when_both_default(self) -> None:
+ # env unset (None) and writer at default → consistent.
+ disagree, _reader, _writer = reader_writer_roots_disagree(None, DEFAULT_BASE_PATH)
+ assert disagree is False
+
+ def test_disagree_when_writer_relocated_but_env_unset(self) -> None:
+ # The exact trap fix #1 exists to catch: relocation via config.base_path
+ # with the env var unset — env-only readers cannot see it.
+ disagree, reader, writer = reader_writer_roots_disagree(None, Path("/data/ci"))
+ assert disagree is True
+ assert reader == DEFAULT_BASE_PATH
+ assert writer == Path("/data/ci")
+
+ def test_disagree_when_env_and_writer_differ(self) -> None:
+ disagree, _reader, _writer = reader_writer_roots_disagree("/a/one", Path("/b/two"))
+ assert disagree is True
diff --git a/tests/test_cmd_reconstruct.py b/tests/test_cmd_reconstruct.py
index 34733980..f98db4c5 100644
--- a/tests/test_cmd_reconstruct.py
+++ b/tests/test_cmd_reconstruct.py
@@ -25,6 +25,8 @@
from pathlib import Path
from unittest.mock import MagicMock, patch
+from context_intelligence.reconstruct.discover import DiskScanResult
+
REPO_ROOT = Path(__file__).parent.parent
SCRIPTS_DIR = REPO_ROOT / "scripts"
SCRIPT_PATH = SCRIPTS_DIR / "context-intelligence.py"
@@ -102,7 +104,15 @@ def test_cmd_reconstruct_does_not_raise_not_implemented(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([], []),
+ return_value=(
+ [],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
):
args = _make_args()
@@ -136,7 +146,15 @@ def test_returns_zero_on_no_sessions(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([], []),
+ return_value=(
+ [],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
):
args = _make_args()
@@ -162,7 +180,15 @@ def test_returns_zero_on_successful_processing(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -200,7 +226,15 @@ def test_returns_one_on_extract_errors(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -249,7 +283,15 @@ def test_events_only_calls_extract_events_not_others(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -292,7 +334,15 @@ def test_transcript_only_calls_extract_transcript_not_others(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -335,7 +385,15 @@ def test_metadata_only_calls_extract_metadata_not_others(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -378,7 +436,15 @@ def test_no_only_flags_calls_all_three(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -435,7 +501,15 @@ def test_session_filter_limits_to_matching_sessions(self):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=(sessions, []),
+ return_value=(
+ sessions,
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.events.extract_events",
@@ -500,7 +574,15 @@ def test_skips_events_jsonl_if_exists(self, tmp_path):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.discover.workspace_slug",
@@ -555,7 +637,15 @@ def test_force_overwrites_existing_events_jsonl(self, tmp_path):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.discover.workspace_slug",
@@ -615,7 +705,15 @@ def test_dry_run_does_not_write_files(self, tmp_path):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([mock_session], []),
+ return_value=(
+ [mock_session],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[],
+ candidate_ids=[],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.discover.workspace_slug",
@@ -679,7 +777,15 @@ def test_disk_only_sessions_call_build_disk_only_metadata(self, tmp_path):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([], [disk_only_id]),
+ return_value=(
+ [],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[disk_only_id],
+ candidate_ids=[disk_only_id],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.discover.workspace_slug",
@@ -716,7 +822,15 @@ def test_disk_only_sessions_skipped_when_metadata_only_false(self, tmp_path):
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")),
patch(
"context_intelligence.reconstruct.discover.discover_sessions",
- return_value=([], [disk_only_id]),
+ return_value=(
+ [],
+ DiskScanResult(
+ root=Path("/tmp/test-project/sessions"),
+ root_exists=True,
+ disk_only_ids=[disk_only_id],
+ candidate_ids=[disk_only_id],
+ ),
+ ),
),
patch(
"context_intelligence.reconstruct.discover.workspace_slug",
diff --git a/tests/test_reconstruct_discover.py b/tests/test_reconstruct_discover.py
index 22b13139..ded59555 100644
--- a/tests/test_reconstruct_discover.py
+++ b/tests/test_reconstruct_discover.py
@@ -1,14 +1,16 @@
"""Tests for context_intelligence.reconstruct.discover (task-8).
Covers:
-- Module imports correctly (workspace_slug, sessions_dir_for_project, discover_sessions)
-- workspace_slug() converts absolute path to slug (replacing / with -)
-- sessions_dir_for_project() returns AMPLIFIER_DIR/'projects'/slug/'sessions'
-- discover_sessions() queries graph for sessions in workspace
-- discover_sessions() returns graph rows and disk-only session IDs
-- discover_sessions() skips subsession directories starting with 0000000000000000
-- discover_sessions() skips graph sessions from disk-only list
-- Imports: CIClient from client, AMPLIFIER_DIR from config
+- Module imports correctly (workspace_slug, sessions_dir_for_project, discover_sessions,
+ DiskScanResult)
+- workspace_slug() converts absolute path to slug (replacing / with -).
+- sessions_dir_for_project() returns DEFAULT_BASE_PATH/slug/'sessions'.
+- discover_sessions() queries graph for sessions in workspace.
+- discover_sessions() returns graph rows and a DiskScanResult.
+- discover_sessions() uses the events.jsonl marker as the sole capture discriminator.
+- discover_sessions() INCLUDES subsession directories (0000…) — §D.1 subsession inclusion.
+- discover_sessions() returns root_exists=False for absent roots — §D.3 absent-root guard.
+- Imports: CIClient from client, AMPLIFIER_DIR from config.
"""
from __future__ import annotations
@@ -18,6 +20,22 @@
from unittest.mock import MagicMock
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_capture(sessions_dir: Path, session_id: str) -> None:
+ """Create the canonical capture structure for *session_id* under *sessions_dir*.
+
+ A qualifying capture requires:
+ ``sessions_dir/<session_id>/context-intelligence/events.jsonl``
+ """
+ cap_dir = sessions_dir / session_id / "context-intelligence"
+ cap_dir.mkdir(parents=True, exist_ok=True)
+ (cap_dir / "events.jsonl").write_text("")
+
+
class TestImport:
"""Module must be importable with the required public API."""
@@ -33,9 +51,14 @@ def test_discover_sessions_import(self):
"""discover_sessions must be importable from context_intelligence.reconstruct.discover."""
from context_intelligence.reconstruct.discover import discover_sessions # noqa: F401
+ def test_disk_scan_result_import(self):
+ """DiskScanResult must be importable from context_intelligence.reconstruct.discover."""
+ from context_intelligence.reconstruct.discover import DiskScanResult # noqa: F401
+
def test_acceptance_criteria_command(self):
"""Simulate the acceptance criteria import command."""
from context_intelligence.reconstruct.discover import (
+ DiskScanResult,
discover_sessions,
sessions_dir_for_project,
workspace_slug,
@@ -44,6 +67,7 @@ def test_acceptance_criteria_command(self):
assert workspace_slug is not None
assert sessions_dir_for_project is not None
assert discover_sessions is not None
+ assert DiskScanResult is not None
def test_uses_ciclient_from_client(self):
"""CIClient must be importable from the client module (imported dependency)."""
@@ -157,6 +181,15 @@ def test_returns_tuple(self):
assert isinstance(result, tuple)
assert len(result) == 2
+ def test_second_element_is_disk_scan_result(self):
+ """Second element is a DiskScanResult, not a bare list."""
+ from context_intelligence.reconstruct.discover import DiskScanResult, discover_sessions
+
+ client = self._make_client([])
+ with tempfile.TemporaryDirectory() as tmpdir:
+ _, scan = discover_sessions(client, "test-workspace", Path(tmpdir))
+ assert isinstance(scan, DiskScanResult)
+
def test_returns_graph_rows(self):
"""First element is the list of rows returned by client.cypher()."""
from context_intelligence.reconstruct.discover import discover_sessions
@@ -171,7 +204,7 @@ def test_returns_graph_rows(self):
]
client = self._make_client(fake_rows)
with tempfile.TemporaryDirectory() as tmpdir:
- graph_sessions, disk_only = discover_sessions(client, "test-workspace", Path(tmpdir))
+ graph_sessions, scan = discover_sessions(client, "test-workspace", Path(tmpdir))
assert graph_sessions == fake_rows
def test_calls_cypher_with_workspace(self):
@@ -211,7 +244,7 @@ def test_cypher_query_ordered_by_started_at(self):
assert "ORDER BY s.started_at" in query_arg
def test_disk_only_includes_dirs_not_in_graph(self):
- """Directories on disk not in graph_ids are returned as disk_only_ids."""
+ """Directories with events.jsonl on disk but not in graph_ids appear in disk_only_ids."""
from context_intelligence.reconstruct.discover import discover_sessions
fake_rows = [
@@ -225,62 +258,164 @@ def test_disk_only_includes_dirs_not_in_graph(self):
client = self._make_client(fake_rows)
with tempfile.TemporaryDirectory() as tmpdir:
sessions_dir = Path(tmpdir)
- # Create a session dir that is on disk but not in graph
- (sessions_dir / "disk-only-session").mkdir()
- # Create a session dir that is also in graph
- (sessions_dir / "graph-session-1").mkdir()
+ # Create a capture that is on disk but not in graph
+ _make_capture(sessions_dir, "disk-only-session")
+ # Create a capture that is also in graph
+ _make_capture(sessions_dir, "graph-session-1")
+
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
+
+ assert "disk-only-session" in scan.disk_only_ids
+ assert "graph-session-1" not in scan.disk_only_ids
+
+ def test_candidate_ids_includes_graph_sessions(self):
+ """candidate_ids is the full set regardless of graph membership."""
+ from context_intelligence.reconstruct.discover import discover_sessions
+
+ fake_rows = [
+ {
+ "s.node_id": "graph-sess",
+ "s.status": "completed",
+ "s.started_at": "2024-01-01",
+ "s.ended_at": "2024-01-02",
+ },
+ ]
+ client = self._make_client(fake_rows)
+ with tempfile.TemporaryDirectory() as tmpdir:
+ sessions_dir = Path(tmpdir)
+ _make_capture(sessions_dir, "disk-only-session")
+ _make_capture(sessions_dir, "graph-sess")
- _, disk_only = discover_sessions(client, "test-workspace", sessions_dir)
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
- assert "disk-only-session" in disk_only
- assert "graph-session-1" not in disk_only
+ assert "disk-only-session" in scan.candidate_ids
+ assert "graph-sess" in scan.candidate_ids
+ assert len(scan.candidate_ids) == 2
+ assert len(scan.disk_only_ids) == 1
- def test_disk_only_skips_subsession_dirs(self):
- """Directories starting with 0000000000000000 are skipped (subsessions)."""
+ def test_disk_only_INCLUDES_subsession_dirs(self):
+ """Subsession directories (0000… prefix) ARE included — §D.1 subsession inclusion.
+
+ The old 0000… skip silently dropped real captures; this test pins the
+ corrected behaviour: subsessions have captures and must be counted.
+ """
from context_intelligence.reconstruct.discover import discover_sessions
client = self._make_client([])
with tempfile.TemporaryDirectory() as tmpdir:
sessions_dir = Path(tmpdir)
- # Subsession directory - should be skipped
- (sessions_dir / "0000000000000000-abc123_some-agent").mkdir()
- # Normal session directory - should be included
- (sessions_dir / "real-session-abc").mkdir()
+ # Subsession capture — MUST be included
+ _make_capture(sessions_dir, "0000000000000000-abc123_some-agent")
+ # Normal root-session capture — also included
+ _make_capture(sessions_dir, "real-session-abc")
- _, disk_only = discover_sessions(client, "test-workspace", sessions_dir)
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
- assert "0000000000000000-abc123_some-agent" not in disk_only
- assert "real-session-abc" in disk_only
+ assert "0000000000000000-abc123_some-agent" in scan.disk_only_ids
+ assert "real-session-abc" in scan.disk_only_ids
- def test_disk_only_empty_when_no_sessions_dir(self):
- """When sessions_dir does not exist, disk_only_ids is empty."""
+ def test_bare_dir_without_events_jsonl_excluded(self):
+ """A directory without context-intelligence/events.jsonl is NOT a capture."""
+ from context_intelligence.reconstruct.discover import discover_sessions
+
+ client = self._make_client([])
+ with tempfile.TemporaryDirectory() as tmpdir:
+ sessions_dir = Path(tmpdir)
+ # Bare dir — no events.jsonl
+ (sessions_dir / "bare-dir").mkdir()
+ # Capture with correct marker
+ _make_capture(sessions_dir, "real-capture")
+
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
+
+ assert "bare-dir" not in scan.disk_only_ids
+ assert "bare-dir" not in scan.candidate_ids
+ assert "real-capture" in scan.disk_only_ids
+
+ def test_amplifier_core_bait_dir_excluded(self):
+ """Amplifier-core sessions/<session_id>/metadata.json (no context-intelligence/) are excluded."""
+ from context_intelligence.reconstruct.discover import discover_sessions
+
+ client = self._make_client([])
+ with tempfile.TemporaryDirectory() as tmpdir:
+ sessions_dir = Path(tmpdir)
+ # Amplifier-core bait: has metadata.json but no context-intelligence/events.jsonl
+ bait_dir = sessions_dir / "core-session-bait"
+ bait_dir.mkdir()
+ (bait_dir / "metadata.json").write_text("{}")
+ # Capture with correct marker
+ _make_capture(sessions_dir, "real-capture")
+
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
+
+ assert "core-session-bait" not in scan.candidate_ids
+ assert "real-capture" in scan.candidate_ids
+
+ def test_absent_root_returns_root_exists_false(self):
+ """When sessions_dir does not exist, root_exists is False — §D.3 absent-root guard."""
from context_intelligence.reconstruct.discover import discover_sessions
client = self._make_client([])
nonexistent = Path("/tmp/this-path-should-not-exist-8675309")
- _, disk_only = discover_sessions(client, "test-workspace", nonexistent)
- assert disk_only == []
+ _, scan = discover_sessions(client, "test-workspace", nonexistent)
+
+ assert scan.root_exists is False
+ assert scan.disk_only_ids == []
+ assert scan.candidate_ids == []
+
+ def test_absent_root_is_distinct_from_found_zero(self):
+ """root_exists=False is distinct from root_exists=True with no captures."""
+ from context_intelligence.reconstruct.discover import discover_sessions
+
+ client = self._make_client([])
+
+ # Absent root
+ nonexistent = Path("/tmp/this-path-should-not-exist-8675309")
+ _, absent_scan = discover_sessions(client, "test-workspace", nonexistent)
+
+ # Existing root but no captures
+ with tempfile.TemporaryDirectory() as tmpdir:
+ _, empty_scan = discover_sessions(client, "test-workspace", Path(tmpdir))
+
+ assert absent_scan.root_exists is False
+ assert empty_scan.root_exists is True
+ assert empty_scan.disk_only_ids == []
+
+ def test_found_zero_root_exists_true(self):
+ """An existing root with no captures: root_exists=True and empty lists."""
+ from context_intelligence.reconstruct.discover import discover_sessions
+
+ client = self._make_client([])
+ with tempfile.TemporaryDirectory() as tmpdir:
+ _, scan = discover_sessions(client, "test-workspace", Path(tmpdir))
+
+ assert scan.root_exists is True
+ assert scan.disk_only_ids == []
+ assert scan.candidate_ids == []
def test_disk_only_is_list_of_strings(self):
- """disk_only_ids is a list of strings (directory names)."""
+ """disk_only_ids is a list of strings (session IDs)."""
from context_intelligence.reconstruct.discover import discover_sessions
client = self._make_client([])
with tempfile.TemporaryDirectory() as tmpdir:
sessions_dir = Path(tmpdir)
- (sessions_dir / "some-session").mkdir()
- _, disk_only = discover_sessions(client, "test-workspace", sessions_dir)
- assert isinstance(disk_only, list)
- assert all(isinstance(s, str) for s in disk_only)
+ _make_capture(sessions_dir, "some-session")
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
+
+ assert isinstance(scan.disk_only_ids, list)
+ assert all(isinstance(s, str) for s in scan.disk_only_ids)
def test_empty_sessions_dir(self):
- """An empty sessions_dir returns no disk_only_ids."""
+ """An empty sessions_dir returns empty lists (root_exists=True)."""
from context_intelligence.reconstruct.discover import discover_sessions
client = self._make_client([])
with tempfile.TemporaryDirectory() as tmpdir:
- _, disk_only = discover_sessions(client, "test-workspace", Path(tmpdir))
- assert disk_only == []
+ _, scan = discover_sessions(client, "test-workspace", Path(tmpdir))
+ assert scan.root_exists is True
+ assert scan.disk_only_ids == []
+ assert scan.candidate_ids == []
def test_files_in_sessions_dir_not_included(self):
"""Regular files (not directories) in sessions_dir are not included."""
@@ -290,5 +425,6 @@ def test_files_in_sessions_dir_not_included(self):
with tempfile.TemporaryDirectory() as tmpdir:
sessions_dir = Path(tmpdir)
(sessions_dir / "not-a-session.txt").write_text("hello")
- _, disk_only = discover_sessions(client, "test-workspace", sessions_dir)
- assert disk_only == []
+ _, scan = discover_sessions(client, "test-workspace", sessions_dir)
+ assert scan.disk_only_ids == []
+ assert scan.candidate_ids == []
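
Note on the contract these tests pin down: the following is a minimal sketch of a disk scan that would satisfy the assertions above, not the actual code in context_intelligence.reconstruct.discover (which this diff does not show). The `DiskScanResult` fields mirror the four keyword arguments the tests construct; the `scan_captures` helper, its signature, and the sorted ordering of IDs are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class DiskScanResult:
    """Stand-in with the four fields the tests construct (assumed shape)."""

    root: Path
    root_exists: bool
    disk_only_ids: list[str] = field(default_factory=list)
    candidate_ids: list[str] = field(default_factory=list)


def scan_captures(sessions_dir: Path, graph_ids: set[str]) -> DiskScanResult:
    """Hypothetical helper: find captures under *sessions_dir*.

    A directory counts as a capture only if it contains
    context-intelligence/events.jsonl. Subsession directories (0000... prefix)
    get no special treatment, so they are included when they carry the marker.
    """
    # Absent root is reported explicitly instead of looking like "found zero".
    if not sessions_dir.is_dir():
        return DiskScanResult(root=sessions_dir, root_exists=False)

    # The events.jsonl marker is the sole discriminator; bare directories,
    # metadata.json-only directories, and regular files all fail this check.
    candidates = sorted(
        entry.name
        for entry in sessions_dir.iterdir()
        if (entry / "context-intelligence" / "events.jsonl").is_file()
    )
    # disk_only_ids is the subset of candidates the graph does not know about.
    disk_only = [sid for sid in candidates if sid not in graph_ids]
    return DiskScanResult(
        root=sessions_dir,
        root_exists=True,
        disk_only_ids=disk_only,
        candidate_ids=candidates,
    )
```

Under this sketch, `candidate_ids` is always a superset of `disk_only_ids`, and a caller can tell a missing sessions root from an empty one by checking `root_exists` before reading either list.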