diff --git a/agents/session-navigator.md b/agents/session-navigator.md index 19a4ba64..7bc35ee0 100644 --- a/agents/session-navigator.md +++ b/agents/session-navigator.md @@ -8,7 +8,7 @@ meta: description: | MUST NOT be invoked directly by external callers. ALWAYS delegated to by graph-analyst when the graph server is unreachable or returns 0 sessions. - Local fallback agent for navigating session data via flat JSONL files using bash/jq/grep safe extraction patterns. Handles session discovery, event search, and session navigation across ~/.amplifier/projects/ when the context-intelligence graph server is unavailable. + Local fallback agent for navigating session data via flat JSONL files using bash/jq/grep safe extraction patterns. Handles session discovery, event search, and session navigation under the root resolved from `CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"` when the context-intelligence graph server is unavailable. This agent is NOT called directly by external callers. It is only delegated to by graph-analyst when the graph server is unreachable or returns 0 sessions. External callers should use graph-analyst instead. @@ -17,7 +17,7 @@ meta: Context: Graph analyst delegating because server is unreachable user: [graph-analyst delegates] 'Find tool errors in session abc123 — graph server is unreachable. Workspace: my-project' - assistant: 'I will scope search to workspace my-project. I will look in ~/.amplifier/projects/my-project/sessions/ first, then filter by workspace field if needed. I will search for tool errors using safe jq extraction patterns.' + assistant: 'I will scope search to workspace my-project. I will first resolve CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}", then look in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/ first, then filter by workspace field if needed. 
I will search for tool errors using safe jq extraction patterns.' session-navigator receives workspace from graph-analyst and uses it to scope all directory lookups and field filters. External callers should never invoke session-navigator directly. @@ -113,13 +113,48 @@ You are `session-navigator` — the local JSONL fallback navigation agent for th **No server tools:** You do NOT have `graph_query` or `blob_read` tools. You operate entirely on local filesystem files using bash/jq/grep safe extraction patterns. Never attempt to use server tools — they are not available in your tool set. -**Storage path convention:** All session data lives at: +**Root resolution — MANDATORY FIRST STEP before any discovery:** + +Resolve the root once at the start of every context_intelligence session navigation operation: + +```bash +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +``` + +The on-disk layout is: ``` -~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/events.jsonl -~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/metadata.json +$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/events.jsonl +$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/metadata.json ``` +> **⛔ MARKER RULE — the defect this fixes:** Every discovery glob MUST include the +> `context-intelligence/` path segment and MUST NOT stop at `sessions//`: +> +> ``` +> CORRECT: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl +> WRONG: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/metadata.json # catches Amplifier core's files +> ``` +> +> **Why:** Amplifier core writes `sessions//metadata.json` with NO `context-intelligence/` +> segment. Globbing one level too shallow latches onto core's files and produces a confident +> wrong count. 
+> +> **Canonical marker = `events.jsonl`.** The Python readers (`discover.py`, the workflow recipe) +> treat `context-intelligence/events.jsonl` as the single discriminator of a real capture. +> `metadata.json` is read here only for its fields; both files are written together, so either +> glob keeps the `context-intelligence/` segment and avoids the false-positive. For a strict +> capture count that matches the code, glob `events.jsonl`. + +> **⛔ RELOCATION RULE:** The root comes ONLY from `AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH` +> (unset → `$HOME/.amplifier/projects`). The hook's `config.base_path` moves the *writer* but +> NOT these readers — if a capture seems missing, confirm the writer was relocated via the env +> var, not `config.base_path` alone. + +> **⛔ FAIL-LOUD RULE:** When zero captures are found, say exactly `"looked in , found 0"` — +> never report a confident count from a shallower glob, never silently fall back to a different +> path. + Every `events.jsonl` line and every `metadata.json` file contains a `workspace` field. The graph-analyst will pass the active workspace when it delegates to you. **Always scope your search to that workspace.** ### Workspace Scoping — Do This First @@ -129,15 +164,23 @@ When a workspace is provided by the caller, apply it immediately before any othe **Step 1 — Try directory-first lookup** (fast, covers the common case where workspace equals the project slug): ```bash -ls ~/.amplifier/projects/{WORKSPACE}/sessions/ 2>/dev/null +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +ls "$CONTEXT_INTELLIGENCE_ROOT"/{WORKSPACE}/sessions/ 2>/dev/null ``` -If this directory exists and contains sessions, work within it exclusively. +> **Guard:** A `sessions//` directory entry only counts as a context_intelligence capture +> when `sessions//context-intelligence/` also exists. 
`ls sessions/` may list directories +> from Amplifier core with no `context-intelligence/` subdir — do not count those as +> context_intelligence sessions. + +If this directory exists and contains sessions with `context-intelligence/` subdirs, work within it exclusively. **Step 2 — If that directory is empty or missing**, the workspace was set explicitly and differs from the project slug. Scan across all project directories and filter by the `workspace` field in `metadata.json`: ```bash -for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.workspace == "{WORKSPACE}") | input_filename' "$f" 2>/dev/null done ``` @@ -155,26 +198,35 @@ done Find sessions by ID, project slug, date, or agent name, always scoped to the provided workspace. 
```bash +# Resolve root first (required before any snippet below) +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" + # List sessions in a workspace (directory-first path) -for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r '[.session_id, .workspace, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null done | sort -t$'\t' -k4 # List sessions scoped by workspace field (cross-project scan) -for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.workspace == "my-project") | [.session_id, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null done | sort -t$'\t' -k3 # Find a session by partial ID (within a workspace) -find ~/.amplifier/projects/my-project/sessions -maxdepth 1 -name "*PARTIAL_ID*" -type d +# NOTE: a sessions// directory only counts as a context_intelligence capture when +# sessions//context-intelligence/ also exists. 
Always confirm the subdir: +find "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions -maxdepth 1 -name "*PARTIAL_ID*" -type d \ + | while read -r d; do [ -d "$d/context-intelligence" ] && echo "$d"; done # Find sessions by agent name within a workspace -for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.agent_name == "TARGET_AGENT") | .session_id' "$f" 2>/dev/null done # Confirm the workspace of a specific session -jq -r '.workspace' ~/.amplifier/projects/my-project/sessions/SESSION_ID/context-intelligence/metadata.json +jq -r '.workspace' "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/SESSION_ID/context-intelligence/metadata.json ``` ### Event Search @@ -207,12 +259,16 @@ wc -l < events.jsonl Trace parent-child chains via `parent_id`, trace delegation trees via `delegate:agent_spawned`/`delegate:agent_completed`. ```bash +# Resolve root first (required before any snippet below) +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" + # Check if session is root or child, and confirm its workspace jq -r '{parent_id, workspace, status}' metadata.json # Find child sessions within a workspace PARENT_ID="YOUR_SESSION_ID_HERE" -for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r "select(.parent_id == \"$PARENT_ID\") | [.session_id, .agent_name // \"(root)\", .status, .workspace] | join(\"\t\")" "$f" 2>/dev/null done @@ -237,8 +293,9 @@ Since session-navigator is active when no server is configured, you must locate 2. 
Or read from bundle config YAML under `hook-context-intelligence.config`: `context_intelligence_server_url` and `context_intelligence_api_key` ```bash +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" context-intelligence-upload \ - --path ~/.amplifier/projects/my-project \ + --path "$CONTEXT_INTELLIGENCE_ROOT"/my-project \ --server-url "https://your-server.example.com" \ --api-key "your-api-key" ``` diff --git a/behaviors/context-intelligence-logging.yaml b/behaviors/context-intelligence-logging.yaml index 58851f87..0fa45e78 100644 --- a/behaviors/context-intelligence-logging.yaml +++ b/behaviors/context-intelligence-logging.yaml @@ -40,7 +40,30 @@ hooks: - delegate:agent_completed - delegate:agent_cancelled - delegate:error - # base_path: ~/.amplifier/projects (auto-resolved; uncomment to override) + # base_path — relocation root for ALL captures. + # + # This binding is how relocation reaches the readers too: discover.py, the + # workflow recipe, and the navigation skills resolve their root ONLY from + # the AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH env var. Binding base_path to + # the same var keeps the writer and readers in lock-step. + # + # REQUIRES the host app to expand ${VAR:default} placeholders before mount + # (the hook does not self-expand — same contract as url/api_key above). The + # Amplifier app-cli does this. On a host that does NOT expand the ${VAR:} + # (colon-empty-default) form, the unexpanded literal falls back to the + # default SILENTLY, and the §C.3 consistency check in on_session_ready warns + # LOUD if the var was actually set — so a broken binding is never silent. + # + # ⚠ Do NOT relocate by hard-coding a path here instead of the env var: + # the writer would move but the env-only readers would not follow, and they + # would report "found 0". Always relocate via the env var. 
+ # + # ⚠ Relocation granularity is PER-PROCESS, not per-session: every session in + # one process shares the one env value. A host that needs different roots per + # session must use separate processes. At session start the hook logs (INFO) + # the active capture root whenever relocation is in effect, so you can confirm + # it took effect. + base_path: "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}" # project_slug: (auto-resolved from working directory; uncomment to override) # exclude_events: [] (optional fnmatch patterns; uncomment and list events to suppress) # diff --git a/bundle.dot b/bundle.dot index 2fdb4366..b9ddc67d 100644 --- a/bundle.dot +++ b/bundle.dot @@ -8,7 +8,7 @@ digraph context_intelligence { nodesep=0.6 ranksep=0.7 bgcolor="white" - source_hash="ff1d6a9692415633ceafa3389694c17aa82f8f3d8296b1f7c440c9fb815ea5d2" + source_hash="18e3046d9999d7c21d8a948dbb2f0e83ae35c8b72a73e2e2793227571f9c088c" node [fontname="Helvetica", fontsize=11, style="filled,rounded"] edge [fontname="Helvetica", fontsize=9] @@ -23,8 +23,8 @@ digraph context_intelligence { beh_context_intelligence_analysis_behavior [label="context-intelligence-analysis-behavior\n1 tools\n~864 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] beh_context_intelligence_design_behavior [label="context-intelligence-design-behavior\n1 tools\n~331 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] - beh_context_intelligence_logging_behavior [label="context-intelligence-logging-behavior\n1 tools\n~1184 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] - beh_context_intelligence_navigation_behavior [label="context-intelligence-navigation-behavior\n2 tools\n~561 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] + beh_context_intelligence_logging_behavior [label="context-intelligence-logging-behavior\n1 tools\n~1661 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] + beh_context_intelligence_navigation_behavior 
[label="context-intelligence-navigation-behavior\n2 tools\n~617 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] beh_context_intelligence_behavior [label="context-intelligence-behavior\n~236 tok", shape=box, fillcolor="#e0f2f1", style="filled,rounded"] } @@ -37,7 +37,7 @@ digraph context_intelligence { agt_context_intelligence_design_facilitator [label="context-intelligence-design-facilitator\n~187 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"] agt_context_intelligence_tool_designer [label="context-intelligence-tool-designer\n~198 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"] agt_graph_analyst [label="graph-analyst\n~543 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"] - agt_session_navigator [label="session-navigator\n~366 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"] + agt_session_navigator [label="session-navigator\n~422 tok desc", shape=box, fillcolor="#c8e6c9", style="filled,rounded"] } subgraph cluster_modules { diff --git a/bundle.png b/bundle.png index 9723a87d..64b0a5aa 100644 Binary files a/bundle.png and b/bundle.png differ diff --git a/context_intelligence/__init__.py b/context_intelligence/__init__.py index 007f706e..993a42fd 100644 --- a/context_intelligence/__init__.py +++ b/context_intelligence/__init__.py @@ -25,6 +25,7 @@ resolve_config, ) from context_intelligence.reconstruct import ( + DiskScanResult, build_disk_only_metadata, discover_sessions, extract_events, @@ -45,6 +46,7 @@ "extract_transcript", "extract_metadata", "build_disk_only_metadata", + "DiskScanResult", "discover_sessions", "workspace_slug", "sessions_dir_for_project", diff --git a/context_intelligence/config.py b/context_intelligence/config.py index d53bdcb5..3efc4574 100644 --- a/context_intelligence/config.py +++ b/context_intelligence/config.py @@ -29,6 +29,10 @@ AMPLIFIER_DIR = Path.home() / ".amplifier" SETTINGS_PATH = AMPLIFIER_DIR / "settings.yaml" +#: The ONE canonical reader-side 
default root for context-intelligence captures. +#: All readers fall back to this when AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH is unset. +DEFAULT_BASE_PATH = Path.home() / ".amplifier" / "projects" + # --------------------------------------------------------------------------- # Shared env-var helpers (used by HookConfigResolver and ToolConfigResolver) @@ -50,6 +54,141 @@ def _env(suffix: str) -> str | None: return value if value else None +def canonicalize_base_path(raw: str | Path | None) -> Path: + """Canonicalise a raw base-path value to a **guaranteed absolute** :class:`Path`. + + Four rules applied in order (§D.2): + + 1. Convert to string and strip whitespace. ``None`` → empty string. + 2. Empty string → :data:`DEFAULT_BASE_PATH` (never anchored to CWD). + 3. Expand ``~`` via :meth:`~pathlib.Path.expanduser`. + 4. If the result is still relative → warn and fall back to + :data:`DEFAULT_BASE_PATH`. Relative paths are invalid: each OS process + has its own CWD, so a relative root produces *different directories* for + different processes even when the string is byte-identical. + + No ``os.path.normpath`` or CWD-anchoring — pathlib already drops trailing + slashes; absoluteness, not normalisation, is the load-bearing property. + + .. important:: **Duplicated by design.** The fold gate forbids the hook's + ``config_resolver.py`` from importing this package, so the SAME rules are + inlined in ``HookConfigResolver.base_path``. The two copies MUST stay + byte-equivalent; ``tests/test_base_path_parity.py`` pins writer ≡ reader. + If you edit one, edit the other and the parity test. + + Parameters + ---------- + raw: + A raw string, :class:`~pathlib.Path`, or ``None``. + + Returns + ------- + Path + An absolute :class:`~pathlib.Path`. Never relative, never empty. 
+ """ + s = str(raw).strip() if raw is not None else "" + if not s: + return DEFAULT_BASE_PATH + p = Path(s).expanduser() + if not p.is_absolute(): + log.warning( + "base_path %r is not absolute; using default %s", + s, + DEFAULT_BASE_PATH, + ) + return DEFAULT_BASE_PATH + return p + + +def context_intelligence_base_path() -> Path: + """Reader-side root for context-intelligence captures. + + Reads ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` from the environment via + :func:`_env` (which returns ``None`` for both unset and empty) and passes the + result through :func:`canonicalize_base_path`, which guarantees an absolute + path and falls back to :data:`DEFAULT_BASE_PATH` for empty or relative values. + + Mirrors the shell idiom:: + + ${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-~/.amplifier/projects} + + This helper is **gate-safe**: it lives in ``config.py``, which already imports + ``os``. It does **not** touch ``config_resolver.py`` (fold-discipline gate). + """ + return canonicalize_base_path(_env("BASE_PATH")) + + +def reader_writer_roots_disagree( + env_raw: str | None, + writer_base_path: str | Path, +) -> tuple[bool, Path, Path]: + """Compare the reader root against the writer root (§C.3 consistency check). + + Pure, side-effect-free core of the startup consistency check in the hook's + ``on_session_ready``. Extracted here (rather than left inline) so the + divergence condition is **unit-testable** without importing the hook package + (which needs ``amplifier_core`` at import time). + + Both operands pass through the SAME :func:`canonicalize_base_path`, so the + comparison is symmetric: a relocated *writer* whose root the env-only readers + cannot see produces ``disagree=True``. + + Parameters + ---------- + env_raw: + The raw ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` value (``None`` when + unset/empty) — exactly what every reader resolves from. + writer_base_path: + The writer's resolved ``base_path`` (e.g. ``resolver.base_path``). 
+ + Returns + ------- + tuple[bool, Path, Path] + ``(disagree, reader_root, writer_root)``. ``disagree`` is ``True`` when + the canonicalized roots differ — the caller should then warn LOUD. + """ + reader_root = canonicalize_base_path(env_raw) + writer_root = canonicalize_base_path(str(writer_base_path)) + return reader_root != writer_root, reader_root, writer_root + + +# --------------------------------------------------------------------------- +# Shared capture-path helpers (canonical capture definition — §D.1) +# --------------------------------------------------------------------------- + +#: Fixed-shape glob (relative to a ``sessions/`` directory) that matches +#: exactly the files the writer produces. One capture = +#: ``//context-intelligence/events.jsonl``. +#: +#: The ``events.jsonl`` **file** is the discriminator — a bare +#: ``context-intelligence/`` directory without the file is not a recoverable +#: capture and must not be counted. Amplifier core's +#: ``sessions//metadata.json`` has no ``context-intelligence/`` segment and +#: is excluded by construction. +CAPTURE_GLOB = "*/context-intelligence/events.jsonl" + + +def capture_paths_under_sessions_dir(sessions_dir: Path) -> list[Path]: + """Return all capture paths under a project ``sessions/`` directory. + + Uses the fixed-shape :data:`CAPTURE_GLOB` — **not** a recursive ``**`` + glob — so only the writer's real output layout is matched. + + Parameters + ---------- + sessions_dir: + The ``//sessions`` directory to scan. + + Returns + ------- + list[Path] + Sorted list of ``events.jsonl`` file paths, one per qualifying session + (including subsessions, which are flat siblings under ``sessions/``). + ``session_id`` for any path ``p`` is ``p.parent.parent.name``. 
+ """ + return sorted(sessions_dir.glob(CAPTURE_GLOB)) + + # --------------------------------------------------------------------------- # Shell-style placeholder expander (used by ToolConfigResolver) # --------------------------------------------------------------------------- diff --git a/context_intelligence/reconstruct/__init__.py b/context_intelligence/reconstruct/__init__.py index e396d32c..51ece5d8 100644 --- a/context_intelligence/reconstruct/__init__.py +++ b/context_intelligence/reconstruct/__init__.py @@ -33,6 +33,7 @@ from __future__ import annotations from context_intelligence.reconstruct.discover import ( + DiskScanResult, discover_sessions, sessions_dir_for_project, workspace_slug, @@ -49,6 +50,7 @@ "extract_transcript", "extract_metadata", "build_disk_only_metadata", + "DiskScanResult", "discover_sessions", "workspace_slug", "sessions_dir_for_project", diff --git a/context_intelligence/reconstruct/discover.py b/context_intelligence/reconstruct/discover.py index e0c73156..8337b062 100644 --- a/context_intelligence/reconstruct/discover.py +++ b/context_intelligence/reconstruct/discover.py @@ -3,7 +3,7 @@ Discovers sessions for a workspace by querying the context-intelligence graph and scanning the local filesystem for sessions not yet in the graph. -Level 2 — Network I/O (queries the CI graph server via CIClient). +Level 2 — Network I/O (queries the context-intelligence graph server via CIClient). Extracted from prototype scripts/ci-reconstruct-sessions.py (lines 104-111, 1051-1084). 
""" @@ -12,10 +12,14 @@ import logging import os +from dataclasses import dataclass, field from pathlib import Path from context_intelligence.client import CIClient -from context_intelligence.config import AMPLIFIER_DIR +from context_intelligence.config import ( + capture_paths_under_sessions_dir, + context_intelligence_base_path, +) log = logging.getLogger("context_intelligence.reconstruct.discover") @@ -51,9 +55,13 @@ def workspace_slug(project_dir: str) -> str: def sessions_dir_for_project(project_dir: str) -> Path: """Return the sessions directory for a project. - Builds the path: ``AMPLIFIER_DIR / 'projects' / slug / 'sessions'``, + Builds the path: ``context_intelligence_base_path() / slug / 'sessions'``, where *slug* is derived via :func:`workspace_slug`. + The base path honours ``AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`` when + set (and guarantees an absolute result via the canonicalizer); falls back + to :data:`~context_intelligence.config.DEFAULT_BASE_PATH`. + Parameters ---------- project_dir: @@ -65,18 +73,70 @@ def sessions_dir_for_project(project_dir: str) -> Path: Path to the sessions directory for the project. """ slug = workspace_slug(project_dir) - return AMPLIFIER_DIR / "projects" / slug / "sessions" + return context_intelligence_base_path() / slug / "sessions" + + +@dataclass(frozen=True) +class DiskScanResult: + """Tagged result from a disk scan — prevents silent misreading of absent roots. + + The two distinct fail-loud states ride the **return value**, not only the + log, so callers must branch on :attr:`root_exists` rather than treating an + empty :attr:`disk_only_ids` as "success". + + Attributes + ---------- + root: + The ``sessions/`` directory that was (or was attempted to be) scanned. + root_exists: + ``True`` when ``root`` existed at scan time; ``False`` when the directory + was absent or was a typo'd / missing relocated root. A ``False`` value + means **the scan was impossible** — it is NOT equivalent to + "found zero captures". 
+ disk_only_ids: + Session IDs present on disk (via the canonical + ``events.jsonl`` marker) but absent from the graph. + Computed as ``candidate_ids − graph_ids``. + Empty list when ``root_exists`` is ``False``. + candidate_ids: + The full capture-candidate set — all session IDs that have an + ``events.jsonl`` file under ``root``, regardless of graph membership. + Subsessions (directories whose name begins with ``0000000000000000``) + ARE included; the ``events.jsonl`` marker is the sole discriminator. + Empty list when ``root_exists`` is ``False``. + """ + + root: Path + root_exists: bool + disk_only_ids: list[str] = field(default_factory=list) + candidate_ids: list[str] = field(default_factory=list) def discover_sessions( client: CIClient, workspace: str, sessions_dir: Path, -) -> tuple[list[dict], list[str]]: +) -> tuple[list[dict], DiskScanResult]: """Discover all sessions for a workspace from the graph and disk. Queries the context-intelligence graph for all sessions in *workspace*, - then scans *sessions_dir* for session directories not present in the graph. + then scans *sessions_dir* for captures (``events.jsonl`` files) not present + in the graph. + + The canonical capture definition (§D.1) is + ``//context-intelligence/events.jsonl`` — a + **fixed-shape** glob, **not** recursive. Subsessions (flat siblings whose + name begins with ``0000000000000000``) **are** included; they are real + captures that were silently dropped by the old ``0000…`` skip. + + Two distinct fail-loud states ride the **return value**: + + - ``root_exists=False`` — the sessions directory was absent or a typo'd + relocated root. ``disk_only_ids`` and ``candidate_ids`` are both empty. + This is **not** the same as "found zero captures" and the caller MUST + branch on it. + - ``root_exists=True, candidate_ids=[]`` — the directory exists but contains + no qualifying ``events.jsonl`` captures. 
Parameters ---------- @@ -89,15 +149,15 @@ def discover_sessions( Returns ------- - tuple[list[dict], list[str]] + tuple[list[dict], DiskScanResult] A 2-tuple of: - **graph_sessions** — list of row dicts from the graph query, each with keys ``s.node_id``, ``s.status``, ``s.started_at``, ``s.ended_at``, ordered by ``s.started_at``. - - **disk_only_ids** — list of session directory names found on disk - but absent from the graph (subsession directories starting with - ``0000000000000000`` are excluded). + - **scan** — :class:`DiskScanResult` with tagged disk-scan outcome. + Callers must branch on ``scan.root_exists`` before interpreting + ``scan.disk_only_ids``. """ rows = client.cypher( f'MATCH (s:Session) WHERE s.workspace = "{workspace}" ' @@ -106,21 +166,42 @@ def discover_sessions( workspace=workspace, ) - # Collect graph session IDs + # Collect graph session IDs. graph_ids: set[str] = set() for row in rows: sid = row.get("s.node_id", "") if sid: graph_ids.add(sid) - # Scan disk for session directories not in the graph - disk_only_ids: list[str] = [] - if sessions_dir.is_dir(): - for entry in sorted(sessions_dir.iterdir()): - if entry.is_dir() and entry.name not in graph_ids: - # Skip subsession directories (start with 0000000000000000) - if entry.name.startswith("0000000000000000"): - continue - disk_only_ids.append(entry.name) - - return rows, disk_only_ids + # --- Absent-root guard (§D.3 FIX 4) ------------------------------------ + if not sessions_dir.is_dir(): + log.warning( + "context_intelligence root does not exist: %s", + sessions_dir, + ) + return rows, DiskScanResult( + root=sessions_dir, + root_exists=False, + ) + + # --- Shared capture-candidate set (§D.1) -------------------------------- + # Fixed-shape glob keyed on events.jsonl — the writer's real output. + # Subsessions are flat siblings under sessions/ and ARE counted. 
+ # The events.jsonl marker excludes bare dirs AND Amplifier-core's + # sessions//metadata.json (no context-intelligence/ segment). + capture_paths = capture_paths_under_sessions_dir(sessions_dir) + candidate_ids: list[str] = [p.parent.parent.name for p in capture_paths] + disk_only_ids: list[str] = [sid for sid in candidate_ids if sid not in graph_ids] + + if not candidate_ids: + log.warning( + "looked in %s, found 0 context_intelligence captures", + sessions_dir, + ) + + return rows, DiskScanResult( + root=sessions_dir, + root_exists=True, + disk_only_ids=disk_only_ids, + candidate_ids=candidate_ids, + ) diff --git a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py index b71afe19..f012079e 100644 --- a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py +++ b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/__init__.py @@ -165,6 +165,59 @@ async def on_session_ready(coordinator: Any) -> None: unregister_fns = state["unregister_fns"] destinations: dict[str, Destination] = state["destinations"] + # --- §C.3 mandatory startup consistency check (always-fire, read-only) --- + # Compare what the READERS will compute (canonicalized env var, defaulting + # when unset) against what the WRITER resolved (canonicalized base_path). + # When they disagree the writer and readers target different roots — a silent + # split that this check makes LOUD. Never writes os.environ (multiplexed-safe). + # + # This fires in BOTH directions, covering the two ways relocation can break: + # 1. env SET, writer at a different root — binding did not expand, or a + # config override fought the env var. + # 2. env UNSET, writer NOT at default — someone relocated via + # config.base_path / settings.yaml, which the env-only readers CANNOT + # see (relocation is reader-visible ONLY via the env var). 
The earlier + # `if _env_raw:` guard missed this case entirely. + import os # local import — only this path needs the process env + + from context_intelligence.config import reader_writer_roots_disagree + + _ENV_VAR = "AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH" + _env_raw = os.environ.get(_ENV_VAR) # may be None/empty → readers fall to default + # Pure, unit-tested divergence core (see tests/test_base_path_parity.py). + _disagree, _reader_root, _writer_root = reader_writer_roots_disagree( + _env_raw, resolver.base_path + ) + if _disagree: + log.warning( + "context-intelligence: writer base_path (%s) and reader root (%s) disagree" + " — readers (discover, recipe, navigation skills) resolve the root ONLY from" + " %s, so captures written under %s will be invisible to them." + ' Relocate via the env var (or bind base_path: "${%s:}" in the hook config),' + " not via config.base_path alone.", + _writer_root, + _reader_root, + _ENV_VAR, + _writer_root, + _ENV_VAR, + ) + else: + # Positive confirmation at the operator's surface (default level is INFO). + # Fires ONLY when relocation is actually in effect, so the operator who + # relocated can SEE it took effect — closing the "success and silent + # misconfiguration look identical at the moment of action" gap. Stays + # silent in the default (non-relocated) case so it adds no noise. + from context_intelligence.config import DEFAULT_BASE_PATH + + if _writer_root != DEFAULT_BASE_PATH: + log.info( + "context-intelligence: capturing to %s" + " (readers resolve the same root from %s)." 
+ " Relocation is per-process, not per-session.", + _writer_root, + _ENV_VAR, + ) + # --- Destination selection (C2: working_dir capability ONLY, fail-loud) --- active: dict[str, Destination] = {} match_key: str = "" diff --git a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py index 413031bd..4f6ed8e1 100644 --- a/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py +++ b/modules/hook-context-intelligence/amplifier_module_hook_context_intelligence/config_resolver.py @@ -239,7 +239,32 @@ def base_path(self) -> Path: """Resolved base path for project storage. Chain: config['base_path'] → coordinator.config['base_path'] → default. - Tilde is expanded. Result is cached after first access. + Result is cached after first access. + + Canonicalisation rules (§D.2 — identical to reader-side + ``canonicalize_base_path`` in ``context_intelligence.config``; inlined + here to keep zero hook→reader-package coupling and the fold gate green): + + 0. **Unexpanded-placeholder guard.** A value that still looks like a + shell placeholder (``${...}``) means the host app did NOT expand the + ``base_path: "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}"`` binding + (the hook relies on the app layer to expand ``${VAR}`` before mount, + exactly as it does for ``url`` / ``api_key`` / ``exclude_events``). + Rather than treat the literal ``${...}`` as a bogus relative path and + warn on every session, fall back to the default **silently**. The + env-reading readers still relocate via the variable directly, and the + §C.3 startup consistency check in ``on_session_ready`` fires LOUD if + the variable was actually set (i.e. real relocation was intended but + the binding did not expand). + 1. Strip whitespace from the raw string. + 2. Empty string → ``_DEFAULT_BASE_PATH`` (never anchored to CWD). + 3. 
Expand ``~`` via :meth:`~pathlib.Path.expanduser`. + 4. If the result is still relative → warn, fall back to default. + + Both the writer (this property) and readers call the same rules so + canonicalized paths are byte-identical regardless of which side resolves + them. The §D.2 contract test drives the real property and asserts + writer ≡ reader and always absolute. """ if self._base_path is None: raw = ( @@ -247,7 +272,28 @@ def base_path(self) -> Path: or self._coordinator_config_get("base_path") or _DEFAULT_BASE_PATH ) - self._base_path = Path(raw).expanduser() + # §D.2 canonicalizer (inline — no os, no import os, fold-gate safe). + # DUPLICATED BY DESIGN: the byte-equivalent reader copy is + # `canonicalize_base_path` in `context_intelligence.config`. The fold + # gate forbids importing that package here, so the two MUST be kept in + # sync by hand; `tests/test_base_path_parity.py` pins writer ≡ reader. + # Edit one → edit the other and the parity test. + s = str(raw).strip() + if not s or s.startswith("${"): + # Empty, OR an unexpanded ${VAR} placeholder (host app did not + # expand the binding). Either way → default, no noise. §D.2 rule 0. + self._base_path = Path(_DEFAULT_BASE_PATH).expanduser() + else: + p = Path(s).expanduser() + if not p.is_absolute(): + log.warning( + "base_path %r is not absolute; using default %s", + s, + _DEFAULT_BASE_PATH, + ) + self._base_path = Path(_DEFAULT_BASE_PATH).expanduser() + else: + self._base_path = p return self._base_path @property diff --git a/modules/hook-context-intelligence/tests/test_base_path_consistency.py b/modules/hook-context-intelligence/tests/test_base_path_consistency.py new file mode 100644 index 00000000..d8c8712c --- /dev/null +++ b/modules/hook-context-intelligence/tests/test_base_path_consistency.py @@ -0,0 +1,101 @@ +# pyright: reportMissingImports=false +# (pytest / amplifier_core are runtime/CI deps not visible to the static checker here.) 
+"""End-to-end proof of the §C.3 base_path consistency warning (Restless-Old-Brian gate). + +These tests drive the REAL ``on_session_ready`` against the REAL ``amplifier_core`` +runtime (not the resolver in isolation) and assert the loud divergence warning +actually fires — closing the gap that the writer/consistency branches were only +ever verified by hand/parity. + +NOTE: like the rest of this module's tests, these require ``amplifier_core`` to be +importable (it is in the Amplifier tool venv / CI, but NOT in the bundle's own +isolated ``.venv``). They are skipped automatically if the runtime is unavailable. +""" + +from __future__ import annotations + +import logging + +import pytest + +pytest.importorskip("amplifier_core", reason="amplifier_core runtime not installed in this venv") + +from tests.helpers import make_lifecycle_coordinator, mount_and_ready # noqa: E402 + +_ENV_VAR = "AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH" + + +def _messages(caplog) -> str: + return "\n".join(r.getMessage() for r in caplog.records) + + +class TestConsistencyWarningEndToEnd: + """Real on_session_ready, real runtime — watch the warning fire (or stay silent).""" + + async def test_warns_when_writer_relocated_but_env_unset(self, caplog, monkeypatch) -> None: + """Scenario A — the trap: relocate via config.base_path, env var UNSET. + + Readers resolve the root only from the env var (-> default), the writer + resolved /tmp/relocated-ci-A. on_session_ready MUST warn LOUD. 
+ """ + monkeypatch.delenv(_ENV_VAR, raising=False) + coordinator = make_lifecycle_coordinator() + with caplog.at_level(logging.WARNING): + await mount_and_ready(coordinator, config={"base_path": "/tmp/relocated-ci-A"}) + msgs = _messages(caplog) + assert "disagree" in msgs, f"expected a divergence warning; got:\n{msgs}" + assert "/tmp/relocated-ci-A" in msgs + + async def test_silent_when_env_matches_writer(self, caplog, monkeypatch) -> None: + """Scenario C (positive control) — env set to the same root the writer uses → NO warning.""" + monkeypatch.setenv(_ENV_VAR, "/tmp/relocated-ci-C") + coordinator = make_lifecycle_coordinator() + with caplog.at_level(logging.WARNING): + await mount_and_ready(coordinator, config={"base_path": "/tmp/relocated-ci-C"}) + assert "disagree" not in _messages(caplog) + + async def test_unexpanded_placeholder_is_silent_default(self, caplog, monkeypatch) -> None: + """Scenario B — host did NOT expand the ${VAR} binding, env unset. + + Writer falls back to default SILENTLY (no 'not absolute' noise) and, since + readers also default, there is NO divergence warning either. + """ + monkeypatch.delenv(_ENV_VAR, raising=False) + coordinator = make_lifecycle_coordinator() + with caplog.at_level(logging.WARNING): + await mount_and_ready( + coordinator, + config={"base_path": "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}"}, + ) + msgs = _messages(caplog) + assert "not absolute" not in msgs, f"unexpected noisy warning:\n{msgs}" + assert "disagree" not in msgs, f"unexpected divergence warning:\n{msgs}" + + +class TestPositiveConfirmation: + """Operator-visible (default INFO) confirmation of the active capture root.""" + + async def test_confirmation_when_relocated_and_consistent(self, caplog, monkeypatch) -> None: + """Relocation in effect + reader matches writer → INFO 'capturing to ' fires. + + log_level=INFO mirrors the behavior YAML default (``...:INFO``), i.e. the + level a composed-bundle operator actually runs at. 
+ """ + monkeypatch.setenv(_ENV_VAR, "/tmp/relocated-ci-confirm") + coordinator = make_lifecycle_coordinator() + with caplog.at_level(logging.INFO): + await mount_and_ready( + coordinator, + config={"base_path": "/tmp/relocated-ci-confirm", "log_level": "INFO"}, + ) + msgs = _messages(caplog) + assert "capturing to" in msgs, f"expected positive confirmation; got:\n{msgs}" + assert "/tmp/relocated-ci-confirm" in msgs + + async def test_no_confirmation_in_default_case(self, caplog, monkeypatch) -> None: + """No relocation (default root) → stay silent even at INFO, no confirmation noise.""" + monkeypatch.delenv(_ENV_VAR, raising=False) + coordinator = make_lifecycle_coordinator() + with caplog.at_level(logging.INFO): + await mount_and_ready(coordinator, config={"log_level": "INFO"}) + assert "capturing to" not in _messages(caplog) diff --git a/recipes/workflow-pattern-analysis.yaml b/recipes/workflow-pattern-analysis.yaml index aa08001d..bde52092 100644 --- a/recipes/workflow-pattern-analysis.yaml +++ b/recipes/workflow-pattern-analysis.yaml @@ -96,15 +96,30 @@ stages: condition: "{{prescope}} != ''" command: | python3 -c " - import json, glob, os - paths = glob.glob( - os.path.expanduser('~/.amplifier/projects/**/context-intelligence/events.jsonl'), - recursive=True - ) + import json, os + from pathlib import Path + root = Path(os.environ.get('AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH') or os.path.expanduser('~/.amplifier/projects')) + _raw = sorted(root.glob('*/sessions/*/context-intelligence/events.jsonl')) workspace = '{{workspace}}' prescope = '{{prescope}}' + def _ws_match(p, ws): + # Directory-first (fast): slug == workspace. Fall back to the + # workspace FIELD in events.jsonl when workspace was set explicitly + # and differs from the project slug (matches the navigation skills). 
+ if p.parents[3].name == ws: + return True + try: + with open(p) as fh: + # Bound the read so a malformed newline-less file can't be slurped whole, + # but stay well above any real session:start line so a valid (long) first + # event is not truncated into invalid JSON and the session silently dropped. + obj = json.loads(fh.readline(1048576)) + return isinstance(obj, dict) and obj.get('workspace') == ws + except Exception: + return False if workspace: - paths = [p for p in paths if workspace in p] + _raw = [p for p in _raw if _ws_match(p, workspace)] + paths = [str(p) for p in _raw] candidates = [] total_scanned = len(paths) for p in paths: @@ -169,15 +184,30 @@ stages: condition: "{{graph_probe.graph_available}} != 'true'" command: | python3 -c " - import json, glob, os - paths = glob.glob( - os.path.expanduser('~/.amplifier/projects/**/context-intelligence/events.jsonl'), - recursive=True - ) + import json, os + from pathlib import Path + root = Path(os.environ.get('AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH') or os.path.expanduser('~/.amplifier/projects')) + _raw = sorted(root.glob('*/sessions/*/context-intelligence/events.jsonl')) workspace = '{{workspace}}' max_sessions = {{max_sessions}} + def _ws_match(p, ws): + # Directory-first (fast): slug == workspace. Fall back to the + # workspace FIELD in events.jsonl when workspace was set explicitly + # and differs from the project slug (matches the navigation skills). + if p.parents[3].name == ws: + return True + try: + with open(p) as fh: + # Bound the read so a malformed newline-less file can't be slurped whole, + # but stay well above any real session:start line so a valid (long) first + # event is not truncated into invalid JSON and the session silently dropped. 
+ obj = json.loads(fh.readline(1048576)) + return isinstance(obj, dict) and obj.get('workspace') == ws + except Exception: + return False if workspace: - paths = [p for p in paths if workspace in p] + _raw = [p for p in _raw if _ws_match(p, workspace)] + paths = [str(p) for p in _raw] session_ids_str = '{{session_ids}}' try: session_ids = json.loads(session_ids_str) diff --git a/scripts/context-intelligence.py b/scripts/context-intelligence.py index b297df68..eb00e09c 100644 --- a/scripts/context-intelligence.py +++ b/scripts/context-intelligence.py @@ -144,7 +144,14 @@ def cmd_reconstruct(args: argparse.Namespace) -> int: # ── Discover sessions ───────────────────────────────────────────────────── log.info("Discovering sessions for workspace %s ...", workspace) - sessions, disk_only_ids = _ci_discover.discover_sessions(client, workspace, sessions_dir) + sessions, scan = _ci_discover.discover_sessions(client, workspace, sessions_dir) + + # Absent root is a distinct failure — not "found zero" (§D.3) + if not scan.root_exists: + log.error("root not found: %s", scan.root) + return 1 + + disk_only_ids: list[str] = scan.disk_only_ids # Filter to a specific session if requested if args.session: diff --git a/skills/context-intelligence-session-navigation/SKILL.md b/skills/context-intelligence-session-navigation/SKILL.md index 15ebe2af..0aa471ba 100644 --- a/skills/context-intelligence-session-navigation/SKILL.md +++ b/skills/context-intelligence-session-navigation/SKILL.md @@ -16,26 +16,54 @@ For ready-to-use jq/grep recipes, see `context/safe-extraction-patterns.md`. 
## Disk Layout +Resolve the root once before any discovery: + +```bash +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" ``` -~/.amplifier/projects/{project-slug}/sessions/{session_id}/context-intelligence/ + +``` +$CONTEXT_INTELLIGENCE_ROOT/{project-slug}/sessions/{session_id}/context-intelligence/ ├── events.jsonl # one JSON object per line, append-only └── metadata.json # session metadata, written on start, updated on end ``` -- `~/.amplifier/projects/` — default base path (configurable via `config.base_path`) +- `$CONTEXT_INTELLIGENCE_ROOT` — resolved from the idiom `"${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}"` (unset → legacy default). **Relocation is reader-visible ONLY via this env var.** The hook's `config.base_path` moves where the *writer* stores captures, but readers (this skill, `discover.py`, the recipe) resolve the root solely from the env var — so always relocate via `AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH`, not `config.base_path` alone, or readers will look in the wrong place. 
- `{project-slug}` — derived from the full working directory path (see Project Slug Algorithm below) - `{session_id}` — unique session identifier (UUID or UUID with agent suffix for child sessions) - `context-intelligence/` — subdirectory containing the session data files - `events.jsonl` — append-only log of every event the kernel emits during the session - `metadata.json` — compact session metadata for quick lookup without parsing the full event log -Example paths: +Example paths (with root resolved): ``` -~/.amplifier/projects/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/events.jsonl -~/.amplifier/projects/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/metadata.json +$CONTEXT_INTELLIGENCE_ROOT/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/events.jsonl +$CONTEXT_INTELLIGENCE_ROOT/-home-user-myapp/sessions/55c8841a-1234-5678-9abc-def012345678/context-intelligence/metadata.json ``` +> **⚠ MARKER RULE — the defect this prevents:** Every discovery glob MUST include the +> `context-intelligence/` path segment and MUST NOT stop at `sessions//`: +> +> ``` +> CORRECT: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl +> WRONG: "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/metadata.json # catches Amplifier core's files +> ``` +> +> **Why:** Amplifier core writes `sessions//metadata.json` with NO `context-intelligence/` +> segment. Globbing one level too shallow latches onto core's files and produces a confident +> wrong count. +> +> **Canonical marker = `events.jsonl`.** The Python readers (`discover.py`, the workflow +> recipe) treat `context-intelligence/events.jsonl` as the single discriminator of a real +> capture. `metadata.json` is used here only to read fields (`workspace`, `status`, …); both +> files are written together, so either glob includes the `context-intelligence/` segment and +> avoids the false-positive. 
When you need a strict capture count that matches the code, +> glob `events.jsonl`, not `metadata.json`. + +> **⚠ FAIL-LOUD RULE:** When zero captures are found, say exactly `"looked in , found 0"` — +> never report a confident count from a shallower glob, never silently fall back to a different path. + --- ## Record Format @@ -106,13 +134,13 @@ Example (child session with optional fields): | Concept | Purpose | Where used | |---------|---------|------------| -| `project_slug` | Directory name under `~/.amplifier/projects/` | On-disk path | +| `project_slug` | Directory name under `$CONTEXT_INTELLIGENCE_ROOT/` | On-disk path | | `workspace` | Field in every record | Querying and filtering | By **default** workspace equals `project_slug` (both derived from the working directory). They can differ when workspace is set explicitly via `settings.yaml` or env var — for example, workspace `"my-api"` while project_slug is `"-home-user-myapp"`. **Consequence for navigation:** -- **Directory-first lookup** — when workspace matches the project_slug, all sessions for that workspace live under `~/.amplifier/projects/{workspace}/sessions/`. This is fast and the common case. +- **Directory-first lookup** — when workspace matches the project_slug, all sessions for that workspace live under `$CONTEXT_INTELLIGENCE_ROOT/{workspace}/sessions/`. This is fast and the common case. - **Field-based filtering** — when workspace was set explicitly (overriding the slug default), scan across all project directories and filter records by `jq 'select(.workspace == "TARGET")'`. Always check both: attempt directory lookup first, then fall back to cross-project field scan. 
@@ -157,15 +185,21 @@ Session event files can contain lines with 100k+ tokens (e.g., `llm:response` wi ## Common Navigation Patterns +Resolve the root once before all discovery snippets in this section: + +```bash +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +``` + ### Resolve workspace to a project directory ```bash # If workspace == project_slug (common default), sessions are here: -ls ~/.amplifier/projects/{workspace}/sessions/ +ls "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/ # If workspace was set explicitly and differs from project_slug, # find all directories containing sessions tagged with this workspace: -grep -rl '"workspace":"{workspace}"' ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json \ +grep -rl '"workspace":"{workspace}"' "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/metadata.json \ | sed 's|/context-intelligence/metadata.json||' ``` @@ -173,12 +207,14 @@ grep -rl '"workspace":"{workspace}"' ~/.amplifier/projects/*/sessions/*/context- ```bash # Fast path: workspace matches directory name (default case) -for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r '[.session_id, .status, .started_at, .agent_name // "(root)"] | join("\t")' "$f" 2>/dev/null done | sort -t$'\t' -k3 # Scoped path: workspace set explicitly — scan all projects, filter by field -for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.workspace == "my-project") | [.session_id, .status, .started_at, .agent_name // "(root)"] | 
join("\t")' "$f" 2>/dev/null done | sort -t$'\t' -k3 ``` @@ -195,7 +231,7 @@ head -1 events.jsonl | jq -r '.workspace' ```bash # Count events per workspace across all sessions in a project directory: -jq -r '.workspace' ~/.amplifier/projects/-home-user-myapp/sessions/*/context-intelligence/events.jsonl \ +jq -r '.workspace' "$CONTEXT_INTELLIGENCE_ROOT"/-home-user-myapp/sessions/*/context-intelligence/events.jsonl \ | sort | uniq -c | sort -rn ``` @@ -203,12 +239,14 @@ jq -r '.workspace' ~/.amplifier/projects/-home-user-myapp/sessions/*/context-int ```bash # Within a single project directory: -for f in ~/.amplifier/projects/my-project/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/my-project/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.status == "running") | .session_id' "$f" 2>/dev/null done # Cross-project, scoped to workspace: -for f in ~/.amplifier/projects/*/sessions/*/context-intelligence/metadata.json; do +for ev in "$CONTEXT_INTELLIGENCE_ROOT"/*/sessions/*/context-intelligence/events.jsonl; do + f="${ev%/events.jsonl}/metadata.json" # canonical marker = events.jsonl; fields from sibling jq -r 'select(.workspace == "my-project" and .status == "running") | .session_id' "$f" 2>/dev/null done ``` diff --git a/skills/context-intelligence-session-reconstruction/SKILL.md b/skills/context-intelligence-session-reconstruction/SKILL.md index 8ed3ecb4..4f6516f8 100644 --- a/skills/context-intelligence-session-reconstruction/SKILL.md +++ b/skills/context-intelligence-session-reconstruction/SKILL.md @@ -128,8 +128,9 @@ After reconstruction, verify the output files were written correctly: **Check line counts for events and transcript:** ```bash -wc -l ~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/events.jsonl -wc -l 
~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/transcript.jsonl +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +wc -l "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/events.jsonl +wc -l "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/transcript.jsonl ``` A reconstructed `events.jsonl` should have at least a few lines (session start, prompts, tool calls). Zero lines indicates the session had no recoverable events. @@ -137,7 +138,8 @@ A reconstructed `events.jsonl` should have at least a few lines (session start, **Validate metadata JSON is well-formed:** ```bash -cat ~/.amplifier/projects/{workspace}/sessions/{session-id}/context-intelligence/metadata.json | python3 -m json.tool +CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" +cat "$CONTEXT_INTELLIGENCE_ROOT"/{workspace}/sessions/{session-id}/context-intelligence/metadata.json | python3 -m json.tool ``` This confirms the file is valid JSON. Check that `bundle`, `model`, and `session_id` fields are populated. 
diff --git a/skills/workflow-pattern-analysis/SKILL.md b/skills/workflow-pattern-analysis/SKILL.md index 17f6d11e..d7ca08d7 100644 --- a/skills/workflow-pattern-analysis/SKILL.md +++ b/skills/workflow-pattern-analysis/SKILL.md @@ -70,7 +70,7 @@ read_file("${SKILL_DIR}/cypher-patterns.md") — Cypher query templates Q1 Also load from the bundle context when querying the graph or scanning JSONL: ``` -@context-intelligence:context/graph-model-reference.md — CI graph schema +@context-intelligence:context/graph-model-reference.md — context-intelligence graph schema @context-intelligence:context/jsonl-event-schema.md — on-disk JSONL event schema @context-intelligence:context/safe-extraction-patterns.md — safe JSONL extraction patterns ``` @@ -129,9 +129,13 @@ Delegate to `context-intelligence:graph-analyst` to: - **Disk scan is the explicit alternative** when the user needs comprehensive coverage of ALL on-disk sessions regardless of graph indexing state. Disk scan entry: ```bash - find ~/.amplifier/projects -path "*/context-intelligence/metadata.json" 2>/dev/null + CONTEXT_INTELLIGENCE_ROOT="${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:-$HOME/.amplifier/projects}" + # Enumerate captures by the canonical marker (events.jsonl), matching the Python + # readers; read the sibling metadata.json for the format/version filter. + find "$CONTEXT_INTELLIGENCE_ROOT" -path "*/context-intelligence/events.jsonl" 2>/dev/null \ + | while read -r ev; do echo "${ev%/events.jsonl}/metadata.json"; done ``` - Filter to `format == "context-intelligence"` AND `version == "1.0.0"`. + Filter to `format == "context-intelligence"` AND `version == "1.0.0"` (from metadata.json). Follow `@context-intelligence:context/safe-extraction-patterns.md` — never load full `llm:request` lines. 
- Validate coverage: compare graph session count against disk count and inform the user diff --git a/tests/test_base_path_parity.py b/tests/test_base_path_parity.py new file mode 100644 index 00000000..d7c698fe --- /dev/null +++ b/tests/test_base_path_parity.py @@ -0,0 +1,106 @@ +# pyright: reportMissingImports=false +# (pytest is a test-only dep; the hook module is resolved at runtime via the +# sys.path insert below — neither is visible to the static type checker here.) +"""Parity + consistency-check tests for relocation base_path (§D.2 / §C.3). + +Two **duplicated-by-design** canonicalizers exist because the fold gate forbids the +hook's ``config_resolver`` from importing the reader package: + + - reader: ``context_intelligence.config.canonicalize_base_path`` + - writer: ``HookConfigResolver.base_path`` (byte-equivalent inline copy) + +These tests PIN ``writer ≡ reader`` so the hand-synced copies cannot drift silently, +and PIN the §C.3 divergence condition (``reader_writer_roots_disagree``) that the +hook's ``on_session_ready`` uses to decide whether to warn LOUD. Both were previously +only verified by hand; this freezes them in CI. +""" + +from __future__ import annotations + +import sys +from pathlib import Path + +import pytest + +# The writer-side canonicalizer lives in the hook module package, which is not on +# the default import path for the root test suite. Add it (the package __init__ is +# import-clean — handlers / amplifier_core are imported lazily inside functions). 
+REPO_ROOT = Path(__file__).parent.parent +HOOK_MODULE_DIR = REPO_ROOT / "modules" / "hook-context-intelligence" +if str(HOOK_MODULE_DIR) not in sys.path: + sys.path.insert(0, str(HOOK_MODULE_DIR)) + +from amplifier_module_hook_context_intelligence.config_resolver import ( # noqa: E402 + HookConfigResolver, +) + +from context_intelligence.config import ( # noqa: E402 + DEFAULT_BASE_PATH, + canonicalize_base_path, + reader_writer_roots_disagree, +) + + +def _writer_root(value: str) -> Path: + """Drive the REAL writer property with ``config['base_path'] = value``.""" + return HookConfigResolver(config={"base_path": value}, coordinator=None).base_path + + +# The input vector that was previously checked by hand — now pinned. +PARITY_INPUTS = [ + "${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}", # unexpanded placeholder → default + "/tmp/relocated", # absolute → used as-is + "~/relocated", # tilde → expanded + "relative/bad", # relative → default + "", # empty → default + " ", # whitespace → default +] + + +class TestWriterReaderParity: + """writer ≡ reader for every input shape (the duplication's safety net).""" + + @pytest.mark.parametrize("value", PARITY_INPUTS) + def test_writer_equals_reader(self, value: str) -> None: + writer = _writer_root(value) + reader = canonicalize_base_path(value) + assert writer == reader, f"writer/reader drift on {value!r}: {writer} != {reader}" + + @pytest.mark.parametrize("value", PARITY_INPUTS) + def test_writer_always_absolute(self, value: str) -> None: + assert _writer_root(value).is_absolute() + + def test_unexpanded_placeholder_is_default_silently(self, caplog) -> None: + """A literal ``${...}`` (host did not expand) → default, with NO noisy warning.""" + with caplog.at_level("WARNING"): + root = _writer_root("${AMPLIFIER_CONTEXT_INTELLIGENCE_BASE_PATH:}") + assert root == DEFAULT_BASE_PATH + assert not any("not absolute" in r.getMessage() for r in caplog.records) + + +class TestConsistencyDivergence: + """Pins the §C.3 condition 
consumed by on_session_ready.""" + + def test_agree_when_env_matches_writer(self) -> None: + disagree, reader, writer = reader_writer_roots_disagree( + "/tmp/relocated", Path("/tmp/relocated") + ) + assert disagree is False + assert reader == writer + + def test_agree_when_both_default(self) -> None: + # env unset (None) and writer at default → consistent. + disagree, _reader, _writer = reader_writer_roots_disagree(None, DEFAULT_BASE_PATH) + assert disagree is False + + def test_disagree_when_writer_relocated_but_env_unset(self) -> None: + # The exact trap fix #1 exists to catch: relocation via config.base_path + # with the env var unset — env-only readers cannot see it. + disagree, reader, writer = reader_writer_roots_disagree(None, Path("/data/ci")) + assert disagree is True + assert reader == DEFAULT_BASE_PATH + assert writer == Path("/data/ci") + + def test_disagree_when_env_and_writer_differ(self) -> None: + disagree, _reader, _writer = reader_writer_roots_disagree("/a/one", Path("/b/two")) + assert disagree is True diff --git a/tests/test_cmd_reconstruct.py b/tests/test_cmd_reconstruct.py index 34733980..f98db4c5 100644 --- a/tests/test_cmd_reconstruct.py +++ b/tests/test_cmd_reconstruct.py @@ -25,6 +25,8 @@ from pathlib import Path from unittest.mock import MagicMock, patch +from context_intelligence.reconstruct.discover import DiskScanResult + REPO_ROOT = Path(__file__).parent.parent SCRIPTS_DIR = REPO_ROOT / "scripts" SCRIPT_PATH = SCRIPTS_DIR / "context-intelligence.py" @@ -102,7 +104,15 @@ def test_cmd_reconstruct_does_not_raise_not_implemented(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([], []), + return_value=( + [], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), ): args = _make_args() @@ -136,7 +146,15 @@ def 
test_returns_zero_on_no_sessions(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([], []), + return_value=( + [], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), ): args = _make_args() @@ -162,7 +180,15 @@ def test_returns_zero_on_successful_processing(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -200,7 +226,15 @@ def test_returns_one_on_extract_errors(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -249,7 +283,15 @@ def test_events_only_calls_extract_events_not_others(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -292,7 +334,15 @@ def test_transcript_only_calls_extract_transcript_not_others(self): 
patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -335,7 +385,15 @@ def test_metadata_only_calls_extract_metadata_not_others(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -378,7 +436,15 @@ def test_no_only_flags_calls_all_three(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -435,7 +501,15 @@ def test_session_filter_limits_to_matching_sessions(self): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=(sessions, []), + return_value=( + sessions, + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.events.extract_events", @@ -500,7 +574,15 @@ def test_skips_events_jsonl_if_exists(self, 
tmp_path): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.discover.workspace_slug", @@ -555,7 +637,15 @@ def test_force_overwrites_existing_events_jsonl(self, tmp_path): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.discover.workspace_slug", @@ -615,7 +705,15 @@ def test_dry_run_does_not_write_files(self, tmp_path): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([mock_session], []), + return_value=( + [mock_session], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[], + candidate_ids=[], + ), + ), ), patch( "context_intelligence.reconstruct.discover.workspace_slug", @@ -679,7 +777,15 @@ def test_disk_only_sessions_call_build_disk_only_metadata(self, tmp_path): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([], [disk_only_id]), + return_value=( + [], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[disk_only_id], + candidate_ids=[disk_only_id], + ), + ), ), patch( 
"context_intelligence.reconstruct.discover.workspace_slug", @@ -716,7 +822,15 @@ def test_disk_only_sessions_skipped_when_metadata_only_false(self, tmp_path): patch("context_intelligence.config.resolve_config", return_value=("http://s", "key")), patch( "context_intelligence.reconstruct.discover.discover_sessions", - return_value=([], [disk_only_id]), + return_value=( + [], + DiskScanResult( + root=Path("/tmp/test-project/sessions"), + root_exists=True, + disk_only_ids=[disk_only_id], + candidate_ids=[disk_only_id], + ), + ), ), patch( "context_intelligence.reconstruct.discover.workspace_slug", diff --git a/tests/test_reconstruct_discover.py b/tests/test_reconstruct_discover.py index 22b13139..ded59555 100644 --- a/tests/test_reconstruct_discover.py +++ b/tests/test_reconstruct_discover.py @@ -1,14 +1,16 @@ """Tests for context_intelligence.reconstruct.discover (task-8). Covers: -- Module imports correctly (workspace_slug, sessions_dir_for_project, discover_sessions) -- workspace_slug() converts absolute path to slug (replacing / with -) -- sessions_dir_for_project() returns AMPLIFIER_DIR/'projects'/slug/'sessions' -- discover_sessions() queries graph for sessions in workspace -- discover_sessions() returns graph rows and disk-only session IDs -- discover_sessions() skips subsession directories starting with 0000000000000000 -- discover_sessions() skips graph sessions from disk-only list -- Imports: CIClient from client, AMPLIFIER_DIR from config +- Module imports correctly (workspace_slug, sessions_dir_for_project, discover_sessions, + DiskScanResult) +- workspace_slug() converts absolute path to slug (replacing / with -). +- sessions_dir_for_project() returns DEFAULT_BASE_PATH/slug/'sessions'. +- discover_sessions() queries graph for sessions in workspace. +- discover_sessions() returns graph rows and a DiskScanResult. +- discover_sessions() uses the events.jsonl marker as the sole capture discriminator. 
+- discover_sessions() INCLUDES subsession directories (0000…) — §D.1 subsession inclusion. +- discover_sessions() returns root_exists=False for absent roots — §D.3 absent-root guard. +- Imports: CIClient from client, AMPLIFIER_DIR from config. """ from __future__ import annotations @@ -18,6 +20,22 @@ from unittest.mock import MagicMock +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _make_capture(sessions_dir: Path, session_id: str) -> None: + """Create the canonical capture structure for *session_id* under *sessions_dir*. + + A qualifying capture requires: + ``sessions_dir//context-intelligence/events.jsonl`` + """ + cap_dir = sessions_dir / session_id / "context-intelligence" + cap_dir.mkdir(parents=True, exist_ok=True) + (cap_dir / "events.jsonl").write_text("") + + class TestImport: """Module must be importable with the required public API.""" @@ -33,9 +51,14 @@ def test_discover_sessions_import(self): """discover_sessions must be importable from context_intelligence.reconstruct.discover.""" from context_intelligence.reconstruct.discover import discover_sessions # noqa: F401 + def test_disk_scan_result_import(self): + """DiskScanResult must be importable from context_intelligence.reconstruct.discover.""" + from context_intelligence.reconstruct.discover import DiskScanResult # noqa: F401 + def test_acceptance_criteria_command(self): """Simulate the acceptance criteria import command.""" from context_intelligence.reconstruct.discover import ( + DiskScanResult, discover_sessions, sessions_dir_for_project, workspace_slug, @@ -44,6 +67,7 @@ def test_acceptance_criteria_command(self): assert workspace_slug is not None assert sessions_dir_for_project is not None assert discover_sessions is not None + assert DiskScanResult is not None def test_uses_ciclient_from_client(self): """CIClient must be importable from the client module (imported 
dependency).""" @@ -157,6 +181,15 @@ def test_returns_tuple(self): assert isinstance(result, tuple) assert len(result) == 2 + def test_second_element_is_disk_scan_result(self): + """Second element is a DiskScanResult, not a bare list.""" + from context_intelligence.reconstruct.discover import DiskScanResult, discover_sessions + + client = self._make_client([]) + with tempfile.TemporaryDirectory() as tmpdir: + _, scan = discover_sessions(client, "test-workspace", Path(tmpdir)) + assert isinstance(scan, DiskScanResult) + def test_returns_graph_rows(self): """First element is the list of rows returned by client.cypher().""" from context_intelligence.reconstruct.discover import discover_sessions @@ -171,7 +204,7 @@ def test_returns_graph_rows(self): ] client = self._make_client(fake_rows) with tempfile.TemporaryDirectory() as tmpdir: - graph_sessions, disk_only = discover_sessions(client, "test-workspace", Path(tmpdir)) + graph_sessions, scan = discover_sessions(client, "test-workspace", Path(tmpdir)) assert graph_sessions == fake_rows def test_calls_cypher_with_workspace(self): @@ -211,7 +244,7 @@ def test_cypher_query_ordered_by_started_at(self): assert "ORDER BY s.started_at" in query_arg def test_disk_only_includes_dirs_not_in_graph(self): - """Directories on disk not in graph_ids are returned as disk_only_ids.""" + """Directories with events.jsonl on disk but not in graph_ids appear in disk_only_ids.""" from context_intelligence.reconstruct.discover import discover_sessions fake_rows = [ @@ -225,62 +258,164 @@ def test_disk_only_includes_dirs_not_in_graph(self): client = self._make_client(fake_rows) with tempfile.TemporaryDirectory() as tmpdir: sessions_dir = Path(tmpdir) - # Create a session dir that is on disk but not in graph - (sessions_dir / "disk-only-session").mkdir() - # Create a session dir that is also in graph - (sessions_dir / "graph-session-1").mkdir() + # Create a capture that is on disk but not in graph + _make_capture(sessions_dir, 
"disk-only-session") + # Create a capture that is also in graph + _make_capture(sessions_dir, "graph-session-1") + + _, scan = discover_sessions(client, "test-workspace", sessions_dir) + + assert "disk-only-session" in scan.disk_only_ids + assert "graph-session-1" not in scan.disk_only_ids + + def test_candidate_ids_includes_graph_sessions(self): + """candidate_ids is the full set regardless of graph membership.""" + from context_intelligence.reconstruct.discover import discover_sessions + + fake_rows = [ + { + "s.node_id": "graph-sess", + "s.status": "completed", + "s.started_at": "2024-01-01", + "s.ended_at": "2024-01-02", + }, + ] + client = self._make_client(fake_rows) + with tempfile.TemporaryDirectory() as tmpdir: + sessions_dir = Path(tmpdir) + _make_capture(sessions_dir, "disk-only-session") + _make_capture(sessions_dir, "graph-sess") - _, disk_only = discover_sessions(client, "test-workspace", sessions_dir) + _, scan = discover_sessions(client, "test-workspace", sessions_dir) - assert "disk-only-session" in disk_only - assert "graph-session-1" not in disk_only + assert "disk-only-session" in scan.candidate_ids + assert "graph-sess" in scan.candidate_ids + assert len(scan.candidate_ids) == 2 + assert len(scan.disk_only_ids) == 1 - def test_disk_only_skips_subsession_dirs(self): - """Directories starting with 0000000000000000 are skipped (subsessions).""" + def test_disk_only_INCLUDES_subsession_dirs(self): + """Subsession directories (0000… prefix) ARE included — §D.1 subsession inclusion. + + The old 0000… skip silently dropped real captures; this test pins the + corrected behaviour: subsessions have captures and must be counted. 
+ """ from context_intelligence.reconstruct.discover import discover_sessions client = self._make_client([]) with tempfile.TemporaryDirectory() as tmpdir: sessions_dir = Path(tmpdir) - # Subsession directory - should be skipped - (sessions_dir / "0000000000000000-abc123_some-agent").mkdir() - # Normal session directory - should be included - (sessions_dir / "real-session-abc").mkdir() + # Subsession capture — MUST be included + _make_capture(sessions_dir, "0000000000000000-abc123_some-agent") + # Normal root-session capture — also included + _make_capture(sessions_dir, "real-session-abc") - _, disk_only = discover_sessions(client, "test-workspace", sessions_dir) + _, scan = discover_sessions(client, "test-workspace", sessions_dir) - assert "0000000000000000-abc123_some-agent" not in disk_only - assert "real-session-abc" in disk_only + assert "0000000000000000-abc123_some-agent" in scan.disk_only_ids + assert "real-session-abc" in scan.disk_only_ids - def test_disk_only_empty_when_no_sessions_dir(self): - """When sessions_dir does not exist, disk_only_ids is empty.""" + def test_bare_dir_without_events_jsonl_excluded(self): + """A directory without context-intelligence/events.jsonl is NOT a capture.""" + from context_intelligence.reconstruct.discover import discover_sessions + + client = self._make_client([]) + with tempfile.TemporaryDirectory() as tmpdir: + sessions_dir = Path(tmpdir) + # Bare dir — no events.jsonl + (sessions_dir / "bare-dir").mkdir() + # Capture with correct marker + _make_capture(sessions_dir, "real-capture") + + _, scan = discover_sessions(client, "test-workspace", sessions_dir) + + assert "bare-dir" not in scan.disk_only_ids + assert "bare-dir" not in scan.candidate_ids + assert "real-capture" in scan.disk_only_ids + + def test_amplifier_core_bait_dir_excluded(self): + """Amplifier-core sessions//metadata.json (no context-intelligence/) are excluded.""" + from context_intelligence.reconstruct.discover import discover_sessions + + client = 
self._make_client([]) + with tempfile.TemporaryDirectory() as tmpdir: + sessions_dir = Path(tmpdir) + # Amplifier-core bait: has metadata.json but no context-intelligence/events.jsonl + bait_dir = sessions_dir / "core-session-bait" + bait_dir.mkdir() + (bait_dir / "metadata.json").write_text("{}") + # Capture with correct marker + _make_capture(sessions_dir, "real-capture") + + _, scan = discover_sessions(client, "test-workspace", sessions_dir) + + assert "core-session-bait" not in scan.candidate_ids + assert "real-capture" in scan.candidate_ids + + def test_absent_root_returns_root_exists_false(self): + """When sessions_dir does not exist, root_exists is False — §D.3 absent-root guard.""" from context_intelligence.reconstruct.discover import discover_sessions client = self._make_client([]) nonexistent = Path("/tmp/this-path-should-not-exist-8675309") - _, disk_only = discover_sessions(client, "test-workspace", nonexistent) - assert disk_only == [] + _, scan = discover_sessions(client, "test-workspace", nonexistent) + + assert scan.root_exists is False + assert scan.disk_only_ids == [] + assert scan.candidate_ids == [] + + def test_absent_root_is_distinct_from_found_zero(self): + """root_exists=False is distinct from root_exists=True with no captures.""" + from context_intelligence.reconstruct.discover import discover_sessions + + client = self._make_client([]) + + # Absent root + nonexistent = Path("/tmp/this-path-should-not-exist-8675309") + _, absent_scan = discover_sessions(client, "test-workspace", nonexistent) + + # Existing root but no captures + with tempfile.TemporaryDirectory() as tmpdir: + _, empty_scan = discover_sessions(client, "test-workspace", Path(tmpdir)) + + assert absent_scan.root_exists is False + assert empty_scan.root_exists is True + assert empty_scan.disk_only_ids == [] + + def test_found_zero_root_exists_true(self): + """An existing root with no captures: root_exists=True and empty lists.""" + from context_intelligence.reconstruct.discover 
import discover_sessions + + client = self._make_client([]) + with tempfile.TemporaryDirectory() as tmpdir: + _, scan = discover_sessions(client, "test-workspace", Path(tmpdir)) + + assert scan.root_exists is True + assert scan.disk_only_ids == [] + assert scan.candidate_ids == [] def test_disk_only_is_list_of_strings(self): - """disk_only_ids is a list of strings (directory names).""" + """disk_only_ids is a list of strings (session IDs).""" from context_intelligence.reconstruct.discover import discover_sessions client = self._make_client([]) with tempfile.TemporaryDirectory() as tmpdir: sessions_dir = Path(tmpdir) - (sessions_dir / "some-session").mkdir() - _, disk_only = discover_sessions(client, "test-workspace", sessions_dir) - assert isinstance(disk_only, list) - assert all(isinstance(s, str) for s in disk_only) + _make_capture(sessions_dir, "some-session") + _, scan = discover_sessions(client, "test-workspace", sessions_dir) + + assert isinstance(scan.disk_only_ids, list) + assert all(isinstance(s, str) for s in scan.disk_only_ids) def test_empty_sessions_dir(self): - """An empty sessions_dir returns no disk_only_ids.""" + """An empty sessions_dir returns empty lists (root_exists=True).""" from context_intelligence.reconstruct.discover import discover_sessions client = self._make_client([]) with tempfile.TemporaryDirectory() as tmpdir: - _, disk_only = discover_sessions(client, "test-workspace", Path(tmpdir)) - assert disk_only == [] + _, scan = discover_sessions(client, "test-workspace", Path(tmpdir)) + assert scan.root_exists is True + assert scan.disk_only_ids == [] + assert scan.candidate_ids == [] def test_files_in_sessions_dir_not_included(self): """Regular files (not directories) in sessions_dir are not included.""" @@ -290,5 +425,6 @@ def test_files_in_sessions_dir_not_included(self): with tempfile.TemporaryDirectory() as tmpdir: sessions_dir = Path(tmpdir) (sessions_dir / "not-a-session.txt").write_text("hello") - _, disk_only = 
discover_sessions(client, "test-workspace", sessions_dir) - assert disk_only == [] + _, scan = discover_sessions(client, "test-workspace", sessions_dir) + assert scan.disk_only_ids == [] + assert scan.candidate_ids == []
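
Reviewer note: the tests above pin down the disk-scan contract, but the implementation itself is not part of this diff. The following is a minimal sketch of a scan that would satisfy those assertions, not the actual code in `context_intelligence.reconstruct.discover`. The `DiskScanResult` field names are taken from the tests. The helper name `scan_disk`, its signature, the dataclass defaults, and the sorted ordering of IDs are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DiskScanResult:
    """Illustrative stand-in: field names mirror the tests, defaults are assumed."""

    root: Path
    root_exists: bool
    disk_only_ids: list[str] = field(default_factory=list)
    candidate_ids: list[str] = field(default_factory=list)


def scan_disk(sessions_dir: Path, graph_ids: set[str]) -> DiskScanResult:
    """Hypothetical helper (not the real discover_sessions) showing the marker rule."""
    # Absent-root guard: report a missing root explicitly rather than as "found zero".
    if not sessions_dir.is_dir():
        return DiskScanResult(root=sessions_dir, root_exists=False)

    # The sole discriminator is <session dir>/context-intelligence/events.jsonl.
    # Bare directories, directories holding only metadata.json, and plain files
    # are rejected; subsession directories (0000... prefix) are not special-cased.
    candidates = sorted(
        entry.name
        for entry in sessions_dir.iterdir()
        if entry.is_dir()
        and (entry / "context-intelligence" / "events.jsonl").is_file()
    )

    # disk_only_ids is the subset of captures the graph does not know about.
    disk_only = [sid for sid in candidates if sid not in graph_ids]
    return DiskScanResult(
        root=sessions_dir,
        root_exists=True,
        disk_only_ids=disk_only,
        candidate_ids=candidates,
    )
```

Keeping `root_exists` separate from the ID lists is what lets a caller tell "the root is missing or misconfigured" apart from "the root exists and holds no captures", which is the distinction `test_absent_root_is_distinct_from_found_zero` asserts.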