stablekernel
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 48 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 64 additions & 14 deletions
diff --git a/scripts/session-reuse-smoke.mjs b/scripts/session-reuse-smoke.mjs
Lines changed: 110 additions & 0 deletions
@@ -4,6 +4,52 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+- **Fingerprint-guarded session reuse, now the default (`session: "auto"`).**
+  Previously the provider created a fresh Cursor agent every turn and re-sent
+  the whole transcript (robust but cache-hostile and increasingly costly as a
+  conversation grows), while opt-in `session: true` resumed one agent per
+  session but could drift from opencode's history (edits/reverts/compaction) and
+  was disturbed by non-chat side calls. `session: "auto"` (the new default)
+  hashes only the parts opencode replays verbatim — the system prompt and the
+  user-message sequence — and classifies each turn: a clean **continuation**
+  resumes the pooled agent and sends only the new message (maximizing prefix
+  cache hits); a **side-call** (system prompt differs, e.g. title generation)
+  runs a fresh ephemeral agent without touching the pool; a **divergence**
+  (edit/revert/compaction/queued messages) or a failed resume falls back to a
+  fresh agent + full transcript and re-pools. Worst case is one self-healing
+  full replay — never worse than the old default. `session: true` is now an
+  alias for `"auto"`; `session: false` keeps the always-fresh behavior.
+  Set `OPENCODE_CURSOR_DEBUG=1` to log per-turn classification and cache usage.
+- **Session reuse survives opencode restarts.** The pool's fingerprint records
+  persist (best-effort) to `~/.cache/opencode-cursor/session-pool.json` (7-day
+  TTL, 200-entry LRU cap), so the first turn after a restart resumes the
+  session's Cursor agent — whose conversation lives in Cursor's own checkpoint
+  store — instead of paying a cache-cold full-transcript replay.
+- **MCP servers are re-forwarded live, per turn, with OAuth mapping.** The
+  `config` hook's startup snapshot meant mid-session MCP enable/disable never
+  reached the Cursor agent. The `chat.params` hook now forwards the live set
+  each turn (`client.mcp.status()` for runtime truth, `client.config.get()` for
+  launch specs). Because a resumed agent keeps its original servers, a changed
+  set forces a fresh agent (full-transcript replay, re-pooled) so the new
+  servers take effect — the session fingerprint carries an `mcpHash` for this.
+  Remote servers with a registered OAuth client are forwarded with a Cursor
+  `auth` block so the agent runs its own OAuth flow; servers needing OAuth
+  without a shareable `clientId` (dynamic registration) are skipped with a
+  one-time toast instead of forwarding a spec that would 401.
+- **Fixed: text/reasoning streamed after a tool call rendered above the tool
+  block.** The earlier ordering fix closed parts on text↔reasoning transitions,
+  but blocks-mode tool parts were emitted while the narration part stayed open
+  — and hosts position a part where it started. Open text/reasoning parts are
+  now closed before tool parts are emitted (except for buffered edit calls,
+  which emit nothing until their result arrives, so narration isn't split
+  needlessly).
+- **Tool outputs are included (truncated) in flattened transcripts.** The
+  fresh/divergence/`session: false` replay paths previously dropped Cursor tool
+  results to bare `[result of X]` placeholders, so a fresh agent re-read a
+  transcript with prior tool outputs missing. Outputs are now inlined and capped
+  (2,000 chars per result, 500 per tool-call args) so context stays faithful
+  without unbounded bloat.
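
Here is a minimal sketch of the ordering rule, written against AI SDK-style stream parts (`text-start`/`text-delta`/`text-end`, their `reasoning-*` counterparts, and a `tool-call` part). It is a hypothetical illustration, not the provider's code. `PartOrderer` and its methods are invented names. It also assumes the caller holds a buffered edit call back until its result arrives, so `toolPart` is only invoked when there is actually something to emit.

```javascript
// Hypothetical sketch: close any open text/reasoning part before emitting a
// tool part, because hosts position a part where it started.
class PartOrderer {
  constructor(emit) {
    this.emit = emit; // receives stream parts in order
    this.open = null; // currently open narration part: { kind, id }
    this.seq = 0; // counter for unique part ids
  }

  // Append narration; switching between text and reasoning closes the old part.
  delta(kind, delta) {
    if (this.open?.kind !== kind) {
      this.close();
      this.open = { kind, id: `${kind}-${this.seq++}` };
      this.emit({ type: `${kind}-start`, id: this.open.id });
    }
    this.emit({ type: `${kind}-delta`, id: this.open.id, delta });
  }

  // Close the open narration part, if any.
  close() {
    if (this.open) {
      this.emit({ type: `${this.open.kind}-end`, id: this.open.id });
      this.open = null;
    }
  }

  // Emit a tool part; narration streamed afterwards starts a new part below it.
  // Callers buffer edit calls and only call this once their result is known.
  toolPart(part) {
    this.close();
    this.emit(part);
  }
}

// Usage: text, then a tool call, then more text.
const parts = [];
const orderer = new PartOrderer((p) => parts.push(p));
orderer.delta("text", "Reading the file.");
orderer.toolPart({ type: "tool-call", toolCallId: "t1", toolName: "read", input: "{}" });
orderer.delta("text", "Done.");
orderer.close();
console.log(parts.map((p) => p.type).join(" "));
// → text-start text-delta text-end tool-call text-start text-delta text-end
```

The tool part is positioned where it started. Closing the narration part before emitting it is what lets the post-tool text open a fresh part that renders below the tool block.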
+
 ## [0.2.0] — 2026-06-11
 
 - **More Cursor tools map onto opencode's native tool renderers (blocks mode).**
@@ -47,7 +93,8 @@ and a permission-gated delegation tool surface.
 - **Session reuse** (`session: true`) — keeps one Cursor agent per opencode
   session via `Agent.resume()` across turns, with automatic fallback to a fresh
   agent. A run wedged by a crashed/duplicate process is recovered by retrying
-  the send once with the SDK's `local.force` escape hatch.
+  the send once with the SDK's `local.force` escape hatch. (Superseded by the
+  fingerprint-guarded `session: "auto"` default; see Unreleased.)
 - **Native diff viewer for Cursor edits (blocks mode).** A Cursor `edit` tool
   call is now surfaced under opencode's registered `edit` tool with its real
   unified diff in `metadata.diff`, so opencode renders its built-in diff viewer
 
@@ -152,20 +152,53 @@ This plugin also registers two **delegation tools** that complement the provider
 | `settingSources` | — | Cursor settings layers to load from disk: `["project","user","all",...]` — pulls in your Cursor **skills**, rules, and `.cursor/mcp.json` |
 | `sandbox` | — | Run the agent's tools inside Cursor's sandbox (`true`/`false`) |
 | `agents` | — | Cursor subagent definitions (`{ <name>: { description, prompt, model?, mcpServers? } }`) |
-| `session` | `false` | Reuse one Cursor agent per opencode session (resume across turns; see below) |
+| `session` | `"auto"` | Session reuse strategy: `"auto"` (fingerprint-guarded resume), `true` (alias for `"auto"`), or `false` (always fresh). See below |
 | `forwardMcp` | `true` | Forward opencode's configured MCP servers to the Cursor agent |
 | `mcpServers` | — | Extra MCP servers (Cursor `McpServerConfig` shape); merged with forwarded ones |
 | `toolDisplay` | `"blocks"` | How Cursor's internal tool activity is shown: `"blocks"` (structured provider-executed tool blocks; default, requires opencode 1.16+) or `"reasoning"` (compact lines, the fallback for older/non-V3 hosts). See [Tool display](#tool-display) |
 
 ### Session reuse (`session`)
 
-By default each opencode turn spins up a **fresh** Cursor agent and re-sends the full conversation
-transcript — robust, and correct even for opencode's non-chat calls (e.g. title generation). Set
-`session: true` to instead keep **one Cursor agent per opencode session**: the provider names the
-agent after the session, `Agent.resume()`s it on later turns, and sends only the new message so
-Cursor uses its native conversation memory and checkpoints (the agent is visible in Cursor's
-dashboard). The opencode session id reaches the provider via the plugin's `chat.params` hook
-(`providerOptions.cursor.sessionID`); a failed resume falls back to a fresh turn automatically.
+opencode re-sends the **entire** conversation transcript on every turn. Replaying that into a fresh
+Cursor agent each turn is robust but costs more input tokens as the conversation grows (and pays
+opencode's system prompt on top of Cursor's own). Reusing one Cursor agent and sending only the new
+message is the cache-friendly, native-CLI-like path — but a blindly resumed agent can drift from
+opencode's view of history (message edits, reverts, opencode-side compaction) and must not be
+disturbed by opencode's non-chat side calls (e.g. title generation).
+
+**`session: "auto"` (the default) resolves this with a per-turn fingerprint.** The provider hashes
+only the parts opencode replays verbatim — the system prompt and the user-message sequence — and
+classifies each turn:
+
+| Situation | Classification | What the provider does |
+| --- | --- | --- |
+| First turn of the session | **new** | fresh agent, full transcript, pool it |
+| System prompt differs (title generation and other side calls) | **side-call** | fresh ephemeral agent; the pooled agent is left untouched |
+| Prior user sequence is an exact prefix + exactly one new user message | **continuation** | `Agent.resume` the pooled agent, send **only** the new message |
+| Continuation, but the forwarded MCP server set changed | **continuation** (fresh agent) | fresh agent + full transcript, re-pool — a resumed agent keeps its original MCP servers, so a fresh one is needed for the new set |
+| Earlier message edited/reverted, conversation compacted, or several messages queued | **divergence** | fresh agent, full transcript, re-pool |
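
The classification table can be sketched in a few lines. This is a minimal, hypothetical illustration, not the provider's implementation. The following are all assumptions: `classifyTurn`, `fingerprint`, the pooled-record shape `{ systemHash, userHashes, mcpHash }`, and the use of FNV-1a as a dependency-free stand-in for a real digest such as SHA-256. Messages follow the prompt shape used in `scripts/session-reuse-smoke.mjs`.

```javascript
// Hypothetical sketch of fingerprint-guarded turn classification.

// FNV-1a (32-bit) over UTF-16 code units: a dependency-free stand-in for a
// real digest such as SHA-256.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hash only what opencode replays verbatim: the system prompt and each user message.
function fingerprint(prompt) {
  const system = prompt.filter((m) => m.role === "system").map((m) => m.content);
  return {
    systemHash: fnv1a(JSON.stringify(system)),
    userHashes: prompt
      .filter((m) => m.role === "user")
      .map((m) => fnv1a(JSON.stringify(m.content))),
  };
}

// pooled: the stored record for this session, or undefined on the first turn.
function classifyTurn(pooled, prompt, mcpHash) {
  const fp = fingerprint(prompt);
  if (!pooled) return { kind: "new", resume: false, fp };
  // Different system prompt: a side call (e.g. title generation); leave the pool alone.
  if (pooled.systemHash !== fp.systemHash) return { kind: "side-call", resume: false, fp };
  const prior = pooled.userHashes;
  const isPrefix = prior.every((h, i) => fp.userHashes[i] === h);
  if (isPrefix && fp.userHashes.length === prior.length + 1) {
    // A resumed agent keeps its original MCP servers, so a changed set needs a fresh agent.
    return { kind: "continuation", resume: pooled.mcpHash === mcpHash, fp };
  }
  // Edit, revert, compaction, or several queued messages.
  return { kind: "divergence", resume: false, fp };
}

// Usage: pool the first turn, then classify a follow-up.
const sys = { role: "system", content: "You are terse." };
const user = (text) => ({ role: "user", content: [{ type: "text", text }] });
const assistant = (text) => ({ role: "assistant", content: [{ type: "text", text }] });

const t1 = classifyTurn(undefined, [sys, user("Name a primary color.")], "mcp-a");
const pool = { ...t1.fp, mcpHash: "mcp-a" };
const t2 = classifyTurn(pool, [sys, user("Name a primary color."), assistant("Red."), user("Another?")], "mcp-a");
console.log(t1.kind, t2.kind, t2.resume); // → new continuation true
```

In this sketch a side call returns without touching `pool`. A `continuation` with `resume: false` corresponds to the MCP-changed row: fresh agent, full transcript, re-pool.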
+
+The worst case on any misclassification is a single full-transcript replay that self-heals on the
+next turn — never worse than `session: false`. A failed resume also degrades to a fresh replay. The
+resumed agent is named after the session and visible in Cursor's dashboard; the opencode session id
+reaches the provider via the plugin's `chat.params` hook (`providerOptions.cursor.sessionID`).
+Fingerprint records persist (best-effort) to `~/.cache/opencode-cursor/session-pool.json`, so
+session reuse survives opencode restarts — the conversation itself lives in Cursor's own local
+checkpoint store, and the next turn resumes it instead of replaying the transcript.
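
The CHANGELOG gives the persisted pool a 7-day TTL and a 200-entry LRU cap. Below is a minimal sketch of that pruning step. It assumes each record carries a `lastUsed` timestamp. The record shape and `prunePool` are hypothetical. Reading and writing `session-pool.json` is left out; a best-effort implementation would presumably treat a missing or unreadable file as an empty pool.

```javascript
// Hypothetical sketch: prune persisted fingerprint records by age, then by LRU.
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL
const MAX_ENTRIES = 200; // 200-entry LRU cap

// records: { [sessionID]: { lastUsed: epochMillis, ...fingerprint } }
function prunePool(records, now = Date.now()) {
  const live = Object.entries(records).filter(([, r]) => now - r.lastUsed <= TTL_MS);
  // Most recently used first, then keep only the newest MAX_ENTRIES.
  live.sort(([, a], [, b]) => b.lastUsed - a.lastUsed);
  return Object.fromEntries(live.slice(0, MAX_ENTRIES));
}

// Usage: 250 recent sessions plus one stale one.
const now = Date.now();
const records = { stale: { lastUsed: now - 8 * 24 * 60 * 60 * 1000 } };
for (let i = 0; i < 250; i++) records[`s${i}`] = { lastUsed: now - i * 1000 };
const kept = prunePool(records, now);
console.log(Object.keys(kept).length); // → 200
```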
+
+- `session: true` is an alias for `"auto"`.
+- `session: false` restores the original behavior: always a fresh agent + full transcript, every
+  turn. Use it if you want each turn fully independent.
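
The option semantics above reduce to a small normalization step. The sketch below is hypothetical: `resolveSessionMode` and the `"fresh"` label are invented names. Rejecting unknown values is also an assumption, and the provider may handle them differently.

```javascript
// Hypothetical helper: normalize the `session` option to a strategy.
function resolveSessionMode(session = "auto") {
  if (session === "auto" || session === true) return "auto"; // fingerprint-guarded resume
  if (session === false) return "fresh"; // fresh agent + full transcript every turn
  throw new TypeError(`Unsupported session option: ${JSON.stringify(session)}`);
}

console.log(resolveSessionMode(), resolveSessionMode(true), resolveSessionMode(false));
// → auto auto fresh
```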
+
+**Cache implications.** Cursor builds its prompts to be cache-friendly, and the model provider's own
+prefix cache (Anthropic's uses a ~5-minute sliding TTL) decides what actually hits. `"auto"` keeps
+the prompt prefix stable across turns, which is what lets turns land cache reads instead of
+expensive cache re-seeds. These re-seed the cache even within the TTL window: switching
+model/variant, changing the thinking level, toggling agent/plan mode, editing an earlier message, or
+changing the forwarded MCP server set (tool definitions sit at the top of the provider's cache-prefix
+hierarchy, so changing them invalidates everything after them). Tool outputs from earlier turns are
+included (truncated) in the replay paths, so a fresh or diverged agent still sees what prior tools
+produced. Set `OPENCODE_CURSOR_DEBUG=1` to log the per-turn classification and the
+`cacheReadTokens`/`cacheWriteTokens` reported by Cursor.
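
The truncation used on the replay paths can be sketched as follows. The CHANGELOG sets the caps at 2,000 chars per result and 500 per tool-call args. `flattenToolCall`, `truncate`, and the bracketed marker format are hypothetical. Only the two caps and the `[result of X]` placeholder come from this PR.

```javascript
// Hypothetical sketch: inline tool activity into a flattened transcript with caps.
const MAX_RESULT_CHARS = 2000; // per tool result
const MAX_ARGS_CHARS = 500; // per tool-call args

function truncate(text, max) {
  if (text.length <= max) return text;
  return `${text.slice(0, max)}… [truncated ${text.length - max} chars]`;
}

// The bracketed marker format is illustrative, not the provider's exact wording.
function flattenToolCall(name, args, output) {
  const argText = truncate(JSON.stringify(args ?? {}), MAX_ARGS_CHARS);
  const raw = typeof output === "string" ? output : JSON.stringify(output ?? null);
  return `[tool call ${name} ${argText}]\n[result of ${name}]\n${truncate(raw, MAX_RESULT_CHARS)}`;
}

// Usage: a 5,000-char result keeps its first 2,000 chars plus a marker.
const line = flattenToolCall("read", { path: "src/index.ts" }, "x".repeat(5000));
console.log(line.endsWith("[truncated 3000 chars]")); // → true
```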
 
 ### Per-request controls (`mode`, thinking level)
 
@@ -205,20 +238,36 @@ To disable MCP forwarding, set `provider.cursor.options.forwardMcp: false` in yo
 
 ## MCP servers
 
-The Cursor agent can use the **same MCP servers you've configured in opencode**. The plugin's
-`config` hook reads opencode's `config.mcp`, translates each entry into the Cursor SDK's
-`McpServerConfig` shape, and hands them to the agent via `Agent.create({ mcpServers })`:
+The Cursor agent can use the **same MCP servers you've configured in opencode**. Forwarding is
+**live, per turn**: the plugin's `chat.params` hook reads opencode's current MCP state
+(`client.mcp.status()` for what's actually enabled right now, `client.config.get()` for the launch
+specs), translates each entry into the Cursor SDK's `McpServerConfig` shape, and hands the set to
+the agent — so enabling or disabling an MCP server mid-session takes effect on the next turn, not
+the next restart. A startup snapshot from the `config` hook remains as the fallback when the live
+read is unavailable.
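
The live-with-fallback selection can be sketched as below. Everything here is hypothetical. The sketch assumes the caller has already turned `client.mcp.status()` and `client.config.get()` into a `live` object `{ status, specs }`, or `undefined` when that read failed. Here `status` maps a server name to a state string and `specs` holds the `config.mcp` entries. The real response shapes and state names may differ.

```javascript
// Hypothetical sketch: prefer the live MCP view, fall back to the startup snapshot.
// live: { status: { [name]: string }, specs: { [name]: configEntry } } | undefined
function selectMcpSpecs(live, startupSnapshot) {
  if (!live) return startupSnapshot; // live read unavailable
  const selected = {};
  for (const [name, spec] of Object.entries(live.specs ?? {})) {
    const state = live.status?.[name];
    // Runtime status is the source of truth for what is enabled right now.
    if (state !== undefined && state !== "disabled") selected[name] = spec;
  }
  return selected;
}

// Usage: a server disabled mid-session drops out on the next turn.
const snapshot = { docs: { type: "remote", url: "https://example.com/mcp" } };
const live = {
  status: { docs: "disabled", fs: "connected" },
  specs: { ...snapshot, fs: { type: "local", command: ["mcp-fs"] } },
};
console.log(Object.keys(selectMcpSpecs(live, snapshot)));
```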
 
 | opencode `config.mcp` | → Cursor |
 | --- | --- |
 | `{ type: "local", command: [cmd, ...args], environment }` | `{ type: "stdio", command: cmd, args, env }` |
 | `{ type: "remote", url, headers }` | `{ type: "http", url, headers }` |
+| remote with registered OAuth client (`clientId`, optional secret/scopes) | `{ type: "http", url, auth: { CLIENT_ID, … } }` — the agent runs its own OAuth flow |
 
 So whatever MCP servers your `opencode.json` defines, your Cursor agent connects to those same
 servers — MCP servers are independent processes, so opencode and the agent each connect to them
 directly.
 Disabled entries (`enabled: false`) are skipped. Turn this off with `forwardMcp: false`.
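
The mapping in the table can be sketched as a pure function over opencode's `config.mcp` object. `toCursorMcpServers` is a hypothetical name. The OAuth row is left out because its Cursor `auth` keys are elided in the table.

```javascript
// Hypothetical sketch of the opencode config.mcp -> Cursor McpServerConfig mapping.
function toCursorMcpServers(mcp = {}) {
  const out = {};
  for (const [name, entry] of Object.entries(mcp)) {
    if (entry.enabled === false) continue; // disabled entries are skipped
    if (entry.type === "local") {
      // opencode stores [cmd, ...args]; Cursor takes command and args separately.
      const [command, ...args] = entry.command;
      out[name] = { type: "stdio", command, args, env: entry.environment };
    } else if (entry.type === "remote") {
      out[name] = { type: "http", url: entry.url, headers: entry.headers };
    }
  }
  return out;
}

const servers = toCursorMcpServers({
  fs: { type: "local", command: ["npx", "-y", "my-mcp-server"], environment: { DEBUG: "1" } },
  docs: { type: "remote", url: "https://example.com/mcp", headers: { "X-Key": "k" } },
  off: { type: "local", command: ["unused"], enabled: false },
});
console.log(JSON.stringify(servers.fs));
```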
 
+> **OAuth caveat.** opencode's own access tokens never land in `config.mcp`, so a remote server
+> that needs OAuth **without** a shareable `clientId` (dynamic client registration / `needs_auth`)
+> can't be forwarded — forwarding its spec would just 401. Such servers are skipped and a one-time
+> toast tells you which ones; they keep working inside opencode itself.
+>
+> **Session-reuse interaction.** A resumed Cursor agent keeps the MCP servers it was created with,
+> so when the forwarded set changes between turns the provider creates a fresh agent (full
+> transcript replay, re-pooled) instead of resuming — see
+> [Session reuse](#session-reuse-session). Tool definitions sit at the top of the provider's
+> cache-prefix hierarchy, so an MCP change also re-seeds the prompt cache.
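
The OAuth decision can be sketched as follows. This is hypothetical. It assumes the registered client id is read from an `oauth.clientId` field on the opencode entry. It also assumes a separate `needsOAuth` flag is available, for example one derived from a `needs_auth` status. Only `CLIENT_ID` is filled in, because the remaining `auth` keys are elided in the mapping table.

```javascript
// Hypothetical sketch: forward a remote MCP server with or without an OAuth block.
// Returns the Cursor spec to forward, or null to skip the server.
function forwardRemote(name, entry, needsOAuth, warnOnce) {
  const spec = { type: "http", url: entry.url, headers: entry.headers };
  const clientId = entry.oauth?.clientId; // assumed location of a registered client id
  if (clientId) {
    // The agent runs its own OAuth flow with the shared client registration.
    return { ...spec, auth: { CLIENT_ID: clientId } };
  }
  if (needsOAuth) {
    // No shareable clientId: forwarding would 401, so skip and tell the user once.
    warnOnce(name);
    return null;
  }
  return spec;
}

// Wrap a notifier (e.g. a toast) so each server name is reported only once.
function makeWarnOnce(notify) {
  const seen = new Set();
  return (name) => {
    if (seen.has(name)) return;
    seen.add(name);
    notify(name);
  };
}

const toasts = [];
const warnOnce = makeWarnOnce((name) => toasts.push(`MCP server "${name}" needs OAuth; not forwarded`));
const dyn = { type: "remote", url: "https://example.com/mcp" };
forwardRemote("dyn", dyn, true, warnOnce);
forwardRemote("dyn", dyn, true, warnOnce); // a later turn: no second toast
console.log(toasts.length); // → 1
```

Because forwarding now runs every turn, the "already warned" memory has to outlive a single turn. That is why the sketch keeps it in a closure rather than in per-turn state.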
+
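The `mcpHash` mentioned in the CHANGELOG only needs to change when the forwarded server set changes. It should stay the same when object keys arrive in a different order. The sketch below is hypothetical: it hashes canonical (key-sorted) JSON, with FNV-1a again standing in for a real digest.

```javascript
// Hypothetical sketch: order-independent hash of the forwarded MCP server set.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value ?? null);
}

// FNV-1a (32-bit) over UTF-16 code units; a stand-in for e.g. SHA-256.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

const mcpHash = (servers) => fnv1a(canonical(servers));

// Same servers, different key order -> same hash; array order (e.g. args) still matters.
const a = mcpHash({ fs: { type: "stdio", command: "mcp-fs" }, docs: { type: "http", url: "https://example.com/mcp" } });
const b = mcpHash({ docs: { url: "https://example.com/mcp", type: "http" }, fs: { command: "mcp-fs", type: "stdio" } });
console.log(a === b); // → true
```
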
 > Scope note: this forwards **MCP servers**. opencode's *loop-internal* features — its own skills
 > and subagents — are not exposed to the Cursor agent (they run inside opencode's agent loop, which
 > this provider bypasses). The Cursor agent's *own* skills/rules can be loaded with the
@@ -297,9 +346,10 @@ This plugin runs Cursor as a **local agent** (`Agent.create({ local: { cwd } })`
   directory. How that activity is shown is controlled by the [`toolDisplay`](#tool-display) option.
   Either way it is **not** routed through opencode's tool/permission system — Cursor runs the tools
   itself.
-- By default each turn creates a fresh local agent and sends the full conversation transcript, so
-  context is always complete. Enable `session: true` to reuse Cursor's native per-agent memory
-  across turns (see [Session reuse](#session-reuse-session)).
+- By default (`session: "auto"`) the provider resumes one Cursor agent per session and sends only
+  the new message on a clean continuation, falling back to a fresh agent + full transcript on
+  edits/reverts/compaction/side calls (see [Session reuse](#session-reuse-session)). Set
+  `session: false` to always create a fresh agent and re-send the full transcript every turn.
 - Token usage is reported from Cursor's `turn-ended` event; cost is shown as `0` because Cursor
   bills your account separately.
 - **Provider path is local.** The `cursor/*` models you chat with run as a **local** agent. Cursor's
 
@@ -0,0 +1,110 @@
+// Live smoke test for fingerprint-guarded session reuse (`session: "auto"`).
+//
+// Simulates how opencode drives the provider across turns: it re-sends the
+// whole transcript each call with a stable providerOptions.cursor.sessionID.
+// This script exercises the classification + cache behavior empirically and logs
+// the results for manual inspection (nothing is asserted automatically):
+//
+//   Turn 1 (new)            -> fresh agent, full transcript
+//   Turn 2,3 (continuation) -> RESUME pooled agent, send only the new message;
+//                              inputTokens stays flat, cacheRead dominates
+//   Turn 4 (divergence)     -> edit an earlier user message -> fresh replay,
+//                              re-pool. Demonstrates the safety fallback.
+//
+// Classification is logged to stderr (OPENCODE_CURSOR_DEBUG=1, set below).
+// Skips cleanly (exit 0) when CURSOR_API_KEY is absent.
+import { mkdtempSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+
+process.env.OPENCODE_CURSOR_DEBUG = "1";
+
+const apiKey = process.env.CURSOR_API_KEY?.trim();
+if (!apiKey) {
+	console.log("[session-smoke] No CURSOR_API_KEY; skipping.");
+	process.exit(0);
+}
+
+const modelId = process.env.CURSOR_SMOKE_MODEL?.trim() || "composer-2.5";
+const providerUrl = new URL("../dist/provider/index.js", import.meta.url).href;
+const { createCursor } = await import(providerUrl);
+
+const cwd = mkdtempSync(join(tmpdir(), "cursor-session-"));
+const model = createCursor({ apiKey, cwd, session: "auto" }).languageModel(
+	modelId,
+);
+const sessionID = `smoke-${Date.now()}`;
+
+const sys = {
+	role: "system",
+	content: "You are terse. Answer in one short sentence.",
+};
+const user = (text) => ({ role: "user", content: [{ type: "text", text }] });
+const assistant = (text) => ({
+	role: "assistant",
+	content: [{ type: "text", text }],
+});
+
+async function turn(label, prompt) {
+	const controller = new AbortController();
+	const timer = setTimeout(() => controller.abort(), 180_000);
+	let text = "";
+	let usage;
+	try {
+		const { stream } = await model.doStream({
+			prompt,
+			abortSignal: controller.signal,
+			providerOptions: { cursor: { sessionID } },
+		});
+		const reader = stream.getReader();
+		for (;;) {
+			const { done, value } = await reader.read();
+			if (done) break;
+			if (value.type === "text-delta") text += value.delta;
+			else if (value.type === "finish") usage = value.usage;
+			else if (value.type === "error") throw value.error;
+		}
+	} finally {
+		clearTimeout(timer);
+	}
+	const inp = usage?.inputTokens ?? {};
+	console.log(
+		`[session-smoke:${label}] reply=${JSON.stringify(text.trim().slice(0, 60))} ` +
+			`input=${inp.total ?? "?"} cacheRead=${inp.cacheRead ?? "?"} cacheWrite=${inp.cacheWrite ?? "?"}`,
+	);
+	return text.trim();
+}
+
+// Turn 1 — new session.
+const r1 = await turn("t1-new", [sys, user("Name a primary color.")]);
+// Turn 2 — continuation (one new user message appended).
+const r2 = await turn("t2-cont", [
+	sys,
+	user("Name a primary color."),
+	assistant(r1),
+	user("Name another one."),
+]);
+// Turn 3 — continuation again.
+const r3 = await turn("t3-cont", [
+	sys,
+	user("Name a primary color."),
+	assistant(r1),
+	user("Name another one."),
+	assistant(r2),
+	user("And a third?"),
+]);
+// Turn 4 — divergence: edit the FIRST user message -> must fall back to replay.
+await turn("t4-diverge", [
+	sys,
+	user("Name a primary color. (edited)"),
+	assistant(r1),
+	user("Name another one."),
+	assistant(r2),
+	user("And a third?"),
+	assistant(r3),
+	user("One more?"),
+]);
+
+console.log(
+	"[session-smoke] Done. Expect stderr classifications: " +
+		"fresh:new, resume, resume, fresh:divergence. " +
+		"On t2/t3 inputTokens should stay flat with cacheRead dominating.",
+);