getagentseal · iamtoruk · Jul 5, 2026 · Jul 3, 2026 · Jul 3, 2026 · Jul 3, 2026
diff --git a/README.md b/README.md
@@ -506,7 +506,7 @@ These are starting points, not verdicts. A 60% cache hit on a single experimenta
 | **OpenCode** | SQLite `~/.local/share/opencode/opencode*.db` (respects `XDG_DATA_HOME`) | Queries `session`, `message`, and `part` read-only and recalculates cost via LiteLLM (falling back to OpenCode's own cost field for unpriced models). Subtask sessions (`parent_id IS NOT NULL`) are excluded to avoid double counting; multiple channel databases are supported. |
 | **Gemini CLI** | `~/.gemini/tmp/<project>/chats/session-*.json` | One JSON file per session with real token counts (input, output, cached, thoughts) per message, so no estimation is needed. Input is reported inclusive of cached, so CodeBurn subtracts cached before pricing to avoid double charging. |
 | **Antigravity (CLI & IDE)** | Session files under `.gemini/` folders, plus the running language server | Pulls granular trajectory and pricing from the language server process. For the short-lived CLI, optionally install a status-line hook with `codeburn antigravity-hook install` so usage is captured between menubar refreshes. The IDE is detected via the `--app-data-dir antigravity-ide` flag on Windows. |
-| **GitHub Copilot** | `~/.copilot/session-state/` (legacy CLI) and VS Code/VSCodium `workspaceStorage/*/GitHub.copilot-chat/transcripts/` | Editor transcripts carry no explicit token counts, so tokens are estimated from content length and the model is inferred from tool call ID prefixes. |
+| **GitHub Copilot** | `~/.copilot/session-state/` (legacy CLI); VS Code/VSCodium `workspaceStorage/*` chat sessions, `GitHub.copilot-chat/transcripts/`, and the `agent-traces.db` OpenTelemetry store; JetBrains IDEs (IntelliJ, PyCharm, …) under `~/.config/github-copilot/<ide>/<kind>/<storeId>/copilot-*-nitrite.db` | The OTel SQLite store is preferred when present (it carries real input/output/cache token counts). Other sources carry no explicit counts, so tokens are estimated from content length and the model is inferred from tool call ID prefixes. JetBrains sessions are read from a Nitrite (H2 MVStore) `.db`; the project comes from the plugin's `projectName` field (else the `.git` root of a referenced file). See [docs/providers/copilot.md](docs/providers/copilot.md). |
 | **Kiro** | `.chat` JSON files | Token counts are estimated from content length. The model is not exposed, so sessions are labeled `kiro-auto` and costed at Sonnet rates. |
 | **Mistral Vibe** | `~/.vibe/logs/session/` (or `$VIBE_HOME/logs/session/`); each folder has `meta.json` + `messages.jsonl` | Reads cumulative prompt/completion totals and model pricing from `meta.json`, then the first user prompt and tool calls from `messages.jsonl`. Emits one record per session (source data is cumulative, not per turn); subagent sessions under `agents/` are counted separately. |
 | **OpenClaw** | `~/.openclaw/agents/*.jsonl` (legacy `.clawdbot`, `.moltbot`, `.moldbot`) | Token usage comes from assistant message `usage` blocks; the model from `modelId` or `message.model`. |

diff --git a/docs/providers/README.md b/docs/providers/README.md
@@ -13,7 +13,7 @@ For the architectural picture, see `../architecture.md`.
 | [Claude](claude.md) | JSONL (no parser) | `src/providers/claude.ts` | none (covered indirectly) |
 | [Cline](cline.md) | JSON | `src/providers/cline.ts` | `tests/providers/cline.test.ts` |
 | [Codex](codex.md) | JSONL | `src/providers/codex.ts` | `tests/providers/codex.test.ts` |
-| [Copilot](copilot.md) | JSONL | `src/providers/copilot.ts` | `tests/providers/copilot.test.ts` |
+| [Copilot](copilot.md) | JSONL + SQLite (OTel) + Nitrite .db (JetBrains) | `src/providers/copilot.ts` | `tests/providers/copilot.test.ts` |
 | [Devin](devin.md) | JSON + SQLite enrichment | `src/providers/devin.ts` | `tests/providers/devin.test.ts` |
 | [Droid](droid.md) | JSONL | `src/providers/droid.ts` | `tests/providers/droid.test.ts` |
 | [Gemini](gemini.md) | JSON / JSONL | `src/providers/gemini.ts` | none |

diff --git a/docs/providers/copilot.md b/docs/providers/copilot.md
@@ -1,6 +1,6 @@
 # Copilot
 
-GitHub Copilot Chat (CLI, VS Code core chat sessions, and VS Code extension transcripts).
+GitHub Copilot Chat (CLI, VS Code core chat sessions, VS Code extension transcripts, and JetBrains IDE sessions).
 
 - **Source:** `src/providers/copilot.ts`
 - **Loading:** eager (`src/providers/index.ts:3`)
@@ -16,10 +16,11 @@ Other discovered sources are walked on every run; results merge and dedupe.
 2. **VS Code core chat sessions:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/chatSessions/*.jsonl` plus `~/Library/Application Support/Code/User/globalStorage/emptyWindowChatSessions/*.jsonl` and equivalents on Windows / Linux
 3. **VS Code transcripts:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/GitHub.copilot-chat/transcripts/` and equivalents on Windows / Linux
 4. **OTel SQLite store:** VS Code Copilot Chat's `agent-traces.db` (see the OTel section). Preferred when present because it carries full input / output / cache token counts; legacy JSONL sources only record output tokens.
+5. **JetBrains IDE sessions:** `~/.config/github-copilot/<ide>/<kind>/<storeId>/copilot-*-nitrite.db` (see the JetBrains section). Covers IntelliJ IDEA, PyCharm, RubyMine, etc.
 
 ## Storage format
 
-JSONL in the first three locations (schemas differ; the parser switches by source type / event shape), and a SQLite DB for the OTel source. VS Code core chat sessions use a delta journal: `kind:0` sets the root object, `kind:1` writes a value at path `k`, and `kind:2` appends items to an array path.
+JSONL in the first three locations (schemas differ; the parser switches by source type / event shape), a SQLite DB for the OTel source, and a Nitrite (H2 MVStore) `.db` for the JetBrains source. VS Code core chat sessions use a delta journal: `kind:0` sets the root object, `kind:1` writes a value at path `k`, and `kind:2` appends items to an array path.
 
 ## OpenTelemetry (OTel) source
 
@@ -44,13 +45,126 @@ instead of trying to dedupe across stores.
   before the upgrade cannot be recovered, so monotonicity starts from the upgrade point,
   not retroactively.
 
+## JetBrains IDEs (IntelliJ, PyCharm, …)
+
+The JetBrains Copilot plugin does **not** write to any of the VS Code or CLI
+locations above. It persists chat/agent sessions under the shared GitHub Copilot
+config root, in one store directory per session store:
+
+```
+~/.config/github-copilot/<ide>/<kind>/<storeId>/
+  copilot-*-nitrite.db     # Nitrite (H2 MVStore) — the session content
+  blobs/
+```
+
+`<ide>` is a per-IDE dir (`iu` for IntelliJ IDEA Ultimate, `intellij` for the
+community edition, `PyCharm2025.2`, …). `<kind>` ∈ `chat-agent-sessions`,
+`chat-sessions`, `chat-edit-sessions` (agent / ask / edit mode). The root follows
+XDG rules: `$XDG_CONFIG_HOME/github-copilot` when set, else
+`~/.config/github-copilot` (macOS / Linux) or `%LOCALAPPDATA%\github-copilot`
+(Windows).
+
+**Storage: the Nitrite `.db`.** An H2 MVStore file (header
+`H:2,block:9,…format:3`) of Java-serialized Nitrite documents (`NtAgentSession`,
+`NtAgentTurn`). It is read as `latin1` (byte-offset-stable, lossless) and scanned
+— no Java deserializer, no new deps, and it is **not** SQLite so `node:sqlite` is
+not used. Each assistant reply is a `{"__first__":{"type":"Subgraph",…}}` blob.
+`extractResponseText` recovers the reply by unescaping one level at a time and,
+at the first depth where the record markers appear bare, reading the reply
+**structurally** (the payload is parsed as a delimited JSON-string literal, so a
+reply containing its own quotes is never truncated).
+
+**Two turn shapes, both handled** (a blob is one or the other; across every
+observed store the two shapes never coexist):
+
+- **Ask mode** — the reply is a `Markdown` record's `text`.
+- **Agent / plan mode** (agent sessions, `/plan …`, e.g. in PyCharm) — the reply
+  is the `reply` field of an `AgentRound` record; here the `Markdown` records
+  hold the *user's* prompt instead. The mode is decided by the **presence** of an
+  `AgentRound` record, and only its `reply` is read — so an agent turn with an
+  empty reply (a failed turn or a pure tool-call round) is billed **$0** rather
+  than falling back to the prompt. A multi-round blob contributes every non-empty
+  round's reply.
+
+Sidecar records that plan/agent mode also writes — `Thinking` (chain-of-thought),
+`PendingChanges` (proposed code diff, stored under `content` not `data`),
+`AskQuestion`, `Notification`, `SubTurn`, and file-read `text` results — are
+**not** billable assistant output and are deliberately skipped. User prompts are
+the simpler `{"<uuid>":{"type":"Value",…}}` value-maps.
+
+**Old plugin format (≤1.5.x, e.g. 1.5.59-243).** Older plugins do not write
+per-turn `__first__`/Subgraph blobs at all — they store the whole session as ONE
+binary-framed outer Nitrite document of UUID-keyed `Value` entries, with the
+`AgentRound` records one escaping level deeper. When the Subgraph scan finds no
+turns but the raw file contains `AgentRound` text, a fallback locates that outer
+document (`extractJetBrainsDbTurns`), runs it through the same
+`extractResponseText` depth-unescape, and emits **one session-level call** per
+document (all rounds' replies joined). Cost and tokens are correct; only the
+per-turn call-count granularity is coarser than the new format — an accepted
+tradeoff for legacy data. The fallback is gated on the new-format scan yielding
+nothing, so current sessions are never affected or double-counted.
+
+(Store dirs may also contain a legacy `00000000000.xd` Xodus log from older
+plugin versions. On every installation observed it is either empty or shadowed
+by the `.db`, so CodeBurn reads only the `.db`. If a real `.xd`-only session ever
+surfaces, add a reader with a captured fixture.)
+
+- **No token accounting.** No store records token counts. Output tokens are
+  **estimated** from the reply text via `estimateTokens` (`CHARS_PER_TOKEN = 4`,
+  as for Cursor and legacy Copilot JSONL); input tokens are 0; every JetBrains
+  call is marked `costIsEstimated: true`.
+- **Errored turns.** A failed generation ("Sorry, an error occurred …") is stored
+  as an assistant blob with an error status and no reply text; it is detected and
+  billed **$0** (not conflated with an empty success). In agent mode a failed turn
+  has an empty `AgentRound` reply — the parser does not fall back to the prompt
+  `Markdown`, so the user's words are never billed as the assistant's output.
+- **Per-turn model.** The model varies per turn within one `.db`. It is recovered
+  from inside the assistant blob when present, else a store-wide default, else a
+  generic Copilot bucket. Dotted Claude names are normalised to canonical ids
+  (`claude-opus-4.5` → `claude-opus-4-5`); GPT/Gemini names are kept verbatim.
+- **Duplicates.** The store keeps several byte-copies of each reply (original,
+  lowercased, revisions); assistant turns are de-duplicated by reply content.
+- **One `.db` holds many chat tabs.** A single store `.db` contains multiple
+  conversations, each with an internal GUID and an evolving title
+  (`New Agent Session` → auto-name → final title). CodeBurn recovers the
+  `GUID → title` map (`extractJetBrainsConversations`, keeping the latest
+  non-default title), attributes each turn to the nearest preceding conversation
+  GUID, and emits **one session per conversation** (not one per `.db`). Reply
+  content is de-duplicated per conversation.
+- **Project.** Resolved in three tiers, most authoritative first:
+  1. **`projectName` field (plugin 1.12+).** Recent plugins serialize the repo
+     label directly on the session doc (`extractJetBrainsProjectName`) — the
+     JetBrains analogue of the OTel source's `github.copilot.git.repository`.
+     **Cross-kind join:** the billable turns live in `chat-agent-sessions`, but
+     the `projectName` is usually written only into the sibling
+     `chat-sessions` / `chat-edit-sessions` store. Discovery
+     (`resolveJetBrainsProjectNames`) joins them by **store id** so the agent
+     session inherits the label from whichever store recorded it. The field is
+     read length-prefixed (Java `TC_STRING`) so an embedded quote/newline can't
+     truncate it.
+  2. **`.git` walk-up (older plugins / no `projectName`).** For each `file://`
+     URI a chat referenced, walk UP the real filesystem to the nearest ancestor
+     containing a `.git` and use that repo's basename (e.g. `pinot`).
+  3. **`copilot-jetbrains`** bucket when neither signal exists (chat referenced
+     no files and no `projectName` was recorded, or the repo no longer exists on
+     disk).
+
+  The conversation **title** is a chat-thread name, NOT a project — it is the
+  session label (`userMessage`) and deliberately kept out of `project` so it does
+  not pollute the By-Project view. Note that `bg-agent-sessions/` (a newer kind
+  dir holding `copilot-agent-snapshots.db` / `copilot-session-metadata.db`) is
+  **not** scanned: those DBs carry file snapshots and metadata, not billable
+  turns, and the same session's turns are already read from
+  `chat-agent-sessions`.
+- **Override the root** with `CODEBURN_COPILOT_JETBRAINS_DIR`.
+
 ## Caching
 
 None for the JSONL sources. The OTel source uses a durable cache (see above).
 
 ## Deduplication
 
-Legacy JSONL and transcript sessions dedupe per `messageId`. Core chat sessions dedupe per `copilot-chatsession:<sessionId>:<requestId>`, and are not discovered when an OTel source is present.
+Legacy JSONL and transcript sessions dedupe per `messageId`. Core chat sessions dedupe per `copilot-chatsession:<sessionId>:<requestId>`, and are not discovered when an OTel source is present. JetBrains `.db` turns dedupe per `copilot:jb:<conversationId>:<turnIndex>` (a per-conversation index, plus reply-content dedup within each conversation). The JetBrains stores otherwise live in locations disjoint from those of the VS Code / CLI sources.
 
 If a workspace hash contains at least one `chatSessions/*.jsonl` file, the provider skips that hash's legacy `GitHub.copilot-chat/transcripts/` directory. The core chat session journal is the modern token-bearing source for the same conversations, so reading both would inflate call counts.
 

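The JetBrains accounting rules documented in `docs/providers/copilot.md` above (estimated output tokens, zero input tokens, Claude name normalisation, and per-conversation dedup) can be sketched as follows. This is a minimal illustration, not the code in `src/providers/copilot.ts`. Every function and type name here is hypothetical, the rounding in the token estimate is an assumption, and the content dedup is exact-match only, whereas the docs say the store also keeps lowercased and revised copies. Only `CHARS_PER_TOKEN = 4`, `costIsEstimated`, and the `copilot:jb:<conversationId>:<turnIndex>` key shape come from the docs.

```typescript
// Sketch only: names ending in "Sketch" are hypothetical, not CodeBurn APIs.
const CHARS_PER_TOKEN = 4 // value stated in the docs

interface SketchTurn {
  conversationId: string
  model: string
  reply: string
}

interface SketchCall {
  dedupKey: string
  model: string
  inputTokens: number
  outputTokens: number
  costIsEstimated: true
}

// Assumption: round up, so any non-empty reply costs at least one token.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Dotted Claude names become dashed ids; other names pass through verbatim.
function normaliseModelSketch(name: string): string {
  return name.startsWith('claude-') ? name.replace(/\./g, '-') : name
}

function toCallsSketch(turns: SketchTurn[]): SketchCall[] {
  const seenReplies = new Map<string, Set<string>>() // per-conversation content dedup
  const nextIndex = new Map<string, number>() // per-conversation turn index
  const calls: SketchCall[] = []

  for (const turn of turns) {
    const seen = seenReplies.get(turn.conversationId) ?? new Set<string>()
    seenReplies.set(turn.conversationId, seen)
    if (seen.has(turn.reply)) continue // exact byte-copy of an earlier reply
    seen.add(turn.reply)

    const index = nextIndex.get(turn.conversationId) ?? 0
    nextIndex.set(turn.conversationId, index + 1)

    calls.push({
      dedupKey: `copilot:jb:${turn.conversationId}:${index}`,
      model: normaliseModelSketch(turn.model),
      inputTokens: 0, // no store records input tokens
      outputTokens: estimateTokensSketch(turn.reply), // empty reply => 0 tokens
      costIsEstimated: true,
    })
  }
  return calls
}
```

Keeping the turn index per conversation, rather than per `.db`, mirrors the documented choice to emit one session per conversation even though a single store holds many chat tabs.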
diff --git a/src/fs-utils.ts b/src/fs-utils.ts
@@ -29,7 +29,10 @@ function notice(msg: string): void {
   process.stderr.write(`codeburn: ${msg}\n`)
 }
 
-export async function readSessionFile(filePath: string): Promise<string | null> {
+export async function readSessionFile(
+  filePath: string,
+  encoding: BufferEncoding = 'utf-8'
+): Promise<string | null> {
   let size: number
   try {
     size = (await stat(filePath)).size
@@ -44,7 +47,7 @@ export async function readSessionFile(filePath: string): Promise<string | null>
   }
 
   try {
-    return await readFile(filePath, 'utf-8')
+    return await readFile(filePath, encoding)
   } catch (err) {
     warn(`read failed for ${filePath}: ${(err as NodeJS.ErrnoException).code ?? 'unknown'}`)
     return null
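
The new `encoding` parameter presumably exists so the JetBrains Nitrite `.db` can be read as `latin1`, which the provider docs describe as byte-offset-stable and lossless. The sketch below illustrates that property with Node's `Buffer`, using made-up bytes rather than a real store file: under `latin1` every byte maps to exactly one character, so string offsets equal byte offsets and the bytes round-trip, whereas a UTF-8 decode of arbitrary binary data does not round-trip.

```typescript
import { Buffer } from 'node:buffer'

// Illustrative bytes only (not taken from a real store): ASCII "H:2", a valid
// UTF-8 pair (0xc3 0xa9), then bytes that are not valid UTF-8 on their own.
const raw = Buffer.from([0x48, 0x3a, 0x32, 0xc3, 0xa9, 0xac, 0xed, 0x00, 0x05])

const asLatin1 = raw.toString('latin1')
const asUtf8 = raw.toString('utf-8')

// latin1: one character per byte, so the string length equals the byte length.
const sameLength = asLatin1.length === raw.length

// latin1 round-trips exactly; UTF-8 replaces invalid sequences and does not.
const latin1RoundTrips = Buffer.from(asLatin1, 'latin1').equals(raw)
const utf8RoundTrips = Buffer.from(asUtf8, 'utf-8').equals(raw)

// A marker found in the latin1 string sits at the same offset as in the bytes.
const offsetInString = asLatin1.indexOf('\xac\xed')
const offsetInBytes = raw.indexOf(Buffer.from([0xac, 0xed]))

console.log({ sameLength, latin1RoundTrips, utf8RoundTrips, offsetInString, offsetInBytes })
```

Because the default stays `'utf-8'`, existing callers of `readSessionFile` keep their current behaviour, and only a caller that passes `'latin1'` explicitly gets the byte-stable view.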