2 changes: 1 addition & 1 deletion README.md
@@ -506,7 +506,7 @@ These are starting points, not verdicts. A 60% cache hit on a single experimenta
| **OpenCode** | SQLite `~/.local/share/opencode/opencode*.db` (respects `XDG_DATA_HOME`) | Queries `session`, `message`, and `part` read-only and recalculates cost via LiteLLM (falling back to OpenCode's own cost field for unpriced models). Subtask sessions (`parent_id IS NOT NULL`) are excluded to avoid double counting; multiple channel databases are supported. |
| **Gemini CLI** | `~/.gemini/tmp/<project>/chats/session-*.json` | One JSON file per session with real token counts (input, output, cached, thoughts) per message, so no estimation is needed. Input is reported inclusive of cached, so CodeBurn subtracts cached before pricing to avoid double charging. |
| **Antigravity (CLI & IDE)** | Session files under `.gemini/` folders, plus the running language server | Pulls granular trajectory and pricing from the language server process. For the short-lived CLI, optionally install a status-line hook with `codeburn antigravity-hook install` so usage is captured between menubar refreshes. The IDE is detected via the `--app-data-dir antigravity-ide` flag on Windows. |
| **GitHub Copilot** | `~/.copilot/session-state/` (legacy CLI) and VS Code/VSCodium `workspaceStorage/*/GitHub.copilot-chat/transcripts/` | Editor transcripts carry no explicit token counts, so tokens are estimated from content length and the model is inferred from tool call ID prefixes. |
| **GitHub Copilot** | `~/.copilot/session-state/` (legacy CLI); VS Code/VSCodium `workspaceStorage/*` chat sessions, `GitHub.copilot-chat/transcripts/`, and the `agent-traces.db` OpenTelemetry store; JetBrains IDEs (IntelliJ, PyCharm, …) under `~/.config/github-copilot/<ide>/<kind>/<storeId>/copilot-*-nitrite.db` | The OTel SQLite store is preferred when present (it carries real input/output/cache token counts). Other sources carry no explicit counts, so tokens are estimated from content length and the model is inferred from tool call ID prefixes. JetBrains sessions read from a Nitrite (H2 MVStore) `.db`; project comes from the plugin's `projectName` field (else the `.git` root of a referenced file). See [docs/providers/copilot.md](docs/providers/copilot.md). |
| **Kiro** | `.chat` JSON files | Token counts are estimated from content length. The model is not exposed, so sessions are labeled `kiro-auto` and costed at Sonnet rates. |
| **Mistral Vibe** | `~/.vibe/logs/session/` (or `$VIBE_HOME/logs/session/`); each folder has `meta.json` + `messages.jsonl` | Reads cumulative prompt/completion totals and model pricing from `meta.json`, then the first user prompt and tool calls from `messages.jsonl`. Emits one record per session (source data is cumulative, not per turn); subagent sessions under `agents/` are counted separately. |
| **OpenClaw** | `~/.openclaw/agents/*.jsonl` (legacy `.clawdbot`, `.moltbot`, `.moldbot`) | Token usage comes from assistant message `usage` blocks; the model from `modelId` or `message.model`. |
2 changes: 1 addition & 1 deletion docs/providers/README.md
@@ -13,7 +13,7 @@ For the architectural picture, see `../architecture.md`.
| [Claude](claude.md) | JSONL (no parser) | `src/providers/claude.ts` | none (covered indirectly) |
| [Cline](cline.md) | JSON | `src/providers/cline.ts` | `tests/providers/cline.test.ts` |
| [Codex](codex.md) | JSONL | `src/providers/codex.ts` | `tests/providers/codex.test.ts` |
| [Copilot](copilot.md) | JSONL | `src/providers/copilot.ts` | `tests/providers/copilot.test.ts` |
| [Copilot](copilot.md) | JSONL + SQLite (OTel) + Nitrite .db (JetBrains) | `src/providers/copilot.ts` | `tests/providers/copilot.test.ts` |
| [Devin](devin.md) | JSON + SQLite enrichment | `src/providers/devin.ts` | `tests/providers/devin.test.ts` |
| [Droid](droid.md) | JSONL | `src/providers/droid.ts` | `tests/providers/droid.test.ts` |
| [Gemini](gemini.md) | JSON / JSONL | `src/providers/gemini.ts` | none |
120 changes: 117 additions & 3 deletions docs/providers/copilot.md
@@ -1,6 +1,6 @@
# Copilot

GitHub Copilot Chat (CLI, VS Code core chat sessions, and VS Code extension transcripts).
GitHub Copilot Chat (CLI, VS Code core chat sessions, VS Code extension transcripts, and JetBrains IDE sessions).

- **Source:** `src/providers/copilot.ts`
- **Loading:** eager (`src/providers/index.ts:3`)
@@ -16,10 +16,11 @@ Other discovered sources are walked on every run; results merge and dedupe.
2. **VS Code core chat sessions:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/chatSessions/*.jsonl` plus `~/Library/Application Support/Code/User/globalStorage/emptyWindowChatSessions/*.jsonl` and equivalents on Windows / Linux
3. **VS Code transcripts:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/GitHub.copilot-chat/transcripts/` and equivalents on Windows / Linux
4. **OTel SQLite store:** VS Code Copilot Chat's `agent-traces.db` (see the OTel section). Preferred when present because it carries full input / output / cache token counts; legacy JSONL sources only record output tokens.
5. **JetBrains IDE sessions:** `~/.config/github-copilot/<ide>/<kind>/<storeId>/copilot-*-nitrite.db` (see the JetBrains section). Covers IntelliJ IDEA, PyCharm, RubyMine, etc.

## Storage format

JSONL in the first three locations (schemas differ; the parser switches by source type / event shape), and a SQLite DB for the OTel source. VS Code core chat sessions use a delta journal: `kind:0` sets the root object, `kind:1` writes a value at path `k`, and `kind:2` appends items to an array path.
JSONL in the first three locations (schemas differ; the parser switches by source type / event shape), a SQLite DB for the OTel source, and a Nitrite (H2 MVStore) `.db` for the JetBrains source. VS Code core chat sessions use a delta journal: `kind:0` sets the root object, `kind:1` writes a value at path `k`, and `kind:2` appends items to an array path.
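
To make the delta-journal semantics concrete, here is a minimal replay sketch. Only `kind` and the path field `k` come from the description above. The payload field name `v`, the representation of `k` as an array of keys and indices, and the function name `replayJournal` are assumptions for illustration, not the provider's actual schema or code.

```typescript
// Hypothetical journal entry shape. `kind` and `k` follow the description
// above; the payload field `v` and the array-shaped path are assumptions.
type Path = Array<string | number>
type JournalEntry =
  | { kind: 0; v: unknown }            // set the root object
  | { kind: 1; k: Path; v: unknown }   // write a value at path k
  | { kind: 2; k: Path; v: unknown[] } // append items to the array at path k

function replayJournal(lines: string[]): unknown {
  let root: any = undefined
  for (const line of lines) {
    if (line.trim() === '') continue
    const entry = JSON.parse(line) as JournalEntry
    if (entry.kind === 0) {
      root = entry.v
      continue
    }
    if (entry.k.length === 0) continue // nothing to address; skip defensively
    if (root == null) root = {}        // tolerate a journal with no kind:0 line
    // Walk to the parent of the final path segment, creating objects as needed.
    let parent: any = root
    for (const key of entry.k.slice(0, -1)) {
      if (parent[key] == null) parent[key] = {}
      parent = parent[key]
    }
    const last = entry.k[entry.k.length - 1]
    if (entry.kind === 1) {
      parent[last] = entry.v
    } else {
      if (!Array.isArray(parent[last])) parent[last] = []
      parent[last].push(...entry.v)
    }
  }
  return root
}
```

Replaying the lines in file order reconstructs the final session object, which is why a journal can be read in a single forward pass.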

## OpenTelemetry (OTel) source

@@ -44,13 +45,126 @@ instead of trying to dedupe across stores.
before the upgrade cannot be recovered, so monotonicity starts from the upgrade point,
not retroactively.

## JetBrains IDEs (IntelliJ, PyCharm, …)

The JetBrains Copilot plugin does **not** write to any of the VS Code or CLI
locations above. It persists chat/agent sessions under the shared GitHub Copilot
config root, in one store directory per session store:

```
~/.config/github-copilot/<ide>/<kind>/<storeId>/
  copilot-*-nitrite.db   # Nitrite (H2 MVStore): the session content
  blobs/
```

`<ide>` is a per-IDE dir (`iu` for IntelliJ IDEA Ultimate, `intellij` for the
community edition, `PyCharm2025.2`, …). `<kind>` ∈ `chat-agent-sessions`,
`chat-sessions`, `chat-edit-sessions` (agent / ask / edit mode). The root follows
XDG rules: `$XDG_CONFIG_HOME/github-copilot` when set, else
`~/.config/github-copilot` (macOS / Linux) or `%LOCALAPPDATA%\github-copilot`
(Windows).
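
The root lookup described above can be sketched as a pure function. This is a hedged illustration, not CodeBurn's actual code: the function name `resolveJetBrainsRoot` and the exact precedence (the `CODEBURN_COPILOT_JETBRAINS_DIR` override first, then `XDG_CONFIG_HOME`, then the platform default) are assumptions. The environment, platform, and home directory are passed in so the sketch stays testable.

```typescript
type Env = Record<string, string | undefined>

// Sketch of the root lookup. The precedence order is an assumption.
function resolveJetBrainsRoot(env: Env, platform: string, home: string): string | null {
  const override = env.CODEBURN_COPILOT_JETBRAINS_DIR
  if (override) return override
  if (env.XDG_CONFIG_HOME) return `${env.XDG_CONFIG_HOME}/github-copilot`
  if (platform === 'win32') {
    // No LOCALAPPDATA means there is no sensible default on Windows.
    return env.LOCALAPPDATA ? `${env.LOCALAPPDATA}\\github-copilot` : null
  }
  return `${home}/.config/github-copilot` // macOS / Linux
}
```

In a Node process this would typically be called as `resolveJetBrainsRoot(process.env, process.platform, os.homedir())`.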

**Storage: the Nitrite `.db`.** An H2 MVStore file (header
`H:2,block:9,…format:3`) of Java-serialized Nitrite documents (`NtAgentSession`,
`NtAgentTurn`). It is read as `latin1` (byte-offset-stable, lossless) and scanned
— no Java deserializer, no new deps, and it is **not** SQLite so `node:sqlite` is
not used. Each assistant reply is a `{"__first__":{"type":"Subgraph",…}}` blob.
`extractResponseText` recovers the reply by unescaping one level at a time and,
at the first depth where the record markers appear bare, reading the reply
**structurally** (the payload is parsed as a delimited JSON-string literal, so a
reply containing its own quotes is never truncated).
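
The difference between a structural read and a naive "scan to the next quote" can be shown in a few lines. This is a sketch of the idea only, not the real `extractResponseText`. The helper name `readJsonStringLiteral` and the sample input (including the field name used as a marker) are hypothetical.

```typescript
// Hypothetical helper (not CodeBurn's actual code): read the JSON string
// literal whose opening quote sits at `openQuote`, honouring backslash escapes.
function readJsonStringLiteral(src: string, openQuote: number): string | null {
  if (src[openQuote] !== '"') return null
  for (let i = openQuote + 1; i < src.length; i++) {
    if (src[i] === '\\') {
      i++ // skip the escaped character so \" cannot end the literal
      continue
    }
    if (src[i] === '"') {
      try {
        // JSON.parse removes exactly one level of escaping.
        return JSON.parse(src.slice(openQuote, i + 1)) as string
      } catch {
        return null // malformed literal (e.g. raw control characters)
      }
    }
  }
  return null // unterminated literal
}

// Illustrative input: the field name is only a stand-in for a record marker.
const sampleMarker = '"reply":'
const sampleRaw = '{"reply":"Use \\"latin1\\" here","done":true}'
const sampleReply = readJsonStringLiteral(
  sampleRaw,
  sampleRaw.indexOf(sampleMarker) + sampleMarker.length
)
```

A naive search for the next `"` would stop after `Use \`, whereas the escape-aware scan returns the full reply including its embedded quotes.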

**Two turn shapes, both handled** (a blob is one or the other; across every
observed store the two never coexist in one blob):

- **Ask mode** — the reply is a `Markdown` record's `text`.
- **Agent / plan mode** (agent sessions, `/plan …`, e.g. in PyCharm) — the reply
is the `reply` field of an `AgentRound` record; here the `Markdown` records
hold the *user's* prompt instead. The mode is decided by the **presence** of an
`AgentRound` record, and only its `reply` is read — so an agent turn with an
empty reply (a failed turn or a pure tool-call round) is billed **$0** rather
than falling back to the prompt. A multi-round blob contributes every non-empty
round's reply.

Sidecar records that plan/agent mode also writes — `Thinking` (chain-of-thought),
`PendingChanges` (proposed code diff, stored under `content` not `data`),
`AskQuestion`, `Notification`, `SubTurn`, and file-read `text` results — are
**not** billable assistant output and are deliberately skipped. User prompts are
the simpler `{"<uuid>":{"type":"Value",…}}` value-maps.
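
The mode decision and the sidecar filtering above can be sketched over already-decoded records. The record types here are hypothetical simplifications (the real parser scans the serialized blob rather than building objects like these), and `billableReplies` is an illustrative name.

```typescript
// Hypothetical, already-decoded record shapes for illustration only.
type AgentRoundRecord = { type: 'AgentRound'; reply: string }
type MarkdownRecord = { type: 'Markdown'; text: string }
type SidecarRecord = {
  type: 'Thinking' | 'PendingChanges' | 'AskQuestion' | 'Notification' | 'SubTurn'
}
type BlobRecord = AgentRoundRecord | MarkdownRecord | SidecarRecord

function billableReplies(records: BlobRecord[]): string[] {
  const rounds = records.filter((r): r is AgentRoundRecord => r.type === 'AgentRound')
  if (rounds.length > 0) {
    // Agent / plan mode: the presence of an AgentRound decides the mode, and
    // only `reply` is read. Markdown holds the user's prompt here, so there is
    // deliberately no fallback to it when every reply is empty.
    return rounds.map(r => r.reply).filter(reply => reply.length > 0)
  }
  // Ask mode: the reply is the Markdown record's text. Sidecars are skipped.
  return records
    .filter((r): r is MarkdownRecord => r.type === 'Markdown')
    .map(r => r.text)
    .filter(text => text.length > 0)
}
```

An agent blob whose only round has an empty reply yields an empty list, which is what makes a failed or tool-only turn cost $0.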

**Old plugin format (≤1.5.x, e.g. 1.5.59-243).** Older plugins do not write
per-turn `__first__`/Subgraph blobs at all — they store the whole session as ONE
binary-framed outer Nitrite document of UUID-keyed `Value` entries, with the
`AgentRound` records one escaping level deeper. When the Subgraph scan finds no
turns but the raw file contains `AgentRound` text, a fallback locates that outer
document (`extractJetBrainsDbTurns`), runs it through the same
`extractResponseText` depth-unescape, and emits **one session-level call** per
document (all rounds' replies joined). Cost and tokens are correct; only the
per-turn call-count granularity is coarser than the new format — an accepted
tradeoff for legacy data. The fallback is gated on the new-format scan yielding
nothing, so current sessions are never affected or double-counted.
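
The gating of the legacy fallback can be sketched as follows. Both scanner parameters, the `Call` shape, and the newline used to join rounds are hypothetical; only the gating rule (fall back solely when the new-format scan finds nothing and the raw file mentions `AgentRound`) comes from the description above.

```typescript
interface Call {
  replyText: string
  granularity: 'turn' | 'session'
}

function collectCalls(
  raw: string,
  scanSubgraphTurns: (raw: string) => string[], // hypothetical new-format scanner
  scanLegacyRounds: (raw: string) => string[]   // hypothetical old-format scanner
): Call[] {
  const turns = scanSubgraphTurns(raw)
  if (turns.length > 0) {
    // New format found: the fallback is never consulted, so nothing is
    // double-counted.
    return turns.map(replyText => ({ replyText, granularity: 'turn' as const }))
  }
  if (!raw.includes('AgentRound')) return []
  const rounds = scanLegacyRounds(raw).filter(r => r.length > 0)
  if (rounds.length === 0) return []
  // Old format: one session-level call with all rounds' replies joined.
  return [{ replyText: rounds.join('\n'), granularity: 'session' }]
}
```

Because estimated cost is driven by total reply length, joining the rounds keeps the totals the same and changes only the call count.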

(Store dirs may also contain a legacy `00000000000.xd` Xodus log from older
plugin versions. On every installation observed it is either empty or shadowed
by the `.db`, so CodeBurn reads only the `.db`. If a real `.xd`-only session ever
surfaces, add a reader with a captured fixture.)

- **No token accounting.** No store records token counts. Output tokens are
**estimated** from the reply text via `estimateTokens` (`CHARS_PER_TOKEN = 4`,
as for Cursor and legacy Copilot JSONL); input tokens are 0; every JetBrains
call is marked `costIsEstimated: true`.
- **Errored turns.** A failed generation ("Sorry, an error occurred …") is stored
as an assistant blob with an error status and no reply text; it is detected and
billed **$0** (not conflated with an empty success). In agent mode a failed turn
has an empty `AgentRound` reply — the parser does not fall back to the prompt
`Markdown`, so the user's words are never billed as the assistant's output.
- **Per-turn model.** The model varies per turn within one `.db`. It is recovered
from inside the assistant blob when present, else a store-wide default, else a
generic Copilot bucket. Dotted Claude names are normalised to canonical ids
(`claude-opus-4.5` → `claude-opus-4-5`); GPT/Gemini names kept verbatim.
- **Duplicates.** The store keeps several byte-copies of each reply (original,
lowercased, revisions); assistant turns are de-duplicated by reply content.
- **One `.db` holds many chat tabs.** A single store `.db` contains multiple
conversations, each with an internal GUID and an evolving title
(`New Agent Session` → auto-name → final title). CodeBurn recovers the
`GUID → title` map (`extractJetBrainsConversations`, keeping the latest
non-default title), attributes each turn to the nearest preceding conversation
GUID, and emits **one session per conversation** (not one per `.db`). Reply
content is de-duplicated per conversation.
- **Project.** Resolved in three tiers, most authoritative first:
  1. **`projectName` field (plugin 1.12+).** Recent plugins serialize the repo
     label directly on the session doc (`extractJetBrainsProjectName`); this is
     the JetBrains analogue of the OTel source's `github.copilot.git.repository`.
     **Cross-kind join:** the billable turns live in `chat-agent-sessions`, but
     the `projectName` is usually written only into the sibling
     `chat-sessions` / `chat-edit-sessions` store. Discovery
     (`resolveJetBrainsProjectNames`) joins them by **store id** so the agent
     session inherits the label from whichever store recorded it. The value is
     read length-prefixed (Java `TC_STRING`) so an embedded quote/newline can't
     truncate it.
  2. **`.git` walk-up (older plugins / no `projectName`).** For each `file://`
     URI a chat referenced, walk UP the real filesystem to the nearest ancestor
     containing a `.git` and use that repo's basename (e.g. `pinot`).
  3. **`copilot-jetbrains`** bucket when neither signal exists (chat referenced
     no files and no `projectName` was recorded, or the repo no longer exists on
     disk).

  The conversation **title** is a chat-thread name, NOT a project: it is the
  session label (`userMessage`) and deliberately kept out of `project` so it does
  not pollute the By-Project view. Note that `bg-agent-sessions/` (a newer kind
  dir holding `copilot-agent-snapshots.db` / `copilot-session-metadata.db`) is
  **not** scanned: those DBs carry file snapshots and metadata, not billable
  turns, and the same session's turns are already read from
  `chat-agent-sessions`.
- **Override the root** with `CODEBURN_COPILOT_JETBRAINS_DIR`.
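
A few of the rules in the list above are small enough to sketch directly. `estimateTokens` and `CHARS_PER_TOKEN` are named in the source, but the rounding shown here is an assumption. `normalizeModel`, `projectFromPath`, `resolveProject`, and the injected `hasGitDir` predicate are hypothetical names. The path handling assumes POSIX-style paths, and taking the first referenced file that resolves is also an assumption.

```typescript
const CHARS_PER_TOKEN = 4
const FALLBACK_PROJECT = 'copilot-jetbrains'

// Rounding up is an assumption; the source only gives the 4-chars-per-token ratio.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Dotted Claude names become canonical ids; GPT/Gemini names are kept verbatim.
function normalizeModel(name: string): string {
  return name.startsWith('claude-') ? name.replace(/\./g, '-') : name
}

// Tier 2: walk up from a referenced file to the nearest ancestor that contains
// `.git`. `hasGitDir` is injected so the sketch does not touch the filesystem.
function projectFromPath(uri: string, hasGitDir: (dir: string) => boolean): string | null {
  const parts = uri.replace(/^file:\/\//, '').split('/').filter(p => p.length > 0)
  for (let n = parts.length - 1; n > 0; n--) {
    const dir = '/' + parts.slice(0, n).join('/')
    if (hasGitDir(dir)) return parts[n - 1] ?? null
  }
  return null
}

// Tiers in order: recorded projectName, then .git walk-up, then the bucket.
function resolveProject(
  projectName: string | null,
  referencedFiles: string[],
  hasGitDir: (dir: string) => boolean
): string {
  if (projectName) return projectName
  for (const uri of referencedFiles) {
    const repo = projectFromPath(uri, hasGitDir)
    if (repo) return repo
  }
  return FALLBACK_PROJECT
}
```

An empty reply estimates to zero output tokens, which is consistent with errored and tool-only turns being billed $0.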

## Caching

None for the JSONL sources. The OTel source uses a durable cache (see above).

## Deduplication

Legacy JSONL and transcript sessions dedupe per `messageId`. Core chat sessions dedupe per `copilot-chatsession:<sessionId>:<requestId>`, and are not discovered when an OTel source is present.
Legacy JSONL and transcript sessions dedupe per `messageId`. Core chat sessions dedupe per `copilot-chatsession:<sessionId>:<requestId>`, and are not discovered when an OTel source is present. JetBrains `.db` turns dedupe per `copilot:jb:<conversationId>:<turnIndex>` (a per-conversation index, plus reply-content dedup within each conversation). The JetBrains stores live in locations disjoint from those of the VS Code / CLI sources.
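
The JetBrains key scheme can be sketched as below. The `JbTurn` shape and the function name `dedupeKeys` are hypothetical. Starting the index at 0 and treating case-insensitive matches as copies (based on the lowercased duplicates the store is described as keeping) are assumptions.

```typescript
interface JbTurn {
  conversationId: string
  reply: string
}

// One key per surviving turn, after dropping repeated replies within the same
// conversation. The index counts surviving turns per conversation.
function dedupeKeys(turns: JbTurn[]): string[] {
  const seenReplies = new Map<string, Set<string>>()
  const nextIndex = new Map<string, number>()
  const keys: string[] = []
  for (const turn of turns) {
    const replies = seenReplies.get(turn.conversationId) ?? new Set<string>()
    seenReplies.set(turn.conversationId, replies)
    const fingerprint = turn.reply.toLowerCase() // assumption: fold case copies
    if (replies.has(fingerprint)) continue
    replies.add(fingerprint)
    const index = nextIndex.get(turn.conversationId) ?? 0
    nextIndex.set(turn.conversationId, index + 1)
    keys.push(`copilot:jb:${turn.conversationId}:${index}`)
  }
  return keys
}
```

Scoping the reply set to a conversation means the same text in two different chat tabs is still counted once per tab.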

If a workspace hash contains at least one `chatSessions/*.jsonl` file, the provider skips that hash's legacy `GitHub.copilot-chat/transcripts/` directory. The core chat session journal is the modern token-bearing source for the same conversations, so reading both would inflate call counts.

7 changes: 5 additions & 2 deletions src/fs-utils.ts
@@ -29,7 +29,10 @@ function notice(msg: string): void {
   process.stderr.write(`codeburn: ${msg}\n`)
 }

-export async function readSessionFile(filePath: string): Promise<string | null> {
+export async function readSessionFile(
+  filePath: string,
+  encoding: BufferEncoding = 'utf-8'
+): Promise<string | null> {
   let size: number
   try {
     size = (await stat(filePath)).size
@@ -44,7 +47,7 @@ export async function readSessionFile(filePath: string): Promise<string | null>
   }

   try {
-    return await readFile(filePath, 'utf-8')
+    return await readFile(filePath, encoding)
   } catch (err) {
     warn(`read failed for ${filePath}: ${(err as NodeJS.ErrnoException).code ?? 'unknown'}`)
     return null