feat(copilot): track GitHub Copilot JetBrains IDE usage #608

Merged: iamtoruk merged 5 commits into getagentseal:main from NihalJain:feat/jetbrains-copilot-tracking on Jul 5, 2026

NihalJain commented Jul 3, 2026
Contributor

What & why

The JetBrains Copilot plugin (IntelliJ, PyCharm, RubyMine, …) stores its chat/agent sessions under ~/.config/github-copilot/<ide>/<kind>/<storeId>/ — a location none of the existing Copilot sources (CLI JSONL, VS Code chat sessions/transcripts, OTel SQLite) read. As a result all JetBrains Copilot usage was silently uncounted in every CodeBurn report. This adds a reader for that store so those sessions are discovered, priced, and attributed to the right project.

Fixes #211

How it works

  • Reader. The store's session content is a Nitrite .db — an H2 MVStore of Java-serialized documents. It is scanned as latin1 for byte-offset stability: no Java deserializer, no new dependency, and it is not SQLite so node:sqlite is not involved.
  • Reply text. Assistant replies live in nested-escaped {"__first__":{"type":"Subgraph"…}} blobs. The text is recovered by unescaping one level at a time and, at the depth where the Markdown record's data field is a well-formed one-level-escaped JSON document, reading it structurally — so a reply containing its own quotes is never truncated or duplicated (which would otherwise inflate the estimate).
  • Tokens/cost. The store records no token counts, so output tokens are estimated from the reply text (CHARS_PER_TOKEN = 4, re-decoded latin1→utf8 so multibyte replies count by codepoint) and every call is marked costIsEstimated. Failed generations (error status, no reply) are billed $0.
  • Sessions. One .db holds many chat tabs; turns are grouped back to their conversation GUID so the UI shows one session per tab, deduped by reply content per conversation.
  • Project attribution, most authoritative first:
    1. the plugin-recorded projectName field (JetBrains Copilot 1.12+), joined across kind dirs by store id — the billable turns live in chat-agent-sessions, but the label is usually written into the sibling chat-sessions/chat-edit-sessions store. Read length-delimited and re-decoded latin1→utf8 so non-ASCII repo names round-trip.
    2. the .git repo root of a referenced file:// path.
    3. a generic copilot-jetbrains bucket when neither signal exists. The conversation title is a chat-thread name, not a project, so it is kept out of the project field and surfaced as the session label instead.
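
The latin1 scan and the token estimate from the list above can be sketched as follows. This is a minimal illustration, not the PR's code: only `CHARS_PER_TOKEN = 4` comes from the description, while the function names and the round-up rule are assumptions made for the example.

```typescript
// Sketch only. CHARS_PER_TOKEN is from the PR description; every function
// name here is hypothetical, and rounding up is an assumption.
const CHARS_PER_TOKEN = 4;

/** Decode raw store bytes as latin1: one byte becomes one char, so string indices equal byte offsets. */
function scanLatin1(raw: Buffer): string {
  return raw.toString("latin1");
}

/** Turn a latin1-scanned slice back into the UTF-8 text that was originally written. */
function redecodeUtf8(latin1Slice: string): string {
  return Buffer.from(latin1Slice, "latin1").toString("utf8");
}

/** Estimate output tokens from a latin1-scanned reply, counting code points rather than bytes. */
function estimateOutputTokens(latin1Reply: string): number {
  const text = redecodeUtf8(latin1Reply);
  const codepoints = Array.from(text).length; // Array.from iterates by code point
  return Math.ceil(codepoints / CHARS_PER_TOKEN);
}
```

Without the re-decode step, a reply made of three-byte characters would be counted at three times its length, which is the inflation the description is guarding against.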

Override the JetBrains github-copilot root with
CODEBURN_COPILOT_JETBRAINS_DIR.
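
A rough sketch of how the root override and the three-tier project attribution might fit together. The environment variable name, the default `~/.config/github-copilot` location, and the `copilot-jetbrains` fallback come from the description above; the function names, the `ProjectSignals` shape, and the use of the repo root's base name as the label are assumptions for illustration.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: prefer the env override, else the default config root.
function resolveJetBrainsRoot(
  env: Record<string, string | undefined> = process.env,
): string {
  const override = env.CODEBURN_COPILOT_JETBRAINS_DIR;
  if (override !== undefined && override.trim() !== "") return override;
  return path.join(os.homedir(), ".config", "github-copilot");
}

// Hypothetical shape for the signals gathered while parsing one store.
interface ProjectSignals {
  projectName?: string; // plugin-recorded label, joined across kind dirs
  gitRepoRoot?: string; // .git root found from a referenced file:// path
}

// Most authoritative signal first; the generic bucket is the last resort.
function resolveProject(signals: ProjectSignals): string {
  if (signals.projectName) return signals.projectName;
  // Assumption: the repo directory's base name serves as the project label.
  if (signals.gitRepoRoot) return path.basename(signals.gitRepoRoot);
  return "copilot-jetbrains";
}
```

Passing the environment in as a parameter keeps the lookup easy to exercise against a fixture root without touching the real config directory.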

Docs

  • docs/providers/copilot.md — full JetBrains section (store layout, latin1 scan, reply extraction, projectName precedence + cross-kind join).
  • docs/providers/README.md — Copilot storage updated to note the Nitrite .db.

How to verify

  • npm test -- copilot and npx tsc --noEmit (fixtures reproduce the real nested-escaped .db framing, including quote- and multibyte-bearing replies).
  • End to end against a real install: CODEBURN_CACHE_DIR=$(mktemp -d) node dist/cli.js status --provider copilot --period all --format menubar-json. JetBrains sessions then appear By-Project under their real repo names.
  • Set CODEBURN_COPILOT_JETBRAINS_DIR to a fixture root to parse a controlled store without touching the real config dir.

Testing

  • I have tested this locally against real data (not just unit tests)
  • npm test passes
  • npm run build succeeds

For new providers only:

  • I installed the tool and generated real sessions by using it
  • npm run dev -- today shows correct costs and session counts for this provider
  • npm run dev -- models --provider <name> shows correct model names and pricing
  • Screenshot or terminal output attached below proving it works with real data

NihalJain added 2 commits July 3, 2026 16:11
…trix

The README "Data location" support matrix listed GitHub Copilot as only the
legacy CLI and VS Code transcript sources. Update the row to reflect all
sources the provider actually reads — the OpenTelemetry `agent-traces.db`
(preferred when present) and the JetBrains IDE Nitrite `.db` — and how the
project is resolved. Links to docs/providers/copilot.md for the full detail.
NihalJain force-pushed the feat/jetbrains-copilot-tracking branch from 030c51f to ccc9deb on July 3, 2026 10:54
JetBrains Copilot has two turn shapes in the Nitrite .db:

- ask mode — the reply is a `Markdown` record's `text`;
- agent / plan mode (e.g. PyCharm agent sessions, `/plan …`) — the reply is the
  `reply` field of an `AgentRound` record, and the `Markdown` record instead
  holds the USER's prompt.

extractResponseText only read Markdown, so agent-mode turns yielded no reply
text: they were discovered (session/turn counts showed up) but priced at $0
because output tokens came out zero. On this machine that silently
under-counted a PyCharm session ($0 → $0.35) and several IntelliJ agent turns.

Determine the mode by the PRESENCE of an `AgentRound` record and read only that
record's `reply` (collecting every non-empty round in a multi-round blob).
Crucially, an agent blob whose reply is empty — a failed turn or a pure
tool-call round — does NOT fall back to the Markdown record, so a user prompt
is never mistaken for the assistant's output; such turns bill $0 as before.
Ask-mode blobs (no AgentRound) keep reading Markdown. Plan mode's sidecar
records — Thinking, PendingChanges (proposed diff, under `content`), AskQuestion,
Notification, SubTurn, and file-read `text` results — are never read as output.
Verified across all local stores: the two reply shapes never coexist in one
blob, so the split is unambiguous.

Tests: agent-mode reply extraction (ignoring the prompt Markdown), pure
tool-call rounds → $0, multi-round collection, and a failed agent turn → $0.
docs/providers/copilot.md documents both turn shapes and the ignored sidecar
records.
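
The mode split described in this commit message could be sketched like this. It assumes the records of one blob have already been unescaped and parsed into plain objects; the `TurnRecord` shape and the function name are hypothetical, and only the record type names and the `reply`/`text` fields come from the message above.

```typescript
// Hypothetical, already-parsed view of the records inside one turn blob.
interface TurnRecord {
  type: string;   // e.g. "AgentRound", "Markdown", "Thinking"
  reply?: string; // set on AgentRound records
  text?: string;  // set on Markdown records (and on file-read results)
}

/**
 * Pick the assistant's output for one blob.
 * The PRESENCE of an AgentRound record selects agent/plan mode, even when
 * every reply is empty, so a Markdown record holding the user's prompt is
 * never read as output.
 */
function extractReplies(records: TurnRecord[]): string[] {
  const rounds = records.filter((r) => r.type === "AgentRound");
  if (rounds.length > 0) {
    // Agent / plan mode: collect every non-empty round, no Markdown fallback.
    return rounds.map((r) => r.reply ?? "").filter((s) => s.length > 0);
  }
  // Ask mode: the reply is the Markdown record's text.
  return records
    .filter((r) => r.type === "Markdown")
    .map((r) => r.text ?? "")
    .filter((s) => s.length > 0);
}
```

An empty result corresponds to the $0 case: a failed turn or a pure tool-call round yields no reply text and therefore zero estimated output tokens.
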
NihalJain marked this pull request as draft on July 3, 2026 12:21
…≤1.5.x)

JetBrains Copilot plugin ≤1.5.x (e.g. 1.5.59-243) stores all session turns
inside ONE large binary-framed outer Nitrite document, rather than the
per-turn {"__first__":{"type":"Subgraph",...}} blobs introduced in later
plugins (≥1.12.x, e.g. 1.12.1-251).

In the old format each assistant turn is a UUID-keyed Value entry whose
value field contains a JSON-string-escaped AgentRound record:

  {"<uuid>":{"type":"Value","value":"{\"type\":\"AgentRound\",
    \"data\":\"{...reply...}\"}"}, ...}

The extractResponseText depth-unescape loop already handles this one extra
level of escaping; the only gap was that extractJetBrainsDbTurns never fed
it the outer document — it only scanned for __first__/Subgraph blobs, which
the old plugin never writes.

Add a fallback that activates when the Subgraph scan produces zero turns but
'AgentRound' text is present in the raw file (old-format signal). It locates
the binary-framed outer document (UUID-keyed Value entry, hex matched
case-insensitively so an uppercase UUID does not fall through to $0), extracts
it with matchJsonObject, and passes it to extractResponseText. Because the outer
document holds every turn in one blob, this emits ONE session-level call per
document (all rounds' replies joined): cost/tokens are correct, only the
per-turn call-count granularity is coarser — an accepted tradeoff for legacy
data. MVStore keeps two identical collection copies; seenReplies dedupes them.

The fallback is guarded by turns.length === 0 so new-format sessions (whose
Subgraph scan succeeds) are completely unaffected and never double-counted.

Tests: old-format doc with multiple AgentRound rounds → 1 call whose token
count equals the two non-empty replies joined (the empty tool-call round is
excluded); an uppercase-UUID variant (fails without the case-insensitive
match); and a guard that new-format Subgraph turns are not double-counted.
docs/providers/copilot.md documents the old format and the one-call-per-session
limitation.
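
A simplified sketch of the legacy fallback, under stated assumptions: the outer document has already been cut out of the binary framing and is valid JSON, and the inner `data` payload is itself a JSON string carrying a `reply` field. The commit message elides the real payload, so that layout, the function names, and the newline join are all illustrative rather than taken from the PR.

```typescript
// Case-insensitive so an uppercase UUID key is still recognised.
const UUID_KEY = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

/** Unescape one level: parse a JSON string and return it only if it is an object. */
function parseLevel(escaped: string): Record<string, unknown> | undefined {
  try {
    const parsed: unknown = JSON.parse(escaped);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

function isValueEntry(x: unknown): x is { type: "Value"; value: string } {
  if (typeof x !== "object" || x === null) return false;
  const rec = x as Record<string, unknown>;
  return rec.type === "Value" && typeof rec.value === "string";
}

/** Collect non-empty AgentRound replies from UUID-keyed Value entries. */
function collectLegacyReplies(outerDocJson: string): string[] {
  const outer = parseLevel(outerDocJson) ?? {};
  const replies: string[] = [];
  for (const [key, entry] of Object.entries(outer)) {
    if (!UUID_KEY.test(key) || !isValueEntry(entry)) continue;
    const record = parseLevel(entry.value); // first unescape
    if (!record || record.type !== "AgentRound") continue;
    const dataRaw = record.data;
    if (typeof dataRaw !== "string") continue;
    const reply = parseLevel(dataRaw)?.reply; // second unescape (assumed layout)
    if (typeof reply === "string" && reply.length > 0) replies.push(reply);
  }
  return replies;
}

/** One session-level call per document, and only when the new-format scan found nothing. */
function legacySessionReply(outerDocJson: string, newFormatTurns: number): string | null {
  if (newFormatTurns > 0) return null; // guard against double-counting
  const replies = collectLegacyReplies(outerDocJson);
  return replies.length > 0 ? replies.join("\n") : null;
}
```

The guard on `newFormatTurns` mirrors the `turns.length === 0` condition: a store whose Subgraph scan succeeded never reaches the legacy path.
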
NihalJain force-pushed the feat/jetbrains-copilot-tracking branch from ee66693 to cd07707 on July 3, 2026 12:56
@NihalJain
Contributor Author

Screenshot (before), with npx codeburn@0.9.15:
[image: Screenshot 2026-07-03 at 6 33 13 PM]

Screenshot (after), with this patch:
[image: Screenshot 2026-07-03 at 6 33 54 PM]

NihalJain marked this pull request as ready for review on July 3, 2026 13:04
@NihalJain
Contributor Author

@iamtoruk

…nv var in tests

Maintainer follow-up:

- Derive JetBrains dedup keys from the reply content (sha256 prefix plus a
  per-hash occurrence counter) instead of the blob's scan position. Copilot
  is a durable provider: cached turns are never deleted and a re-parse
  appends any unseen key, while MVStore compaction can rewrite the store
  with blobs in a different byte order. With positional keys, a rewrite
  that moves a new blob ahead of an old one hands the new turn the old
  key (skipped as seen) and re-emits the old turn under a fresh index,
  double-billing it. Covered by a regression test that fails on the
  positional scheme.
- Add CODEBURN_COPILOT_JETBRAINS_DIR to the env-isolation cleared list so
  a developer's real JetBrains store never bleeds into fixture tests.
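
A minimal sketch of content-derived dedup keys, assuming a 16-character sha256 prefix and a `conversation:hash:occurrence` key layout. The commit message only specifies "sha256 prefix plus a per-hash occurrence counter", so the prefix length, the key format, and the function name are illustrative.

```typescript
import { createHash } from "node:crypto";

/**
 * Derive one key per reply from its content instead of its scan position.
 * The occurrence counter keeps identical replies (including empty replies
 * from errored turns) distinct within a conversation.
 */
function contentDedupKeys(conversationId: string, replies: string[]): string[] {
  const occurrences = new Map<string, number>();
  return replies.map((reply) => {
    const hash = createHash("sha256")
      .update(reply, "utf8")
      .digest("hex")
      .slice(0, 16); // prefix length is an assumption
    const n = occurrences.get(hash) ?? 0;
    occurrences.set(hash, n + 1);
    return `${conversationId}:${hash}:${n}`;
  });
}
```

Because the key depends only on the reply text and how many times that text has occurred, a compaction that moves a new blob ahead of an old one leaves the old turn's key unchanged, so a re-parse neither skips the new turn nor re-emits the old one.
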

iamtoruk commented Jul 5, 2026
Member

This is excellent work. The structural read of the escaped blobs, the agent-mode handling that never bills the user's prompt as output, the errored-turn $0 path, and the cross-kind projectName join are all the right calls, and the docs are genuinely useful. Verified locally: full suite and typecheck pass on this branch, and again after merging current main into it.

I pushed one maintainer commit (d2edf8c) with two fixes so this can merge now:

  1. Content-derived dedup keys. The keys were positional (conversation + scan index). Copilot is a durable provider: cached turns are never deleted, and a re-parse appends any key it has not seen. MVStore compaction can rewrite the store with blobs in a different byte order, and a rewrite that moves a new blob ahead of an old one hands the new turn the old turn's key (skipped as already seen) and re-emits the old turn under a fresh index, double-billing it while the new turn goes uncounted. The keys now derive from the reply content (sha256 prefix plus a per-hash occurrence counter, which also keeps errored turns with empty replies distinct). There is a regression test that reproduces the compaction scenario and fails on the positional scheme.

  2. Test isolation for the new env var. CODEBURN_COPILOT_JETBRAINS_DIR is now in the env-isolation cleared list, so a developer who has it set in their shell cannot leak their real JetBrains store into the pre-existing copilot fixture tests.

Follow-ups worth their own issue, none blocking:

  • Discovery re-reads every .db in full on each scan for the projectName join. This is worth memoizing by path and mtime, since the menubar rescans often.
  • The conversation-title regex only matches printable ASCII, so non-Latin titles fall out of session grouping.
  • The model token list will need topping up as Copilot adds models (it currently lacks a few recent ones).
  • The git walk-up never resolves Windows file URIs, though the projectName tier covers Windows users on current plugins.

Happy to file that issue with details if you want to pick any of them up.

One accepted limitation worth knowing about: the store has no per-turn timestamps, so turns are stamped with the file mtime, and the first import of a long-lived active store lands its history on one day. The durable cache then freezes each turn's first-seen stamp, so it self-corrects going forward.

Thanks for closing the oldest gap in the provider matrix. Merging once CI is green.

iamtoruk left a comment
Member

Verified locally: full suite and typecheck on the branch head, again after merging current main, and the new compaction regression test fails on the pre-fix positional keys and passes on the content-derived ones.

iamtoruk merged commit d41ca11 into getagentseal:main on Jul 5, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Add token usage for copilot in Intellij IDEA