feat(session): persist boot context across reconnect and switch#360
Conversation
Boot context pills disappeared on iOS reconnect and session switch because the boot_context WS message was only sent once (fire-and-forget) and never persisted. This adds a three-layer persistence strategy: - ManagedSession in-memory cache (hot path — instant replay on reconnect) - SQLite sessions table agent_name + boot_context columns (cold path) - Re-send on handleReconnect and handleSwitchSession Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimakis
left a comment
There was a problem hiding this comment.
Centaur Review
Found 7 issue(s) (5 warning).
server/ws-handler-v2.ts
Solid implementation of boot_context persistence with correct serialization paths. Main concern: replayed boot_context messages lack sessionId, which breaks the multiplexed protocol contract and can cause cross-session bleed during multi-session reconnect. Missing test coverage for the new replay behaviors.
- 🟡 bugs (L283): Reconnect boot_context replay lacks
sessionId. The reconnect handler loops over all tracked sessions and sends boot_context for every running session. These messages have nosessionId, so the client's event filter treats them as global and applies each to the current messages state (SET_BOOT_CONTEXT overwrites). If a client tracks multiple running sessions, a non-active session's boot_context will overwrite the active session's. Fix: addsessionId: entry.sessionIdto the replayed boot_context message, matching how other per-session events are tagged.[fixable] - 🟡 bugs (L386): Same sessionId omission on switch_session boot_context replay. Both the hot path (in-memory) and cold path (JSON.parse) send boot_context without a
sessionId. This works today because switchSession resets messages state before the reply arrives, but it's inconsistent with the multiplexed protocol where all session-scoped events should carrysessionIdfor client demuxing.[fixable] - 🟡 unsafe_assumptions (L396): Cold-path JSON.parse has no shape validation.
sessionMeta.bootContextis parsed and spread directly into the WS message. If the stored JSON was written by an older code version with a different shape, or is corrupted, the client receives an unvalidated payload. The protocol-parser on the client side is defensive (checks each field type), so this isn't exploitable, but a basic typeof check onparsed(must be a non-null object) would prevent spreading a string or array into the message.[fixable] - 🔵 style (L401): Silent catch on JSON.parse — project conventions use
log.warn()for recoverable error paths. Adding a warn log with sessionId and the error message would help diagnose boot_context replay failures.[fixable]
server/__tests__/ws-handler-v2.test.ts
Solid implementation of boot_context persistence with correct serialization paths. Main concern: replayed boot_context messages lack sessionId, which breaks the multiplexed protocol contract and can cause cross-session bleed during multi-session reconnect. Missing test coverage for the new replay behaviors.
- 🟡 missing_tests: No test coverage for boot_context replay in handleReconnect or handleSwitchSession. Both hot path (in-memory ManagedSession.bootContext) and cold path (EventStore JSON parse) are untested. Given the project's TDD mandate, these behaviors should have tests: (1) reconnect sends cached boot_context for running sessions, (2) switch_session sends boot_context from registry (hot) or EventStore (cold), (3) switch_session handles invalid JSON gracefully.
[fixable]
server/__tests__/event-store.test.ts
Solid implementation of boot_context persistence with correct serialization paths. Main concern: replayed boot_context messages lack sessionId, which breaks the multiplexed protocol contract and can cause cross-session bleed during multi-session reconnect. Missing test coverage for the new replay behaviors.
- 🟡 missing_tests: No test for agentName and bootContext persistence round-trip in EventStore. The existing upsertSession tests cover goalId, wtId, branch, cwd, etc. — the new fields should follow suit with insert, update, and retrieval assertions.
[fixable]
packages/protocol/src/event-store.ts
Solid implementation of boot_context persistence with correct serialization paths. Main concern: replayed boot_context messages lack sessionId, which breaks the multiplexed protocol contract and can cause cross-session bleed during multi-session reconnect. Missing test coverage for the new replay behaviors.
- 🔵 style (L287): Migration method
migrateBootContextalso adds theagent_namecolumn. Consider renaming tomigrateBootContextAndAgentNameormigrateSessionAgentFieldsto accurately reflect its scope.[fixable]
| running, | ||
| }); | ||
|
|
||
| // Re-send cached boot_context so pills reappear after reconnect |
There was a problem hiding this comment.
🟡 bugs: Reconnect boot_context replay lacks sessionId. The reconnect handler loops over all tracked sessions and sends boot_context for every running session. These messages have no sessionId, so the client's event filter treats them as global and applies each to the current messages state (SET_BOOT_CONTEXT overwrites). If a client tracks multiple running sessions, a non-active session's boot_context will overwrite the active session's. Fix: add sessionId: entry.sessionId to the replayed boot_context message, matching how other per-session events are tagged. [fixable]
| // Hot path: running session in SessionRegistry (in-memory cache). | ||
| // Cold path: ended session — read serialized JSON from EventStore. | ||
| const found = ctx.sessionRegistry.findBySessionId(msg.sessionId); | ||
| if (found?.session?.bootContext) { |
There was a problem hiding this comment.
🟡 bugs: Same sessionId omission on switch_session boot_context replay. Both the hot path (in-memory) and cold path (JSON.parse) send boot_context without a sessionId. This works today because switchSession resets messages state before the reply arrives, but it's inconsistent with the multiplexed protocol where all session-scoped events should carry sessionId for client demuxing. [fixable]
| const parsed = JSON.parse(sessionMeta.bootContext); | ||
| ctx.connRegistry.get(connectionId)?.transport.send({ | ||
| type: 'boot_context', | ||
| ...parsed, |
There was a problem hiding this comment.
🟡 unsafe_assumptions: Cold-path JSON.parse has no shape validation. sessionMeta.bootContext is parsed and spread directly into the WS message. If the stored JSON was written by an older code version with a different shape, or is corrupted, the client receives an unvalidated payload. The protocol-parser on the client side is defensive (checks each field type), so this isn't exploitable, but a basic typeof check on parsed (must be a non-null object) would prevent spreading a string or array into the message. [fixable]
| } catch { | ||
| // Invalid JSON — skip | ||
| } | ||
| } |
There was a problem hiding this comment.
🔵 style: Silent catch on JSON.parse — project conventions use log.warn() for recoverable error paths. Adding a warn log with sessionId and the error message would help diagnose boot_context replay failures. [fixable]
| } | ||
| } | ||
|
|
||
| private migrateBootContext(db: Database.Database): void { |
There was a problem hiding this comment.
🔵 style: Migration method migrateBootContext also adds the agent_name column. Consider renaming to migrateBootContextAndAgentName or migrateSessionAgentFields to accurately reflect its scope. [fixable]
dimakis
left a comment
There was a problem hiding this comment.
Centaur Review
Found 7 issue(s) (4 warning).
server/ws-handler-v2.ts
Solid feature with correct serialization boundaries, but handleReconnect is missing the cold path for ended sessions (boot_context pills won't appear after reconnecting to a finished session), and there are no tests for the new reconnect/switch replay logic.
- 🟡 bugs (L284): handleReconnect only replays boot_context for running sessions (hot path). Non-running sessions have no cold path — unlike handleSwitchSession which falls back to EventStore's serialized bootContext. Since boot_context is sent via raw
send()(notsendOrBuffer), it is NOT stored in the EventStore event stream, so reconnecting to an ended session will never show boot context pills. Add anelsebranch that readsctx.eventStore.getSession(entry.sessionId)?.bootContextand parses it, mirroring the cold path in handleSwitchSession (lines 391–401).[fixable] - 🟡 unsafe_assumptions (L393): The cold path in handleSwitchSession does
JSON.parse(sessionMeta.bootContext)and spreads the result into a WS message with no shape validation. If the stored JSON has atypekey (which bootContext payloads don't, but could via corruption or future changes), it would overwrite thetype: 'boot_context'set on line 394. Consider parsing into a variable and deletingparsed.typebefore spreading, or validating the parsed shape.[fixable] - 🔵 style (L398): The
catchblock silently swallows JSON parse errors with no logging. Addlog.warn('failed to parse stored bootContext', { sessionId: msg.sessionId })for debuggability — corrupted bootContext would otherwise be invisible.[fixable] - 🟡 missing_tests: No tests cover boot_context replay on reconnect or session switch. Key scenarios to test: (1) handleReconnect re-sends bootContext from in-memory ManagedSession for running sessions, (2) handleSwitchSession re-sends from in-memory cache (hot path), (3) handleSwitchSession re-sends from EventStore JSON (cold path), (4) cold path gracefully handles invalid JSON. The existing test files for ws-handler-v2 and event-store have no bootContext coverage.
[fixable]
server/chat.ts
Solid feature with correct serialization boundaries, but handleReconnect is missing the cold path for ended sessions (boot_context pills won't appear after reconnecting to a finished session), and there are no tests for the new reconnect/switch replay logic.
- 🟡 bugs (L733): The resume upsert at line 724–734 persists
agentNamebut notbootContext. On resume, if the prior session had a cached bootContext in the EventStore, this upsert overwrites session metadata without preserving bootContext. The field won't be lost (SQL UPDATE only sets provided fields), but this is an asymmetry worth noting — the resume path reconstructs the boot context from scratch (the IIFE runs again), so the stale stored value will eventually be overwritten. No data loss, but if the boot context IIFE fails on resume, the old value persists, which is actually correct. No fix needed — noting for clarity. - 🔵 style (L784): The fallback boot_context object is duplicated three times (lines 784–792, 830–845, 921–928) with identical shape but verbose inline type annotations (
[] as Array<{ source: string; heading: string; tokens: number; content: string }>). Extract a helper likemakeEmptyBootContext(tokenBudget: number)to reduce duplication and the risk of shape drift between the three copies.[fixable]
packages/protocol/src/event-store.ts
Solid feature with correct serialization boundaries, but handleReconnect is missing the cold path for ended sessions (boot_context pills won't appear after reconnecting to a finished session), and there are no tests for the new reconnect/switch replay logic.
- 🔵 missing_tests: The migrateBootContext migration and the bootContext/agentName fields in upsertSession and rowToSession have no test coverage. A round-trip test (upsert with bootContext JSON string → getSession → verify deserialized fields) would prevent regressions in the persistence layer.
[fixable]
| }); | ||
|
|
||
| // Re-send cached boot_context so pills reappear after reconnect | ||
| if (found && running && found.session?.bootContext) { |
There was a problem hiding this comment.
🟡 bugs: handleReconnect only replays boot_context for running sessions (hot path). Non-running sessions have no cold path — unlike handleSwitchSession which falls back to EventStore's serialized bootContext. Since boot_context is sent via raw send() (not sendOrBuffer), it is NOT stored in the EventStore event stream, so reconnecting to an ended session will never show boot context pills. Add an else branch that reads ctx.eventStore.getSession(entry.sessionId)?.bootContext and parses it, mirroring the cold path in handleSwitchSession (lines 391–401). [fixable]
| }); | ||
| } else if (sessionMeta.bootContext) { | ||
| try { | ||
| const parsed = JSON.parse(sessionMeta.bootContext); |
There was a problem hiding this comment.
🟡 unsafe_assumptions: The cold path in handleSwitchSession does JSON.parse(sessionMeta.bootContext) and spreads the result into a WS message with no shape validation. If the stored JSON has a type key (which bootContext payloads don't, but could via corruption or future changes), it would overwrite the type: 'boot_context' set on line 394. Consider parsing into a variable and deleting parsed.type before spreading, or validating the parsed shape. [fixable]
| type: 'boot_context', | ||
| ...parsed, | ||
| }); | ||
| } catch { |
There was a problem hiding this comment.
🔵 style: The catch block silently swallows JSON parse errors with no logging. Add log.warn('failed to parse stored bootContext', { sessionId: msg.sessionId }) for debuggability — corrupted bootContext would otherwise be invisible. [fixable]
| ...(worktreePath ? { wtId } : {}), | ||
| ...(options.telosTaskId ? { telosTaskId: options.telosTaskId } : {}), | ||
| ...(existingMeta ? { updatedAt: existingMeta.updatedAt } : {}), | ||
| agentName, |
There was a problem hiding this comment.
🟡 bugs: The resume upsert at line 724–734 persists agentName but not bootContext. On resume, if the prior session had a cached bootContext in the EventStore, this upsert overwrites session metadata without preserving bootContext. The field won't be lost (SQL UPDATE only sets provided fields), but this is an asymmetry worth noting — the resume path reconstructs the boot context from scratch (the IIFE runs again), so the stale stored value will eventually be overwritten. No data loss, but if the boot context IIFE fails on resume, the old value persists, which is actually correct. No fix needed — noting for clarity.
| send(transport, { | ||
| type: 'boot_context', | ||
| source: 'local-fallback', | ||
| const fallback = { |
There was a problem hiding this comment.
🔵 style: The fallback boot_context object is duplicated three times (lines 784–792, 830–845, 921–928) with identical shape but verbose inline type annotations ([] as Array<{ source: string; heading: string; tokens: number; content: string }>). Extract a helper like makeEmptyBootContext(tokenBudget: number) to reduce duplication and the risk of shape drift between the three copies. [fixable]
dimakis
left a comment
There was a problem hiding this comment.
Centaur Review
Found 6 issue(s) (3 warning).
server/ws-handler-v2.ts
Solid feature with correct serialization flow, but handleReconnect is missing the cold-path fallback for ended sessions (asymmetric with handleSwitchSession), and no test coverage was added for the new persistence or replay logic.
- 🟡 bugs (L284): handleReconnect only replays boot_context from in-memory SessionRegistry (hot path) when
runningis true, but does NOT fall back to EventStore for ended sessions. In contrast, handleSwitchSession (line 391) correctly implements a cold path fallback viasessionMeta.bootContext. A user reconnecting after a session has ended will not see boot_context pills, even though the data is persisted in the EventStore. The fix would be to add anelsebranch that reads fromctx.eventStore.getSession(entry.sessionId)?.bootContextand parses it, mirroring the switch_session logic.[fixable] - 🔵 unsafe_assumptions (L393): JSON.parse of
sessionMeta.bootContextcould be exploited if the database value is tampered with or corrupted, resulting in arbitrary object shapes being spread into a WebSocket message. Consider validating the parsed shape (e.g., checking for expected keys likesource,sourceCount) before sending to the client, or at minimum using a Zod schema. Low risk since the DB is local/trusted, but worth noting for defense-in-depth.[fixable]
server/__tests__/ws-handler-v2.test.ts
Solid feature with correct serialization flow, but handleReconnect is missing the cold-path fallback for ended sessions (asymmetric with handleSwitchSession), and no test coverage was added for the new persistence or replay logic.
- 🟡 missing_tests: No tests cover boot_context replay on reconnect or session switch. The existing ws-handler-v2 test suite has no assertions for bootContext. Key scenarios to cover: (1) reconnect with running session replays boot_context, (2) switch to running session uses hot path, (3) switch to ended session uses cold path (JSON.parse from EventStore), (4) malformed JSON in EventStore is gracefully skipped.
[fixable]
packages/protocol/__tests__/event-store.test.ts
Solid feature with correct serialization flow, but handleReconnect is missing the cold-path fallback for ended sessions (asymmetric with handleSwitchSession), and no test coverage was added for the new persistence or replay logic.
- 🟡 missing_tests: No tests cover the new
agent_nameandboot_contextcolumns — neither the migration (migrateBootContext) nor the upsert/read round-trip for these fields. The existing event-store test suite has no bootContext assertions.[fixable]
server/chat.ts
Solid feature with correct serialization flow, but handleReconnect is missing the cold-path fallback for ended sessions (asymmetric with handleSwitchSession), and no test coverage was added for the new persistence or replay logic.
- 🔵 style (L784): The fallback boot_context object is constructed identically in three places (import error at ~784, invalid compile result at ~828, and catch block at ~918), each with verbose inline type annotations like
[] as Array<{ source: string; heading: string; tokens: number; content: string }>. Extract a sharedmakeFallbackBootContext(tokenBudget: number)helper to eliminate the repetition and reduce the diff noise.[fixable] - 🔵 style (L849): The pattern
const s = registry.get(clientId); if (s) s.bootContext = fallbackPayload;is repeated three times in the boot context compilation block (lines ~795, ~849, ~929). This could be a single helper or moved after the try/catch to always set whichever payload was sent.[fixable]
| }); | ||
|
|
||
| // Re-send cached boot_context so pills reappear after reconnect | ||
| if (found && running && found.session?.bootContext) { |
There was a problem hiding this comment.
🟡 bugs: handleReconnect only replays boot_context from in-memory SessionRegistry (hot path) when running is true, but does NOT fall back to EventStore for ended sessions. In contrast, handleSwitchSession (line 391) correctly implements a cold path fallback via sessionMeta.bootContext. A user reconnecting after a session has ended will not see boot_context pills, even though the data is persisted in the EventStore. The fix would be to add an else branch that reads from ctx.eventStore.getSession(entry.sessionId)?.bootContext and parses it, mirroring the switch_session logic. [fixable]
| }); | ||
| } else if (sessionMeta.bootContext) { | ||
| try { | ||
| const parsed = JSON.parse(sessionMeta.bootContext); |
There was a problem hiding this comment.
🔵 unsafe_assumptions: JSON.parse of sessionMeta.bootContext could be exploited if the database value is tampered with or corrupted, resulting in arbitrary object shapes being spread into a WebSocket message. Consider validating the parsed shape (e.g., checking for expected keys like source, sourceCount) before sending to the client, or at minimum using a Zod schema. Low risk since the DB is local/trusted, but worth noting for defense-in-depth. [fixable]
| send(transport, { | ||
| type: 'boot_context', | ||
| source: 'local-fallback', | ||
| const fallback = { |
There was a problem hiding this comment.
🔵 style: The fallback boot_context object is constructed identically in three places (import error at ~784, invalid compile result at ~828, and catch block at ~918), each with verbose inline type annotations like [] as Array<{ source: string; heading: string; tokens: number; content: string }>. Extract a shared makeFallbackBootContext(tokenBudget: number) helper to eliminate the repetition and reduce the diff noise. [fixable]
| }; | ||
| send(transport, { type: 'boot_context', ...fallback }); | ||
| const s = registry.get(clientId); | ||
| if (s) s.bootContext = fallback; |
There was a problem hiding this comment.
🔵 style: The pattern const s = registry.get(clientId); if (s) s.bootContext = fallbackPayload; is repeated three times in the boot context compilation block (lines ~795, ~849, ~929). This could be a single helper or moved after the try/catch to always set whichever payload was sent. [fixable]
Summary
boot_contextWS message was fire-and-forget, never persistedagent_name+boot_contextcolumns (cold), re-send on reconnect/switchboot_contextmessagesTest plan
🤖 Generated with Claude Code