SOW-0005: opencode adapter (read-only live SQLite source)#30

Merged: ktsaou merged 13 commits into master from sow-0005-opencode-adapter, May 31, 2026

@ktsaou ktsaou commented May 31, 2026

Adds the opencode source adapter — the 5th and final adapter — projecting OpenCode's live, concurrently-written, multi-GB SQLite session store (~/.local/share/opencode/opencode.db) onto the canonical event model, strictly read-only, registered and auto-discovered by the ingester.

What it does

  • Read-only by construction. Own connection helper opens mode=ro + _pragma=query_only(true) + _txlock=deferred, allowlist DSN (discards all caller params), short BEGIN DEFERRED per delta page; warnings buffered and flushed after the read tx closes (no WAL-pin). No reachable write-path pragma / VACUUM / ATTACH / write Exec. Six write-probe tests + an allowlist-DSN test pin it.
  • Incremental tailing via a per-table watermark cursor (MaxIDSeen monotonic insert-detect + MaxTimeUpdatedMs/ID paging position), a poll loop (2s idle / 500ms active / 250ms post-WAL-fsnotify; gated MAX(time_updated) with a 60s safety net), and a boundary-bucket re-scan that closes the same-millisecond in-place-update class (property/stress-tested).
  • Row-tree → canonical mapper: session→turn→op synthesis, sub-agent linkage (parent_id + tool='task'→child-session topology), per-LLM-op token deltas from cumulative step-finish totals (clamped), multi-provider provider_alias, terminal status, PayloadRefs, schema-drift tolerance (PRAGMA table_info + dynamic SELECT across ~30 migrations), __drizzle_migrations schema hash + /api/health source probe.
  • Checkpoint-after-emit: the cursor is persisted only after an affected session's full tree is reloaded, mapped, and emitted (idempotent re-emission absorbed by the ingester).

Quality

  • 5 chunk commits (A–E) + 7 review-fix commits; 8 external-review rounds converged (3 reviewers in parallel each round, merge-ready).
  • Gates: golangci-lint 0, gosec 0, whole-module go test -race pass, opencode coverage 92.7%, FuzzDecode* clean, same-ms stress -count=5 -race clean, files ≤400 lines, secret + AI-attribution scans pass.
  • 9 golden scenarios (happy, sub-agent+task-child, multi-provider, schema-drift, cumulative-token, nested sub-agent, failed-tool, failed-assistant, file-attachment) + fuzz + restart/resume + boundary/idle integration tests.

Scope

Purely additive: new internal/adapters/opencode/ package + a blank-import + auto-discovery probe in cmd/ai-viewer-ingest/. No internal/canonical/ingest/store/presenter or sibling-adapter changes.

Deferred (filed as follow-up SOWs — canonical-surface changes out of this adapter's additive scope)

  • SOW-0023 — sessions.provider/provider_alias carrier (canonical event + writer)
  • SOW-0024 — generalized per-source row counts in /api/health
  • SOW-0025 — canonical attachment PayloadKind (file parts currently emit an INF LogEntry)

Full design, acceptance criteria, and the 8-round review history: .agents/sow/done/SOW-0005-20260526-opencode-adapter.md.

ktsaou added 13 commits May 30, 2026 18:13
…introspection

First slice of the opencode adapter (SOW-0005, the SQLite-backed source).

- conn.go: an own read-only connection helper (NOT store.OpenReader, which
  targets ai-viewer's own DB) — DSN file:<abs>?mode=ro&_pragma=query_only(true)
  &_pragma=busy_timeout(5000), MaxOpenConns(2), with a strip-pass that removes
  any caller-supplied colliding _pragma so the read-only guard can't be
  overridden. Layered defense: OS mode=ro O_RDONLY + SQL query_only(true).
- cursor.go: per-table watermark cursor {Version, SchemaHash,
  Tables:{MaxID, MaxTimeUpdatedMs}} with After() requiring >=1 table to advance
  and none to regress (codex discipline), version-gated ParseCursor. No byte
  offsets (this is a DB, not a JSONL stream).
- types.go: typed session/message/part/session_message row structs + the
  discriminated message.data (user|assistant) and part.data 12-variant type
  union, forward-compat tolerant (unknown type/column never hard-fails).
- store.go: PRAGMA table_info introspection -> dynamic per-table SELECT that
  omits columns absent in an older schema (AC#5), with missing-column detection.
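The After() rule for the per-table cursor (at least one table advances, none regresses) can be sketched as below. This is a minimal illustration, not the adapter's code: the struct shapes are simplified, ids are assumed to be comparable strings, and treating a table that disappears from the cursor as a regression is an assumption of this sketch.

```go
package main

// TableWatermark and Cursor are simplified, hypothetical shapes built from
// the field names in the commit message; the real types may differ.
type TableWatermark struct {
	MaxID            string // assumption: ids compare as strings
	MaxTimeUpdatedMs int64
}

type Cursor struct {
	Version    int
	SchemaHash string
	Tables     map[string]TableWatermark
}

// After reports whether c is strictly ahead of prev: at least one table
// watermark advanced and no table watermark regressed.
func (c Cursor) After(prev Cursor) bool {
	advanced := false
	for name, p := range prev.Tables {
		cur, ok := c.Tables[name]
		if !ok {
			return false // assumption: a vanished table counts as a regression
		}
		if cur.MaxID < p.MaxID || cur.MaxTimeUpdatedMs < p.MaxTimeUpdatedMs {
			return false
		}
		if cur.MaxID > p.MaxID || cur.MaxTimeUpdatedMs > p.MaxTimeUpdatedMs {
			advanced = true
		}
	}
	// A table present only in c advances the cursor if it carries any progress.
	for name, cur := range c.Tables {
		if _, ok := prev.Tables[name]; !ok && cur != (TableWatermark{}) {
			advanced = true
		}
	}
	return advanced
}
```

Under this rule a cursor is never "after" itself, so persisting an unchanged cursor can be skipped.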

Read-safety (AC#2): the 6 write-probes are each asserted to not mutate
opencode.db (error for INSERT/UPDATE/DELETE/VACUUM; no-op sentinel for
wal_checkpoint under mode=ro; side-file block for ATTACH 'rwc') plus a
byte-untouched read-back — verified on modernc.org/sqlite, stronger than the
AC's original "all error" wording (reconciled in the SOW). mapper/delta-queries/
tailer/adapter wiring land in later chunks. Gates green: gofmt/vet/golangci(0)/
gosec(0); race tests pass at 94.3% coverage; files <=333 lines.
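
A hedged sketch of the read-only DSN construction. It shows the allowlist form named in the PR description (rebuild from the path alone, discard every caller parameter) rather than the strip-pass described in this commit. The function name, the exact pragma set, and the escaping of bare paths are assumptions; the parameter names (mode, _txlock, _pragma) are the ones quoted above. No database is opened here.

```go
package main

import "strings"

// readOnlyParams is the fixed parameter set; nothing caller-supplied survives.
// Assumption: busy_timeout(5000) stays in the set, as in the chunk-A DSN.
const readOnlyParams = "mode=ro&_txlock=deferred" +
	"&_pragma=query_only(true)&_pragma=busy_timeout(5000)"

// pathEscaper keeps a literal '?', '#', or '%' in a bare filesystem path from
// being read as URI syntax.
var pathEscaper = strings.NewReplacer("%", "%25", "?", "%3f", "#", "%23")

// buildReadOnlyDSN is a hypothetical helper. For a "file:" URI it drops the
// caller's whole query string; a bare path is treated as opaque and escaped.
func buildReadOnlyDSN(raw string) string {
	var path string
	if strings.HasPrefix(raw, "file:") {
		path = strings.TrimPrefix(raw, "file:")
		if i := strings.IndexByte(path, '?'); i >= 0 {
			path = path[:i] // discard all caller parameters
		}
	} else {
		path = pathEscaper.Replace(raw)
	}
	return "file:" + path + "?" + readOnlyParams
}
```

Because the output depends only on the path, a crafted input such as `file:/tmp/x.db?mode=rwc&_pragma=journal_mode(WAL)` yields the same DSN as `/tmp/x.db`.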
…al events)

Pure, re-emittable row->event synthesis for one session's message+part tree
(the tailer in a later chunk re-feeds an affected session's tree on any
change; idempotent upserts + the idempotent catalog absorb re-emission).

- session row -> SessionStarted (Kind sub_agent when parent_id set + parent/
  root linkage, else root); terminal status: time_archived -> completed,
  else last assistant data.error -> failed (ErrorClass=error.name), else
  running (no per-session terminal, like claude-code/codex).
- assistant message -> Turn (seq by time_created,id); per-turn tokens = delta
  from the previous assistant message's cumulative totals (FLAGGED in-code as
  implementer-verify-on-live-DB; only the step-level cumulative pattern is
  independently confirmed); cost verbatim; cache via TurnFinalizedEvent.
- parts: step-start/finish -> LLM op with computeStepDeltas (step-finish
  tokens are cumulative WITHIN a message -> per-op deltas; input 100,250,410
  -> 100,150,160, AC#3); reasoning -> reasoning op; tool -> tool op (namespace
  derived) and, for tool='task' with state.metadata.sessionId, ALSO a session
  op as topology parent (AC#4); text/file -> PayloadRef (no op); patch -> LLM
  op extras; compaction -> INF log; retry -> WRN log; unknown -> skip + WARN.
- provider alias (AC#7): ProviderAlias = data.providerID verbatim; Provider =
  best-effort canonical (local alias map; default = alias) so the catalog
  provider row seeds.

PayloadRef URI construction is injected (WithPayloadURIBuilder seam) so the
mapper stays DB-agnostic; Chunk D wires the opencode-sqlite:// builder. Spec
firmed: reasoning-kind + text/reasoning/file PayloadRef rows + the mapper/URI
seam subsection. Gates green: golangci(0)/gosec(0)/vet; race tests pass at
96.1% coverage; files <=332 lines. (Noted: internal/canonical/providers.go
referenced by the spec does not exist; the alias map lives locally pending a
future centralization SOW.)
Replaces the JSONL adapters' byte-offset stream+fsnotify with a SQLite
delta-query + watermark-poll model for the live opencode DB. All reads are
strictly read-only (chunk-A openReadOnly; every page in its own short
BeginTx{ReadOnly:true}), so the live concurrent writer is never blocked and
the WAL is never pinned.

store_query.go:
- per-table paged delta query via chunk-A buildSelect:
  WHERE time_updated > :u OR (time_updated = :u AND id > :id)
  ORDER BY time_updated, id LIMIT 1000, paging until a short page.
- old-schema fallback (buildSelectByID in store.go) for a table lacking
  time_updated: WHERE id > :id ORDER BY id; watermark advances on MaxID only.
- cheap PK-indexed maxID; expensive UNINDEXED maxTimeUpdated (gated by the
  tailer). #nosec G202 on the two MAX() aggregates is constant-only (fixed
  trackedTables name via quoteIdent; a table name cannot bind as a parameter).
- affected-session derivation: session->own id; message/session_message->
  session_id; part->denormalized session_id (fallback message_id->session_id
  lookup for a hypothetical old schema); de-duplicated first-seen set.
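
The keyset-paged delta query above can be sketched as a small builder plus the equivalent in-memory predicate. This is illustrative only: the real buildSelect is driven by schema introspection, and the helper names and string ids here are assumptions.

```go
package main

import "strings"

// quoteIdent double-quotes an SQL identifier, doubling embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// buildDeltaQuery returns the paged delta SELECT for one table. The named
// parameters :u and :id carry the watermark (time_updated, id).
func buildDeltaQuery(table string, cols []string) string {
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = quoteIdent(c)
	}
	return "SELECT " + strings.Join(quoted, ", ") +
		" FROM " + quoteIdent(table) +
		" WHERE time_updated > :u OR (time_updated = :u AND id > :id)" +
		" ORDER BY time_updated, id LIMIT 1000"
}

// afterWatermark mirrors the WHERE clause: a row is in the delta if it sorts
// strictly after the watermark in (time_updated, id) order.
func afterWatermark(timeUpdated int64, id string, wmTime int64, wmID string) bool {
	return timeUpdated > wmTime || (timeUpdated == wmTime && id > wmID)
}
```

Paging continues from the last row of each page until a page returns fewer than 1000 rows.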

store_load.go: loadSession + loadSessionTree -> ordered []messageWithParts
(messages by time_created,id; parts by their order column) fed whole to the
chunk-B mapSession (the per-turn cumulative-token delta needs the full
ordered message list, so partial reload would miscount).

tailer.go + tailer_changes.go:
- scanLoop: cold/resume backfill; pages every table from the cursor, reloads
  each affected session's full tree, maps, emits; SourceProgress every ~1000
  rows + at end; records a present-column schema fingerprint into the cursor
  (chunk D swaps in the __drizzle_migrations hash without changing watermark
  semantics).
- tailLoop: poll cadence 2s idle / 500ms active / 250ms floor for 5s after an
  opencode.db-wal fsnotify mtime event (wakeup hint only; missing WAL or watch
  error is non-fatal -> pure timer polling).
- MAX(time_updated) gate (the load-bearing AC#6 property): pure
  shouldProbeTimeUpdated(now,lastWALEvent,lastProbe,60s) =
  lastWALEvent.After(lastProbe) || now.Sub(lastProbe) >= 60s. Idle steady
  state never issues the unindexed scan -- proven by the pure truth-table
  test AND a registered query-counting sql.Driver (0 MAX(time_updated) across
  5 idle polls; MAX(id) every poll).
- shared processChanges pipeline used by both loops; emitProgress mirrors the
  codex SourceProgressEvent shape; all channel sends are ctx-aware.

Resume pinned: half-scan -> persist cursor -> ParseCursor -> scan rest emits
the same content events as one cold scan (zero dupes, zero gaps). Spec
(adapter-opencode.md Read/Watch/Performance) reconciled to the firmed
behavior. Gates green: golangci(0)/gosec(0)/vet; race tests pass at 91.6%
coverage; new non-test files <=400 lines.

Carried to chunk D: (1) buildSelectByID is presently reachable only if
introspection's requiredColumns is relaxed (time_updated is required today);
kept as a tested drift safeguard pending the migration-history evidence chunk
D reads from __drizzle_migrations. (2) the schema-shape fingerprint is a
placeholder for that migration hash.
… payload URI

Wires chunks A/B/C into a registered canonical.Adapter and exposes the source
to the ingester, mirroring the codex integration. Purely additive: a new
"opencode" registry entry + an additive auto-discovery probe; no canonical/
ingest/store/presenter change.

adapter.go:
- Adapter{dbPath,sourceID,logger,onError,scanCursor} implementing
  canonical.Adapter; New rejects an empty DB path; Factory + init() ->
  adapters.Register("opencode", Factory).
- Scan records the final watermark on the instance even on ctx-cancel so a
  following Tail resumes from it; Tail uses that cursor, or (cold, no preceding
  Scan) snapshots current HEAD via maxID+maxTimeUpdated per table so it follows
  from now instead of replaying history. Re-emission absorbed by the idempotent
  upserts + idempotent catalog. coerceCursor treats an alien cursor as zero
  (full re-scan) — never skips data.

payloads.go: buildPayloadURI(partID,field) is the single home for the
opencode-sqlite://?part_id=&field= grammar (net/url-encoded). The chunk-B
mapper default now delegates here; output is byte-identical so mapper goldens
are unchanged. No resolver is built — the /api/payloads serving route is a
separate Phase-2 SOW and a parser now would be dead code.
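
A minimal sketch of the payload URI grammar, assuming url.QueryEscape as the encoding (the commit only says "net/url-encoded") and the parameter order shown in the grammar above.

```go
package main

import "net/url"

// buildPayloadURI renders the opencode-sqlite://?part_id=&field= form.
// Both values are query-escaped so an id or field containing '&' or '='
// cannot inject extra parameters.
func buildPayloadURI(partID, field string) string {
	return "opencode-sqlite://?part_id=" + url.QueryEscape(partID) +
		"&field=" + url.QueryEscape(field)
}
```

Building the string by hand keeps the parameter order fixed; url.Values.Encode would sort the keys and emit field before part_id.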

migrations.go (AC#8 + real schema hash):
- readMigrations reads __drizzle_migrations.name ordered by id (a missing
  table is non-fatal -> empty + sentinel so a foreign/old DB degrades).
- schemaHash = length-prefixed sha256 of the ordered migration-name list
  (length-prefix removes the join-delimiter ambiguity). This REPLACES the
  chunk-C present-column placeholder: scanLoop/Tail record the real hash; a
  tail-time mismatch logs a structured WARN, re-reads, and CONTINUES without
  resetting watermarks (column drift stays per-column via the dynamic SELECT).
- ProbeStatus opens read-only and reports session/message/part COUNT(*) +
  latest migration for /api/sources + /api/health.
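
The length-prefixed schema hash can be sketched as follows. The 8-byte big-endian length encoding and the hex output are assumptions; the commit specifies only that each migration name is length-prefixed before being fed to sha256.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
)

// schemaHash hashes the ordered migration-name list. Writing each name's
// length before its bytes means ["ab","c"] and ["a","bc"] hash differently,
// which a plain delimiter join cannot guarantee.
func schemaHash(names []string) string {
	h := sha256.New()
	var lenBuf [8]byte
	for _, n := range names {
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(n)))
		h.Write(lenBuf[:]) // hash.Hash.Write never returns an error
		h.Write([]byte(n))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```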

cmd/ai-viewer-ingest: named-import the opencode adapter + add the probe entry
(opencodeDBPath = ~/.local/share/opencode/opencode.db) + a case "opencode"
branch that logs counts + latest_migration; a ProbeStatus error logs
probe_error but still registers the source (discovery never fails on a count
error). sources.go split (464->364) into discovery.go to hold the budget.

#nosec G202 on the migration-name + COUNT queries is constant-only (fixed
table identifiers via quoteIdent, never user input). Specs reconciled:
deployment.md opencode row -> live (SOW-0005); adapter-opencode.md schema-hash
+ payload-URI home + cold-Tail + ProbeStatus firmed. Gates green:
golangci(0)/gosec(0)/vet both packages; race tests pass at opencode 91.8%
(no regression); new files <=400 lines.

Resolved chunk-C carry-forwards: (1) buildSelectByID kept — time_updated is in
the base Timestamps mixin on all four tracked tables across the observed
schema, so the fallback is a tested backward-compat safeguard, not dead code.
(2) the schemaFingerprint placeholder is fully removed.
…z + AC#5 INF

Pins the adapter's row-tree -> canonical-event projection with committed,
hand-verified golden scenarios, adds a data-JSON decode fuzz target, and
closes the AC#5 observability gap the golden work exposed. Adapter is now
feature-complete for SOW-0005.

Golden harness (mirrors codex): auto-discovers testdata/opencode/<scenario>/,
builds a throwaway temp SQLite DB from the scenario's human-readable
fixture.sql (no binary .db in git), opens it through the read-only adapter,
filters SourceProgress, and compares {kind,payload} JSONL to expected.jsonl
(opencode:<dbPath> SourceID substituted to <ROOT>; -update-golden regenerates).
All non-SourceProgress event Ts derive from fixture row timestamps, so goldens
are deterministic (verified with -count=3).

Five scenarios, each hand-verified against the spec by reading expected.jsonl
(not just re-running):
- a_happy: baseline session->turn->op tree + reasoning/text PayloadRef +
  tool BytesIn/Out + running session (no finalize).
- b_subagent_task: BOTH topology edges (a Kind=session op carrying
  ChildSessionNativeID AND a Kind=tool Name=task op in the same turn) + the
  child session_started Kind=sub_agent with parent/root linkage (AC#4).
- c_multi_provider: two providers surface verbatim as Provider/ProviderAlias
  on their ops (AC#7).
- d_schema_drift: an OLD schema (no cost/tokens_* columns) -> dynamic SELECT
  omits them, session emits empty Model/Agent, message-derived turn tokens
  intact; introspection ACCEPTS (degrades, never rejects) (AC#5).
- e_cumulative_tokens: four cumulative step-finish totals (100/250/410/400)
  -> per-op TokensIn deltas 100/150/160/0 on op_finalized (the 4th clamps
  from -10 to 0); turn rollup takes the final cumulative 400 (AC#3).
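
The cumulative-to-delta arithmetic in e_cumulative_tokens can be reproduced with a short sketch. The function name matches the one cited in the commit, but the signature here (a plain slice of totals) is an assumption.

```go
package main

// computeStepDeltas converts cumulative step-finish totals within one
// message into per-op deltas, clamping a negative delta to 0.
func computeStepDeltas(cumulative []int64) []int64 {
	deltas := make([]int64, len(cumulative))
	var prev int64
	for i, total := range cumulative {
		d := total - prev
		if d < 0 {
			d = 0 // a shrinking total (410 -> 400) clamps instead of going negative
		}
		deltas[i] = d
		prev = total
	}
	return deltas
}
```

For the totals 100, 250, 410, 400 this yields 100, 150, 160, 0, matching the golden.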

data_fuzz_test.go: FuzzDecodeMessageData + FuzzDecodePartData over the
types.go data-JSON decoders (user|assistant + 12 part $.type variants +
malformed/truncated/empty). 30s each, ~3M execs, zero panics.

AC#5 INF (gap found by the golden work, now closed): tableSchema.Missing was
computed but logged nowhere and Adapter.logger was unused. tailer.go now
calls logMissingColumns at introspection: one INFO per missing optional column
(table+column keys, deterministic order); logger threaded into scanLoop/
tailLoop and passed from adapter.go (Scan + Tail each emit once on the rare
old-schema path). The drift test asserts the exact missing-column set is
logged (golden_loghandler_test.go capture handler). Spec + SOW reconciled;
AC#3/#4/#5/#6/#7 marked DONE with test evidence.

Gates green: golangci(0)/gosec(0)/vet; whole-module race passes; opencode
coverage 92.5% (logMissingColumns/orDefaultLogger 100%); files <=400 lines.
…nups)

External review (codex decisive; glm/minimax clean) found real defects on the
live/concurrent-DB semantics. Each adjudicated against the spec + code before
fixing (not on reviewer convergence):

P1 — fixed:
- DATA LOSS: collectDeltas checkpointed SourceProgress mid-paging, advancing the
  persisted watermark before reloadAndEmit emitted the affected sessions'
  content; a crash/cancel between them skipped those sessions on restart (worst
  on cold backfill). Rewrote into tailer_batch.go batchProcessor: each bounded
  batch pages -> reloadAndEmit -> THEN promotes the committed cursor + emits
  progress; cancel/error returns the last content-committed cursor. Pinned by
  TestProcessChanges_CheckpointAfterEmit_NoLoss.
- read-safety: buildReadOnlyDSN only stripped name-colliding _pragma, so a
  crafted DSN's wal_checkpoint/optimize/_txlock=exclusive survived (mode=ro
  still blocked real mutation, but the contract requires them unreachable). Now
  an allowlist: discard all caller params, rebuild with mode=ro + _txlock=
  deferred + the read-only pragma set. Pinned by the malicious-DSN test.
- live turns: turnStatus/finalizeTurn finalized every assistant message;
  spec finalizes only when data.time.completed or a step-finish part exists.
  turnIsTerminal now gates TurnFinalizedEvent — running turns emit TurnStarted
  only (correct for the live tailer). Pinned by RunningTurnNotFinalized.
- $OPENCODE_DB: opencodeDBPath now resolves $OPENCODE_DB -> $XDG_DATA_HOME/
  opencode/opencode.db -> default (AC#8). Pinned by the resolution test.

P2 — fixed:
- nested sub-agents: resolveRootID walks the parent_id chain to the true tree
  root (depth-cap + cycle guard); g_nested_subagent golden pins grandchild
  Root=root, Parent=direct.
- orphan step-start force-closes the prior open LLM op as cancelled (spec
  Edge #5).
- silent parse failures (model JSON, task metadata, corrupt numeric cells) now
  route to onError/WARN with context (no silent zero).
- unknown session_message.type now WARNs (spec Edge #1).
- session tree loads under one bounded read transaction (no N+1 / no
  cross-snapshot).
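
A sketch of the parent_id walk with a depth cap and cycle guard. The lookup callback, the cap of 64, and the single sentinel error are assumptions for illustration; the real resolveRootID reads from the database.

```go
package main

import "errors"

const maxParentDepth = 64 // hypothetical cap

var errParentChain = errors.New("parent_id chain has a cycle or exceeds the depth cap")

// resolveRootID follows parent links from sessionID to the tree root.
// parentOf returns the parent id and whether the session has one.
func resolveRootID(sessionID string, parentOf func(id string) (string, bool)) (string, error) {
	seen := map[string]bool{sessionID: true}
	cur := sessionID
	for depth := 0; depth < maxParentDepth; depth++ {
		parent, ok := parentOf(cur)
		if !ok || parent == "" {
			return cur, nil // no parent: cur is the root
		}
		if seen[parent] {
			return "", errParentChain // cycle guard
		}
		seen[parent] = true
		cur = parent
	}
	return "", errParentChain // depth cap reached
}
```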

P3: removed dead buildSelectByID (time_updated is required + universal);
recursive itoa -> strconv.Itoa; probe.Close on the error path.

Deferred shared-surface items (outside this adapter's additive scope) filed as
follow-ups: SOW-0023 (sessions.provider/provider_alias need a SessionStarted
field + writer mapping in canonical/ingest); SOW-0024 (generalized per-source
row counts in /api/health). SOW AC#7/#8 notes + Reviews section updated.

Gates green: golangci 0, gosec 0, whole-module race passes, opencode coverage
92.3%, FuzzDecode* clean; files <=400 lines.
codex confirmed the round-1 fixes all hold, then found deeper defects on the
live/concurrent-DB + canonical-status surfaces. Each adjudicated against the
spec + code before fixing.

P1 — fixed:
- cursor conflation -> permanent expensive idle scans: the watermark set MaxID
  to the last-paged row by (time_updated,id) order, but detectChange compared
  MAX(id) > MaxID for cheap insert detection. An in-place update of an OLD row
  regressed MaxID, so MAX(id) stayed greater forever and every idle poll ran the
  unindexed (time_updated,id) delta sort on the live multi-GB DB (defeating
  AC#6's gating). Split TableWatermark into MaxIDSeen (monotonic, insert
  detection) + MaxTimeUpdatedMs/MaxTimeUpdatedID (time-ordered paging). Bumped
  cursorVersion 1->2; an old/unknown cursor resets to zero (idempotent
  re-scan). Pinned by TestP1A_OldRowUpdateDoesNotReArmIdleScan (counting driver
  asserts zero MAX(time_updated) after an old-row update).
- sticky session failure: failError was set on any error turn and never cleared,
  so a session that errored then recovered finalized as failed. Now tracks the
  LAST assistant turn's terminal state (set on error, clear on a clean turn).
- non-canonical op status "error": opencode tool state error mapped to the
  literal "error", not a canonical status (running|completed|failed|cancelled|
  truncated). Now maps to "failed" with detail in ErrorClass/ErrorMessage;
  audited every emitter for the same leak. New h_failed_tool golden pins it.
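
The watermark split can be illustrated with a small model. Field names follow the commit message; the observe and hasNewInsert helpers and the string id type are hypothetical.

```go
package main

// TableWatermark separates insert detection from paging position.
type TableWatermark struct {
	MaxIDSeen        string // monotonic: only ever grows
	MaxTimeUpdatedMs int64  // paging position, time component
	MaxTimeUpdatedID string // paging position, id tie-breaker
}

// observe folds one paged row into the watermark. An in-place update of an
// old row moves the paging position but cannot lower MaxIDSeen.
func (w TableWatermark) observe(id string, timeUpdatedMs int64) TableWatermark {
	if id > w.MaxIDSeen {
		w.MaxIDSeen = id
	}
	if timeUpdatedMs > w.MaxTimeUpdatedMs ||
		(timeUpdatedMs == w.MaxTimeUpdatedMs && id > w.MaxTimeUpdatedID) {
		w.MaxTimeUpdatedMs = timeUpdatedMs
		w.MaxTimeUpdatedID = id
	}
	return w
}

// hasNewInsert is the cheap check: compare the table's MAX(id) against the
// monotonic MaxIDSeen, never against the paging id.
func hasNewInsert(w TableWatermark, maxIDInTable string) bool {
	return maxIDInTable > w.MaxIDSeen
}
```

With a single conflated MaxID, the old-row update in this scenario would have lowered the stored id and made the cheap check fire on every idle poll.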

P2 — fixed:
- terminal predicate now uses data.Error != nil (presence), not Error.Name != "".
- loadSessionTree N+1 -> one part query per session (WHERE session_id=? grouped
  in memory; old-schema IN(...) fallback), single bounded read tx.
- migrationsTablePresent returns (bool,error); only a clean count-0 is soft, all
  other faults propagate (no swallowed corruption/ctx-cancel).
- patch-enrichment OpStarted re-emit now carries the full op identity
  (Name/Model/Provider/ProviderAlias) so the writer's unconditional ops.name
  update cannot blank it.
- time_compacting (spec Edge #8) now read; a session mid-compaction is skipped
  until the column clears, then re-emitted.
- token-delta + ms->us arithmetic saturates + WARNs on overflow (crafted DB
  values can no longer wrap).
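
The saturating arithmetic can be sketched as two helpers that return a clamped flag, leaving the WARN to the caller. The names and return shapes are assumptions, not the adapter's API.

```go
package main

import "math"

// msToMicros converts milliseconds to microseconds, saturating at the int64
// bounds instead of wrapping.
func msToMicros(ms int64) (us int64, clamped bool) {
	const factor = 1000
	if ms > math.MaxInt64/factor {
		return math.MaxInt64, true
	}
	if ms < math.MinInt64/factor {
		return math.MinInt64, true
	}
	return ms * factor, false
}

// addClamp adds two int64 values, saturating on overflow in either direction.
func addClamp(a, b int64) (sum int64, clamped bool) {
	if b > 0 && a > math.MaxInt64-b {
		return math.MaxInt64, true
	}
	if b < 0 && a < math.MinInt64-b {
		return math.MinInt64, true
	}
	return a + b, false
}
```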

P3 — fixed: spec read-strategy DSN aligned to the code allowlist (no
journal_mode(WAL)); SourceProgress now emitted from one layer only (the batch
processor's checkpoint-after-emit). (The stale d_schema_drift INF comment was
already corrected.)

Files split to hold the <=400 budget: store_scan.go, tailer_wal.go,
mapper_emitters.go. Gates green: golangci 0, gosec 0, whole-module race passes,
opencode coverage 92.9%, FuzzDecode* clean. Regenerating goldens changed only
fixture.sql (added time_compacting); zero expected.jsonl drift on the 6 prior
scenarios -> the fixes are emission-equivalent for the happy paths.
codex confirmed all prior fixes hold + read-safety/checkpoint/goldens good, then
found two live-concurrency correctness gaps. Each adjudicated on ground truth.

P1 — fixed:
- same-millisecond in-place update could be missed forever: with the cursor at
  (T, highID), an already-seen low-id row updated in place at the SAME ms T moved
  neither MAX(id) nor MAX(time_updated)>T, and the delta's (tu=T AND id>highID)
  excluded it -> permanent skip. Added a bounded boundary-bucket re-scan
  (tailer_boundary.go): on a WAL-driven probe where no detector advanced, re-scan
  rows with time_updated == cursor.MaxTimeUpdatedMs, re-emit their sessions
  idempotently WITHOUT advancing the cursor. Gated on walDriven && !changed so the
  idle path stays free (AC#6 preserved) and a cold-Tail snapshot never replays.
  Pinned by TestP1_1_BoundaryUpdateReEmitted.
- time_compacting TOCTOU: the compaction check and the message/part tree load ran
  in separate transactions, so compaction starting between them could emit a
  partial tree. loadAndMapSession now reads the session row, checks
  time_compacting, resolves the root, and loads the tree in ONE read-only
  snapshot tx (also closes the round-1 cross-snapshot gap). loadSession/
  loadSessionTree/resolveRootID take a roQuerier (*sql.DB or *sql.Tx). Pinned by
  TestP1_2_CompactingSkippedAtomically + TestP1_2_TreeLoadRunsInCallerTx.

P2 — fixed:
- overflow fix completed: msToMicrosWarn WARNs on clamp; CtxUsed (input+cache.read)
  uses saturating addClampWarn. Pinned by TestP2_1_*.
- malformed message/part JSON now also routes through the adapter OnError path
  (-> SourceErrorEvent -> /api/health parse_errors) in addition to the session
  LogEntry, so a corrupt DB degrades health. Pinned by TestP2_2_*.
- unbounded tree load: a session over a generous bound (100k msgs/parts) emits a
  WARN (surfaced, never silently truncated) and is still processed in full (the
  whole ordered tree is required for per-turn token deltas — documented as an
  intentional constraint).
- documented that the ingest CLI's opencode source is a filesystem path; the
  file:/:memory: DSN forms are adapter programmatic/test use only.

P3 — fixed: removed the dead old-schema part fallback (part.session_id is a
required introspection column, so it was unreachable); the opencode
auto-discovery probe now requires a regular file (a directory named opencode.db
no longer registers). Pinned by TestAutoDiscover_OpencodeDirectoryNotRegistered.

Gates green: golangci 0, gosec 0, whole-module race passes, opencode coverage
92.4%, FuzzDecode* clean; production files <=400 lines; -update-golden rewrote
zero bytes (fixes are emission-equivalent on normal data).
codex confirmed rounds 1-3 hold + read-safety/checkpoint/goldens good, found one
P1 completeness gap in the round-3 same-ms fix + tightening P2/P3.

P1 — fixed: the boundary-bucket re-scan (round-3) fired only on a WAL-driven
probe; the 60s safety-net probe did not run it, so a same-ms in-place update was
stranded forever when the WAL hint was missed (dropped event / watcher-setup
failure / timer-only polling) -- violating the spec's safety-net guarantee. Added
a priorProbe flag to pollState: the boundary re-scan now runs on
probed && (walDriven || priorProbe), so the safety-net probe triggers it too,
while a cold-Tail's FIRST probe never replays its snapshot boundary. AC#6 idle
property + idempotency + no-cursor-advance preserved; also covers time_compacting
clearing at the boundary ms. Pinned by the rewritten boundary tests +
TestP1_1_CompactingClearsAtBoundaryReSurfacesOnSafetyNet.

P2 — fixed:
- delta scanners now use .withWarn; corrupt OPTIONAL cells WARN+degrade, but the
  REQUIRED cursor columns (id, time_updated) ERROR so a poisoned watermark can
  never be persisted.
- every emitted-event timestamp routes through msToMicrosWarn (was silent at
  most emitters).
- non-canonical PayloadKind "user_attachment" removed: a file part now emits an
  INF LogEntry with {filename,url,mime} extras (canonical-clean, no loss); a
  first-class canonical attachment kind is deferred to follow-up SOW-0025.
- (spec) subtask/agent/snapshot parts documented as intentionally-ignored v1
  no-ops (zero observed); per-batch full-tree re-emit documented as an accepted
  crash-safety tradeoff.

P3 — fixed: ProbeStatus uses a bounded 10s context (was context.Background());
bare filesystem paths are opaque (query split only for file:/:memory: URI forms,
so a path containing '?' opens literally).

Filed follow-up SOW-0025 (canonical attachment PayloadKind). Gates green:
golangci 0, gosec 0, whole-module race passes, opencode coverage 92.5%,
FuzzDecode* clean; production files <=400 lines; goldens carry only canonical
PayloadKinds.

codex P1 trend: 4 -> 3 -> 2 -> 1 across review rounds (converging).
codex round 5 found ZERO P1 (the P1 trend reached 0: 4->3->2->1->0) and declared
the adapter merge-ready after two P2s + two trivial P3s; glm and minimax both
"production-ready / would merge". Final substantive fixes:

P2 — fixed:
- no warn/error/content emission while a source-DB read transaction is open
  (WAL-pin risk on the live multi-GB DB): a new warnSink buffers parse/numeric/
  schema warnings DURING the delta-page and full-session-tree read txns; the tx
  is committed/rolled back FIRST, then the buffered warnings flush through
  onError and the pure mapper runs + content events emit — so neither warnings
  nor content are delivered under an open source tx. A fatal row error still
  aborts the page before cursor advance, surfaced post-tx. Pinned by a
  discriminating single-connection probe test (verified it FAILS when a warn is
  re-introduced under the open tx).
- required OWNERSHIP-id columns now error on corrupt: message.session_id,
  part.message_id, part.session_id, session_message.session_id read via
  requiredOwner — an empty/corrupt owning id aborts the page (cursor not
  advanced) instead of silently dropping affectedSet.add("") and advancing past
  it (a health-invisible cursor gap). session_message.type keeps its unknown-
  type WARN (not an ownership id). Pinned across all four (table, col) pairs.
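
The buffer-then-flush ordering of the warnSink can be sketched without a database. Warnings are plain strings here and the transaction is modeled by two callbacks; both are simplifications, and readPage is a hypothetical name.

```go
package main

// warnSink collects warnings raised while a read transaction is open.
type warnSink struct{ buf []string }

func (s *warnSink) warn(msg string) { s.buf = append(s.buf, msg) }

// flush delivers the buffered warnings and empties the sink.
func (s *warnSink) flush(onError func(string)) {
	for _, m := range s.buf {
		onError(m)
	}
	s.buf = nil
}

// readPage runs read under the (modeled) transaction, closes the transaction
// FIRST, and only then delivers warnings, so no consumer callback runs while
// the source DB's read snapshot is held.
func readPage(read func(sink *warnSink) error, closeTx func() error, onError func(string)) error {
	sink := &warnSink{}
	readErr := read(sink)
	closeErr := closeTx()
	sink.flush(onError)
	if readErr != nil {
		return readErr
	}
	return closeErr
}
```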

P3 — fixed:
- failed session finalize now populates ErrorMessage from data.error.data.message
  (opencode's AssistantError union is {name, data:{message,...}}; every shipping
  variant except MessageOutputLengthError carries data.message — confirmed
  read-only against the live DB: 422/422 error messages populate it). New
  i_failed_assistant golden pins Status=failed + ErrorClass + ErrorMessage.
  (TurnFinalizedEvent has no ErrorMessage field, so the turn carries ErrorClass
  only — documented.)
- updated the stale mapper comment: the message-level cumulative->delta token
  behavior is pinned by the e_cumulative_tokens golden + TestComputeStepDeltas
  (was "not independently confirmed").

tailer_changes.go split (the warnSink threading pushed it over budget): poll-
cadence state machine + cursor-shaping helpers extracted to tailer_pollstate.go.
Specs reconciled (adapter-opencode.md + canonical-events.md opencode terminal-
signal note). Gates green: golangci 0, gosec 0, whole-module race passes,
opencode coverage 92.4%, FuzzDecode* clean; production files <=400 lines;
committed goldens byte-identical (only the new i_failed_assistant added).
codex round 6 confirmed the read-safety/checkpoint/round-5 tx-restructure are
correct, found a deeper case of the same-ms boundary problem + tightening items.

P1 — DEFINITIVE same-ms boundary fix: the round-3/4 boundary re-scan ran only on
the changed==false branch, so a same-ms in-place update of a low-id row was
stranded forever when a co-occurring normal change advanced the cursor past the
boundary ms (zero-gaps violation). pollOnce now runs the boundary-bucket re-scan
against the PRE-ADVANCE cursor on every gated probe (WAL event or 60s net),
BEFORE processChanges advances the watermark, regardless of changed — covering
all same-ms cases (same-ms-only, same-ms + co-occurring forward change, and the
missed-WAL safety net) at once. A new boundaryReal warm/cold flag on pollState
keeps a cold-Tail HEAD snapshot from replaying its boundary on the first probe
(the unconditional reorder otherwise broke TestAdapter_TailColdSnapshot). Pinned
by TestP1_R6_CoOccurringForwardChangeDoesNotStrandBoundaryUpdate (proven
load-bearing: reverting the reorder or flipping boundaryReal cold makes it FAIL)
+ TestP1_R6_ColdFirstProbeStillGuardsBoundaryReplay; all prior boundary/idle/
cold-snapshot tests stay green.

P2 — fixed:
- bogus tool_response PayloadRef for failed tools: toolTerminal returned
  hasOutput=true on state.error, so a failed-only-error tool emitted a
  PayloadRef to a non-existent state.output. Now keyed on state.output != "";
  failed-tool detail stays in OpFinalized ErrorMessage. h_failed_tool golden
  regenerated (bogus ref gone).
- SessionUpdatedEvent spec/code drift: amended adapter-opencode.md to document
  that opencode applies metadata updates via idempotent SessionStarted
  re-emission (whole-row re-read each delta; writer COALESCEs), so unlike the
  siblings' single-field backfill it has no mid-stream gap; last_activity_ts is
  driven by the latest turn/op Ts. No SessionUpdatedEvent emitted.

P3 — fixed: retry log now includes error.name (decoded into partError); removed
the dead message-lookup fallback in resolvePartSession (part.session_id is a
required introspection column, so it was unreachable — simplified to read the
required column, erroring on empty); added j_file_attachment golden pinning the
file -> INF LogEntry path end-to-end (zero PayloadRefs, canonical-clean).

Gates green: golangci 0, gosec 0, whole-module race passes, opencode coverage
92.8%, FuzzDecode* clean; production files <=400 lines; only h_failed_tool
(corrected) + j_file_attachment (new) goldens changed, other 7 byte-identical.

codex P1 trend: 4 -> 3 -> 2 -> 1 -> 0 -> 1 (the recurring same-ms boundary thread,
now addressed definitively).
codex round 7 found the same-ms boundary gap a 4th time (a different detection
path each round) + 4 more. This round closes the CLASS, not one case, and adds
a same-ms stress/property test as the guard.

P1 — fixed:
- DEFINITIVE same-ms boundary fix: the boundary re-scan was gated on
  detectChange's `probed` flag, but the cheap MAX(id) path returns
  changed=true/probed=false and short-circuits, so a true insert co-occurring
  with a same-ms in-place update of a low-id row let the cursor advance past the
  boundary -> the update was lost. The re-scan now runs against the PRE-ADVANCE
  cursor whenever `boundaryReal && (changed || probeGateOpen)` — covering every
  detection path (cheap MAX(id) insert, gated MAX(time_updated), WAL event, 60s
  safety net) at once. Idle (changed=false + gate closed) still does zero
  expensive work (AC#6). Removed the now-redundant priorProbe flag; boundaryReal
  is the single cold-Tail guard, applied on ALL re-scan paths. Pinned by the
  exact cheap-path case + a deterministic-seed same-ms STRESS test (random
  insert/in-place-update interleavings incl. missed-WAL safety-net cycles) that
  FAILS against the old trigger and passes -count=5 -race.
- reloadAndEmit no longer swallows generic errors: only errSessionGone and the
  time_compacting pause are skip-and-continue; any other load/map/commit error
  propagates so commitBatch does NOT promote the cursor (the rows retry) —
  restoring the checkpoint-after-emit invariant.
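
The unified re-scan trigger and the boundary bucket it selects can be sketched as two pure helpers. The predicate is the expression quoted above; the row type and boundarySessions are hypothetical stand-ins for the SQL re-scan of rows with time_updated equal to the pre-advance cursor's MaxTimeUpdatedMs.

```go
package main

// row is a simplified delta row.
type row struct {
	ID            string
	SessionID     string
	TimeUpdatedMs int64
}

// shouldRescanBoundary is the single trigger for every detection path. With
// changed=false and the probe gate closed (idle) it stays false, so the idle
// path does no extra work.
func shouldRescanBoundary(boundaryReal, changed, probeGateOpen bool) bool {
	return boundaryReal && (changed || probeGateOpen)
}

// boundarySessions returns the de-duplicated, first-seen session ids of rows
// whose time_updated equals the boundary millisecond.
func boundarySessions(rows []row, boundaryMs int64) []string {
	seen := make(map[string]bool)
	var out []string
	for _, r := range rows {
		if r.TimeUpdatedMs != boundaryMs || seen[r.SessionID] {
			continue
		}
		seen[r.SessionID] = true
		out = append(out, r.SessionID)
	}
	return out
}
```

The sessions returned are re-emitted idempotently and the cursor is left where it was.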

P2 — fixed:
- cold-Tail boundaryReal guard now applied on the changed==false path too (a
  cold Tail's first WAL/safety-net probe no longer replays the HEAD-snapshot
  boundary).
- watchWAL goroutine is now awaited: closeWatch (sync.WaitGroup + sync.Once)
  waits for the watcher goroutine to exit before returning, so Tail's deferred
  closeWatch guarantees no late onError races with the source's close(events)
  (send-on-closed-channel panic).
- full-tree load scanners now validate required ownership ids (message_id/
  session_id): a corrupt id surfaces a post-tx WARN instead of silently
  dropping parts under out[""] (mirrors the round-5 delta-path guard).
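
The awaited-close pattern (sync.Once to stop, sync.WaitGroup to wait) can be sketched with a plain channel standing in for the fsnotify event stream. The walWatcher type and startWatch are hypothetical names.

```go
package main

import "sync"

// walWatcher owns one goroutine that forwards wakeup hints.
type walWatcher struct {
	stop chan struct{}
	once sync.Once
	wg   sync.WaitGroup
}

// startWatch launches the watcher goroutine; onEvent runs on that goroutine.
func startWatch(events <-chan struct{}, onEvent func()) *walWatcher {
	w := &walWatcher{stop: make(chan struct{})}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		for {
			select {
			case <-w.stop:
				return
			case _, ok := <-events:
				if !ok {
					return
				}
				onEvent()
			}
		}
	}()
	return w
}

// closeWatch signals stop exactly once and waits for the goroutine to exit,
// so no callback can fire after it returns. It is safe to call repeatedly.
func (w *walWatcher) closeWatch() {
	w.once.Do(func() { close(w.stop) })
	w.wg.Wait()
}
```

Deferring closeWatch before the output channel is closed is what rules out the late send on a closed channel described above.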

store_load.go split (P2-3 pushed it over budget): the columnIndex + scanDest
decoder extracted to store_scandest.go. Spec Watch-Strategy section updated
(unified trigger + error-propagation rule). Gates green: golangci 0, gosec 0,
whole-module race passes, opencode coverage 92.7%, FuzzDecode* clean, same-ms
stress -count=5 -race clean; production files <=400 lines; goldens unchanged.

codex P1 trend: 4 -> 3 -> 2 -> 1 -> 0 -> 1 -> 2 (the same-ms thread, now closed
by class + stress test rather than per-case patching).
…verged

The 5th and final source adapter. opencode's live multi-GB SQLite store ->
canonical events, strictly read-only, registered + auto-discovered. 5 chunk
commits (A-E) + 7 review-fix commits across 8 external-review rounds; codex +
glm + minimax all merge-ready (codex round 8: "no actionable P1 or P2, I would
merge"). All 8 acceptance criteria met with test evidence. Gates green:
golangci 0, gosec 0, whole-module race pass, opencode coverage 92.7%,
FuzzDecode* + same-ms stress (-count=5 -race) clean.

The recurring same-ms incremental-cursor boundary (codex P1 across rounds
3/4/6/7) was closed structurally in round 7 — one re-scan trigger across all
detection paths + a property/stress test, not per-case patches.

Reviews 1-8, Outcome, and Lessons recorded; moved current/ -> done/. Deferred
canonical-surface follow-ups filed: SOW-0023 (session provider columns),
SOW-0024 (per-source /api/health counts), SOW-0025 (canonical attachment
PayloadKind).
@ktsaou ktsaou merged commit fda851d into master May 31, 2026
6 checks passed
@ktsaou ktsaou deleted the sow-0005-opencode-adapter branch May 31, 2026 01:29