fix(ingest): op duration_us from persisted start_ts (+ migration 0005 backfill)#31
Merged
Conversation
…lize Ts ops.duration_us was computed as OpFinalizedEvent.EndTs - OpFinalizedEvent.Ts. A finalize event sorts after its OpStarted (events are ordered by Ts), so Finalized.Ts is the op end (== EndTs) and the subtraction collapsed to ~0 for every spec-conformant adapter, persisting duration_us = 0 everywhere. This flattened the session-detail Duration column, the topology duration metric, and the catalog_*.total_duration_us rollups. stats.go was unaffected because it already derives end_ts - start_ts directly. Derive duration from the persisted start_ts (the matching OpStarted's Ts) at finalize; leave NULL for an orphan finalize with no recorded start. Hoist the existing op lookup out of the pricing branch so start_ts is read once for both pricing and duration. Add migration 0005 (data-only, schema-version-neutral): backfill ops.duration_us = end_ts - start_ts for historical rows and recompute catalog_models / catalog_tools.total_duration_us from the corrected values. Migrations are tracked by filename in _schema_migrations, so the marker schema_meta.version stays 4 (no schema shape change), keeping the serve binary's exact-equality CheckSchema satisfied. The bug was masked by test fixtures that set Finalized.Ts == OpStarted.Ts (the start); make them spec-conformant (Finalized.Ts = end) and add tests pinning the start_ts-derived duration, the orphan-finalize NULL case, and the migration backfill + rollup recompute. Specs: data-model.md and ingester.md document the duration source and the migration. SOW-0026.
…n recompute Address review findings on the op-duration fix: - end_ts and duration_us are written from a single validity gate via COALESCE, so a corrective re-finalize carrying EndTs=0 (or a clock-skewed EndTs < start_ts) preserves the previously recorded good end_ts and duration instead of clobbering end_ts to NULL while duration stayed set. Tests now assert end_ts preservation and cover the clock-skew re-finalize case. - Migration 0005 catalog_models recompute mirrors the live path exactly: drop the always-true `o.name IS NOT NULL` and add explicit non-empty provider/model guards (no behavioral change; the join to non-empty PKs already enforced it). - Add a migration-vs-live parity test: the 0005 recompute reproduces the totals the live incremental path accumulates for the same event set. - Make the remaining ingest test fixtures spec-conformant (Finalized.Ts = end) so the old masking pattern is gone from the suite. - Docs/comments: record migration 0005 in the data-model migration history; clarify the presenter SchemaVersion comment (schema-shape vs data-only migrations); correct the statsTotals comment to state it sums session end_ts - start_ts, distinct from ops.duration_us. SOW-0026.
…ated DB ai-viewer-serve runs no migrations and gates startup solely on schema_meta.version (CheckSchema, exact-equality). With migration 0005 left version-neutral, a pre-0005 v4 store passed serve startup and could hand out the stale duration_us=0 rows 0005 repairs. Bump schema_meta.version to '5' and presenter.SchemaVersion to 5 so serve refuses an un-migrated store via the existing mechanism (a version bump has no re-ingest side effect; the ingester never reads schema_meta). Every migration now bumps the version in lockstep — no version-neutral special case. - 0005 also tightens the backfill WHERE to start_ts > 0 AND end_ts > 0, mirroring the writer's finalize gate exactly. - Strengthen the skew migration test to assert the exact unchanged duration (skew rows stay 0), not merely non-negative. - embed-smoke.sh waits for 0005_op_duration_backfill.sql (the version-setting migration); sweep all schema-version assertions to 5. - Document the duration-computed-at-finalize contract: the op-insert start_ts = MIN(...) can lower start_ts on a later re-emit without recomputing duration, so adapters must emit the authoritative start at/before finalize (real adapters do). SOW-0027 tracks a defensive recompute / generated-column. SOW-0026.
Migration 0005 bumped schema_meta.version to 5, but the public spec examples still showed 4: the /api/health samples in rest-api.md and observability.md and the schema_meta example in data-model.md. Update all three to 5 to match presenter.SchemaVersion and the emitted health payload. Also narrow the "every migration bumps the version" wording (presenter.go, the 0005 SQL header, the 0004 test comment, data-model.md): only migrations whose outcome serve reads or validates bump schema_meta.version in lockstep with the binary; 0002_source_progress.sql is ingester-only and stays version-neutral. Comment/doc only — no logic change. SOW-0026.
- presenter.md: the startup schema check refuses ANY version mismatch (!=), not only version > supported — SOW-0026 relies on a v5 binary rejecting an older pre-0005 (v4) store so it never serves stale durations. - embed-smoke.sh: drop the "every migration bumps the version" overstatement (0002_source_progress is version-neutral); the latest serve-relevant migration sets the version this binary expects. - Close SOW-0026 (op duration_us from persisted start_ts): Status completed, four external-review rounds converged, moved to done/. SOW-0026.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ops.duration_uswas persisted as 0 for every op. The ingest writer computed it asOpFinalizedEvent.EndTs - OpFinalizedEvent.Ts, but a finalize event sorts after its OpStarted (events are ordered byTs), soFinalized.Tsis the op end (==EndTs) and the subtraction collapsed to ~0 for every spec-conformant adapter.Impact: flattened the session-detail Duration column, the topology
metric=durationnode sizing, and thecatalog_*.total_duration_usrollups.stats.gowas unaffected (it already derivesend_ts - start_tsdirectly), which is why two surfaces silently disagreed.data-model.mdalready specifiedend_ts - start_ts, so this was code-vs-spec drift.The bug was masked because test fixtures set
Finalized.Ts == OpStarted.Ts(the start), makingEndTs - Tscoincidentally equal the real duration in tests only.Fix
internal/ingest/writer.goapplyOpFinalized: deriveduration_us = EndTs - persisted start_ts; leave NULL for an orphan finalize (no recorded start). The existing op lookup is hoisted out of the pricing branch sostart_tsis read once for both pricing and duration.internal/store/migrations/0005_op_duration_backfill.sql(data-only, schema-version-neutral): backfill historicalops.duration_usand recomputecatalog_models/catalog_tools.total_duration_usfrom the corrected values. Migrations are tracked by filename in_schema_migrations, soschema_meta.versionstays 4 (no schema shape change) and the serve binary's exact-equalityCheckSchemastill passes.Finalized.Ts = end); new tests pin the start_ts-derived duration, the orphan-finalize NULL case, and the migration backfill + rollup recompute.data-model.md+ingester.mddocument the duration source and the migration.Verification
go test -race(ingest + store) green; coverage 88.5% / 90.9%; gofmt/vet/golangci-lint/gosec clean; secret + attribution scans clean.SOW:
.agents/sow/current/SOW-0026-20260531-op-duration-from-start-ts.md