Skip to content

feat(cli): unified OFFLINE ao migrate — port legacy projects + sessions into the rewrite (direct-DB), with Claude transcript relocation #2129

@harshitsinghbhandari

Description

@harshitsinghbhandari

Implements the legacy side of the legacy → rewrite data migration. Spec source: aoagents/ReverbCode#247 (§1/§3 projects, §2 sessions). Fully investigated and decision-locked below so it can be built without re-investigation.

Supersedes PR #2127 (which added ao migrate as an online POST /api/v1/projects flow). We are switching to a single offline direct-DB tool that does projects and sessions, so #2127's POST approach is dropped.

Goal

One ao migrate command, run with the rewrite daemon stopped, that inserts the legacy project registry + per-project settings + non-terminated sessions directly into ~/.ao/data/ao.db, and relocates Claude transcripts so resumable sessions keep their context.

Why offline direct-DB (not the daemon API)

  • The rewrite has no session-import path: POST /api/v1/sessions only spawns a new live session (no id/agent_session_id/state fields); CreateSession auto-assigns id+num and is welded to the spawn pipeline; there is no InsertSessionVerbatim and no migrate endpoint. So sessions cannot be inserted online without rewrite-side Go work.
  • Projects do have an online endpoint (what feat(cli): add ao migrate to port legacy projects + settings into the rewrite daemon #2127 used), but mixing one online step (daemon running) and one offline step (daemon stopped) is incoherent. Unifying on offline = one command, one state.
  • Offline also aligns with fix: use JSONL-based activity detection in lifecycle manager #247 G7 ("import before the daemon serves traffic" — avoids bulk CDC fan-out to live SSE consumers).

Decisions (locked)

  • Offline direct-DB insert into $AO_DATA_DIR/ao.db (default ~/.ao/data/ao.db); daemon must be stopped.
  • Schema-pinned to v12 (hard guard; refuse on mismatch).
  • Session identity (fix: use JSONL-based activity detection in lifecycle manager #247 G1): legacy session id verbatim, num = trailing integer of the id.
  • Non-terminated sessions only — no terminated/history-row import.

Verified facts (don't re-investigate)

Direct-DB (rewrite migrations 0001–0012):

  • Daemon = single-writer pool over WAL (storage/sqlite/db.go) → run with daemon stopped; detect & refuse if the DB is locked (SQLITE_BUSY).
  • Schema guard: require SELECT MAX(version_id) FROM goose_db_version == 12, else refuse. Daemon runs goose.Up on every boot, so the DB must already exist → precondition: start the rewrite once (creates+migrates schema), then stop it.
  • Insert order projects → sessions, foreign_keys=ON (sessions.project_idprojects(id)). CDC triggers fire one harmless session_created change_log row per insert.
  • Sessions is_terminated=0; leave runtime_handle_id='' (drop dead legacy tmux handle) — the reaper skips handle-less rows (benign warning, no clobber).
  • better-sqlite3 is an optional dep of @aoagents/ao-core — load via lazy createRequire with a graceful fallback (pattern: packages/core/src/events-db.ts).
  • No-default NOT NULL columns to supply — sessions: id, project_id, num, activity_last_at, created_at, updated_at; projects: path, registered_at.

Projects (reuse #2127 mappers; now written via SQL):

Agent resume (EMPIRICALLY VERIFIED with claude -p):

  • Claude Code transcript: ~/.claude/projects/<slugified-cwd>/<uuid>.jsonl; claude --resume <uuid> is cwd-scoped. Verified: resume from a different dir fails ("No conversation found"); after copying the .jsonl into the new dir's slug bucket, full context returns (recalled a planted secret).
    • Slug = realpath(cwd) then .replace(/[^a-zA-Z0-9-]/g, "-") (agent-claude-code/.../activity-detection.ts:43-46).
    • Rewrite worktree = $AO_DATA_DIR/worktrees/{projectID}/{sessionID} (daemon/lifecycle_wiring.go:80 ManagedRoot=Join(DataDir,"worktrees"); gitworktree/workspace.go:441-450). The rewrite ignores migrated workspace_path and re-derives this (session_manager/manager.go:443-514).
    • So for each resumable Claude session: copy ~/.claude/projects/<legacy-slug>/<uuid>.jsonl~/.claude/projects/<slugify($AO_DATA_DIR/worktrees/{projectID}/{legacy-id})>/<uuid>.jsonl.
    • claudeSessionUuid is a raw metadata key in the legacy session JSON (not typed SessionMetadata).
  • Codex (codex resume <threadId>) and OpenCode (opencode --session <id>) resume by global id — path-independent; just carry the id into agent_session_id. (codexModel/restoreFallbackReason dropped, fix: use JSONL-based activity detection in lifecycle manager #247 G3.)

Session field mapping (#247 §2)

Read legacy projects/{id}/sessions/*.json (+ legacy {hash}-{id} form); parse CanonicalSessionLifecycle v2 (accept the lifecycle key or statePayload+stateVersion:"2"); skip 0-byte and *.corrupt-*; double-decode nested JSON-as-string fields.

rewrite column legacy source transform
id filename stem verbatim
num trailing integer of id parse
project_id owning project dir
issue_id issue '' if absent
kind role==="orchestrator"? orchestrator:worker
harness agent claude-code/codex/aider/opencode 1:1; unknown → ''
activity_state lifecycle.session.state working→active; not_started/idle/detecting/stuck→idle; needs_input→waiting_input
activity_last_at lifecycle.session.lastTransitionAtruntime.lastObservedAtcreatedAt
is_terminated 0 (non-terminated scope)
branch branch
workspace_path worktree informational (rewrite re-derives)
runtime_handle_id ''
agent_session_id claude→claudeSessionUuid, codex→codexThreadId, opencode→opencodeSessionId else ''
prompt userPrompt
display_name displayName '' if absent
first_signal_at backfill = activity_last_at mirrors migration 0010
created_at createdAtlifecycle.session.startedAt→mtime
updated_at lifecycle.session.lastTransitionAtcreatedAt

Only sessions whose lifecycle.session.state ∉ {done,terminated} and terminatedAt == null are migrated.

Components (proposed)

  • lib/migrate.ts — keep pure project mappers from feat(cli): add ao migrate to port legacy projects + settings into the rewrite daemon #2127.
  • lib/migrate-sessions.ts — read + map legacy sessions → rewrite rows (pure).
  • lib/migrate-claude.ts — legacy/rewrite slug computation + transcript copy.
  • lib/migrate-db.ts — offline SQLite writer: preconditions (DB exists / not locked / schema==12 / better-sqlite3 loadable), FK-ordered INSERT … ON CONFLICT(id) DO NOTHING.
  • commands/migrate.ts — offline orchestration; --dry-run; summary.

Testing

  • Pure mappers (project + session, state 8→4, agent_session_id selection, num parse, slug computation) — unit.
  • Offline writer against a temp DB built from the rewrite v12 migrations — integration (better-sqlite3).
  • Transcript relocation: slug computation + file copy with temp dirs (real claude --resume-after-copy already validated manually).

Acceptance criteria

  • ao migrate (daemon stopped) inserts projects + non-terminated sessions (verbatim session ids); idempotent on re-run.
  • Refuses clearly when DB missing / locked / schema != 12 / better-sqlite3 unavailable.
  • --dry-run prints the plan with no writes.
  • Codex & OpenCode sessions resume from carried ids; Claude sessions resume after transcript relocation.

Risks

  • Schema coupling to rewrite v12 (mitigated by the hard version guard).
  • Sole-writer bypass — acceptable for a one-time, daemon-stopped migration; never run with the daemon up.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions