diff --git a/.specify/feature.json b/.specify/feature.json index 1e69dd9..2bc0fd1 100644 --- a/.specify/feature.json +++ b/.specify/feature.json @@ -1,3 +1,3 @@ { - "feature_directory": "specs/014-app-dashboard-extensions" + "feature_directory": "specs/013-managed-session-lifecycle" } diff --git a/CLAUDE.md b/CLAUDE.md index 28b8cfd..1c1663a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,7 +1,7 @@ For additional context about technologies to be used, project structure, shell commands, and other important information, read the current plan: -`specs/014-app-dashboard-extensions/plan.md`. +`specs/013-managed-session-lifecycle/plan.md`. # AgentTower Agent Context diff --git a/docs/app-contract-client-guide.md b/docs/app-contract-client-guide.md index a20ffd6..1c295c9 100644 --- a/docs/app-contract-client-guide.md +++ b/docs/app-contract-client-guide.md @@ -123,3 +123,23 @@ for a working reference client. - Check `capability_flags` before calling an optional method introduced in a later minor (none exist at v1.0 — `capability_flags == {}`). - Surface unknown closed-set codes gracefully; never hard-fail on them. + +## 8. FEAT-013 managed-session methods + +FEAT-013 adds 8 new methods to the `app.*` namespace — `app.managed_*` +— for operator-driven creation of multi-agent tmux layouts inside bench +containers. They are **required** surfaces at `app_contract_version = +"1.0"` (not advertised in `capability_flags`; reached through the +additive-evolution rule). + +See **[`docs/managed-sessions.md`](managed-sessions.md)** for the full +operator reference: templates, launch profiles, lifecycle states, the +M1–M8 method table, closed-set error codes, and the YAML override +directories. 
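The §8 note above says these methods are required at `app_contract_version = "1.0"` and are not advertised in `capability_flags`. A client can therefore decide whether to call them from the contract's major version alone. The Python sketch below shows one way to do that. It assumes the version arrives as a `"MAJOR.MINOR"` string like the `"1.0"` in response envelopes. The helper names `parse_contract_version` and `supports_managed_methods` are hypothetical and are not part of any AgentTower client library.

```python
def parse_contract_version(version: str) -> tuple[int, int]:
    """Split an app_contract_version string such as "1.0" into (major, minor)."""
    major_text, minor_text = version.split(".", 1)
    return int(major_text), int(minor_text)


def supports_managed_methods(version: str) -> bool:
    """Return True when app.managed_* should be callable on this daemon.

    Assumption: the FEAT-013 methods are required at 1.0, and the
    additive-evolution rule keeps them in every later 1.x minor. A different
    major makes no such promise, so the client should not call them there.
    """
    major, _minor = parse_contract_version(version)
    return major == 1
```

Because the gate depends only on the major version, a client written against 1.0 keeps working unchanged when a later 1.x minor adds more methods.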
+ +Quick method list: + +- `app.managed_layout_create` / `app.managed_layout_list` / `app.managed_layout_detail` — layout creation + read +- `app.managed_pane_list` / `app.managed_pane_detail` — pane read +- `app.managed_pane_remove` / `app.managed_pane_recreate` — destructive lifecycle +- `app.managed_pane_promote_from_adopted` — reserved stub (returns `not_implemented`) diff --git a/docs/managed-sessions-quickstart-walkthrough.md b/docs/managed-sessions-quickstart-walkthrough.md new file mode 100644 index 0000000..f03f19d --- /dev/null +++ b/docs/managed-sessions-quickstart-walkthrough.md @@ -0,0 +1,59 @@ +# Managed Sessions Quickstart Walkthrough (T052) + +This document records how to run the [`specs/013-managed-session-lifecycle/quickstart.md`](../specs/013-managed-session-lifecycle/quickstart.md) walkthrough end-to-end against a real `agenttowerd` and a real bench container, plus the in-process verification harness that stands in for it during CI. + +T052's intent: prove the quickstart matches observed behavior, and capture any drift between the spec/contracts and what the daemon actually does. 
+ +--- + +## In-process verification (CI) + +The full quickstart sequence is exercised in-process by these tests, which use canned spawn-pipeline backends instead of a real tmux/docker channel: + +| Test file | Quickstart section covered | +|---|---| +| `tests/integration/test_story1_create_standard_layout.py` | §US1 (pending Phase 4c production tmux backend; module-level skip) | +| `tests/integration/test_story2_auto_prepare_operations.py` | §US2 — every step from "Verify in agent surfaces" through FR-015 FIFO + FR-021 redaction shape | +| `tests/integration/test_story3_lifecycle_operations.py` | §US3 — remove + recreate (with chain-traversal via M5) + adopted-pane protection | +| `tests/integration/test_managed_edge_cases.py` | §Edge cases table (bullets 1, 5, 7, 9, 11 explicitly; others covered by contract tests) | +| `tests/contract/test_managed_dispatch.py` | Dispatcher reachability + per-method envelope shape (M1-M8) | +| `tests/contract/test_managed_perf_sla.py` | SC-001 + SC-008 + SC-009 wall-clock SLAs (in-process bounds) | + +Together these cover every observable behavior the quickstart asserts, except the §US1 steps while `test_story1_create_standard_layout.py` remains skipped. Drift between quickstart prose and tests should surface as a test failure first; if you see drift only at quickstart-run time, **fix the code** (the quickstart is the spec-side source of truth, not a snapshot of current behavior). + +--- + +## Production walkthrough (manual) + +For a real end-to-end demo against a running `agenttowerd` plus a real bench container, follow the quickstart in order: + +1. Verify preconditions (§Preconditions): `agenttowerd` running, socket reachable, a bench container available, two operator YAML files in `~/.config/opensoft/agenttower/launch_commands/`. +2. Run §US1 step-by-step. Confirm `state == "ready"` within SC-001's 2-minute budget. +3. Run §US2 §"Verify in agent surfaces" — confirm `app.agent.list` returns the 3 managed agents with `origin == "managed"`. +4.
Run §US3 §"Remove and recreate a managed pane" — confirm tmux kill happens, recreate produces a `predecessor_id`-linked row, adopted pane attempt returns `managed_pane_protected_adopted`. +5. Run §US3 §"Daemon restart (SC-008)" — stop the daemon, confirm tmux panes alive, start the daemon, hit `app.managed_layout_detail` within ~5s, confirm `state == "ready"`. +6. Run §Edge cases — at minimum exercise the `managed_session_name_conflict` and `managed_layout_capacity_exceeded` paths. + +Production end-to-end requires: + +- The tmux spawn backend composition (`tmux_create.py` + `pending_marker.py` + FEAT-004 docker-exec channel) — documented as a follow-up in `src/agenttower/managed_sessions/spawn_backends.py`. +- The daemon-boot wiring of `spawn_layout_in_background` (handler kick-off after `create_layout` returns) — same follow-up. +- The daemon-boot wiring of `recovery.reconcile()` (run before the socket accepts requests per SC-008) — documented in `src/agenttower/managed_sessions/recovery.py`'s module docstring. +- The daemon-boot wiring of `pending_marker.sweep()` (60-second periodic) — documented in `src/agenttower/managed_sessions/pending_marker.py`. + +All four wiring follow-ups share the same DaemonContext field additions; they're tracked together as the "daemon-boot wiring follow-up" outside of FEAT-013's natural per-task scope. + +--- + +## Drift report (last run) + +| Date | Run by | Result | Notes | +|---|---|---|---| +| _(none yet — quickstart is exercised in-process via the test suites listed above; manual production walkthrough is gated on the daemon-boot wiring follow-up)_ | | | | + +When the production walkthrough is run (after the daemon-boot wiring follow-up lands), add a row above with the date, runner, pass/fail, and any drift between the quickstart prose and observed behavior. Then either: + +- The quickstart is canonical → file a code fix for the divergence. +- The behavior is canonical → file a spec amendment + re-run. 
+ +Per T052: drift is a signal to fix code, not the spec. diff --git a/docs/managed-sessions.md b/docs/managed-sessions.md new file mode 100644 index 0000000..f3d517f --- /dev/null +++ b/docs/managed-sessions.md @@ -0,0 +1,253 @@ +# Managed Session Creation and Lifecycle (FEAT-013) + +Operator-facing reference for AgentTower's **managed-session** surface: +how to create a multi-agent tmux layout inside a bench container, how +the lifecycle states behave, where the operator YAML configuration +lives, and which CLI / app-contract methods are available. + +This is a companion to: + +- [`specs/013-managed-session-lifecycle/spec.md`](../specs/013-managed-session-lifecycle/spec.md) — feature requirements. +- [`specs/013-managed-session-lifecycle/quickstart.md`](../specs/013-managed-session-lifecycle/quickstart.md) — synthetic-client walkthrough (US1/US2/US3 end-to-end). +- [`specs/013-managed-session-lifecycle/contracts/managed-methods.md`](../specs/013-managed-session-lifecycle/contracts/managed-methods.md) — wire-shape contracts for M1–M8. +- [`docs/app-contract-client-guide.md`](app-contract-client-guide.md) — the client-facing index for all `app.*` methods (including the new `app.managed_*` set added by this feature). + +--- + +## Overview + +FEAT-013 adds operator-driven creation of standard multi-agent tmux +layouts. Instead of adopting existing panes one-by-one through +`app.agent.register_from_pane`, the operator picks a **template** (e.g. +"1 master + 2 slaves") and AgentTower: + +1. Creates the tmux panes via `tmux new-session` / `split-window` (no + `send-keys` for the first-line command — Principle III safety). +2. Registers each created pane as a FEAT-006 agent so the existing + route / queue / event / log surfaces work uniformly across managed + and adopted agents. +3. Tracks each pane through a 5-state lifecycle (`creating` → `ready` / + `degraded` / `failed` → `removed`) with audit-grade events on every + transition. +4. 
+ Survives daemon restarts: managed layouts are recovered from durable + SQLite storage and reattached to surviving tmux panes within 5 + seconds of the socket opening (SC-008 + SC-009). + +--- + +## Templates + +Two built-in templates ship in code; operator-overridable YAML files +extend the set without changing the daemon's code. + +### Built-ins + +| Name | Panes | Roles | +|---|---|---| +| `1m+2s` | 3 | 1 master + 2 slaves | +| `2m+2s` | 4 | 2 masters + 2 slaves | + +### Override directory + +```text +~/.config/opensoft/agenttower/managed_templates/*.yaml +``` + +The daemon does NOT auto-create this directory; the operator creates +it when adding the first override. Sample template YAMLs live in the +repo under `examples/managed_templates/` for discovery (NOT installed +by the daemon — per FR-024's no-auto-create rule). + +### YAML schema + +```yaml +name: my-custom # unique; operator file with same name wins + # over a built-in default +panes: + - role: master + capability: orchestrator + label_pattern: "m{ordinal}" # {ordinal} → 1, 2, ... + default_launch_command_ref: claude-master # see Launch command profiles + - role: slave + capability: worker + label_pattern: "s{ordinal}" + default_launch_command_ref: claude-worker +``` + +--- + +## Launch command profiles + +Argv-shape command definitions used to start each agent. The argv form +is mandatory — single-string shell-parsed commands are rejected (shell +interpolation is the hazard this rule exists to prevent). + +### Override directory + +```text +~/.config/opensoft/agenttower/launch_commands/*.yaml +``` + +Sample profile YAMLs live under `examples/launch_commands/` for +discovery. + +### YAML schema + +```yaml +name: claude-master +command: ["claude", "--model", "opus", "--system-prompt-file", "master.md"] +env: + ANTHROPIC_LOG: warn +working_dir: /workspace +``` + +- `command` — argv (list of strings); the tmux `new-session -d -s ... -- + ` invocation passes these AS-IS, no shell parsing.
+- `env` — optional; merged into the pane's environment via tmux's + `-e KEY=VALUE` flag. +- `working_dir` — optional; the ONLY field where any shell escaping + happens (via `shlex.quote`), because tmux's `-c` working-directory + flag goes through the shell. + +Operator-supplied env-var **values** whose key matches the closed substring set +`*TOKEN*` / `*SECRET*` / `*KEY*` / `*PASSWORD*` (case-insensitive) are +redacted in lifecycle event payloads (FR-021). Argv and `working_dir` +are NOT redacted (operator-visible failure diagnostics rely on them). + +--- + +## Lifecycle states + +Both `managed_pane` and `managed_layout` rows track one of five states: + +| State | Meaning | +|---|---| +| `creating` | Pane is being spawned, agent is being registered, logs are being attached. Pending-managed marker is set on the tmux pane title so the FEAT-004 scan skips it. | +| `ready` | Pane exists in tmux, agent is registered with FEAT-006, log attach attempted (success or recoverable failure). Marker cleared. | +| `degraded` | Pane exists but is partly unhealthy: launch command exited within 1s, log attach failed, or agent went unhealthy after `ready`. Recovery is via **recreate**. | +| `failed` | Pane is unusable until recreated. `failed_stage` is populated. Audit retained indefinitely; a fresh recreated row may take the same label. | +| `removed` | Operator-initiated removal; tmux pane was killed, routes/log attachments cleaned. Terminal. Audit retained indefinitely. | + +`failed_stage` is one of six closed-set values when set: +`pane_create` / `launch_command` / `registration` / `log_attach` / +`tmux_kill` / `recovery_reattach`. The full state graph (transitions, +disallowed transitions, recovery rules) lives in +[`contracts/state-machine.md`](../specs/013-managed-session-lifecycle/contracts/state-machine.md). + +--- + +## Method list + +Eight methods total, available in **both** namespaces.
The legacy +`managed.*` namespace is reachable from host CLI and bench-container +thin clients (with peer scoping); the `app.managed_*` namespace is +host-only via the FEAT-011 gate. + +| Method (legacy) | Method (app) | What it does | +|---|---|---| +| `managed.layout.create` | `app.managed_layout_create` | Create a managed layout from a template. Returns immediately after row insertion; tmux spawn runs in a background task. (M1) | +| `managed.layout.list` | `app.managed_layout_list` | Paginated list of managed layouts. Ordered by `(state_priority ASC, created_at DESC)` — operational-state first. (M2) | +| `managed.layout.detail` | `app.managed_layout_detail` | Full layout view including all panes (optionally terminal). Surfaces `failed_stage` at both layout and per-pane levels. (M3) | +| `managed.pane.list` | `app.managed_pane_list` | Paginated list of managed panes. (M4) | +| `managed.pane.detail` | `app.managed_pane_detail` | Single-pane detail with optional `predecessor_chain` recursion. (M5) | +| `managed.pane.remove` | `app.managed_pane_remove` | Kill underlying tmux pane + clean up routes/logs + transition to `removed`. Preserves audit history. (M6) | +| `managed.pane.recreate` | `app.managed_pane_recreate` | Produce a new pane row linked via `predecessor_id` + `chain_depth+1`. Predecessor must be in `removed` or `failed`. (M7) | +| `managed.pane.promote_from_adopted` | `app.managed_pane_promote_from_adopted` | **STUB** — always returns `not_implemented` with `reserved_since="FEAT-013"`. Reserved for a later feature. (M8) | + +Full request / response shapes for every method are in +[`contracts/managed-methods.md`](../specs/013-managed-session-lifecycle/contracts/managed-methods.md). 
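M1 returns while the layout is still `creating`, so a client has to poll for the outcome. The Python sketch below shows one way to do that. It assumes a hypothetical transport callable `call(method, params)` that returns the `{"ok": ..., "result": ...}` envelope used by `app.*` responses. The `error` key on failure envelopes is also an assumption. The 120-second default mirrors SC-001's 2-minute budget, and the one-second interval is an arbitrary choice. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Any, Callable, Dict

# "creating" is the only transient state; every other state needs operator
# action (remove / recreate) before it changes again.
SETTLED_STATES = frozenset({"ready", "degraded", "failed", "removed"})

Call = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_layout(
    call: Call,
    layout_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll app.managed_layout_detail until the layout leaves `creating`."""
    deadline = clock() + timeout_s
    while True:
        envelope = call("app.managed_layout_detail", {"layout_id": layout_id})
        if not envelope.get("ok"):
            # Assumed failure shape: a closed-set code under "error".
            raise RuntimeError(f"detail failed: {envelope.get('error')}")
        result = envelope["result"]
        if result["state"] in SETTLED_STATES:
            return result
        if clock() >= deadline:
            raise TimeoutError(
                f"layout {layout_id} still {result['state']!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

A settled state is not necessarily a success. The caller should still check for `ready` and inspect `failed_stage` on `degraded` or `failed` layouts.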
+ +--- + +## Example: create a layout + +```json +{ + "method": "app.managed_layout_create", + "container_id": "bench-alpha", + "template_name": "1m+2s", + "tmux_session_name": "session-quickstart", + "launch_command_overrides": { + "master:m1": "claude-master", + "slave:s1": "claude-worker", + "slave:s2": "claude-worker" + }, + "idempotency_key": "operator-clicked-create-12345" +} +``` + +Response (immediate, before tmux spawn completes): + +```json +{ + "ok": true, + "app_contract_version": "1.0", + "result": { + "layout_id": "01HZ...", + "state": "creating", + "intended_pane_count": 3, + "panes": [ + {"pane_id": "01HZ-p1", "role": "master", "label": "m1", "state": "creating"}, + {"pane_id": "01HZ-p2", "role": "slave", "label": "s1", "state": "creating"}, + {"pane_id": "01HZ-p3", "role": "slave", "label": "s2", "state": "creating"} + ], + "replay": false + } +} +``` + +Poll `app.managed_layout_detail` until `state == "ready"` (or subscribe +to lifecycle events via `app.event.list`). + +--- + +## Closed-set error codes (FEAT-013 additions) + +13 new error codes added on top of FEAT-011's 27-entry registry (40 +total). Full details in +[`contracts/error-codes.md`](../specs/013-managed-session-lifecycle/contracts/error-codes.md). + +| Code | Method(s) | When | +|---|---|---| +| `managed_template_not_found` | M1 | `template_name` doesn't resolve via built-ins or operator overrides. | +| `managed_launch_command_not_found` | M1 / M7 | `launch_command_overrides` references an unknown profile. | +| `managed_session_name_conflict` | M1 | `tmux_session_name` already exists in the target container. No silent suffixing. | +| `managed_pane_label_conflict` | M1 | Two non-terminal panes collide on `(container_id, label)`. | +| `managed_layout_capacity_exceeded` | M1 | Daemon at 40-layout cap (FR-025). | +| `managed_layout_not_found` | M3 | Unknown `layout_id`. | +| `managed_pane_not_found` | M4 / M5 / M6 / M7 | Unknown `pane_id` (or `predecessor_pane_id`). 
| +| `managed_pane_protected_adopted` | M6 / M7 | Target pane exists in `agents` (adopted) but NOT in `managed_pane` (FR-012). | +| `managed_pane_illegal_transition` | M6 | E.g., trying to remove a pane in `creating` state. | +| `managed_pane_illegal_recreate_source` | M7 | Predecessor is `ready` / `degraded` / `creating` (must be `removed` / `failed`). | +| `managed_pane_recreate_chain_too_deep` | M7 | `predecessor.chain_depth >= 15` (limit is 16; FR-023). | +| `managed_pane_concurrent_recreate` | M7 | Another recreate of the same predecessor is in flight (FR-027). | +| `container_not_found` | M1 / M6 / M7 | `container_id` is unknown to the FEAT-003 registry. | + +--- + +## Scope notes (MVP) + +**Out of scope** (FR-018): non-tmux backends, semantic task planning, +cross-host orchestration, adopted-to-managed pane promotion, and +cancellation of in-flight layout creation. + +**Indefinite retention** (FR-021): managed-layout and managed-pane +audit records are preserved indefinitely in MVP. Pruning is deferred to +a later feature. + +**Authorization** (spec §Assumptions): MVP is socket-access-based — +any caller with access to the host daemon's local socket can create +managed layouts. Per-user or per-container ACL is a later hardening +feature. `app.managed_*` is host-only via FEAT-011's gate; legacy +`managed.*` is peer-scoped (a bench-container thin client may only act +on its own container). 
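The M7 rows of the error-code table above imply a set of preconditions for `app.managed_pane_recreate`. The Python sketch below checks them against in-memory stand-ins for the `managed_pane` and `agents` tables. The function name, the `PaneRow` shape, and the order in which the checks run are assumptions for illustration and do not describe the daemon's actual implementation. The authoritative rules live in `contracts/state-machine.md` and `contracts/error-codes.md`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Set

# Predecessor states from which a recreate is legal (M7).
RECREATE_SOURCE_STATES = frozenset({"removed", "failed"})
# A predecessor at or beyond this depth cannot be recreated (FR-023).
MAX_PREDECESSOR_CHAIN_DEPTH = 15


@dataclass(frozen=True)
class PaneRow:
    """Hypothetical subset of a managed_pane row."""
    state: str
    chain_depth: int


def check_recreate_source(
    pane_id: str,
    managed_panes: Mapping[str, PaneRow],
    adopted_pane_ids: Set[str],
    in_flight_recreates: Set[str],
) -> Optional[str]:
    """Return the closed-set error code that would reject the recreate, or None."""
    pane = managed_panes.get(pane_id)
    if pane is None:
        # Registered as an adopted agent but never managed (FR-012).
        if pane_id in adopted_pane_ids:
            return "managed_pane_protected_adopted"
        return "managed_pane_not_found"
    if pane.state not in RECREATE_SOURCE_STATES:
        return "managed_pane_illegal_recreate_source"
    if pane.chain_depth >= MAX_PREDECESSOR_CHAIN_DEPTH:
        return "managed_pane_recreate_chain_too_deep"
    if pane_id in in_flight_recreates:
        # Another recreate of the same predecessor already holds it (FR-027).
        return "managed_pane_concurrent_recreate"
    return None
```

Because this sketch checks state before chain depth, a live pane is reported as an illegal source even when its chain is also too deep. The daemon may order these checks differently.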
+ +--- + +## See also + +- Spec: [`specs/013-managed-session-lifecycle/spec.md`](../specs/013-managed-session-lifecycle/spec.md) +- Quickstart: [`specs/013-managed-session-lifecycle/quickstart.md`](../specs/013-managed-session-lifecycle/quickstart.md) +- Contracts: [`specs/013-managed-session-lifecycle/contracts/`](../specs/013-managed-session-lifecycle/contracts/) +- Research decisions: [`specs/013-managed-session-lifecycle/research.md`](../specs/013-managed-session-lifecycle/research.md) +- Data model: [`specs/013-managed-session-lifecycle/data-model.md`](../specs/013-managed-session-lifecycle/data-model.md) diff --git a/examples/launch_commands/bash-placeholder.example.yaml b/examples/launch_commands/bash-placeholder.example.yaml new file mode 100644 index 0000000..4bb13fc --- /dev/null +++ b/examples/launch_commands/bash-placeholder.example.yaml @@ -0,0 +1,22 @@ +# Example launch command profile. +# +# Copy this file to ~/.config/opensoft/agenttower/launch_commands/ and +# adjust `name:`, `command:`, `env:`, `working_dir:` to your needs. +# +# Per research §R9, `command:` MUST be a list of strings (argv) — never +# a single shell string. The daemon passes argv directly to tmux's +# new-session / split-window invocations, so shell interpolation does +# not apply (Principle III safety). +# +# `env:` is optional. Per FR-021 the daemon redacts environment-variable +# values whose key matches `*TOKEN*` / `*SECRET*` / `*KEY*` / `*PASSWORD*` +# (case-insensitive substring) in the JSONL lifecycle event payloads +# retained indefinitely. Command argv and working_dir are not redacted. +# +# `working_dir:` is optional and applied via tmux's `-c` flag (no shell). 
+ +name: bash-placeholder +command: ["bash", "-lc", "echo 'agent ready'; exec bash"] +env: + AGENTTOWER_ROLE: example +working_dir: /workspace diff --git a/examples/managed_templates/1m-2s.example.yaml b/examples/managed_templates/1m-2s.example.yaml new file mode 100644 index 0000000..c4ad052 --- /dev/null +++ b/examples/managed_templates/1m-2s.example.yaml @@ -0,0 +1,29 @@ +# Example managed-layout template. +# +# Copy this file to ~/.config/opensoft/agenttower/managed_templates/ under +# any filename to override the built-in 1m+2s template (operator file with +# same `name:` wins), or with a different `name:` value to add a new +# template alongside the built-ins. +# +# Per FR-024 the daemon never installs files into your home directory. +# `examples/` here is a discoverable reference set, not an installed +# default. See specs/013-managed-session-lifecycle/data-model.md for the +# full ManagedTemplate / TemplatePane schema and +# specs/013-managed-session-lifecycle/research.md §R8 for the rationale. + +name: 1m+2s +panes: + - role: master + capability: orchestrator + label_pattern: "m{ordinal}" + default_launch_command_ref: null + + - role: slave + capability: worker + label_pattern: "s{ordinal}" + default_launch_command_ref: null + + - role: slave + capability: worker + label_pattern: "s{ordinal}" + default_launch_command_ref: null diff --git a/pyproject.toml b/pyproject.toml index dc81890..932e7a8 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -13,6 +13,14 @@ authors = [ { name = "Opensoft" }, ] +dependencies = [ + # FEAT-013 T008/T009: YAML loaders for managed_templates and + # launch_commands. Upper-bound pinned to <7 so a major-version + # bump can't silently break the daemon (mirrors the FEAT-008 + # test-dep pinning style). 
+ "pyyaml>=6,<7", +] + [project.scripts] agenttower = "agenttower.cli:main" agenttowerd = "agenttower.daemon:main" @@ -43,6 +51,7 @@ packages = ["src/agenttower"] testpaths = ["tests"] addopts = "-ra" markers = [ + "perf: marks performance / SLA-budget tests (FEAT-013 T054/T055/T056). Run all by default; filter with `-m 'not perf'` to skip wall-clock-sensitive tests in CI lanes that can't guarantee timing.", "v1_1: FEAT-014 v1.1-additive assertion. Deselected by T023's SC-004 v1.0-compat regression via `pytest -m 'not v1_1'`. See tasks.md §Notes 'v1.1 marker rule'.", ] diff --git a/specs/013-managed-session-lifecycle/checklists/CHECKLIST_WALK.md b/specs/013-managed-session-lifecycle/checklists/CHECKLIST_WALK.md new file mode 100644 index 0000000..7251299 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/CHECKLIST_WALK.md @@ -0,0 +1,59 @@ +# Checklist Walk — Pre-Implement Audit (Session 2026-05-24) + +**Purpose**: Bucket every incomplete checklist item against the current artifact set (spec.md + plan.md + research.md + data-model.md + contracts/* + tasks.md + quickstart.md) before `/speckit.implement` runs. Each item is one of: + +- **RESOLVED** — already answered by a downstream artifact; the checklist item was written against an earlier (pre-plan or pre-tasks) snapshot. +- **DEFERRED** — explicitly out of scope for FEAT-013 (UX is FEAT-012/014's domain, MVP scoping per spec §Assumptions and FR-018), or an implementation-only detail with no spec-level decision needed. +- **OPEN** — genuinely needs a spec-level decision; surfaced into the post-walk clarify round. + +This file is a snapshot — the underlying checklist files are not retroactively ticked (they remain authoritative pre-{plan, tasks} audit artifacts).
+ +## Per-file buckets + +| File | Total | Resolved | Deferred | Open | Open items (CHK IDs) | +|---|---:|---:|---:|---:|---| +| ux.md | 25 | 0 | 25 | 0 | — (all UX deferred to FEAT-012/014) | +| api.md | 29 | 24 | 1 | 4 | CHK016, CHK022, CHK023, CHK027 | +| data-model.md | 33 | 30 | 0 | 3 | CHK023, CHK032, CHK033 | +| security.md | 23 | 11 | 6 | 6 | CHK009, CHK010, CHK011, CHK012, CHK014, CHK020 | +| performance.md | 17 | 11 | 4 | 2 | CHK001, CHK008 | +| accessibility.md | 13 | 0 | 13 | 0 | — (a11y deferred to FEAT-012/014) | +| error-handling.md | 24 | 13 | 3 | 8 | CHK002, CHK006, CHK007, CHK008, CHK014, CHK016, CHK018, CHK024 | +| observability.md | 21 | 12 | 3 | 6 | CHK002, CHK006, CHK007, CHK008, CHK010, CHK019 | +| integration.md | 21 | 17 | 1 | 3 | CHK008, CHK012, CHK013 | +| configuration.md | 17 | 8 | 3 | 6 | CHK005, CHK006, CHK009, CHK010, CHK014, CHK017 | +| idempotency.md | 17 | 12 | 0 | 5 | CHK005, CHK012, CHK013, CHK014, CHK017 | +| testing-strategy.md | 19 | 17 | 0 | 2 | CHK015, CHK019 | +| deployment.md | 13 | 7 | 3 | 3 | CHK006, CHK008, CHK010 | +| concurrency.md | 19 | 12 | 1 | 6 | CHK003, CHK006, CHK009, CHK011, CHK013, CHK016 | +| plan-review.md | 53 | 47 | 0 | 0 | resolved by analyze rounds + amendments | +| alignment-check.md | 38 | 38 | 0 | 0 | resolved by alignment-cleanup + analyze remediation | +| alignment-recheck.md | 24 | 21 | 3 | 0 | post-tasks forward-pointing items resolved on implement | +| tasks-readiness.md | 60 | 53 | 0 | 0 | 7 ticked; remaining resolved by tasks.md content | +| requirements.md | 51 | 50 | 0 | 1 | CHK001 cross-cutting (informational, no decision needed) | +| **Total** | **517** | **383** | **66** | **54** | | + +## Open items grouped by clarify topic + +After dedup, the 54 open items collapse to **8 distinct clarification topics** that warrant operator decisions before implementation. 
Each topic affects operator-visible behavior, FR/SC testability, or contract shape: + +| Topic | CHK refs | Why it matters | +|---|---|---| +| **A. Per-step timeouts + retry policy** | error-handling.md CHK006, CHK007, CHK008 | FR-013 enum names `failed_stage` values but the spec is silent on how long the daemon waits at each stage before transitioning to `failed`, and whether transient failures retry. Tests can't be deterministic without this. | +| **B. Partial-layout-failure rollback** | error-handling.md CHK016, CHK018; api.md CHK023, CHK026 | When one pane fails mid-create-layout, do other in-flight panes continue, get cleaned up, or stay as-is? FR-013 says "leaves a recoverable lifecycle state" but doesn't define which. | +| **C. Event redaction policy** | security.md CHK012, CHK014; observability.md CHK019 | Lifecycle events contain launch-command argv, env, working_dir. What gets redacted in JSONL audit? Affects FR-015 / FR-021 + security posture. | +| **D. Operator-input validation** | security.md CHK010, CHK011; configuration.md CHK009; api.md CHK016 | Allowed character set / length limits for `tmux_session_name`, `label_pattern`, and `launch_command_overrides` keys. Currently no explicit constraints; sanitization needed before tmux RPC. | +| **E. Event stream ordering guarantees** | concurrency.md CHK016; observability.md CHK002, CHK013 | FR-015 says "emit observable lifecycle events" but no ordering guarantee (per-pane FIFO? per-layout FIFO? cross-pane best-effort?). Consumers (FEAT-008, FEAT-013 detail surfaces) need this. | +| **F. Concurrent recreates of same predecessor** | concurrency.md CHK003, CHK011; idempotency.md CHK014, CHK017 | Two `recreate_pane(predecessor_id=X)` calls in flight. R10 covers create-layout idempotency-key replay, but recreate is silent. Behavior options: one wins / both replay via key / `LOCK_BUSY` error. | +| **G. 
Spec-level scale limits** | performance.md CHK001, CHK008; integration.md CHK008 | Plan §Scale informally says ≤4 layouts × ≤10 containers × ≤4 panes. Should max concurrent managed layouts per daemon be promoted to spec as a quantified constraint, or stay plan-informational? | +| **H. First-run operator-config experience** | configuration.md CHK005, CHK006, CHK010, CHK014, CHK017; deployment.md CHK006, CHK008, CHK010 | Operator overrides via YAML under `~/.config/opensoft/agenttower/`. First install: ship example YAMLs (per T003 already references `examples/`), leave empty dirs, or auto-create with TEMPLATE comments? Plus hot-reload behavior. | + +The remaining 54 − (∑items in 8 topics) ≈ 12 individual items are either narrow edge-case clarifications subsumed by the 8 topics' answers, or implementer-level decisions safely deferred to `/speckit.implement` with reasonable defaults (e.g., observability metrics, trace IDs, deployment rollback — all post-MVP). + +## What this means for /speckit.implement + +- **0 implementation-blocking gaps**: every FR/SC traces to ≥1 task; the 8 open topics affect *quality* of the implementation, not whether it's executable. +- **6 clarifications would tighten test design**: per-step timeouts (A), rollback semantics (B), redaction (C), input validation (D), event ordering (E), recreate concurrency (F) — each makes 1–3 tasks more deterministic. +- **2 are documentation hardening**: scale limits in spec (G), first-run experience (H) — operator-visible polish. + +The clarify round below covers all 8 topics.
diff --git a/specs/013-managed-session-lifecycle/checklists/accessibility.md b/specs/013-managed-session-lifecycle/checklists/accessibility.md new file mode 100644 index 0000000..3fee2f3 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/accessibility.md @@ -0,0 +1,33 @@ +# Accessibility Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that accessibility requirements for the operator-facing surfaces touched by this feature are present, complete, and measurable — or explicitly scoped to a sibling feature. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Coverage + +- [x] CHK001 Are accessibility requirements explicitly excluded or deferred to FEAT-012 in this spec? [Clarity, Gap] +- [x] CHK002 Are keyboard-navigation requirements specified for the layout-creation flow? [Gap] +- [x] CHK003 Are screen-reader requirements specified for the managed/adopted distinction (FR-005)? [Gap, Spec §FR-005] +- [x] CHK004 Are accessibility requirements specified for the lifecycle-state indicators (`creating`, `ready`, `degraded`, `failed`, `removed`) such that they are perceivable without color alone? [Gap, Spec §FR-007] +- [x] CHK005 Are accessibility requirements specified for the diagnostic surface (FR-013) such that "failed stage" is announced clearly to assistive tech? [Gap, Spec §FR-013] +- [x] CHK006 Are focus-management requirements specified for the confirmation dialogs of remove/recreate (FR-010/FR-011)? [Gap] +- [x] CHK007 Are accessibility requirements specified for the live progress feedback during the up-to-2-min layout creation (live region, polite vs assertive)? [Gap, Spec §SC-001] +- [x] CHK008 Are accessibility requirements specified for surfacing the `predecessor_id` chain or the recreate history? [Gap, Spec §FR-011] +- [x] CHK009 Are accessibility requirements specified for error messages (`managed_session_name_conflict`, daemon unhealthy)? 
[Gap, Spec §FR-016] +- [x] CHK010 Are accessibility requirements specified for any audit/history view (FR-021 indefinite retention)? [Gap] + +## Clarity / Consistency + +- [x] CHK011 Are color-contrast requirements specified for `degraded` vs `failed` state indicators so they are distinguishable to users with color-vision deficiency? [Gap, Spec §FR-007] +- [x] CHK012 Are accessibility requirements consistent across managed-pane surfaces and existing adopted-pane surfaces (FR-008)? [Consistency, Spec §FR-008] + +## Measurability + +- [x] CHK013 Are accessibility requirements stated in objectively-testable form (specific WCAG criteria, role/name/value expectations)? [Measurability] + +--- + +## Walk closure (2026-05-25) + +All 13 items deferred to FEAT-012/014 per CHECKLIST_WALK.md (UX/a11y is the control-panel domain; FEAT-013 is server-side only — spec §FR-018 keeps UI out of scope). Spec §Clarifications keep 'operator-facing' wording so when FEAT-012/014 ships, the closed-set lifecycle states (FR-007) and failed_stage enum (FR-013) become the natural anchors for WCAG-aligned visual treatments. diff --git a/specs/013-managed-session-lifecycle/checklists/alignment-check.md b/specs/013-managed-session-lifecycle/checklists/alignment-check.md new file mode 100644 index 0000000..89fdd5e --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/alignment-check.md @@ -0,0 +1,84 @@ +# Alignment Check: Post-Clarify-2 Spec Elements vs Downstream Artifacts + +**Purpose**: After the post-plan-review clarification session (Spec §Clarifications "Session 2026-05-24 (post-plan review)") added **FR-022, FR-023, FR-024, SC-009** and extended **FR-013, FR-018, FR-020, §Assumptions**, verify that every downstream artifact (plan.md, research.md, data-model.md, contracts/*, quickstart.md, plan-review.md) is still aligned. Each item tests *requirements-document alignment*, not implementation. 
+**Created**: 2026-05-24 +**Closed**: 2026-05-25 (walk after `e3af4d0`) +**Feature**: [spec.md](../spec.md) — Session 2026-05-24 (post-plan review) +**Depth**: release gate. **Audience**: feature author before `/speckit.tasks`. + +## FR-013 alignment (`failed_stage` closed enum promoted into FR) + +- [x] CHK001 Does plan.md reference the closed `failed_stage` enum (or FR-013 by ID) somewhere in Technical Context or Constitution Check evidence? [Consistency] — Plan §Performance Goals: "FR-013 per-stage timeout 30s with 2x transient retry"; Constitution Check Principle IV row: "closed-set error code + `failed_stage` enum + recovery hint per FR-013 / FR-016". +- [x] CHK002 Do research §R7's enum values match FR-013's inline closed set verbatim (no spelling drift, no extras)? [Consistency] — Both enumerate the same 6 tokens: `pane_create`, `launch_command`, `registration`, `log_attach`, `tmux_kill`, `recovery_reattach`. +- [x] CHK003 Does data-model.md's `failed_stage` CHECK constraint enumerate the same six values as FR-013 (in both `managed_layout` and `managed_pane`)? [Consistency] — Both tables include `CHECK (failed_stage IS NULL OR failed_stage IN ('pane_create','launch_command','registration','log_attach','tmux_kill','recovery_reattach'))`. +- [x] CHK004 Do contracts/managed-methods.md M3 / M5 detail-response shapes include `failed_stage` with canonical values from FR-013? [Consistency] — M3 sample shows `"failed_stage": null` for healthy pane + `"failed_stage": "log_attach"` for degraded + recovery-variant `"failed_stage": "recovery_reattach"`. M5 returns the same per-pane fields as M3 (single-pane detail) and inherits the field. +- [x] CHK005 Do contracts/state-machine.md transition triggers reference each FR-013 enum value at least once across the trigger column? 
[Consistency] — `pane_create` (creating→failed); `launch_command` (creating→degraded); `registration` (creating→failed); `log_attach` (creating→degraded); `tmux_kill` (implicit in remove triggers; `failed_stage` not set on remove); `recovery_reattach` (Recovery section). All 6 surfaced. + +## FR-018 alignment (cancel-in-flight create explicitly out-of-scope) + +- [x] CHK006 Is "cancellation of in-flight layout creation" called out as out-of-scope in plan.md (Summary, Technical Context, or Constitution Check)? [Coverage] — Plan §Summary: "**Out of scope for MVP**: non-tmux backends, semantic task planning, cross-host orchestration, adopted-to-managed pane promotion, and cancellation of in-flight layout creation (per spec §FR-018)." +- [x] CHK007 Does contracts/managed-methods.md §M6 (or a sibling note) acknowledge cancel-in-flight is unsupported and reference FR-018? [Consistency] — M6 Errors: "managed_pane_illegal_transition if the pane is in `creating` — operator must wait or use the in-progress cancel (out of scope MVP)." +- [x] CHK008 Does research §R2 align with FR-018's explicit out-of-scope (not only "reserved for a later feature")? [Consistency] — R2: "cancellation of an in-flight create is **out of scope for MVP** per spec §FR-018 (may be revisited in a later feature)." + +## FR-020 alignment (recovery outcomes readable from list/detail surface) + +- [x] CHK009 Do contracts/managed-methods.md M3 (or M5) response shapes demonstrate how a recovery outcome surfaces (e.g., `failed_stage = "recovery_reattach"` in a sample)? [Consistency] — M3 "Sample variant — recovery_reattach failure (FR-020 / SC-009)" shows the exact JSON shape with `failed_stage: "recovery_reattach"`. +- [x] CHK010 Does data-model.md describe that recovery outcome is visible via the same detail surface used for normal operation (not only via events)? 
[Coverage] — state-machine.md §Recovery: "Operator visibility of recovery outcomes (FR-020 / SC-009): After step 5, every recovered managed-layout and managed-pane row is readable via the standard `app.managed_layout_detail` (M3) and `app.managed_pane_detail` (M5) surfaces." +- [x] CHK011 Does quickstart.md's daemon-restart section show the operator reading recovery outcomes from list/detail (not only via the audit log)? [Coverage] — Quickstart §US3 daemon restart: "Within ~5s of the socket becoming ready (SC-008 target): `{method: app.managed_layout_detail, layout_id: ...}`. ... **No operator action was required.** SC-009 mandates this readability within 5 seconds of the socket becoming ready — no log inspection required, the detail surface alone tells the whole recovery story." +- [x] CHK012 Does contracts/state-machine.md's Recovery section reference the visibility of recovery outcomes from a read surface? [Coverage] — Same quote as CHK010 above. + +## FR-022 alignment (5-minute pending-managed marker TTL sweep) + +- [x] CHK013 Does plan.md Technical Context describe the 5-minute sweep as a measurable system property and tie it to FR-022 (by ID or by behavior)? [Consistency] — Plan §Performance Goals: "FR-022 pending-managed marker TTL 5 minutes with periodic 60s sweep (research §R5)". +- [x] CHK014 Does research §R5 produce the same TTL value (5 min) and sweep cadence (boot + 60 s) as FR-022 mandates? [Consistency] — R5: "5 minutes" + "Daemon boot (FR-020 reconciliation runs before the socket starts accepting requests)" + "A periodic 60-second sweep". +- [x] CHK015 Does data-model.md show that a swept pending-managed pane transitions to `failed` with `failed_stage = pane_create` (no tmux pane) or `failed_stage = registration` (pane exists but never registered)? 
[Consistency] — data-model.md DDL §Notes bullet: "FR-022 TTL sweep: managed_pane rows that linger in `state = 'creating'` for more than 5 minutes are transitioned to `failed` by `pending_marker.sweep()` (boot-time + 60s periodic) with `failed_stage = 'pane_create'` if no tmux pane backs the row, else `failed_stage = 'registration'`." +- [x] CHK016 Does contracts/state-machine.md's `creating → failed` transition row name the FR-022 TTL sweep as a trigger, distinct from registration failure? [Consistency] — state-machine.md pane transitions table row: "`creating` | `failed` | Pending-managed marker TTL exceeded (5 minutes per FR-022, research §R5) and pane never observed | Daemon-initiated sweep task; `failed_stage = 'pane_create'` if no tmux pane backs the row, else `'registration'`" — explicitly distinct from the "tmux new-session/split-window failed OR FEAT-006 registration errored" row. + +## FR-023 alignment (recreate-chain depth bound 16) + +- [x] CHK017 Does plan.md Constraints / Scale section reference FR-023 or the depth-16 bound? [Consistency] — Plan §Constraints: "Recreate-chain depth bounded at 16 (FR-023, research §R4)". +- [x] CHK018 Does data-model.md's `chain_depth` CHECK constraint match FR-023's "maximum depth of 16" wording exactly (off-by-one consistent with R4's `>= 15` rejection rule)? [Consistency] — DDL: `chain_depth INTEGER NOT NULL DEFAULT 0 CHECK (chain_depth >= 0 AND chain_depth <= 16) -- FR-023 bound`. Off-by-one consistent: service rejects when predecessor.chain_depth >= 15 (R4), so new row max = 15; CHECK permits up to 16 inclusive (never reached, but bound name "16" matches FR-023 wording). +- [x] CHK019 Does contracts/error-codes.md `managed_pane_recreate_chain_too_deep` reference FR-023 and include the bound (16) in its details schema? [Consistency] — Heading updated 2026-05-25 to `### managed_pane_recreate_chain_too_deep (FR-023, R4)`; details schema: `{"predecessor_pane_id": "string", "predecessor_chain_depth": 15, "limit": 16}`. 
+- [x] CHK020 Does contracts/state-machine.md's Recreate Semantics section reference FR-023's bound? [Consistency] — state-machine.md §Recreate semantics, step 1: "Service validates `predecessor.chain_depth < 16` else `managed_pane_recreate_chain_too_deep` (FR-023, R4)" — FR-023 added 2026-05-25. +- [x] CHK021 Does quickstart.md's edge-cases table list the recreate-chain-too-deep scenario with FR-023 reference? [Coverage] — Quickstart §Edge cases row: "Recreate chain hits depth 16 (FR-023, R4)" — FR-023 added 2026-05-25. + +## FR-024 alignment (operator YAML override capability) + +- [x] CHK022 Does plan.md (Summary, Technical Context, or Constitution Check evidence) reference FR-024 and the canonical YAML paths? [Consistency] — Plan §Constraints: "Operator template / launch-profile overrides are loaded from canonical YAML paths under `~/.config/opensoft/agenttower/` (FR-024)." Plan §Provenance also cites FR-024 origin. +- [x] CHK023 Do research §R8/R9 enumerate the same canonical paths as spec §Assumptions (no path drift)? [Consistency] — Spec: `~/.config/opensoft/agenttower/managed_templates/*.yaml` + `…/launch_commands/*.yaml`. R8: `~/.config/opensoft/agenttower/managed_templates/*.yaml`. R9: `~/.config/opensoft/agenttower/launch_commands/*.yaml`. Character-for-character identical. +- [x] CHK024 Does quickstart.md's Preconditions section reference the operator-overridable YAML paths per FR-024 (not just example file contents)? [Consistency] — Quickstart §Preconditions: "Two operator YAML config files exist: `~/.config/opensoft/agenttower/launch_commands/claude-master.yaml`...". Path is named, not just the file content. +- [x] CHK025 Do contracts/error-codes.md `managed_template_not_found` / `managed_launch_command_not_found` descriptions reference FR-024's override-resolution rule (operator file with same name wins)? 
[Consistency] — Both codes carry a "Resolution order (per FR-024): operator override file with the same `name` wins over the built-in default" bullet. + +## SC-009 alignment (recovery visible within 5s of socket-ready) + +- [x] CHK026 Does plan.md Performance Goals list SC-009 alongside SC-001 / SC-003 / SC-008? [Completeness] — Plan §Performance Goals: "SC-001 ... SC-003 ... SC-008 ... SC-009 post-restart recovery-outcome visibility ≤ 5s via M3/M5 detail surfaces (no log inspection required)". +- [x] CHK027 Does quickstart.md's daemon-restart section state SC-009's 5-second visibility window explicitly (not just SC-008's reattach window)? [Coverage] — Quickstart §US3 daemon restart: "SC-009 mandates this readability within 5 seconds of the socket becoming ready — no log inspection required, the detail surface alone tells the whole recovery story." +- [x] CHK028 Do contracts/managed-methods.md M3 (or §Events) describe the readability path within the SC-009 time bound? [Consistency] — state-machine.md §Recovery names the M3/M5 surfaces explicitly: "SC-009 mandates this be observable within 5 seconds of socket-ready." M3 sample variant demonstrates the response shape. +- [x] CHK029 Does the test plan in plan.md (`tests/contract/` or `tests/integration/`) include coverage for SC-009 readability post-restart? [Coverage] — Plan §Project Structure: `test_managed_recovery_visibility.py # SC-009 ≤5s post-restart visibility via M3/M5 detail surfaces (recovery_reattach failed_stage readable without log inspection)`. + +## §Assumptions alignment (new YAML-paths bullet) + +- [x] CHK030 Does plan.md (Technical Context or Constitution Check) reference the new §Assumptions bullet naming the two YAML paths? [Consistency] — Plan §Constraints names both paths; Constitution Check Principle I evidence: "Operator templates and launch profiles live under `~/.config/opensoft/agenttower/` (matches the constitution's path conventions — research §R8/R9)." 
+- [x] CHK031 Are the canonical paths in §Assumptions identical (character-for-character) to those in research §R8/R9 and quickstart preconditions? [Consistency] — Verified: `~/.config/opensoft/agenttower/managed_templates/*.yaml` and `~/.config/opensoft/agenttower/launch_commands/*.yaml` appear character-for-character identical in spec §Assumptions, research §R8/§R9, and quickstart §Preconditions. + +## Cross-cutting traceability + +- [x] CHK032 Is the "Session 2026-05-24 (post-plan review)" Clarifications block cross-referenced from plan.md (e.g., "see §Clarifications post-plan review for FR-022/023/024 origin")? [Traceability] — Plan §Provenance blockquote: "FR-022 (5-min pending-managed marker TTL), FR-023 (recreate-chain depth ≤ 16), FR-024 (operator YAML overrides), and SC-009 (post-restart visibility ≤ 5s) originated from spec §Clarifications 'Session 2026-05-24 (post-plan review)'." +- [x] CHK033 Are FR-022 / FR-023 / FR-024 / SC-009 each traceable to at least one user story or acceptance scenario, or are they explicitly system-level requirements only (with that rationale stated)? [Traceability] — Spec §Clarifications alignment-cleanup Q2 maps each: FR-022 / FR-023 / SC-009 → US3; FR-024 → US1. Inline `(traces to USx)` annotations carry the link. +- [x] CHK034 Are plan-review.md CHK036–CHK041 now markable as resolved by the post-clarify-2 spec amendments alone (no remaining code-level dependency)? [Coverage] — Plan-review.md CHK036–CHK041 already marked `[x]` with the alignment-cleanup amendment note that distinguishes requirements-side close from the implementation-task footprint captured by tasks.md. +- [x] CHK035 Is the spec's FR numbering still contiguous (FR-001..FR-024 with no gaps) after the amendments? [Consistency] — Spec now reaches FR-027 (pre-implement walk added FR-025/026/027); contiguous FR-001..FR-027 with no gaps. +- [x] CHK036 Is the spec's SC numbering still contiguous (SC-001..SC-009 with no gaps) after the amendments? 
[Consistency] — Verified: SC-001..SC-009 contiguous. +- [x] CHK037 Are the new closed-set error codes referenced in error-codes.md (`managed_pane_recreate_chain_too_deep`) **only** triggered by FR-023, or do their `details` schemas also need updating to reflect FR-022's TTL-driven failures? [Coverage, Gap] — Spec §Clarifications alignment-cleanup Q4 settled this: FR-022 TTL-driven failures do **not** mint a new error code; the operator-facing signal is the pane's `failed` state plus `failed_stage` from the FR-013 closed set. `managed_pane_recreate_chain_too_deep` is FR-023-only. +- [x] CHK038 Is there any conflict between FR-013's inline `failed_stage` enum and the legacy text "specific failed stage" used elsewhere in spec.md (Edge Cases, SC-006)? [Conflict] — Spec §Clarifications alignment-cleanup Q5 resolved this: SC-006 now reads "with a `failed_stage` from the FR-013 closed set" — no duplicate enum, no conflict. + +--- + +## Walk closure (2026-05-25) + +38/38 items satisfied. Three small cross-reference improvements applied in-place to close strict `FR-023` mentions where the docs previously cited `R4` only: + +1. **contracts/error-codes.md** — `managed_pane_recreate_chain_too_deep` heading now `(FR-023, R4)`. +2. **contracts/state-machine.md** — Recreate semantics step 1 now cites `(FR-023, R4)`. +3. **quickstart.md** — Edge-cases row now cites `(FR-023, R4)`. + +These were not blocking gaps (R4 traces to FR-023 through research.md), but explicit FR cross-refs are cheaper for reviewers than the single hop. 
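The two service-side rules verified above, the FR-022 TTL sweep (CHK015/CHK016) and the FR-023 recreate-chain guard (CHK018/CHK019), are small enough to sketch. The Python below is a minimal illustration under stated assumptions, not the daemon's `pending_marker.sweep()`: `PaneRow`, `sweep`, `next_chain_depth`, `RecreateChainTooDeep` and the `tmux_pane_exists` probe are hypothetical names, and deriving a successor's `chain_depth` as its predecessor's depth plus one is an assumption. The guard follows the R4 rule quoted in CHK018 (reject when the predecessor is already at depth 15 or more), which matches the `predecessor_chain_depth: 15` details example in CHK019; the state-machine wording quoted in CHK020 (`predecessor.chain_depth < 16`) would instead accept a predecessor at exactly depth 15.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional

# FR-013 closed failed_stage set, as enumerated in CHK002/CHK003.
FAILED_STAGES = frozenset({
    "pane_create", "launch_command", "registration",
    "log_attach", "tmux_kill", "recovery_reattach",
})
CHAIN_DEPTH_LIMIT = 16                     # FR-023 bound (DDL CHECK ceiling)
PENDING_MARKER_TTL = timedelta(minutes=5)  # FR-022 TTL


class RecreateChainTooDeep(Exception):
    """Hypothetical carrier for managed_pane_recreate_chain_too_deep."""

    def __init__(self, predecessor_pane_id: str, predecessor_chain_depth: int):
        super().__init__("managed_pane_recreate_chain_too_deep")
        # Same shape as the details schema quoted in CHK019.
        self.details = {
            "predecessor_pane_id": predecessor_pane_id,
            "predecessor_chain_depth": predecessor_chain_depth,
            "limit": CHAIN_DEPTH_LIMIT,
        }


@dataclass
class PaneRow:
    """Hypothetical, trimmed-down managed_pane row."""
    pane_id: str
    state: str
    created_at: datetime
    chain_depth: int = 0
    failed_stage: Optional[str] = None


def next_chain_depth(predecessor: PaneRow) -> int:
    """R4 rule quoted in CHK018: reject when predecessor.chain_depth >= 15."""
    if predecessor.chain_depth >= CHAIN_DEPTH_LIMIT - 1:
        raise RecreateChainTooDeep(predecessor.pane_id, predecessor.chain_depth)
    # Assumption: a recreated pane sits one level below its predecessor.
    return predecessor.chain_depth + 1


def sweep(rows: Iterable[PaneRow], now: datetime,
          tmux_pane_exists: Callable[[str], bool]) -> List[str]:
    """FR-022: fail rows that stayed in 'creating' for more than the TTL.

    tmux_pane_exists is a hypothetical probe: does a tmux pane back this row?
    """
    swept = []
    for row in rows:
        if row.state == "creating" and now - row.created_at > PENDING_MARKER_TTL:
            row.state = "failed"
            # No backing tmux pane -> pane_create; pane exists but was never
            # registered -> registration (data-model.md note quoted in CHK015).
            row.failed_stage = (
                "registration" if tmux_pane_exists(row.pane_id) else "pane_create"
            )
            swept.append(row.pane_id)
    return swept
```

Because the TTL comparison is strict, a row that has been `creating` for exactly five minutes survives until the next 60-second pass.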
diff --git a/specs/013-managed-session-lifecycle/checklists/alignment-recheck.md b/specs/013-managed-session-lifecycle/checklists/alignment-recheck.md new file mode 100644 index 0000000..8e24da5 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/alignment-recheck.md @@ -0,0 +1,54 @@ +# Alignment Recheck: Post-Alignment-Cleanup Verification + +**Purpose**: After the alignment-cleanup clarification round (Spec §Clarifications "Session 2026-05-24 (alignment cleanup)"), verify the 5 edits landed correctly, flag any items still open from `alignment-check.md` round 1 that were NOT addressed, and surface any new gaps introduced by the cleanup edits themselves. +**Created**: 2026-05-24 +**Closed**: 2026-05-25 (walk after `e3af4d0`) +**Feature**: [spec.md](../spec.md) — Sessions "post-plan review" + "alignment cleanup" +**Depth**: release gate. **Audience**: feature author before `/speckit.tasks`. + +## Verify alignment-cleanup edits applied (sanity check) + +- [x] CHK001 Does spec.md SC-006 reference "FR-013 closed set" rather than the abstract "specific failed stage" wording? [Consistency] — Spec SC-006: "A failed or partial layout creation produces a `degraded` (recoverable) or `failed` (non-recoverable) state with a `failed_stage` from the FR-013 closed set and a recovery action visible to the operator." +- [x] CHK002 Do FR-022, FR-023, FR-024, SC-009 each carry an inline `(traces to USx)` annotation matching the alignment-cleanup Q2 decision? [Traceability] — Verified: FR-022 (traces to US3), FR-023 (traces to US3), FR-024 (traces to US1), FR-025 (traces to US1), FR-026 (traces to US1), FR-027 (traces to US3), SC-009 (traces to US3). +- [x] CHK003 Does spec.md contain a `### Session 2026-05-24 (alignment cleanup)` sub-session under `## Clarifications` with five Q/A bullets? 
[Completeness] — Verified: 5 Q/A bullets covering (a) plan.md back-reference, (b) US traceability, (c) plan-review CHK036–CHK041 closure, (d) FR-022 TTL no new error code, (e) SC-006 rewording. +- [x] CHK004 Does plan.md carry a Provenance blockquote citing BOTH `Session 2026-05-24 (post-plan review)` AND `Session 2026-05-24 (alignment cleanup)`? [Traceability] — Plan §Provenance: "FR-022 ... originated from spec §Clarifications 'Session 2026-05-24 (post-plan review)'; their traceability to user stories was confirmed in spec §Clarifications 'Session 2026-05-24 (alignment cleanup)'." (Also cites pre-implement walk for FR-025/026/027.) +- [x] CHK005 Are plan-review.md CHK036–CHK041 marked `[x]` with per-item "Resolved 2026-05-24" annotations? [Completeness] — Verified: all 6 items ticked with explicit dates in the bullet body. +- [x] CHK006 Does plan-review.md include an amendment note flagging FR-022 / FR-020 / SC-009 implementation footprint for `/speckit.tasks`? [Completeness] — "Amendment note 2026-05-24 (alignment cleanup): CHK036–CHK041 closed by post-plan spec edits. Per spec §Clarifications 'Session 2026-05-24 (alignment cleanup)' Q3, the implementation work implied by FR-022 (sweep loop), FR-020 (recovery outcomes in detail surface), and SC-009 (5-second post-restart visibility) is to be captured as tasks by `/speckit.tasks`." + +## New gaps introduced by the alignment-cleanup edits + +- [x] CHK007 Are the new `(traces to USx)` annotations consistent with the rest of the FR/SC list — should ALL FRs and SCs carry similar annotations for parity, or were FR-022/023/024 and SC-009 explicitly the only system-level ones needing disambiguation? 
[Consistency, Gap] — Spec §Clarifications alignment-cleanup Q2 documents the rule: "The inline `(traces to USx)` annotation is reserved for these system-level requirements that lacked obvious US affinity at write-time; FR-001..FR-021 and SC-001..SC-008 do not carry the annotation by convention because their US affinity is evident from their text." Annotation now also applied to FR-025/026/027 from the pre-implement walk. +- [x] CHK008 If only the new system-level FRs/SCs carry the annotation, is the asymmetry documented (e.g., a note in §Clarifications "alignment cleanup" Q2 explaining why FR-001..FR-021 do NOT need it)? [Clarity, Gap] — Same Q2 above documents the asymmetry explicitly with the "by convention because their US affinity is evident from their text" rationale. + +## Still-outstanding items from alignment-check.md round 1 + +These items were flagged "Likely failing" in alignment-check.md but were NOT in scope of the alignment-cleanup clarify round (which only handled the 5 "Worth investigating" judgment calls). They remain open as cross-doc wording edits. + +- [x] CHK009 Does plan.md Summary explicitly name "cancel in-flight create" as out-of-scope, or rely only on the FR-018 reference? [Coverage] (alignment-check.md CHK006) — Plan §Summary: "**Out of scope for MVP**: non-tmux backends, semantic task planning, cross-host orchestration, adopted-to-managed pane promotion, and cancellation of in-flight layout creation (per spec §FR-018)." Both named explicitly and FR-018 referenced. +- [x] CHK010 Does research §R2 use "out of scope" wording aligned with FR-018, instead of "reserved for a later feature"? [Consistency] (alignment-check.md CHK008) — R2: "cancellation of an in-flight create is **out of scope for MVP** per spec §FR-018 (may be revisited in a later feature)." Both phrases present; "out of scope" is the operative wording. 
+- [x] CHK011 Does contracts/managed-methods.md §M3 sample response include a `recovery_reattach` `failed_stage` example, or only the general `failed_stage` field? [Consistency] (alignment-check.md CHK009) — M3 "Sample variant — recovery_reattach failure (FR-020 / SC-009)" shows the full response with `failed_stage: "recovery_reattach"` and per-pane recovery state. +- [x] CHK012 Does quickstart.md US3 daemon-restart section show the recovery-failure read path (not only the all-ready outcome)? [Coverage] (alignment-check.md CHK011) — Quickstart §US3 daemon-restart has both the happy path and the "If reattach failed for a pane" sample with `failed_stage: "recovery_reattach"`. +- [x] CHK013 Does contracts/state-machine.md Recovery section reference visibility from the M3 / M5 detail surface? [Coverage] (alignment-check.md CHK012) — state-machine.md §Recovery: "After step 5, every recovered managed-layout and managed-pane row is readable via the standard `app.managed_layout_detail` (M3) and `app.managed_pane_detail` (M5) surfaces." +- [x] CHK014 Does plan.md Technical Context cite FR-022 / FR-023 / FR-024 by ID anywhere (not only behaviorally)? [Consistency] (alignment-check.md CHK013 / CHK017 / CHK022) — Plan §Performance Goals: "FR-022 pending-managed marker TTL 5 minutes…"; §Constraints: "Recreate-chain depth bounded at 16 (FR-023, research §R4)" + "operator template/launch-profile overrides… (FR-024)". All three IDs cited. +- [x] CHK015 Does contracts/error-codes.md `managed_template_not_found` / `managed_launch_command_not_found` reference the FR-024 override-resolution rule (operator file with same `name` wins)? [Consistency] (alignment-check.md CHK025) — Both codes carry: "Resolution order (per FR-024): operator override file with the same `name` wins over the built-in default…" +- [x] CHK016 Does plan.md Performance Goals list SC-009 ≤ 5s alongside SC-001 / SC-003 / SC-008? 
[Completeness] (alignment-check.md CHK026) — Plan §Performance Goals: "SC-001 layout-create p95 ≤ 120s … SC-003 log-attach failure visible ≤ 10s … SC-008 daemon-restart reattach ≤ 5s … SC-009 post-restart recovery-outcome visibility ≤ 5s via M3/M5 detail surfaces". +- [x] CHK017 Does quickstart.md restart section cite SC-009 by ID and name the 5-second visibility window? [Coverage] (alignment-check.md CHK027) — "SC-009 mandates this readability within 5 seconds of the socket becoming ready — no log inspection required, the detail surface alone tells the whole recovery story." +- [x] CHK018 Does plan.md `tests/contract/` or `tests/integration/` list include coverage for SC-009 readability post-restart? [Coverage] (alignment-check.md CHK029) — Plan §Project Structure: `test_managed_recovery_visibility.py # SC-009 ≤5s post-restart visibility via M3/M5 detail surfaces (recovery_reattach failed_stage readable without log inspection)`. + +## Forward-pointing tasks queued for /speckit.tasks (from alignment-cleanup Q3) + +- [x] CHK019 Will the FR-022 pending-managed marker sweep loop be captured as an implementation task by `/speckit.tasks` (per the plan-review.md amendment note)? [Coverage] — Captured as T012 (helper) + T050 (60s periodic wiring); tasks.md §Phase 6 polish. +- [x] CHK020 Will the FR-020 detail-surface readability (recovery outcome fields in M3/M5 response shapes) be captured as an implementation task by `/speckit.tasks`? [Coverage] — Captured as T049 (impl: "Implement detail-surface readability for recovery outcomes in `view_models.py` and the M3/M5 response shapes") + T039 (test: "covering SC-009 (recovery outcome readable from `app.managed_layout_detail` and `app.managed_pane_detail`…)"). +- [x] CHK021 Will the SC-009 ≤ 5-second post-restart visibility test be captured for `/speckit.tasks`? 
[Coverage] — Captured as T039 (functional) + T056 (perf SLA verification: "Verify SC-009 (≤5s post-restart recovery-outcome visibility from detail surface) is measurable in `test_managed_recovery_visibility.py`"). + +## Cross-doc traceability under both Clarifications sessions + +- [x] CHK022 Does research.md cite the post-plan and alignment-cleanup Clarifications sessions as the documented origin of FR-022/023/024/SC-009 + the SC-006 rewording? [Traceability] — research.md header: "**Spec back-reference**: Origin of FR-022 / FR-023 / FR-024 / SC-009 is spec §Clarifications 'Session 2026-05-24 (post-plan review)'; user-story traceability + SC-006 rewording are recorded in spec §Clarifications 'Session 2026-05-24 (alignment cleanup)'." +- [x] CHK023 Does data-model.md acknowledge the FR-022 TTL behavior with a note in the recovery / pending-managed marker section? [Coverage] — data-model.md DDL §Notes bullet on FR-022 TTL sweep + ManagedPane field reference for `pending_marker_token` cites "FR-022 TTL sweep target". +- [x] CHK024 Are the SC-009 5-second budget and the FR-022 5-minute TTL consistent with each other — different time horizons, no overlap or conflict? [Consistency] — Different horizons: FR-022's 5-min TTL bounds *creating-state residue* in normal operation; SC-009's 5-sec budget bounds *recovery-outcome visibility* after daemon restart. The two budgets never overlap in scope (one is steady-state, one is cold-start). SC-009 self-states "Begins after SC-008's reattach phase completes; SC-008 and SC-009 are sequential, not parallel, so the worst-case cold-start observability budget is SC-008 + SC-009 ≤ 10 seconds" — explicit sequencing. + +--- + +## Walk closure (2026-05-25) + +24/24 items satisfied. No edits required during this walk — every item was already addressed by prior alignment commits (`ca67caf`, `817fb48`, `a0ab4a0`, `e7f2c89`, `bad699a`, `39dbb5f`, `e3af4d0`) and verified clean by `/speckit.analyze` Pass 15. 
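CHK015 above (like CHK025 in alignment-check.md) reduces FR-024 to a single resolution rule: an operator override file with the same `name` wins over the built-in default. A minimal Python sketch of that rule follows. It assumes the YAML files under `~/.config/opensoft/agenttower/managed_templates/` and `~/.config/opensoft/agenttower/launch_commands/` have already been parsed into dicts (parsing is omitted because the standard library has no YAML parser). `resolve_profiles`, `lookup` and `ProfileNotFound` are hypothetical names, and whole-entry replacement (rather than a field-level merge) is an assumption.

```python
from typing import Dict, Iterable, Mapping


class ProfileNotFound(LookupError):
    """Carries the closed-set code, e.g. managed_launch_command_not_found."""


def resolve_profiles(builtins: Mapping[str, dict],
                     overrides: Iterable[dict]) -> Dict[str, dict]:
    """Built-ins first; an operator override with the same name replaces it.

    Assumptions: overrides arrive already parsed from YAML, each carries a
    "name" key, replacement is whole-entry (no field merge), and a later
    override beats an earlier one with the same name.
    """
    # Copy so the built-in table is never mutated by an override.
    resolved = {name: dict(body) for name, body in builtins.items()}
    for override in overrides:
        resolved[override["name"]] = dict(override)
    return resolved


def lookup(resolved: Mapping[str, dict], name: str, not_found_code: str) -> dict:
    """Return the resolved entry or raise with the caller's closed-set code."""
    try:
        return resolved[name]
    except KeyError:
        raise ProfileNotFound(not_found_code) from None
```

The same two functions would serve both directories; only the `not_found_code` passed to `lookup` differs (`managed_template_not_found` vs `managed_launch_command_not_found`).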
diff --git a/specs/013-managed-session-lifecycle/checklists/api.md b/specs/013-managed-session-lifecycle/checklists/api.md new file mode 100644 index 0000000..e71b0cc --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/api.md @@ -0,0 +1,58 @@ +# API Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that the daemon socket API contract requirements for managed-layout operations are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Requirement Completeness + +- [x] CHK001 Are request/response schemas specified for the create-layout operation? [Gap, Spec §FR-001] +- [x] CHK002 Are request/response schemas specified for the remove-managed-pane operation? [Gap, Spec §FR-010] +- [x] CHK003 Are request/response schemas specified for the recreate-managed-pane operation? [Gap, Spec §FR-011] +- [x] CHK004 Are request/response schemas specified for listing managed layouts and managed panes? [Gap, Spec §FR-005] +- [x] CHK005 Is the structured error response specified for `managed_session_name_conflict` (code, message, hint)? [Gap, Spec §FR-016] +- [x] CHK006 Are error response codes/strings enumerated for every failure mode listed in FR-013 and FR-016? [Completeness] +- [x] CHK007 Is the contract for the lifecycle event stream defined (event types, payload shape, ordering)? [Gap, Spec §FR-015] +- [x] CHK008 Are API versioning requirements specified for the new managed-layout operations? [Gap] +- [x] CHK009 Is the API contract for cancellation of an in-flight create-layout defined? [Gap, Scenario Coverage] +- [x] CHK010 Is the contract for re-attaching to surviving panes after daemon restart specified (operator-driven, automatic, hybrid)? [Gap, Spec §FR-020] +- [x] CHK011 Are pagination/filtering requirements specified for layout listing and event listing? 
[Gap] +- [x] CHK012 Is the contract for the predecessor_id linkage queryable through the API (e.g., GET predecessor chain)? [Gap, Spec §FR-011] +- [x] CHK013 Are the contract requirements specified for the `promoted_from_adopted` transition stub (e.g., not-implemented response in MVP)? [Gap, Spec §FR-007] + +## Requirement Clarity + +- [x] CHK014 Is idempotency-key behavior defined for create-layout (header name, scope, lifetime)? [Clarity, Spec §FR-014] +- [x] CHK015 Is the contract behavior under FR-019 serialization defined (block-and-wait, queue-and-poll, immediate-reject-with-retry-after)? [Clarity, Spec §FR-019] +- [x] CHK016 Is the pending-managed-marker visibility specified for API consumers (part of the pane resource, separate field, hidden)? [Clarity, Gap, Spec §FR-014] +- [x] CHK017 Are timing/SLA requirements specified for API responses (synchronous vs async create-layout)? [Clarity, Gap, Spec §SC-001] +- [x] CHK018 Are the API authentication/identification requirements specified or explicitly absent for MVP? [Clarity, Spec §Assumptions] + +## Requirement Consistency + +- [x] CHK019 Are the contracts consistent between thin client → daemon and app → daemon for the same operations? [Consistency, Spec §FR-017] +- [x] CHK020 Are the contracts for distinguishing managed vs adopted agents specified consistently across endpoints (FR-005)? [Consistency] +- [x] CHK021 Are deprecation/migration requirements specified should any FEAT-011 contract surface change? [Gap] + +## Scenario Coverage + +- [x] CHK022 Is the contract behavior defined for the bench-container disappearance edge case (long-poll error, immediate failure, retry-after)? [Coverage, Gap, Spec §Edge Cases] +- [x] CHK023 Are concurrent-request semantics specified for non-create operations (remove, recreate) in addition to create-layout? [Coverage, Spec §FR-019] +- [x] CHK024 Is the contract for surfacing the `degraded` reason (which subsystem degraded: log, command, registration) specified? 
[Coverage, Gap, Spec §FR-013] + +## Edge Case Coverage + +- [x] CHK025 Is the contract behavior specified when the operator retries with the same idempotency key but different inputs? [Gap, Spec §FR-014] +- [x] CHK026 Is the contract behavior specified for remove of a pane that is currently in `creating` state? [Gap] +- [x] CHK027 Is the contract behavior specified for recreate of a pane whose predecessor record is missing (e.g., pruned in a future version)? [Gap, Spec §FR-021] + +## Non-Functional API + +- [x] CHK028 Are response-size or pagination requirements specified for high-volume audit/event queries (FR-021 indefinite retention)? [Gap] +- [x] CHK029 Are observability requirements specified for the API contract (request-id propagation, log fields)? [Gap, Cross-ref: observability.md] + +--- + +## Walk closure (2026-05-25) + +29/29 items resolved by contracts/managed-methods.md (M1-M8 with full request/response schemas) + contracts/error-codes.md (13 closed-set codes with details schemas) + R10 (idempotency) + R12 (peer scoping) + FR-016 (input validation) + FR-018 (cancel-in-flight out of scope). Pre-implement walk Clarifications session (4) closed the remaining open items from CHECKLIST_WALK.md (topic D input validation + topic B partial-failure rollback + topic E event ordering). diff --git a/specs/013-managed-session-lifecycle/checklists/concurrency.md b/specs/013-managed-session-lifecycle/checklists/concurrency.md new file mode 100644 index 0000000..459bdc0 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/concurrency.md @@ -0,0 +1,48 @@ +# Concurrency Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that concurrency requirements (serialization, locking, races, ordering) are complete, clear, consistent, and measurable. 
+**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Serialization Scope + +- [x] CHK001 Are concurrency requirements specified for layout-creation against the same container (FR-019)? [Completeness, Spec §FR-019] +- [x] CHK002 Are concurrency requirements specified for layout-creation across different containers (must they also serialize, or run in parallel)? [Gap, Spec §FR-019] +- [x] CHK003 Are concurrency requirements specified for remove + recreate ordering on the same managed pane? [Gap] +- [x] CHK004 Are concurrency requirements specified for two operators issuing the same operation at the same time on the same pane (e.g., two removes, two recreates)? [Gap] + +## Locking Model + +- [x] CHK005 Is the locking model specified for the per-container serialization (mutex, semaphore, queue)? [Gap, Spec §FR-019] +- [x] CHK006 Are deadlock-prevention requirements specified (per-container locks must release on operator disconnect / crash)? [Gap, Spec §FR-019] +- [x] CHK007 Are starvation-prevention requirements specified for the FR-019 wait queue (FIFO ordering, max wait time, fairness)? [Gap] +- [x] CHK008 Is lock granularity specified (per-container vs per-layout vs per-pane)? [Clarity, Spec §FR-019] + +## Race Conditions + +- [x] CHK009 Are concurrency requirements specified for the scan + creation flow interaction (FR-014 marker is the mitigation — but what is the low-level race set)? [Coverage, Spec §FR-014] +- [x] CHK010 Are concurrency requirements specified for the daemon's handling of overlapping retries on the same pending-managed layout? [Gap, Spec §FR-014] +- [x] CHK011 Are concurrency requirements specified for the predecessor_id chain (two simultaneous recreations of the same predecessor)? [Gap, Spec §FR-011] +- [x] CHK012 Are race conditions enumerated for the periodic scan vs creation completion (low-level race set)? 
[Coverage] +- [x] CHK013 Are concurrency requirements specified for the case where tmux itself executes commands asynchronously vs the daemon's expected ordering? [Gap] + +## Recovery & Restart + +- [x] CHK014 Are concurrency requirements specified for daemon-restart recovery vs an in-flight operator request at the moment of restart? [Gap, Spec §FR-020] +- [x] CHK015 Are concurrency requirements specified for resumption of partially-serialized work after a daemon crash? [Gap, Spec §FR-019, FR-020] + +## Event Ordering + +- [x] CHK016 Are concurrency requirements specified for the lifecycle event stream (consumer ordering guarantees per pane, per layout)? [Gap, Spec §FR-015] +- [x] CHK017 Are concurrency requirements specified for the audit/history append-only semantics under concurrent writers? [Gap, Spec §FR-021] + +## Consistency + +- [x] CHK018 Are concurrency requirements consistent with the assumption "MVP authorization is socket-access based" (single operator typical, but the requirements still cover concurrent calls)? [Consistency, Spec §Assumptions] +- [x] CHK019 Are concurrency safety properties testable from the operator surface alone? [Measurability] + +--- + +## Walk closure (2026-05-25) + +19/19 items resolved by R2 (per-container threading.Lock; FIFO ordering is assumed from CPython contention behavior, which Python's `threading.Lock` documentation does not guarantee) + FR-019 (per-container serialization) + FR-014 + R1 (pending-managed marker race mitigation) + FR-015 amendment (per-pane FIFO + per-layout FIFO event ordering, from pre-implement walk topic E) + FR-027 + managed_pane_concurrent_recreate (concurrent recreate, from pre-implement walk topic F).
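A strict-FIFO variant of R2's per-container serialization can be sketched as a ticket lock. The Python documentation says that when several threads are blocked in `Lock.acquire()`, which one proceeds after `release()` is undefined, so FIFO ordering has to be made explicit. This is a minimal sketch under assumptions: `FifoContainerLocks`, `serialized()`, and plain string container ids are hypothetical names, not AgentTower's API.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


class _TicketLock:
    """Mutex whose waiters acquire strictly in arrival order."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next_ticket = 0  # next ticket to hand out
        self._now_serving = 0  # ticket currently allowed to hold the lock

    def acquire(self) -> None:
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier ticket has been served.
            while my_ticket != self._now_serving:
                self._cond.wait()

    def release(self) -> None:
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()  # the holder of the next ticket wakes up


class FifoContainerLocks:
    """Hypothetical registry holding one FIFO lock per container id."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the dict, not the containers
        self._locks: dict[str, _TicketLock] = {}

    def _lock_for(self, container_id: str) -> _TicketLock:
        with self._guard:
            lock = self._locks.get(container_id)
            if lock is None:
                lock = self._locks[container_id] = _TicketLock()
            return lock

    @contextmanager
    def serialized(self, container_id: str) -> Iterator[None]:
        lock = self._lock_for(container_id)
        lock.acquire()
        try:
            yield
        finally:
            # Released even when the critical section raises.
            lock.release()
```

Creations on different containers take different locks and can run in parallel, while requests for the same container queue in arrival order. The `finally` clause covers exceptions inside the daemon; it does not help if the whole process dies, which is left to FR-020 recovery.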
diff --git a/specs/013-managed-session-lifecycle/checklists/configuration.md b/specs/013-managed-session-lifecycle/checklists/configuration.md new file mode 100644 index 0000000..2f625e1 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/configuration.md @@ -0,0 +1,43 @@ +# Configuration Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that configuration requirements (templates, launch command profiles, paths, defaults, validation) are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Schema Definition + +- [x] CHK001 Are the standard templates' configuration shapes specified (file format, location, schema)? [Gap, Spec §FR-001] +- [x] CHK002 Are the standard templates' default contents (1 master + 2 slaves, 2 masters + 2 slaves) specified field-by-field? [Gap, Spec §FR-001] +- [x] CHK003 Are the launch command profile configuration shapes specified (file format, location, fields)? [Gap, Spec §FR-002] +- [x] CHK004 Are configuration requirements specified for label-pattern templates (FR-003) — is the pattern configurable per template? [Gap, Spec §FR-003] + +## Defaults & Overrides + +- [x] CHK005 Are configuration overrides specified (per-container, per-layout-instance, per-pane)? [Gap] +- [x] CHK006 Are defaults specified for omitted configuration fields (default capability, default label pattern, default working directory)? [Gap] +- [x] CHK007 Are the precedence rules between operator-supplied launch commands and template-default commands specified? [Clarity, Spec §FR-002] + +## Validation + +- [x] CHK008 Are validation requirements specified for configuration before layout creation (required fields, command syntax, label-pattern syntax)? [Gap] +- [x] CHK009 Are validation requirements specified for the tmux session name input (length, character set)? 
[Gap, Spec §FR-016] + +## Lifecycle + +- [x] CHK010 Are configuration reload requirements specified (does the daemon hot-reload, or restart-only)? [Gap] +- [x] CHK011 Are configuration migration requirements specified across versions of the template schema? [Gap, Cross-ref: deployment.md] +- [x] CHK012 Are configuration requirements specified for the durable storage path used by FR-020? [Gap, Spec §FR-020] +- [x] CHK013 Are configuration requirements specified for the canonical local-socket path (FR-017)? [Gap, Spec §FR-017] +- [x] CHK014 Are configuration requirements specified for the scan interval that interacts with the pending-managed marker (FR-014)? [Gap, Spec §FR-014] +- [x] CHK015 Are configuration requirements specified for the audit retention behavior in MVP (file location, format) even though retention is indefinite? [Gap, Spec §FR-021] + +## Tmux Adapter + +- [x] CHK016 Are configuration requirements specified for which tmux pane-control flags AgentTower must support? [Gap] +- [x] CHK017 Are configuration requirements specified for tmux server selection (default socket vs custom)? [Gap] + +--- + +## Walk closure (2026-05-25) + +17/17 items resolved by R8/R9 (template + launch profile YAML schemas) + FR-024 (operator override with name-wins precedence and no-auto-create policy from pre-implement walk topic H) + spec §Assumptions (canonical YAML paths) + FR-016 (input validation from pre-implement walk topic D) + examples/managed_templates/ and examples/launch_commands/ as discoverable references (T003). 
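The name-wins override rule from FR-024 can be illustrated with a small resolver. Bundled definitions load first, and an operator file with the same name replaces the bundled one. The override directory is only read when it already exists and is never created. This is a minimal sketch, assuming one definition per `*.yaml` file named after its stem. The function name and directory arguments are hypothetical, and the canonical paths are the ones in spec §Assumptions. YAML parsing and schema validation (R8/R9) are out of scope.

```python
from pathlib import Path


def resolve_profiles(bundled_dir: Path, override_dir: Path) -> dict[str, Path]:
    """Map definition name -> file; an override with the same name wins.

    Hypothetical helper: the name is the file stem, and later directories
    take precedence over earlier ones.
    """
    resolved: dict[str, Path] = {}
    for directory in (bundled_dir, override_dir):  # later directory wins
        if not directory.is_dir():
            # Missing directory: skip it, and never create it (no auto-create).
            continue
        for path in sorted(directory.glob("*.yaml")):
            resolved[path.stem] = path
    return resolved
```

Keying on the name alone keeps precedence predictable. An operator replaces a bundled profile by dropping in a file with the same stem, and unrelated bundled profiles stay visible.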
diff --git a/specs/013-managed-session-lifecycle/checklists/coverage-alignment.md b/specs/013-managed-session-lifecycle/checklists/coverage-alignment.md new file mode 100644 index 0000000..ddd8d6b --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/coverage-alignment.md @@ -0,0 +1,80 @@ +# Coverage & Alignment Verification: Exhaustive Breadth + Post-Implementation Alignment + +**Purpose**: A meta-checklist ("unit tests for English") that verifies (a) the FEAT-013 checklist set is **wide** — every requirement-quality domain the feature touches is represented — and (b) the requirements (spec / plan / tasks / contracts / data-model) are **fully aligned** with each other AND with what implementation + the deep-swarm code review + the FEAT-014 merge revealed. The 21 prior checklists all closed `2026-05-25`, *before* implementation, the 19-finding deep review, and the `main` merge — so this file re-tests requirement quality against everything learned since. + +**Created**: 2026-06-01 +**Feature**: [spec.md](../spec.md) · [plan.md](../plan.md) · [tasks.md](../tasks.md) · [data-model.md](../data-model.md) · [contracts/](../contracts/) +**Depth**: release gate (maximum). **Audience**: feature owner before opening the PR to `main`. +**Convention**: `[x]` = requirement quality is adequate (evidence inline); `[ ]` = genuine requirement-quality gap to resolve (the implementation may already be correct + tested, but the *spec/requirement English* under-specifies it). Each `[ ]` notes the originating review finding where applicable. + +## Coverage Breadth — is the checklist set WIDE? (meta-coverage) + +- [x] CHK001 Is every requirement-quality domain the feature touches represented by a checklist file? [Coverage] — Present: ux, api, data-model, security, performance, accessibility, error-handling, observability, integration, configuration, idempotency, testing-strategy, deployment, concurrency, requirements (cross-cutting) + lifecycle alignment/readiness files. 
No applicable domain is missing. +- [x] CHK002 Is the **concurrency** domain covered as a first-class checklist, given the feature's per-container serialization + shared-conn + background-thread surface? [Coverage] — `concurrency.md` exists; the deep review's concurrency findings (#3 capacity race, #13 shutdown, #17 stale read) confirm this domain was correctly identified as in-scope. +- [x] CHK003 Is there a checklist domain for **multi-tenant / cross-container isolation** distinct from generic `security.md`? [Gap, Coverage] — The R12 peer-scoping trust model (deep-review #1 CRITICAL spoof, #16 id-normalization, #8 cross-tenant detail leakage) is a cohesive isolation concern that no single checklist gates end-to-end; consider an `isolation.md` (or an explicit R12 section in `security.md`). +- [x] CHK004 Does a cross-cutting `requirements.md` cover Completeness / Clarity / Consistency / Acceptance-Criteria / Dependencies / Ambiguities? [Coverage] — Present (52 items) plus `alignment-check.md` / `alignment-recheck.md` for inter-artifact consistency. + +## Cross-Artifact Alignment — do spec ↔ plan ↔ tasks ↔ contracts ↔ data-model still agree? + +- [x] CHK005 Does **data-model.md**'s `ux_managed_pane_tmux_target` uniqueness scope match **FR-016**'s per-container conflict semantics? [Consistency, Conflict, Spec §FR-016, data-model.md §indexes] — Resolved in code/DDL by review #9 (index now keyed `(container_id, tmux_session_name, tmux_pane_index)`), but FR-016's prose says "the target tmux session name already exists in the selected container" without stating the uniqueness key is container-scoped — verify the spec text and data-model DDL now state the same scoping explicitly. +- [x] CHK006 Is the **app-contract version** referenced by FEAT-013's contracts consistent with the post-merge `1.1` that `app.managed_*` responses now emit? [Consistency, Conflict] — FEAT-014 bumped the envelope `1.0`→`1.1`; FEAT-013 handlers inherit it (test_managed_dispatch updated). 
Confirm `contracts/managed-methods.md` doesn't pin a stale `1.0` in any example envelope. +- [x] CHK007 Do **tasks.md** entries trace the post-review production-wiring work (T057/T057b/T058/T059) and the 19 review fixes to their requirements? [Traceability] — tasks.md T057b/T058/T059 bodies record the wiring + GitHub issues #30/#32/#33; the 6 review-fix commits reference findings. (Spec amendments for the gaps below are NOT yet captured.) +- [x] CHK008 Are the deep-review fixes that changed observable behavior reflected back into the **spec/plan**, or only into code + tasks? [Completeness, Gap] — The fixes (e.g., synchronous conflict pre-check, atomic capacity, kill idempotency) live in code + tests + tasks.md but the spec FRs were not amended; decide whether the spec is the source of truth that must be updated for one-hop auditability. + +## Post-Review Requirement-Quality Gaps — does the SPEC specify what the code had to get right? + +*(Each item below corresponds to a confirmed deep-review finding. The code is fixed + tested; the question is whether the requirement English specified the behavior — under-specification is why the defect was possible.)* + +- [x] CHK009 Does the spec specify that the **R12 bench-peer identity MUST derive from an unspoofable signal** (kernel cgroup) and be **registry-verified**, NOT from a container-suppliable value (`/etc/hostname`)? [Gap, Security, Spec §FR-016/§R12] — Review #1 (CRITICAL): the spoofable gate shipped because no requirement pinned the trust model. The clarification ("bench-container peer MAY only target its own container") omits HOW identity is established. +- [x] CHK010 Is the **short(12)/full(64)-char container-id normalization** for peer-identity comparison specified as a requirement? [Gap, Clarity] — Review #16: legitimate peers were denied because the spec never stated identity comparison must normalize id forms against the registry. 
+- [x] CHK011 Is **FR-025**'s 40-layout cap specified as **atomic under concurrent cross-container creation**, or only as a sequential count? [Clarity, Gap, Spec §FR-025] — Review #3: "MUST return capacity_exceeded rather than silently fail or queue" doesn't say the count↔insert is atomic; the non-atomic check overshot the cap under concurrency. +- [x] CHK012 Does **FR-010** specify that killing the tmux pane is **idempotent when the pane is already gone** (already-exited pane = success, not failure)? [Gap, Exception Flow, Spec §FR-010] — Review #5: the documented idempotent-remove contract lived only in the adapter protocol docstring, not in FR-010. +- [x] CHK013 Do **FR-011 / FR-027** specify **idempotency-key replay semantics for recreate** (parity with create's R10)? [Gap, Consistency, Spec §FR-011/§FR-027] — Review #10: contracts/managed-methods said "same as create," but no FR stated recreate honors idempotency_key, so a safe retry surfaced as concurrent_recreate. +- [x] CHK014 Does **FR-024** require **synchronous validation of a template's `default_launch_command_ref`** at create time (parity with explicit overrides)? [Gap, Spec §FR-024] — Review #14: a missing template-default profile failed only later in the background spawn, not synchronously per the M1 contract. +- [x] CHK015 Does **FR-020 / FR-026** specify that a **per-container recovery failure must not abort reconcile for other containers**, and that pane→failed transitions keep the **layout aggregate consistent**? [Gap, Recovery, Spec §FR-020/§FR-026] — Review #7: a raising list-panes for one container left already-processed layouts with stale aggregate state. +- [x] CHK016 Does **FR-022** specify that the TTL **sweep recomputes the parent layout's aggregate state** when it fails a stale pane (consistency with FR-026)? [Gap, Consistency, Spec §FR-022/§FR-026] — Review #12: sweep failed panes without updating the layout row, leaving detail surfaces inconsistent. 
+- [x] CHK017 Is the **host_only error `details` shape required to be empty** (no resolved-peer / foreign-container id disclosure)? [Consistency, Conflict, Spec §FR-016, contracts/error-codes §FR-034a] — Review #8: FR-034a (a FEAT-011 contract) requires `details = {}`, but FEAT-013's host_only requirement doesn't restate it, and the handlers leaked ids (now fixed) — verify the requirement cross-references FR-034a. +- [x] CHK018 Is **FR-013**'s 30s per-stage timeout specified as a hard requirement (not just a default)? [Acceptance Criteria, Spec §FR-013] — FR-013 states each stage "MUST time out after 30 seconds" with the retry policy. (Review #2 was a *wiring* gap — the requirement itself is well-specified and measurable.) +- [x] CHK019 Does any requirement specify the **clean-shutdown ordering** for in-flight managed background work (spawn threads / sweep) relative to closing the shared DB connection? [Gap, Resilience] — Review #13: a shutdown race was an implementation concern with no governing requirement; decide whether this belongs in the spec or is acceptably an implementation invariant in plan.md. + +## Scenario-Class Completeness — are all five classes specified for the lifecycle? + +- [x] CHK020 Are **Primary** create/registration/log-attach flows specified with measurable criteria? [Coverage, Primary] — FR-001..FR-006, SC-001..SC-004. +- [x] CHK021 Are **Alternate** flows (override templates/profiles, 2m+2s template, idempotency replay) specified? [Coverage, Alternate] — FR-024, FR-001, FR-014/R10. +- [x] CHK022 Are **Exception/Error** flows specified with the closed `failed_stage` set + degraded-vs-failed rules? [Coverage, Exception, Spec §FR-013] — FR-013, FR-026, SC-006. +- [x] CHK023 Are **Recovery** flows (boot reconcile, reattach, detail-surface visibility) specified with budgets? [Coverage, Recovery, Spec §FR-020/§SC-008/§SC-009] — present and measurable. 
+- [x] CHK024 Are **Recovery** flows complete for the **resumed-creating** disposition — does any requirement state whether a pane that survived in tmux but never registered is re-driven, or only swept to failed at TTL? [Gap, Recovery] — Review #11: the implementation does NOT re-drive it (docs corrected); the spec/state-machine is silent on this disposition's terminal behavior. +- [x] CHK025 Are **Non-Functional** requirements (capacity, ordering, retention/redaction, local-first) specified and measurable? [Coverage, NFR] — FR-015 (FIFO), FR-017, FR-021 (redaction), FR-025 (capacity), SC-008/009 (timing). + +## Ambiguities, Conflicts & Measurability (residual) + +- [x] CHK026 Is the `failed_stage` enum stated once as a closed set and referenced (not duplicated) elsewhere? [Consistency, Spec §FR-013/§SC-006] — SC-006 references "the FR-013 closed set" per the alignment-cleanup round. +- [x] CHK027 Is **FR-021**'s env-redaction policy testable for the events that actually carry env/argv, and is it stated where (today) no event carries env values? [Measurability, Spec §FR-021] — research §R-021 notes the redaction rule is forward-looking guard-rail; confirm the requirement marks it as such so a reviewer doesn't expect redaction on events that omit env entirely. +- [x] CHK028 Can **SC-008 / SC-009** be objectively measured as sequential budgets? [Measurability, Spec §SC-008/§SC-009] — SC-009 explicitly states the budgets are sequential (≤10s combined). +- [x] CHK029 Are the **GitHub issues** (#30 recreate-residual, #32, #33) that were filed for deferred production-wiring resolved-or-tracked in the spec/plan handoff now that T057b/T058/T059 are complete? [Traceability, Gap] — tasks.md marks the tasks done and "Closes #3x"; verify the issues are actually closed and no spec-level follow-up (e.g., the #11 register-only continuation) is left undocumented. +- [x] CHK030 Is a requirement & acceptance-criteria ID scheme established and used consistently across artifacts? 
[Traceability] — FR-/SC-/NFR- IDs used throughout spec, plan, tasks, contracts, and prior checklists. + +## Verdict Summary + +- **Wide (breadth):** PASS — all standard domains covered; the one recommendation (CHK003) is **done**: `isolation.md` now gates the R12 cross-container trust model end-to-end. +- **Deep (alignment):** PASS after the 2026-06-01 alignment round — all flagged spec/contract/doc gaps are closed (see Resolution Log). Spec↔code is now one-hop traceable. + +## Resolution Log (2026-06-01) + +All items closed by a doc-only alignment round (no code changed — the implementation already satisfied each clause; this made the requirement English match the as-built, reviewed behavior). Recorded in spec §Clarifications "Session 2026-06-01 (post-implementation review alignment)". + +- CHK003 → new `checklists/isolation.md` (R12 trust model, 14 items). +- CHK005 → FR-016 now states tmux-session-name uniqueness is per-container; data-model DDL already keyed `(container_id, tmux_session_name, tmux_pane_index)`. +- CHK006 → `contracts/managed-methods.md` example envelopes + prose updated `1.0`→`1.1` (FEAT-014 envelope bump; FEAT-013 handlers inherit it). +- CHK008 → umbrella; closed by the FR amendments below. +- CHK009 / CHK010 / CHK017 → FR-016 R12 sub-clause: unspoofable cgroup identity, registry-canonicalized, 12/64-char normalization, fail-closed, `host_only details = {}` (FR-034a). +- CHK011 → FR-025: cap enforced atomically (count+insert in one write transaction). +- CHK012 → FR-010: kill is idempotent for an already-gone pane. +- CHK013 → FR-011 + FR-027: recreate idempotency-key replay + non-terminal-successor rule; state-machine Recreate-semantics note. +- CHK014 → FR-024: template `default_launch_command_ref` resolved synchronously at create. +- CHK015 → FR-020: per-container recovery isolation + atomic pane/aggregate write. +- CHK016 → FR-022: sweep recomputes the parent layout aggregate. 
+- CHK019 → state-machine.md Recovery: clean-shutdown ordering recorded as a daemon implementation invariant. +- CHK024 → FR-020 + state-machine.md: resumed-`creating` pane not re-driven at boot; TTL sweep is its terminal transition. +- CHK027 → FR-021: redaction rule marked a forward-looking guard-rail (no MVP event carries env values). +- CHK029 → tasks T057b/T058/T059 complete; commits carry "Closes #30/#32/#33" (auto-close on PR merge — live state unverifiable now due to GitHub API rate-limit); the #11 register-only continuation is documented in FR-020 + state-machine, so no undocumented follow-up remains. diff --git a/specs/013-managed-session-lifecycle/checklists/data-model.md b/specs/013-managed-session-lifecycle/checklists/data-model.md new file mode 100644 index 0000000..09e72de --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/data-model.md @@ -0,0 +1,68 @@ +# Data Model Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that data-model and lifecycle-state-machine requirements (entities, attributes, transitions, constraints, durability) are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Entity Attribute Completeness + +- [x] CHK001 Are all attributes of `Managed Layout` enumerated (id, template_id, container_id, state, created_at, updated_at, owner, …)? [Completeness, Spec §Key Entities] +- [x] CHK002 Are all attributes of `Managed Pane` enumerated (id, layout_id, role, capability, label, launch_command_ref, state, predecessor_id, pending_marker, tmux_pane_ref, created_at, …)? [Completeness, Spec §Key Entities] +- [x] CHK003 Are all attributes of `Launch Command Profile` enumerated (id, name, command, env, working_dir, …)? [Completeness, Spec §Key Entities] +- [x] CHK004 Are all attributes of `Lifecycle Event` enumerated (id, layout_id, pane_id, event_type, timestamp, payload, actor)? 
[Completeness, Spec §Key Entities] +- [x] CHK005 Are required-vs-optional field markers specified for every entity attribute? [Completeness] +- [x] CHK006 Are `Adopted Agent` attributes within FEAT-013's scope clarified (delegated to FEAT-006, partially overridden, fully owned here)? [Clarity, Dependency, Spec §Key Entities] + +## State Machine Coverage + +- [x] CHK007 Is the lifecycle state transition graph fully enumerated (every valid transition from every state)? [Coverage, Gap, Spec §FR-007] +- [x] CHK008 Are illegal lifecycle state transitions enumerated (e.g., `removed → ready` without a recreate; `failed → ready` without a recreate)? [Coverage, Gap] +- [x] CHK009 Is the state of the predecessor record at the moment of recreation defined (must be `removed` or `failed`; not `ready` or `creating`)? [Clarity, Spec §FR-011] +- [x] CHK010 Are the relationships between layout-level state and pane-level state defined (e.g., a layout is `ready` iff all panes are `ready` or `degraded`)? [Gap] +- [x] CHK011 Is the boundary between `creating` and `ready` defined precisely (at pane spawn, at first prompt, at registration)? [Clarity, Spec §FR-007] +- [x] CHK012 Is the data-model representation of the `promoted_from_adopted` reserved transition specified (extra optional field, sentinel value, separate table)? [Gap, Spec §FR-007] + +## Constraints & Identity + +- [x] CHK013 Is the field type for `predecessor_id` defined (UUID, opaque string, integer)? [Gap] +- [x] CHK014 Are the scope and the enforcement mechanism of the label uniqueness constraint specified (database constraint, application-level check, both)? [Clarity, Spec §FR-003] +- [x] CHK015 Are unique constraints enumerated (layout_id PK, pane_id PK, label uniqueness per container, tmux session-name uniqueness)? [Completeness] +- [x] CHK016 Is the cardinality between Managed Layout and Managed Pane specified (1:N enforced)? [Completeness] +- [x] CHK017 Is the cardinality between Managed Pane and Lifecycle Event specified (1:N append-only)?
[Completeness] +- [x] CHK018 Is the relationship between Managed Pane and the underlying tmux pane identifier specified (tmux pane_id stored, recomputed, both)? [Clarity, Spec §FR-007] + +## Durability & Persistence + +- [x] CHK019 Is the data-at-rest requirement specified (sqlite, json file, in-memory only)? [Gap, Spec §FR-020] +- [x] CHK020 Is the durability boundary specified for FR-020 (which records must be durable, which may be in-memory)? [Clarity, Spec §FR-020] +- [x] CHK021 Is the retention model for `Lifecycle Event` storage specified (indefinite per FR-021, but is the storage shape and growth profile specified)? [Clarity, Spec §FR-021] +- [x] CHK022 Are timestamp requirements specified (UTC, monotonic, system-clock-only, RFC3339)? [Gap] +- [x] CHK023 Is the data model robust against partial writes during the failure of a layout-creation transaction (write-ahead, idempotent commit)? [Gap, Spec §FR-014] + +## Schema Evolution + +- [x] CHK024 Are schema migration requirements specified for adding `predecessor_id`, pending-managed marker, etc.? [Gap] +- [x] CHK025 Are forward/backward compatibility requirements specified for the durable store across daemon upgrades? [Gap, Cross-ref: deployment.md] + +## Consistency + +- [x] CHK026 Is the data model consistent with the FEAT-011 agent registry (same id space, FK constraints)? [Consistency, Dependency] +- [x] CHK027 Are there any data-model conflicts with the `Adopted Agent` storage owned by FEAT-006? [Conflict, Dependency] +- [x] CHK028 Does the data model align with FR-008's "same registry/queue/route/event/health/direct-send surfaces" claim (no parallel managed-only tables)? [Consistency, Spec §FR-008] + +## Edge Cases + +- [x] CHK029 Is the recreate-chain depth (predecessor → predecessor → …) bounded or explicitly unbounded? [Gap, Spec §FR-011] +- [x] CHK030 Is the data shape for "failed stage" (FR-013) defined as an enum or free-text? 
[Clarity, Spec §FR-013] +- [x] CHK031 Is the pending-managed marker's representation specified (field on Managed Pane, separate record, tmux pane title prefix)? [Gap, Spec §FR-014] + +## Non-Functional + +- [x] CHK032 Are concurrency-safety requirements specified at the data model level (row-level locks, optimistic concurrency, transaction isolation)? [Gap, Spec §FR-019] +- [x] CHK033 Are integrity-check / fsck-style requirements specified for the durable store on daemon boot (FR-020)? [Gap, Spec §FR-020] + +--- + +## Walk closure (2026-05-25) + +33/33 items resolved by data-model.md DDL (all entity attributes + CHECK constraints + partial unique indexes + RFC3339 timestamps + WAL-mode concurrency from research §R2) + state-machine.md (full transition graph including illegal transitions + recreate semantics + recovery rules) + FR-023 + R4 (chain depth bounded at 16) + FR-022 + R5 (5-min TTL sweep) + FEAT-001's in-Python migration registry (single forward migration v9 with idempotent IF NOT EXISTS). diff --git a/specs/013-managed-session-lifecycle/checklists/deployment.md b/specs/013-managed-session-lifecycle/checklists/deployment.md new file mode 100644 index 0000000..2c3f21b --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/deployment.md @@ -0,0 +1,39 @@ +# Deployment & Rollback Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that deployment, upgrade, rollback, and first-run requirements are complete, clear, consistent, and measurable for this feature. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Migration & Schema + +- [x] CHK001 Are deployment requirements specified for the schema migration that adds `predecessor_id`, pending-managed marker, and any new tables/fields? [Gap, Cross-ref: data-model.md] +- [x] CHK002 Are rollback requirements specified for the schema migration (down-migration safety)? 
[Gap] +- [x] CHK003 Are backwards-compatibility requirements specified with existing FEAT-011 contracts during a phased rollout? [Gap] + +## First-Run & Install + +- [x] CHK004 Are deployment requirements specified for the durable storage initialization (empty state, first-run behavior, schema seeding)? [Gap, Spec §FR-020] +- [x] CHK005 Are deployment requirements specified for the local-socket path / permissions during install? [Gap, Spec §FR-017] +- [x] CHK006 Are deployment requirements specified for configuration file installation (templates, launch profiles, defaults)? [Gap, Cross-ref: configuration.md] + +## Daemon Upgrade / Restart + +- [x] CHK007 Are deployment requirements specified for the daemon restart sequence (graceful shutdown, in-flight create-layout handling)? [Gap, Spec §FR-020] +- [x] CHK008 Are deployment requirements specified for surviving daemon upgrades while in-flight layouts exist? [Gap, Recovery Flow] +- [x] CHK009 Are rollback requirements specified if a daemon upgrade introduces breaking changes to the managed-layout contract? [Gap] +- [x] CHK010 Are post-deployment audit requirements specified to verify reattach completeness (FR-020)? [Gap] + +## Validation + +- [x] CHK011 Are deployment-time validation requirements specified (smoke test, configuration sanity check, durable-store integrity check)? [Gap] +- [x] CHK012 Are requirements specified for cleaning up stale tmux panes / pending-managed markers left over from a prior failed deployment? [Gap] + +## Observability of Deploys + +- [x] CHK013 Are observability requirements specified for the deploy/restart path itself (events emitted on reattach, FR-020)? 
[Gap, Cross-ref: observability.md] + +--- + +## Walk closure (2026-05-25) + +13/13 items resolved by FEAT-001's in-Python migration registry pattern (idempotent CREATE TABLE IF NOT EXISTS, single forward migration v9 — see T002/T007) + FR-020 + recovery.py (boot reconcile before socket accepts requests) + FR-022 + R5 (boot-time pending-marker GC) + FR-024 (no auto-create under override directories from pre-implement walk topic H). Down-migration and cross-version compatibility are constitution-level invariants documented in data-model.md §Migration & rollout (no down-migration in MVP). diff --git a/specs/013-managed-session-lifecycle/checklists/error-handling.md b/specs/013-managed-session-lifecycle/checklists/error-handling.md new file mode 100644 index 0000000..dfbf720 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/error-handling.md @@ -0,0 +1,53 @@ +# Error Handling & Resilience Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that error-handling and resilience requirements (failure categorization, recovery, rollback) are complete, clear, consistent, and measurable across the layout-creation, registration, log-attach, remove, and recreate pipelines. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Failure Categorization + +- [x] CHK001 Are error categories enumerated (transient/recoverable vs permanent/non-recoverable)? [Completeness, Spec §FR-013] +- [x] CHK002 Is the mapping from each error category to the resulting lifecycle state (`degraded` vs `failed`) specified for every error type? [Coverage, Spec §FR-013] +- [x] CHK003 Are error requirements specified for surfacing the failed stage to the operator with enough granularity for action (FR-013)? [Clarity, Spec §FR-013] +- [x] CHK004 Are requirements specified for distinguishing `degraded` from `failed` to the operator via a single observable signal? 
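The replay-safe forward migration described above (idempotent `CREATE ... IF NOT EXISTS`, a single forward step, no down-migration) can be sketched with the standard-library `sqlite3` module. Assumptions: the schema version is tracked in `PRAGMA user_version`, the registry is a plain dict, and the `managed_pane` table is reduced to the columns needed to show the container-scoped `ux_managed_pane_tmux_target` key as a plain unique index. FEAT-001's real registry and the data-model.md DDL (including its partial indexes) may differ.

```python
import sqlite3

# Hypothetical registry: schema version -> idempotent DDL statements.
MIGRATIONS: dict[int, list[str]] = {
    9: [
        """CREATE TABLE IF NOT EXISTS managed_pane (
               pane_id TEXT PRIMARY KEY,
               container_id TEXT NOT NULL,
               tmux_session_name TEXT NOT NULL,
               tmux_pane_index INTEGER NOT NULL,
               state TEXT NOT NULL
           )""",
        # Simplified: uniqueness is scoped per container, as in data-model.md.
        """CREATE UNIQUE INDEX IF NOT EXISTS ux_managed_pane_tmux_target
               ON managed_pane (container_id, tmux_session_name, tmux_pane_index)""",
    ],
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending forward migrations; safe to call on every daemon boot."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        for statement in MIGRATIONS[version]:
            # IF NOT EXISTS makes a replay after a crash mid-step harmless.
            conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Because every statement is idempotent, the version bump only has to be recorded after the DDL; a crash between the two simply re-runs the step on the next boot.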
[Clarity, Spec §FR-007] + +## Pipeline Coverage + +- [x] CHK005 Are error handling requirements specified for every step of the layout creation pipeline (pane create, command launch, registration, log attach)? [Completeness, Spec §FR-013] +- [x] CHK006 Are timeout requirements specified for each launch-command, log-attach, registration step? [Gap] +- [x] CHK007 Are retry requirements specified for transient failures (network blip during scan, tmux command failure)? [Gap] +- [x] CHK008 Are error requirements specified for the case where `tmux kill-pane` fails during remove (FR-010)? [Gap, Spec §FR-010] +- [x] CHK009 Are error requirements specified for the case where the daemon detects state divergence after restart (FR-020 recovery)? [Gap, Recovery Flow] + +## Edge Case Coverage + +- [x] CHK010 Are error requirements specified for the "bench container disappears mid-creation" edge case? [Coverage, Exception Flow, Spec §Edge Cases] +- [x] CHK011 Are error requirements specified for "agent command prompts before registration completes"? [Coverage, Exception Flow, Spec §Edge Cases] +- [x] CHK012 Are error requirements specified for "log path is not host-readable" mapped to the `degraded` outcome (FR-006)? [Coverage, Spec §FR-006] +- [x] CHK013 Are error requirements specified for the case where a recreate attempt itself fails (recursive failure)? [Gap, Coverage, Spec §FR-011] +- [x] CHK014 Are error requirements specified for the case where the periodic scan races with creation in a way the pending-managed marker cannot resolve (e.g., marker missing or corrupted)? [Gap, Spec §FR-014] +- [x] CHK015 Are error requirements specified for the case where a recovered managed layout (FR-020) has lost panes (tmux pane killed externally during restart window)? [Gap, Recovery Flow] + +## Recovery & Rollback + +- [x] CHK016 Are partial-failure rollback requirements specified (when one pane fails, do other panes in the layout remain or get cleaned up)? 
[Gap, Recovery Flow] +- [x] CHK017 Is the operator's recovery path explicit for every Edge Case bullet? [Coverage, Spec §Edge Cases] +- [x] CHK018 Are recovery sequences specified for cascading failures (one degraded pane causes a route to break, which causes another pane to fail)? [Gap, Recovery Flow] + +## Error Format & Diagnostics + +- [x] CHK019 Are error message format requirements specified (machine-readable code + human-readable message + recovery hint)? [Gap, Spec §FR-016] +- [x] CHK020 Is the `managed_session_name_conflict` error response shape specified beyond the diagnostic string (fields, suggestion)? [Gap, Spec §FR-016] +- [x] CHK021 Is the audit/event content for failure events specified to be sufficient for post-mortem (which pane, which stage, which command output excerpt)? [Gap, Spec §FR-015] + +## Non-Functional Resilience + +- [x] CHK022 Are non-functional resilience requirements specified (max time spent in `creating` before automatic transition to `failed`)? [Gap] +- [x] CHK023 Are requirements specified for surfacing the rejection when the daemon/container is unhealthy (FR-016) with the same diagnostic format as other failures? [Consistency, Spec §FR-016] +- [x] CHK024 Are circuit-breaker / back-off requirements specified for repeated immediate-exit failures of the same launch command? [Gap] + +--- + +## Walk closure (2026-05-25) + +24/24 items resolved by FR-013 amendment (30s per-stage timeout + 2x retry with 1s/2s back-off + the closed transient set from spec §Assumptions, all from pre-implement walk topic A) + R7 (failed_stage closed enum) + FR-026 (no-cascade-kill rollback from pre-implement walk topic B) + FR-016 (validation_failed before tmux RPC) + error-codes.md (13 closed-set codes with operator-action prose) + R13 (transient vs non-recoverable mapping to degraded/failed). 
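The FR-013 retry policy summarized in this walk closure (30 s per-stage timeout, two retries with 1 s / 2 s back-off, retry only on the closed transient set) can be sketched as follows. This is a minimal illustration, not the daemon's code. `run_stage`, `TransientError`, and `StageFailed` are hypothetical names. The stage callable is assumed to enforce the timeout it receives, for example via `subprocess.run(..., timeout=...)`. Mapping the final error to `degraded` or `failed` (R13) is left to the caller.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors in the closed transient set."""


class StageFailed(Exception):
    """Raised once a stage exhausts its retries or hits a non-transient error."""

    def __init__(self, stage: str, attempts: int, last_error: Exception) -> None:
        super().__init__(f"{stage} failed after {attempts} attempt(s): {last_error!r}")
        self.stage = stage  # would populate the failed_stage closed enum (R7)
        self.attempts = attempts
        self.last_error = last_error


def run_stage(
    stage: str,
    fn: Callable[[float], T],
    *,
    timeout_s: float = 30.0,
    backoffs: Sequence[float] = (1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one pipeline stage: 1 attempt + len(backoffs) retries on transient errors."""
    attempts = 0
    last_error: Optional[Exception] = None
    # A trailing None marks the final attempt, after which retrying stops.
    for delay in [*backoffs, None]:
        attempts += 1
        try:
            return fn(timeout_s)  # the stage is expected to honor timeout_s itself
        except TransientError as exc:
            last_error = exc
            if delay is None:
                break
            sleep(delay)
        except Exception as exc:
            # Non-transient: fail immediately without retrying.
            raise StageFailed(stage, attempts, exc) from exc
    assert last_error is not None
    raise StageFailed(stage, attempts, last_error)
```

Injecting `sleep` keeps the back-off testable without real delays; in production the default `time.sleep` applies.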
diff --git a/specs/013-managed-session-lifecycle/checklists/idempotency.md b/specs/013-managed-session-lifecycle/checklists/idempotency.md new file mode 100644 index 0000000..a7d2fa8 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/idempotency.md @@ -0,0 +1,43 @@ +# Idempotency Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that idempotency requirements (retry safety, dedup keys, pending markers, replay semantics) are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Idempotency Boundary + +- [x] CHK001 Is the idempotency boundary specified for create-layout (request idempotency-key, layout pending-state, both)? [Clarity, Spec §FR-014] +- [x] CHK002 Are deduplication semantics specified for "the same pending layout" — what determines sameness (idempotency key, layout id, hash of inputs)? [Clarity, Spec §FR-014] +- [x] CHK003 Are idempotency semantics specified for remove-managed-pane (multiple removes of the same pane)? [Gap, Spec §FR-010] +- [x] CHK004 Are idempotency semantics specified for recreate-managed-pane (multiple recreates from the same predecessor)? [Gap, Spec §FR-011] +- [x] CHK005 Are idempotency semantics specified for layout removal (cascade of pane removals)? [Gap] + +## Pending Marker Lifecycle + +- [x] CHK006 Is the pending-managed marker's lifetime / TTL specified (how long does it remain active before considered stale)? [Gap, Spec §FR-014] +- [x] CHK007 Are the conditions specified under which a partial layout is "resumed" vs "restarted"? [Clarity, Spec §FR-014] +- [x] CHK008 Are requirements specified for cleanup of stale pending-managed markers across daemon restart (FR-020)? [Gap] +- [x] CHK009 Is the pending-managed-marker representation specified to be observable by the periodic scan without scan changes (or with explicit scan changes)? 
[Coverage, Cross-ref: integration.md] + +## Replay & Retry + +- [x] CHK010 Are requirements specified for what happens if the operator retries with different inputs (same idempotency key, different launch command)? [Gap] +- [x] CHK011 Are concurrent-retry semantics specified (two retries of the same idempotency key in flight at once)? [Gap, Spec §FR-019] +- [x] CHK012 Is the maximum number of retries before a layout is considered permanently failed specified? [Gap] +- [x] CHK013 Are idempotency semantics specified for the lifecycle event stream (FR-015) — can duplicate events occur on retry, or are events themselves idempotent? [Gap] + +## Response Semantics + +- [x] CHK014 Are requirements specified for distinguishing "no-op because already done" from "operation succeeded" responses? [Clarity] +- [x] CHK015 Is the response shape specified for a retry that finds a previously-failed layout (does it return the prior failure, or attempt resumption)? [Gap, Spec §FR-013] + +## Crash Recovery + +- [x] CHK016 Are the requirements specified for the case where the daemon crashes after creating panes but before registering them — does the next retry deduplicate via the pending-managed marker? [Coverage, Spec §FR-020] +- [x] CHK017 Are requirements specified for crash recovery during recreate (predecessor archived, new record half-created)? [Gap, Spec §FR-011] + +--- + +## Walk closure (2026-05-25) + +17/17 items resolved by R10 (idempotency-key replay semantics — in-flight match / completed match / absent) + R1 (pending-managed marker = idempotency_key when present, else uuid4) + FR-014 (marker-set-before-spawn + scan-skip) + FR-022 + R5 (5-min TTL sweep handles crash-recovery and stale markers) + FR-027 + managed_pane_concurrent_recreate (concurrent recreate from pre-implement walk topic F) + state-machine.md §Recreate semantics (predecessor must be removed or failed). 
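The R10 replay semantics and the R1 / FR-022 marker rules referenced in this walk closure can be illustrated with a small in-memory table. This is a sketch under assumptions: the real daemon persists markers in SQLite (the `pending_marker_token` column), the `IdempotencyTable` class and the outcome strings `new` / `in_flight` / `replay` are hypothetical, and a completed entry is assumed to be replayed verbatim rather than re-run.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MARKER_TTL_S = 300.0  # FR-022: 5-minute pending-managed marker TTL


@dataclass
class Entry:
    marker: str
    started_at: float
    result: Optional[dict] = None  # filled in once the create finishes


class IdempotencyTable:
    """In-memory stand-in for the SQLite-backed marker table (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def begin(self, key: Optional[str]) -> Tuple[str, Entry]:
        """Classify a create request as 'new', 'in_flight', or 'replay' (R10)."""
        if key is not None and key in self._entries:
            entry = self._entries[key]
            return ("replay" if entry.result is not None else "in_flight"), entry
        # R1: marker = idempotency_key when present, else a fresh uuid4.
        marker = key if key is not None else uuid.uuid4().hex
        entry = Entry(marker=marker, started_at=self._clock())
        self._entries[marker] = entry
        return "new", entry

    def complete(self, marker: str, result: dict) -> None:
        self._entries[marker].result = result

    def sweep(self) -> List[str]:
        """Drop in-flight markers older than the TTL (the R5 periodic sweep)."""
        now = self._clock()
        stale = [m for m, e in self._entries.items()
                 if e.result is None and now - e.started_at > MARKER_TTL_S]
        for marker in stale:
            del self._entries[marker]
        return stale
```

Whether an in-flight match should return the in-progress layout or a closed-set error is a contract decision; the sketch only shows how the three R10 cases are told apart and how a crashed create's marker ages out.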
diff --git a/specs/013-managed-session-lifecycle/checklists/implement-readiness.md b/specs/013-managed-session-lifecycle/checklists/implement-readiness.md new file mode 100644 index 0000000..ef64066 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/implement-readiness.md @@ -0,0 +1,40 @@ +# Implement-Readiness Audit — Final Pre-Implement Gate + +**Purpose**: Answer "do we have coverage AND are the items checked off AND is the spec ready for `/speckit.implement`?" with a single defensible verdict. Tests the *current state of the spec-plus-downstream artifacts* against the implementation gates. Companion to `CHECKLIST_WALK.md` (the analysis that produced this audit). +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + [tasks.md](../tasks.md) + +## Coverage + +- [x] CHK001 Are all 27 functional requirements (FR-001..FR-027) traceable to at least one implementation task in tasks.md? [Traceability] +- [x] CHK002 Are all 9 success criteria (SC-001..SC-009) covered by either a perf verification task (T054/T055/T056) or an integration/contract test asserting their bound? [Traceability] +- [x] CHK003 Do all 3 user-story acceptance scenarios (US1×3, US2×3, US3×3) map to integration tests (T021, T028, T041)? [Coverage] +- [x] CHK004 Are all 9 Edge Cases bullets covered by tests in T051? [Coverage] +- [x] CHK005 Are all 13 new FEAT-013 closed-set error codes (in contracts/error-codes.md) defined with `details` schemas? [Completeness] — Count updated 2026-05-25: 11 → 13 (added `managed_pane_label_conflict` in Phase 3b commit `e3af4d0`, added `container_not_found` in Phase 3c commit `1b85389`). +- [x] CHK006 Do all 8 contract methods (M1–M8) have at least one implementation task and at least one contract test task? [Coverage] +- [x] CHK007 Are all 12 lifecycle event types from research §R11 wired into the FEAT-008 audit pipeline via T014? 
[Coverage] +- [x] CHK008 Does the data model honor the T1 denormalization fix (container_id NOT NULL on managed_pane) so the partial unique index actually works? [Completeness] + +## Decisions + +- [x] CHK009 Are all 4 Clarifications sessions present in spec.md (initial / post-plan review / alignment cleanup / pre-implement walk = 15 + 6 + 5 + 8 = 34 Q/A)? [Completeness, Spec §Clarifications] +- [x] CHK010 Are the 8 pre-implement-walk decisions (Q1–Q8) integrated into spec.md as FR amendments or new FRs (FR-013/015/016/021/024 amended; FR-025/026/027 added)? [Traceability] +- [x] CHK011 Are the 13 closed-set error codes (9 original + 2 from pre-implement walk: `managed_layout_capacity_exceeded`, `managed_pane_concurrent_recreate`; + 1 Phase 3b `managed_pane_label_conflict`; + 1 Phase 3c `container_not_found`) referenced by their owning method (M1, M6, M7) in contracts/managed-methods.md? [Consistency] +- [x] CHK012 Are the 503 currently-unchecked checklist items either RESOLVED by current artifacts (437 items) or explicitly DEFERRED by design (66 items)? See [CHECKLIST_WALK.md](./CHECKLIST_WALK.md). [Coverage] +- [x] CHK013 Are zero OPEN items remaining after the pre-implement walk clarify round? (54 OPEN → all 8 topics integrated → 0 OPEN) [Completeness] + +## Cross-doc consistency + +- [x] CHK014 Are FR-022/023/024/025 + SC-009 cited by ID in plan.md's Technical Context, Performance Goals, and Provenance blockquote? [Traceability] +- [x] CHK015 Does plan.md's `tests/contract/` enumeration include all test files referenced by tasks.md (including `test_managed_launch_profiles.py` and `test_managed_migration.py`)? [Consistency] +- [x] CHK016 Is `managed_session_name_conflict` spelled identically (lowercase, prefixed) across spec.md, plan.md, contracts/*.md, tasks.md, and all checklists? [Consistency] +- [x] CHK017 Is "pending-managed marker" (canonical noun) used consistently across all documents (no bare `pending-marker` residuals)? 
[Consistency] +- [x] CHK018 Are there zero TODO / NEEDS CLARIFICATION / `` markers across spec.md, plan.md, research.md, data-model.md, contracts/, quickstart.md, tasks.md? [Completeness] + +## Constitution + +- [x] CHK019 Do all 5 constitution principles (I Local-First, II Container-First MVP, III Safe Terminal Input, IV Observable+Scriptable, V Conservative Automation) still PASS against the post-pre-implement-walk spec? [Compliance] + +## Outstanding + +- [x] CHK020 Has `/speckit.analyze` run cleanly **after** the pre-implement walk integration (FR-025/026/027 + amendments + 2 new error codes)? [Gate — RESOLVED: 5 consecutive clean `/speckit.analyze` passes post-pre-implement-walk (Pass 8, 10, 12, 13, 15 — each returned 0 findings; Pass 15 verified against commit `e3af4d0`).] diff --git a/specs/013-managed-session-lifecycle/checklists/integration.md b/specs/013-managed-session-lifecycle/checklists/integration.md new file mode 100644 index 0000000..b5c6f5c --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/integration.md @@ -0,0 +1,50 @@ +# Integration Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that integration and external-dependency requirements (FEAT-011/012, sibling features, tmux, thin client) are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Dependency Enumeration + +- [x] CHK001 Are the specific FEAT-011 surfaces this feature depends on enumerated (panes, agents, events, routes, queues, health, mutations)? [Completeness, Spec §Assumptions] +- [x] CHK002 Are the specific FEAT-012 surfaces this feature depends on enumerated (which control-panel views, which mutations)? [Completeness, Spec §Assumptions] +- [x] CHK003 Are the dependencies on FEAT-003 (bench-container discovery) and FEAT-004 (tmux pane discovery) enumerated? 
[Gap] +- [x] CHK004 Are the dependencies on FEAT-006 (agent registration) enumerated (managed-created agents go through the same registration path)? [Gap, Spec §FR-004] +- [x] CHK005 Are the dependencies on FEAT-007 (log attachment) enumerated (FR-006 reuses this path)? [Gap, Spec §FR-006] +- [x] CHK006 Are the dependencies on FEAT-009 (safe-prompt-queue) and FEAT-010 (event routes / arbitration) enumerated (FR-008 reuses these)? [Gap, Spec §FR-008] +- [x] CHK007 Are the tmux contract surfaces specified (which tmux commands are required: new-window, split-window, kill-pane, send-keys, list-panes)? [Gap] + +## Contract & Versioning + +- [x] CHK008 Are version compatibility requirements specified for FEAT-011 contracts (semver, schema version)? [Gap] +- [x] CHK009 Are deprecation/migration requirements specified for any FEAT-011 contract surface that this feature extends? [Gap] +- [x] CHK010 Are integration requirements specified for the durable storage location (file path, format, owner) used by FR-020? [Gap, Spec §FR-020] +- [x] CHK011 Are integration boundary requirements specified for the "no remote network listener" constraint (FR-017) — what is the canonical local socket path? [Clarity, Spec §FR-017] + +## Failure Surfaces + +- [x] CHK012 Are the failure modes of each dependency's surface enumerated (what does this spec assume the upstream feature handles)? [Coverage, Gap] +- [x] CHK013 Are integration requirements specified for handling tmux server crashes during layout creation? [Gap, Edge Case] +- [x] CHK014 Are integration requirements specified for the case where FEAT-006 registration returns success but FEAT-007 log attachment fails (cross-feature partial failure)? [Gap, Coverage] + +## Coexistence + +- [x] CHK015 Are integration requirements specified for the "managed and adopted coexist" assertion (FR-009) — what guarantees does FEAT-013 require from FEAT-006 to keep adopted-pane identity stable? 
[Coverage, Spec §FR-009] +- [x] CHK016 Are integration requirements specified for the pending-managed marker interaction with FEAT-004 scan? [Coverage, Spec §FR-014] +- [x] CHK017 Are the integration boundaries with the thin client specified (which managed-layout operations are exposed to in-container clients)? [Gap, Spec §FR-017] + +## Consistency + +- [x] CHK018 Are integration requirements consistent across the host daemon and thin client paths (FR-017)? [Consistency] +- [x] CHK019 Are integration requirements specified for the audit/event store and any external sink (none in MVP, but is this stated explicitly)? [Gap, Spec §FR-017] + +## Testability + +- [x] CHK020 Are integration test requirements specified for the FEAT-011/012/006/007 interactions in this feature's scope? [Gap, Cross-ref: testing-strategy.md] +- [x] CHK021 Are integration test fixtures specified for the bench-container dependency (real container, mock, hybrid)? [Gap] + +--- + +## Walk closure (2026-05-25) + +21/21 items resolved by plan.md §Technical Context (each FEAT dependency enumerated with specific reused surfaces — FEAT-002 dispatcher, FEAT-003 container discovery, FEAT-004 tmux + docker-exec channel, FEAT-006 register-self, FEAT-007 log-attach, FEAT-008 JSONL audit, FEAT-009 peer detection, FEAT-010 routes catalog, FEAT-011 envelope + host-only gate) + R6 (tmux command surface: new-session/split-window/kill-pane/select-pane/list-panes) + R12 (peer scoping for thin-client legacy CLI) + contracts/managed-methods.md §Versioning (additive evolution under app_contract_version 1.0). 
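The argv-first launch rule (R6, reached through the FEAT-004 docker-exec channel) can be sketched as a pure function that builds the command list without invoking a host shell. Assumptions, all hypothetical: the helper name; the use of `tmux split-window` with `-P -F '#{pane_id}'` to read back the new pane id; `-e KEY=VALUE` for env (tmux 3.0 or later); and a tmux version that executes a multi-argument command directly rather than through `sh -c`. Only `working_dir` passes through a shell, quoted with `shlex.quote`; the agent argv is handed to `exec "$@"` as positional parameters.

```python
import shlex
from typing import Mapping, Sequence


def build_split_window_argv(
    container: str,
    target: str,
    argv: Sequence[str],
    working_dir: str,
    env: Mapping[str, str],
) -> list[str]:
    """Build a docker-exec + tmux argv list; nothing here is shell-joined."""
    if not argv:
        raise ValueError("launch argv must not be empty")
    cmd = ["docker", "exec", container,
           "tmux", "split-window", "-t", target, "-P", "-F", "#{pane_id}"]
    for key, value in env.items():
        if not key or "=" in key:
            raise ValueError(f"invalid env key: {key!r}")
        cmd += ["-e", f"{key}={value}"]  # one argv element each, no shell parsing
    # The only shell-interpreted token: a quoted cd, then exec the untouched argv.
    script = f'cd {shlex.quote(working_dir)} && exec "$@"'
    cmd += ["sh", "-c", script, "sh", *argv]  # "sh" fills $0; argv becomes "$@"
    return cmd
```

A caller would pass the list to `subprocess.run(cmd, check=True, capture_output=True, text=True, timeout=30)`; with a list argument and the default `shell=False`, no host shell is involved.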
diff --git a/specs/013-managed-session-lifecycle/checklists/isolation.md b/specs/013-managed-session-lifecycle/checklists/isolation.md new file mode 100644 index 0000000..cc19e2b --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/isolation.md @@ -0,0 +1,36 @@ +# Cross-Container Isolation (R12) Requirements Quality Checklist + +**Purpose**: "Unit tests for English" for the bench-container isolation / R12 peer-scoping trust model — the cohesive concern behind deep-review findings #1 (CRITICAL peer-identity spoof), #16 (id-normalization), and #8 (cross-tenant detail leakage) that previously spanned `security.md`, `concurrency.md`, and `api.md` without a single gating checklist (coverage-alignment CHK003). +**Created**: 2026-06-01 +**Feature**: [spec.md](../spec.md) §FR-016 (R12 peer scoping) · [contracts/managed-methods.md](../contracts/managed-methods.md) §peer-scoping · [contracts/error-codes.md](../contracts/error-codes.md) (`host_only`) +**Depth**: release gate. **Audience**: feature owner + security reviewer. +**Convention**: `[x]` = requirement quality adequate (evidence inline); `[ ]` = gap. + +## Identity establishment (trust model) + +- [x] CHK001 Does the spec specify the SOURCE of a bench peer's container identity, and require it be **unspoofable** (kernel-derived), not container-suppliable? [Completeness, Security, Spec §FR-016 R12] — FR-016 now: identity from the peer's cgroup id, "System MUST NOT trust a container-suppliable value such as `/etc/hostname`." +- [x] CHK002 Is the identity required to be **verified against the FEAT-003 registry** (not accepted as a raw string)? [Clarity, Spec §FR-016 R12] — FR-016: "canonicalized against the FEAT-003 container registry; … does not uniquely match a registered container MUST fail closed." +- [x] CHK003 Is short(12)/full(64)-char container-id **normalization** specified so legitimate same-container peers are not falsely denied? 
[Completeness, Spec §FR-016 R12] — FR-016: "Identity comparison MUST normalize short (12-char) and full (64-char) container-id forms." +- [x] CHK004 Is the **fail-closed** default specified for an underivable / ambiguous peer identity (deny, never host-equivalent)? [Coverage, Exception, Spec §FR-016 R12] — FR-016: "MUST fail closed (deny)." + +## Authorization scope & enforcement points + +- [x] CHK005 Is the own-container-only rule specified to apply to **all** managed surfaces a bench peer can reach (create/list/detail/remove/recreate), not just create? [Coverage, Consistency, contracts/managed-methods §peer-scoping] — Contract: "Every legacy `managed.*` call from a bench-container peer is checked: `request.container_id == peer.container_id`." +- [x] CHK006 Is the cross-container denial code specified as `host_only` consistently across surfaces? [Consistency, Spec §FR-016, contracts] — `host_only` listed for create/list/detail/remove/recreate. +- [ ] CHK007 Are the **app-contract `app.managed_*`** surfaces' scoping rules stated to be host-only (not bench-peer-scoped) and is that distinction from the legacy `managed.*` namespace explicit? [Clarity, Gap] — The app namespace is host-only by construction; confirm the contract states this so the two namespaces' authorization models aren't conflated by a reader. + +## Information disclosure + +- [x] CHK008 Is the `host_only` error `details` shape required to be `{}` (no resolved-peer id, no foreign container/layout/pane id)? [Security, Consistency, Spec §FR-016, error-codes §FR-034a] — FR-016 now cross-references FR-034a: "details MUST be `{}` … to avoid a cross-tenant enumeration oracle." +- [x] CHK009 Is it specified that diagnostic peer/target ids stay in daemon-side logs only, never on the wire? [Clarity, Security] — Implied by the `details = {}` rule; the implementation keeps them in logs. 
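The FR-016 identity rules gated by CHK001 to CHK004 (normalize 12- and 64-character ids, canonicalize against the FEAT-003 registry, fail closed on anything ambiguous) can be sketched as below. Assumptions: the peer id has already been derived from the peer's cgroup (not shown), the registry yields full 64-character lowercase ids, and `canonicalize_peer` / `same_container` are hypothetical names.

```python
import re
from typing import Iterable, Optional

# Accept exactly a 12-char short id or a 64-char full id, lowercase hex.
_CONTAINER_ID = re.compile(r"[0-9a-f]{12}(?:[0-9a-f]{52})?")


def canonicalize_peer(raw_id: str, registry: Iterable[str]) -> Optional[str]:
    """Return the unique full registry id for raw_id, or None (fail closed)."""
    candidate = raw_id.strip().lower()
    if not _CONTAINER_ID.fullmatch(candidate):
        return None
    # A short id is a prefix of its full id; a full id matches itself.
    matches = {full for full in registry if full.startswith(candidate)}
    return matches.pop() if len(matches) == 1 else None


def same_container(request_id: str, peer_id: str, registry: Iterable[str]) -> bool:
    """Own-container check: both ids must resolve, and to the same container."""
    known = list(registry)
    a = canonicalize_peer(request_id, known)
    b = canonicalize_peer(peer_id, known)
    return a is not None and a == b
```

An unregistered or ambiguous id resolves to `None`, so the comparison denies; callers would then answer with `host_only` and `details = {}` per CHK008.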
+ +## Coexistence & ownership (FR-009 / FR-012) + +- [x] CHK010 Are requirements defined so managed and adopted agents coexist in one container without changing adopted-pane identity/ownership? [Coverage, Spec §FR-009/§FR-012] — FR-009 (coexistence), FR-012 (no destructive actions on adopted panes). +- [x] CHK011 Is the adopted-vs-managed distinction required to be visible in operator surfaces (so isolation is observable, not just enforced)? [Completeness, Spec §FR-005] — FR-005. + +## Scenario coverage + +- [x] CHK012 Is the **Exception** path specified (hostile/forged peer → fail closed deny)? [Coverage, Exception, Spec §FR-016 R12] — covered by CHK001/CHK004. +- [ ] CHK013 Are requirements defined for an **unresolved-but-benign** peer (e.g. a host CLI whose pid credentials can't be read) vs a bench peer — is the host-vs-bench determination's failure mode specified? [Coverage, Gap] — The implementation treats a verified host as cross-container-allowed and an unresolvable peer as fail-closed; confirm the requirement distinguishes "verified host" from "unresolvable" so the host CLI is never accidentally denied. +- [x] CHK014 Is the trust boundary anchored to the constitution's local-first, no-network-listener model (peers are local AF_UNIX, identified by pid credentials)? [Consistency, Spec §FR-017] — FR-017 + research §R12. diff --git a/specs/013-managed-session-lifecycle/checklists/observability.md b/specs/013-managed-session-lifecycle/checklists/observability.md new file mode 100644 index 0000000..d1bbd47 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/observability.md @@ -0,0 +1,53 @@ +# Observability Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that observability requirements (events, metrics, logs, traces) are complete, clear, consistent, and measurable for this feature. 
+**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Event Catalog + +- [x] CHK001 Are lifecycle event types fully enumerated (FR-015 lists 8 categories — is each a distinct event type or family of types)? [Completeness, Spec §FR-015] +- [x] CHK002 Are event payload schemas specified for each event type? [Gap] +- [x] CHK003 Are required event fields enumerated (event_id, timestamp, layout_id, pane_id, type, payload, actor)? [Gap, Spec §FR-015] +- [x] CHK004 Are requirements specified for emitting an event on every state transition (versus only on entry to terminal states)? [Clarity, Spec §FR-015] +- [x] CHK005 Is the relationship between Lifecycle Event records and the FR-008 shared event surfaces specified (are these the same events or two channels)? [Clarity, Spec §FR-008] + +## Metrics & SLIs + +- [x] CHK006 Are metrics requirements specified (gauges, counters, histograms) for layout-creation duration and pane-state transitions? [Gap] +- [x] CHK007 Are SLIs specified that correspond to SC-001 (layout-create p95 under 2 minutes) and SC-003 (log-attach-failure surface latency)? [Gap, Measurability, Spec §SC-001, SC-003] +- [x] CHK008 Are observability requirements specified for the daemon-internal serialization queue (FR-019) so operators can see waits (queue depth, wait time)? [Gap, Spec §FR-019] +- [x] CHK009 Are observability requirements specified for the pending-managed marker (count of in-flight markers, age distribution)? [Gap, Spec §FR-014] + +## Tracing & Correlation + +- [x] CHK010 Are trace/correlation-id requirements specified across the create-layout pipeline (operator request → layout → panes → events)? [Gap] +- [x] CHK011 Are requirements specified for the predecessor_id chain visibility in observability (query "show me the chain for pane X")? [Gap, Spec §FR-011] + +## Coverage + +- [x] CHK012 Are requirements specified for the operator's ability to filter events by managed/adopted origin? 
[Gap, Spec §FR-005] +- [x] CHK013 Are requirements specified for distinguishing events from automated transitions vs operator-initiated transitions? [Gap] +- [x] CHK014 Are observability requirements specified for daemon-restart recovery (which events are emitted on reattach, FR-020)? [Gap, Spec §FR-020] +- [x] CHK015 Are observability requirements specified for the failed-stage diagnostic (FR-013) so log queries can find it? [Coverage, Spec §FR-013] +- [x] CHK016 Are observability requirements specified for the layout-level aggregate state (vs only pane-level events)? [Gap] + +## Volume & Cost + +- [x] CHK017 Are requirements specified for the volume of events emitted per layout creation (does it scale O(panes), O(stages × panes))? [Gap] +- [x] CHK018 Are retention/sizing requirements specified for the durable event store given indefinite retention (FR-021)? [Gap, Cross-ref: data-model.md, performance.md] + +## Confidentiality + +- [x] CHK019 Are requirements specified for redacting any sensitive fields in events (launch command env vars, secrets)? [Gap, Cross-ref: security.md] + +## Consistency + +- [x] CHK020 Are observability requirements consistent between this feature and FEAT-008 (event ingestion)? [Consistency, Dependency] +- [x] CHK021 Are observability requirements aligned with the existing operator surfaces used for adopted panes (FR-008)? [Consistency, Spec §FR-008] + +--- + +## Walk closure (2026-05-25) + +21/21 items resolved by R11 (12 lifecycle event types + JSONL-only retention reusing FEAT-008) + FR-015 amendment (per-pane FIFO + per-layout FIFO ordering, from pre-implement walk topic E) + FR-021 amendment (env-var redaction policy with closed key-pattern set TOKEN/SECRET/KEY/PASSWORD, from pre-implement walk topic C) + plan.md §Performance Goals (SC-001/003/008/009 budgets) + contracts/managed-methods.md §Events (event catalog with payload schemas). 
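The FR-021 env-var redaction policy (closed key-pattern set TOKEN / SECRET / KEY / PASSWORD) could be applied to launch env before event payloads reach the JSONL audit, roughly as follows. The case-insensitive substring match and the `***` placeholder are assumptions; the closure does not pin down the matching rule. A substring match is deliberately conservative: it also redacts harmless keys such as `KEYBOARD_LAYOUT`, which is the safer failure mode for an audit log.

```python
from typing import Mapping

# Closed pattern set from FR-021; the matching rule (substring, case-insensitive) is assumed.
SENSITIVE_KEY_PATTERNS = ("TOKEN", "SECRET", "KEY", "PASSWORD")
REDACTED = "***"  # hypothetical placeholder


def redact_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of env with sensitive values replaced; key names are kept."""
    return {
        key: REDACTED if any(p in key.upper() for p in SENSITIVE_KEY_PATTERNS) else value
        for key, value in env.items()
    }
```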
diff --git a/specs/013-managed-session-lifecycle/checklists/performance.md b/specs/013-managed-session-lifecycle/checklists/performance.md new file mode 100644 index 0000000..908b06c --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/performance.md @@ -0,0 +1,40 @@ +# Performance Requirements Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate that performance, scalability, and timing requirements are complete, clear, consistent, and measurable. +**Created**: 2026-05-24 +**Feature**: [spec.md](../spec.md) + +## Latency & Timing + +- [x] CHK001 Is SC-001's "under 2 minutes" decomposed by stage (pane create, command launch, registration, log attach)? [Completeness, Spec §SC-001] +- [x] CHK002 Is SC-003's "within 10 seconds of layout creation completion" defined precisely (10s wall-clock from completion event, or 10s from log-attach attempt)? [Clarity, Spec §SC-003] +- [x] CHK003 Are performance requirements specified for the FR-019 serialization wait time upper bound (max time a second request may wait)? [Gap, Spec §FR-019] +- [x] CHK004 Are performance requirements specified for daemon-restart recovery time (FR-020/SC-008)? [Gap, Spec §FR-020, SC-008] +- [x] CHK005 Are timing requirements specified for the pending-managed marker lifetime (max in-flight duration before it is considered stale)? [Gap, Spec §FR-014] +- [x] CHK006 Are performance requirements specified for the operator-facing diagnostic surface latency (FR-013)? [Gap] +- [x] CHK007 Are first-feedback-time requirements specified inside the SC-001 budget (operator sees something within X seconds)? [Gap, Spec §SC-001] + +## Throughput & Scalability + +- [x] CHK008 Are scalability requirements specified for max concurrent managed layouts per daemon? [Gap] +- [x] CHK009 Are scalability requirements specified for max managed panes per host / per bench container? 
[Gap] +- [x] CHK010 Are throughput requirements specified for the lifecycle event stream (events/sec sustainable)? [Gap, Spec §FR-015] +- [x] CHK011 Is the performance impact of the indefinite event retention's growth on query performance bounded by an SLA? [Gap, Spec §FR-021] +- [x] CHK012 Is the performance impact of repeated recreations on the predecessor chain quantified (chain length × query cost)? [Gap, Spec §FR-011] + +## Degradation & Load + +- [x] CHK013 Are degradation requirements specified for high-load scenarios (operator creating many layouts back-to-back)? [Gap, Edge Case] +- [x] CHK014 Are performance requirements specified for the scan + creation flow interaction (does the scan polling interval impact create-layout p95)? [Gap, Spec §FR-014] +- [x] CHK015 Are performance requirements specified consistently between FR-008's shared surfaces and existing FEAT-011 contracts (no new SLAs that contradict prior contracts)? [Consistency] + +## Measurability + +- [x] CHK016 Are performance requirements measurable in CI or local-dev without a multi-host setup? [Measurability] +- [x] CHK017 Are the metrics required to measure SC-001/SC-003/SC-008 enumerated (which timers, where they are emitted)? [Measurability, Cross-ref: observability.md] + +--- + +## Walk closure (2026-05-25) + +17/17 items resolved by plan.md §Performance Goals (SC-001 p95 ≤ 120s decomposed by stage = 4 stages × 30s; SC-003 ≤ 10s log-attach failure visibility; SC-008 ≤ 5s reattach; SC-009 ≤ 5s post-restart visibility) + FR-025 (capacity ≤ 40 concurrent layouts, from pre-implement walk topic G) + FR-022 (5-min marker TTL) + FR-023 (recreate chain ≤ 16 bounding query cost) + plan.md §Scale/Scope (low-thousands-of-records-per-week growth from indefinite audit retention) + tasks T054/T055/T056 (perf SLA verification). 
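The two admission bounds cited in this closure, at most 40 concurrent layouts (FR-025) and a recreate chain of at most 16 (FR-023), reduce to cheap pre-checks. In this sketch the chain is walked through a hypothetical `predecessor_of` mapping, the bound is assumed to count predecessors of the pane being recreated, and `AdmissionError` is hypothetical; only `managed_layout_capacity_exceeded` is a code name taken from the contracts.

```python
from typing import Mapping, Optional

MAX_CONCURRENT_LAYOUTS = 40  # FR-025
MAX_RECREATE_CHAIN = 16      # FR-023


class AdmissionError(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def check_layout_capacity(active_layouts: int) -> None:
    """Reject a new create once 40 layouts are already active."""
    if active_layouts >= MAX_CONCURRENT_LAYOUTS:
        raise AdmissionError("managed_layout_capacity_exceeded")


def chain_depth(pane_id: str, predecessor_of: Mapping[str, Optional[str]]) -> int:
    """Count predecessors of pane_id, stopping one past the bound (walk is O(16))."""
    depth = 0
    current = predecessor_of.get(pane_id)
    while current is not None and depth <= MAX_RECREATE_CHAIN:
        depth += 1
        current = predecessor_of.get(current)
    return depth


def can_recreate(pane_id: str, predecessor_of: Mapping[str, Optional[str]]) -> bool:
    # Recreating adds one more link, so the existing depth must stay below the bound.
    return chain_depth(pane_id, predecessor_of) < MAX_RECREATE_CHAIN
```

Capping the walk is what keeps the predecessor-chain query cost bounded (CHK012) even if stored data were ever longer, or cyclic, than the rule allows.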
diff --git a/specs/013-managed-session-lifecycle/checklists/plan-review.md b/specs/013-managed-session-lifecycle/checklists/plan-review.md new file mode 100644 index 0000000..e92f6e7 --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/plan-review.md @@ -0,0 +1,105 @@ +# Post-Plan Review Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Re-verify the spec + plan + research + data-model + contracts + quickstart **after** `/speckit.plan` has been run. Tests requirements-and-design-doc *quality*: did the plan close the gaps surfaced by the deep-and-wide round, are spec/plan/research/contracts mutually consistent, and did any new ambiguities slip in? +**Created**: 2026-05-24 +**Closed**: 2026-05-25 (walk after `e3af4d0`) +**Feature**: [spec.md](../spec.md) + [plan.md](../plan.md) +**Depth**: Release gate. **Audience**: feature author + PR reviewer before `/speckit.tasks`. + +This file is a single targeted audit, not another deep-and-wide refresh. It does not delete or restate the prior 15 checklists; it tests what the plan added on top of them. + +## Spec ↔ Plan Traceability + +- [x] CHK001 Is every functional requirement FR-001..FR-021 referenced by at least one element of plan.md (Summary / Technical Context / Project Structure)? [Traceability] — Plan §Summary cites FR-001/004/005/007/008/010/011/014/016/017/018/019/020/021; Technical Context cites FR-013/016/017/019/020/022–027; Project Structure tree comments cite remaining FRs (FR-002 launch profiles, FR-003 label uniqueness via partial unique index, FR-006 log_attach failure path, FR-009 coexistence via reused FEAT-006 surfaces, FR-012 protected-adopted, FR-015 events). +- [x] CHK002 Is every success criterion SC-001..SC-008 paired with a Technical Context Performance Goal or a contract-level guarantee? 
[Traceability] — Plan §Performance Goals enumerates SC-001/003/008/009 with budgets; SC-002 → contract guarantee in M3/M4 (`origin: managed` field); SC-004 → FR-008 reuse-surfaces wiring; SC-005 → FR-012 protect-adopted contract; SC-006 → state-machine.md `failed_stage` enum; SC-007 → R10 idempotency. +- [x] CHK003 Is every clarification (Session 2026-05-24 Q1–Q15) reflected in research.md, data-model.md, **or** contracts/? [Traceability] — research.md §Coverage Summary table maps all 15 Q/A explicitly. +- [x] CHK004 Is every Edge Case bullet in spec.md addressed by a contract method, a state-machine transition, or a research decision? [Coverage] — 12 Edge Case bullets in spec; each ties to a state-machine transition (failed/degraded paths), a contract method (M1 conflict / capacity / label / M7 concurrent), or a research item (R5 sweep / R6 tmux / R12 peer scoping). +- [x] CHK005 Does plan.md's Technical Context contain zero remaining `NEEDS CLARIFICATION` markers? [Completeness] — Verified by repeated grep in `/speckit.analyze` Pass 15 (0 placeholders across spec/plan/research/data-model/contracts/quickstart/tasks). + +## Plan Internal Completeness + +- [x] CHK006 Does the Constitution Check table provide concrete evidence (specific FRs / files / decisions) for each of the five principles — not just "PASS"? [Completeness] — Each row cites specific FRs and decision sources (FR-017 socket-only, FR-024 + research §R8/R9 paths, FEAT-011 host-only gate; FEAT-004 docker-exec; argv-first + research §R6 + shlex.quote fallback; CLI/app parity + SQLite + JSONL FR-021; reserved promotion = explicit-operator-action-only in a later feature). +- [x] CHK007 Does the Project Structure section list every new module file with a one-line purpose AND identify each existing-module touch point? 
[Completeness] — 13 modules listed with purpose comments (service, state_machine, templates, launch_profiles, tmux_create, pending_marker, serializer, recovery, handlers/cli, handlers/app, view_models, events, dao, errors); the prose before the tree explicitly identifies the two existing-file touch points (FEAT-002 dispatcher and FEAT-011 `app_contract/dispatcher.py`). +- [x] CHK008 Is the Summary's "additive layer" enumeration mutually consistent with the Project Structure module list (no orphan layers, no orphan modules)? [Consistency] — Each of the 8 Summary bullets maps to one or more modules in the tree (create panes → `templates` + `tmux_create`; auto-register → `handlers/*` + `service`; lifecycle → `state_machine` + `service`; serialize per container → `serializer`; pending marker → `pending_marker`; kill on remove → `tmux_create`+`service`; survive restart → `recovery`; preserve events → `events`). +- [x] CHK009 Is the Complexity Tracking section either fully justified or explicitly empty (not silently omitted)? [Completeness] — Section present, with the explicit "No constitution violations; this table is intentionally empty." line + the empty `_(none)_` table row. +- [x] CHK010 Are FEAT dependencies enumerated with the **exact** reused surfaces (FEAT-002 dispatcher, FEAT-004 docker-exec channel, FEAT-006 register-self path, FEAT-007 log attach, FEAT-008 audit JSONL, FEAT-009 peer detection, FEAT-010 routes, FEAT-011 envelope/error registry)? [Completeness] — Plan §Technical Context "Primary Dependencies" bullet enumerates FEAT-002 (socket dispatcher), FEAT-003 (container discovery), FEAT-004 (tmux + docker-exec), FEAT-006 (agent registration), FEAT-007 (log attachment), FEAT-008 (event pipeline + JSONL audit), FEAT-009 (safe-prompt queue / peer detection), FEAT-010 (routes catalog), FEAT-011 (envelope + host-only gate). 
+ +## Research Quality + +- [x] CHK011 Does each research item R1–R13 follow Decision / Rationale / Alternatives with at least one *real* alternative considered (not a strawman)? [Completeness] — R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13 each list ≥2 substantive alternatives with concrete rejection reasons. R7/R9/R10/R11/R12/R13 alternatives sections were added 2026-05-25 during this walk to close the gap. +- [x] CHK012 Is the pending-managed marker representation (R1) safe against the in-pane process editing its own tmux pane title before registration completes? [Edge Case, Gap] — R1 now includes the "In-pane process editing the title" edge-case paragraph (added 2026-05-25): the SQLite `pending_marker_token` column is authoritative; the FEAT-004 scan consults SQLite via FEAT-006 in addition to the tmux title; 5-min TTL bounds divergence. +- [x] CHK013 Is the 5-minute pending-managed marker TTL (R5) surfaced as a *measurable* system property (not only an internal sweep cadence)? [Measurability] — FR-022 surfaces it as a system requirement and plan §Performance Goals quotes "FR-022 pending-managed marker TTL 5 minutes with periodic 60s sweep (research §R5)". +- [x] CHK014 Is the recreate-chain depth bound of 16 (R4) justified relative to a realistic operator iteration workflow, not just a round number? [Clarity] — R4 Rationale: "leaves generous headroom for legitimate iterative-debug workflows" and rejects "bound at 4 — too small; would surprise operators who iteratively fix a flaky launch command". +- [x] CHK015 Is the per-container `threading.Lock` (R2) sufficient for the "remove + recreate" sequence, or is an additional per-pane lock needed for the predecessor → successor transition? 
[Coverage, Gap] — The per-container lock is acquired by `create_layout`, `remove_pane`, AND `recreate_pane` (per data-model.md §Concurrency); FR-027 + `managed_pane_concurrent_recreate` layer a per-predecessor in-flight check on top of that lock, evaluated while it is held, so the second caller in a racing recreate gets a closed-set error rather than being queued. No per-pane lock is needed because the closed-set rejection happens inside the same per-container lock. +- [x] CHK016 Are the launch-command argv decisions (R6) compatible with operator-supplied `working_dir` and `env` without re-opening a shell-interpolation hazard? [Consistency] — R6: env is applied via `-e KEY=VALUE` (no shell); `working_dir` is the **only** shell-interpolated token and is escaped via `shlex.quote`; everything else is passed as argv. +- [x] CHK017 Does research §R12's bench-container thin-client constraint refine — not contradict — spec §Assumptions' "MVP authorization is socket-access based"? [Consistency] — Spec assumption stands ("socket access is the authorization"); R12 layers two refinements without weakening it (app.* is host-only via the FEAT-011 gate; legacy managed.* is peer-scoped to the caller's container). A bench peer with socket access still gets useful access to its own container. + + ## Data-Model Fidelity + + - [x] CHK018 Does the SQLite DDL include CHECK constraints matching the closed-set `state` and `failed_stage` enums in both `managed_layout` and `managed_pane`? [Completeness] — Both tables include the state CHECK and failed_stage CHECK in data-model.md L45/L75/L82. +- [x] CHK019 Does the partial unique index on `(container_id, label)` correctly allow a recreated pane to reuse its predecessor's label after the predecessor enters `removed` or `failed`? [Edge Case] — `ux_managed_pane_container_label ... WHERE state IN ('creating','ready','degraded')` excludes terminal-state rows; comment block explicitly notes "terminal-state rows (failed/removed) do NOT participate in label uniqueness so recreate can reuse labels."
+- [x] CHK020 Are required-vs-optional field markers explicit (NOT NULL / nullable) for every attribute in both entities? [Completeness] — Every column carries an explicit NOT NULL or is implicitly nullable per SQL; entity field-reference tables list "NULL" or "NOT NULL" per field. +- [x] CHK021 Are the layout-state derivation rules unambiguous for the zero-non-terminal-pane boundary (every pane `removed`)? [Clarity] — §ManagedLayout lifecycle: "A layout is `removed` iff all its panes are in `removed` (or never advanced past `creating` and were swept)." +- [x] CHK022 Is the `chain_depth <= 16` CHECK constraint reconcilable with the service-side `>= 15` rejection rule (off-by-one boundary)? [Consistency] — The service rejects when `predecessor.chain_depth >= 15`, so `new.chain_depth = predecessor.chain_depth + 1` maxes out at 15 (when predecessor = 14). The CHECK admits 0..16 inclusive, so it never rejects a row the service has already accepted; it is a backstop, not the enforcing rule. error-codes.md's `managed_pane_recreate_chain_too_deep` describes the bound as "16", referring to the *unreachable* upper edge, so the two are consistent. +- [x] CHK023 Is the `agent_id` FK direction (`managed_pane → agent`) consistent with FEAT-006 owning the agent table (no reverse-FK from agent to managed_pane)? [Consistency] — `managed_pane.agent_id REFERENCES agents(agent_id)`; no ALTER TABLE on `agents` (verified by `/speckit.analyze` and Phase 1 commit `bad699a`). +- [x] CHK024 Are the indexes (`ix_managed_layout_container_state`, `ix_managed_pane_layout_state`, etc.) aligned with the read access patterns described in contracts/managed-methods.md? [Completeness] — `ix_managed_layout_container_state` → M2 list filter; `ix_managed_pane_layout_state` → M3/M4; `ix_managed_pane_predecessor` → M5 predecessor_chain traversal; `ix_managed_pane_pending_marker` → sweep + recovery; both partial unique indexes serve their respective conflict-detection paths.
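The label-reuse rule in CHK019 and the depth arithmetic in CHK022 can be checked together against an in-memory SQLite database. This is a sketch under stated assumptions: the `managed_pane` table is cut down to the columns those two items mention, plus a hypothetical `pane_id` key and `predecessor_id` link, and `recreate` is a hypothetical stand-in for the service-side rule, simplified to accept only `failed` or `removed` predecessors. It shows that a root pane at depth 0 can be recreated 15 times (depths 1 through 15, a 16-pane chain) before the service refuses, and that the `chain_depth` CHECK is never the constraint that fires for service-created rows.

```python
import sqlite3

TERMINAL_STATES = ("failed", "removed")
MAX_RECREATE_FROM_DEPTH = 15  # service rejects when predecessor.chain_depth >= 15


def make_db() -> sqlite3.Connection:
    """Cut-down managed_pane table carrying the constraints CHK019/CHK022 discuss."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE managed_pane (
            pane_id        INTEGER PRIMARY KEY,   -- hypothetical key
            container_id   TEXT NOT NULL,
            label          TEXT NOT NULL,
            state          TEXT NOT NULL CHECK (state IN
                             ('creating', 'ready', 'degraded', 'failed', 'removed')),
            chain_depth    INTEGER NOT NULL DEFAULT 0
                             CHECK (chain_depth BETWEEN 0 AND 16),
            predecessor_id INTEGER REFERENCES managed_pane (pane_id)  -- hypothetical
        )
        """
    )
    # Only live rows participate in label uniqueness; terminal rows free the label.
    db.execute(
        """
        CREATE UNIQUE INDEX ux_managed_pane_container_label
            ON managed_pane (container_id, label)
            WHERE state IN ('creating', 'ready', 'degraded')
        """
    )
    return db


def recreate(db: sqlite3.Connection, predecessor_id: int) -> int:
    """Hypothetical service-side recreate: validate, then insert a successor row."""
    row = db.execute(
        "SELECT container_id, label, state, chain_depth FROM managed_pane WHERE pane_id = ?",
        (predecessor_id,),
    ).fetchone()
    if row is None:
        raise LookupError("managed_pane_not_found")
    container_id, label, state, depth = row
    if state not in TERMINAL_STATES:  # simplification: only terminal panes are recreated
        raise ValueError("managed_pane_illegal_recreate_source")
    if depth >= MAX_RECREATE_FROM_DEPTH:
        raise ValueError("managed_pane_recreate_chain_too_deep")
    cur = db.execute(
        "INSERT INTO managed_pane (container_id, label, state, chain_depth, predecessor_id)"
        " VALUES (?, ?, 'creating', ?, ?)",
        (container_id, label, depth + 1, predecessor_id),
    )
    return cur.lastrowid
```

Because the successor inherits the predecessor's label, the partial index is what makes recreate possible at all: a full unique index on `(container_id, label)` would reject every successor while the terminal predecessor row is still stored.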
+ +## Contract Fidelity + +- [x] CHK025 Does every method in managed-methods.md declare an explicit error-code list referencing only codes defined in error-codes.md (no undeclared codes)? [Consistency] — M1 errors all defined; M6 errors all defined; M7 errors all defined; M2/M3/M4/M5 use only inherited FEAT-011 codes (`container_not_found`, `managed_layout_not_found`, `managed_pane_not_found`) defined in error-codes.md "Reused codes" + the FEAT-013 §New codes section; M8 uses `not_implemented`. +- [x] CHK026 Is the `managed.layout.create` semantics ("response returns after row insertion, before tmux spawn completes") clearly described, including how the operator subsequently observes `ready`? [Clarity] — M1 §Behavior bullet 2: "Returns after the layout row + all pane rows are inserted in SQLite and the pending-managed markers are set. The actual tmux spawn + registration runs in a background task; the operator polls via `managed.layout.detail` or subscribes to lifecycle events." +- [x] CHK027 Is the lifecycle event catalog in managed-methods.md §Events 1:1 with the events listed in research §R11 (same set, same payload shape)? [Consistency] — Both list the same 12 event types; payload columns in managed-methods.md §Events match R11's enumeration. +- [x] CHK028 Is the `managed_pane_illegal_transition` error's `requested_action` field's value set enumerated (closed set of operator actions)? [Completeness, Gap] — error-codes.md now declares the closed set `"remove" | "recreate" | "promote_from_adopted"` (added 2026-05-25); the state-machine graph is the authoritative source for which (state, action) pairs surface this code vs the more specific `managed_pane_illegal_recreate_source` / `not_implemented`. +- [x] CHK029 Does the state-machine document distinguish operator-initiated transitions from daemon-initiated transitions (sweep, recovery) in the trigger column? 
[Clarity] — Pane transitions table cites "Operator `remove`" explicitly; cites "Daemon-initiated sweep task" for FR-022 transition; cites health-probe observations for `ready→degraded`. Recovery section is daemon-only by construction. +- [x] CHK030 Is the `not_implemented` stub for `promote_from_adopted` reachable via both legacy `managed.*` and `app.managed_*` namespaces with identical response shapes? [Consistency] — M8 documents both names; response shape is the standard FEAT-011 error envelope with `code: "not_implemented"`, `details: {"reserved_since": "FEAT-013"}`. +- [x] CHK031 Are the `idempotency_key` semantics (in-flight match vs completed match vs absent) consistent between `managed.layout.create` and `managed.pane.recreate`? [Consistency] — managed-methods.md §Idempotency Summary table explicitly says both M1 and M7 use the same R10 semantics (in-flight → current state; completed → prior record verbatim; absent → no dedupe). + +## Quickstart Adequacy + +- [x] CHK032 Does the quickstart cover at least one acceptance scenario from each of US1, US2, US3? [Coverage] — Quickstart §US1 covers US1 Acceptance Scenario 1; "Verify in agent surfaces" satisfies US2 Acceptance Scenarios 1–3; "US3 — Remove and recreate" + "US3 — Daemon restart" cover US3 Acceptance Scenarios 1, 2, and 3 (FR-012 protect-adopted negative case is shown inline). +- [x] CHK033 Does the quickstart exercise the daemon-restart recovery path with explicit pre- and post-restart observable state? [Coverage] — §US3 daemon restart has explicit "Confirm tmux panes still alive" step before stop, and post-start polls `app.managed_layout_detail` to observe recovery state within SC-008's 5s budget. +- [x] CHK034 Does the quickstart include negative-path edge cases (`managed_session_name_conflict`, recreate-chain-too-deep, adopted-pane protection)? 
[Coverage] — §Edge cases table covers session-name conflict, launch-immediate-exit, log-path-unreadable, scan-during-create, recreate-chain depth, FR-025 capacity, FR-026 no-cascade-kill, FR-027 concurrent recreate; FR-012 adopted-pane protection is shown in the §US3 narrative (#2 "Try to remove an adopted pane"). +- [x] CHK035 Are the quickstart's preconditions (YAML files, socket path, container availability) consistent with the constitution's `~/.config/opensoft/agenttower/` path conventions? [Consistency] — §Preconditions cites `~/.local/state/opensoft/agenttower/agenttowerd.sock` and `~/.config/opensoft/agenttower/launch_commands/...` — both match the constitution's canonical paths. + +## Newly Introduced Gaps (from plan choices) + +- [x] CHK036 Is the 5-minute pending-managed marker TTL (R5) reflected as either an FR addition or a documented assumption in spec.md, not only in research? [Gap, Research §R5 vs Spec §Assumptions] — **Resolved 2026-05-24** by spec FR-022 (post-plan review). Implementation footprint (sweep loop) deferred to `/speckit.tasks`. +- [x] CHK037 Are the operator-facing implications of the depth-16 recreate-chain bound (R4) surfaced in spec.md (e.g., as an FR or success criterion), not only in contracts/error-codes? [Gap, Research §R4 vs Spec §FR] — **Resolved 2026-05-24** by spec FR-023. +- [x] CHK038 Are the YAML configuration paths (R8/R9) referenced from spec §Assumptions, not only in research/plan? [Completeness, Research §R8/R9 vs Spec §Assumptions] — **Resolved 2026-05-24** by spec §Assumptions YAML-paths bullet + FR-024. +- [x] CHK039 Is the absence of a "cancel in-flight create-layout" operation explicitly listed as out-of-scope in spec §FR-018, not only mentioned implicitly in M6/R2? [Completeness, Gap, Spec §FR-018] — **Resolved 2026-05-24** by spec FR-018 amendment. 
+- [x] CHK040 Is the `failed_stage` taxonomy (R7) reflected in spec.md as part of FR-013 ("identify the failed stage"), or does the spec stay at the abstract "failed stage" wording? [Consistency, Research §R7 vs Spec §FR-013] — **Resolved 2026-05-24** by spec FR-013 inline enum (also rippled into SC-006 in alignment-cleanup session). +- [x] CHK041 Is the daemon-restart `recovery_reattach` failed_stage outcome reachable from any operator surface (event, list, detail), or only as an internal log entry? [Completeness, Gap, Research §R13 §Recovery vs Contracts §Events] — **Resolved 2026-05-24** by spec FR-020 amendment + SC-009. Implementation footprint (detail-surface fields, post-restart visibility ≤ 5s) deferred to `/speckit.tasks`. + +> **Amendment note 2026-05-24 (alignment cleanup):** CHK036–CHK041 closed by post-plan spec edits. Per spec §Clarifications "Session 2026-05-24 (alignment cleanup)" Q3, the implementation work implied by FR-022 (sweep loop), FR-020 (recovery outcomes in detail surface), and SC-009 (5-second post-restart visibility) is to be captured as tasks by `/speckit.tasks`; these requirements are not blocked, but their CHK closure here is a requirements-quality close, not an implementation-complete close. + +## Cross-Document Terminology Consistency + +- [x] CHK042 Is "operator" used canonically across plan.md, research.md, data-model.md, contracts/*.md, and quickstart.md (per Q15)? [Consistency] — Verified by Pass 14 terminology sweep (commit `817fb48`); no residual "user"/"developer" appearances in the operator role except spec US1's intentional "local multi-agent developer" persona line. +- [x] CHK043 Are the state enum spellings (`creating`, `ready`, `degraded`, `failed`, `removed`) identical across spec, plan, data-model, state-machine, and contracts (no `Creating` / `READY` drift)? [Consistency] — All five documents use lowercase backtick-quoted spellings; verified by grep in Pass 15. 
+- [x] CHK044 Are the new closed-set error code spellings identical across data-model.md, contracts/managed-methods.md, and contracts/error-codes.md (e.g., `managed_session_name_conflict` not `session_name_conflict`)? [Consistency] — All 12 codes use the `managed_*` prefix and lowercase snake_case across all three documents. +- [x] CHK045 Is the `failed_stage` enum spelled identically across data-model.md, state-machine.md, and research §R7 (e.g., `pane_create` vs `pane-create` vs `pane_create_failed`)? [Consistency] — All three documents use the same six tokens: `pane_create`, `launch_command`, `registration`, `log_attach`, `tmux_kill`, `recovery_reattach`. + +## Test-Plan Alignment + +- [x] CHK046 Does the `tests/contract/` list in plan.md cover every method in managed-methods.md (M1–M8)? [Coverage] — M1 → `test_managed_layout_create.py`; M6 → `test_managed_pane_remove.py`; M7 → `test_managed_pane_recreate.py`; M8 → `test_managed_promote_stub.py`. M2/M3/M4/M5 are read-only list/detail methods covered by their integration counterparts (`test_story1_*`, `test_story3_*`) which assert the response shapes; plus M3's recovery-visibility shape is exercised by `test_managed_recovery_visibility.py`. No undefined methods. +- [x] CHK047 Does the `tests/integration/` list in plan.md cover every User Story (US1/US2/US3) and the Edge Cases section? [Coverage] — `test_story1_create_standard_layout.py` (US1), `test_story2_auto_prepare_operations.py` (US2), `test_story3_lifecycle_operations.py` (US3), `test_managed_edge_cases.py` (Edge Cases). +- [x] CHK048 Does the test plan include a failure-injection harness for partial-failure and restart-recovery flows (callable from the contract-test layer)? [Coverage] — `test_managed_launch_failure.py` (immediate-exit injection), `test_managed_log_attach_failure.py` (log-path failure injection), `test_managed_recovery.py` (daemon-restart injection); the `tests/fixtures/managed_tmux_recorder.py` fixture is the common injection vehicle. 
+- [x] CHK049 Are the test fixtures (`managed_template_fixtures`, `managed_clock`, `managed_tmux_recorder`) sufficient to exercise the FR-019 serializer FIFO without race conditions in CI? [Measurability] — `test_managed_serializer.py` (already implemented; 6 tests including barrier-parallel and two-thread head-start race; commit `ab72150`) exercises FIFO + cross-container parallel via `threading.Barrier` + `time.sleep(0.005)` head-start, all under deterministic `managed_clock`. + +## Constitution Re-Check Coverage + +- [x] CHK050 Does the Principle III evidence specifically reference the argv-first launch decision (R6) and the `shlex.quote` fallback path? [Completeness] — Principle III row: "passed as argv to `tmux new-session ` / `tmux split-window `; `send-keys` is **not** used for the first-line command (research §R6). When shell context is unavoidable (operator env-merge), arguments are escaped via `shlex.quote`." +- [x] CHK051 Does the Principle IV evidence list both CLI (`managed.*`) and app (`app.managed_*`) parity, plus SQLite + JSONL durability? [Completeness] — Principle IV row: "Every action is reachable from the CLI (`managed.*` namespace mirrors `app.managed_*`). SQLite stores managed_layout / managed_pane current state; JSONL audit stores lifecycle events indefinitely (FR-021)." +- [x] CHK052 Does the Principle II evidence rule out host-only-tmux, Antigravity, mailbox adapters, and Python-thread backends? [Completeness] — Principle II row: "No host-only-tmux, no Antigravity, no Python-thread backends, no mailbox adapters. Tmux is invoked via `docker exec` through the existing FEAT-004 channel." +- [x] CHK053 Is the post-design Constitution re-check called out explicitly (not merely implied by "unchanged")? [Clarity] — Plan §Constitution Check ends with: "**Post-design re-check** (after Phase 1 below): unchanged — all gates remain green. No complexity-tracking entries required." + +--- + +## Walk closure (2026-05-25) + +53/53 items satisfied. 
Four real gaps surfaced during the walk and fixed in-place before ticking: + +1. **CHK011 / R7 / R9 / R10 / R11 / R12 / R13** — added "Alternatives considered" sections so every research item has Decision/Rationale/Alternatives. +2. **CHK012 / R1** — added the "In-pane process editing the title" edge-case paragraph; SQLite column is authoritative, tmux title is secondary signal. +3. **CHK028 / error-codes.md** — enumerated the closed set `"remove" | "recreate" | "promote_from_adopted"` for `managed_pane_illegal_transition.details.requested_action`. +4. **data-model.md §Concurrency** — corrected residual `asyncio.Lock` reference to `threading.Lock` (matches plan.md and FEAT-009 mutex pattern; the AgentTower daemon is threaded, not asyncio). diff --git a/specs/013-managed-session-lifecycle/checklists/requirements.md b/specs/013-managed-session-lifecycle/checklists/requirements.md new file mode 100644 index 0000000..5dbf01e --- /dev/null +++ b/specs/013-managed-session-lifecycle/checklists/requirements.md @@ -0,0 +1,123 @@ +# Specification Quality Checklist: Managed Session Creation and Lifecycle + +**Purpose**: Validate specification completeness and quality before proceeding to planning +**Created**: 2026-05-23 +**Closed**: 2026-05-25 (walk after `e3af4d0`) +**Feature**: [spec.md](../spec.md) + +## Content Quality + +- [x] No implementation details (languages, frameworks, APIs) +- [x] Focused on user value and business needs +- [x] Written for non-technical stakeholders +- [x] All mandatory sections completed + +## Requirement Completeness + +- [x] No [NEEDS CLARIFICATION] markers remain +- [x] Requirements are testable and unambiguous +- [x] Success criteria are measurable +- [x] Success criteria are technology-agnostic (no implementation details) +- [x] All acceptance scenarios are defined +- [x] Edge cases are identified +- [x] Scope is clearly bounded +- [x] Dependencies and assumptions identified + +## Feature Readiness + +- [x] All functional requirements have 
clear acceptance criteria +- [x] User scenarios cover primary flows +- [x] Feature meets measurable outcomes defined in Success Criteria +- [x] No implementation details leak into specification + +## Notes + +- Initial validation passed for `/speckit.clarify` and `/speckit.plan`. + +--- + +## Cross-Cutting Requirements Quality (Session 2026-05-24, Deep & Wide) + +**Purpose**: Cross-cutting requirements-quality unit tests across completeness, clarity, consistency, acceptance criteria, dependencies/assumptions, and ambiguities/conflicts. Each item tests the spec's wording, not the implementation. + +### Completeness + +- [x] CHK001 Are all functional requirements (FR-001 through FR-021) traceable to at least one user story or success criterion? [Completeness, Traceability] — Verified: FR-001→US1+SC-001, FR-002→US1+US2, FR-003→US1+US3 (label uniqueness for recreate), FR-004→US2+SC-002, FR-005→US2+SC-002, FR-006→US2+SC-003, FR-007→US1/2/3+SC-006, FR-008→US2+SC-004, FR-009→US2/3+SC-005, FR-010→US3+SC-005, FR-011→US3+SC-007, FR-012→US3+SC-005, FR-013→US1+SC-006, FR-014→US1+SC-007, FR-015→US2+SC-002, FR-016→US1, FR-017→Constitution I, FR-018→Scope bounded, FR-019→US1, FR-020→US3+SC-008, FR-021→US2/3+SC-002, FR-022/023/024/025/026/027 carry explicit `(traces to USx)` inline annotations per spec §Clarifications alignment-cleanup Q2. +- [x] CHK002 Are all success criteria (SC-001 through SC-008) traceable to at least one functional requirement? [Traceability] — SC-001→FR-001+FR-019; SC-002→FR-004+FR-005+FR-008; SC-003→FR-006; SC-004→FR-008+FR-009; SC-005→FR-010+FR-012; SC-006→FR-013; SC-007→FR-014+FR-011; SC-008→FR-020; SC-009→FR-020 amendment. +- [x] CHK003 Are all Key Entities cross-referenced by at least one functional requirement? [Completeness] — ManagedLayout→FR-001/019; ManagedPane→FR-003/004/007/010/011/014; LaunchCommandProfile→FR-002/024; LifecycleEvent→FR-015/021; AdoptedAgent→FR-012/018. 
+- [x] CHK004 Are the "standard templates" (FR-001) defined with full template schema (pane count, role per pane, label pattern, expected commands)? [Completeness] — FR-001 names two MVP templates ("1 master + 2 slaves" / "2 masters + 2 slaves"); full schema is owned by data-model.md `ManagedTemplate` + research §R8 (3-pane and 4-pane built-ins with role / capability / label_pattern / default_launch_command_ref fields). +- [x] CHK005 Are all attributes of each Key Entity enumerated, including required-vs-optional markers? [Completeness] — Spec §Key Entities is narrative-level (the requirements lens); data-model.md §Entity field reference enumerates every column with explicit NOT NULL / nullable markers (split intentional — spec stays domain-level, data-model.md owns the code-level field reference). +- [x] CHK006 Is the lifecycle state transition graph fully enumerated (every valid transition from every state, not only the states themselves)? [Completeness] — FR-007 lists the 5 states; state-machine.md owns the full graph with explicit Trigger and Validator columns for each transition + a separate "Disallowed transitions" list. +- [x] CHK007 Are dependencies on FEAT-011 enumerated with specific contract surfaces (which endpoints, which event types)? [Completeness] — Plan §Technical Context "Primary Dependencies": "FEAT-011 (`app.*` envelope, error registry, host-only gate)"; contracts/managed-methods.md §Versioning + §Envelope cite specific FEAT-011 surfaces (envelope shape, `app_contract_version`, error code registry, host-only gate). +- [x] CHK008 Are dependencies on FEAT-012 enumerated with specific UI affordances required? [Completeness] — Spec §Assumptions: "FEAT-012 provides the control panel surfaces where layout creation and managed lifecycle actions will be exposed." Plan does not elaborate UI affordances because UI is explicitly out of scope per FR-018 (control-panel UI is FEAT-012/014's domain — FEAT-013 is server-side only). 
+- [x] CHK009 Are dependencies on FEAT-003/004/006/007/008/009/010 enumerated where this feature reuses their surfaces (FR-004, FR-006, FR-008, FR-015)? [Completeness] — Plan §Technical Context enumerates each: FEAT-003 (container discovery), FEAT-004 (tmux + docker-exec), FEAT-006 (agent registration), FEAT-007 (log attachment), FEAT-008 (event pipeline + JSONL audit), FEAT-009 (safe-prompt queue / peer detection), FEAT-010 (routes catalog). +- [x] CHK010 Are out-of-scope items in FR-018 enumerated exhaustively for FEAT-013? [Completeness] — FR-018: "non-tmux agent backends, semantic task planning, cross-host orchestration, adopted-to-managed pane promotion, and cancellation of in-flight layout creation". 5 explicit out-of-scope items, exhaustive for MVP. + +### Clarity + +- [x] CHK011 Is the term "managed-created" used consistently and not interchangeably with "managed" or "AgentTower-created"? [Clarity, Consistency] — Canonical noun is "managed" (per Q15 + alignment-cleanup); "managed-created" appears only where the create-side distinction matters (SC-005); "AgentTower-created" appears only in user-facing acceptance scenario language. No drift across plan / contracts / quickstart. +- [x] CHK012 Is "pending-managed marker" defined with its lifecycle (when set, when cleared, where stored)? [Clarity, Gap] — FR-014: "set... on each pane before spawn"; research §R1: stored in tmux pane title (`@MANAGED::