diff --git a/.agents/skills/d3k/SKILL.md b/.agents/skills/d3k/SKILL.md new file mode 100644 index 00000000..4407f0e5 --- /dev/null +++ b/.agents/skills/d3k/SKILL.md @@ -0,0 +1,145 @@ +--- +name: "d3k" +description: "d3k assistant for debugging web apps" +--- + +# d3k Commands + +d3k captures browser and server logs in a unified log file. Use these commands: + +## Viewing Errors and Logs + +```bash +d3k errors # Show recent errors (browser + server combined) +d3k errors --context # Show errors + user actions that preceded them +d3k errors -n 20 # Show last 20 errors + +d3k logs # Show recent logs (browser + server combined) +d3k logs --type browser # Browser logs only +d3k logs --type server # Server logs only +``` + +## Other Commands + +```bash +d3k fix # Deep analysis of application errors +d3k fix --focus build # Focus on build errors + +d3k crawl # Discover app URLs +d3k crawl --depth all # Exhaustive crawl +``` + +## Browser Interaction + +`d3k agent-browser` auto-connects to the active session's browser via CDP: + +```bash +d3k agent-browser open http://localhost:3000/page +d3k agent-browser snapshot -i # Get element refs (@e1, @e2) +d3k agent-browser click @e2 +d3k agent-browser fill @e3 "text" +d3k agent-browser screenshot /tmp/shot.png +``` + +To target a different browser, run `d3k agent-browser connect ` first. + +## Codex Fresh Browser/Profile Startup + +Use this workflow when the user asks Codex to start d3k with a fresh browser/profile. + +1. Close any stale `agent-browser` daemon before launching with `--profile`. Otherwise `agent-browser` will reuse the existing daemon and print `--profile ignored`. + ```bash + d3k agent-browser close --all + ``` + +2. Start the app through d3k in `servers-only` mode and keep that command running. In Codex, this is more reliable than asking d3k to launch the browser itself when a fresh profile is required. 
+ ```bash + d3k --no-agent --no-skills --servers-only --command "npm run dev -- -H 127.0.0.1 -p 3000" --port 3000 --startup-timeout 90 --no-tui + ``` + + Adjust the package-manager command and port for the project. Prefer `--command` over `--script` when passing framework flags. For npm scripts, put flags after `--`; otherwise tools like Next.js can interpret the port as a project directory. + +3. Verify the server before opening more browser windows: + ```bash + curl -I http://127.0.0.1:3000 + ``` + +4. Open the fresh profile as a separate browser step: + ```bash + d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000 + ``` + +5. Sanity-check the opened page: + ```bash + d3k agent-browser get title + d3k agent-browser snapshot -i + d3k errors + ``` + +Practical rules: + +- Prefer `127.0.0.1` for this workflow. If `localhost` hangs or flips between IPv4/IPv6 behavior, do not keep retrying browser launches. +- If `curl -I` hangs, the server is wedged even if the port appears occupied; restart the d3k server process before opening a browser. +- In `servers-only` mode there is no d3k-monitored CDP browser. Use regular `d3k agent-browser` commands, not `d3k cdp-port`. +- In sandboxed agent environments, rerun local-network checks and `agent-browser` opens outside the sandbox when sandbox networking blocks access to `127.0.0.1`. + +## Browser Tool Choice + +Use `agent-browser` for browser work. + +Practical rule: + +- Need to drive the same monitored browser session: use `agent-browser`. +- Examples: + +```bash +d3k agent-browser snapshot -i +d3k agent-browser click @e2 +``` + +To make d3k prefer one locally when it launches helper browser commands, use: + +```bash +d3k --browser-tool agent-browser +``` + +## Fix Workflow + +1. `d3k errors --context` - See errors and what triggered them +2. Fix the code +3. `d3k agent-browser open ` then `d3k agent-browser click @e1` to replay +4. 
`d3k errors` - Verify fix worked + +## Creating PRs with Before/After Screenshots + +When creating a PR for visual changes, **always capture before/after screenshots** to show the impact: + +1. **Before making changes**, screenshot the production site: + ```bash + d3k agent-browser open https://production-url.com/affected-page + d3k agent-browser screenshot /tmp/before.png + ``` + +2. **After making changes**, screenshot localhost: + ```bash + d3k agent-browser open http://localhost:3000/affected-page + d3k agent-browser screenshot /tmp/after.png + ``` + +3. **Or use the tooling API** to capture multiple routes at once: + ``` + capture_before_after_screenshots( + productionUrl: "https://myapp.vercel.app", + routes: ["/", "/about", "/contact"] + ) + ``` + +4. **Include in PR description** using markdown: + ```markdown + ### Visual Comparison + | Route | Before | After | + |-------|--------|-------| + | `/` | ![Before](before.png) | ![After](after.png) | + ``` + + Upload screenshots by dragging them into the GitHub PR description. diff --git a/.agents/skills/ln-build/SKILL.md b/.agents/skills/ln-build/SKILL.md index fefbc4dc..10ae39b4 100644 --- a/.agents/skills/ln-build/SKILL.md +++ b/.agents/skills/ln-build/SKILL.md @@ -14,7 +14,7 @@ A full or light scope card from `ln-scope`, the next ready card in `memory/CARDS Extract: target behavior / objective, acceptance criteria, and verification approach. -Treat the scope card as the next implementation step inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item; the scope card is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive scope cards may land on the same branch. +Treat the scope card as the next implementation slice inside its containing `memory/PLAN.md` frontier item. 
The frontier item is the plan-level work item and Linear/branch unit; the scope-card slice is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive slices may land on the same branch. If `memory/CARDS.md` exists, treat it as a derivative execution queue, not canonical planning state. Start with the next card marked `next` or the first unfinished card in that file. If that card is already satisfied on the current branch, do **not** manufacture a no-op build commit; verify the acceptance criteria, mark the card `done` or `dropped` as appropriate, reconcile the queue, and either continue to the next honest build target or route back to `ln-scope` if no build remains. @@ -35,7 +35,7 @@ Do not invent new planning docs, scratch histories, or alternate memory location ## Serial execution mode -When several prepared cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit. +When several prepared slice cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit. Loop shape: @@ -62,18 +62,26 @@ Stop the serial loop immediately when any of these becomes true: Translate acceptance criteria into failing tests when the change benefits from them. For bugfixes or subtle seam changes, prefer one high-leverage regression test. For trivial maintenance or doc-only work, tests may be unnecessary. +Test behavior through public interfaces, not implementation details. A good test describes what capability exists and would survive internal refactoring. Avoid tests that mock internal collaborators, assert private call order, or inspect storage directly when the public interface can prove the behavior. + +Do not horizontal-slice TDD. 
Never write a batch of imagined tests first and then a batch of implementation. Use tracer bullets: one failing behavioral test → minimum code to pass → next failing behavioral test. Each new test should respond to what the previous cycle taught you. + Run the relevant checks. Confirm failures are meaningful. If the card is already green before any code change, treat that as evidence the queue item is already satisfied or stale — not as permission to create a ceremonial red/green cycle. ## Green -Write the minimum code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring. +Write the minimum coherent code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring. -No speculative abstractions. Only extract when two concrete cases force it. +Honor the repo's pre-release posture: if the current schema, fixture shape, dummy data, or terminology is wrong for the model, change it and regenerate dependent artifacts rather than preserving accidental compatibility. Delete obsolete paths in the same slice when they are inside the active seam. + +No speculative abstractions. Only extract when two concrete cases force it. Do not anticipate later tests or build shape-only scaffolding; let the current behavioral test pull the interface into existence. ## Refactor With tests green, improve names, boundaries, and obvious local structure. Do not widen scope. +Refactor only while green. Keep the tests pinned to the public behavior so they protect the slice while allowing internals to move. If refactoring reveals that the test is coupled to implementation, fix the test seam before trusting it. + ## Verify and commit Run the project's verification harness. All checks must pass. If the card proved already satisfied and no code or canonical-state change was needed, do not create an empty commit. 
@@ -93,10 +101,10 @@ After the build lands and verification passes, ask: ### If all answers are no -- Mark the work done in `memory/PLAN.md` **if it was tracked there** +- Mark the containing frontier done in `memory/PLAN.md` **if the build completed the frontier item**, usually by updating `Sequencing` / frontier status rather than moving definition blocks - Update `Recently Completed` if the plan uses it -- Do **not** add new SPEC/PLAN bookkeeping just because work happened -- If the work was non-trivial, required manual verification, or leaves residual risk, record `Done / Verified / Watch` in `memory/PLAN.md` `Recently Completed` when that watch matters beyond the current session +- Do **not** add new SPEC/PLAN bookkeeping just because a slice happened +- If the slice was non-trivial, required manual verification, or leaves residual risk that matters beyond the current session, record it in the containing frontier definition or a terse `Recently Completed` entry only when it affects frontier-level re-entry ### If any answer is yes @@ -111,8 +119,9 @@ Update only the touched traceability items. #### Update rules 1. **PLAN** - - Mark the item done if it was tracked - - If the change closes or unblocks a frontier item, reflect that in `Active`, `Next`, or `Recently Completed` + - Mark the frontier item done if this slice completed it + - If the change closes, blocks, or unblocks a frontier item, reflect that in `Sequencing`, the affected `Frontier Definitions` entry, or `Recently Completed` + - Do not mirror detailed slice/card history into `memory/PLAN.md`; keep active execution queues in `memory/CARDS.md` 2. 
**Assumptions** - evidence answered it → update to `validated` or `invalidated` diff --git a/.agents/skills/ln-diagnose/SKILL.md b/.agents/skills/ln-diagnose/SKILL.md new file mode 100644 index 00000000..f945c38e --- /dev/null +++ b/.agents/skills/ln-diagnose/SKILL.md @@ -0,0 +1,137 @@ +--- +name: ln-diagnose +description: "Scientific debugging for bugs, flakes, failures, and performance regressions. Use when something is broken, throwing, failing, slow, nondeterministic, or when the user says diagnose/debug this. Builds a trusted repro loop, tests falsifiable hypotheses, installs a regression oracle, and routes durable findings back into ln-* planning." +argument-hint: "[bug report, failing command, error, or regression description]" +--- + +# Ln Diagnose + +Debug by scientific method: trusted repro loop, falsifiable hypotheses, one-variable probes, regression oracle. Do not fix by inspection unless the cause is already proven. + +## Input + +Bug, failure, flake, or regression to diagnose: $ARGUMENTS + +Orient first: + +1. Read `memory/SPEC.md` if present; use its lexicon and live invariants. +2. Read `memory/PLAN.md` if present; identify the containing frontier item if one exists. +3. Read `HANDOFF.md` if present. +4. For runtime/UI failures, read the relevant project praxis doc before inspecting logs or driving browsers. + +Write a 2-4 bullet orientation note: symptom, suspected seam, current feedback loop, proof standard. + +## 1. Build the repro loop + +This is the skill. A fast deterministic pass/fail loop makes the rest mechanical. No loop, no diagnosis. + +Try, in rough order: + +1. failing unit/integration/e2e test at the seam that reaches the bug +2. CLI/script with fixture input and asserted output +3. HTTP/curl script against a running server +4. browser automation asserting DOM, console, or network +5. replayed artifact: request, trace, event log, fixture, HAR +6. throwaway harness around the smallest subsystem that exercises the path +7. 
property/fuzz loop for intermittent wrong output +8. bisection/differential loop across commits, versions, datasets, or configs +9. structured HITL loop only when a human must observe or click + +Improve the loop before moving on: faster, sharper assertion, less flake. Pin time, randomness, network, filesystem, and concurrency. For nondeterministic bugs, raise reproduction rate with repetition/stress until it is debuggable. + +If no loop can be built, stop. Report what you tried and ask for access, logs, traces, fixtures, timestamped recordings, or permission for temporary instrumentation. + +## 2. Reproduce the user's bug + +Run the loop. Confirm it demonstrates the reported bug, not a nearby failure. + +Capture: + +- command/script/test used +- exact symptom: error, diff, timing, screenshot, console/network evidence +- reproduction rate for flakes +- saved replay artifact, if any + +Lack of reproduction is allowed only as an explicit diagnosis result. + +## 3. Rank falsifiable hypotheses + +Generate 3-5 hypotheses before testing any one of them. Each hypothesis must predict an observation: + +```md +If [cause] is true, then [probe/change] will make [specific observation] happen. +``` + +Prefer hypotheses that distinguish seams or invariants from `memory/SPEC.md`. Show the ranking to the user when they are present; proceed if they are AFK. + +## 4. Probe one variable at a time + +Every probe maps to one prediction. Prefer debugger/REPL inspection, then targeted boundary logs, then temporary assertions/counters. + +Tag temporary instrumentation with a unique prefix like `[DEBUG-a4f2]`. Cleanup must be grep-able. Never "log everything and grep". + +Performance branch: measure first. Establish a baseline timing/profiler/query-plan signal, then bisect or compare. Do not optimize before the measurement identifies the seam. + +## 5. 
Choose the fix route + +Before coding, choose the route: + +- **Direct fix / `ln-build`** — cause is proven and the change stays inside a settled seam. +- **`ln-scope` or `ln-spec`** — the fix changes a seam, invariant, requirement, assumption, or frontier shape. +- **`ln-spike` or `ln-design`** — diagnosis answered one question but the fix shape remains uncertain. +- **`ln-review` / `ln-refactor`** — no correct regression seam exists, or architecture contributed to the bug. + +Install the regression oracle before the fix when a correct seam exists. A correct seam reproduces the real bug pattern as it occurs at the call site. Shallow tests that cannot fail for the original bug are false confidence. + +## 6. Cleanup and postmortem + +Before declaring done: + +- [ ] original repro loop no longer reproduces the bug, or non-repro is the diagnosis +- [ ] regression oracle exists and passes, or absence of a correct seam is documented +- [ ] all `[DEBUG-...]` instrumentation is removed +- [ ] throwaway harnesses are deleted or visibly temporary +- [ ] confirmed causal hypothesis is stated in the report / commit message + +Ask: what would have prevented this bug? Route missing invariants, unclear seams, weak oracles, and bad module shapes into the appropriate `ln-*` skill. + +## Canonical reconciliation + +Reconcile only durable truth: + +- New/retired assumption → update `memory/SPEC.md` §Assumptions. +- New seam-level invariant or oracle gap → update `memory/SPEC.md` or route to `ln-oracles`. +- Frontier status changed → update `memory/PLAN.md`. +- Local bug with no durable implication → no canonical update beyond tracked PLAN status. + +Do not create `CONTEXT.md`, ADRs, or alternate planning docs. Canonical docs are `memory/SPEC.md` and `memory/PLAN.md`. 
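The grep-able instrumentation cleanup in the checklist above can be checked mechanically. The `[DEBUG-a4f2]` tag follows the step-4 convention; the probe file and temp directory are illustrative stand-ins for real repo paths:

```bash
# Demonstration of a grep-able cleanup check for tagged instrumentation.
tmp=$(mktemp -d)
printf 'function f() {}\nconsole.log("[DEBUG-a4f2] state", x);\n' > "$tmp/probe.js"

grep -rl '\[DEBUG-' "$tmp"                         # lists probe.js while tagged
grep -v '\[DEBUG-' "$tmp/probe.js" > "$tmp/p.tmp"  # strip every tagged line
mv "$tmp/p.tmp" "$tmp/probe.js"

leftover=$(grep -rl '\[DEBUG-' "$tmp" | wc -l | tr -d ' ')
echo "leftover tagged files: $leftover"            # 0 once cleanup is complete
rm -rf "$tmp"
```

In a real repo, point the recursive grep at the source tree and fail the cleanup step whenever the count is nonzero.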
+ +## Output + +```md +## Diagnosis: [symptom] + +**Repro loop:** [command/script/test and reproduction rate] +**Confirmed cause:** [one sentence] +**Evidence:** [key observations] +**Fix route:** [direct fix | ln-scope | ln-build | ln-spike | ln-review | ln-refactor] +**Regression oracle:** [test/harness or why unavailable] +**Canonical updates:** [none | specific SPEC/PLAN changes needed] +``` + +## Routing + +After diagnosis, present these options to the user (use `tool-ask-question`): + +| # | Label | Target | Why | +| --- | ---------------- | ------------ | --- | +| 1 | Scope the fix | `ln-scope` | The fix needs a buildable card or durable seam update | +| 2 | Build the fix | `ln-build` | The fix is settled and ready for red-green-refactor | +| 3 | Spike deeper | `ln-spike` | A hard question remains after reproduction | +| 4 | Review structure | `ln-review` | No good seam/regression oracle exists or architecture contributed | +| 5 | Back to triage | `ln-consult` | Diagnosis changed priority or scope | + +Recommended: **2** only when cause and seam are proven; otherwise **1**. 
+ +--- +*Adapted from [mattpocock/skills/engineering/diagnose](https://github.com/mattpocock/skills/tree/main/skills/engineering/diagnose).* diff --git a/.agents/skills/ln-grill/SKILL.md b/.agents/skills/ln-grill/SKILL.md index 0c17ce77..593a5b35 100644 --- a/.agents/skills/ln-grill/SKILL.md +++ b/.agents/skills/ln-grill/SKILL.md @@ -26,7 +26,7 @@ When understanding is reached, present these options to the user (use `tool-ask- | # | Label | Target | Why | | --- | --------------- | ---------- | --------------------------------------- | | 1 | Write a spec | `ln-spec` | Understanding is sufficient for a spec | -| 2 | Plan slices | `ln-plan` | Problem is clear, needs slice breakdown | +| 2 | Plan frontier | `ln-plan` | Problem is clear, needs frontier breakdown | | 3 | Scope one slice | `ln-scope` | One slice is already obvious | Recommended: choose based on how much structure the understanding needs. diff --git a/.agents/skills/ln-handoff/SKILL.md b/.agents/skills/ln-handoff/SKILL.md index 0c5c3db1..f808b8a5 100644 --- a/.agents/skills/ln-handoff/SKILL.md +++ b/.agents/skills/ln-handoff/SKILL.md @@ -38,7 +38,7 @@ This is the critical step. 
Scan the conversation for volatile artifacts — info - **Queued scope cards** already persisted in `memory/CARDS.md` — capture only what is still volatile about them: which card is next, whether the queue is still valid, and any card-level corrections that have not been written back yet - **Plan drafts** from `ln-plan` — slice lists, ordering decisions, dependency reasoning not yet in `memory/PLAN.md` - **Design outputs** from `ln-design` — alternative module shapes considered, the chosen shape, and rejected tradeoffs -- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-slice verification approaches, acknowledged blind spots, and whether slice verification design is complete / pending / stale relative to the code +- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-frontier or per-slice verification approaches, acknowledged blind spots, and whether verification design is complete / pending / stale relative to the code - **Spike state** from `ln-spike` — the question, what was tried, partial findings, verdict if reached - **Review findings** from `ln-review` — **ALL findings, not just the one being acted on.** Review debt is critical context. Name every finding, its status (addressed / in-progress / deferred), and any remaining implications. A fresh thread that only knows about the active finding will lose track of deferred review debt. - **Refactor state** from `ln-refactor` — commit sequence, target structure, and any constraints on safe ordering diff --git a/.agents/skills/ln-oracles/SKILL.md b/.agents/skills/ln-oracles/SKILL.md index c636a97e..121e7faf 100644 --- a/.agents/skills/ln-oracles/SKILL.md +++ b/.agents/skills/ln-oracles/SKILL.md @@ -1,7 +1,7 @@ --- name: ln-oracles -description: "Design verification strategy: diagnose observability, select oracle families, map to loop tiers, surface blind spots. 
Use after ln-plan when slices need oracle design — especially for LLM, visual, or compositional work — or when verification coverage has drifted." -argument-hint: "[slices to design oracles for, or 'all' for full reassessment]" +description: "Design verification strategy: diagnose observability, select oracle families, map to loop tiers, surface blind spots. Use after ln-plan when frontier items or scoped slices need oracle design — especially for LLM, visual, or compositional work — or when verification coverage has drifted." +argument-hint: "[frontier items or scoped slices to design oracles for, or 'all' for full reassessment]" --- # Ln Oracles @@ -18,9 +18,9 @@ Read the [diagnostic framework](assets/diagnostic-framework.md) and [oracle taxo ## Input -The slices to design oracles for: $ARGUMENTS +The frontier items or scoped slices to design oracles for: $ARGUMENTS -Read `memory/SPEC.md` (invariants, assumptions, decisions, verification design) and `memory/PLAN.md` (slices, acceptance criteria). If `memory/SPEC.md` already has a §Verification Design section, this is an update -- read it as prior state to evolve, not preserve uncritically. +Read `memory/SPEC.md` (invariants, assumptions, decisions, verification design) and `memory/PLAN.md` (frontier definitions, sequencing, acceptance criteria). If `memory/SPEC.md` already has a §Verification Design section, this is an update -- read it as prior state to evolve, not preserve uncritically. ## Procedure @@ -34,7 +34,7 @@ Score **Observability**, **Reproducibility**, and **Controllability** (see the [ ### 2. Extract verification claims -From `memory/SPEC.md` invariant bundles, acceptance criteria, and `memory/PLAN.md` slice definitions -- list what must be proved. Distinguish: +From `memory/SPEC.md` invariant bundles, acceptance criteria, `memory/PLAN.md` frontier definitions, and any in-hand scope-card slices -- list what must be proved. 
Distinguish: - **Structural claims** (schema conformance, DB round-trips, type safety) -- oracle-able programmatically - **Behavioral claims** (LLM output quality, UX judgment) -- require human assessment or statistical thresholds @@ -56,9 +56,9 @@ Assign each selected oracle to inner (ms, agent-autonomous), middle (seconds-min **Grill**: For middle-loop oracles that require external resources (API calls, fixtures), ask: how will fixtures be created? What bootstraps ground truth? Is single-shot measurement sufficient or do we need multi-run variance? -### 5. Design per-slice verification approach +### 5. Design per-frontier / per-slice verification approach -For each in-scope slice in `memory/PLAN.md`, specify: which oracles apply, what they prove, and which loop tier they belong to. This becomes the `**Verification approach**` annotation on each slice. +For each in-scope frontier item in `memory/PLAN.md`, specify: which oracles apply, what they prove, and which loop tier they belong to. This becomes the `Verification` annotation in the frontier definition. If a scope-card slice is already available, add slice-level oracle notes there without promoting detailed card history into `memory/PLAN.md`. **Grill**: For each slice, ask: does this oracle strategy cover the slice's acceptance criteria? What's the gap between "oracle says pass" and "slice is actually correct"? @@ -78,15 +78,16 @@ Update `memory/SPEC.md` §Verification Design: - **Design notes** -- project-specific oracle design decisions (e.g. 
observer history projection, fixture bootstrapping strategy) - **Acknowledged Blind Spots** -- table with blind spot, reason, mitigation, and revisit trigger -Update `memory/PLAN.md` per-slice annotations: +Update `memory/PLAN.md` frontier annotations: -- Add `**Verification approach**` line to each in-scope slice with oracle family, loop tier, and cross-reference to `memory/SPEC.md` sections +- Add or refresh the `Verification` line in each in-scope frontier definition with oracle family, loop tier, and cross-reference to `memory/SPEC.md` sections +- Keep slice-level oracle detail in the current `ln-scope` card or `memory/CARDS.md` queue unless it changes the frontier definition ### Cross-reference integrity After writing, verify: - Every `memory/SPEC.md` invariant has at least one oracle assigned (inner, middle, or outer) -- Every in-scope `memory/PLAN.md` slice has a verification approach annotation +- Every in-scope `memory/PLAN.md` frontier definition has a verification approach annotation - The blind spots section is non-empty - Middle/outer loop oracles cross-reference the invariants or assumptions they prove diff --git a/.agents/skills/ln-plan/SKILL.md b/.agents/skills/ln-plan/SKILL.md index 6f645d66..93dd23a9 100644 --- a/.agents/skills/ln-plan/SKILL.md +++ b/.agents/skills/ln-plan/SKILL.md @@ -1,6 +1,6 @@ --- name: ln-plan -description: "Break a feature or project area into frontier-ordered slices and update `memory/PLAN.md`. Re-run to retire completed work, reorder priorities, or add new items." +description: "Break a feature or project area into frontier items and update `memory/PLAN.md`. Re-run to retire completed work, reorder priorities, or add new items." argument-hint: "[feature or project area to plan]" --- @@ -10,17 +10,34 @@ Plan the **rolling frontier**, not the whole historical timeline. `memory/PLAN.md` is the canonical record of what's next. `docs/archive/PLAN_HISTORY.md` is the only sanctioned archive for retired plan history. 
`memory/CARDS.md` is the sanctioned derivative queue for multiple prepared scope cards inside one frontier item; it is not canonical planning state. Do not invent other sidecar plan docs, milestone ledgers, or alternate memory locations without explicit permission. -The mature-mode shape is: +## Frontier vs slice vocabulary -- `Active` — ordered work that is open now -- `Next` — near-horizon items, loosely ordered +Use **frontier item** for a named canonical work item in `memory/PLAN.md`. Frontier items are the unit of Linear issue / Graphite branch work and should be vertical enough to establish or unlock a meaningful product or architecture step. + +Use **slice** for the buildable scope card produced by `ln-scope` and implemented by `ln-build`. A slice is often a sub-unit of one frontier item. Several slices may land on the same frontier branch. Do not turn slices into separate PLAN entries unless the frontier itself changes shape, ownership, or dependency ordering. + +The vertical-slicing instinct still applies at planning time: frontier items should cut through the relevant concerns of `memory/SPEC.md` instead of becoming layer-by-layer chores. The term "frontier" names their canonical/branch role; the term "slice" remains reserved for scoped execution. 
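The containment relationship can be sketched with hypothetical names (the frontier id and slice titles are illustrative):

```
`changeset-ledger`  <- frontier item: one PLAN entry, one Linear issue, one branch
├─ slice: persist ledger rows       (scope card -> commit)
├─ slice: replay rows into views    (scope card -> commit)
└─ slice: expose ledger in CLI      (scope card -> commit)
```

All three slices land on the `changeset-ledger` branch; none of them becomes its own PLAN entry.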
+ +## Plan document shape + +Prefer the conflict-resistant mature shape: + +- `Context` — short rolling narrative for re-entry +- `Sequencing` — small, frequently edited ordering/status references by stable frontier id +- `Frontier Definitions` — relatively stable per-frontier definitions keyed by stable id +- `Recently Completed` — last 2-3 completed frontier items only +- `Dependencies` — active / next blocking relationships by stable id only + +Within `Sequencing`, use: + +- `Active` — ordered frontier items open now +- `Next` — near-horizon frontier items, loosely ordered +- `Parallel / Low-conflict` — useful work that can proceed without disturbing the main stack - `Horizon` — future work, lightly shaped -- `Recently Completed` — last 2-3 completed items only -- `Dependencies` — active / next blocking relationships only Archive deeper history to `docs/archive/PLAN_HISTORY.md` instead of keeping it live in `memory/PLAN.md`. -Treat frontier items as branch-sized work, not commit-sized work. If one frontier item will unfold as several consecutive verified commits, keep that execution queue in `memory/CARDS.md` or in session context instead of fragmenting `memory/PLAN.md` into a commit ledger. +Treat frontier items as branch-sized work, not commit-sized work. If one frontier item will unfold as several consecutive verified slices, keep that execution queue in `memory/CARDS.md` or in session context instead of fragmenting `memory/PLAN.md` into a commit ledger. `memory/PLAN.md` may carry at most a lightweight pointer such as `current card queue: memory/CARDS.md`; detailed discretionary sub-slicing belongs in `memory/CARDS.md`. ## Input @@ -32,52 +49,74 @@ If this is a fresh thread or the frontier rationale is unclear, read `HANDOFF.md ## Planning rules +### Stable frontier ids + +Every frontier definition should have a stable lowercase id / slug. Good ids are short and semantic, e.g. `agent-fixture-substrate`, `intent-graph-semantics`, `changeset-ledger`. 
+ +Rules: + +- `Sequencing` references frontier ids; it does not duplicate definition blocks. +- `Frontier Definitions` are keyed by frontier id and should not move just because ordering changes. +- Rename a frontier id only when the identity of the work changed, not because the title improved. +- Linear issue ids belong in the definition metadata when known; they are not the only stable id. + ### Work-type awareness -Classify each item before deciding how much planning weight it needs. +Classify each frontier item before deciding how much planning weight it needs. | Work type | Planning weight | | --- | --- | -| Structural | full slice with `memory/SPEC.md` traceability | -| Bounded feature | objective + acceptance + verification for a slice; add `memory/SPEC.md` links only if durable boundaries change | +| Structural | full frontier definition with `memory/SPEC.md` traceability | +| Bounded feature | objective + acceptance + verification; add `memory/SPEC.md` links only if durable boundaries change | | Hardening | task-level objective + acceptance | | Bugfix | usually do not add to `memory/PLAN.md` unless it changes frontier priority | | Refactor | route through `ln-refactor` unless it is itself frontier work | ### Anti-fragmentation -Create a new item only when it introduces at least one of: +Create a new frontier item only when it introduces at least one of: 1. a new lifecycle seam 2. a new transport or persistence seam 3. a new workflow entry / exit behavior 4. a meaningful unblocker for forward progress +5. a distinct dependency / branch boundary that should be tracked independently Do not fragment the plan for minor action/status variants or ordinary follow-through inside a settled seam. Do not split one frontier item into several new PLAN entries just because execution will require several scope cards or commits. Only split when the frontier itself changes shape, ownership, or dependency ordering. 
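A minimal `Sequencing` block consistent with these rules, using illustrative frontier ids and statuses:

```md
## Sequencing

### Active

1. `changeset-ledger` — in-progress — persistence seam landing on branch
2. `agent-fixture-substrate` — not-started — unblocks intent-graph work

### Next

- `intent-graph-semantics`
```

Each entry is an id reference plus status; the definition blocks themselves stay put under `Frontier Definitions`.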
+### Sequencing vs definition edits + +When priorities change, edit `Sequencing` first. Do not move or rewrite frontier definitions merely to reorder work. + +When the meaning, acceptance, verification, traceability, or design-doc references of a frontier change, edit its `Frontier Definitions` entry. + +When a frontier completes, remove it from `Sequencing`, add a terse `Recently Completed` entry, and archive older completion history if needed. Keep the definition only if it still carries live rationale for nearby work; otherwise archive/retire it. + ### Epistemic horizon -If live low-confidence assumptions block downstream work, stop the plan at that boundary. Plan spikes or thinner proving steps, not fantasy certainty. +If live low-confidence assumptions block downstream work, stop the plan at that boundary. Plan spikes or thinner proving frontier items, not fantasy certainty. ## Procedure -1. Read `memory/PLAN.md` if it exists. Retire or archive stale completed material into `docs/archive/PLAN_HISTORY.md`. +1. Read `memory/PLAN.md` if it exists. Identify existing frontier ids and retire/archive stale completed material into `docs/archive/PLAN_HISTORY.md`. 2. Read `memory/SPEC.md` if it exists. Pull only the live requirements, assumptions, decisions, and invariants that still constrain forward work. 3. Explore the codebase enough to understand real boundaries. -4. Draft or revise `Active`, `Next`, and `Horizon`. -5. Add `Why now / unlocks` for `Active` or `Next` items when ordering would otherwise be opaque to a fresh thread. -6. Keep `Recently Completed` to 2-3 terse items max. Move older history to `docs/archive/PLAN_HISTORY.md`, not to handoff files or ad hoc notes. -7. Update `Dependencies` to reflect only active / next items. -8. If several commit-sized execution steps are already obvious inside one frontier item, keep them out of `memory/PLAN.md`; they belong in `memory/CARDS.md` or in the active thread as derivative execution detail. +4. 
Draft or revise `Sequencing` (`Active`, `Next`, `Parallel / Low-conflict`, `Horizon`) by stable frontier id. +5. Draft or revise `Frontier Definitions` only for new or substantively changed frontier items. +6. Add `Why now / unlocks` in a frontier definition when ordering would otherwise be opaque to a fresh thread. +7. Keep `Recently Completed` to 2-3 terse items max. Move older history to `docs/archive/PLAN_HISTORY.md`, not to handoff files or ad hoc notes. +8. Update `Dependencies` to reflect only active / next items, by frontier id. +9. If several commit-sized execution steps are already obvious inside one frontier item, keep them out of `memory/PLAN.md`; they belong in `memory/CARDS.md` or in the active thread as derivative execution detail. ## Traceability Traceability is conditional on structural significance. -- Structural items should name relevant requirements, assumptions, decisions, or invariants from `memory/SPEC.md`. +- Structural frontier items should name relevant requirements, assumptions, decisions, or invariants from `memory/SPEC.md`. - Bounded features and hardening tasks only need SPEC links if they change durable boundaries or depend on a live assumption. +- Scope-card slices inherit traceability from their containing frontier unless `ln-scope` discovers a durable change that must promote back into SPEC/PLAN. 
## Output @@ -89,7 +128,7 @@ After writing the plan, present these options to the user (use `tool-ask-questio | # | Label | Target | Why | | --- | ----------------- | ------------ | --- | -| 1 | Scope next item | `ln-scope` | The frontier is clear and ready to scope | +| 1 | Scope next slice | `ln-scope` | The frontier is clear and ready to scope | | 2 | Design oracles | `ln-oracles` | Verification design needs explicit work | | 3 | Grill it more | `ln-grill` | Planning surfaced unresolved product questions | | 4 | Back to triage | `ln-consult` | Direction needs reassessment | diff --git a/.agents/skills/ln-plan/assets/plan-template.md b/.agents/skills/ln-plan/assets/plan-template.md index b97405a3..e5083836 100644 --- a/.agents/skills/ln-plan/assets/plan-template.md +++ b/.agents/skills/ln-plan/assets/plan-template.md @@ -2,39 +2,62 @@ Created by ln-plan · Read by all skills · Updated by ln-build, ln-sync, and ln-spike. Authority: active frontier, near-horizon ordering, and dependencies that still matter. + Frontier item = canonical plan/Linear/branch unit. + Slice = scoped execution unit from ln-scope/ln-build, often inside one frontier. + Keep this file light. Archive older completed work to docs/archive/PLAN_HISTORY.md. - Only Active / Next items should usually carry detailed traceability. + Edit Sequencing for ordering/status churn; keep Frontier Definitions relatively stable. Do not spread retired work history across handoff files, refactor plans, or ad hoc status notes. --> # Plan -## Active +## Context + +[Short rolling narrative for fresh-thread re-entry: where the product/initiative stands, which arc is active, and what the next coordination bottleneck is.] + +## Sequencing + +### Active + +1. `[frontier-id]` — [status: not-started|in-progress|branch-complete|blocked] — [one-line current state] -1. 
**[Item name]** — [structural | bounded feature | hardening | bugfix] `[status: not-started|in-progress]` - - Objective: [what this work changes] - - Why now / unlocks: [why this is on the frontier now] - - Acceptance: [observable outcome] - - Verification: [inner / middle / outer summary] - - Traceability: [→ SPEC.md requirement / assumption / decision / invariant if needed] +### Next -## Next +1. `[frontier-id]` — [why it follows the active work] -1. **[Item name]** — [why it follows the active work] - - Why now / unlocks: [what this prepares or depends on] +### Parallel / Low-conflict -## Horizon +- `[frontier-id]` — [why it can proceed independently] -- [Future item, intentionally loose] +### Horizon + +- `[frontier-id]` — [future item, intentionally loose] + +## Frontier Definitions + +### frontier-id + +- **Name:** [Human-readable frontier name] +- **Linear:** [FE-XXX if known, or `unassigned`] +- **Kind:** [structural | bounded feature | hardening | bugfix | refactor] +- **Status:** [not-started | in-progress | branch-complete | blocked | done] +- **Objective:** [what this frontier changes] +- **Why now / unlocks:** [why this belongs on the frontier and what it unlocks] +- **Acceptance:** [observable frontier-level outcome] +- **Verification:** [inner / middle / outer summary] +- **Traceability:** [→ SPEC.md requirement / assumption / decision / invariant if needed] +- **Design docs:** [links if relevant] +- **Current execution pointer:** [optional: `memory/CARDS.md` or next intended scope card; omit when not needed] ## Recently Completed -- [YYYY-MM-DD] [item] — Done: [shipped outcome]. Verified: [command / manual step]. Watch: [residual risk or none]. -- [YYYY-MM-DD] [item] — Done: [shipped outcome]. Verified: [command / manual step]. Watch: [residual risk or none]. +- [YYYY-MM-DD] `[frontier-id]` — Done: [shipped outcome]. Verified: [command / manual step]. Watch: [residual risk or none]. +- [YYYY-MM-DD] `[frontier-id]` — Done: [shipped outcome]. 
Verified: [command / manual step]. Watch: [residual risk or none]. Older history: `docs/archive/PLAN_HISTORY.md` ## Dependencies -``` -[ASCII diagram of blocking relationships among Active / Next items] +```text +[ASCII diagram of blocking relationships among Active / Next frontier ids] ``` diff --git a/.agents/skills/ln-prototype/SKILL.md b/.agents/skills/ln-prototype/SKILL.md new file mode 100644 index 00000000..a6ce915d --- /dev/null +++ b/.agents/skills/ln-prototype/SKILL.md @@ -0,0 +1,106 @@ +--- +name: ln-prototype +description: "Throwaway design probe for logic, state models, UI variations, and affordances before production work. Use when the user wants to prototype, sanity-check a model, make something playable, compare UI directions, or explore a design before ln-spec/ln-plan/ln-scope." +argument-hint: "[prototype question or design uncertainty]" +--- + +# Ln Prototype + +A prototype is a disposable answer to one design question. Keep the verdict, not the artifact. + +Use `ln-prototype` when the question needs feel, play, or comparison. Use `ln-spike` when the question is technical feasibility or unknown API behavior. + +## Input + +Prototype question or design uncertainty: $ARGUMENTS + +Orient first: + +1. Read `memory/SPEC.md` if present; use its lexicon and live invariants. +2. Read `memory/PLAN.md` if present; identify whether the prototype serves an existing frontier item. +3. Read `HANDOFF.md` if present. +4. Inspect nearby code only enough to place the prototype where it is understandable and runnable. + +Write a 2-4 bullet orientation note: question, prototype branch, nearest seam/page/module, answer-capture path. + +## Choose one branch + +Ask if ambiguous and the user is present; otherwise state the assumption. + +### Logic prototype + +Use for state, transition, reducer, parser, planner, or workflow questions. Build a tiny interactive terminal app or CLI harness around a portable logic module. 
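As an illustration only — the counter domain, the names, and the choice of Python below are assumptions, not taken from any real slice — a minimal logic prototype of this shape might look like:

```python
from dataclasses import dataclass, replace

# PROTOTYPE — delete or absorb after verdict.
# Hypothetical counter domain: the reducer is pure, (state, action) -> state,
# and knows nothing about prompts, terminal escape codes, or stdout.

@dataclass(frozen=True)
class State:
    count: int = 0

def reduce_state(state: State, action: dict) -> State:
    """Pure transition function over plain data."""
    if action["type"] == "increment":
        return replace(state, count=state.count + 1)
    if action["type"] == "reset":
        return replace(state, count=0)
    return state  # unknown actions leave state unchanged

# Thin shell: replay scripted actions and surface state after each step.
if __name__ == "__main__":
    state = State()
    for action in [{"type": "increment"}, {"type": "increment"}, {"type": "reset"}]:
        state = reduce_state(state, action)
        print(action["type"], "->", state)
```

Only the shell prints; the reducer stays portable, so the same module can later back a real UI or be deleted with the prototype.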
+ +Good shapes: + +- pure reducer: `(state, action) => state` +- explicit state machine with named states and legal transitions +- small pure functions over plain data +- state-owning module/class only when internal ongoing state is the question + +Keep the shell thin. The logic must not know about prompts, terminal escape codes, stdout, or UI widgets. + +### UI prototype + +Use for layout, interaction, navigation, approval/recovery, inspection, or comparison questions. + +Generate several meaningfully different variants in one local route/page/screen, switchable by URL search param or floating switcher. Prefer adapting an existing page over inventing a playground. Variants should differ by design bet, not skin: name the bet each variant tests. + +## Prototype discipline + +1. **Throwaway from day one.** Name files/routes with `prototype`, `scratch`, or equivalent. Add: `PROTOTYPE — delete or absorb after verdict`. +2. **Near the real seam.** Keep context obvious; avoid public exports unless needed to run it. +3. **One command to run.** Use the repo's task runner and record the exact command. +4. **No persistence by default.** Use memory. If persistence is the question, use clearly wipeable scratch storage. +5. **No production polish.** Skip comprehensive tests, abstractions, analytics, and hardening beyond safe evaluation. +6. **Surface state.** After each logic action or UI variant switch, show relevant inputs, outputs, and state. +7. **One question only.** New questions become follow-up prototypes, spikes, or scope cards. + +## Capture the verdict + +```md +## Prototype Verdict: [question] + +**Branch:** logic | UI +**Command:** [how to run] +**What we tried:** [variants/actions/cases] +**Verdict:** [decision or remaining uncertainty] +**Absorb:** [what production code/spec/plan should inherit] +**Delete:** [prototype files/routes/storage to remove] +``` + +Durability routing: + +- Requirements, assumptions, invariants, or lexicon changed → `ln-spec`. 
+- Sequencing or frontier changed → `ln-plan`. +- One implementation slice is now obvious → `ln-scope`. +- Human judgment remains pending → record volatile state in `HANDOFF.md`. + +Do not create `CONTEXT.md`, ADRs, or alternate planning docs. Canonical docs are `memory/SPEC.md` and `memory/PLAN.md`. + +## Cleanup + +Finish by stating one of: + +- deleted prototype files +- kept prototype temporarily, with reason and deletion trigger +- absorbed prototype into production through a scoped build + +If prototype files remain, they must be visibly non-production and easy to find. + +## Routing + +After the verdict, present these options to the user (use `tool-ask-question`): + +| # | Label | Target | Why | +| --- | -------------- | ------------ | --- | +| 1 | Revise spec | `ln-spec` | Prototype changed durable understanding | +| 2 | Revise plan | `ln-plan` | Prototype changed sequencing or frontier shape | +| 3 | Scope a slice | `ln-scope` | Prototype answered enough to build | +| 4 | Spike instead | `ln-spike` | The remaining question is technical feasibility | +| 5 | Back to triage | `ln-consult` | Prototype did not settle direction | + +Recommended: **3** when the prototype produced a concrete build direction; **1** when it changed the model. + +--- +*Adapted from [mattpocock/skills/engineering/prototype](https://github.com/mattpocock/skills/tree/main/skills/engineering/prototype).* diff --git a/.agents/skills/ln-review/SKILL.md b/.agents/skills/ln-review/SKILL.md index 67d57842..6fefc8fb 100644 --- a/.agents/skills/ln-review/SKILL.md +++ b/.agents/skills/ln-review/SKILL.md @@ -8,6 +8,8 @@ argument-hint: "[area of codebase to review, or 'recent' for recently changed fi Explore the codebase. Surface structural improvement opportunities. Be opinionated. +Use the repo's pre-release posture: reward conceptual clarity over compatibility scaffolding, and treat unnecessary preservation as review debt. 
Look for stale code, obsolete fixtures, legacy terms, and compatibility paths that should be deleted rather than protected. + ## Input What to review: $ARGUMENTS @@ -18,13 +20,19 @@ If "recent" or unspecified, focus on recently modified files. Apply Ousterhout's depth test: modules should have small interfaces hiding significant complexity. Modules that move together should live together — clusters of small files always used in concert are a single deep module waiting to be extracted. +Use the deletion test for suspected shallow modules: if deleting the module makes complexity vanish, it was pass-through structure; if the same complexity reappears across multiple callers, the module was earning its keep. Prefer depth as leverage/locality, not line-count ratio. + +Treat the interface as the test surface. If callers or tests must reach past the interface to verify important behavior, the module shape is probably wrong. A good seam lets tests and callers cross the same public boundary. + +Apply seam discipline: one adapter usually means a hypothetical seam; two adapters make a real seam. Flag indirection introduced only for imagined future variation, especially when it spreads configuration, mocks, or ordering knowledge into callers. + Check the functional core / imperative shell boundary (Gary Bernhardt, "Boundaries"). Pure functions should stay pure. Flag when a pure function has acquired side effects or a growing parameter list — it has drifted into shell territory. Make invalid states unrepresentable (Yaron Minsky). Split optional fields into distinct types. Use branded types for domain-distinct values. ### Oracle coverage -If `memory/SPEC.md` §Oracle Strategy by Loop Tier exists, check whether recent slices implemented the oracles their persisted `memory/PLAN.md` verification approaches declare. If a full or light scope card is available in session context, use it as a higher-resolution supplement, not the primary source of truth. 
Look for: +If `memory/SPEC.md` §Oracle Strategy by Loop Tier exists, check whether recent work implemented the oracles declared by the relevant `memory/PLAN.md` frontier definition. If a full or light scope card is available in session context, use it as a higher-resolution slice supplement, not the primary source of truth. Look for: - Scope card promised schema validation → is there a Zod parse in the test? - Scope card promised differential oracle → are there golden master fixtures? @@ -51,7 +59,7 @@ Present findings as numbered candidates: ```md ## Review: [area] -1. **[Description]** — [category: depth|naming|model|coupling] — [impact: low|medium|high] +1. **[Description]** — [category: depth|naming|model|coupling|seam|oracle-coverage] — [impact: low|medium|high] [1-2 sentence explanation and suggested action] 2. ... diff --git a/.agents/skills/ln-scope/SKILL.md b/.agents/skills/ln-scope/SKILL.md index 753b342d..1befe791 100644 --- a/.agents/skills/ln-scope/SKILL.md +++ b/.agents/skills/ln-scope/SKILL.md @@ -13,6 +13,8 @@ Define **one** buildable scope card. The card always describes one slice, but it If the target behavior needs "and", split it. +Apply the repo's pre-release posture while scoping: prefer correcting the model and regenerating fixtures over preserving accidental compatibility, unless live docs or the user require migration support. Include deletion/retirement work in the slice when obsolete code, data, or terminology would otherwise linger. + ## Input The behavior to deliver: $ARGUMENTS @@ -21,9 +23,9 @@ Orient before weighting. If `memory/SPEC.md` exists, use its lexicon and respect its live invariants. -If `memory/PLAN.md` exists, check whether the named work is already in `Active`, `Next`, or `Horizon`. +If `memory/PLAN.md` exists, check whether the named work is already represented as a frontier item in `Sequencing` (`Active`, `Next`, `Parallel / Low-conflict`, or `Horizon`) and `Frontier Definitions`. 
-Treat the containing `memory/PLAN.md` frontier item as the Linear-issue / branch boundary. Here, a frontier item means the plan-level work item itself, not the scope card you are about to write. Your scope card may narrow that frontier item into the next buildable sub-slice, but scope-card granularity alone does **not** imply a new issue or branch. Only route to `ln-plan` for new plan items when the frontier itself must be split or reordered. +Treat the containing `memory/PLAN.md` frontier item as the Linear-issue / branch boundary. Here, a frontier item means the canonical plan item, preferably keyed by a stable frontier id in `Frontier Definitions`, not the scope card you are about to write. Your scope card may narrow that frontier item into the next buildable slice, but scope-card granularity alone does **not** imply a new issue or branch. Only route to `ln-plan` for new frontier items when the frontier itself must be split or reordered. If this is a fresh thread or an unfamiliar area, also read `HANDOFF.md` if present. Read `docs/archive/PLAN_HISTORY.md` only if the frontier rationale or touched area is still unclear. @@ -31,7 +33,7 @@ Write a 2-4 bullet orientation note naming the containing seam, the relevant fro Do not create new planning documents or scratch scope files without explicit permission. The canonical planning state remains `memory/SPEC.md` and `memory/PLAN.md`. The sanctioned derivative exception is `memory/CARDS.md`, which may hold several prepared scope cards for one frontier item while that execution queue is still live. -If scoping reveals that one frontier item needs multiple sequential sub-slices, keep them nested under that same frontier item unless the plan-level frontier must change. Do not silently turn sub-slices into separate tracker / branch work items. +If scoping reveals that one frontier item needs multiple sequential slices, keep them nested under that same frontier item unless the plan-level frontier must change. 
Do not silently turn slices into separate tracker / branch work items. ## Prepared card queue @@ -159,7 +161,7 @@ Canonical reconciliation is **mandatory**; durable updates are **conditional**. - Full scope card: update `memory/SPEC.md` / `memory/PLAN.md` as needed during or after scoping. - Light scope card: run the promotion checklist explicitly. If it stays light, canonical reconciliation may be a no-op; if it promotes, reconcile the durable change before build. -- Multi-card queue: keep the queue itself in `memory/CARDS.md`, but do not mirror those queued cards into `memory/PLAN.md` unless the frontier item itself changes. +- Multi-card queue: keep the queue itself in `memory/CARDS.md`, but do not mirror those queued slice cards into `memory/PLAN.md` unless the frontier item itself changes. At most, add a lightweight `Current execution pointer` in the frontier definition. When adding or updating an assumption, apply the same-item test first: diff --git a/.agents/skills/ln-spec/SKILL.md b/.agents/skills/ln-spec/SKILL.md index 103912f9..3ef76204 100644 --- a/.agents/skills/ln-spec/SKILL.md +++ b/.agents/skills/ln-spec/SKILL.md @@ -32,11 +32,11 @@ Write or update `memory/SPEC.md` following the [spec template](assets/spec-templ ### Verification Design boundary -ln-spec owns the **inner loop** of verification design: verification commands, verification policy, and inner-loop oracle items (type checks, fast unit tests, linting). Middle and outer loop oracle strategy, diagnostic assessment, and blind spots are owned by `ln-oracles`. Not every slice requires a full oracle-design pass, but slices involving LLM behavior, visual rendering, or compositional/system-level claims should route through `ln-oracles` before implementation. When writing or updating §Verification Design, preserve any content written by ln-oracles (§Verification Stance, §Diagnostic Assessment, §Oracle Strategy middle/outer tiers, §Design notes, §Acknowledged Blind Spots). 
+ln-spec owns the **inner loop** of verification design: verification commands, verification policy, and inner-loop oracle items (type checks, fast unit tests, linting). Middle and outer loop oracle strategy, diagnostic assessment, and blind spots are owned by `ln-oracles`. Not every scoped slice requires a full oracle-design pass, but frontier items or slices involving LLM behavior, visual rendering, or compositional/system-level claims should route through `ln-oracles` before implementation. When writing or updating §Verification Design, preserve any content written by ln-oracles (§Verification Stance, §Diagnostic Assessment, §Oracle Strategy middle/outer tiers, §Design notes, §Acknowledged Blind Spots). ### Traceability -If `memory/PLAN.md` exists, verify that changed assumptions and decisions still align with affected slices. If it does not exist yet, close the reference chain as far as current artifacts allow: assumptions should still name dependent decisions and validation approaches, and slice links can be added later by `ln-plan`. +If `memory/PLAN.md` exists, verify that changed assumptions and decisions still align with affected frontier items. If it does not exist yet, close the reference chain as far as current artifacts allow: assumptions should still name dependent decisions and validation approaches, and frontier links can be added later by `ln-plan`. ### Weight management @@ -66,9 +66,9 @@ Large cleanup is `ln-sync` work. When writing or patching, keep the touched area Every amendment must close its reference chain as far as the current lifecycle stage allows. 
After editing, verify: -- **New assumption** → has: dependent decision(s), validation approach, and implicated slice(s) in `memory/PLAN.md` **if `memory/PLAN.md` already exists** +- **New assumption** → has: dependent decision(s), validation approach, and implicated frontier item(s) in `memory/PLAN.md` **if `memory/PLAN.md` already exists** - **New decision** → has: dependent assumption(s), supersession note -- **New invariant** → has: establishing slice in `memory/PLAN.md` **if known**, protecting test (or `manual (outer loop)`), proved decision +- **New invariant** → has: establishing frontier item in `memory/PLAN.md` **if known** (or scoped slice if already defined), protecting test (or `manual (outer loop)`), proved decision - **New constraint** → has: rationale for exclusion - **New inner-loop oracle item** → names the invariant(s) it protects @@ -78,7 +78,7 @@ After filing the spec, present these options to the user (use `tool-ask-question | # | Label | Target | Why | | --- | ---------------- | ------------- | ------------------------------------------------- | -| 1 | Plan slices | `ln-plan` | Spec is complete, break it into slices | +| 1 | Plan frontier | `ln-plan` | Spec is complete, break it into frontier items | | 2 | Design oracles | `ln-oracles` | Spec needs middle/outer loop verification design | | 3 | Grill it more | `ln-grill` | Spec has gaps that need deeper understanding | | 4 | Back to triage | `ln-consult` | Direction needs reassessment | diff --git a/.agents/skills/ln-spec/assets/spec-template.md b/.agents/skills/ln-spec/assets/spec-template.md index bd362240..cf142dd4 100644 --- a/.agents/skills/ln-spec/assets/spec-template.md +++ b/.agents/skills/ln-spec/assets/spec-template.md @@ -3,7 +3,7 @@ Authority: requirements, constraints, assumptions, decisions, invariants, domain language, verification strategy. When re-running ln-spec: read this file first, preserve existing content, evolve sections that need change. 
- Cross-referenced by PLAN.md slices and spikes via §-prefixed section links. + Cross-referenced by PLAN.md frontier items and scoped slices via §-prefixed section links. Together with PLAN.md, this is the only canonical planning state; do not create sidecar spec ledgers without explicit permission. --> # [Project Name] @@ -19,7 +19,7 @@ ## Requirements + Each numbered for cross-reference from PLAN.md frontier items / scoped slices. --> 1. [Requirement] 2. ... @@ -28,15 +28,15 @@ + When invalidated: record in §Decisions, flag implicated frontier items in PLAN.md. --> -| # | Assumption | Confidence | Status | Dependent decisions | Implicated slices | Validation approach | +| # | Assumption | Confidence | Status | Dependent decisions | Implicated frontier items | Validation approach | | --- | ------------ | --------------- | -------------------------- | ------------------- | ----------------- | ------------------- | -| A1 | [hypothesis] | low/medium/high | open/validated/invalidated | [→ §Decisions #N] | [→ PLAN.md slice] | [how to falsify] | +| A1 | [hypothesis] | low/medium/high | open/validated/invalidated | [→ §Decisions #N] | [→ PLAN.md frontier id] | [how to falsify] | ## Decisions @@ -52,7 +52,7 @@ Once established, must not regress. Each links to the decision it proves and the tests that protect it. Established by ln-build/ln-spike traceability. - Referenced by PLAN.md slices (to establish / to respect). --> + Referenced by PLAN.md frontier items or scoped slices (to establish / to respect). 
--> | # | Invariant | Established by | Protected by | Proves | | --- | -------------- | -------------- | ------------ | ----------------- | @@ -66,13 +66,13 @@ | Term | Definition | | --------------- | --------------------------------------------------------------------------------------------- | -| **assumption** | A falsifiable belief accepted as true; tracked with confidence and status, linked to decisions and slices | +| **assumption** | A falsifiable belief accepted as true; tracked with confidence and status, linked to decisions and frontier items / scoped slices | | **decision** | A recorded choice that resolves a question; ordered, with supersession chain | | **invariant** | A structural property proven by implementation and protected by tests; must not regress | | **requirement** | A capability the system must provide | | **slice** | A thin end-to-end tracer-bullet path through all integration layers | | **spike** | A time-boxed throwaway investigation to answer one hard question | -| **phase** | A temporal grouping of slices and spikes in PLAN.md | +| **phase** | A temporal grouping of frontier items / scoped slices and spikes in PLAN.md | | **[Term]** | [Definition] | ## Verification Design @@ -94,7 +94,7 @@ ### Verification Policy - + P1[Path 1: deterministic] + M[Knowledge item changes
(direct edit, changeset apply,
review acceptance)] --> P1[Path 1: deterministic] M --> P2[Path 2: observer pass] P1 --> KE[Look up existing
knowledge_edges
(depends_on, derived_from,
constrains, refines, verifies)] KE --> RE1[Insert reconciliation_need
per affected pair
kind = 'supersedes' / 'needs_confirmation'] @@ -280,13 +293,13 @@ flowchart TD ### 5.2 Resolution -When the queue is resolved (by user, agent, or both), the matching `reconciliation_need` rows transition `open → resolved` and pick up a `resolved_at` timestamp. The actual resolution moves — accept a proposed change set, edit the target item, mark the issue irrelevant — produce knowledge-item mutations and (in time) patches. Those are not modelled here; they go through the same paths everything else does. +When the queue is resolved (by user, agent, or both), the matching `reconciliation_need` rows transition `open → resolved` and pick up a `resolved_at` timestamp. The actual resolution moves — accept a proposed changeset, edit the target item, mark the issue irrelevant — produce knowledge-item mutations and, once FE-701 lands, changesets. Those are not modelled here; they go through the same paths everything else does. ### 5.3 What this is *not* - Not a workflow state. Reconciliation is a graph signal, not a phase. `phase_outcome` is the workflow state primitive and is unchanged. -- Not a patch. `reconciliation_need` records *that* an issue exists; it does not describe *what* should change. The proposed change is a separate artefact: today in-memory in the patch-list UI, durable in the patch ledger when it lands. -- Not an audit log of edits. `turn_knowledge_item` and (later) the patch ledger own that. +- Not a changeset. `reconciliation_need` records *that* an issue exists; it does not describe *what* should change. The proposed change is a separate artifact: historically in-memory in the patch-list UI, durably in the changeset ledger when it lands. +- Not an audit log of edits. `turn_knowledge_item` and (later) the changeset ledger own that. ## 6. Migration @@ -304,7 +317,7 @@ Drizzle / SQLite. One ordered migration, columns added before the dependent colu - Backfill: for each spec, set `primary_chat_id` to the interview chat created in step 1. 4. 
**0017_reconciliation_need.sql**
   - Create `reconciliation_need` table with the partial unique index from §3.4.
-   - Include `caused_by_turn_id` now and nullable `caused_by_patch_id` as a patch-ledger placeholder.
+   - Include `caused_by_turn_id` now, plus a nullable `caused_by_patch_id` that keeps its historical name as a placeholder for the future changeset ledger.

Code changes paired with migrations:

@@ -321,13 +334,13 @@ No data loss. Every existing turn lands inside the interview chat of its spec; e

## 7. Out of scope (acknowledged adjacents)

-- **Patch ledger.** Typed semantic patches with `prev_value` / `value` and explicit provenance, replacing the in-memory patch-list model. This RFC creates room for the ledger by separating chat from spec, but does not introduce the ledger itself.
+- **Changeset ledger.** Typed semantic changesets with before/after values and explicit provenance, replacing the in-memory patch-list model. This RFC creates room for the ledger by separating chat from spec, but does not introduce the ledger itself.
 - **Phase routes / phase as primary UI concept.** The second meeting agreed phase should de-emphasise as UI but stay as a background signal for prompting. UI work is its own RFC; the data model here keeps `turn.phase` exactly as-is.
 - **Ontology sharpening (`invariant`, `example` as `knowledge_item.kind`).** Discussed in `memory/SPEC.md` D134 and D136. Pure ontology change; no impact on the chat / reconciliation substrate.
-- **Decision shape rework.** The meeting concluded a decision should capture both *chosen* and *not chosen*, and that the `option` table can probably go away in favour of in-turn data. Both moves belong with the patch-ledger work; today's `option` table stays.
+- **Decision shape rework.** The meeting concluded a decision should capture both *chosen* and *not chosen*, and that the `option` table can probably go away in favour of in-turn data. Both moves belong with changeset-ledger / decision-shape work; today's `option` table stays.
- **Phase outcome enum redesign.** The meeting flagged the `proposed | confirmed | superseded` enum as "find a better idea". Out of scope; `phase_outcome` is unchanged. -- **Reconciliation agent loop.** Who reads `reconciliation_need` rows, in what order, how it presents review sets. Substrate is ready; the agent design is a separate RFC. -- **Side-chat UI changes for multi-thread.** Today may ship a single side-chat-per-spec through an in-memory patch-list surface; the `chat` table accommodates many but the UI can continue to render one until persistent side-chat UX catches up. User-surface version labels from older UI design docs are independent of substrate Phase 1 / 2 / 3 / 4 — see §10 for the mapping. +- **Reconciliation runtime.** Who reads `reconciliation_need` rows, in what order, how it presents review affordances, and how accepted resolutions become changesets. Substrate is ready; the runtime design is in `CONVERSATIONAL_WORKSPACE_RUNTIME.md`. +- **Side-chat UI changes for multi-thread.** Historical UI could ship a single side-chat-per-spec through an in-memory patch-list surface; the `chat` table accommodates many but the future user surface may be child chats, a thread table, or UI-rendered in-stream threads. User-surface version labels from older UI design docs are independent of substrate Phase 1 / 2 / 3 / 4 — see §10 for the historical mapping. ## 8. Verification stance @@ -346,10 +359,10 @@ Manual: spin up an existing spec database (a current `.brunch/` fixture), run mi ## 9. Open questions - **`turn.specification_id` retention.** Phase 1 intentionally keeps it as a softer migration: existing spec-scoped reads keep working while new writes populate `chat_id` and assertions prove both pointers agree. The end-state cleanup should drop it once hot paths and tests read ownership through `chat_id`, unless profiling proves the denormalized field pays for itself. 
-- **Side-chat `chat.parent_turn_id` or anchor item.** A side-chat is started *from* a graph item. Should the `chat` row record the anchor item id? Default proposal: don't model it on `chat`; use a later `chat_focus` table when durable focus is wanted. +- **Side-chat/thread focus.** A side conversation is started *from* a graph item. Should focus live on `chat`, a later `chat_focus` table, a new thread table, or thread-context rows? Historical default: don't model it on `chat`; current runtime synthesis leaves this to the thread/context substrate decision. - **Reconciliation `reason` shape.** Free text in V1. Once the reconciliation agent ships, `reason` may want to be structured (template id + slots). Default proposal: stay free-text until the agent design forces a shape. - **Reconciliation cascade-on-resolve.** When a `supersedes` need resolves, does that ever fan out into new reconciliation needs (because the resolution itself is a mutation)? Yes — and that is exactly the reentrancy point Lu flagged in the second meeting. Substrate already handles it: any mutation re-runs path 1 + path 2. The agent decides whether to bundle resolution into one review set or accept a follow-up cycle. No substrate change needed. -- **`option` table fate.** Meeting tentatively concluded the table can go away in favour of in-turn data. Out of scope here; tracked alongside the patch-ledger / decision-shape work. +- **`option` table fate.** Meeting tentatively concluded the table can go away in favour of in-turn data. Out of scope here; tracked alongside changeset-ledger / decision-shape work. - **`phase_outcome` enum redesign.** Tracked alongside the de-emphasise-phase-as-UI RFC. - **Multiple `reconciliation_need.kind`s for one pair.** The partial unique index gates only same-kind same-direction. A single source change could legitimately produce both `supersedes` *and* `needs_confirmation` against the same target; allowed by design. Confirm this is intended. 
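As a sketch of that same-kind, same-direction gating — the table shape, column names, and `status = 'open'` predicate here are illustrative assumptions, not the RFC's actual schema:

```python
import sqlite3

# Hypothetical miniature of the reconciliation_need table; the real columns differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reconciliation_need (
  id INTEGER PRIMARY KEY,
  source_id TEXT NOT NULL,
  target_id TEXT NOT NULL,
  kind TEXT NOT NULL CHECK (kind IN ('supersedes', 'needs_confirmation')),
  status TEXT NOT NULL DEFAULT 'open',
  resolved_at TEXT
);
-- Partial unique index: at most one OPEN need per (source, target, kind).
CREATE UNIQUE INDEX uq_open_need
  ON reconciliation_need (source_id, target_id, kind)
  WHERE status = 'open';
""")

ins = "INSERT INTO reconciliation_need (source_id, target_id, kind) VALUES (?, ?, ?)"
conn.execute(ins, ("A1", "D3", "supersedes"))
conn.execute(ins, ("A1", "D3", "needs_confirmation"))  # different kind: allowed by design

try:
    conn.execute(ins, ("A1", "D3", "supersedes"))  # duplicate open same-kind need: rejected
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

# Resolving the open need frees the slot: the index only gates open rows.
conn.execute("""
UPDATE reconciliation_need
SET status = 'resolved', resolved_at = datetime('now')
WHERE source_id = 'A1' AND target_id = 'D3' AND kind = 'supersedes'
""")
conn.execute(ins, ("A1", "D3", "supersedes"))  # allowed again after resolution
```

The final insert succeeds only because the earlier `supersedes` row has left the `open` state, which is the reentrancy the cascade-on-resolve bullet relies on: a resolution mutation may legitimately open a fresh need against the same pair.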
@@ -359,15 +372,15 @@ Manual: spin up an existing spec database (a current `.brunch/` fixture), run mi | Phase | Substrate state | Enables (user-surface mapping) | |---|---|---| -| **Phase 1** *(this RFC)* | `chat` table; nullable `turn.chat_id`; `specification.primary_chat_id`; mirrored `chat.active_turn_id`; `reconciliation_need` table with lightweight provenance placeholders. Backfill migrations. New writes populate both legacy and chat pointers. No user-visible change: still one chat per spec, still one rope per chat, side-chat can continue to use an in-memory patch-list surface. | Foundation. Existing side-chat / graph-edit surfaces can ship against today's mutation paths regardless of order. Hard-edit cascade gets a clean reshape once it reads from `reconciliation_need` rather than ad-hoc REVISIT state. Persistent multi-thread side-chat and the architect loop become shippable without waiting on the full patch ledger. | -| **Phase 2** | Side-chat persistence: side-chat threads write `chat` rows with `kind = 'side_chat'` and persist their turns. Multiple side-chats per spec become possible at the data layer. | Persistent side-chat history and old-thread UI can activate. | -| **Phase 3** | Reconciliation agent loop reads `reconciliation_need` queue, presents review sets through the same patch list as the side-chat. | Side-chat V3 hard-edit cascade ships against the reconciliation agent (replaces the REVISIT modal). Architect loop's review surface inherits the same machinery. | -| **Phase 4** *(later)* | Patch ledger lands. `reconciliation_need.caused_by_patch_id` becomes populated for patch-caused needs. Decision-shape rework, option-table removal, and phase-outcome enum redesign happen here. | Architect loop's typed-patch version. Item versioning. Cross-surface undo / time-travel. 
| +| **Phase 1** *(this RFC; shipped)* | `chat` table; nullable `turn.chat_id`; `specification.primary_chat_id`; mirrored `chat.active_turn_id`; `reconciliation_need` table with lightweight provenance placeholders. Backfill migrations. New writes populate both legacy and chat pointers. No user-visible change: still one chat per spec, still one rope per chat, side-chat can continue to use an in-memory patch-list surface. | Foundation. Existing side-chat / graph-edit surfaces can ship against today's mutation paths regardless of order. Hard-edit cascade gets a clean reshape once it reads from `reconciliation_need` rather than ad-hoc REVISIT state. Persistent multi-thread side-chat and the architect loop become shippable without waiting on the full changeset ledger. | +| **Phase 2** *(historical substrate option)* | Side-chat persistence: side-chat threads write `chat` rows with `kind = 'side_chat'` and persist their turns. Multiple side-chats per spec become possible at the data layer. | Persistent side-chat history and old-thread UI could activate, unless the runtime track chooses child chats, a separate thread table, or UI-rendered threads. | +| **Phase 3** *(historical staging)* | Reconciliation agent loop reads `reconciliation_need` queue, presents review sets through the same patch-list-style surface as the side-chat. | V3.1 has shipped classifier output in the Pending review bridge surface; the future target is a reconciliation thread in the unified runtime. | +| **Phase 4** *(later; current name FE-701 changeset ledger)* | Changeset ledger lands. Reconciliation needs gain changeset-backed cause/resolution provenance. Decision-shape rework, option-table removal, and phase-outcome enum redesign may happen here or in adjacent slices. | Architect-loop proposals, item versioning, and cross-surface undo / time-travel become possible through changeset history. | ## 11. 
Traceability - **Replaces** the implicit "one rope per spec" assumption baked into `turn.specification_id` and `specification.active_turn_id`. -- **Unblocks** the patch ledger, the architect / generator loop horizon item, and persistent multi-chat side-chat history. +- **Unblocks** the changeset ledger, the architect / generator loop horizon item, and persistent multi-chat / thread history. - **Bounded by** D113 (no second durable workflow model — `chat` is *not* workflow state, it is a conversation-thread substrate; workflow state stays on `phase_outcome`). - **Reuses** existing `knowledge_item`, `knowledge_edge`, `turn_knowledge_item`, `option`, `phase_outcome`, `annotation` schemas as-is. -- **References** `memory/SPEC.md` decisions D135, D137, and D138 plus `docs/design/PATCH_LEDGER.md` for the deeper semantic mutation ledger. Supersedes older side-chat substrate assumptions while remaining compatible with the user-facing side-chat surface. +- **References** `memory/SPEC.md` decisions D135, D137, and D138 plus `docs/design/PATCH_LEDGER.md` for deeper semantic mutation history pressure. Supersedes older side-chat substrate assumptions while remaining compatible with the user-facing side-chat surface. diff --git a/docs/design/PATCH_LEDGER.md b/docs/design/PATCH_LEDGER.md index ea6ff254..357a643b 100644 --- a/docs/design/PATCH_LEDGER.md +++ b/docs/design/PATCH_LEDGER.md @@ -1,9 +1,23 @@ # Patch Ledger and Reconciliation -> Status: working design proposal. +> Status: **historical design pressure** — retained for semantic mutation history, reconciliation bases, target ordering, and phase-two ledger rationale. Future-facing schema and operation vocabulary is **changeset/change**, not patch/patch_change; the consolidated runtime concept lives in [CONVERSATIONAL_WORKSPACE_RUNTIME.md](./CONVERSATIONAL_WORKSPACE_RUNTIME.md). > Date: 2026-05-05. > Scope: Brunch runtime product persistence, not the file-backed development registry explored elsewhere. 
+## How to read this after the changeset vocabulary shift + +This document predates the final vocabulary choice. Treat it as an algorithm and rationale source, not as a naming authority. + +| Historical wording here | Current wording / authority | +|---|---| +| `patch` | `changeset` — one atomic semantic mutation bundle. | +| `patch_change` | `change` — one atomic operation inside a changeset. | +| `caused_by_patch_id`, `resolved_by_patch_id` | Future changeset-backed cause/resolution fields; final column names should be chosen by the FE-701 changeset-ledger design. | +| Patch list / reconciliation review set | Historical review-surface framing. Current runtime synthesis routes proposals through proposal turns and accepted changesets. | +| Target ordering and reconciliation bases | Still useful algorithmic pressure. Preserve these concepts when implementing reconciliation threads or graph-review repairs. | + +Do not introduce new schema, capability contracts, or operation ids with `patch` / `patch_change` unless deliberately referring to this historical design. + ## Why this note exists Brunch is moving from a single interview transcript toward an intent-graph workspace. A specification can now plausibly include: @@ -18,20 +32,20 @@ Brunch is moving from a single interview transcript toward an intent-graph works The current persistence model still treats `turn` as the main historical spine: turns belong directly to a `specification`, and knowledge items are linked back to turns through `turn_knowledge_item`. -That works for an interview-led product, but it becomes strained once semantic changes can originate outside the primary conversation. The proposal here is to separate three authorities: +That works for an interview-led product, but it becomes strained once semantic changes can originate outside the primary conversation. The proposal here is to separate three authorities. 
The original wording used `patch`; current canonical vocabulary uses `changeset` / `change` for that middle authority: ```text chat / turn: conversational provenance and replay -patch: +changeset / change: semantic mutation history for the intent graph reconciliation_need: semantic debt created when a change may affect existing graph truth ``` -The intent graph remains the current semantic truth. The patch ledger records how that truth changed. Reconciliation records what may now need renewed judgment. +The intent graph remains the current semantic truth. The changeset ledger records how that truth changed. Reconciliation records what may now need renewed judgment. ## Current Shape @@ -80,7 +94,7 @@ This means: ## Proposed Concepts -`docs/design/MULTI_CHAT.md` is now the concrete phase-one substrate proposal for chat containers and reconciliation needs. This document remains the deeper design pressure for future semantic mutation history, richer reconciliation targeting, ordering, and patch-backed provenance. +`docs/design/MULTI_CHAT.md` is now the concrete phase-one substrate reference for chat containers and reconciliation needs. This document remains the deeper design pressure for future semantic mutation history, richer reconciliation targeting, ordering, and changeset-backed provenance. ### Chat @@ -132,11 +146,11 @@ The schema should support a primary chat, but should not require the product mod Focus fields should be deferred. A chat may eventually focus on one item, one relation, several reconciliation needs, or a graph neighborhood. That likely wants a later `chat_focus` table rather than early nullable columns on `chat`. -### Turn Patch Anchor +### Turn semantic-state anchor -A turn should know the semantic state that preceded it. +A turn should know the semantic state that preceded it. Historical examples below say `patch`; current implementations should read this as a changeset or semantic-revision anchor. 
-Proposed addition: +Proposed addition, in this document's historical vocabulary: ```text turn @@ -144,25 +158,25 @@ turn preceding_patch_id ``` -`preceding_patch_id` points to the latest applied patch known to the chat at the moment the turn was created. This gives Brunch a durable historical anchor for reviving old chat threads. +Read `preceding_patch_id` as `preceding_changeset_id` if the FE-701 schema adopts changeset naming. The field points to the latest applied semantic mutation bundle known to the chat at the moment the turn was created. This gives Brunch a durable historical anchor for reviving old chat threads. Example: ```text -Chat C7 last had a turn after Patch P12. -Elsewhere, P13-P18 changed the intent graph. +Chat C7 last had a turn after Changeset C12. +Elsewhere, C13-C18 changed the intent graph. The user returns to C7. The new turn can inject context: "Since the last turn in this chat, these semantic changes happened elsewhere..." ``` -This is especially important once multiple chats can mutate one specification. Without a patch anchor, a dormant side chat can accidentally continue from an obsolete semantic worldview. +This is especially important once multiple chats can mutate one specification. Without a semantic-state anchor, a dormant side chat can accidentally continue from an obsolete semantic worldview. -If the patch ledger is deferred, this field should also be deferred unless Brunch introduces a lightweight semantic revision/checkpoint first. Avoid adding a dangling nullable patch pointer before there is a real patch or revision concept to point at. +If the changeset ledger is deferred, this field should also be deferred unless Brunch introduces a lightweight semantic revision/checkpoint first. Avoid adding a dangling nullable semantic-history pointer before there is a real changeset or revision concept to point at. -### Patch +### Patch *(historical name; now changeset)* -A `patch` is a semantic mutation set against the intent graph. 
+A `patch` in this document means what current docs call a `changeset`: a semantic mutation set against the intent graph. It is not a workflow event and should not answer questions like "what phase is the user in?" It answers questions like: @@ -172,10 +186,10 @@ It is not a workflow event and should not answer questions like "what phase is t - what previous semantic state did it replace? - what downstream graph truth may now be stale? -Proposed table: +Proposed table, in historical naming: ```text -patch +patch # current name: changeset id specification_id provenance_json @@ -201,7 +215,7 @@ status: Provenance may want to be a discriminated JSON value rather than only an enum plus nullable foreign keys: ```typescript -type PatchProvenance = +type ChangesetProvenance = // historical draft name: PatchProvenance | { kind: 'turn'; turn_id: number; chat_id: number; capture_kind?: 'observer_capture' | 'review_acceptance' } | { kind: 'user_direct_edit'; chat_id?: number; actor_id?: string } | { kind: 'reconciliation_acceptance'; chat_id?: number; review_set_id?: number } @@ -212,9 +226,9 @@ type PatchProvenance = This keeps provenance extensible without adding nullable columns for every initiator shape. The relational columns `initiator_chat_id` and `initiator_turn_id` may still be useful as indexed convenience fields, but they should mirror `provenance_json`, not become a second provenance truth. -`observer_capture` is usually initiated by a chat turn, but patch provenance should not collapse to "chat turn." A turn can initiate a patch; it is not the patch. +`observer_capture` is usually initiated by a chat turn, but changeset provenance should not collapse to "chat turn." A turn can initiate a changeset; it is not the changeset. 
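The mirroring rule above can be sketched as a derivation helper. This is a hypothetical illustration assuming a trimmed version of the provenance union; the point is that the indexed convenience columns are computed from `provenance_json` rather than written independently:

```typescript
type ChangesetProvenance =
  | { kind: 'turn'; turn_id: number; chat_id: number }
  | { kind: 'user_direct_edit'; chat_id?: number; actor_id?: string }
  | { kind: 'reconciliation_acceptance'; chat_id?: number; review_set_id?: number };

// Derive the relational mirror from the JSON truth so the two can never disagree.
function mirrorColumns(p: ChangesetProvenance): {
  initiator_chat_id: number | null;
  initiator_turn_id: number | null;
} {
  return {
    initiator_chat_id: p.chat_id ?? null,
    initiator_turn_id: p.kind === 'turn' ? p.turn_id : null,
  };
}
```

Writing the mirror at changeset creation time, from the same provenance value, keeps `initiator_chat_id` / `initiator_turn_id` as pure indexed projections.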
-### Patch vs Change Naming +### Patch vs Change Naming *(resolved)* The proposed model has two levels: @@ -226,7 +240,7 @@ atomic mutation: one add/update/link/unlink/retire operation inside that unit ``` -Those can be named either way: +The naming choice was still open when this document was written: ```text Option A: @@ -238,7 +252,7 @@ Option B: change ``` -`changeset` / `change` may be the clearer database naming because it avoids overloading "patch" with source-control connotations and because "change" naturally names the atomic unit. Under that naming: +That choice is now resolved in favor of `changeset` / `change` because it avoids overloading "patch" with source-control connotations and because "change" naturally names the atomic unit. Under that naming: ```text changeset: @@ -248,18 +262,18 @@ change: id, changeset_id, operation, target_kind, target_id, before_json, after_json ``` -The design question is not the word. The invariant is that Brunch needs an atomic semantic mutation set containing one or more atomic changes. +The design question is not the word. The invariant is that Brunch needs an atomic semantic mutation set containing one or more atomic changes. The current canonical naming is `changeset` / `change`. -### Patch Change +### Patch Change *(historical name; now change)* -A `patch_change` is one operation inside a patch. +A `patch_change` in this document means what current docs call a `change`: one operation inside a changeset. 
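Under the resolved naming, the two levels can be sketched as record shapes. A hypothetical TypeScript illustration, not a committed schema; field names follow the text above:

```typescript
type Operation = 'add' | 'update' | 'link' | 'unlink' | 'retire';

// One atomic operation inside a changeset.
interface Change {
  id: number;
  changeset_id: number;
  operation: Operation;
  target_kind: string;
  target_id: number;
  before_json: unknown;
  after_json: unknown;
}

// One atomic semantic mutation bundle.
interface Changeset {
  id: number;
  specification_id: number;
  changes: Change[];
}

// The invariant from the text: a changeset holds one or more changes,
// each pointing back at its changeset.
function isWellFormed(cs: Changeset): boolean {
  return cs.changes.length >= 1 && cs.changes.every((c) => c.changeset_id === cs.id);
}
```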
-Proposed table: +Proposed table, in historical naming: ```text -patch_change +patch_change # current name: change id - patch_id + patch_id # current name: changeset_id operation target_kind target_id @@ -310,7 +324,7 @@ decision D constrains requirement R ```text item B changed, so item A may need review -patch P changed an older premise, so later descendants may need coherence review +changeset C changed an older premise, so later descendants may need coherence review verifier V invalidated criterion C, so requirement R may need review ``` @@ -326,7 +340,7 @@ reconciliation_need status reason caused_by_turn_id - caused_by_patch_id + caused_by_patch_id # historical placeholder; current concept: caused_by_changeset_id created_at resolved_at ``` @@ -343,23 +357,23 @@ status: resolved ``` -This deliberately keeps phase one smaller than the fully expressive model. The first table should represent one directed process obligation from a changed source item to an affected target item, dedupe simultaneously open needs by `(source_item_id, target_item_id, kind)`, and carry enough nullable provenance to be patch-compatible later. +This deliberately keeps phase one smaller than the fully expressive model. The first table should represent one directed process obligation from a changed source item to an affected target item, dedupe simultaneously open needs by `(source_item_id, target_item_id, kind)`, and carry enough nullable provenance to be changeset-compatible later. Future extensions can add: ```text basis / strength -source_patch_id +source_patch_id # current concept: source_changeset_id affected_relation_from_item_id affected_relation_to_item_id affected_relation -resolved_by_patch_id +resolved_by_patch_id # current concept: resolved_by_changeset_id structured reason payload ``` The `affected_relation_*` fields avoid requiring a separate `knowledge_edge.id` migration before this work can start. 
If `knowledge_edge` later receives a surrogate `id`, `reconciliation_need` can switch to `affected_edge_id`. -`resolved_at` exists in phase one because no-op dismissal and non-patch resolution are useful before the patch ledger exists. Once `resolved_by_patch_id` is available, the timestamp may remain denormalized convenience rather than the only resolution source of truth. +`resolved_at` exists in phase one because no-op dismissal and non-changeset resolution are useful before the changeset ledger exists. Once changeset-backed resolution is available, the timestamp may remain denormalized convenience rather than the only resolution source of truth. ## Reconciliation Bases @@ -391,8 +405,8 @@ Example: ```text The user directly edits Knowledge Item K4. -K4 was last updated by Patch P12. -Later patches P13-P31 created or updated nearby items from a context that may no longer hold. +K4 was last updated by Changeset C12. +Later changesets C13-C31 created or updated nearby items from a context that may no longer hold. Those later descendants receive soft reconciliation needs. ``` @@ -433,7 +447,7 @@ agent attempts reconciliation -> present a reviewable set of reconciliation changes -> user accepts or comments / requests changes -> agent revises and presents the set again - -> accepted changes are applied as a patch + -> accepted changes are applied as a changeset ``` The important difference from ordinary review sets is the agent's first move. Reconciliation should not immediately push every stale item to the user. The agent should attempt to repair, dismiss, or consolidate needs itself when the graph context is sufficient. @@ -466,8 +480,8 @@ Proposed flow: - add clarifying edge or example - ask the user a disambiguating question 9. The user accepts or requests changes. -10. Accepted reconciliation emits a new patch. -11. The accepted patch resolves, dismisses, or supersedes the needs. +10. Accepted reconciliation emits a new changeset. +11. 
The accepted changeset resolves, dismisses, or supersedes the needs. ``` This mirrors review-set ergonomics without pretending reconciliation is the same as requirements or criteria review. @@ -479,7 +493,7 @@ reconciliation review set v1 -> user requests changes with comments -> agent creates revised review set v2 -> user accepts - -> accepted reconciliation patch is applied + -> accepted reconciliation changeset is applied ``` Rejected or superseded reconciliation proposals should remain explainable provenance, but only accepted reconciliation should mutate the intent graph. @@ -496,7 +510,7 @@ group by affected target sort needs within target by: 1. strength 2. basis - 3. source item / source patch + 3. source item / source changeset 4. creation time build an affected-target graph from semantic relations collapse cycles into strongly connected components @@ -508,26 +522,26 @@ Direction matters. If `Requirement R` depends on `Assumption A`, and `A` changes Cycles should not block reconciliation. They should be collapsed into a single unit and presented as a coupled coherence problem. -If an accepted reconciliation patch changes an upstream target, downstream needs may become superseded or may need to be regenerated from the new patch. The reconciliation loop should therefore treat topological ordering as a work plan, not as a guarantee that one pass resolves every downstream target. +If an accepted reconciliation changeset changes an upstream target, downstream needs may become superseded or may need to be regenerated from the new changeset. The reconciliation loop should therefore treat topological ordering as a work plan, not as a guarantee that one pass resolves every downstream target. ## Can This Be Split Into Two Phases? -Yes, with one caveat: phase one should make `reconciliation_need` future-compatible with patches even if the `patch` table does not exist yet. 
+Yes, with one caveat: phase one should make `reconciliation_need` future-compatible with changesets even if the `changeset` table does not exist yet. The split is plausible because `chat` and `reconciliation_need` each relieve a current architectural pressure independently: - `chat` creates the missing conversation container below `specification` - `reconciliation_need` creates a product-visible place for staleness and coherence work -- `patch` later upgrades provenance from turn-centered or event-centered records into a true semantic mutation ledger +- `changeset` later upgrades provenance from turn-centered or event-centered records into a true semantic mutation ledger -The caveat is that historical descendance is only approximate before patches exist. Brunch can detect graph-based semantic dependency in phase one. It cannot precisely answer "which later semantic mutations descend from this older state?" until patch history exists. +The caveat is that historical descendance is only approximate before changesets exist. Brunch can detect graph-based semantic dependency in phase one. It cannot precisely answer "which later semantic mutations descend from this older state?" until changeset history exists. ## Phase 1: Multi-Chat Substrate and Reconciliation Need Goal: ```text -Allow multiple chats per specification and introduce durable reconciliation needs without requiring the full patch ledger. +Allow multiple chats per specification and introduce durable reconciliation needs without requiring the full changeset ledger. 
``` Schema work: @@ -555,15 +569,15 @@ Phase-one reconciliation causes: ```text caused_by_turn_id = the turn whose observer capture or review action caused the need -caused_by_patch_id = null +caused_by_patch_id = null # historical placeholder for future changeset-backed provenance ``` -`caused_by_kind` is intentionally omitted in the concrete phase-one schema while patches do not exist: `caused_by_turn_id` names turn-caused needs, and `caused_by_patch_id` remains null as a placeholder. +`caused_by_kind` is intentionally omitted in the concrete phase-one schema while changesets do not exist: `caused_by_turn_id` names turn-caused needs, and the historical `caused_by_patch_id` placeholder should be read as future changeset-backed provenance. Phase-one limitations: - no exact before / after semantic diff -- no exact patch chronology +- no exact changeset chronology - no reliable historical descendance beyond turn-linked provenance heuristics - reconciliation can identify affected items, but cannot yet provide a full mutation audit @@ -578,50 +592,50 @@ Phase-one implementation slices: 5. Add deterministic helper to create needs from changed item plus `knowledge_edge` traversal. 6. Surface a minimal reconciliation queue in data loaders or development fixtures. -## Phase 2: Patch Ledger +## Phase 2: Changeset Ledger *(formerly Patch Ledger)* Goal: ```text -Make semantic mutations first-class and use patches as the source of reconciliation cause, audit, and historical descendance. +Make semantic mutations first-class and use changesets as the source of reconciliation cause, audit, and historical descendance. 
``` -Schema work: +Schema work, translated to current vocabulary: -- add `patch` -- add `patch_change` -- add `caused_by_patch_id` and `resolved_by_patch_id` foreign keys if they were not enforced in phase one -- optionally add `knowledge_item.last_patch_id` -- optionally add `knowledge_edge.last_patch_id` or give edges surrogate ids +- add `changeset` +- add `change` +- add changeset-backed cause/resolution foreign keys if they were not enforced in phase one +- optionally add `knowledge_item.last_changeset_id` +- optionally add `knowledge_edge.last_changeset_id` or give edges surrogate ids Application work: -- route observer capture through patch creation -- route accepted review outputs through patch creation -- route direct user edits through patch creation -- route reconciliation acceptance through patch creation +- route observer capture through changeset creation +- route accepted review outputs through changeset creation +- route direct user edits through changeset creation +- route reconciliation acceptance through changeset creation - derive `turn_knowledge_item` as provenance compatibility or keep it as a secondary projection -- use patch chronology for historical descendance +- use changeset chronology for historical descendance -Patch application invariant: +Changeset application invariant: ```text -Every semantic change to knowledge graph truth is represented by exactly one applied patch_change inside one applied patch. +Every semantic change to knowledge graph truth is represented by exactly one applied change inside one applied changeset. ``` That invariant should eventually replace "every knowledge item traces to a turn" as the semantic-history rule. 
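The invariant can be illustrated with an all-or-nothing application sketch. Everything here is hypothetical (an in-memory map keyed by item id stands in for graph truth); the point is that a changeset either lands as a whole or leaves current truth untouched:

```typescript
// A change targets one item; `after_json === null` models a retire/remove.
interface AppliedChange {
  target_id: number;
  after_json: string | null;
}

// Stage every change on a copy, then hand the copy back; the caller swaps it
// in only once the whole bundle succeeded, so applied truth is all-or-nothing.
function applyChangeset(
  graph: Map<number, string>,
  changes: AppliedChange[],
): Map<number, string> {
  const next = new Map(graph);
  for (const c of changes) {
    if (c.after_json === null) next.delete(c.target_id);
    else next.set(c.target_id, c.after_json);
  }
  return next;
}
```

Because the old map is never mutated, a failed or rejected bundle leaves no partially applied changes behind, which is exactly what "exactly one applied change inside one applied changeset" requires of the durable schema.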
-Patch history should make revision counts and previous values straightforward: +Changeset history should make revision counts and previous values straightforward: ```text revision count for item K: - count applied patch_change rows where target_kind = knowledge_item and target_id = K + count applied change rows where target_kind = knowledge_item and target_id = K change history for item K: - applied patch_change rows for K ordered by patch.applied_at, including before_json and after_json + applied change rows for K ordered by changeset.applied_at, including before_json and after_json ``` -The same should hold for knowledge relations. That creates an important schema pressure: `knowledge_edge` needs stable identity if edge revision history is first-class. A composite key can identify the current relation, but it is awkward for history when a relation's source, target, or type changes. Before patch history becomes authoritative for edges, Brunch should either: +The same should hold for knowledge relations. That creates an important schema pressure: `knowledge_edge` needs stable identity if edge revision history is first-class. A composite key can identify the current relation, but it is awkward for history when a relation's source, target, or type changes. Before changeset history becomes authoritative for edges, Brunch should either: - add a surrogate `knowledge_edge.id` - or replace `knowledge_edge` with a stable relation record table @@ -646,15 +660,15 @@ The existing `turn.parent_turn_id` chain remains valid if all current turns in a In phase one, keep `turn_knowledge_item` unchanged. -In phase two, create migration patches only if the audit value is worth the complexity. A low-risk path is: +In phase two, create migration changesets only if the audit value is worth the complexity. 
A low-risk path is: ```text -one migration patch per specification: - provenance_json = { kind: "migration", migration_id: "patch-ledger-backfill" } - summary = "Backfilled existing knowledge graph before patch ledger introduction" +one migration changeset per specification: + provenance_json = { kind: "migration", migration_id: "changeset-ledger-backfill" } + summary = "Backfilled existing knowledge graph before changeset ledger introduction" ``` -This avoids inventing fake historical patches for every old observer capture. +This avoids inventing fake historical changesets for every old observer capture. ### Existing Knowledge Edges @@ -692,30 +706,30 @@ Phase one invariants: - every reconciliation need belongs to one specification - a reconciliation need's affected item or affected relation belongs to the same specification - `caused_by_turn_id`, when present, points to a turn in the same specification -- `caused_by_patch_id` remains null until patch tables exist +- the changeset-backed cause field remains null until changeset tables exist Phase two invariants: -- every semantic graph mutation is represented by an applied patch change -- every patch belongs to one specification -- every patch change belongs to one patch -- every patch target belongs to the same specification as the patch -- every patch has exactly one provenance kind -- a patch may have chat or turn provenance, but does not require it +- every semantic graph mutation is represented by an applied change +- every changeset belongs to one specification +- every change belongs to one changeset +- every changeset target belongs to the same specification as the changeset +- every changeset has exactly one provenance kind +- a changeset may have chat or turn provenance, but does not require it - hard reconciliation needs must name a concrete affected item or relation -- resolved reconciliation needs should name the patch that resolved or dismissed them when resolution changes graph state +- resolved 
reconciliation needs should name the changeset that resolved or dismissed them when resolution changes graph state ## Practical Recommendation Do phase one first. -The split is worthwhile because `chat` is a clear foundation for multi-conversation workspaces, and `reconciliation_need` is a useful product concept even before full semantic patch history exists. +The split is worthwhile because `chat` is a clear foundation for multi-conversation workspaces, and `reconciliation_need` is a useful product concept even before full semantic changeset history exists. But phase one should be honest about its limits: - it can support graph-based reconciliation well - it can support soft, heuristic coherence review -- it cannot fully support historical descendance until patches exist +- it cannot fully support historical descendance until changesets exist - it should not imply a complete audit trail The safest phase-one framing is: @@ -723,7 +737,7 @@ The safest phase-one framing is: ```text Introduce chat containers and reconciliation queues. Keep turn-centered provenance for now. -Design reconciliation causes so patch-backed provenance can replace turn-backed provenance later. +Design reconciliation causes so changeset-backed provenance can replace turn-backed provenance later. ``` Then phase two becomes an upgrade of semantic provenance, not a rewrite of the reconciliation product model. @@ -733,7 +747,7 @@ Then phase two becomes an upgrade of semantic provenance, not a rewrite of the r - Should `turn.specification_id` be removed eventually, or kept as a denormalized convenience? - Should `specification.active_turn_id` be removed as soon as `chat.active_turn_id` is stable, or kept as a temporary compatibility mirror? - Should `chat.kind = reconciliation` own one reconciliation review set, or can one reconciliation chat cover multiple sets? -- Should direct user edits create proposed patches first, or applied patches with later reconciliation? 
+- Should direct user edits create proposed changesets first, or applied changesets with later reconciliation? - Should `knowledge_edge` receive a surrogate `id` before reconciliation targets relations heavily? - What is the first deterministic relation policy for creating reconciliation needs from `knowledge_edge` traversal? -- How noisy is historical descendance in realistic workspaces, and should it be grouped by patch rather than item? +- How noisy is historical descendance in realistic workspaces, and should it be grouped by changeset rather than item? diff --git a/docs/design/README.md b/docs/design/README.md index 30ecdef7..ea29b883 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -10,21 +10,60 @@ Canonical project memory lives in: Use design documents for deeper argumentation, raw synthesis, alternatives, and qualifying principles that are too large for `memory/SPEC.md` or `memory/PLAN.md`. Promote conclusions into canonical memory through the `ln-spec` and `ln-plan` workflows before treating them as roadmap commitments. -Status language: +## Status language -- `raw synthesis / ideation` — broad source material; requires grilling before promotion. +- `source archive / raw synthesis` — broad source material preserved for provenance; active docs may cite it, but it is not live guidance. - `working design proposal` — a shaped proposal that may guide planning, but still needs canonical SPEC / PLAN links. +- `active synthesis` — the current cross-document concept map for a cluster; subsystem/source docs remain useful for details, but this doc owns the combined direction. +- `shipped substrate reference` — an RFC whose first implementation has landed; use it for invariants, migrations, and historical rationale, but check `memory/SPEC.md` / `memory/PLAN.md` for current status. +- `historical design pressure` — still valuable for unresolved questions or algorithms, but terminology or product shape has been superseded. 
+- `interim backlog` — shaped impulses that are deliberately not in the plan until their triggers fire. +- `future-facing draft` — intentionally deferred architecture map. - `archived` — historical context only; no longer live design guidance. -Current live design proposals: +## Live index -- `MULTI_CHAT.md` — concrete phase-one substrate for chat containers and reconciliation needs. -- `PATCH_LEDGER.md` — deeper semantic mutation history and reconciliation design pressure after the multi-chat substrate. -- `INTENT_SPEC_EVOLUTION.md` — broader intent-spec ontology and progressive checkability synthesis (raw, the source for the more focused docs below). -- `INTENT_GRAPH_SEMANTICS.md` — product-layer ontology, edge taxonomy, relation policy, and progressive-checkability binding. Canonical reference for FE-700. -- `BEHAVIORAL_KERNELS.md` — product-layer behavioral-kernel typology, kernel cards, signal-phrase routing, and the contrastive-question interviewer workflow. Canonical reference for FE-702 kernel probes. -- `DEV_WORKFLOW_EVOLUTION.md` — **dev-layer** trajectory for the `ln-*` skill family, the proposed file-backed spec registry, and the long-horizon convergence between dev and product ontologies. Distinct from the product-layer docs above; not part of `memory/SPEC.md`. -- `DEFERRED_RECONCILIATIONS.md` — interim backlog of shaped product-direction items (SPEC requirements, assumptions, PLAN horizon items, future design docs) that are ready for promotion but deliberately deferred until prerequisite work fires their triggers. Delete the file when all entries have promoted. +### Product ontology and strategy + +| Document | Role | +| --- | --- | +| `INTENT_GRAPH_SEMANTICS.md` | Product-layer ontology, edge taxonomy, relation policy, and progressive-checkability binding. Canonical design reference for FE-700. | +| `BEHAVIORAL_KERNELS.md` | Behavioral-kernel typology, kernel cards, signal-phrase routing, and contrastive-question workflow. 
Canonical design reference for kernel probes. | +| `SPEC_EVOLUTION_STRATEGIES.md` | FE-705-era synthesis for chat-local strategies, scenario options, graph review, proposal turns, relation directionality, and candidate bundles. Graduated into `memory/SPEC.md` / `memory/PLAN.md`; keep as rationale. | +| `AGENT_MUTATION_SURFACE.md` | Audit of agent-originated/adjoining mutation paths and the capability/changeset boundary needed before agents write durable truth. | + +### Conversational workspace runtime cluster + +Start with `CONVERSATIONAL_WORKSPACE_RUNTIME.md`. The other files in this cluster are retained source/subsystem references; do not read them as independent future roadmaps. + +| Document | Role | +| --- | --- | +| `CONVERSATIONAL_WORKSPACE_RUNTIME.md` | **Active synthesis** for the continuous workspace + unified chat + reconciliation + changeset-ledger concept. Owns the cluster supersession map and current open questions. | +| `MULTI_CHAT.md` | Shipped substrate reference for `chat`, `turn.chat_id`, `specification.primary_chat_id`, and `reconciliation_need`. Phase 2/3/4 rows are historical staging, not current sequence authority. | +| `SIDE_CHAT.md` | User-surface history and phasing for side-chat V1–V3.1, with V4 notes. Patch-list/top-bar and Pending review claims are bridge/history unless reaffirmed by the runtime synthesis. | +| `PATCH_LEDGER.md` | Historical design pressure for semantic mutation history, reconciliation bases, and target ordering. Future-facing vocabulary is `changeset` / `change`; use it for algorithms, not names. | +| `CONTINUOUS_WORKSPACE_HYBRID.md` | Workspace-shell shape exploration; owns the route-alias / workspace-controller / chart-backed-supervisor choice. | + +### Dev process and deferred impulses + +| Document | Role | +| --- | --- | +| `ln-skills/EVOLUTION.md` | Dev-layer trajectory for the `ln-*` skill family, `memory/` ontology, proposed file-backed spec registry, and possible dev/product ontology convergence. 
Not product SPEC. | +| `DEFERRED_RECONCILIATIONS.md` | Interim backlog for product impulses that are worthy but intentionally gated. Audit before promoting or retiring entries. | + +### Isolated / future-facing notes + +| Document | Role | +| --- | --- | +| `PORTABILITY_BOUNDARIES.md` | Future adapter/hosting/remote-workspace boundary map. | +| `GRAPH_KIND_CHIP_TOGGLE.md` | Standalone graph-view split-button chip proposal; audit against current horizon before implementation. | +| `README.md` | This index and local design-doc policy. | + +### Archived source + +| Document | Role | +| --- | --- | +| `../archive/design/INTENT_SPEC_EVOLUTION.md` | Raw synthesis / ideation source for the May 2026 intent-spec evolution work. Active docs above supersede its conclusions. | Schema reference artifacts are intentionally kept outside this design directory. The canonical generated DBML lives at `docs/schema.dbml` and is derived from `src/server/schema.ts`; do not add parallel `schema.dbml` or `schema.dbdiagram` copies under `docs/design/`. diff --git a/docs/design/SIDE_CHAT.md b/docs/design/SIDE_CHAT.md index 06d55fbb..70298594 100644 --- a/docs/design/SIDE_CHAT.md +++ b/docs/design/SIDE_CHAT.md @@ -1,20 +1,32 @@ # Side-Chat — Design Spec -> Output of brainstorm session 2026-04-30. Subsumes three previously-separate horizon items in `memory/PLAN.md`: graph-launched refinement (D128), trigger-popover composer, and revisit/edit mode (`docs/design/REVISIT_MODULE.md`). +> Output of brainstorm session 2026-04-30. Subsumes three previously-separate horizon items in `memory/PLAN.md`: graph-launched refinement (D128), trigger-popover composer, and revisit/edit mode (the archived revisit-module concept). > -> Status: **proposed** — pending review before transitioning to implementation plan. +> Status: **shipped through V3.1; V4 horizon reference** — V1/V2/V3.0/V3.1 user-surface phasing has landed through FE-674. 
Keep this doc for shipped side-chat history, V4 notes, and UI rationale; use [CONVERSATIONAL_WORKSPACE_RUNTIME.md](./CONVERSATIONAL_WORKSPACE_RUNTIME.md) for the consolidated future runtime direction. + +## How to read this after V3.1 + +This document is now a shipped-surface and horizon-reference record, not the active runtime synthesis. + +| Claim area | Current reading | +|---|---| +| Popover-to-panel side-chat, pinned context, and brand-halo UI | Shipped/near-shipped V1–V3 surface history and UI rationale. Useful when maintaining the current side-chat panel. | +| Patch list / top-bar staging surface | Historical V1/V2 design language. The durable future is `changeset` / `change`; the long-term user surface moves into in-stream threads per `CONVERSATIONAL_WORKSPACE_RUNTIME.md`. | +| Pending review section | Shipped V3.0/V3.1 bridge surface. Long-term reconciliation absorbs into a target-grouped reconciliation thread. | +| V4a side-chat persistence | Still a plausible substrate step, but now understood as part of the unified chat/thread runtime rather than a standalone tab-strip roadmap. | +| V4b patch ledger / item versioning / architect loop | Horizon. Use current vocabulary: changeset ledger, proposal turns, graph review, and architect proposals through HITL acceptance. | ## 1. Concept & Problem -Today, all interaction with Brunch's spec runs through one long interview thread: a linear back-and-forth in a single phase chat. When the user opens the structured spec view (graph view) and notices something they want to discuss, edit, annotate, or refine, they have no way to act on that item *in place* — they have to navigate back to the chat and try to reintroduce the topic, often without the system understanding which item they're talking about. +At the time this design was written, all interaction with Brunch's spec ran through one long interview thread: a linear back-and-forth in a single phase chat. 
When the user opened the structured spec view (graph view) and noticed something they wanted to discuss, edit, annotate, or refine, they had no way to act on that item *in place* — they had to navigate back to the chat and try to reintroduce the topic, often without the system understanding which item they meant. -The side-chat adds a second interaction surface: a popover-to-panel chat that opens *from* an item in the structured spec view, with selection-aware context, and that can produce durable changes to the spec through a unified review surface called the **patch list**. +The side-chat added a second interaction surface: a popover-to-panel chat that opens *from* an item in the structured spec view, with selection-aware context, and that can produce durable changes to the spec through the then-current review surface called the **patch list**. The long-term runtime direction now folds this surface into a unified chat/thread stream. **The side-chat subsumes three horizon items:** - **D128 graph-launched refinement** — the disabled `chat-with` placeholder on each row in `-structured-list-view.tsx` is the seam this design activates. - **Trigger-popover composer** (`/` commands, `@` knowledge mentions, `#` phase refs) — folded into the side-chat surface as in-chat affordances. -- **Revisit/edit mode + cascade preview** (`docs/design/REVISIT_MODULE.md`) — the side-chat panel hosts the cascade preview and the secondary-thread walk, replacing the modal in the current REVISIT design. +- **Revisit/edit mode + cascade preview** — the side-chat panel hosts the cascade preview and the secondary-thread walk, replacing the older revisit-module/modal design. ### At a glance — user flow @@ -178,7 +190,7 @@ The top-bar `Apply` button performs **bulk-apply** across all staged patches in ### 4.4 Why this matters -The patch list is **the unifying review surface for all spec mutations**. 
The same surface the architect loop (§7) will later use to deposit system-generated proposals for HITL review. Designing the side-chat around the patch list now means the architect loop has somewhere to deposit when it ships, with no second review UI to invent. +Historical V1/V2 reading: the patch list was designed as **the unifying review surface for all spec mutations** so later architect-loop proposals would have somewhere to deposit. Current target reading: the review unit is still HITL and batchable, but future durable semantics should be expressed as proposal turns and accepted changesets inside the unified runtime rather than a separate long-lived patch-list surface. ## 5. Edit Patch Routing @@ -210,7 +222,7 @@ When a patch with kind `edit` is applied, the system routes by **two questions i |---|---|---| | **None** | `affectedCount === 0` (item is a graph leaf with no downstream edges) | Apply directly. Single-item content update; brief inline confirmation card in the panel: "Updated `[X]`." | | **Soft** | `1 ≤ affectedCount ≤ 2` AND no anchor or affected item is in an active review set *(active = generated and not yet accepted)* | Apply directly with affected-item context. Patch lands directly; brief inline confirmation lists the affected items: "Updated `[X]`; `[Y]`, `[Z]` may need a refresh." No cascade preview or durable `reconciliation_need` rows. | -| **Hard** | High downstream count, OR any anchor or affected item is in an active review set | **Cascade preview** backed by `reconciliation_need` rows → batch-resolution mode in the side-chat panel (§5.3). The archived REVISIT_MODULE walk is superseded. | +| **Hard** | High downstream count, OR any anchor or affected item is in an active review set | **Cascade preview** backed by `reconciliation_need` rows → batch-resolution mode in the side-chat panel (§5.3). The archived [REVISIT_MODULE](../archive/design/REVISIT_MODULE.md) walk is superseded. 
| ### 5.2 Confidence model — V1 @@ -288,13 +300,13 @@ Surfacing rules: The side-chat's substrate dependencies have shifted as the multi-chat work landed. Two assumptions are unchanged; one is partly satisfied. -### A71 *(partly satisfied)*: patch / event-stream data model +### A71 *(partly satisfied)*: chat substrate plus semantic mutation ledger -The original framing — `spec → chat → turns` with diff patches as the persistence primitive — is split. In this stack, the `spec → chat → turns` half is supplied by downstack FE-697: a `chat` table, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, and a `reconciliation_need` queue with placeholder `caused_by_patch_id`. The patch ledger half remains horizon work tracked in `docs/design/PATCH_LEDGER.md`. +The original framing — `spec → chat → turns` with diff patches as the persistence primitive — is split. In this stack, the `spec → chat → turns` half is supplied by downstack FE-697: a `chat` table, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, and a `reconciliation_need` queue with a future semantic-mutation cause placeholder. The changeset ledger half remains horizon work tracked historically in `docs/design/PATCH_LEDGER.md` and currently in `memory/PLAN.md` as the semantic changeset ledger. **Implication for V3.** The cascade preview reads `reconciliation_need` rows directly (see §5.3, §13). Side-chat threads themselves stay in-memory through V3 — durable side-chat persistence is MULTI_CHAT.md Phase 2 / V4 and is **not** a V3 prerequisite. -**Implication if the patch ledger lands later:** `reconciliation_need.caused_by_patch_id` becomes populated; resolutions write patches; the in-memory patch list translates to `appendPatch(spec, patch[])`. No user-facing change to V3 surfaces. 
+**Implication if the changeset ledger lands later:** reconciliation needs gain changeset-backed cause/resolution provenance; resolutions write changesets; the in-memory patch list either retires into proposal-turn state or translates through a compatibility layer. No user-facing change to shipped V3 surfaces is required. ### A72: knowledge-item versioning @@ -304,7 +316,7 @@ History per knowledge item, preserved through edits. Anchors annotations to spec ### A73: architect / generator loop -Captured in §7. The side-chat is *user-driven*; the architect is *system-driven*. Both deposit into the patch list. Designing the side-chat's patch-list surface now means the architect has a review surface ready when it ships. +Captured in §7. The side-chat is *user-driven*; the architect is *system-driven*. Historical design routed both into the patch list; current design routes architect proposals through proposal turns and accepted changesets, with graph review as the safety oracle. ## 9. Phasing @@ -314,8 +326,8 @@ Captured in §7. The side-chat is *user-driven*; the architect is *system-driven | **V2** | Edit (router) · Drill-down · Propose-edge in the patch list. **None** and **Soft** edit tiers apply directly. **Hard** edit defers to a placeholder "feature coming" message. Refine routes through normal turn machinery. | | **V3.0** | Hard-edit apply opens `reconciliation_need` rows from existing graph edges (Path 1, deterministic). Cascade preview surfaces as a `Pending review` section inside the canonical patch-list overlay; **single per-row Resolve action** that idempotently transitions `open → resolved`. The V2 `deferred: true` server response and the "Hard impact — coming in V3 cascade preview" banner are removed. Acceptance Criterion #7 satisfied mechanically. No reconciliation agent. REVISIT modal stays archived. 
(Note: the original three-action design — `accept-on-target / edit-target / dismiss` — is collapsed to a single Resolve in V3.0 because the open→resolved transition is the same regardless of intent label; V3.1 reintroduces richer kinds via the agent.) | | **V3.1** *(shipped, FE-674 PRs #119–#124)* | Reconciliation classifier writes `agent_status` / `agent_classification` / `agent_proposal` per row. Pending review surface renders chips, Run-agent + polling (`POST /api/specifications/:id/reconciliation-needs/run-agent`), per-row Re-run (`POST /api/specifications/:id/reconciliation-needs/:needId/reset-agent`), per-class actions, and bulk Confirm-all / Apply-all-suggested. Substantive walk lands inside the side-chat panel using pinned-context conversation. Path 2 observer expansion still horizon. | -| **V4a** *(next, FE-675 V4a half)* | Side-chat client persists turns into `chat` / `turn` with `chat.kind='side_chat'`; "Old chats" tab strip activates. | -| **V4b** *(horizon, FE-675 V4b half + FE-701)* | Patch ledger lands. `reconciliation_need.caused_by_patch_id` populates; resolutions write typed patches; item versioning anchors annotations and soft-edit audit. Architect loop deposits into the same patch list. | +| **V4a** *(horizon / runtime-track input)* | Side-chat client persists turns into `chat` / `turn` with `chat.kind='side_chat'`; "Old chats" tab strip activates. Current runtime synthesis may instead render side conversations as in-stream threads. | +| **V4b** *(horizon, FE-675 V4b half + FE-701)* | Changeset ledger lands. Reconciliation needs gain semantic-mutation cause/resolution provenance; item versioning anchors annotations and soft-edit audit. Architect-loop proposals use the same HITL proposal/changeset pathway rather than committing graph truth directly. | ## 10. 
Verification Stance @@ -414,8 +426,8 @@ V-versions in §9 describe the *user surface*; substrate phases in `docs/design/ | V2 (Edit / Drill-down / Propose-edge, None+Soft tiers) | Phase 1 not required | Shipped against in-memory patch list; hard branch returns `deferred: true`. | | V3.0 *(shipped, FE-674 PRs #115-#118)* | Phase 1 read side | Hard apply writes `reconciliation_need` rows; UI reads the queue. Per-row Resolve / Edit-target / View-source-diff. No agent. | | V3.1 *(shipped, FE-674 PRs #119-#124)* | Phase 3 | Reconciliation classifier writes `agent_status` / `agent_classification` / `agent_proposal` per row. Pending review surface renders `` (six variants), Run-agent button with conditional 1s polling, per-row Re-run on classified/failed rows, per-class actions (`auto-confirm` → Confirm, `auto-edit` → View / Apply / Skip, `substantive` → Open side-chat via `useSideChat().openFor`), bulk Confirm-all (N) and Apply-all-suggested (N) iterating serially. **HTTP:** `POST /api/specifications/:id/reconciliation-needs/run-agent` and `POST /api/specifications/:id/reconciliation-needs/:needId/reset-agent` (§5.3). | -| V4a *(next, FE-675 V4a half)* | Phase 2 | Side-chat client persists turns into `chat` / `turn` with `chat.kind='side_chat'`; "Old chats" tab strip activates. §349 anchor decision still open. | -| V4b *(horizon, FE-675 V4b half)* | Phase 4 | Patch ledger (FE-701); item versioning; branched exploration; architect loop. | +| V4a *(horizon / runtime-track input)* | Phase 2 | Side-chat client persists turns into `chat` / `turn` with `chat.kind='side_chat'`; "Old chats" tab strip activates in this document's original model. Current runtime synthesis may fold this into in-stream threads. | +| V4b *(horizon, FE-675 V4b half)* | Phase 4 | Changeset ledger (FE-701); item versioning; branched exploration; architect loop. 
| **Decisions and assumptions that govern V3.0:** @@ -431,7 +443,7 @@ V-versions in §9 describe the *user surface*; substrate phases in `docs/design/ ## Traceability -- **Replaces** PLAN.md horizon items: graph-launched refinement (under D128), trigger-popover composer, revisit / edit mode + cascade preview (`docs/design/REVISIT_MODULE.md` becomes a sub-document of this design). +- **Replaces** PLAN.md horizon items: graph-launched refinement (under D128), trigger-popover composer, revisit / edit mode + cascade preview (the older revisit-module/modal concept is subsumed by this design). - **Reuses** D125 (typed relation policy), D127 (progressive-detail seam), D128 (graph view actionable workspace mode), Requirement 25 (revision card pattern). - **Adds** future assumptions A71 (patch/event-stream model), A72 (item versioning), A73 (architect loop). - **Bounded by** D80 (no turn-tree branching), D89 (card-owned input), D113 (no second durable workflow model), D66 (user authorizes). diff --git a/docs/design/SPEC_EVOLUTION_STRATEGIES.md b/docs/design/SPEC_EVOLUTION_STRATEGIES.md new file mode 100644 index 00000000..1045451e --- /dev/null +++ b/docs/design/SPEC_EVOLUTION_STRATEGIES.md @@ -0,0 +1,244 @@ +# Spec Evolution Strategies + +> Status: **design RFC — graduated into `memory/SPEC.md` / `memory/PLAN.md`**. +> Date: 2026-05-12. +> Scope: chat-local strategies for advancing a Brunch specification's intent graph from vague user intent toward phase-mature, reviewable semantic truth. +> +> Related docs: [`AGENT_MUTATION_SURFACE.md`](./AGENT_MUTATION_SURFACE.md), [`BEHAVIORAL_KERNELS.md`](./BEHAVIORAL_KERNELS.md), [`INTENT_GRAPH_SEMANTICS.md`](./INTENT_GRAPH_SEMANTICS.md), [`MULTI_CHAT.md`](./MULTI_CHAT.md), [`PATCH_LEDGER.md`](./PATCH_LEDGER.md). + +## Problem + +The current interviewer is grounded but slow. 
It uses a design-decision-tree drilldown strategy: ask phase-shaped questions, walk down the user's design tree, and gradually accumulate enough shared understanding for requirements and criteria. That produces high-provenance intent graph truth, but early users notice the question burden quickly. + +Brunch needs alternative spec-evolution strategies that reduce user burden without weakening the graph into plausible but incoherent generated prose. The FE-705 `brunch agent` / probe-harness branch is therefore not only a CLI feature; it is the first practical strategy test harness. It lets external probes drive the real Brunch lifecycle, generate drilldown-based completed-spec fixtures, and compare alternative strategy outputs before committing product UI. + +## Core model + +A **strategy** is a chat-local policy for advancing semantic state. It decides: + +- what context it reads, +- what question / offer / candidate artifact it produces, +- what output unit it treats as coherent, +- what authority it has to commit graph truth, +- what review or validation must happen before commit, +- what evidence it contributes toward semantic maturity / phase advancement. + +A strategy is not specification-level semantic truth. In the multi-chat model, one specification can have many chats, each with its own strategy and resumable context. + +A Brunch `turn` is assistant/system-first: the assistant/system asks, offers, proposes, or reports something; the user response completes the bundle. Observer/runtime assessment reads the whole bundle, because the assistant/system part gives the user's response its meaning. + +A chat should have at most one open frontier turn. In normal operation, every active/resumable chat should have one open frontier turn, even if it is a scripted frontier such as the first offer in a side-chat. 
If a chat has no open turn, focusing it may generate a continuation frontier based on chat strategy, chat kind, latest semantic maturity / `phase`, and staleness. + +## Strategy taxonomy + +### Step-by-step drilldown + +Current default. The interviewer asks phase-shaped questions at increasing detail until shared understanding is sufficient. + +- **Strength:** high provenance; each claim is supported by user answers. +- **Weakness:** long and user-burdensome. +- **Commit shape:** incremental canonical changesets after ordinary turn observation / review. + +### Scenario options + +Low-friction strategy for impatient, under-informed, or underspecified users. Brunch asks enough to identify the product/use-case typology, then generates 2–3 coherent candidate graph bundles with named tradeoff profiles. + +- **Strength:** users react to concrete options rather than authoring the whole design. +- **Weakness:** one-shot generation can produce plausible but generic, contradictory, or unsupported graph structure. +- **Commit shape:** candidate graph bundles, accepted cleanly or accepted with explicit open issues. + +### Targeted cases + +Kernel-driven contrastive elicitation from [`BEHAVIORAL_KERNELS.md`](./BEHAVIORAL_KERNELS.md). The interviewer detects active behavioral kernels and asks concrete divergent cases whose classifications emit typed artifacts directly. + +- **Strength:** lower-friction than drilldown, more grounded than whole-spec generation. +- **Weakness:** needs kernel cards, artifact schemas, validators, ordering, and cross-kernel deduplication. +- **Commit shape:** validated kernel artifacts such as decisions, invariants, examples/counterexamples, criteria, and typed edges. + +### Graph review + +Quality-oriented critique that can run over any graph, whether drilldown-created, scenario-generated, imported, or edited. + +- **Question:** where is this graph weak, thin, overconfident, unsupported, ambiguous, generic, uncheckable, or missing structure? 
+- **Commit shape:** findings start as turn-owned structured artifacts; accepted repairs may later apply changesets. + +### Graph reconciliation + +Repair-oriented process over known disturbance or process debt such as open `reconciliation_need` rows. + +- **Question:** given this specific change/conflict, what existing graph truth must be repaired, confirmed, dismissed, or escalated? +- **Commit shape:** changesets that edit items/edges and/or resolve/open reconciliation needs. + +### Topology-driven targeting + +Internal targeting machinery, not a user-facing strategy for now. Once a graph exists, Brunch can rank next questions, reviews, or repairs by topology: high-fanout low-confidence assumptions, decisions without rejected alternatives, criteria without targets, conflicting constraints, etc. + +## Semantic history and proposal turns + +Turns are conversational provenance and replay. They should not remain the only historical spine once multiple chats, direct edits, review passes, verifier feedback, and candidate bundles can mutate graph truth. + +The future semantic spine is the **changeset ledger**: + +```text +changeset: + one atomic semantic mutation set + +change: + one atomic add/update/link/unlink/retire/etc. inside the changeset +``` + +A changeset mutates a specification from one semantically / structurally valid graph state to another, including any `reconciliation_need` rows opened or resolved by that mutation. The data changes and changeset record must succeed or fail together. The changeset boundary is the smallest atomic unit that preserves semantic coherence: if applying only half the mutation would leave the graph incoherent, it belongs in one changeset. + +A graph-review finding, candidate proposal, or reconciliation suggestion is not itself a changeset until accepted or acted on. It is the assistant/system half of an open frontier turn. The turn completes when the user responds, and only then may the runtime apply a changeset. 
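The changeset/change shape and its all-or-nothing commit rule can be sketched in TypeScript. This is an illustrative sketch only: `Changeset`, `Change`, and `applyChangeset` are hypothetical names, not the project's actual schema or modules, and the in-memory `Map` stands in for whatever store the real ledger uses.

```typescript
// Illustrative sketch — all names here are hypothetical, not repo modules.

type ChangeOp = "add" | "update" | "link" | "unlink" | "retire";

interface Change {
  op: ChangeOp;
  targetId: string;   // intent item or knowledge_edge being mutated
  payload?: unknown;  // op-specific data (new content, edge endpoints, ...)
}

interface Changeset {
  id: string;
  specificationId: string;
  changes: Change[];         // one atomic semantic mutation set
  opensNeedIds: string[];    // reconciliation_need rows opened by this mutation
  resolvesNeedIds: string[]; // reconciliation_need rows resolved by it
}

// The data changes and the changeset record succeed or fail together:
// validate everything first, then commit every change as one unit.
function applyChangeset(
  graph: Map<string, unknown>,
  cs: Changeset,
): { ok: true } | { ok: false; reason: string } {
  for (const change of cs.changes) {
    if (change.op !== "add" && !graph.has(change.targetId)) {
      return { ok: false, reason: `unknown target ${change.targetId}` };
    }
  }
  // All checks passed — apply the whole set; no partial application.
  for (const change of cs.changes) {
    if (change.op === "retire") graph.delete(change.targetId);
    else graph.set(change.targetId, change.payload ?? graph.get(change.targetId));
  }
  return { ok: true };
}
```

The validate-then-commit split is the point: if applying only half the mutation would leave the graph incoherent, the whole changeset must be rejected, which is why validation runs over every change before any change is written.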
+ +Proposal turns should share a small normalized completion vocabulary: + +- `accept` — authorize the proposal; may apply a changeset. +- `reject` — decline without semantic mutation; narrow because rejection can leave or create process debt. +- `revise` — request a new coherent proposal; maps to labels like "Request changes". +- `ask_followup` — ask for explanation before deciding. +- `defer` — intentionally park the matter. +- `regenerate` — recreate the offer, especially when stale or low-quality. + +Only `accept` applies a proposal turn's semantic changeset. Other proposal actions may create process metadata or successor turns, but should not directly mutate intent graph truth. If a no-edit outcome resolves process debt, model it as accepting a proposal whose changeset resolves the relevant need. + +Proposal/finding artifacts should start as turn-owned structured assistant parts. A standalone proposal or proposed-changeset model should wait until batch review, assignment, expiry, cross-chat surfacing, or independent proposal lifecycle demands it. + +When a turn opens, it should stamp the latest applied changeset id for the specification — for example `turn.opened_at_changeset_id` or `turn.base_changeset_id`. This is not provenance; it is the graph revision the assistant/system offer was based on. First-cut staleness is conservative: if a turn remains open while `specification.latest_changeset_id` advances, the open offer is considered stale and the product offers regeneration / refresh rather than neighborhood-level diffing. + +## Direct editing + +Direct editing is a sibling mutation path, not proposal revision. + +In explicit edit mode, the user may make pending direct changes to one or more intent items. When they exit/apply edit mode, Brunch computes affected incident edges and opens required `reconciliation_need` rows under relation policy; direct item changes and reconciliation needs commit together in one changeset. 
Direct editing is safe because incoherence risk becomes explicit process debt, not because arbitrary edits are forbidden. + +Review-set direct edits have a special consequence. If the user directly edits proposed review-set items, accepting the review set as-is is no longer valid. `accept` should be disabled; `request changes` becomes a reconciliation-oriented action such as `request reconciliation`. The edited candidate/review set must be reconciled before it can become canonical truth. + +## Relation directionality + +The current `knowledge_edge` relation names mix directionality. `depends_on` and `derived_from` naturally read downstream-to-upstream; `constrains` and `verifies` often read upstream-to-downstream or evidence-to-claim. That becomes risky once edges drive reconciliation. + +FE-700 may break existing relation names/records while expanding the ontology, but forcing every useful edge verb into one dependency direction risks distorting the graph around one operation. The graph must serve display, prompt context, export trace, requirements projection, reconciliation, critique, verification, candidate generation, and explanation. + +Rule: + +> Edge verbs should be semantically clear; operational direction belongs in relation policy. + +Every relation kind should declare: + +- canonical sentence, e.g. `{source} verifies {target}`, +- inverse display sentence, +- graph-display / export / staleness / reconciliation / criteria-help / weak-suggestion participation, +- what happens when source changes, +- what happens when target changes. + +Code should not infer reconciliation behavior from raw edge direction. Direct edit and hard-impact cascade should enumerate incident accepted edges and ask relation policy which endpoint, if any, receives a `reconciliation_need`. + +Contrastive kernels may pressure a further ontology expansion. Kernel questions naturally surface artifacts such as `alternative`, `question`, `ambiguity`, `candidate`, and rejected options. 
FE-700 should leave room for these artifacts, but the first implementation can represent them as examples, decisions, constraints, or proposal-local structures until durable top-level kinds prove necessary. + +## Candidate graph bundles + +`scenario_options` produces speculative but coherent candidate worlds, not loose item lists. A candidate bundle should contain: + +- short name and scenario summary, +- intended maturity stage, +- tradeoff profile, +- generated items and edges, +- required core items, +- optional/swappable items, +- known risks, +- graph-review findings, +- provenance / epistemic labels, +- commit preconditions. + +User review should be bundle-level by default: `Use this`, `Revise`, `Regenerate`, or ask follow-up. Arbitrary item-level pick-and-choose risks incoherence. Partial acceptance is only safe when the accepted subset is semantically closed or the system brings along required supporting items/edges. + +Candidate readiness should distinguish clean acceptance from acceptance with represented problems: + +- `draft` — generated but not checked, +- `reviewing` — background review running, +- `reviewed_clean` — acceptable normally, +- `reviewed_with_issues` — acceptable only if open issues become durable, +- `blocked` — cannot be accepted without repair/regeneration. + +`reviewed_with_issues` can still be accepted if Brunch durably represents the problems, for example by opening a follow-on graph-review frontier turn or by creating appropriate problem records / `reconciliation_need` rows in the accepting changeset. Imperfect graph states are allowed if their problems are explicit and durable, not hidden. + +Broader graph-review issues should start as turn-owned structured artifacts. `reconciliation_need` remains the only first-class problem table for now, scoped to coherence / staleness process debt caused by relation impacts. 
A generalized `graph_issue` / `problem` table is a future option if review findings need cross-chat querying, filtering, assignment, badges, or lifecycle independent of turns. + +## Product sequencing + +The most desired product surfaces are: + +1. first-turn strategy choice for a new chat/spec start, +2. a mid-interview "speed this up" / "show me strong options" affordance. + +Engineering still needs part of `graph_review` to make scenario generation credible. `scenario_options` can be the first product-facing strategy while graph review remains an internal oracle used to critique, repair, and score generated bundles before they are committed. + +For mid-interview acceleration, branch into a new or reused side-chat / strategy chat rather than switching the primary interview chat in place. The side-chat branch receives a context pack — not a raw transcript dump — containing spec identity, maturity/phase, summarized goal/context, accepted graph truth, important edge neighborhoods, current frontier question if relevant, unresolved assumptions, and recent turns only when they explain user style or intent. + +The first `speed this up` mode should **complete the current direction**: treat accepted graph truth as fixed premises and fill in plausible missing structure. A more radical "show alternatives that challenge prior assumptions" mode is feasible but deferred. + +Scenario generation should present 2–3 options with named tradeoff profiles. Candidate quality gates should be latency-tiered: + +- fast synchronous gates before display: parse validity, schema validity, coarse fixed-premise check, no obvious contradiction, and tradeoff summary present; +- async gates after display: deeper graph review, coverage, checkability gaps, provenance warnings, repair/refinement. + +The existing observer-style async capture mechanism could generalize into an async semantic worker queue for capture / review / refine / repair. 
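The observer-style generalization above can be sketched as a tiny in-process worker runner. Everything here is hypothetical — `SemanticWorkerQueue`, `SemanticTask`, and the task-kind names are illustrative stand-ins, not existing Brunch modules:

```typescript
// Hypothetical sketch of an async semantic worker queue. The four kinds
// mirror the capture / review / refine / repair work named above.
type SemanticTaskKind = "capture" | "review" | "refine" | "repair";

interface SemanticTask {
  kind: SemanticTaskKind;
  candidateId: string;
  run: () => Promise<void>;
}

class SemanticWorkerQueue {
  private tasks: SemanticTask[] = [];
  private running = false;

  enqueue(task: SemanticTask): void {
    this.tasks.push(task);
    void this.drain();
  }

  // Drain serially so review / refine / repair for one candidate never
  // interleave; a durable-queue version would persist tasks instead.
  private async drain(): Promise<void> {
    if (this.running) return;
    this.running = true;
    while (this.tasks.length > 0) {
      const task = this.tasks.shift()!;
      await task.run();
    }
    this.running = false;
  }
}
```

An in-process runner like this is enough for the staged-infrastructure phase; only if tasks must survive restarts does it need durable queue tables.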
Users can read initial candidates while background review improves readiness. If a candidate is accepted with open issues, Brunch should open or reuse a graph-review chat with a frontier turn summarizing remaining issues and asking what to address first. + +## Concern map and dependencies + +### Semantic substrate — highest coordination + +Owns ontology expansion, relation policy directionality, changeset/change ledger, `turn.opened_at_changeset_id`, `specification.latest_changeset_id`, chat-local strategy metadata, and one-open-frontier-per-chat invariants. + +Likely areas: `src/server/schema.ts`, `src/server/db.ts`, `src/server/knowledge-relationship-policy.ts`, future changeset modules, [`INTENT_GRAPH_SEMANTICS.md`](./INTENT_GRAPH_SEMANTICS.md), [`PATCH_LEDGER.md`](./PATCH_LEDGER.md). + +Sequential dependencies: relation policy before robust reconciliation/direct-edit cascade; changesets before productized candidate acceptance; turn staleness depends on latest changeset tracking. + +### Strategy / proposal artifacts — parallelizable + +Owns candidate bundle shapes, graph-review finding shapes, proposal turn artifacts, candidate statuses, and normalized proposal responses. + +Likely areas: `src/server/parts.ts`, `src/server/turn-artifacts.ts`, a possible `strategy-artifacts` module, context packs, prompt scenarios. + +Can start before durable changesets if artifacts remain turn-owned and do not commit canonical truth. + +### Graph-review oracle — supports scenario options + +Owns review rubric, graph critique prompt, candidate quality gates, accept-with-issues semantics, and follow-on review turns. + +Likely areas: new graph-review prompt/context pack, `src/server/scenario-runner.ts`, `scripts/agent-probes/`. + +Can run probe-only before product UI; needs enough FE-700 ontology/relation policy to be meaningful. 
+ +### Scenario-options strategy — first product-facing acceleration + +Owns 2–3 candidate bundles, tradeoff summaries, fast validation, async review/refine/repair handoff, and clean/with-issues acceptance. + +Likely areas: `src/server/prompts/candidate-spec-system.md`, `src/server/context-pack/candidate-spec.ts`, scenario runner/probe harness, later side-chat UI. + +Depends on graph-review minimum oracle and, for canonical acceptance, changeset ledger. + +### Async semantic workers — staged infrastructure + +Own capture / review / refine / repair background work. Can begin as observer-style in-process tasks before durable queue tables exist. + +### Reconciliation / direct edit — adjacent but distinct + +Owns edit mode, affected-edge enumeration, relation-policy-driven `reconciliation_need` creation, reconciliation chat behavior, and review-set request-reconciliation behavior. + +Likely areas: `src/server/edit-impact.ts`, `src/server/edit-route.ts`, `src/server/reconciliation-need.test.ts`, side-chat/patch-list UI. + +Depends on relation-policy directionality; eventually depends on changesets for atomic direct-edit history. + +## FE-705 implication + +The `brunch agent` JSONL seam is a strategy test harness: + +- drive current drilldown headlessly, +- produce completed-spec fixture candidates, +- preserve workspace state for curation, +- compare strategy outputs against known-good or semi-golden graphs, +- exercise Brunch-owned mutation authority rather than direct DB shortcuts. + +This lets Brunch evaluate strategy outputs before exposing them as product modes. 
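Looking back at the candidate-bundle section, the accept-with-issues rule can be pinned down as a small guard. This is a hypothetical sketch: `CandidateStatus` mirrors the readiness ladder above, and `willRecordIssuesDurably` stands in for whatever durable representation (a follow-on graph-review frontier turn, `reconciliation_need` rows in the accepting changeset) the system actually provides:

```typescript
// Hypothetical acceptance guard for the candidate readiness ladder.
// "reviewed_with_issues" is acceptable only when its open problems
// will be explicitly and durably represented, never silently dropped.
type CandidateStatus =
  | "draft"
  | "reviewing"
  | "reviewed_clean"
  | "reviewed_with_issues"
  | "blocked";

function canAccept(status: CandidateStatus, willRecordIssuesDurably: boolean): boolean {
  switch (status) {
    case "reviewed_clean":
      return true;
    case "reviewed_with_issues":
      return willRecordIssuesDurably; // imperfect states must keep problems explicit
    default:
      return false; // draft / reviewing / blocked cannot become canonical truth
  }
}
```

A harness like the FE-705 JSONL seam could assert this guard against fixture candidates before the rule is exposed in product UI.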
diff --git a/docs/design/DEV_WORKFLOW_EVOLUTION.md b/docs/design/ln-skills/EVOLUTION.md similarity index 95% rename from docs/design/DEV_WORKFLOW_EVOLUTION.md rename to docs/design/ln-skills/EVOLUTION.md index d4c382c3..3731f404 100644 --- a/docs/design/DEV_WORKFLOW_EVOLUTION.md +++ b/docs/design/ln-skills/EVOLUTION.md @@ -6,13 +6,13 @@ > > This document is **not** part of `memory/SPEC.md` because it does not describe Brunch the product. It is the canonical design home for the **dev layer**: how Brunch is built. Conclusions that affect product behavior should still be promoted into `memory/SPEC.md` through `ln-spec`, but most of the material here describes self-tooling rather than user-facing capability. > -> Source synthesis: external agent conversations captured in [`docs/design/INTENT_SPEC_EVOLUTION.md`](./INTENT_SPEC_EVOLUTION.md). That synthesis treats both the product layer and the dev layer in the same document; this note splits the dev-layer trajectory out so the layers stop colliding. +> Source synthesis: external agent conversations captured in [`docs/archive/design/INTENT_SPEC_EVOLUTION.md`](../../archive/design/INTENT_SPEC_EVOLUTION.md). That synthesis treats both the product layer and the dev layer in the same document; this note splits the dev-layer trajectory out so the layers stop colliding. ## Why this note exists The intent-spec branching conversation produced two parallel trajectories: -1. A **product-layer** direction — Brunch should evolve from eliciting planning specs toward eliciting intent specs, with progressive checkability, behavioral kernels, semantic edges, and graph-first context. Most of that material has now landed in `memory/SPEC.md` (Requirements 38–41, A77–A87, D125, D134–D142, I109–I112, and the Lexicon entries for `intent graph` / `progressive checkability` / `behavioral kernel` / `context pack` / `scenario runner`) or in sibling design docs (`MULTI_CHAT.md`, `PATCH_LEDGER.md`, `INTENT_SPEC_EVOLUTION.md`). +1. 
A **product-layer** direction — Brunch should evolve from eliciting planning specs toward eliciting intent specs, with progressive checkability, behavioral kernels, semantic edges, and graph-first context. Most of that material has now landed in `memory/SPEC.md` (Requirements 38–41, A77–A87, D125, D134–D142, I109–I112, and the Lexicon entries for `intent graph` / `progressive checkability` / `behavioral kernel` / `context pack` / `scenario runner`), focused design docs (`MULTI_CHAT.md`, `PATCH_LEDGER.md`), or the archived source synthesis (`../../archive/design/INTENT_SPEC_EVOLUTION.md`). 2. A **dev-layer** direction — the same critique, applied recursively to Brunch's *own* spec workflow. The current `memory/SPEC.md` is doing many jobs at once and the markdown-mediated nature of the document creates real cognitive cost on contributing LLMs. The conversation proposed a file-backed canonical spec registry with deterministic checkers and generated views. None of this has landed anywhere except as a one-line horizon item in `memory/PLAN.md` ("Structured development spec registry"). @@ -155,7 +155,7 @@ The point is not that the current system is broken — it works, and `ln-sync` e ## Proposed dev-layer trajectory -The trajectory is the one the source synthesis captures in §10–11 of [`INTENT_SPEC_EVOLUTION.md`](./INTENT_SPEC_EVOLUTION.md), but framed here as a self-tooling experiment for *this* repo, not as a product proposal. +The trajectory is the one the source synthesis captures in §10–11 of [`INTENT_SPEC_EVOLUTION.md`](../../archive/design/INTENT_SPEC_EVOLUTION.md), but framed here as a self-tooling experiment for *this* repo, not as a product proposal. ### Target shape @@ -257,7 +257,7 @@ The structural argument for convergence is strong: The structural argument against immediate convergence is also strong: -- They have different persistence needs. The dev layer is diffable, branchable, reviewable in PRs — files. 
The product layer is interactive, multi-user, resume-precise — SQLite. (Source: [`INTENT_SPEC_EVOLUTION.md`](./INTENT_SPEC_EVOLUTION.md) §11.) +- They have different persistence needs. The dev layer is diffable, branchable, reviewable in PRs — files. The product layer is interactive, multi-user, resume-precise — SQLite. (Source: [`INTENT_SPEC_EVOLUTION.md`](../../archive/design/INTENT_SPEC_EVOLUTION.md) §11.) - They have different mutation interfaces. The dev layer mutates through editor + CLI. The product layer mutates through interview turns, observer captures, and graph edits. - They have different operational metadata. The dev layer cares about test coverage and CI gates; the product layer cares about workflow phase, frontier ownership, review acceptance, and chat ownership. @@ -307,7 +307,7 @@ The decision rule: ## References -- [`INTENT_SPEC_EVOLUTION.md`](./INTENT_SPEC_EVOLUTION.md) §10–11 — source synthesis for the registry trajectory and the persistence adapter split. -- [`AGENTS.md`](../../AGENTS.md) — current operational protocols, verification harness, naming conventions. +- [`INTENT_SPEC_EVOLUTION.md`](../../archive/design/INTENT_SPEC_EVOLUTION.md) §10–11 — source synthesis for the registry trajectory and the persistence adapter split. +- [`AGENTS.md`](../../../AGENTS.md) — current operational protocols, verification harness, naming conventions. - `.agents/skills/ln-*/SKILL.md` — current implementations of the dev-workflow skills. - `memory/PLAN.md` horizon item "Structured development spec registry" — the one-line pointer this document expands. diff --git a/docs/design/ln-skills/README.md b/docs/design/ln-skills/README.md new file mode 100644 index 00000000..1432cbb3 --- /dev/null +++ b/docs/design/ln-skills/README.md @@ -0,0 +1,9 @@ +# ln-skills Design Notes + +This directory holds design rationale for Brunch's `ln-*` agent-skill workflow and related dev-layer self-tooling. + +These documents are not executable skills. 
Runtime skill instructions live under `.agents/skills/ln-*/`; accepted operational protocols belong in `AGENTS.md` or `docs/praxis/`; canonical product truth remains in `memory/SPEC.md` and `memory/PLAN.md`. + +| Document | Role | +| --- | --- | +| `EVOLUTION.md` | Dev-layer trajectory for the `ln-*` skill family, `memory/` ontology, proposed file-backed spec registry, and possible dev/product ontology convergence. Not product SPEC. | diff --git a/docs/praxis/graphite-workflow.md b/docs/praxis/graphite-workflow.md index 092e5049..3e246a9f 100644 --- a/docs/praxis/graphite-workflow.md +++ b/docs/praxis/graphite-workflow.md @@ -1,6 +1,6 @@ # Graphite Workflow -Graphite manages the stacked branch structure. Every plan-level `memory/PLAN.md` frontier item gets its own branch; the stack mirrors PLAN.md dependencies. Here, a **frontier item** means one named work item in the plan itself — not a scope card or an implementation sub-slice discovered during `ln-scope` / `ln-build`. Those refinements stay on the same branch unless `ln-plan` is rerun and splits the frontier into separate PLAN.md items. +Graphite manages the stacked branch structure. Every plan-level `memory/PLAN.md` frontier item gets its own branch; the stack mirrors PLAN.md dependencies. Here, a **frontier item** means one named canonical work item in the plan itself, preferably keyed by a stable id in `Frontier Definitions` and ordered in `Sequencing` — not a scope card or an implementation slice discovered during `ln-scope` / `ln-build`. Those refinements stay on the same branch unless `ln-plan` is rerun and splits the frontier into separate PLAN.md items. ## git vs gt boundary @@ -24,9 +24,9 @@ Use **gt** (via `/cli-graphite`) for stack-aware operations: ## Branch granularity - Branch / Linear-issue granularity follows the containing `memory/PLAN.md` frontier item. -- A frontier item is the plan-level work item; scope cards and implementation sub-slices are execution detail inside it. 
-- `ln-scope` may narrow one frontier item into multiple buildable sub-slices or consecutive scope cards; keep them on one branch. -- If several consecutive scope cards are prepared ahead of time, keep that execution queue in `memory/CARDS.md`; do not split branches just to mirror commit-sized steps. +- A frontier item is the plan-level work item; scope cards and implementation slices are execution detail inside it. +- `ln-scope` may narrow one frontier item into multiple buildable slices or consecutive scope cards; keep them on one branch. +- If several consecutive scope cards are prepared ahead of time, keep that execution queue in `memory/CARDS.md`; do not split branches or duplicate detailed slice history in `memory/PLAN.md` just to mirror commit-sized steps. - Only create a new branch when starting a different frontier item, or after `ln-plan` explicitly splits the frontier into separate PLAN.md items that should stack independently. - If scoping shows the current frontier item is too large, revise `memory/PLAN.md` first, then align the branch stack to the revised frontier. @@ -40,7 +40,7 @@ Use **gt** (via `/cli-graphite`) for stack-aware operations: ``` gt create {prefix}/fe-XXX-keywords # new branch for one PLAN.md frontier item -# ... implement one or more scoped sub-slices on this branch ... +# ... implement one or more scoped slices on this branch ... git add && git commit # plain git for commits npm run verify # gate before submit gt submit # push + create/update PR diff --git a/docs/praxis/worktree-agents.md b/docs/praxis/worktree-agents.md index 5ac3a64d..f95f5572 100644 --- a/docs/praxis/worktree-agents.md +++ b/docs/praxis/worktree-agents.md @@ -1,6 +1,6 @@ # Worktree Agent Protocol -Use `isolation: "worktree"` to run parallel agent builds in isolated copies of the repository. This enables concurrent slice implementation without merge conflicts during development. 
+Use `isolation: "worktree"` to run parallel agent builds in isolated copies of the repository. This enables concurrent scoped-slice implementation without merge conflicts during development. ## Known limitations @@ -12,7 +12,7 @@ Use `isolation: "worktree"` to run parallel agent builds in isolated copies of t ### Before spawning 1. Ensure the main worktree is clean (`git status` — no uncommitted changes). -2. Identify which slices can run in parallel (check PLAN.md `## Dependencies` and `### Parallelism opportunities`). +2. Identify which frontier items or scoped slices can run in parallel (check PLAN.md `## Sequencing` → `Parallel / Low-conflict` and `## Dependencies`). 3. Each agent gets a complete task description — it has no context from the parent conversation. ### During execution @@ -30,7 +30,7 @@ Use `isolation: "worktree"` to run parallel agent builds in isolated copies of t ## When not to use worktree agents -- When slices have direct dependencies (one needs the other's output). +- When frontier items or scoped slices have direct dependencies (one needs the other's output). - When the task requires interactive human steering (e.g. prompt iteration, visual design). - When the risk of merge gaps exceeds the time saved by parallelism. - For non-build tasks (spikes, reviews, spec work) — these don't benefit from isolation. diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index 90855cb5..00000000 --- a/memory/CARDS.md +++ /dev/null @@ -1,442 +0,0 @@ - - -# Scope cards — V3.1 + node-edit completion - -The frontier (PLAN.md §Next item 2) is **Side-chat V3.1 — agent-grouped reconciliation resolution**. The user's direction (T-019e08b9 conversation) folds in two adjacent improvements that close out node editing on the cascade surface: showing the source diff inline (Card A) and an Edit-target affordance per need (Card B). These compose against the V3.0 seam and are independent of the V3.1 agent. 
- -Cards 1-3 are done and live in settled seams. Card 4 is a Figma-aligned visual polish pass over those surfaces (chat panel, staged-patches strip, Pending review section, direct-edit toolbar) that lands before the V3.1 agent UI builds on top — the agent's per-row status chips and proposal-diff actions reuse the polished `DiffPopover` and toolbar contracts that Card 4 introduces. Card 5 is the V3.1 agent backend (next, full scope card). Cards 6-7 (V3.1 client UI + bulk actions) are NOT queued yet — they depend on `ln-oracles` settling the LLM verification strategy and on what slice 5 actually feels like. - ---- - -## 1 — Source-content snapshots on `reconciliation_need` (server) — `done` - -### Objective - -Each `reconciliation_need` row carries the source item's content snapshot from immediately before and after the edit that opened it, so downstream surfaces (Pending review row, V3.1 agent pre-image) can render or reason about the actual change without re-querying mutable item history. - -### Acceptance Criteria - -- ✓ `reconciliation_need` table has two new nullable columns: `source_previous_content TEXT`, `source_current_content TEXT`. -- ✓ `OpenReconciliationNeedInput` accepts and persists `sourcePreviousContent` and `sourceCurrentContent`; `openReconciliationNeed` writes them through. -- ✓ `handleApplyEdit` (edit-route hard path) passes the existing `previousContent` and the just-applied `parsed.data.content` into every `openReconciliationNeedIfAbsent` call. -- ✓ `ReconciliationNeedRecord` (shared type) gains the two fields so the client query exposes them. -- ✓ `GET /api/specifications/:id/reconciliation-needs` returns the new fields without breaking existing test payload assertions. -- ✓ Existing partial-unique-index dedupe still applies — re-applying the same edit does not open new rows or overwrite snapshots. -- ✓ Migration `0018_reconciliation_need_source_snapshots.sql` is generated via the standard drizzle pipeline. 
- -### Verification Approach - -- Inner: extend `cascade-producer.test.ts` / `reconciliation-need.test.ts` for the new columns; extend `edit-route.test.ts` to assert snapshots arrive on opened needs; extend `reconciliation-needs-route.test.ts` for the response shape. `npm run verify`. - -### Promotion checklist - -- [ ] Requirement change? **No** — extends existing Requirement 10 surface, no new requirement. -- [ ] Assumption change? **No** — A88 is preserved; A80 explicitly pre-authorizes extending the queue table with provenance fields. -- [ ] Non-trivial design decision? **No** — D139 already routes cascade through `reconciliation_need`; this just attaches the source delta to the queue row instead of re-deriving it. -- [ ] New seam-level invariant? **No** — snapshots are advisory render data, not load-bearing state. `caused_by_turn_id` already exists for stronger provenance. -- [ ] Crosses >2 major seams? **No** — schema + producer + shared type + route response. -- [ ] First touch in unfamiliar seam? **No** — same code paths as PR #115/#116. -- [ ] Cannot name containing seam from live docs? **No** — `cascade-producer.ts`, `db.openReconciliationNeed*`, `reconciliation-needs-route.ts`, SPEC.md A80/A88 + I112/I113. - -→ Stays light. - ---- - -## 2 — Source diff rendered inline on each Pending review row (client) — `done` - -### Objective - -Each row in `` shows the source item's before/after as a `` so the user can read what changed without leaving the cascade surface. - -### Acceptance Criteria - -- ✓ When `source_previous_content` and `source_current_content` are both present and non-equal, the row renders a `` block under the source/target reference line. -- ✓ When either snapshot is null (legacy rows opened before Card 1), the row renders today's bare layout — no diff block, no error. -- ✓ Diff styling reuses FE-665's `` component verbatim; no new diff library, no duplicated tokenization. 
-- ✓ The diff block does not push the Resolve button below the fold of the overlay's normal scroll height — verified by snapshot of an existing test fixture. -- ✓ A small "Source change" label sits above the diff so it's not confused with the (future) target diff. - -### Verification Approach - -- Inner: extend `pending-review-section` component test (or add one) using the existing fixture in `__tests__/reconciliation-need-fixtures.ts` extended with snapshots; assert `` renders only when snapshots are present and non-equal. -- Outer: manual walkthrough of an apply that opens 4 needs — the diff should make the source change legible at a glance. - -### Promotion checklist - -- [ ] All seven items: **No**. Pure UI composition over an existing component using fields just added in Card 1. No new decisions, no new seams. - -→ Stays light. - ---- - -## 3 — "Edit target" affordance per Pending review row (client + reuse) — `done` - -### Objective - -Each row gets an "Edit target" button that expands an inline textarea pre-filled with the target item's current content; saving runs through the existing edit pipeline (`PATCH /knowledge-items/:id`) and then resolves the need (`POST /api/specifications/:id/reconciliation-needs/:needId/resolve`). - -### Acceptance Criteria - -- ✓ Each row exposes `[ Edit target ]` alongside the existing `[ Resolve ]`. -- ✓ Clicking Edit target expands an inline textarea with the target item's current content; ⌘↵ saves, esc cancels. -- ✓ Save calls `PATCH /knowledge-items/:id` (or whichever existing edit-route endpoint FE-657 uses) with content + rationale; on success, calls the existing resolve endpoint and refetches the needs query. -- ✓ If the edit returns `impact === 'hard'` with new opened needs, the new needs surface immediately in the same Pending review section — re-entrant cascade works without a page reload. -- ✓ While save is in flight, both Edit target's Save button and the row's Resolve are disabled. 
-- ✓ Existing per-row Resolve behavior is unchanged when the editor is collapsed. -- ✓ The target's current content used to pre-fill the textarea comes from a single source of truth — either the existing knowledge-items query already mounted on the overlay surface, or a newly threaded field on `ReconciliationNeedRecord`. Pick one and document the choice in the commit body. - -### Verification Approach - -- Inner: extend the pending-review section test to drive the inline-edit flow against a mocked edit endpoint; assert the resolve endpoint is called only after the edit succeeds; assert re-entrant cascade rows appear after save. -- Outer: manual walkthrough — open a hard apply that creates needs, edit one target inline, confirm the cascade rerenders and Resolve clears the row. - -### Promotion checklist - -- [ ] Requirement change? **No** — already implicit in Requirement 10's HITL contract. -- [ ] Assumption change? **No**. -- [ ] Non-trivial design decision? **Possibly** — choosing whether the target content is read from the items query vs threaded onto `ReconciliationNeedRecord`. Both are reversible. Not promoting unless the chosen direction surfaces a durable invariant. -- [ ] New seam-level invariant? **No**. -- [ ] Crosses >2 major seams? **No** — UI + existing edit-route + existing resolve endpoint. -- [ ] First touch in unfamiliar seam? **No**. -- [ ] Cannot name containing seam? **No** — `pending-review-section.tsx`, FE-657 inline-edit pattern, V2 edit-route. - -→ Stays light. 
- ---- - -## 4 — Side-chat / pending-review polish — Figma alignment + `DiffPopover` primitive (client) — `done` - -### Objective - -Bring the V3.0 side-chat panel, staged-patches strip, pending-review section, and direct-edit toolbar in line with the HASH-SgAI Figma design language (file `nTw9n0blCJm1j9t22Jo72d`, node `969:13119`) and Linear-chat minimal-chrome conventions: kind-accent tints replace ad-hoc grays, a shared `DiffPopover` primitive replaces inline diff expanders, action chrome shrinks toward icon-only ghost shapes, and FE-only vocabulary unifies as `Note` / `Edit mode`. No backend or contract changes. The four sub-sections form a cohesive pass — split during build is allowed, but the kind-accent tint system, button shapes, and vocabulary must land together to feel coherent. - -### Acceptance Criteria - -**S1 — Vocabulary + chat-panel chrome** (`side-chat-popover.tsx`) - -- ✓ FE-only string changes, no type renames: - - Annotate button label `Annotate` → `Note`; aria `Annotate item` → `Add a note`. - - Annotation composer aria `Annotation composer` → `Note composer`; summary placeholder `Summary` → `Title`; body placeholder `Note body` → `Details`. - - Edit-mode button label `Edit` → `Edit mode` (off) / `Edit on` (active); tooltip → `Toggle edit mode — your messages propose changes for review`. - - Promote-from-drawer aria `Add … to chat context` → `Add … to context`. - - Patch kinds (`'annotate' | 'edit' | 'edge' | 'drill-down'`) and `mode` prop values stay unchanged at the type level. -- ✓ Top-right floating header buttons (layout-toggle, close) shrink from 24×24 to 20×20 ghost. -- ✓ The current right-side action row above the input (`[Annotate] [Edit]`) is removed. Both actions move: - - `+ note` becomes a 24×24 ghost icon button inside the input card's left action row, next to the disabled `+` attach button (`NotebookPen` icon, label only on hover/aria). 
- - Notes(N) drawer button (rendered when `existingAnnotations.length > 0`) sits to its right in the same input-card left action row. Drawer still opens upward as a popover. -- ✓ `Edit mode` becomes a thin strip rendered **below** the input card (separate full-width row, ~28px tall): `[PencilLine icon] Edit mode [toggle pill on right]`. When `mode === 'edit'`: strip bg `${kindAccent}10`, input placeholder swaps to `Suggest an edit…`, toggle pill reads `Edit on`. -- ✓ When `kindAccent` is null (untyped pinned item), all kind-accent tint values fall back to `#5424ff` so existing untyped-kind behavior is preserved. - -**S2 — Staged-patches strip + diff color system + `DiffPopover`** - -- ✓ Staged-patches `
` background flips from `bg-wash/60` to `${kindAccent}0a` with `${kindAccent}1f` border. Row hover bg `${kindAccent}05`. -- ✓ Per-row layout: `[kind chip with kind-specific lucide icon] [truncated title] [↗ view diff chip] [impact chip] [× discard]`. - - Kind chip rendered on every staged patch (note / edit / edge / drill-down), 10px font, kind-accent-tinted bg. - - `[↗ view diff]` chip rendered only when `kind === 'edit' && currentContent !== newContent`. Click opens `` anchored to this chip. - - Today's inline `
` expander is removed. - - Discard `×` shrinks to 14×14 ghost icon (`X` lucide), opacity 0 by default, opacity 1 on row hover or focus-within. - - Impact chip moves to the right of the title (not next to kind chip). -- ✓ Footer Undo / Apply become 28×28 icon-only: - - Undo: `Undo2` lucide, ghost (no bg), hover bg `${kindAccent}14`. Aria `Undo last change`. Hidden when `!canUndo`. - - Apply: `Check` lucide, solid `${kindAccent}` bg, white icon, ring shadow. Aria `Apply N change(s)`. Stays the eye-anchor. -- ✓ The current `Saving change…` status moves into a small inline label adjacent to Apply (right-aligned), not its own row. -- ✓ A new component `` lands at `src/client/components/diff-popover.tsx`: - - Props: `{ open, onClose, anchor, before, after, title, kindChip?, kindAccent? }`. - - Floating popover, no backdrop dim. Click-outside and ESC close. - - Viewport-aware absolute positioning (above by default, below if no space above). No floating-ui dependency. - - Max-width 480px. Header bar `[kind chip] [title (truncated)] [✕]` over `${kindAccent}10` bg. Body ``. Container border `${kindAccent}1f`. -- ✓ ``'s inline tint colors (warm-amber removed, cool-blue added) are unchanged — the visual diff signal stays decoupled from kind-accent so it remains universally readable. - -**S3 — `PendingReviewSection` redesign** (`pending-review-section.tsx`) - -- ✓ Strip background softens from `rgba(255,219,168,0.35)` to `rgba(255,219,168,0.18)`. -- ✓ Strip header becomes `[AlertCircle, amber] N pending reviews` — count + icon, no chevron, no resolve-all. -- ✓ Per-row layout: - ``` - ┃ [supersedes|confirm chip with Replace|CheckCircle2 icon] #ID · {target excerpt} - ┃ from #ID was edited [↗ view source diff] - ┃ [✎ edit] [✓ resolve] - ``` - - Left vertical bar (`┃`) is 2px wide, `rgba(255,219,168,0.6)` neutral-amber for v1 (target-item-kind enrichment is the deferred follow-up below). 
- - Title shows raw `#ID` followed by `·` and the first ~80 chars of `target_current_content`, single-line truncate. - - Sub-line `from #ID was edited` rendered only when `source_previous_content` and `source_current_content` are both present and differ. The `[↗ view source diff]` chip opens `` (same primitive from S2). - - Today's inline `` block under the row is removed; the diff is reachable only through the chip. - - Action row: `[✎ edit]` and `[✓ resolve]` 24×24, opacity 0.6 default, opacity 1 on row hover or focus-within. Edit is ghost. Resolve is solid `${kindAccent}` (target-kind-accent fallback to neutral amber when target kind unknown for v1). Tooltips and aria carry the labels. -- ✓ State icons: - - Resolving in-flight: `Check` swaps to `Loader2` spinner. - - Saving in-flight (inline edit form): `Check` swaps to `Loader2` spinner. -- ✓ Inline edit form (when `editDrafts` has the row's id) wraps textarea + Cancel/Save in `${kindAccent}10`-tinted card with `${kindAccent}1f` border. Cancel and Save shapes match S4's direct-edit toolbar contract. -- ✓ Existing data-attribute selectors (`data-need-id`, `data-need-kind`, `data-edit-target-form`) stay so the existing tests still resolve rows. - -**S4 — Direct-edit toolbar** (`ItemEditTextarea` in `routes/specification/$id/-structured-list-view.tsx`) - -- ✓ Textarea drops `shadow-[var(--shadow-card)]` and the heavy `border-rule`. Border `border-[${kindAccent}1f]`; focus ring `${kindAccent}33` at 2px (down from `ring-3`). Background stays `bg-background`. -- ✓ Cancel becomes icon-only — drop the word `Cancel`, keep the `X` icon, set `aria-label="Cancel edit"` and `title="Cancel"`. -- ✓ Save loses the hard-coded blue gradient and `ring-1 ring-[#1060d6]`. Becomes small kind-accent-solid (`bg-[${kindAccent}]`), white text, `Check` icon + word `Save`, `size="xs"` retained. Disabled state: `opacity-40`, no special bg. -- ✓ Keyboard hint row (`⌘↵ save · esc cancel`) stays unchanged. 
-- ✓ Vertical footprint shrinks ~6px versus today (the dropped shadow and thinner ring carry the savings). -- ✓ The same toolbar contract is reused inside `PendingReviewSection`'s inline edit form — same Cancel / Save composition, same kindAccent ring derivation, same disabled-state recipe. - -**Cross-section invariants** - -- ✓ Kind-accent values are derived everywhere from the existing `kindAccentHex` map in `knowledge-card.tsx`. No new color tokens; tints are computed inline via hex+alpha string concatenation following the existing precedent in `side-chat-popover.tsx`. -- ✓ All four surfaces share the same `kindAccent` fallback (`#5424ff`) when a kind cannot be determined. -- ✓ `npm run verify` passes with all existing component tests, plus extensions for the new primitive and the relabeled affordances. - -### Verification Approach - -- **Inner**: - - `side-chat-popover.test.tsx`: extend to assert (a) Note/Edit-mode label changes, (b) `+ note` button lives inside the input card's left action row, (c) Edit-mode strip renders below the input card with the toggle reflecting `mode`, (d) Undo/Apply are icon-only, (e) staged-patch rows expose `[↗ view diff]` chip when content differs, (f) discard `×` only visible on row hover/focus. - - `pending-review-section.test.tsx`: extend to assert (a) `[↗ view source diff]` chip opens `` instead of inline ``, (b) per-row `[✎ edit]` and `[✓ resolve]` icon buttons render with correct ARIA, (c) inline edit form uses the new toolbar shape (icon-only Cancel, kindAccent Save), (d) `Loader2` spinner replaces `Check` during in-flight states. - - New `diff-popover.test.tsx`: rendering with/without `kindChip`, ESC closes, click-outside closes, viewport-aware position falls back to below when no space above, focus management. - - Structured-list-view tests: extend `ItemEditTextarea` assertions to confirm icon-only Cancel and small kindAccent Save (no blue gradient), keyboard hints unchanged. 
- - `npm run verify` (lint + format + tests + build). -- **Outer**: manual walkthrough — open side-chat on items of three different kinds, stage and diff-popover-inspect a few edits, apply, undo. Trigger a hard cascade, walk the Pending review surface, source-diff popover, edit-target inline, save, see re-entrant cascade rerender. Direct-edit a row from the structured-list view, confirm the toolbar feels thin and ⌘↵ flow is unchanged. - -### Promotion checklist - -- [ ] Requirement change? **No** — pure visual + vocabulary polish over already-shipped V3.0 surfaces. No new product capability. -- [ ] Assumption change? **No** — A88 (Path 1 sufficiency) and A80 (HITL contract) untouched. -- [ ] Non-trivial design decision? **Possibly** — the `DiffPopover` primitive's shape (anchored vs modal, kindAccent vs neutral chrome) is a small reusable contract. Reversible if a future surface needs a different popover shape. -- [ ] New seam-level invariant? **No** — kind-accent tints are render-time derived, not stored. -- [ ] Crosses >2 major seams? **No** — four components in `src/client/`, one new primitive, no server / shared / contract changes. -- [ ] First touch in unfamiliar seam? **No**. -- [ ] Cannot name containing seam from live docs? **No** — `side-chat-popover.tsx`, `pending-review-section.tsx`, `content-diff.tsx`, `structured-list-view.tsx#ItemEditTextarea`, all in PR #115/#116/#117 territory. - -→ Stays light. - -### Polish follow-up — reference-code & target-kind enrichment on the listing endpoint (deferred) - -Card 4's S3 keeps raw `#ID` references and a neutral-amber row left bar because the current `GET /api/specifications/:id/reconciliation-needs` payload does not carry `target_reference_code`, `target_title`, `source_reference_code`, or `target_item_kind`. 
A small follow-up card (~30 lines in `reconciliation-needs-route.ts` plus a join per row) can enrich these fields, after which: - -- The Pending review row title flips from `#12 · {excerpt}` to `AS-12 · {excerpt}`. -- The sub-line flips from `from #9 was edited` to `from AS-9 was edited`. -- The row left bar derives its color from the target's `kindAccentHex` instead of the v1 neutral amber. -- Resolve button bg derives from target-kind-accent. - -Queue this only after Card 4 ships and the v1 polish has corpus signal. - ---- - -## 5 — V3.1 agent backend (schema + classifier + run-agent endpoint) — `done` (full scope card) - -### Target Behavior - -`POST /api/specifications/:id/reconciliation-needs/run-agent` classifies every open `reconciliation_need` row in the given specification whose `agent_status` is `null`, persisting one of `{auto-confirm, auto-edit, substantive}` plus an optional text proposal per row, while transitioning each row through `null → queued → classifying → classified | failed`. - -### Boundary Crossings - -``` -→ POST /api/specifications/:id/reconciliation-needs/run-agent (route) -→ handleRunReconciliationAgent (server/reconciliation-agent-route.ts, new) -→ list open + agent_status=null needs (db.ts; existing query, new filter) -→ enrich each need with sourceItem + targetItem (existing getKnowledgeItem) -→ classifyNeed(need, sourceItem, targetItem, getRelationKind, llm) → { classification, proposal? 
} (server/reconciliation-agent.ts, new pure function) - → loadPrompt('reconciliation-classifier') (prompt-loader; new asset src/server/prompts/reconciliation-classifier.md) - → generateText({ model, system, prompt }) on the AI SDK adapter already used by side-chat-route - → parse single-shot response into label + optional proposal -→ updateReconciliationNeedAgentFields(needId, { agent_status, agent_classification, agent_proposal }) (db.ts; new helper, transitions one row at a time) -→ 200 OK { specId, ranAt, classifiedCount, failedCount } (route response) -``` - -### Risks and Assumptions - -``` -- RISK: LLM returns a label outside the three-value vocabulary - → MITIGATION: classifyNeed validates against the literal union; on parse failure, transition to 'failed' with the parser error message persisted into agent_proposal as 'Parse error: ...'. - -- RISK: classifying N needs in a single request with a synchronous LLM call blocks the route past the typical proxy timeout when N is large - → MITIGATION: V3.1 first cut runs in-process with a per-need iteration so partial progress persists; the route returns once the loop completes. Single-digit open-need counts per spec (same as the N+1 caveat in Card 3) keep this acceptable for the MVP. Promote to a queue substrate (BullMQ / pg-boss / inline scheduler) only if outer-loop walkthroughs surface user-visible blocking. - -- RISK: Re-running the agent against rows already classified clobbers prior classification - → MITIGATION: route filters strictly on agent_status IS NULL; per-row Re-run (slice 6) re-sets a single row to null first, so the re-run path stays explicit and per-need. - -- ASSUMPTION: The lifecycle (null → queued → classifying → classified | failed) plus the three-label vocabulary is enough seam to support slices 6-7 (status chips, action buttons) without further schema change. → VALIDATE: build slice 6 against the schema as-is; if a new column appears in slice 6 (e.g. 
confidence score, retry count), promote that as an A### at slice-6 scoping. → memory/SPEC.md §Assumptions A88 (Path 1 sufficiency) is the umbrella; this is a sub-assumption under it. - -- ASSUMPTION: Single-shot LLM call (one prompt → one structured response, no tool use, no multi-turn) is sufficient classification quality for the three-label decision when the prompt has source previous + current content (Card 1) and target current content (Card 3) in context. → VALIDATE: the middle-loop golden-fixture corpus (see Verification Approach) is the only oracle that proves this; if classification is unstable across runs at temperature 0, promote to multi-shot or add confidence scoring as a follow-up slice. → memory/SPEC.md §Acknowledged Blind Spots row "V3.1 classifier multi-run determinism" already names this; current mitigation is the per-need Re-run button shipping in slice 6. -``` - -No spike required — both LLM seam (`generateText` via the existing AI SDK adapter) and the prompt registry (`prompt-loader` + markdown assets) are already in production use. The classifier is novel only in *what* it classifies, not *how* it talks to the model. 
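The parse-failure mitigation above is small enough to sketch. A minimal illustration, assuming a response shape of label-on-the-first-line with an optional proposal on the following lines; the function name and response format are hypothetical, not the real `classifyNeed` contract:

```typescript
// Hypothetical sketch of the three-label parse-and-validate step.
// The real classifier lives in server/reconciliation-agent.ts; names here are illustrative.
type Classification = "auto-confirm" | "auto-edit" | "substantive";

const LABELS: readonly Classification[] = ["auto-confirm", "auto-edit", "substantive"];

interface ClassifierResult {
  classification: Classification;
  proposal: string | null;
}

// Parse a single-shot model response of the assumed form:
//   line 1: the label
//   remaining lines (optional): the text proposal
// Throws on anything outside the three-value vocabulary, so the caller can
// transition the row to 'failed' and persist 'Parse error: ...' into agent_proposal.
function parseClassifierResponse(raw: string): ClassifierResult {
  const [first, ...rest] = raw.trim().split("\n");
  const label = first.trim();
  if (!(LABELS as readonly string[]).includes(label)) {
    throw new Error(`Parse error: unknown label '${label}'`);
  }
  const proposal = rest.join("\n").trim();
  return {
    classification: label as Classification,
    proposal: proposal.length > 0 ? proposal : null,
  };
}
```

Validating against the literal union at the parse boundary keeps the `failed` path structural: a bad label never reaches `agent_classification`.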
- -### Acceptance Criteria - -``` -✓ schema: reconciliation_need.test.ts — three new nullable columns (agent_status TEXT, agent_classification TEXT, agent_proposal TEXT) round-trip through openReconciliationNeed* and the listing query; defaults are all null on existing rows -✓ schema: migration 0019_reconciliation_need_agent_columns.sql is hand-written + journal entry added (per HANDOFF.md non-TTY caveat); structural test asserts column presence -✓ classifier (state-machine, stubbed LLM): reconciliation-agent.test.ts — happy path null → queued → classifying → classified with label='auto-confirm' on a leaf need; auto-edit returns a non-null proposal; substantive returns null proposal -✓ classifier (state-machine, stubbed LLM): reconciliation-agent.test.ts — failure path null → queued → classifying → failed when the stub throws; agent_classification stays null; agent_proposal carries the error message -✓ classifier (state-machine, stubbed LLM): reconciliation-agent.test.ts — invalid label from the stub transitions to failed with a 'Parse error: ...' 
proposal; agent_classification stays null -✓ classifier (pure): reconciliation-agent.test.ts — classifyNeed is pure: same (need, source, target, relationKind) input + stubbed LLM returning the same string yields the same { classification, proposal } output -✓ route: reconciliation-agent-route.test.ts — POST .../run-agent returns 200 with { classifiedCount, failedCount } and persists agent_status/classification on every previously-null open need; rows already classified stay untouched -✓ route: reconciliation-agent-route.test.ts — POST .../run-agent on a spec with zero open needs returns 200 with { classifiedCount: 0, failedCount: 0 } -✓ route: reconciliation-agent-route.test.ts — POST .../run-agent on a missing or non-owned spec returns the same 404 / 403 shape as the existing reconciliation-needs route (auth parity) -✓ wire: reconciliation-needs-route.test.ts — GET .../reconciliation-needs response now exposes agent_status, agent_classification, agent_proposal on every row; existing test fixtures stay typesafe by adding null defaults to makeNeed -✓ wire: ReconciliationNeedRecord (shared type) gains the three fields with doc-comments naming the lifecycle and label vocabulary -``` - -### Verification Approach - -``` -- Inner: deterministic state-machine tests over the lifecycle with a stubbed classifier (per SPEC.md row 553); structural unit tests for new schema columns + classifyNeed purity; route-level tests for the run-agent endpoint and the listing-endpoint wire-shape change. `npm run verify`. - -- Middle: golden-fixture corpus of (source change, target content, relation kind) → expected classification tuples, evaluated against the live AI SDK adapter behind a recorded-or-live model. **Seed bootstrap (this slice ships the seed; the corpus harness itself is built incrementally as classification probes land)**: - 1. (no semantic source change, target unchanged, depends_on) → auto-confirm - 2.
(rename "user" → "customer" in source, target verbatim references "user", refines) → auto-edit, proposal replaces "user" with "customer" in target text - 3. (constraint loosened in source, target encodes the older constraint, constrains) → substantive, proposal null (judgment required) - 4. (added counterexample to source, target unaffected, illustrates) → auto-confirm - 5. (verifier replaced in source, target derives_from old verifier, derived_from) → substantive - Per SPEC.md §Verification Design row 554, the corpus lives outside `npm run verify` (recorded-or-live model adapter). The five seed tuples land as a test-resources directory next to reconciliation-agent.test.ts so slice 6/7 can extend them; the harness that runs the corpus against the live adapter is its own slice (not this one). This slice's middle-loop deliverable is **the seed corpus + the prompt asset that the corpus exercises**, not the runner. - -- Outer: deferred to after slice 7 (UI actions land), per SPEC.md row 555. The walkthrough on dense specs validating A88 is the only ring that says whether grouping helps. -``` - -### Promotion notes - -- New invariant lands as **I114** in SPEC.md §Invariants: lifecycle + label vocabulary + structural recoverability (`agent_proposal` text-only, never auto-applied; `failed` is reachable from `classifying` and is recoverable via per-need Re-run in slice 6). Add I114 row to SPEC.md during build (per ln-scope traceability rule for full cards). The "planned I114" placeholder already in SPEC.md rows 553-554 gets replaced with the live id. -- No new D### unless the in-process loop turns out to be the wrong shape under outer-loop walkthrough; per HANDOFF.md it stays a deliberate MVP choice with a documented promotion trigger. -- A88 stays open — this slice does **not** validate it; slice 7 outer-loop walkthrough does. 
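The I114 lifecycle reads naturally as a transition table. A sketch only, assuming the slice-6 Re-run reset counts as an explicit terminal-to-null move; names here are illustrative, and the real transitions persist through `updateReconciliationNeedAgentFields` in `db.ts`:

```typescript
// Sketch of the I114 lifecycle as a transition table (hypothetical names).
type AgentStatus = null | "queued" | "classifying" | "classified" | "failed";

// Legal moves: null → queued → classifying → classified | failed.
// The terminal → null edges model the per-row reset-agent path (slice 6),
// which is what makes 'failed' structurally recoverable.
const NEXT = new Map<AgentStatus, readonly AgentStatus[]>([
  [null, ["queued"]],
  ["queued", ["classifying"]],
  ["classifying", ["classified", "failed"]],
  ["classified", [null]],
  ["failed", [null]],
]);

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return (NEXT.get(from) ?? []).includes(to);
}
```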
- ---- - -## 6 — V3.1 agent client UI — Run agent + status chips + per-row Re-run (client + small server seam) — `next` (full scope card) - -### Target Behavior - -A user with open `reconciliation_need` rows can trigger the V3.1 classifier from the Pending review header, see each row's live classification state as a chip (`null` / `queued` / `classifying` / `auto-confirm` / `auto-edit` / `substantive` / `failed`), and re-run classification on any single row from its action rail — all within the existing `` polished surface, with no per-class action behavior yet (those land in Card 7). - -### Boundary Crossings - -``` -→ header (src/client/components/pending-review-section.tsx — Card 4 polished surface) -→ (NEW small component; lives in header next to the existing kind-counts row) -→ POST /api/specifications/:id/reconciliation-needs/run-agent (existing, Card 5) -→ useSpecificationOpenReconciliationNeeds query (existing; verify it returns agent_* fields per Card 5 slice 4) -→ conditional refetchInterval (1000ms while ANY need.agent_status ∈ {'queued','classifying'}, otherwise off) -→ Per-row (NEW; six variants; reuses kind-chip pattern from Card 4) -→ Per-row [↻ Re-run] action button (visible only when agent_status ∈ {'classified','failed'}) -→ POST /api/specifications/:id/reconciliation-needs/:needId/reset-agent (NEW minimal route; idempotent agent_status → null + immediate POST run-agent for that row) -→ db.resetReconciliationNeedAgentFields(needId) (NEW one-liner helper; clears agent_status / agent_classification / agent_proposal on one row) -→ Optimistic chip swap on Run / Re-run click (chip flips to 'queued' immediately; polling reconciles) -``` - -### Risks and Assumptions - -- RISK: 1-second polling against the listing endpoint while several needs classify in parallel could hammer the server → MITIGATION: conditional polling — `refetchInterval: 1000` ONLY when at least one open need is `queued` or `classifying`; idle state has no polling. 
Single-digit need counts per spec (existing N+1 caveat from Card 3) keep this cheap. If outer-loop reveals load issues, drop to 2000ms or switch to event-stream notification (a Card 6.5 follow-up, not pre-scoped). -- RISK: Per-row Re-run requires resetting `agent_status` before the run-agent route's `IS NULL` filter will re-process the row → MITIGATION: dedicated `POST .../:needId/reset-agent` endpoint that clears the three agent_* fields on one row in a single statement, then calls the same classifier pipeline used by the spec-level run-agent route (factor the inner loop into a shared helper if cleanest). One transaction; idempotent. -- RISK: "Run agent" button state must reflect global agent state (running / idle / partial) and disable correctly to prevent double-runs → MITIGATION: derive button state from query data (`hasInflight = some need has agent_status ∈ {queued, classifying}`); the spec-level run-agent route is already idempotent on rows where `agent_status IS NOT NULL`, so duplicate clicks during polling are harmless but disabling avoids confusing UX. -- ASSUMPTION: The GET listing endpoint already exposes `agent_status`, `agent_classification`, `agent_proposal` per Card 5 slice 4 [2026-05-08]. → VALIDATE: read `src/server/reconciliation-needs-route.ts` listing path before implementing; if not exposed, this is a one-line extension to the response shape and the `ReconciliationNeedRecord` shared type. → memory/SPEC.md I114 (lifecycle persistence). -- ASSUMPTION: Six chip variants (queued / classifying / auto-confirm / auto-edit / substantive / failed; `null` renders no chip) cover all states the user sees today; the `failed` chip carries enough information (icon + label + tooltip showing `agent_proposal` error text) for the user to decide whether to Re-run without opening a separate panel. → VALIDATE: outer-loop walkthrough on dense graphs; if `failed` rows need richer detail, add a hover-popover in Card 7 along with the auto-edit DiffPopover.
-- ASSUMPTION: Card 6 alone is enough surface to validate A88 (does grouping help legibility?). → VALIDATE: outer-loop walkthrough on a dense real spec immediately after build; the qualitative read is whether the chip vocabulary makes the queue actionable at a glance vs. the V3.0 flat list with per-row Resolve. → memory/SPEC.md A88. If A88 invalidates, route through `/ln-spec` before scoping Card 7. - -### Acceptance Criteria - -``` -✓ pending-review-section.test.tsx — Run agent button renders in the section header when ≥1 open need has agent_status=null; clicking dispatches exactly one POST .../run-agent call -✓ pending-review-section.test.tsx — Run agent button is disabled (and tooltip explains) while any open need has agent_status ∈ {'queued','classifying'}; re-enabled when all rows reach a terminal state -✓ pending-review-section.test.tsx — progress strip ("Agent: M of N classified") renders only while in-flight; counter derives from agent_status states; hidden at rest -✓ pending-review-section.test.tsx — each row renders a ClassificationChip matching agent_status (seven fixtures for null / queued / classifying / auto-confirm / auto-edit / substantive / failed); failed chip shows agent_proposal tooltip on hover -✓ pending-review-section.test.tsx — per-row Re-run button visible only when agent_status ∈ {'classified','failed'}; click sends POST .../:needId/reset-agent; resulting chip transitions queued → classifying → terminal within the polling window -✓ specification-open-reconciliation-needs.test.tsx — refetchInterval is 1000ms when any need is queued/classifying; falsy when all needs are terminal or null (no polling at rest) -✓ reconciliation-needs-route.test.ts — new POST .../:needId/reset-agent: 200 + classifier dispatch on valid open need; 404 on unknown need; 200 + no-op when row is already null; auth parity with existing /resolve route -✓ classification-chip.test.tsx — pure render snapshot of all six variants; accessibility labels are present and 
distinguishable -✓ npm run verify — no unrelated regressions in existing pending-review-section, reconciliation-needs-route, or query tests -``` - -### Verification Approach - -``` -- Inner: component test extensions in pending-review-section.test.tsx for the seven UI cases above; query-layer test for refetchInterval gating; route-level test for the new reset-agent endpoint; standalone ClassificationChip variant tests. `npm run verify` covers all. -- Middle: not applicable — no LLM judgment changes; classifier prompt unchanged from Card 5. The middle-loop golden-fixture corpus seeded in Card 5 stays as-is. -- Outer: manual walkthrough — open a spec with a dense knowledge graph (≥10 items with mixed typed edges); make a hard-impact edit that opens 4–6 needs across both `supersedes` and `needs_confirmation` kinds; click Run agent; observe chips cycle through queued → classifying → terminal within ~5s; re-run two rows manually; ASSESS A88: can the user interpret the classification pattern without coaching? Is the queue more actionable than V3.0's flat list with per-row Resolve? Capture qualitative notes — these are the A88 signal that gates Card 7 scoping. -``` - -### Promotion checklist - -- [ ] Requirement change? **No** — Requirement 10 already names the HITL contract. -- [x] Assumption change? **Maybe** — A88's outer-loop validation lands here. Build does not change A88; the walkthrough either upholds it, refines it (e.g., "grouping helps but only with class-count summary in header"), or invalidates it (e.g., "users miss substantive needs in a mixed-chip list"). No SPEC update at scope time; route through `/ln-spec` post-walkthrough only if invalidated. -- [x] Non-trivial design decision? **Yes** — (a) chip vocabulary (six variants + tooltip-driven failure detail), (b) per-row reset endpoint shape (`POST /reset-agent` vs. generic `PATCH /agent-status` mutation), (c) polling cadence + conditionality. 
All three are reversible inside the route + component contracts; document choices in the commit body. No D### needed unless walkthrough reveals a load-bearing constraint. -- [ ] New seam-level invariant? **No** — I114 (classifier lifecycle) already established by Card 5; Card 6 surfaces it without changing it. The `agent_proposal` text-only / never-auto-applied invariant remains untouched (Card 7 will lean on it for `auto-edit` Apply). -- [x] Crosses >2 major seams? **Yes** — listing route (verify-only), new reset-agent route, query hook, component surface, three new sub-components. Justifies full scope. -- [ ] First touch in unfamiliar seam? **No** — same code paths as Card 4 polish + Card 5 backend. -- [ ] Cannot name containing seam? **No** — `pending-review-section.tsx`, `useSpecificationOpenReconciliationNeeds`, `reconciliation-needs-route.ts`. - -→ Stays full scope on (b) + (c) + multi-seam crossing. SPEC.md unchanged at scope time; reconcile only if A88 walkthrough invalidates. - ---- - -## 7 — V3.1 per-class actions + bulk (client + listing extension) — `done` (full scope card) - -### Target Behavior - -Each classified row exposes the action appropriate to its classification (`auto-confirm` → Confirm; `auto-edit` → Apply suggested / Skip with a `` preview; `substantive` → Open side-chat) and the section header exposes two bulk actions ("Confirm all (N)" and "Apply all suggested (N)") that iterate client-side over the existing per-row endpoints, closing V3.1's user-facing surface end-to-end. 
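The per-classification mapping can be sketched as a dispatch over the three labels. A sketch under assumed names for the client helpers; the real calls are the request helpers and `useSideChat().openFor` named in the crossing list that follows:

```typescript
// Hypothetical dispatch for the per-row action rail. The deps stand in for
// the real client helpers (resolve / edit requests, side-chat opener).
type Classification = "auto-confirm" | "auto-edit" | "substantive";

interface ClassifiedNeed {
  needId: string;
  targetItemId: string;
  classification: Classification;
  agentProposal: string | null;
}

interface Deps {
  resolveNeed: (needId: string) => Promise<void>;
  editItem: (itemId: string, content: string) => Promise<void>;
  openSideChat: (itemId: string) => void;
}

// Primary action per class: Confirm, Apply suggested, or Open side-chat.
async function primaryAction(need: ClassifiedNeed, deps: Deps): Promise<void> {
  switch (need.classification) {
    case "auto-confirm":
      await deps.resolveNeed(need.needId);
      return;
    case "auto-edit":
      if (need.agentProposal === null) return; // no proposal: row keeps Skip / Resolve only
      await deps.editItem(need.targetItemId, need.agentProposal); // write proposal first...
      await deps.resolveNeed(need.needId); // ...then close the need
      return;
    case "substantive":
      deps.openSideChat(need.targetItemId); // ephemeral until V4a persistence
      return;
  }
}
```

The edit-before-resolve ordering matters: if the edit request fails, the need stays open rather than being resolved against stale content.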
- -### Boundary Crossings - -``` -→ (src/client/components/pending-review-section.tsx — Card 6 surface) -→ Per-row action rail (new buttons mounted next to existing Resolve / Edit / Re-run) - ├── auto-confirm → Confirm → resolveReconciliationNeedRequest (existing) - ├── auto-edit → Apply → editKnowledgeItemRequest({ content: agent_proposal }) + resolveReconciliationNeedRequest - ├── auto-edit → Skip → resolveReconciliationNeedRequest (existing) - ├── auto-edit → View proposal → opens with target current → agent_proposal - └── substantive → Open side-chat → useSideChat().openFor({ kind, id, referenceCode, content }) -→ Header bulk row (next to Run agent / progress strip) - ├── Confirm all (N) → iterate auto-confirm rows → serialized awaits of resolveReconciliationNeedRequest… - └── Apply all suggested (N) → iterate auto-edit rows with proposals → serialized awaits of editKnowledgeItemRequest + resolveReconciliationNeedRequest… -→ GET /api/specifications/:id/reconciliation-needs listing endpoint extension - └── ReconciliationNeedView gains target_item_kind + target_reference_code (closes Card 4 deferred follow-up) -→ ReconciliationNeedRecord (shared type) gains the two read-time fields -``` - -### Risks and Assumptions - -- RISK: Bulk client-side iteration can fire N concurrent PATCHes against the same spec, which may stress the server's edit-route under hard-impact cascade (one bulk Apply could open many needs) → MITIGATION: serialize bulk operations (await each request before starting the next) rather than parallelize; the user-perceived latency is dominated by network round-trips, not parallelism, and serialization keeps cascade-opening predictable. Trade-off documented in commit body.
-- RISK: Apply suggested writes the raw `agent_proposal` into the target item without giving the user a chance to edit it → MITIGATION: the View-proposal `` lets the user preview before clicking Apply; if Card 7 walkthrough surfaces that the user wants to edit-before-apply, promote that to a follow-up card (would re-use Card 3's inline-textarea machinery seeded with `agent_proposal`). -- RISK: Substantive Open-side-chat opens an ephemeral conversation that disappears on refresh (V4a side-chat persistence isn't shipped yet) → MITIGATION: accepted for V3.1; the side-chat is anchored to the target item with its current content as pinned context, which is enough for one substantive walk. V4a persistence makes the same affordance durable without changing the entry contract. -- RISK: The listing-endpoint extension touches a shared response shape (`ReconciliationNeedView`, `ReconciliationNeedRecord`) consumed by existing tests and component fixtures → MITIGATION: both new fields are nullable on the shared type; existing fixtures (`reconciliation-need-fixtures.ts`) default both to null, so older tests stay green. The Confirm / Apply / Open-side-chat buttons each check the relevant fields for non-null before enabling. -- ASSUMPTION: `useSideChat()` returning `null` (no SideChatHost mounted) is the right gate to hide the Open-side-chat button. → VALIDATE: outer-loop walkthrough on a route that has SideChatHost; render unit test that asserts the button hides when the context is null. -- ASSUMPTION: Confirm-all / Apply-all-suggested are scoped per-classification (not "everything the agent classified") because the actions are semantically distinct. → VALIDATE: outer-loop walkthrough; if users want a single "apply everything" affordance, that's a follow-up after seeing the multi-button feel.
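The serialize-rather-than-parallelize mitigation can be sketched as a sequential loop that still reports per-row outcomes in an `allSettled`-like shape. The helper name and row shape are hypothetical; the real per-row calls are `editKnowledgeItemRequest` + `resolveReconciliationNeedRequest`:

```typescript
// Hypothetical sketch of the serialized bulk loop: await each per-row
// request before starting the next, so the server never sees N concurrent
// PATCHes, while failures on one row don't abort the rest.
interface NeedRow {
  id: string;
  agentProposal: string | null;
}

type RowResult =
  | { id: string; status: "fulfilled" }
  | { id: string; status: "rejected"; reason: unknown };

async function applyAllSuggested(
  rows: NeedRow[],
  applyOne: (row: NeedRow) => Promise<void>, // edit target, then resolve need
): Promise<RowResult[]> {
  const results: RowResult[] = [];
  // Only auto-edit rows with a non-null proposal participate.
  for (const row of rows.filter((r) => r.agentProposal !== null)) {
    try {
      await applyOne(row); // serialized: exactly one request chain in flight
      results.push({ id: row.id, status: "fulfilled" });
    } catch (reason) {
      results.push({ id: row.id, status: "rejected", reason });
    }
  }
  return results;
}
```

Collecting settled-style results instead of throwing keeps the header affordance honest: a partial failure can surface as "applied M of N" rather than an aborted run.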
- -### Acceptance Criteria - -``` -✓ reconciliation-needs-route.test.ts — GET .../reconciliation-needs response now includes target_item_kind and target_reference_code on every row (defaults null when target item missing) -✓ pending-review-section.test.tsx — auto-confirm row exposes Confirm button; click calls resolveReconciliationNeedRequest once -✓ pending-review-section.test.tsx — auto-edit row exposes View-proposal + Apply + Skip; View opens the existing DiffPopover with target current vs agent_proposal; Apply calls editKnowledgeItemRequest with content=agent_proposal then resolveReconciliationNeedRequest; Skip calls resolveReconciliationNeedRequest only -✓ pending-review-section.test.tsx — substantive row exposes Open side-chat button; click invokes useSideChat().openFor with the target item's kind / id / referenceCode / content -✓ pending-review-section.test.tsx — substantive row hides Open-side-chat when useSideChat() returns null (no host mounted) -✓ pending-review-section.test.tsx — header exposes "Confirm all (N)" only when ≥1 auto-confirm row exists; click resolves each auto-confirm row serially -✓ pending-review-section.test.tsx — header exposes "Apply all suggested (N)" only when ≥1 auto-edit row with non-null agent_proposal exists; click applies each in sequence -✓ pending-review-section.test.tsx — auto-edit rows lacking agent_proposal are excluded from "Apply all suggested" iteration but still expose Skip / Resolve individually -✓ npm run verify — no regressions across server route tests, query tests, component tests -``` - -### Verification Approach - -``` -- Inner: pending-review-section.test.tsx for the per-class action affordances, bulk header visibility, and dispatch correctness; reconciliation-needs-route.test.ts for the listing-extension fields -- Middle: not applicable — no LLM judgment changes; classifier output (auto-confirm / auto-edit / substantive) is the input contract, not the output -- Outer: manual walkthrough — open a spec with 
mixed classifications (≥2 auto-confirm, ≥2 auto-edit, ≥1 substantive); use per-row actions on a few; use one bulk action; observe the row leaves the section atomically on each resolve. THIS is the second A88 signal: do the action affordances close the loop legibly, or do users hesitate on which button to use? Capture qualitative notes — this is where V3.1 ends. -``` - -### Promotion checklist - -- [ ] Requirement change? **No** — Requirement 10 already names the HITL contract for accept-on-target / edit-target / dismiss. -- [x] Assumption change? **Yes** — A88 (Path 1 sufficiency without agent) gets its second outer-loop validation here; together with Card 6's walkthrough, this is the canonical signal on whether agent grouping helps. No SPEC update at scope time; route through `/ln-spec` post-walkthrough only if invalidated. -- [x] Non-trivial design decision? **Yes** — (a) bulk semantics (serialize vs parallelize), (b) Apply path bypasses the inline-edit affordance (raw application of agent_proposal), (c) substantive handoff ships without persistence (V4a-blocked but functional). All reversible. Document in commit bodies; no D###. -- [ ] New seam-level invariant? **No** — reuses Card 5's I114 and Card 6's polling contract. -- [x] Crosses >2 major seams? **Yes** — listing route extension, shared type, component, side-chat context. Full scope. -- [ ] First touch in unfamiliar seam? **No** — same code paths as Cards 3 / 5 / 6. -- [ ] Cannot name containing seam? **No**. - -→ Full scope card. SPEC.md unchanged at scope time. After Card 7 + walkthrough lands, run `/ln-sync` to retire CARDS.md (frontier exhausted) and update PLAN.md's §Recently Completed. - ---- - -## Not yet queued - -(Nothing remaining for FE-674 after Card 7. Next frontier is the V4a side-chat persistence promotion already in PLAN.md §Next item 3 — re-scope via `/ln-scope` once V3.1 closes and the §349 anchor decision is made.) 
diff --git a/memory/PLAN.md b/memory/PLAN.md index 9c069c7f..2a8f3d41 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -1,205 +1,400 @@ - + -## Active - -1. **Continuous workspace / phase-addressable interview surface** — cumulative center pane with realized phase sections, one chat runtime per specification, sidebar section navigation, scroll/focus behavior, and the single actionable frontier preserved at the current reachable phase. - - Why now / unlocks: workflow read/write ownership is extracted (FE-616); the multi-chat substrate ships chat containers below the specification, so continuous workspace can adopt one visible runtime without smuggling in a second durable workflow model. Side-chat V3.0 + V3.1 just closed, so the cascade surface is stable; no remaining V2/V3 placeholder blocks the workspace work. - - Traceability: A58; D86, D87, D110, D113, D114; I24, I102. - - Design doc: `docs/design/CONTINUOUS_WORKSPACE_HYBRID.md`. - -## Next +# Plan -2. **Side-chat persistence — V4a (multi-chat Phase 2 substrate)** — side-chat client persists its turns into the existing `chat` / `turn` tables with `chat.kind='side_chat'`, loads prior side-chat sessions on remount, and surfaces an "Old chats" affordance per pinned item / spec. Phase 1 substrate (FE-697, [2026-05-06]) already shipped the `chat` table, nullable `turn.chat_id`, and `specification.primary_chat_id`; nothing schema-side blocks this — only the client write path and a per-spec session listing remain. This is the V4a half of FE-675; V4b (item versioning + branched exploration) stays in Horizon, gated on FE-701. - - Why later: side-chat threads stay in-memory through V3 by design (SIDE_CHAT.md §5.3); applied patches and `reconciliation_need` rows already persist independently. 
With V3.1 closed and the cascade surface settled, V4a becomes the next user-facing surface to light up — but Card 1 (server-side persistence) and Cards 2+ both depend on MULTI_CHAT.md §349's open question (anchor field on `chat` row vs deferred `chat_focus` table); route through `/ln-spec` or `/ln-spike` before scoping Cards 2+. - - Linear: FE-675 (umbrella; per-substrate phase note on FE-675 rather than a new sub-ticket). - - Traceability: Requirement 39; A82, A83; D138. - - Design doc: `docs/design/MULTI_CHAT.md` §10 Phase 2; `docs/design/SIDE_CHAT.md` §9 V4 row (V4a half only). - -3. **Prompt/context scenario substrate (FE-698 continuation)** — continue the FE-698 substrate after the foundation slices: packaged prompt assets, the first observer context-pack path, deterministic no-provider scenario capture, and the agent mutation-surface audit are complete, but FE-698 still has live follow-up work. - - Linear: FE-698. Pi harness spike: FE-635. - - Status: partially complete, not retired. Completed foundation: prompt registry + markdown prompt loading, observer-capture and web-research context-pack composition, scenario runner capture skeleton / seeded snapshots, prompt-source explicitness, mutation-surface audit / terminology cleanup, capability registry metadata surfaced in scenario artifacts, fake-adapter web-research scenario execution capture, a probe-only Anthropic AI SDK scenario adapter, and safe scenario execution error summaries. Outstanding FE-698 follow-up: more context-pack scenarios beyond observer capture / web research, broader read-only/proposal-only harness execution probes, and/or the Pi adapter spike. OpenRouter/default-provider setup is deferred to the first-run provider setup frontier. - - Why now / unlocks: multi-chat removes the single transcript spine as default agent context, while ontology, observer, candidate-spec, web research, behavioral-kernel, architect, and post-spec decomposition work all need shared prompt/context machinery. 
This prevents every future agent feature from inventing its own prompt-context hack and lets LLM-heavy flows be tested before UI work. - - Recommended shape: define the next FE-698 slice around one of the remaining prompt/context seams. Likely candidates: additional context-pack scenarios for next-question, candidate-spec, web research, reconciliation, architect, or decomposition probes; a narrow execution-probe path using the existing Anthropic API key / fake adapters; or the FE-635 Pi SDK/RPC spike. Keep provider credential UX, shared production AI runtime/provider resolution, execution adapters as product truth, and durable mutating handlers out of scope. The key rule is that future agent-originated writes must go through Brunch-owned handlers rather than direct ORM access. Registry naming should follow `docs/design/AGENT_MUTATION_SURFACE.md`: product nouns plus semantic verbs, with intent-graph mutations converging on `changeset.submit` / `changeset.apply` and atomic `change` variants rather than many ad hoc mutating tools. - - Verification approach: inner-loop prompt-loader/context-pack unit tests plus seeded scenario snapshots; middle-loop multi-run prompt probes should be designed before judging generative quality. - - Traceability: Requirements 40, 41, 42; A84, A85, A86, A87; D139, D140, D141, D142, D143; I112. - - Design docs: `docs/design/INTENT_SPEC_EVOLUTION.md`; `docs/design/MULTI_CHAT.md`; `docs/design/AGENT_MUTATION_SURFACE.md` (agent-originated mutation audit and registry input); Pi SDK docs as spike input. - -4. **Intent graph semantics + progressive checkability foundation** — refine the ontology and relation policy so the graph can represent invariants, examples/counterexamples, constraint subtypes, narrowed decisions, witness strength, and checkability gaps as source/destination material for future generative features. - - Linear: FE-700. 
- - Why now / unlocks: candidate generation, behavioral kernels, architect proposals, and downstream verification-aware decomposition need a sharper semantic target than the current exploration/review ontology.
- - Recommended shape: add `invariant` and `example` as first-class durable kinds; subtype examples (positive / negative / edge-case / trace / not-relevant); narrow `decision` per the decision-capture criteria; enrich `constraint` subtypes (non_goal / scope / technical / policy / resource / compatibility / environmental); add `criterion` subtypes (acceptance / test / manual_review / runtime_check / proof / observability) and `invariant` subtypes (state / transition / authority / provenance / consistency / security / data_integrity); add `checkability` and `witness strength` fields on intent items per the progressive-checkability ladder; introduce the five-family relation taxonomy (justification / dependency / boundary / refinement / verification) plus first-class negative relations (`rules_out`, `counterexample_for`); add edge epistemic metadata (`support`, `status`, `provenanceTurnId`, `rationale`); land a relation-policy registry whose axes distinguish `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, and `weak_suggestion` participation. Full enumerations and worked examples in `docs/design/INTENT_GRAPH_SEMANTICS.md`.
- - Verification approach: corpus/fixture observer probes comparing old vs refined ontology; graph-review manual assessment for precision/noise; context-pack probe outputs must show authority and witness labels.
- - Traceability: Requirement 38; A77, A78, A80, A81, A84; D134, D136, D137, D139, D140.
- - Design docs: `docs/design/INTENT_GRAPH_SEMANTICS.md` (canonical reference); `docs/design/INTENT_SPEC_EVOLUTION.md` (broader synthesis context).
-
-5.
**Generative prompt probes before UI** — use the scenario substrate to prototype web research, behavioral kernels, candidate-spec completion, and post-spec design/oracle/decomposition flows against intent-graph fixtures before committing product surfaces.
- - Linear: FE-702 for post-spec decomposition probes; FE-649 and FE-640 are productization children under FE-698.
- - Why now / unlocks: proves whether progressive checkability and graph-first context can be taught to agents, and de-risks the next generation of UI features.
- - Recommended shape: start with one web-research context/query scenario, the first three behavioral kernels (`state & lifecycle`, `containment & topology`, `authority & capability`) per the v0.1 kernel ontology, candidate-spec set generation, and exploratory oracle/decomposition scenarios inspired by `.agents/skills/ln-design/` and `.agents/skills/ln-oracles/`. Each kernel probe should follow the kernel-card structure (detection signals, contrastive question templates, artifact schema, validators) and emit typed intent items / intent edges per `docs/design/INTENT_GRAPH_SEMANTICS.md`. Outputs remain probe artifacts or proposal-only structures, not committed graph mutations.
- - Verification approach: scenario-runner fixtures, raw output review, structured parse validation, and qualitative scorecards before product UI.
- - Traceability: Requirements 20, 21, 31, 32, 40, 41; A67, A68, A80, A85, A87; D126, D127, D139, D141.
- - Design docs: `docs/design/BEHAVIORAL_KERNELS.md` (kernel ontology + cards); `docs/design/INTENT_GRAPH_SEMANTICS.md` (artifact target).
-
-## Horizon
-
-### Intent graph and reconciliation
-
-- **Semantic changeset ledger** — make semantic mutations first-class once non-primary surfaces can change intent-graph truth.
- - Linear: FE-701.
- - Recommended shape: one `changeset` contains one or more atomic `change` records.
Use `changeset` / `change` as canonical schema and operation vocabulary; `patch` / `patch_change` remain historical design-doc terms only. Connect `reconciliation_need.caused_by_changeset_id` once changesets exist.
- - Depends on: multi-chat substrate + reconciliation needs; prompt/context context packs for reconciliation scenarios.
- - Traceability: A71, A82, A83; D135, D138, D140.
- - Design doc: `docs/design/PATCH_LEDGER.md` (historical file name; future vocabulary is changeset/change).
-
-- **Relation-first observer capture enrichment** — after the next ontology/relation-policy probes, broaden observer relationship extraction across the refined ontology where edge support and operational participation are understood.
- - Recommended shape: keep `runObserver()` as the public turn-owned seam, but feed it scenario-specific context packs and validate output through the relation-policy registry. The FE-639 first cut has landed; remaining work should be driven by corpus/manual proving.
- - Depends on: prompt/context substrate; intent graph semantics + progressive checkability foundation.
- - Traceability: Requirements 30, 38, 40; A66, A81, A84; D125, D136, D137, D139, D140; I109.
-
-- **Architect / generator loop** — autonomous agent that iterates over the intent graph and proposes semantic changes for HITL review through the same future changeset / reconciliation pathway as user-driven edits.
- - Recommended shape: keep productized architect proposals behind multi-chat + reconciliation + semantic changesets; use the scenario substrate for shadow/proposal-only probes first.
- - Traceability: A73, A85, A87; D139, D141; depends on chat containers + reconciliation needs and semantic changeset ledger.
-
-- **Side-chat V4b — item versioning + branched exploration** — once the patch ledger lands, item versioning unblocks dangling-annotation repair and soft-edit audit; branched exploration lets drill-downs / past-turn edits / revisits coexist with the original chain.
FE-675 V4b half.
- - Depends on: FE-701 patch ledger; V4a side-chat persistence (Next item 2).
- - Traceability: A72, A73, A85; D139, D141.
- - Design doc: `docs/design/SIDE_CHAT.md` §9 V4 row (V4b half).
-
-### User-facing capabilities
-
-- **First-run provider setup** — make missing LLM credentials visible on the dashboard, add a shared AI runtime provider seam for interviewer / observer model construction, support UI-entered keys through XDG-compliant user auth state, and evaluate whether OpenRouter should become the preferred onboarding provider while preserving Anthropic-specific capabilities or explicit degradation.
- - Linear: FE-633 covers the OpenRouter/default-provider part; dashboard credential UX + XDG key storage may need a sibling issue if split from provider proving.
- - Recommended shape: prove the provider resolver first with current Anthropic behavior, then spike OpenRouter against tool use, structured output, and reasoning/thinking options before making it the default. The dashboard should expose credential status without leaking secret values and offer setup before the user starts a specification.
- - Traceability: Requirements 34, 35, 36; A74, A75; D130, D131, D132; I106.
-
-- **Workspace hygiene / `.brunch/` gitignore assist** — detect whether generated local state is already ignored and, with explicit confirmation, add an idempotent `.gitignore` entry or create `.gitignore` when absent.
- - Linear: FE-648.
- - Recommended shape: keep this as a deterministic local mutation with preview/confirmation semantics; it can ship independently, but the dashboard is the natural surface because it already explains workspace binding and first-run setup.
- - Traceability: Requirement 37; A76; D133; I107.
-
-- **Productized web research capability** — web search and page-fetch tools as interviewer-invoked context gathering, surfaced as preface cards after the scenario substrate proves query framing, tool ergonomics, and provisional-context handling.
- - Linear: FE-649.
- - Depends on: prompt/context scenario substrate and web-research probe.
- - Traceability: Requirements 20, 21, 40, 41; D99, D112, D139, D142.
-
-- **Dashboard result summaries and completeness metrics** — progress visibility across specifications.
-
-- **Two-axis interview framing** — adapt interviewer setup and questioning to the full `greenfield <> brownfield` by `end-to-end build <> incremental feature` matrix instead of treating partial-scope work as a special case.
- - Linear: FE-638.
- - Traceability: Requirement 29; A65; D124.
-
-- **Productized candidate-spec completion assist** — replace skip-only remainder handling with a `fill in the rest for me` path that generates candidate specs, implications, tradeoffs, and likely typed knowledge for reaction-based refinement after prompt probes prove useful output.
- - Depends on: prompt/context scenario substrate; intent graph semantics + progressive checkability foundation; candidate-spec generation probe.
- - Traceability: Requirements 31, 40; A67, A77, A78, A85; D126, D134, D136, D139.
-
-- **Progressive detail / recursive deflation** — support broad-pass interviewing with explicit next-level-of-detail actions rather than one uniform depth-first drill-down.
- - Linear: FE-637.
- - Recommended shape: pair ordinary grounding/design question turns with a turn-owned breadth-skeleton artifact that makes current coverage visible and exposes a structured detail reaction (`deepen this area`, `continue broad pass`, `sufficient for now`). The chosen reaction should steer the next same-phase frontier turn instead of introducing a separate detail workflow.
- - First cut should optimize for `broad question -> choose one area to deepen next -> focused successor question -> refreshed breadth skeleton`, while keeping the same detail-focus intent reusable later from chat or graph surfaces.
- - Traceability: Requirement 32; A67, A68; D127.
-
-- **Spatial canvas layout for graph view** — add the spatial DAG layout as a second layout choice inside graph mode, alongside the structured-list route. Same projection seam, same intent contract; only the layout strategy changes.
- - Recommended shape: a layout switch inside the existing `/specification/$id/graph` route that transforms the same `EntitiesData` projection into a spatial scene with viewport / selection / focus / path-highlighting. First cut should optimize for `select node -> inspect -> launch refinement` through the multi-chat substrate.
- - Depends on: graph view structured-list ship. Richer node actions depend on multi-chat / reconciliation rather than the old side-chat conceptual roadmap.
- - Traceability: Requirement 33; A69; D128.
-
-- **Graph view active-path render filter + scope toggle** — render only active-path items by default in graph view, with a `Show all` toggle in the header that flips to the full whole-spec set. Both subsets project from the same in-memory `mode=project-wide` data; no second fetch.
- - Depends on: server data-layer change for active-path membership exposure.
- - Traceability: Requirement 33; D128, D129; I102.
-
-### Infrastructure / tooling
-
-- **Structured development spec registry** — prototype file-backed canonical spec records, deterministic checks, generated markdown views, and task-local slices for Brunch's own development workflow (the `ln-*` skill family).
- - Status: design horizon, not a migration commitment. Self-tooling experiment for the dev layer; not part of the product roadmap.
- - Recommended shape: follow the `memory/spec/{schema,records,generated,tools}/` trajectory and the 5-step migration path (stable IDs → sidecar files → stop editing generated md → `spec:check` in the verify gate → task-local slices). First-adopter candidate: a bounded sub-area such as the multi-chat substrate's records, not the full SPEC.
- - Traceability: D134.
- - Design doc: `docs/design/DEV_WORKFLOW_EVOLUTION.md` (canonical reference, including the three-layer framing and convergence question); `docs/design/INTENT_SPEC_EVOLUTION.md` (broader synthesis context).
-
-- **Portability boundaries** — split durable store/read-model, interview session runtime, and workspace capability provider if Brunch targets hosted, remote, embedded, or sandbox-backed operation.
- - Status: deferred. Some enabling seams already exist (query domains, workflow projector, no persisted `cwd` on specifications), but adapter-backed portability is not on the live roadmap.
- - Deep design source: `docs/design/PORTABILITY_BOUNDARIES.md`.
-- Headless interview driver for scripted end-to-end probes.
-- MCP server adapter for core operations.
-- Git-friendly file-based persistence representation for diffable exported specs.
-- Typed fixture-builder convergence for happy-path tests.
+## Context
+
+The interaction model is mature: four-phase interview, interviewer-autonomous question format, phase-agnostic preface cards with workspace exploration, structured review with per-item commenting, observer knowledge extraction, workflow ownership extraction, distribution hardening, graph view's structured-list peer route, the first relation-first observer capture seam, the multi-chat substrate, side-chat V3.0 hard-impact cascade, and side-chat V3.1 agent-grouped reconciliation resolution all ship as working product.
+
+The next product arc is a **continuous conversational workspace** plus a stronger semantic/generative substrate. Continuous workspace is active in a parallel lane and gives the chat runtime a stable phase-addressable host. The FE-705 branch contributes an integration substrate — a local agent capability CLI and external LLM-as-user probe harness — that should be reconciled into main before graph-review and scenario-options work depends on generated completed-spec fixtures.
After that, the highest-coordination work is intent-graph semantics and the semantic changeset ledger; lower-coordination provider, gitignore, and web-research work can proceed in parallel.
+
+The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agent-mutation design notes are reconciled into one direction. `docs/design/MULTI_CHAT.md` is the substrate document. `docs/design/SIDE_CHAT.md` describes side-chat V1 / V2 / V3.0 / V3.1 / V4 phasing on top of that substrate. `docs/design/PATCH_LEDGER.md` remains the historical deep-design source for semantic mutation history, but the canonical future-facing vocabulary is `changeset` / `change`. The product-layer ontology trajectory is split out as `docs/design/INTENT_GRAPH_SEMANTICS.md` and `docs/design/BEHAVIORAL_KERNELS.md`; broader synthesis lives in `docs/archive/design/INTENT_SPEC_EVOLUTION.md`. FE-705's branch-local strategy/proposal notes add scenario options, graph-review oracle, chat-local strategies, and concern/dependency mapping; those notes should become a canonical design doc when the branch is integrated. The dev-layer self-tooling trajectory lives in `docs/design/ln-skills/EVOLUTION.md`.
+
+## Sequencing
+
+### Active
+
+1. `continuous-workspace` — in progress in parallel lane — stable phase-addressable host for the chat runtime.
+2. `agent-fixture-substrate` — branch-complete off main, reconciling — FE-705 integration substrate for JSONL agent capability CLI and LLM-as-user probes.
+
+### Next
+
+1. `intent-graph-semantics` — highest-coordination semantic substrate after FE-705 reconciliation.
+2. `changeset-ledger` — semantic history spine needed before canonical proposal acceptance and productized scenario options.
+3. `graph-review-scenario-options` — artifact-only critique/probe lane; can advance in parallel with FE-700 if it does not commit canonical graph truth.
+4.
`productized-scenario-options` — user-facing acceleration surface after FE-700 semantics, FE-701 changesets, and graph-review probes.
+
+### Parallel / Low-conflict
+
+- `first-run-provider-setup` — provider/key UX and runtime seam can progress independently of semantic-stack work.
+- `workspace-gitignore-assist` — small workspace hygiene surface with low overlap.
+- `productized-web-research` — waits on prompt/context scenario substrate for probe quality, but can remain separate from semantic schema work.
+
+### Horizon
+
+- `relation-first-observer-enrichment`
+- `architect-generator-loop`
+- `server-mini-library-compartmentalization`
+- `side-chat-persistence-v4a`
+- `side-chat-v4b-item-versioning`
+- `dashboard-summaries`
+- `spatial-graph-layout`
+- `graph-view-active-path-filter`
+- `mcp-adapter`
+- `file-based-persistence`
+- `typed-fixture-builder-convergence`
+- `structured-development-spec-registry`
+- `portability-boundaries`
+
+## Frontier Definitions
+
+### continuous-workspace
+
+- **Name:** Continuous workspace / phase-addressable interview surface
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** in-progress
+- **Objective:** Replace per-phase rendering boundaries with a cumulative center pane, realized phase sections, one chat runtime per specification, sidebar section navigation, scroll/focus behavior, and preservation of the single actionable frontier at the current reachable phase.
+- **Why now / unlocks:** Workflow read/write ownership is extracted, the multi-chat substrate ships chat containers below the specification, and side-chat V3.0/V3.1 closed the cascade surface. This gives future side-chat persistence, strategy chats, and graph/workspace routes a stable host without introducing a second durable workflow model.
+- **Acceptance:** Realized phase sections remain legible, future sections stay unreachable until valid, navigation is focus/scroll state only, and the current phase retains exactly one actionable frontier/recovery/handoff/completion affordance.
+- **Verification:** Manual workspace walkthroughs across kickoff-ready, active, review-active, recovery, close-to-next-phase, resume/reload, and future-phase deep-link states; regression tests around route/workflow state where available.
+- **Traceability:** A58; D86, D87, D110, D113, D114; I24, I102.
+- **Design docs:** `docs/design/CONTINUOUS_WORKSPACE_HYBRID.md`; umbrella synthesis in `docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md`.
+
+### agent-fixture-substrate
+
+- **Name:** FE-705 integration — agent capability CLI + LLM-as-user fixture probe
+- **Linear:** FE-705
+- **Kind:** structural
+- **Status:** branch-complete / reconciling
+- **Objective:** Integrate the branch-complete local `brunch agent` JSONL capability adapter and external probe runner so agents can drive the real Brunch interview flow through Brunch-owned contracts rather than privileged ORM access.
+- **Why now / unlocks:** Prompt/context and graph-review probes need realistic graph/transcript fixtures, but hand-authoring those fixtures is a chicken-and-egg problem. A JSONL capability adapter lets an external LLM-as-user drive the real lifecycle through the same mutation authority future agents must use, pressure-testing tool-call vocabulary, chat readiness, resource identity, fixture curation, and import-boundary discipline. Pi comparison remains FE-635 after this seam has a real Brunch use case to compare against.
+- **Acceptance:** Server-owned capability contracts and JSONL protocol/session code are integrated; the probe runner uses only the JSONL client/process boundary; fixture-candidate artifacts preserve scenario briefs, model policy, generated transcripts, and workspace-state inspection without becoming Brunch authority.
+- **Verification:** Contract/dispatcher tests, JSONL protocol/session tests, import-boundary tests, fake process tests, opt-in real-provider smoke, and fixture-candidate structure/readiness checks.
+- **Traceability:** Requirement 43; A89; D143, D147; I114. Also protects Requirements 40, 41, 42 by making prompt/context and mutation-surface probes executable through a real adapter.
+- **Design docs:** `docs/design/AGENT_MUTATION_SURFACE.md`; `docs/archive/design/INTENT_SPEC_EVOLUTION.md`; FE-705 branch artifacts until rebased.
+
+### intent-graph-semantics
+
+- **Name:** Intent graph semantics + relation-policy directionality foundation
+- **Linear:** FE-700
+- **Kind:** structural
+- **Status:** not-started
+- **Objective:** Refine the ontology and relation policy so the graph can represent invariants, examples/counterexamples, constraint subtypes, narrowed decisions, witness strength, checkability gaps, and operational edge behavior as source/destination material for future generative features.
+- **Why now / unlocks:** Candidate generation, behavioral kernels, graph review, scenario-options acceleration, architect proposals, direct-edit cascade, and downstream verification-aware decomposition all need a sharper semantic target than the current exploration/review ontology. This semantic-layer lane is most likely to collide with parallel work, so it should land before broad observer enrichment or canonical candidate-bundle acceptance.
+- **Acceptance:** `invariant` and `example` are first-class durable kinds; examples are subtyped; `decision` is narrowed; `constraint`, `criterion`, and `invariant` semantics are enriched; `checkability` and witness strength are represented; relation families, negative relations, edge epistemic metadata, and relation-policy directionality are explicit.
+- **Verification:** Corpus/fixture observer probes comparing old vs refined ontology; relation-policy unit tests for mixed-direction relations; graph-review manual assessment for precision/noise; context-pack probe outputs show authority, witness, relation support, and directionality labels.
+- **Traceability:** Requirement 38; A77, A78, A80, A81, A84; D134, D136, D137, D139, D140.
+- **Design docs:** `docs/design/INTENT_GRAPH_SEMANTICS.md`; `docs/archive/design/INTENT_SPEC_EVOLUTION.md`; FE-705 strategy/proposal notes for relation directionality.
+
+### changeset-ledger
+
+- **Name:** Semantic changeset ledger + proposal-turn staleness
+- **Linear:** FE-701
+- **Kind:** structural
+- **Status:** not-started
+- **Objective:** Introduce the semantic history spine that separates graph mutation history from conversational turn ancestry.
+- **Why now / unlocks:** Scenario bundle acceptance, direct-edit atomicity, accepted-with-issues flows, stale proposal detection, graph-review repairs, side-chat V4b item versioning, and future architect/reconciliation agents all need a durable semantic mutation boundary. Without it, productized scenario-options can stay probe-only but cannot safely commit candidate bundles.
+- **Acceptance:** Schema and operation vocabulary use `changeset` / `change`; specifications track the latest semantic changeset; proposal turns carry base/opened changeset identity; `reconciliation_need.caused_by_changeset_id` is connected; non-accept proposal actions cannot mutate graph truth; a changeset is the smallest atomic unit preserving semantic coherence.
+- **Verification:** DB atomicity tests for changeset + changes + reconciliation_need writes, staleness tests for open proposal turns across multi-chat changes, and capability/transition tests proving non-accept actions cannot mutate graph truth.
+- **Traceability:** Requirements 39, 42, 44; A71, A79; D135, D138, D143.
+- **Design docs:** `docs/design/PATCH_LEDGER.md` (historical filename; future vocabulary is changeset/change); FE-705 strategy/proposal notes for semantic history and proposal turns.
+
+### graph-review-scenario-options
+
+- **Name:** Graph-review oracle + scenario-options probes
+- **Linear:** FE-702 for graph-review / scenario probes; FE-649 and FE-640 remain productization children under FE-698 where relevant
+- **Kind:** structural
+- **Status:** not-started
+- **Objective:** Build the internal critique path and artifact-only candidate bundle probes before product UI.
+- **Why now / unlocks:** Product wants first-turn strategy choice and mid-interview acceleration, but engineering needs graph-review critique to make generated candidate bundles credible. This lane can advance in parallel with FE-700 if it stays artifact-only and does not commit canonical graph truth.
+- **Acceptance:** Candidate graph bundle and graph-review finding artifacts exist; the graph-review prompt/context pack and rubric cover coherence, fixed-premise respect, coverage, tradeoff honesty, checkability, granularity, scenario fidelity, epistemic labels, provenance, and downstream usefulness; candidate readiness is classified as `draft` / `reviewing` / `reviewed_clean` / `reviewed_with_issues` / `blocked`; broader graph-review issues remain turn-owned unless querying/filtering needs prove otherwise.
+- **Verification:** Scenario-runner fixtures, FE-705 JSONL-generated completed-spec fixtures, raw output review, structured parse validation, qualitative scorecards, and comparison against drilldown-produced graphs. Middle/outer-loop oracle design should decide when fixture candidates become golden.
+- **Traceability:** Requirements 20, 21, 31, 32, 40, 41, 43, 44; A67, A68, A80, A85, A87, A89; D126, D127, D139, D141, D147.
+- **Design docs:** `docs/design/BEHAVIORAL_KERNELS.md`; `docs/design/INTENT_GRAPH_SEMANTICS.md`; `docs/design/AGENT_MUTATION_SURFACE.md`; FE-705 strategy/proposal notes.
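The readiness states and rubric dimensions named in the acceptance criteria above can be pinned down as plain artifact types. A minimal TypeScript sketch, assuming illustrative field names rather than the committed schema:

```typescript
// Illustrative artifact shapes for graph-review probes. Field names are
// assumptions for this sketch, not the committed schema.
type CandidateReadiness =
  | "draft"
  | "reviewing"
  | "reviewed_clean"
  | "reviewed_with_issues"
  | "blocked";

// Rubric dimensions from the graph-review acceptance criteria.
type RubricDimension =
  | "coherence"
  | "fixed_premise_respect"
  | "coverage"
  | "tradeoff_honesty"
  | "checkability"
  | "granularity"
  | "scenario_fidelity"
  | "epistemic_labels"
  | "provenance"
  | "downstream_usefulness";

interface GraphReviewFinding {
  dimension: RubricDimension;
  severity: "info" | "issue" | "blocker";
  note: string;
}

interface CandidateGraphBundle {
  id: string;
  readiness: CandidateReadiness;
  findings: GraphReviewFinding[];
}

// Readiness is derived from review findings; the bundle stays an artifact
// and never mutates canonical graph truth.
function classify(findings: GraphReviewFinding[]): CandidateReadiness {
  if (findings.some((f) => f.severity === "blocker")) return "blocked";
  if (findings.some((f) => f.severity === "issue")) return "reviewed_with_issues";
  return "reviewed_clean";
}
```

The point of the sketch is only that readiness is a pure function of findings, so fixture-candidate checks can recompute it deterministically.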
+
+### productized-scenario-options
+
+- **Name:** Productized scenario-options / candidate-spec completion assist
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** blocked
+- **Objective:** Replace skip-only remainder handling with first-turn strategy choice and a mid-interview `speed this up` path that generates reviewed candidate graph bundles with tradeoffs, completing the current direction by default.
+- **Why now / unlocks:** This is the likely first user-visible alternative to long drilldown, but product UI waits on graph-review probes, FE-700 semantics, and FE-701 changesets. Until then, scenario-options remain artifact/proposal-only.
+- **Acceptance:** Users can choose or request acceleration via scenario options; generated bundles preserve accepted graph truth as fixed premise, present tradeoff profiles, and become canonical only through coherent accepted changesets with known issues represented as follow-on review/process debt.
+- **Verification:** Probe comparison against direct drilldown, graph-review scorecards, accepted-with-issues flow tests once changesets exist, and manual user-flow review for trust/comprehension.
+- **Traceability:** Requirements 31, 40, 44; A67, A77, A78, A85, A90, A91; D126, D134, D136, D139, D151, D152.
+- **Design docs:** FE-705 strategy/proposal notes until canonicalized; `docs/design/BEHAVIORAL_KERNELS.md`; `docs/design/INTENT_GRAPH_SEMANTICS.md`.
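The changeset/change vocabulary and proposal-turn staleness that the definitions above lean on can be sketched as record shapes. All names here are illustrative placeholders for the eventual schema, not a committed design:

```typescript
// Hypothetical ledger record shapes following the changeset/change
// vocabulary; field names are this sketch's assumptions only.
interface ChangeRecord {
  id: string;
  kind: "add_item" | "update_item" | "add_edge" | "remove_edge"; // illustrative ops
  targetId: string;
}

// A changeset is the smallest atomic unit preserving semantic coherence:
// one or more changes that are accepted or rejected together.
interface Changeset {
  id: string;
  specificationId: string;
  changes: ChangeRecord[];
}

// Proposal turns carry the changeset identity they were opened against.
interface ProposalTurn {
  id: string;
  baseChangesetId: string;
}

// A proposal is stale once graph truth has moved past its base changeset,
// e.g. because another chat accepted a changeset in the meantime.
function isStale(turn: ProposalTurn, latestChangesetId: string): boolean {
  return turn.baseChangesetId !== latestChangesetId;
}
```

Under this sketch, "non-accept actions cannot mutate graph truth" simply means nothing outside changeset acceptance appends to the ledger, and staleness is a comparison against the specification's latest changeset.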
+
+### first-run-provider-setup
+
+- **Name:** First-run provider setup
+- **Linear:** FE-633 covers the OpenRouter/default-provider part; dashboard credential UX + XDG key storage may need a sibling issue if split from provider proving
+- **Kind:** bounded feature
+- **Status:** not-started
+- **Objective:** Make missing LLM credentials visible on the dashboard, add a shared AI runtime provider seam for interviewer/observer model construction, support UI-entered keys through XDG-compliant user auth state, and evaluate whether OpenRouter should become the preferred onboarding provider while preserving Anthropic-specific capabilities or explicit degradation.
+- **Why now / unlocks:** Can proceed independently and reduces first-run friction for real users and probe workflows.
+- **Acceptance:** Dashboard surfaces provider credential status before specification creation; setup flow stores UI-entered keys outside the project workspace; interviewer/observer construction routes through a shared provider seam.
+- **Verification:** Unit tests for provider precedence/storage paths, manual first-run walkthroughs, and provider capability spike for model naming, structured output, tool use, and reasoning/thinking support.
+- **Traceability:** Requirements 34, 35, 36; A74, A75; D130, D131, D132; I106.
+- **Design docs:** none yet beyond SPEC/PLAN entries.
+
+### workspace-gitignore-assist
+
+- **Name:** Workspace hygiene / `.brunch/` gitignore assist
+- **Linear:** FE-648
+- **Kind:** bounded feature
+- **Status:** not-started
+- **Objective:** Detect whether generated local state is already ignored and, with explicit confirmation, add an idempotent `.gitignore` entry or create `.gitignore` when absent.
+- **Why now / unlocks:** Low-conflict guardrail that reduces accidental commits of local Brunch state.
+- **Acceptance:** The app detects absent, present, and already-covering ignore states; previews repository mutation; mutates `.gitignore` only after explicit confirmation; append/create behavior is idempotent and content-preserving.
+- **Verification:** Unit tests for ignore detection/append behavior and manual dashboard walkthrough with absent, present, and already-covering `.gitignore` states.
+- **Traceability:** Requirement 37; A76; D133; I107.
+- **Design docs:** none yet beyond SPEC/PLAN entries.
+
+### productized-web-research
+
+- **Name:** Productized web research capability
+- **Linear:** FE-649
+- **Kind:** structural
+- **Status:** not-started
+- **Objective:** Add web search and page-fetch tools as interviewer-invoked context gathering, surfaced as preface cards after the scenario substrate proves query framing, tool ergonomics, and provisional-context handling.
+- **Why now / unlocks:** Extends the same phase-agnostic preface-card model to external research, but should wait for prompt/context scenario substrate proof so web research does not become an ad hoc tool surface.
+- **Acceptance:** Research tools are invoked through interviewer context gathering, outputs render as provisional preface cards paired with questions, and observer capture treats the validated full turn as atomic.
+- **Verification:** Prompt/context scenario probes for query framing and tool-output summarization, plus manual review of provisional-context handling.
+- **Traceability:** Requirements 20, 21, 40, 41; D99, D112, D139, D142.
+- **Design docs:** FE-698 prompt/context scenario substrate references; future productized research notes if needed.
+
+### relation-first-observer-enrichment
+
+- **Name:** Relation-first observer capture enrichment
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Broaden observer output across the refined ontology without flooding the graph.
+- **Why now / unlocks:** First cut is shipped; enrichment waits for FE-700 relation policy so observer output can become semantically richer while preserving prompt-budgeted compact anchors and user trust.
+- **Acceptance:** Observer extraction captures richer relation families and operational metadata with abstention under weak support.
+- **Verification:** Observer corpus probes, graph/export review for precision/noise, and context-pack output review.
+- **Traceability:** Requirements 30, 38, 40; A66, A81, A84; D125, D136, D137, D139, D140; I109.
+- **Design docs:** `docs/design/INTENT_GRAPH_SEMANTICS.md`.
+
+### architect-generator-loop
+
+- **Name:** Architect / generator loop
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Explore an autonomous agent that iterates over the intent graph and proposes semantic changes for HITL review through the same future changeset/reconciliation pathway as user-driven edits.
+- **Why now / unlocks:** Related to scenario-options but broader; keep productized architect proposals behind multi-chat, reconciliation, and semantic changesets. Use the scenario substrate for shadow/proposal-only probes first.
+- **Acceptance:** Shadow/proposal-only architect outputs can be compared against user-driven edits without mutating canonical graph truth.
+- **Verification:** Scenario substrate probes and human comparison against accepted user edits.
+- **Traceability:** A73, A85, A87; D139, D141.
+- **Design docs:** `docs/design/BEHAVIORAL_KERNELS.md`; future design doc if promoted.
+
+### server-mini-library-compartmentalization
+
+- **Name:** Server mini-library compartmentalization
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** refactor
+- **Status:** horizon
+- **Objective:** Refactor growing server seams into plural public roots with same-named private subtrees where FE-698 / FE-705 pressure has made boundaries too implicit.
+- **Why now / unlocks:** Near-term refactor candidate after FE-705 integration, not product roadmap work.
+- **Acceptance:** Candidate seams such as `fixtures.ts`, `context-packs.ts`, `prompts.ts`, `scenario-runner.ts`, `entity-apis.ts`, and `agent-apis.ts` hide private implementation subtrees behind stable public roots where real pressure exists.
+- **Verification:** Existing test suite plus import-boundary review.
+- **Traceability:** code organization convention in `AGENTS.md`.
+- **Design docs:** none.
+
+### side-chat-persistence-v4a
+
+- **Name:** Side-chat persistence — V4a (multi-chat Phase 2 substrate)
+- **Linear:** FE-675 umbrella, V4a half
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Persist side-chat client turns into the existing `chat` / `turn` tables with `chat.kind='side_chat'`, load prior side-chat sessions on remount, and surface an "Old chats" affordance per pinned item/spec.
+- **Why now / unlocks:** Deprioritized below continuous workspace and semantic/generative substrate. Phase 1 substrate already ships schema support; the remaining decision is the anchor model (`chat` row anchor fields vs deferred `chat_focus` table).
+- **Acceptance:** Side-chat sessions survive remount/reload and remain coherent with graph truth without introducing a second workflow model.
+- **Verification:** Persistence/reload tests and manual side-chat walkthroughs.
+- **Traceability:** Requirement 39; A82, A83; D138.
+- **Design docs:** `docs/design/MULTI_CHAT.md` §10 Phase 2; `docs/design/SIDE_CHAT.md` §9 V4 row.
+
+### side-chat-v4b-item-versioning
+
+- **Name:** Side-chat V4b — item versioning + branched exploration
+- **Linear:** FE-675 umbrella, V4b half
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Add item versioning and branched exploration once the changeset ledger lands.
+- **Why now / unlocks:** Item versioning unblocks dangling-annotation repair and soft-edit audit; branched exploration lets drill-downs, past-turn edits, and revisits coexist with the original chain.
+- **Acceptance:** Prior item versions are queryable for diff/comparison/audit while active-path projection always reflects latest semantic truth.
+- **Verification:** Changeset-backed versioning tests, revisit cascade tests, and annotation repair walkthroughs.
+- **Traceability:** A72, A73, A85; D139, D141.
+- **Design docs:** `docs/design/MULTI_CHAT.md`; `docs/design/PATCH_LEDGER.md`.
+
+### dashboard-summaries
+
+- **Name:** Dashboard result summaries and completeness metrics
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** bounded feature
+- **Status:** horizon
+- **Objective:** Improve progress visibility across specifications.
+- **Why now / unlocks:** Lower-priority product surface after core workspace and semantic substrate stabilize.
+- **Acceptance:** Dashboard communicates spec progress/completeness without implying false closure.
+- **Verification:** Manual dashboard walkthroughs.
+- **Traceability:** Requirements 8, 13, 15.
+- **Design docs:** none.
+
+### spatial-graph-layout
+
+- **Name:** Spatial canvas layout for graph view
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** bounded feature
+- **Status:** horizon
+- **Objective:** Add the spatial DAG layout as a second layout choice inside graph mode, alongside the structured-list route.
+- **Why now / unlocks:** Graph view already ships as a structured-list peer route; spatial layout follows once relation density and graph interaction needs justify it.
+- **Acceptance:** Users can switch between structured-list and spatial canvas layouts without changing projection semantics or action contracts.
+- **Verification:** Manual graph-view walkthroughs at low/high edge density plus visual regression if available.
+- **Traceability:** Requirement 33; A69, A70; D128.
+- **Design docs:** graph-view sections in SPEC; future graph-view design notes if promoted.
+
+### graph-view-active-path-filter
+
+- **Name:** Graph view active-path render filter + scope toggle
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** bounded feature
+- **Status:** horizon
+- **Objective:** Render only active-path items by default in graph view, with a `Show all` toggle.
+- **Why now / unlocks:** Lower-priority graph legibility improvement after core graph semantics and projection surfaces stabilize.
+- **Acceptance:** Active-path filtering is default, user can inspect all items, and edge rendering remains honest under both scopes.
+- **Verification:** Graph-view fixtures for active-path/all toggles.
+- **Traceability:** D128 and graph-view requirements.
+- **Design docs:** none.
+
+### mcp-adapter
+
+- **Name:** MCP server adapter for core operations
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Expose a future MCP adapter over capability contracts, not direct ORM/route wrappers.
+- **Why now / unlocks:** Deferred until capability contracts stabilize through FE-705 and real agent/probe use.
+- **Acceptance:** MCP tools wrap Brunch-owned capability contracts and preserve resource identity, authority metadata, and mutation semantics.
+- **Verification:** Contract adapter tests and import-boundary tests.
+- **Traceability:** Requirements 42, 43; D143, D147.
+- **Design docs:** `docs/design/AGENT_MUTATION_SURFACE.md`.
+
+### file-based-persistence
+
+- **Name:** Git-friendly file-based persistence representation for diffable exported specs
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Explore a diffable file representation for exported/durable spec truth.
+- **Why now / unlocks:** Deferred until product ontology and changeset semantics are clearer.
+- **Acceptance:** File representation preserves intent graph meaning and review/export boundaries without becoming a second source of truth.
+- **Verification:** Round-trip and diff-fixture tests if promoted.
+- **Traceability:** Product direction from planning specs toward intent specs; D134, D135.
+- **Design docs:** future design needed if promoted.
+
+### typed-fixture-builder-convergence
+
+- **Name:** Typed fixture-builder convergence for happy-path tests
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** hardening
+- **Status:** horizon
+- **Objective:** Converge test fixtures around typed builders that represent current product semantics.
+- **Why now / unlocks:** Useful after semantic schema work stabilizes so tests do not fossilize obsolete ontology names.
+- **Acceptance:** Happy-path tests can create coherent specs/chats/turns/intent graph state through typed builders with minimal duplication.
+- **Verification:** Existing test suite, fixture API review, and migration of representative tests.
+- **Traceability:** I48, I109, I111, I112.
+- **Design docs:** none.
+
+### structured-development-spec-registry
+
+- **Name:** Structured development spec registry
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural / process
+- **Status:** horizon
+- **Objective:** Prototype file-backed canonical spec records, deterministic checks, generated markdown views, and task-local slices for Brunch's own development workflow.
+- **Why now / unlocks:** Self-tooling experiment, not product functionality. It would make `memory/SPEC.md` / `memory/PLAN.md` generated views over structured records to reduce drift and merge conflicts.
+- **Acceptance:** Generated views preserve current planning ergonomics while reducing merge churn and cross-reference drift.
+- **Verification:** Deterministic generation checks and branch-conflict dry runs.
+- **Traceability:** dev-layer trajectory only; not product-layer ontology.
+- **Design docs:** `docs/design/ln-skills/EVOLUTION.md`.
+
+### portability-boundaries
+
+- **Name:** Portability boundaries
+- **Linear:** unassigned in this plan snapshot
+- **Kind:** structural
+- **Status:** horizon
+- **Objective:** Split durable store/read-model, interview session runtime, and workspace capability provider if Brunch targets hosted, remote, embedded, or sandbox-backed operation.
+- **Why now / unlocks:** Future architecture boundary map for non-local deployments or adapter-backed execution. Deferred until hosted/remote/sandbox operation becomes a product goal.
+- **Acceptance:** Boundary map supports hosted/remote/sandbox decisions without prematurely abstracting the local-first product.
+- **Verification:** Architecture review and spike if product direction changes.
+- **Traceability:** portability assumptions in design docs; current local-first constraint in SPEC.
+- **Design docs:** `docs/design/PORTABILITY_BOUNDARIES.md`.

 ## Recently Completed

-- [2026-05-11] **Side-chat V3.1 — agent-grouped reconciliation resolution** (FE-674, PR #124 + downstack) — closes the V3.x arc end-to-end. Server: `POST /api/specifications/:id/reconciliation-needs/run-agent` (spec-level classifier loop) and `POST /api/specifications/:id/reconciliation-needs/:needId/reset-agent` (per-row Re-run) walk every awaiting open need through I114's `null → queued → classifying → classified | failed` lifecycle; agent_classification persists one of `auto-confirm` / `auto-edit` / `substantive`; agent_proposal carries an optional text suggestion. Client: `` renders six visual variants per row; `` in the Pending review header with conditional 1s polling while any need is in flight; per-row Re-run on classified/failed rows; per-class action buttons (`auto-confirm` → Confirm, `auto-edit` → View proposal + Apply + Skip, `substantive` → Open side-chat via `useSideChat().openFor`); bulk Confirm-all (N) and Apply-all-suggested (N) iterate serially over existing per-row endpoints. Listing endpoint extended with `target_item_kind` + `target_reference_code` to feed the Open-side-chat handoff. Verified: `npm run verify` 1178 / 1179 pass (one unrelated `side-chat-route` flake). **Watch**: A88 outer-loop walkthrough has not yet happened — empirical signal on whether agent grouping helps legibility vs V3.0's flat list remains open; capture qualitative notes during the next manual walkthrough on a dense spec.
-- [2026-05-11] FE-698 reconciliation context-pack slice — Added a proposal-only reconciliation prompt/context scenario that renders open reconciliation needs with source/target anchors, reason/status, prompt/context fingerprints, and read-only capability metadata. This is substrate-only: no FE-674 need lifecycle endpoint, overlay action, side-chat reducer, or durable mutation behavior. Verified: `npm run verify`. Watch: next FE-698 work can move to broader read-only/proposal-only probes and the Pi adapter spike without treating this pack as a resolution agent.
-- [2026-05-08] **Side-chat V3.0 — hard-impact cascade through `reconciliation_need`** (FE-674, PR #115 + #116 + #117) — three-card stack closes V3.0. Card 1 (PR #115): server `cascade-producer` + `getDownstreamEdges` + `openReconciliationNeedIfAbsent`; hard-impact apply mutates the source and opens one need per typed dependency edge; response shape adds `openedNeedIds`; partial-unique-index dedupe. Card 2 (PR #116): drop deferred banner; new `GET /api/specifications/:id/reconciliation-needs` endpoint and `useSpecificationOpenReconciliationNeeds` query; patch-list overlay renders a Pending review section listing open needs with kind chip and source/target references. Card 3 (PR #117): idempotent `POST /api/specifications/:id/reconciliation-needs/:needId/resolve` endpoint and per-row Resolve button; mutation pending state disables the button mid-flight. Verified: `npm run verify` (1063 tests, 0 lint warnings). Watch: A88 (Path 1 sufficiency without agent) is partially validated mechanically — full validation depends on outer-loop walkthrough on dense graphs. V3.1 (agent-grouped resolution) shipped 2026-05-11; richer per-row kinds beyond single Resolve are V3.1. SIDE_CHAT.md §9 updated to reflect the V3.0 single-action shape.
-- [2026-05-08] FE-674 planning sync — reconciled `docs/design/SIDE_CHAT.md` §5.3 / §8 / §9 / §13 against the downstack FE-697 substrate; SPEC.md adds A88 (Path 1 sufficiency without agent), D146 (cascade routes through `reconciliation_need`, `deferred: true` apply contract removed at V3.0 ship), I113 (apply opens at least one need per typed dependency edge), and rewrites Acceptance Criterion 7. Doc-only, no `src/` touched. PR #110 stacked on FE-704.
-- [2026-05-08] FE-698 prompt/context follow-up hardening — Candidate-spec prompt scenarios no longer advertise durable changeset submission, prompt scenario artifacts report schema version 2 for the fingerprinted shape, scenario definitions require typed context data, empty prompt assets are cached correctly, context-pack anchors use intent vocabulary, and `context-pack.ts` now remains the public entry point over private scenario-specific context-pack modules. Verified: `npm run verify`. Watch: this is still FE-698 continuation hardening; broader generative quality review and additional scenario probes remain later slices.
-- [2026-05-08] FE-698 prompt/context remediation + candidate scenario — Prompt scenario definitions are now discriminated by scenario kind, candidate-spec scenarios render deterministic no-provider proposal artifacts from typed context packs, scenario artifacts include prompt/context fingerprints, server prompt asset copying mirrors current source assets, prompt golden coverage protects production prompt text, and the build-boundary prompt test writes isolated output. Verified: `npm run verify`. Watch: full generative quality review for candidate-spec output remains a later execution/probe slice.
-- [2026-05-08] FE-698 scenario execution error hardening — Scenario execution failures now serialize safe deterministic summaries: API-key-like provider errors are redacted, non-Error rejections avoid object dumps, and ordinary errors remain reviewable. Verified: `npm run verify`.
-- [2026-05-08] FE-698 Anthropic scenario adapter — Added a probe-only Anthropic AI SDK adapter behind the existing `PromptScenarioModelAdapter` seam. Web-research prompt scenarios now map rendered prompts to AI SDK system content and rendered context packs to user prompt content under mocked tests, with unsupported providers rejected before model construction. Verified: `npm run verify`. Watch: this is not the shared AI runtime provider seam; OpenRouter/provider-neutral routing, credential UX, Pi, web tools, CLI/UI, persistence, and Brunch mutations remain out of scope.
-- [2026-05-08] FE-698 prompt scenario execution probe — Web-research prompt scenarios can now execute through an injected fakeable model adapter and serialize `succeeded` / `failed` execution results with raw output or deterministic error text, while no-provider artifacts remain deterministic `not-run` snapshots. Structured parsing is explicitly `not-applicable` for this prose-only web-research path. Verified: `npm run verify`. Watch: real provider adapters, Pi, web tools, CLI/UI, persistence, and mutating Brunch handlers remain out of scope for this foundation slice.
-- [2026-05-07] FE-698 prompt/context foundation slices — Packaged markdown prompt registry + observer and web-research context-pack foundations + scenario runner capture skeleton/composition + agent mutation-surface audit + capability registry metadata. Server interviewer, observer, side-chat, and web-research role prompts now load from markdown assets through a typed prompt registry; observer capture and web-research probes render typed scenario-specific context packs; seeded prompt scenarios compose production prompts with typed context-pack output into deterministic no-provider probe artifacts; and scenario artifacts can declare validated Brunch capability contracts. Review fixes moved observer prompt composition into a pure module and made prompt scenario prompt sources explicit. The agent mutation-surface audit inventories current and projected agent-originated write paths as input to later handler slices. Verified: `npm run verify` for code slices; audit verified by code-search/document consistency. This is a completed foundation within FE-698, not retirement of the whole FE-698 frontier; the live continuation remains in `Next`.
-- [2026-05-07] Side-chat V2 — Edit / Drill-down / Propose-edge plumbing (FE-673, PR #97) — added `edit`, `edge`, and `drill-down` patch kinds. Server `classifyEditImpact` returns `none | soft | hard`; soft applies directly with undo, hard returns `deferred: true` placeholder (removed at V3.0 ship). Client: patch-list reducer + three applier factories with real undo handlers. Verified: `npm run verify` (935 tests, 19 new). Watch: `SideChatPopover` Edit-mode reachability and cascade UX evolve with continuous workspace; V3.0 removed the hard-impact deferred banner.
-- [2026-05-06] Multi-chat substrate + reconciliation needs (FE-697) — `chat` table with one interview chat per spec, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, plus the `reconciliation_need` queue with directed source/target items, narrow `kind`/`status`, partial unique index on open rows, cascade FK. Spec creation inserts spec + interview chat in one transaction; `advanceHead` is transactional. No user-visible change. Verified: `npm run verify` (673 tests) plus manual fixture playback (39 specs / 81 turns / dual-pointer equivalence). A82 / A83 validated for Phase 1.
-- [2026-05-01] Side-chat V1.1 — Explore vertical slice. End-to-end graph-launched chat interaction shipped: prompt builder, POST `/side-chat` SSE endpoint, popover host, graph-view wiring, SSE consumer, and active-button activation. Follow-up refactor collapsed pending assistant text into the message list and extracted `SideChatHost` so activation is a tree-mount fact.
-- [2026-05-04] Graph view structured-list peer route — `/specification/$id/graph` now renders project-wide entities through the structured-list layout with relationship subsections, relation chips, empty state, row controls, and a back-to-chat affordance. Follow-up active-path filtering and spatial canvas remain horizon work. Verified: `npm run verify` in the FE-643 slice family.
-
+- [2026-05-11] `side-chat-v3-1-agent-grouped-reconciliation` — Done: FE-674 / PR #124 + downstack closed the V3.x arc end-to-end with spec-level classifier route, per-row reset route, agent classification lifecycle, chips, per-class actions, and bulk Confirm-all / Apply-all-suggested. Verified: `npm run verify` 1178 / 1179 pass with one unrelated `side-chat-route` flake. Watch: A88 outer-loop walkthrough on a dense spec remains open to assess legibility vs V3.0's flat list.
+- [2026-05-11] `fe-698-reconciliation-context-pack` — Done: added proposal-only reconciliation prompt/context scenario rendering open reconciliation needs with source/target anchors, reason/status, prompt/context fingerprints, and read-only capability metadata. Verified: `npm run verify`. Watch: next FE-698 work can broaden read-only/proposal-only probes and Pi adapter spike without treating this pack as a resolution agent.
+- [2026-05-08] `side-chat-v3-0-hard-impact-cascade` — Done: FE-674 / PR #115 + #116 + #117 shipped hard-impact cascade through `reconciliation_need`, Pending review listing, and idempotent resolve. Verified: `npm run verify` (1063 tests, 0 lint warnings). Watch: A88 mechanical grouping remains only partially validated until outer-loop walkthrough on dense graphs.

 Older history: `docs/archive/PLAN_HISTORY.md`

 ## Dependencies

 ```text
-TRACK A — Agent/semantic substrate
-multi-chat-substrate + reconciliation-needs (completed)
-  ├──→ prompt/context scenario substrate (completed)
-  │     ├──→ intent graph semantics + progressive checkability (next)
-  │     ├──→ generative prompt probes before UI (next)
-  │     │     ├──→ productized web research capability (horizon)
-  │     │     ├──→ productized candidate-spec completion assist (horizon)
-  │     │     └──→ post-spec oracle/decomposition frontier (probe/future product)
-  │     └──→ continuous-workspace (active, independent UI track but graph-context aware)
-  └──→ semantic-changeset ledger (horizon)
-        ├──→ relation-first observer enrichment (horizon, after ontology/policy probes)
-        └──→ architect-loop (horizon, proposal-only until changeset/reconciliation path)
-
-TRACK B — Graph/workspace surfaces
-graph-view-structured-list (completed)
-  ├──→ active-path-filter-and-scope-toggle (horizon, blocked on server data-layer)
-  ├──→ spatial-canvas-layout (horizon)
-  └──→ multi-chat-substrate + reconciliation-needs (completed)
-        ├──→ side-chat-V2-plumbing (completed, FE-673 PR #97)
-        │     └──→ side-chat-V3.0-cascade-through-reconciliation_need (completed, FE-674)
-        │           └──→ side-chat-V3.1-agent-grouped-resolution (completed, FE-674 PR #124)
-        │                 └──→ side-chat-persistence-V4a (next, FE-675 V4a half)
-        └──→ semantic-changeset ledger (horizon)
-              └──→ side-chat-V4b-item-versioning-+-branched-exploration (horizon, FE-675 V4b half)
-
-TRACK B — Infrastructure
-multi-chat-substrate (completed)
-  ├──→ semantic-changeset ledger (horizon)
-  └──→ continuous-workspace (next)
-
-
-
-UNBLOCKED HORIZON
-first-run provider setup (needs provider spike / scope)
-workspace hygiene gitignore assist (bounded, dashboard-surface candidate)
-intent-spec ontology + progressive checkability (needs probe)
-relation-first observer capture (first cut complete, needs enrichment proving)
-knowledge-edge semantics policy (discussion/design before observer expansion)
-web-research tools (gate ready, needs tool impl)
-dashboard metrics
-two-axis interview framing
-progressive detail / recursive deflation
-revisit / edit-mode (reshaped by reconciliation needs + changeset ledger)
-structured development spec registry (tooling experiment)
-portability boundaries (deferred until substrate goal exists)
+TRACK A — Workspace shell (parallel colleague lane)
+continuous-workspace
+  ├──→ stable host for side-chat-persistence-v4a
+  └──→ workspace-aware graph / structured-list peer routes
+
+TRACK B — Agent fixture substrate
+prompt/context scenario substrate foundation (completed)
+  └──→ agent-fixture-substrate
+        ├──→ generated completed-spec fixture candidates
+        ├──→ graph-review-scenario-options
+        └──→ Pi harness comparison (future, FE-635)
+
+TRACK C — Semantic substrate (highest coordination)
+multi-chat-substrate + reconciliation-needs (completed)
+  ├──→ intent-graph-semantics
+  │     ├──→ relation-first-observer-enrichment
+  │     ├──→ robust direct-edit / reconciliation cascade policy
+  │     └──→ graph-review-scenario-options becomes semantically meaningful
+  └──→ changeset-ledger
+        ├──→ canonical scenario bundle acceptance
+        ├──→ direct-edit atomicity with caused_by_changeset_id
+        ├──→ stale open proposal detection
+        └──→ architect-generator-loop / verifier/import mutation provenance
+
+TRACK D — Strategy probes and product acceleration
+agent-fixture-substrate + intent-graph-semantics
+  └──→ graph-review-scenario-options
+        └──→ productized-scenario-options
+              ├──→ absorbs / reshapes two-axis interview framing
+              └──→ absorbs / reshapes progressive detail / recursive deflation
+
+TRACK E — Low-conflict parallel work
+first-run-provider-setup
+workspace-gitignore-assist
+productized-web-research
+
+LOWER-PRIORITY / DEFERRED
+side-chat-persistence-v4a / side-chat-v4b-item-versioning
+spatial-graph-layout + graph-view-active-path-filter
+dashboard-summaries
+mcp-adapter / file-based-persistence / typed-fixture-builder-convergence
+structured-development-spec-registry
+portability-boundaries
 ```
-

diff --git a/memory/SPEC.md b/memory/SPEC.md
index 2931a640..34a38bce 100644
--- a/memory/SPEC.md
+++ b/memory/SPEC.md
@@ -11,7 +11,7 @@
 surfaces in `src/` schema and at runtime.
 The dev-workflow trajectory (the `ln-*` skill family, the proposed
 file-backed spec registry, and the long-horizon convergence between dev and
- product ontologies) lives in `docs/design/DEV_WORKFLOW_EVOLUTION.md`. -->
+ product ontologies) lives in `docs/design/ln-skills/EVOLUTION.md`. -->

 # Brunch v2 — Spec Elicitation Tool

@@ -90,6 +90,8 @@ Post-launch, Brunch should support specification work across two axes rather tha
 40. Prompt and context engineering are first-class server subsystems: prompts and reusable policy doctrines live as inspectable markdown assets, while typed context-pack builders derive scenario-specific intent-graph renderings for interviewer, observer, research, candidate synthesis, behavioral kernels, reconciliation, architect, and downstream decomposition probes.
 41. Agent-heavy future capabilities can be tested before product UI exists through a lightweight scenario substrate that runs prompt/context packs against seeded graphs or transcript fixtures, captures raw and structured outputs, and supports harness comparison. Scenario execution may use the existing Anthropic API key or fake adapters for probes, but first-run provider setup, credential storage, OpenRouter defaulting, and the shared production AI runtime seam belong to the provider setup frontier. Pi may be evaluated as a lower-level agent harness, especially for tool experiments and pre-UI probes, but Brunch product authority over durable workflow, replay, graph mutation, and reconciliation remains explicit.
 42. Agent-originated mutations of Brunch data use one typed server-owned mutation surface regardless of caller. Internal interviewer/observer flows, scenario probes, CLI/TUI harnesses, Pi or other harness adapters, and future external agents may not mutate durable Brunch state by calling the ORM directly; they must invoke stable mutation handlers with input/output schemas, authority metadata, replay policy, and reconciliation/changeset-ledger semantics. Read-only capability contracts may share the same registry shape, but the hard invariant is single-entry mutation authority.
+43. A local agent capability CLI can expose Brunch-owned capability contracts over long-lived JSONL stdin/stdout so an external probe runner or harness can drive the real specification flow without privileged ORM access. The CLI is an adapter over capability contracts, not a separate product API: calls carry explicit resource identifiers, read commands distinguish structured `get` / `list` data from agent-facing `read` projections with affordance hints, and mutating commands stay small and procedural around spec lifecycle requests, chat readiness, and turn response submission. The LLM-as-user scenario brief, model choice, fixture curation, and probe artifacts belong to an external probe runner that talks to the CLI like any other agent.
+44. Specifications can evolve through multiple chat-local strategies rather than one global interviewer mode. A chat's first frontier turn may offer or declare its strategy (`step_by_step`, `scenario_options`, `targeted_cases`, `graph_review`, `reconciliation`), and every active/resumable chat should have at most one open assistant/system-first frontier turn waiting for a user completion action. Proposal turns use normalized completion semantics (`accept`, `reject`, `revise`, `ask_followup`, `defer`, `regenerate`); only acceptance of a proposal turn may apply that proposal's semantic changeset. Mid-interview acceleration should branch into a side-chat / strategy chat that completes the current direction from context-packed graph truth, while graph-review critique remains the internal oracle for judging and repairing generated candidate bundles.

 ## Assumptions

@@ -135,7 +137,11 @@ Post-launch, Brunch should support specification work across two axes rather tha
 | A85 | A lightweight prompt scenario substrate will let Brunch validate LLM-heavy product directions faster than building UI first, if it captures rendered prompts, context packs, model settings, raw outputs, structured parses, and human review notes as repeatable artifacts. | medium | open | D139 | Run multi-scenario prompt probes for observer ontology, behavioral kernels, candidate-spec assist, and downstream oracle/decomposition before productizing their UI. |
 | A86 | Pi can serve as a useful pre-UI agent harness or tool-spike backend without forcing Brunch to adopt Pi as its production agent runtime, as long as integration remains adapter-shaped and Brunch-owned authority/replay/mutation semantics stay outside the harness. | low | open | D142 | Spike Pi SDK or RPC with in-memory sessions, custom tools, controlled prompts, and Brunch graph context packs; evaluate event capture, tool ergonomics, provider handling, packaging, and isolation. |
 | A87 | Verification-aware post-spec decomposition can be explored as agent scenarios before it is a Brunch product surface: intent graph truth plus progressive checkability can feed design alternatives, oracle strategy, execution slices, and orchestration constraints. | low | future | D141 | Prototype decomposition and oracle-design probes inspired by `ln-design` and `ln-oracles`; compare outputs for traceability to requirements, invariants, examples, criteria, and blind spots. |
-| A89 | Making Brunch CLI-addressable will be valuable only if the CLI is generated or validated from the same Brunch-owned agent capability contracts that power other adapters; a hand-written CLI surface is likely to drift on verbs, flags, schemas, introspection, async recovery, and mutation authority. | medium | future | D143, D147 | Before productizing a CLI, prototype a narrow generated/contract-checked command set and audit it against agent-native constraints: non-interactive execution, uniform JSON, bounded output, enumerated errors, idempotent/recoverable mutations, stable vocabulary, machine-readable introspection, and explicit profile/delivery/feedback seams. |
+| A89 | A long-lived local JSONL agent capability CLI can drive the real Brunch interview flow well enough for external LLM-as-user probes to produce credible completed specification fixtures, while keeping product resources explicit in every call and using ambient process state only for runtime plumbing such as DB handles, provider config, and in-flight generation bookkeeping. | medium | open | D143, D147, Requirement 43 | Prototype the minimal `brunch agent` JSONL loop over capability contracts, then run small LLM-as-user scenarios end-to-end through `chat.ensureReady`, `chat.read`, `turn.submitResponse`, `spec.requestPhaseClosure`, and `spec.requestExport`. Validate that probe logs are replayable, no probe code imports DB/product handlers directly, and no durable operation ledger is needed for the first readiness semantics. |
+| A90 | Users who ask to speed up a long interview will prefer a side-chat that generates 2–3 reviewed scenario options completing the current direction over continuing the primary drilldown, provided existing accepted graph truth is treated as fixed premise by default. | medium | open | D126, D148, D151, Requirement 44 | Probe scenario-options against drilldown fixtures and run manual flow review: do users understand the tradeoff profiles, preserve trust in prior answers, and return to the primary interview when generated options disappoint? |
+| A91 | Graph-review critique can make scenario-generated candidate bundles safe enough for product use without requiring perfect one-shot generation, if candidate readiness distinguishes `reviewed_clean`, `reviewed_with_issues`, and `blocked`, and if accepted-with-issues immediately opens durable follow-on review work. | medium | open | D151, D152, Requirement 44 | Run candidate bundle probes with graph-review scoring and human review; verify accepted-with-issues flows create a graph-review frontier or appropriate reconciliation needs rather than hiding defects. |
+| A92 | A conservative global staleness rule for open proposal turns — stale when `specification.latest_changeset_id` differs from `turn.opened_at_changeset_id` — is acceptable before neighborhood-level staleness calculation exists. | medium | open | D149, I116 | Exercise multi-chat proposal flows where another chat applies a changeset while a proposal remains open; check whether regeneration prompts feel safe rather than noisy. |
+| A93 | Relation-policy directionality lookup is safer than trying to force all useful intent-edge verbs into one dependency direction, because graph edges must serve display, context packs, export trace, reconciliation, critique, verification, candidate generation, and explanation. | medium | open | D137, D150 | In FE-700, define canonical/inverse sentences and source/target change behavior for each relation; test direct-edit and hard-impact cascade against mixed-direction relations. |

 ## Decisions

@@ -205,7 +211,12 @@ Post-launch, Brunch should support specification work across two axes rather tha
 144. **Intent graph vocabulary supersedes knowledge graph vocabulary** — Canonical product vocabulary is `intent graph`, made of `intent items` and `intent edges`. Current schema/code may still use `knowledge_item` and `knowledge_edge` as implementation names during transition, but new planning, agent capability contracts, context packs, operation ids, and user-facing design should prefer intent vocabulary unless referring to current persistence/API names. `Claim` may remain an explanatory generic for natural-language content, but it is not a product/schema noun. Depends on: D134, D136, D137. Supersedes: using `knowledge graph`, `knowledge item`, `knowledge edge`, or `claim` as future-facing product nouns.
 145. **Changeset/change supersedes patch/patch_change** — Semantic mutation history uses `changeset` for one submitted semantic mutation bundle and `change` for one atomic mutation inside it. `Patch` and `patch_change` remain historical design-doc vocabulary and may appear in older file names, but new schema, capability contracts, operation ids, and planning language should use `changeset` / `change` unless this decision is explicitly reversed. Depends on: D135, D138, D143. Supersedes: treating naming as open between patch and changeset.
 146. **Hard-impact edit cascade reads from the `reconciliation_need` queue, not from REVISIT walk state** — when a hard-impact `propose_edit` patch applies, the server enumerates `knowledge_edge` rows incident on the changed item under typed relation policy and opens one `reconciliation_need` row per affected pair (Path 1 from `docs/design/MULTI_CHAT.md` §5.1). The patch list overlay is the canonical resolution surface: open needs render as a `Pending review` section alongside staged patches, with per-row accept-on-target / edit-target / dismiss actions. The V2 `deferred: true` apply response and the "Hard impact — coming in V3 cascade preview" banner are removed at V3.0 ship. V3.0 groups needs mechanically by `kind` and relation type; agent-grouped resolution (auto-confirm / auto-edit / substantive) is V3.1 work and does not block V3.0. Side-chat thread persistence is not a V3.0 prerequisite — threads stay in-memory until MULTI_CHAT.md Phase 2. Depends on: A71, A83, A88, D80, D135, D137, D138. Supersedes: hard-edit deferral with a placeholder banner, the modal secondary-thread walk in `docs/archive/design/REVISIT_MODULE.md`, and the SIDE_CHAT.md V3 prose that pre-dated the multi-chat substrate.
-147. **A future Brunch CLI should be an agent-native adapter over capability contracts, not a separate product API** — CLI-addressability belongs as a transport/tool adapter generated from or mechanically checked against Brunch-owned agent capability contracts. The CLI should inherit stable operation ids, input/output schemas, authority classes, replay policy, and reconciliation/changeset semantics from the contract registry, then project them through conventional agent-friendly CLI vocabulary (`get`, `list`, `create`, `update`, `delete`, `--json`, `--force`, `--limit`, `--dry-run`, `--wait`) with CI checks for banned aliases. A CLI adapter should default to non-interactive execution, send data to stdout and diagnostics to stderr, bound list/log output, enumerate valid values in validation errors, expose structured introspection (`brunch agent-context`) plus workflow guidance, and treat async/submitting operations as recoverable through `--wait` and a local job ledger.
Durable Brunch writes still enter only through D143 mutation handlers; CLI profiles, delivery sinks, and feedback commands are adapter conveniences, not alternate state authority. Depends on: D139, D140, D143, A89. Supersedes: treating a future CLI as hand-written wrappers around routes or direct ORM scripts. +147. **The local agent CLI is a long-lived JSONL adapter over Brunch capability contracts** — CLI-addressability should first ship as a `brunch agent`-style local process that speaks request/response JSONL over stdin/stdout, dispatches Brunch-owned capability contracts, and keeps all product resources explicit in input payloads. The adapter may hold ambient runtime plumbing such as a DB connection, provider config, and in-flight interviewer / observer generation bookkeeping, but it must not hold hidden selected spec/chat/turn handles as command semantics. Read capabilities use `list` / `get` for structured read-model data and `read` for agent-facing projections with allowed response shapes and next-command hints. Mutations stay capability-first and surface-lazy: add only contracts needed by real probe/tool use, with an initial surface around `spec.create`, `spec.getStatus`, `spec.requestPhaseClosure(specId, phaseId?)`, `spec.requestExport`, `chat.getPrimary`, `chat.ensureReady(chatId?, timeoutMs?)`, `chat.read`, `turn.get`, and `turn.submitResponse(chatId, turnId?, response)`. `chat.ensureReady` is the idempotent synchronization/recovery command: it may trigger continuation when a chat lacks a usable next turn, can block up to a bounded timeout, and returns a derived state such as `generating`, `awaiting_response`, `idle_no_frontier`, `closed`, or `error` without requiring a durable runtime-operations ledger in the first cut. 
The LLM-as-user probe runner is a separate client of this JSONL adapter, not part of the capability server and not allowed to import DB/product handlers directly; its scenario briefs, model policy, generated transcripts, and curated golden fixture bundles are probe artifacts, not Brunch authority. Suggested module boundary: `src/server/capabilities.ts` plus `src/server/capabilities/` own contracts, schemas, handlers, and dispatch; `src/server/agent-jsonl.ts` plus `src/server/agent-jsonl/` own only protocol/session/transport; `scripts/agent-probes/` owns the outer LLM-as-user loop and artifact writing as development harness code through a JSONL client. Depends on: Requirement 43, A89, D139, D140, D143. Supersedes: treating the CLI as hand-written route wrappers, direct ORM scripts, a one-shot TUI with hidden ambient selection, or a probe runner that bypasses the same mutation surface future agents must use. +148. **Spec evolution strategies are chat-local, turn-mediated process state** — strategy is not specification-level semantic truth. A chat may be established through a first assistant/system frontier turn that offers or declares a strategy such as `step_by_step`, `scenario_options`, `targeted_cases`, `graph_review`, or `reconciliation`; globally triggered flows may create/reuse a pre-strategized chat whose first turn is the procedure kickoff. A chat can technically change strategy through later turns, but explicit switch UX is deferred. Tactical sub-strategies are allowed inside a chat, but broad mid-interview acceleration should branch into a side-chat/strategy chat rather than mutate the primary interview chat in place. Depends on: Requirement 39, Requirement 44, D138. Supersedes: treating the interviewer as one global mode per specification. +149. 
**Changesets are the atomic semantic mutation boundary, while proposal turns are not mutations until accepted** — a graph-review finding, candidate bundle, or reconciliation suggestion is the assistant/system half of an open frontier turn until the user responds. Only `accept` applies a proposal turn's semantic changeset; `revise`, `ask_followup`, `regenerate`, `defer`, and ordinary `reject` produce successor/process state rather than direct graph mutation. A changeset is the smallest semantic mutation unit that preserves coherence, and may record direct edits, candidate acceptance, reconciliation resolutions, opened reconciliation needs, or future verifier/import results. Turns should stamp the latest applied changeset id at creation (`opened_at_changeset_id` / `base_changeset_id`) so open proposals can be conservatively marked stale when the specification advances. Depends on: A71, A79, A92, D135, D145, Requirement 44. Supersedes: treating agent proposals or review findings as durable semantic truth before user/action acceptance. +150. **Relation policy owns operational directionality for intent edges** — relation names should be semantically clear, but code must not infer cascade or reconciliation behavior from raw edge source/target direction. Each relation kind declares a canonical sentence, inverse display sentence, operational-axis participation, and source-change / target-change behavior. Direct edit and hard-impact cascade enumerate incident accepted edges, then ask relation policy which endpoint, if any, receives a `reconciliation_need`. FE-700 may break current `depends_on` / `derived_from` / `constrains` / `verifies` records while expanding the ontology, but should not force every useful edge verb into one dependency direction at the expense of display, prompt context, export trace, critique, verification, candidate generation, or explanation. Depends on: A81, A88, A93, D137, D146. 
Supersedes: assuming outgoing edges from the changed item are the cascade direction. +151. **Scenario-options acceleration is product-facing, but graph review is its safety oracle** — the first user-visible alternative to long drilldown should likely be a first-turn strategy choice or mid-interview `speed this up` side-chat that generates 2–3 candidate bundles completing the current direction from context-packed accepted graph truth. Candidate bundles present named tradeoff profiles and are accepted as coherent units, not item-by-item pick lists. Fast gates (parse/schema/fixed-premise/no-obvious-conflict/tradeoff summary) can run before display; deeper graph review, coverage, checkability, provenance, and repair/refinement can run asynchronously. Depends on: A67, A84, A85, A90, A91, D126, D139, D140, D148, Requirement 31, Requirement 44. Supersedes: treating candidate-spec assist as a skip/force-close helper or as one-shot generation that can be committed without critique. +152. **Graph review and reconciliation are separate graph operations** — reconciliation is repair-oriented process debt from a known disturbance (`reconciliation_need`), while graph review is quality-oriented critique over any graph for weakness, genericity, low support, missing coverage, weak checkability, poor provenance, or maturity gaps. Broader review findings start as turn-owned structured artifacts; `reconciliation_need` remains the only first-class problem table until review issues require independent querying, filtering, badges, assignment, or lifecycle. Candidates may be accepted with represented issues if accepting also opens a graph-review frontier or appropriate process-debt records. Depends on: A91, D137, D149, D151, Requirement 44. Supersedes: overloading reconciliation as the umbrella for all graph intelligence or blocking useful imperfect specs until every review issue is repaired. 
## Interaction Stream Model @@ -373,6 +384,11 @@ Each row in this table is a **formalization candidate** ascending the progressiv | I112 | Prompt/context scenarios render from packaged markdown prompts and typed context-pack builders rather than scattered inline prompt strings; probe artifacts include deterministic rendered prompt/context fingerprints, prompt asset packaging mirrors current source assets at build time, and production prompt text has reviewable golden coverage without requiring product UI. | `prompt-loader.test.ts`, `prompt-build-boundary.test.ts`, `prompt-golden.test.ts`, `context-pack.test.ts`, `scenario-runner.test.ts` | Requirements 40, 41; D139, D140 | | I113 | Hard-impact `propose_edit` apply opens at least one `reconciliation_need` per existing typed dependency edge incident on the changed knowledge item (relations: `depends_on`, `derived_from`, `constrains`, `refines`, `verifies`), records `caused_by_turn_id` provenance, deduplicates against the partial unique index, and never returns `deferred: true` from the apply contract; resolutions transition `open → resolved` idempotently. | planned: `edit-applier.test.ts`, `reconciliation-need.test.ts`, `patch-list-overlay.test.tsx`, `app.test.ts` | Acceptance Criterion 7; A88; D135, D137, D138, D146 | | I114 | The reconciliation classifier (V3.1 `run-agent` route + `classifyNeed`) walks every awaiting open `reconciliation_need` row through the lifecycle `null → queued → classifying → classified \| failed`, persists exactly one of `auto-confirm` / `auto-edit` / `substantive` into `agent_classification` on `classified`, and writes the parser error or thrown message into `agent_proposal` on `failed`; `agent_proposal` is text-only and is never auto-applied by the server (resolution actions remain user-initiated per slice 6), so an invalid label or hallucinated proposal stays recoverable via per-row Re-run. 
| `reconciliation-agent.test.ts`, `reconciliation-agent-route.test.ts`, `reconciliation-need.test.ts`, `reconciliation-needs-route.test.ts` | Requirement 10; A88; D139 | +| I115 | The agent capability CLI remains an adapter over Brunch capability contracts: JSONL calls validate explicit resource ids and schemas, mutating calls dispatch through server-owned capability handlers rather than ORM/route bypasses, `read` projections provide affordance hints without importing scenario briefs, and the probe runner exercises the surface only through a JSONL client. | planned: `capabilities/*.test.ts`, `agent-jsonl.test.ts`, `probe-runner.test.ts` | Requirements 42, 43; A89; D143, D147 | +| I116 | Each active/resumable chat has at most one open assistant/system-first frontier turn; user responses complete that turn through normalized proposal/response semantics, and strategy is chat-local process state rather than specification-level semantic truth. | planned: `chat-substrate.test.ts`, `turn-response-transition.test.ts`, `capabilities.test.ts` | Requirement 44; D138, D148 | +| I117 | Open proposal turns are stamped with the latest applied changeset id at creation and are conservatively stale when the specification's latest changeset advances before completion; stale proposals refresh/regenerate rather than applying against unknown graph state. | planned: `changeset.test.ts`, `turn-response-transition.test.ts`, `app.test.ts` | A92; D149 | +| I118 | Reconciliation/direct-edit cascade never infers affected endpoints from raw edge direction alone; it consults relation policy source-change / target-change behavior over incident accepted edges. 
| planned: `knowledge-relationship-policy.test.ts`, `edit-impact.test.ts`, `reconciliation-need.test.ts` | A93; D137, D146, D150 | +| I119 | Scenario-option candidate bundles can only become canonical by accepting a coherent bundle changeset; accepted-with-issues candidates must also create durable follow-on review/process debt so known weaknesses are not hidden. | planned: `scenario-runner.test.ts`, `turn-artifacts.test.ts`, `changeset.test.ts` | A90, A91; D151, D152 | ## Lexicon @@ -388,6 +404,9 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **scenario runner** | A lightweight pre-UI harness that runs a selected prompt scenario against fixtures, context packs, tools, and model settings, then records outputs for qualitative and structural review. Execution adapters translate this harness input into a concrete fake/model/harness call; they do not define Brunch semantics, credential UX, provider resolution, or mutation authority. | | **agent mutation surface** | The Brunch-owned typed handler layer for any durable data mutation initiated by an agent, internal or external. It is the only write entry point agents may use; handlers own schemas, authority, replay behavior, and reconciliation/changeset-ledger semantics rather than letting agents call the ORM directly. | | **agent capability contract** | A Brunch-owned typed contract addressable by agents or harnesses, with a stable id, description, input/output schemas, authority class, and replay policy. Read-only capabilities and mutating handlers can share this registry shape, but mutating contracts must route through the agent mutation surface. | +| **agent capability CLI** | A local machine-facing CLI adapter, initially a long-lived JSONL stdin/stdout process, that exposes Brunch-owned capability contracts to external agents and probe runners without defining its own product API or mutation authority. 
| +| **JSONL capability session** | The request/response transport between an external harness and `brunch agent`: every call includes an id, capability id, and explicit input resource identifiers; the process may keep DB/provider/in-flight runtime handles internally, but selected spec/chat/turn targets are not hidden ambient state. | +| **probe runner** | An external client of the agent capability CLI that supplies scenario briefs, calls an LLM-as-user, drives Brunch through capability calls, and writes generated transcript/spec/export/graph artifacts for human curation. It must not import Brunch DB or product handlers directly. | | **tool adapter** | A provider- or harness-specific projection of an agent capability contract into a concrete tool format such as AI SDK tools, Pi tools, CLI/TUI commands, or a future external-agent API. Adapters translate shape and transport while preserving Brunch-owned authority semantics. | | **authority class** | The contract metadata that says whether an agent capability is read-only, proposal-only, or commits durable product truth, and therefore which replay, reconciliation, and mutation boundaries govern it. | | **AI runtime provider** | The shared server seam that resolves the configured LLM provider, model names, API-key source, and provider-specific options for interviewer and observer calls. | @@ -405,6 +424,7 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **projected control card** | A workflow affordance derived from durable state rather than authored conversational content. Kickoff, recovery, and proceed / handoff controls live here. | | **kickoff card** | A projected phase-entry control card that appears whenever an open phase is in entry-pending state and requires an explicit user action before substantive interviewer progression begins. 
| | **frontier turn** | The single actionable durable conversational turn currently at the bottom of an open phase when the phase is in substantive elicitation rather than structural control. | +| **proposal turn** | An assistant/system-first frontier turn that offers a candidate bundle, graph-review finding, reconciliation suggestion, or other proposed action. It is not a semantic mutation until the user completes it, usually by accepting, revising, asking follow-up, deferring, regenerating, or rejecting. | | **preface card** | A turn-internal artifact that presents provisional context from interviewer-invoked context gathering, rendered above a paired question card within the same turn. The observer captures from the whole turn (preface context + question + user response) as one validated unit rather than from the preface card alone. Available in any phase when the workspace directory is present. Implementation: `preface` / `PrefaceCard` / `present_preface` tool / `data-preface` part. Renders as a simple `bg-tint` rounded box with italic subdued text, not as a DrawerCard. | | **question card** | A turn card that asks a structured interviewer question and expects a substantive user response. | | **review turn** | A full-set requirements or criteria review interaction that offers a synthesized candidate list with stable reference codes, supports per-item commenting (inline comment toggle on each item) plus one optional global review note, and persists its own `reviewActions` / `reviewSet` metadata on the turn. On `request changes`, the successor review turn carries a revision card above the new review set. | @@ -421,6 +441,7 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **recognition-first elicitation** | The strategy of helping users converge by reacting to concrete possibilities, tradeoffs, examples, and ruled-out directions rather than requiring them to author intent from scratch. 
| | **candidate direction** | An agent-synthesized possible specification direction offered when the user asks Brunch to fill in the rest, compare options, or react to proposed typologies. It includes rationale, implications, tradeoffs, likely generated knowledge, and what it rules out. | | **candidate-spec set** | A turn-owned interviewer artifact in grounding or design that presents one or more candidate directions for reaction-driven refinement. It is analogous to a review set in being a persisted artifact on the turn, but it proposes possible directions rather than reviewing a synthesized inventory. | +| **candidate graph bundle** | The coherent commit/review unit produced by scenario-options flows: a named scenario with tradeoff profile, generated intent items and edges, required core items, optional/swappable items, known risks, graph-review findings, provenance labels, and commit preconditions. It should be accepted or revised as a bundle rather than item-by-item unless semantic closure can be proved. | | **candidate-spec reaction** | The structured user response to a candidate-spec set, choosing whether to accept a direction, request refinement of one candidate, or regenerate a fresh set. It steers the next interview move without directly closing the phase. | | **breadth skeleton** | A turn-owned interviewer artifact used during a progressive detail pass that summarizes the current broad-pass map, highlights areas that remain shallow, and offers explicit deepening targets. | | **detail focus** | The selected area or lens for the next recursive follow-up pass. It scopes the next same-phase frontier turn without becoming a separate workflow state or durable topic tree. | @@ -469,7 +490,8 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **disambiguating example** | An `example` whose primary purpose is to settle ambiguity between plausible interpretations of a requirement, invariant, or decision. 
Linked through the `disambiguates` relation. Generalizes the TiCoder move beyond test cases: the interviewer generates cases where interpretations diverge, and the user's classification settles the meaning. | | **spec drift** | A divergence between an intent item's recorded meaning and the artifact (criterion, generated requirement, candidate spec, export bundle, or downstream implementation behavior) meant to satisfy it. Surfaced in human terms — "original intent vs generated behavior vs potential mismatch" — so the user can validate meaning at the point where it could have changed, rather than after the divergence has been laundered into a final document. | | **relation family** | One of five semantic groupings that organize the relation kinds in the intent graph: `justification`, `dependency`, `boundary`, `refinement`, and `verification`. Distinct from the relation `kind` itself; a single kind belongs to exactly one family. Drives prompt grouping, default policy, and observer classification heuristics. | -| **relation policy** | The per-relation, per-axis registry that decides whether each edge participates in `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, or `weak_suggestion` capabilities. Replaces the implicit assumption that every edge is equally authoritative. Gated by edge `support` (`explicit` / `strong_inference` / `weak_candidate`) and `status` (`proposed` / `accepted` / `rejected` / `stale`). | +| **relation policy** | The per-relation, per-axis registry that decides whether each edge participates in `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, or `weak_suggestion` capabilities. Replaces the implicit assumption that every edge is equally authoritative. Gated by edge `support` (`explicit` / `strong_inference` / `weak_candidate`) and `status` (`proposed` / `accepted` / `rejected` / `stale`). 
It also owns operational directionality: source-change and target-change behavior must be explicit rather than inferred from raw edge direction. | +| **graph-review finding** | A turn-owned structured artifact produced by graph review. It may later lead to a changeset if accepted, but it is not itself semantic truth or process debt unless represented through a follow-on turn, changeset, or reconciliation need. | | **structured list** | The first-ship graph-view layout: kind-grouped item rows with a relations footer of Outgoing / Incoming relation chips. Item-first; relationships visible inline. It currently renders the whole-spec entity set because D129 ships the whole-spec fetch first; the intended default becomes active-path items over whole-spec data once the active-path membership seam and `Show all` toggle land. | | **spatial canvas** | A deferred future graph-view layout where intent items render as nodes with visible edges in a 2D scene. Shares the projection seam and intent contract of D128 with the structured-list layout. | | **relation chip** | A compact UI element representing one intent-edge endpoint inside a relations footer, carrying the target item's reference code and content snippet. Hover reveals a preview card; click navigates to the target item via hash anchor. | @@ -478,7 +500,7 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **secondary thread** | Modal revisit conversation anchored to a primary-path turn and used to resolve cascade implications. | | **needs-revisit** | Flag meaning an item is affected by upstream invalidation and must be explicitly resolved before the specification is whole again. | | **chat** *(planned persistence seam)* | A conversation container inside one specification. The primary interview, side-chats, reconciliation chats, verifier feedback, and review discussions may all own turns without owning semantic truth directly. 
Phase one adds the table and transitional pointers before making chat ownership canonical. | -| **changeset** *(future persistence seam)* | Canonical term for one submitted semantic mutation bundle against the intent graph. It records what changed and why, separate from the conversational turn that may have initiated it. Supersedes `patch` as the future-facing schema/contract noun. | +| **changeset** *(future persistence seam)* | Canonical term for one submitted semantic mutation bundle against the intent graph. It records what changed and why, separate from the conversational turn that may have initiated it. A changeset is the smallest atomic unit that preserves graph coherence; proposals/findings become changesets only when accepted or otherwise acted on. Supersedes `patch` as the future-facing schema/contract noun. | | **change** *(future persistence seam)* | Canonical term for one atomic semantic mutation inside a changeset, such as `intentItem.create`, `intentItem.updateContent`, `intentEdge.create`, or `intentEdge.delete`. Supersedes `patch_change`. | | **patch / patch_change** | Historical design-doc vocabulary for changeset/change. Avoid in new schema, capability contracts, and operation ids unless referring to older docs or source-control-style analogy. | | **reconciliation need** *(planned persistence seam)* | Durable semantic debt saying existing intent-graph truth may require renewed judgment because an upstream item, relation, verifier, contradiction, or historical premise changed. Phase one stores directed item-to-item needs with narrow kind/status and provenance placeholders; later phases may add relation targets and changeset-backed cause/resolution. It is process state, not an intent edge or intent content. 
| diff --git a/memory/SPEC_RESTRUCTURE.md b/memory/SPEC_RESTRUCTURE.md new file mode 100644 index 00000000..8c159187 --- /dev/null +++ b/memory/SPEC_RESTRUCTURE.md @@ -0,0 +1,175 @@ +# SPEC Restructure Plan + +> Status: proposed one-off workflow doc. +> Created: 2026-05-13. +> Purpose: capture the intended cleanup for `memory/SPEC.md` before splitting this work into a separate branch / PR. Delete this file after the restructure is completed or explicitly abandoned. + +## Goal + +Make `memory/SPEC.md` lighter, more structurally resistant to branch conflicts, and clearer about what belongs in the live architecture register versus historical/product-embedded truth. + +The cleanup should preserve durable product/architecture authority while retiring rows that are already fully embedded in code, tests, or design docs. + +## Diagnosis + +`memory/SPEC.md` now mixes several kinds of truth in one long mutable document: + +1. **Stable product contract** — concept, non-goals, durable product requirements. +2. **Live uncertainty** — assumptions still awaiting validation or still shaping frontier work. +3. **Current architectural guardrails** — decisions and invariants that actively constrain near-term work. +4. **Historical embedded decisions** — shipped seams whose rationale is now code/test/design-doc truth. +5. **Future direction** — semantic/generative/agent/provider trajectories not yet productized. +6. **Verification policy and coverage** — useful, but partly over-detailed as implementation/test history. + +This creates churn because ordinary feature work edits the same numbered tables/sections, and because sequential IDs (`Requirement N`, `A##`, `D###`, `I###`) are collision-prone across branches. 
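To make the ID-collision hazard concrete: two branches that each append "the next" assumption number merge into a table with a duplicated ID. A toy guard could catch this; the function name and check script are hypothetical (only the `| A## |` row shape is taken from the current table layout — nothing below exists in the repo):

```typescript
// Toy sketch: flag assumption IDs that appear as the leading cell of more than
// one table row -- the collision that sequential per-branch numbering invites.
function duplicateAssumptionIds(spec: string): string[] {
  const counts = new Map<string, number>();
  for (const line of spec.split("\n")) {
    const m = /^\|\s*(A\d+)\s*\|/.exec(line);
    if (m) counts.set(m[1], (counts.get(m[1]) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([id]) => id);
}

// Two branches both took "the next number" before merging.
const merged = [
  "| A88 | row from main |",
  "| A89 | row from branch one |",
  "| A89 | row from branch two |",
].join("\n");
console.log(duplicateAssumptionIds(merged)); // returns ["A89"]
```

Grouping rows by subsystem (open question 2 below) shrinks the window for this, but a mechanical check catches whatever numbering scheme survives.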
+ +## Desired document shape + +Target structure, to be refined during the cleanup: + +```md +# Brunch v2 — Spec Elicitation Tool + +## Product Contract +### Concept +### Constraints & Non-goals +### Capability Requirements +#### Runtime & persistence +#### Interview workflow +#### Knowledge / intent graph +#### Review & export +#### Workspace / graph UI +#### Provider / agent substrate + +## Live Architecture Register +### Open Assumptions +### Active Decisions +### Critical Invariants + +## Future Direction Register +### Semantic / generative substrate +### Agent capability substrate +### Provider / workspace hardening + +## Interaction Stream Model +[keep if still actively useful, but compress or move details to design docs] + +## Layout Architecture +[compress; move design-level detail out if it is no longer needed as SPEC authority] + +## Lexicon + +## Verification Design +``` + +Principles: + +- Separate **stable product contract** from **live architecture register** from **future direction**. +- Keep `SPEC.md` as the authority for active constraints, not as the full archive of how each seam was built. +- Prefer short guardrails plus links to design docs over long design-doc-scale paragraphs. +- Do not renumber surviving tracked IDs unless the cleanup explicitly adopts a new ID scheme. +- Leave concise retirement comments for removed ID ranges when useful. 
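As one way to honor the last principle, a retirement note can ride along as an HTML comment near the table it prunes. The IDs and wording below are hypothetical, not a mandated format (A51–A55 and I104/I105 are borrowed from the candidate lists in this plan purely for illustration):

```md
<!-- Retired 2026-05: A51–A55 (workspace projection assumptions, now product
     facts covered by I104/I105); rationale preserved in git history. -->
```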
+ +## Assessment pass + +Classify each tracked row before editing: + +| Classification | Meaning | Action | +| --- | --- | --- | +| keep live | Still unresolved or actively constrains near-term work | Keep, possibly tighten wording | +| compress / merge | Overlaps another row or carries too much rationale | Merge into one active guardrail | +| retire embedded | Fully shipped and now protected by code/tests/design docs | Remove from live table; optionally note retired IDs in an HTML comment | +| move rationale | Valuable context but too detailed for SPEC | Keep a short SPEC guardrail and point to design doc | +| future direction | Not current product contract but shapes frontier work | Move under Future Direction Register or ensure PLAN owns it | + +### Assumptions to inspect first + +Strong candidates: + +- `A82`, `A83` — already validated; likely retire from live assumptions unless still needed as FE-701 constraints. + +Possible embedded/product-fact candidates: + +- `A51`, `A53`, `A54`, `A55` — workspace turn-card / activity / frontier projection assumptions may now be product facts or invariants. +- `A59`, `A60`, `A63` — prompt/question/header assumptions may be embedded or lower-priority watch items. +- `A64` — query invalidation may have become a concrete architectural decision/invariant if already built. +- `A66`–`A70` — graph/relation assumptions should be checked against shipped graph view and FE-700 direction. +- `A71`–`A73`, `A77`–`A81`, `A84`–`A91`, `A93` — likely still live future/semantic/generative assumptions; may move to Future Direction Register. + +### Decisions to inspect first + +Potential merge/compression clusters: + +- Runtime / stream / workflow cluster: + - `D22`, `D89`, `D93`, `D94`, `D95`, `D96`, `D110`, `D112`, `D113`, `D116`, `D121`, `D123`, `D114` + - Goal: compress overlapping turn-centered stream, projected controls, lifecycle, observer backlog, route/query ownership, and continuous workspace guardrails. 
+ +- Graph / side-chat / semantic mutation cluster: + - `D80`, `D125`, `D134`, `D135`, `D136`, `D137`, `D138`, `D144`, `D145`, `D146`, `D149`, `D150`, `D152` + - Goal: keep current semantic direction and active changeset/reconciliation guardrails; retire or compress older side-chat/revisit wording superseded by multi-chat + reconciliation docs. + +- Prompt/context / agent capability cluster: + - `D139`, `D140`, `D141`, `D142`, `D143`, `D147` + - Goal: keep concise active guardrails for prompt/context substrate and Brunch-owned mutation surface; move implementation boundary detail to design docs where possible. + +- Candidate/scenario strategy cluster: + - `D126`, `D127`, `D148`, `D151` + - Goal: separate current product contract from future strategy/proposal direction. + +- Provider/workspace hardening cluster: + - `D130`, `D131`, `D132`, `D133` + - Goal: likely keep as active near-term frontier constraints; wording can be shorter. + +### Invariants to inspect first + +Keep only critical seam-level invariants live. + +Candidates to compress or retire: + +- Rows that primarily enumerate test filenames or implementation history rather than a reusable invariant. +- Older invariants whose protected behavior is fully covered by a broader newer invariant. +- Planned invariants for not-yet-built future work should be checked against `memory/PLAN.md`; if they only describe future acceptance criteria, PLAN may be the better home until implemented. + +Likely keep live: + +- Distribution/runtime startup invariants (`I4`, `I100`). +- Boundary/schema invariants (`I17`, `I48`, `I54`). +- Workflow/turn/lifecycle invariants (`I24`, `I72`, `I87`, `I104`, `I105`, `I108`, `I110`). +- Current frontier invariants for provider/gitignore/agent/changing semantic substrate (`I106` onward), if they still correspond to active PLAN frontier items. + +## Rewrite pass + +1. Create a branch specifically for SPEC restructuring. +2. 
Read `memory/SPEC.md`, `memory/PLAN.md`, and current design docs named by SPEC rows. +3. Classify rows using the assessment table above. +4. Rewrite `SPEC.md` into the target structure. +5. Preserve cross-reference integrity: + - `PLAN.md` frontier definitions still point at surviving SPEC requirements/assumptions/decisions/invariants. + - Retired IDs are not referenced by live PLAN frontier definitions unless intentionally historical. + - Design docs carry detailed rationale that SPEC no longer repeats. +6. Run link/reference checks if available, then `npm run fix` and `npm run verify` before PR. + +## Output expectations + +The completed PR should include: + +- `memory/SPEC.md` rewritten / pruned. +- Any necessary small updates to `memory/PLAN.md` traceability references caused by retired/merged SPEC rows. +- Optional updates to `ln-spec` / `ln-sync` instructions **only if** the restructure changes the intended SPEC shape. +- Deletion of this `memory/SPEC_RESTRUCTURE.md` file once its plan has been executed or superseded. + +## Non-goals + +- Do not change product behavior. +- Do not add new requirements just because there is a new section for them. +- Do not migrate to a structured generated spec registry in this pass; that remains `structured-development-spec-registry` horizon work. +- Do not rewrite design docs unless a SPEC row is moved there and the target doc needs a small anchor. +- Do not renumber surviving IDs casually. + +## Open design questions for the restructure branch + +1. Should requirements remain a single numbered sequence, or should they become grouped stable IDs by capability area? +2. Should assumptions/decisions/invariants stay as global tables/lists, or be grouped by subsystem to reduce edit conflicts? +3. Should validated assumptions be removed immediately, or retained for one release window with a retirement note? +4. 
How much of Interaction Stream Model and Layout Architecture still belongs in SPEC versus `docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md` and related design docs? +5. Should future direction rows live in SPEC at all, or should SPEC only link to PLAN frontier definitions and design docs for unbuilt future work? diff --git a/package.json b/package.json index 6ef49603..27d0aed0 100644 --- a/package.json +++ b/package.json @@ -37,12 +37,12 @@ "check": "npm run fmt:check && npm run lint", "dev": "agent-tail run 'vite: vite' 'api: npx tsx --watch src/server/index.ts'", "fix": "npm run lint:fix && npm run fmt", - "fmt": "oxfmt src/ config/ .ladle/ vite.config.ts drizzle.config.ts", - "fmt:check": "oxfmt --check src/ config/ .ladle/ vite.config.ts drizzle.config.ts", + "fmt": "oxfmt src/ scripts/ config/ .ladle/ vite.config.ts drizzle.config.ts", + "fmt:check": "oxfmt --check src/ scripts/ config/ .ladle/ vite.config.ts drizzle.config.ts", "ladle": "ladle serve", "ladle:build": "ladle build", - "lint": "oxlint --type-aware --type-check src/ config/ .ladle/ vite.config.ts drizzle.config.ts", - "lint:fix": "oxlint --type-aware --type-check --fix src/ config/ .ladle/ vite.config.ts drizzle.config.ts", + "lint": "oxlint --type-aware --type-check src/ scripts/ config/ .ladle/ vite.config.ts drizzle.config.ts", + "lint:fix": "oxlint --type-aware --type-check --fix src/ scripts/ config/ .ladle/ vite.config.ts drizzle.config.ts", "release": "release-it", "seed": "npx tsx src/server/fixtures/seed.ts", "server": "npx tsx src/server/index.ts", diff --git a/scripts/agent-probes/fixture-candidate.test.ts b/scripts/agent-probes/fixture-candidate.test.ts new file mode 100644 index 00000000..0f8073e9 --- /dev/null +++ b/scripts/agent-probes/fixture-candidate.test.ts @@ -0,0 +1,187 @@ +import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { afterEach, describe, expect, it } 
from 'vitest'; + +import { inspectFixtureCandidate } from './fixture-candidate.js'; + +describe('fixture candidate checkpoint', () => { + const tempDirs: string[] = []; + + afterEach(() => { + for (const dir of tempDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }); + } + }); + + function makeTempDir(prefix: string): string { + const dir = mkdtempSync(join(tmpdir(), prefix)); + tempDirs.push(dir); + return dir; + } + + it('reports a complete artifact directory as parse-ready and structure-ready with normalization debt', () => { + const dir = makeTempDir('brunch-fixture-complete-'); + writeCandidate(dir, { includeWorkspaceState: true }); + + const report = inspectFixtureCandidate(dir, { expectWorkspaceState: true }); + + expect(report.parseReady).toBe(true); + expect(report.structureReady).toBe(true); + expect(report.files).toMatchObject({ + 'artifact-bundle.json': { present: true, validJson: true }, + 'summary.json': { present: true, validJson: true }, + 'raw-jsonl.ndjson': { present: true, validJson: true }, + 'final-chat.json': { present: true, validJson: true }, + }); + expect(report.workspaceState).toEqual({ + expected: true, + present: true, + path: join(dir, 'workspace-state'), + }); + expect(report.normalizationDebt).toEqual( + expect.arrayContaining([ + 'summary.durationMs', + 'artifact-bundle.environment.nodeVersion', + 'artifact-bundle.environment.platform', + 'artifact-bundle.environment.arch', + 'artifact-bundle.workspace.cwd', + 'artifact-bundle.workspace.preservedStatePath', + 'artifact-bundle.summary.durationMs', + 'raw-jsonl request/response ids and resource ids', + 'final-chat generated question wording', + ]), + ); + expect(report.errors).toEqual([]); + }); + + it('flags a missing expected workspace-state fixture', () => { + const dir = makeTempDir('brunch-fixture-missing-workspace-'); + writeCandidate(dir, { includeWorkspaceState: false }); + + const report = inspectFixtureCandidate(dir, { expectWorkspaceState: true }); + + 
expect(report.parseReady).toBe(true);
+    expect(report.structureReady).toBe(false);
+    expect(report.workspaceState).toEqual({
+      expected: true,
+      present: false,
+      path: join(dir, 'workspace-state'),
+    });
+    expect(report.errors).toContain('workspace-state is missing');
+  });
+
+  it('rejects parseable artifacts with invalid structure or inconsistent duplicated fields', () => {
+    const dir = makeTempDir('brunch-fixture-invalid-');
+    writeCandidate(dir, { includeWorkspaceState: false });
+    const bundlePath = join(dir, 'artifact-bundle.json');
+    const bundle = JSON.parse(readFileSync(bundlePath, 'utf8')) as Record<string, unknown>;
+    bundle.schemaVersion = 2;
+    bundle.summary = { turnsAnswered: 'two' };
+    bundle.finalChat = null;
+    bundle.rawJsonlTranscript = [];
+    delete bundle.commandSequence;
+    writeFileSync(bundlePath, `${JSON.stringify(bundle, null, 2)}\n`);
+
+    const report = inspectFixtureCandidate(dir);
+
+    expect(report.parseReady).toBe(true);
+    expect(report.structureReady).toBe(false);
+    expect(report.errors).toEqual(
+      expect.arrayContaining([
+        'artifact-bundle.json schemaVersion must be 1',
+        'artifact-bundle.json commandSequence must be an array',
+        'artifact-bundle.summary does not match summary.json',
+        'artifact-bundle.finalChat does not match final-chat.json',
+        'artifact-bundle.rawJsonlTranscript does not match raw-jsonl.ndjson',
+      ]),
+    );
+  });
+
+  it('accepts an error-run candidate while reporting failure status and normalization debt', () => {
+    const dir = makeTempDir('brunch-fixture-error-run-');
+    writeCandidate(dir, { includeWorkspaceState: false, errorRun: true });
+
+    const report = inspectFixtureCandidate(dir, { expectWorkspaceState: false });
+
+    expect(report.parseReady).toBe(true);
+    expect(report.structureReady).toBe(true);
+    expect(report.runStatus).toEqual({ kind: 'error-run', turnsAnswered: 0, errorCount: 1 });
+    expect(report.workspaceState).toEqual({
+      expected: false,
+      present: false,
+      path: join(dir, 'workspace-state'),
+    });
+    
expect(report.normalizationDebt).toContain('error messages may need provider-specific redaction review'); + expect(report.errors).toEqual([]); + }); +}); + +function writeCandidate( + dir: string, + { includeWorkspaceState, errorRun = false }: { includeWorkspaceState: boolean; errorRun?: boolean }, +): void { + const summary = { + turnsAnswered: errorRun ? 0 : 2, + finalFrontierState: errorRun ? 'awaiting_response' : 'answered', + durationMs: 23446, + questionAnswers: errorRun + ? [] + : [ + { + question: 'What is this project?', + answer: 'A repeatable fixture candidate.', + }, + ], + errors: errorRun + ? [ + { + requestId: 'policy-1', + capability: 'probe.responsePolicy', + code: 'policy_failed', + message: 'redacted', + }, + ] + : [], + }; + const finalChat = { + frontier: { state: summary.finalFrontierState, turnId: 101 }, + turns: summary.questionAnswers.map((pair, index) => ({ id: index + 100, ...pair })), + }; + const bundle = { + schemaVersion: 1, + scenario: { name: 'candidate', brief: 'fixture brief', specName: 'Fixture spec' }, + workspace: { + cwd: '/var/folders/example/brunch-probe-workspace-abc123', + preservedStatePath: includeWorkspaceState ? 
join(dir, 'workspace-state') : null,
+    },
+    commandSequence: ['spec.create', 'chat.getPrimary', 'chat.ensureReady'],
+    rawJsonlTranscript: [
+      {
+        direction: 'request',
+        payload: { id: 'create', capability: 'spec.create', input: { name: 'Fixture spec' } },
+      },
+      { direction: 'response', payload: { id: 'create', ok: true, output: { specId: 1 } } },
+    ],
+    parsedEvents: [],
+    finalChat,
+    summary,
+    errors: summary.errors,
+    simulatedUserEvents: [],
+    environment: { nodeVersion: 'v24.15.0', platform: 'darwin', arch: 'arm64' },
+  };
+
+  writeFileSync(join(dir, 'summary.json'), `${JSON.stringify(summary, null, 2)}\n`);
+  writeFileSync(join(dir, 'final-chat.json'), `${JSON.stringify(finalChat, null, 2)}\n`);
+  writeFileSync(join(dir, 'artifact-bundle.json'), `${JSON.stringify(bundle, null, 2)}\n`);
+  writeFileSync(
+    join(dir, 'raw-jsonl.ndjson'),
+    `${bundle.rawJsonlTranscript.map((entry) => JSON.stringify(entry)).join('\n')}\n`,
+  );
+
+  if (includeWorkspaceState) {
+    mkdirSync(join(dir, 'workspace-state', '.brunch'), { recursive: true });
+    writeFileSync(join(dir, 'workspace-state', '.brunch', 'brunch.db'), 'sqlite');
+  }
+}
diff --git a/scripts/agent-probes/fixture-candidate.ts b/scripts/agent-probes/fixture-candidate.ts
new file mode 100644
index 00000000..bec95ec4
--- /dev/null
+++ b/scripts/agent-probes/fixture-candidate.ts
@@ -0,0 +1,297 @@
+import { existsSync, readFileSync } from 'node:fs';
+import { join } from 'node:path';
+
+export interface FixtureCandidateFileReport {
+  present: boolean;
+  validJson: boolean | null;
+}
+
+export interface FixtureCandidateReport {
+  parseReady: boolean;
+  structureReady: boolean;
+  files: Record<string, FixtureCandidateFileReport>;
+  workspaceState: { expected: boolean; present: boolean; path: string };
+  runStatus: { kind: 'completed' | 'error-run'; turnsAnswered: number; errorCount: number } | null;
+  normalizationDebt: string[];
+  errors: string[];
+}
+
+export function inspectFixtureCandidate(
+  artifactDir: string,
+  { expectWorkspaceState = false }: 
{ expectWorkspaceState?: boolean } = {}, +): FixtureCandidateReport { + const errors: string[] = []; + const files = { + 'artifact-bundle.json': inspectJsonFile(join(artifactDir, 'artifact-bundle.json'), errors), + 'summary.json': inspectJsonFile(join(artifactDir, 'summary.json'), errors), + 'raw-jsonl.ndjson': inspectNdjsonFile(join(artifactDir, 'raw-jsonl.ndjson'), errors), + 'final-chat.json': inspectJsonFile(join(artifactDir, 'final-chat.json'), errors), + }; + const workspaceStatePath = join(artifactDir, 'workspace-state'); + const workspaceState = { + expected: expectWorkspaceState, + present: existsSync(workspaceStatePath), + path: workspaceStatePath, + }; + if (expectWorkspaceState && !workspaceState.present) { + errors.push('workspace-state is missing'); + } + + const parseReady = Object.values(files).every((file) => file.present && file.validJson !== false); + const summary = readJson(join(artifactDir, 'summary.json')); + const bundle = readJson(join(artifactDir, 'artifact-bundle.json')); + const finalChat = readJson(join(artifactDir, 'final-chat.json')); + const rawJsonlTranscript = readNdjson(join(artifactDir, 'raw-jsonl.ndjson')); + validateCandidateStructure({ bundle, summary, finalChat, rawJsonlTranscript, errors }); + const runStatus = getRunStatus(summary); + const normalizationDebt = collectNormalizationDebt({ + bundle, + summary, + hasErrors: runStatus?.kind === 'error-run', + }); + + return { + parseReady, + structureReady: parseReady && errors.length === 0, + files, + workspaceState, + runStatus, + normalizationDebt, + errors, + }; +} + +function inspectJsonFile(path: string, errors: string[]): FixtureCandidateFileReport { + if (!existsSync(path)) { + errors.push(`${fileName(path)} is missing`); + return { present: false, validJson: null }; + } + + try { + JSON.parse(readFileSync(path, 'utf8')); + return { present: true, validJson: true }; + } catch { + errors.push(`${fileName(path)} is not valid JSON`); + return { present: true, validJson: 
false }; + } +} + +function inspectNdjsonFile(path: string, errors: string[]): FixtureCandidateFileReport { + if (!existsSync(path)) { + errors.push(`${fileName(path)} is missing`); + return { present: false, validJson: null }; + } + + const lines = readFileSync(path, 'utf8') + .split('\n') + .filter((line) => line.trim() !== ''); + try { + for (const line of lines) { + JSON.parse(line); + } + return { present: true, validJson: true }; + } catch { + errors.push(`${fileName(path)} contains invalid NDJSON`); + return { present: true, validJson: false }; + } +} + +function readJson(path: string): unknown { + if (!existsSync(path)) { + return null; + } + try { + return JSON.parse(readFileSync(path, 'utf8')); + } catch { + return null; + } +} + +function readNdjson(path: string): unknown[] | null { + if (!existsSync(path)) { + return null; + } + try { + return readFileSync(path, 'utf8') + .split('\n') + .filter((line) => line.trim() !== '') + .map((line) => JSON.parse(line) as unknown); + } catch { + return null; + } +} + +function validateCandidateStructure({ + bundle, + summary, + finalChat, + rawJsonlTranscript, + errors, +}: { + bundle: unknown; + summary: unknown; + finalChat: unknown; + rawJsonlTranscript: unknown[] | null; + errors: string[]; +}): void { + validateSummaryStructure(summary, errors); + validateBundleStructure(bundle, errors); + + if (isRecord(bundle)) { + if (!deepEqual(bundle.summary, summary)) { + errors.push('artifact-bundle.summary does not match summary.json'); + } + if (!deepEqual(bundle.finalChat, finalChat)) { + errors.push('artifact-bundle.finalChat does not match final-chat.json'); + } + if (!Array.isArray(rawJsonlTranscript) || !deepEqual(bundle.rawJsonlTranscript, rawJsonlTranscript)) { + errors.push('artifact-bundle.rawJsonlTranscript does not match raw-jsonl.ndjson'); + } + } +} + +function validateSummaryStructure(summary: unknown, errors: string[]): void { + if (!isRecord(summary)) { + errors.push('summary.json is not an object'); + 
return;
+  }
+
+  requireField(summary, 'turnsAnswered', 'number', 'summary.json', errors);
+  if (typeof summary.finalFrontierState !== 'string' && summary.finalFrontierState !== null) {
+    errors.push('summary.json finalFrontierState must be a string or null');
+  }
+  requireField(summary, 'durationMs', 'number', 'summary.json', errors);
+  requireArrayField(summary, 'questionAnswers', 'summary.json', errors);
+  requireArrayField(summary, 'errors', 'summary.json', errors);
+}
+
+function validateBundleStructure(bundle: unknown, errors: string[]): void {
+  if (!isRecord(bundle)) {
+    errors.push('artifact-bundle.json is not an object');
+    return;
+  }
+
+  if (bundle.schemaVersion !== 1) {
+    errors.push('artifact-bundle.json schemaVersion must be 1');
+  }
+  requireRecordField(bundle, 'scenario', 'artifact-bundle.json', errors);
+  requireRecordField(bundle, 'workspace', 'artifact-bundle.json', errors);
+  requireArrayField(bundle, 'commandSequence', 'artifact-bundle.json', errors);
+  requireArrayField(bundle, 'rawJsonlTranscript', 'artifact-bundle.json', errors);
+  requireArrayField(bundle, 'parsedEvents', 'artifact-bundle.json', errors);
+  if (!('finalChat' in bundle)) {
+    errors.push('artifact-bundle.json finalChat is missing');
+  }
+  requireRecordField(bundle, 'summary', 'artifact-bundle.json', errors);
+  requireArrayField(bundle, 'errors', 'artifact-bundle.json', errors);
+  requireArrayField(bundle, 'simulatedUserEvents', 'artifact-bundle.json', errors);
+  requireRecordField(bundle, 'environment', 'artifact-bundle.json', errors);
+}
+
+function requireField(
+  record: Record<string, unknown>,
+  field: string,
+  type: 'number' | 'string',
+  label: string,
+  errors: string[],
+): void {
+  if (typeof record[field] !== type) {
+    errors.push(`${label} ${field} must be a ${type}`);
+  }
+}
+
+function requireArrayField(
+  record: Record<string, unknown>,
+  field: string,
+  label: string,
+  errors: string[],
+): void {
+  if (!Array.isArray(record[field])) {
+    errors.push(`${label} ${field} must be an array`);
+  
}
+}
+
+function requireRecordField(
+  record: Record<string, unknown>,
+  field: string,
+  label: string,
+  errors: string[],
+): void {
+  if (!isRecord(record[field])) {
+    errors.push(`${label} ${field} must be an object`);
+  }
+}
+
+function deepEqual(left: unknown, right: unknown): boolean {
+  return JSON.stringify(left) === JSON.stringify(right);
+}
+
+function getRunStatus(summary: unknown): FixtureCandidateReport['runStatus'] {
+  if (!isRecord(summary)) {
+    return null;
+  }
+  const turnsAnswered = typeof summary.turnsAnswered === 'number' ? summary.turnsAnswered : 0;
+  const errors = Array.isArray(summary.errors) ? summary.errors : [];
+  return {
+    kind: errors.length > 0 ? 'error-run' : 'completed',
+    turnsAnswered,
+    errorCount: errors.length,
+  };
+}
+
+function collectNormalizationDebt({
+  bundle,
+  summary,
+  hasErrors,
+}: {
+  bundle: unknown;
+  summary: unknown;
+  hasErrors: boolean;
+}): string[] {
+  const debt = new Set<string>();
+
+  if (isRecord(summary) && typeof summary.durationMs === 'number') {
+    debt.add('summary.durationMs');
+  }
+
+  if (isRecord(bundle)) {
+    const environment = isRecord(bundle.environment) ? bundle.environment : null;
+    if (environment) {
+      if (typeof environment.nodeVersion === 'string') debt.add('artifact-bundle.environment.nodeVersion');
+      if (typeof environment.platform === 'string') debt.add('artifact-bundle.environment.platform');
+      if (typeof environment.arch === 'string') debt.add('artifact-bundle.environment.arch');
+    }
+
+    const workspace = isRecord(bundle.workspace) ? bundle.workspace : null;
+    if (workspace) {
+      if (typeof workspace.cwd === 'string') debt.add('artifact-bundle.workspace.cwd');
+      if (typeof workspace.preservedStatePath === 'string') {
+        debt.add('artifact-bundle.workspace.preservedStatePath');
+      }
+    }
+
+    const bundleSummary = isRecord(bundle.summary) ? 
bundle.summary : null;
+    if (bundleSummary && typeof bundleSummary.durationMs === 'number') {
+      debt.add('artifact-bundle.summary.durationMs');
+    }
+
+    if (Array.isArray(bundle.rawJsonlTranscript) && bundle.rawJsonlTranscript.length > 0) {
+      debt.add('raw-jsonl request/response ids and resource ids');
+    }
+  }
+
+  debt.add('final-chat generated question wording');
+  if (hasErrors) {
+    debt.add('error messages may need provider-specific redaction review');
+  }
+
+  return [...debt];
+}
+
+function fileName(path: string): string {
+  return path.split('/').at(-1) ?? path;
+}
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === 'object' && value !== null && !Array.isArray(value);
+}
diff --git a/scripts/agent-probes/llm-user.test.ts b/scripts/agent-probes/llm-user.test.ts
new file mode 100644
index 00000000..b5d12426
--- /dev/null
+++ b/scripts/agent-probes/llm-user.test.ts
@@ -0,0 +1,202 @@
+import { describe, expect, it } from 'vitest';
+
+import { createModelBackedUserPolicy, type SimulatedUserModelAdapter } from './llm-user.js';
+import { buildProbeArtifactBundle, runScriptedProbe, type JsonlTransport } from './probe-runner.js';
+import type { ProbeJsonlRequest, ProbeJsonlResponse, SimulatedUserEvent } from './probe-runner.js';
+
+describe('model-backed simulated user policy', () => {
+  it('renders a strict JSON prompt with scenario, active question, options, and prior Q/A', async () => {
+    const prompts: string[] = [];
+    const events: SimulatedUserEvent[] = [];
+    const model: SimulatedUserModelAdapter = {
+      async generateText(prompt) {
+        prompts.push(prompt);
+        return JSON.stringify({ kind: 'free-text', freeText: 'I want a spec assistant.' });
+      },
+    };
+
+    const result = await runScriptedProbe({
+      transport: createOneTurnTransport(),
+      scenario: { name: 'llm-user', specName: 'LLM user proof', brief: 'A tired founder wants help.' 
}, + scriptedAnswers: [], + responsePolicy: createModelBackedUserPolicy({ model, events }), + simulatedUserEvents: events, + }); + + expect(prompts[0]).toContain('You are simulating the user, not the interviewer.'); + expect(prompts[0]).toContain('A tired founder wants help.'); + expect(prompts[0]).toContain('What are you building?'); + expect(prompts[0]).toContain('Earlier answered turns'); + expect(prompts[1]).toContain('0. Acceptance criteria'); + expect(prompts[1]).toContain('Q: What are you building?'); + expect(prompts[1]).toContain('A: I want a spec assistant.'); + expect(result.summary.turnsAnswered).toBe(2); + }); + + it('parses valid model JSON into free-text and option-selection response payloads', async () => { + const events: SimulatedUserEvent[] = []; + const outputs = [ + JSON.stringify({ kind: 'free-text', freeText: 'A graph-first spec tool' }), + JSON.stringify({ kind: 'select-options', positions: [0] }), + ]; + const model: SimulatedUserModelAdapter = { + async generateText() { + return outputs.shift() ?? 
'{}'; + }, + }; + const requests: ProbeJsonlRequest[] = []; + + const result = await runScriptedProbe({ + transport: createOneTurnTransport(requests), + scenario: { name: 'parse', specName: 'Parse proof' }, + scriptedAnswers: [], + responsePolicy: createModelBackedUserPolicy({ model, events }), + simulatedUserEvents: events, + }); + + expect(requests[4]).toMatchObject({ + capability: 'turn.submitResponse', + input: { response: { kind: 'free-text', freeText: 'A graph-first spec tool' } }, + }); + expect(requests[8]).toMatchObject({ + capability: 'turn.submitResponse', + input: { response: { kind: 'select-options', positions: [0] } }, + }); + expect(result.errors).toEqual([]); + }); + + it('records simulated-user prompt artifacts and parse status in the artifact bundle', async () => { + const events: SimulatedUserEvent[] = []; + const model: SimulatedUserModelAdapter = { + async generateText() { + return JSON.stringify({ kind: 'free-text', freeText: 'Preserve prompt artifacts' }); + }, + }; + + const result = await runScriptedProbe({ + transport: createOneTurnTransport(), + scenario: { name: 'artifact', specName: 'Artifact proof' }, + scriptedAnswers: [], + responsePolicy: createModelBackedUserPolicy({ model, events }), + simulatedUserEvents: events, + }); + + const bundle = buildProbeArtifactBundle(result); + expect(bundle.simulatedUserEvents[0]).toMatchObject({ + turnId: 100, + status: 'parsed', + parsedResponse: { kind: 'free-text', freeText: 'Preserve prompt artifacts' }, + }); + expect(bundle.simulatedUserEvents[0]?.prompt).toContain('Return exactly one JSON object'); + expect(bundle.simulatedUserEvents[0]?.rawModelOutput).toContain('Preserve prompt artifacts'); + }); + + it('turns invalid model output into a structured probe error', async () => { + const events: SimulatedUserEvent[] = []; + const model: SimulatedUserModelAdapter = { + async generateText() { + return 'not json'; + }, + }; + + const result = await runScriptedProbe({ + transport: 
createOneTurnTransport(), + scenario: { name: 'bad-json', specName: 'Bad JSON proof' }, + scriptedAnswers: [], + responsePolicy: createModelBackedUserPolicy({ model, events }), + simulatedUserEvents: events, + }); + + expect(result.summary.turnsAnswered).toBe(0); + expect(result.errors).toEqual([ + { + requestId: 'policy-1', + capability: 'probe.responsePolicy', + code: 'policy_failed', + message: 'Simulated user returned invalid JSON', + }, + ]); + expect(result.simulatedUserEvents[0]).toMatchObject({ status: 'failed', rawModelOutput: 'not json' }); + }); +}); + +function createOneTurnTransport(requests: ProbeJsonlRequest[] = []): JsonlTransport { + return { + async send(request) { + requests.push(request); + return getFakeAgentResponse(request); + }, + }; +} + +function getFakeAgentResponse(request: ProbeJsonlRequest): ProbeJsonlResponse { + if (request.capability === 'spec.create') { + return { id: request.id, ok: true, output: { specId: 1 } }; + } + if (request.capability === 'chat.getPrimary') { + return { id: request.id, ok: true, output: { chatId: 10 } }; + } + if (request.capability === 'chat.ensureReady') { + const turnId = request.id === 'ready-1' ? 
100 : 101;
+    return { id: request.id, ok: true, output: { chatId: 10, state: 'awaiting_response', turnId } };
+  }
+  if (request.id === 'read-1') {
+    return {
+      id: request.id,
+      ok: true,
+      output: {
+        frontier: { state: 'awaiting_response', turnId: 100 },
+        turns: [{ id: 100, question: 'What are you building?', answer: null, options: [] }],
+      },
+    };
+  }
+  if (request.id === 'read-2') {
+    return {
+      id: request.id,
+      ok: true,
+      output: {
+        frontier: { state: 'answered', turnId: 100 },
+        turns: [
+          { id: 100, question: 'What are you building?', answer: 'I want a spec assistant.', options: [] },
+        ],
+      },
+    };
+  }
+  if (request.id === 'read-3') {
+    return {
+      id: request.id,
+      ok: true,
+      output: {
+        frontier: { state: 'awaiting_response', turnId: 101 },
+        turns: [
+          { id: 100, question: 'What are you building?', answer: 'I want a spec assistant.', options: [] },
+          {
+            id: 101,
+            question: 'What should be specified first?',
+            answer: null,
+            options: [{ position: 0, content: 'Acceptance criteria' }],
+          },
+        ],
+      },
+    };
+  }
+  if (request.id === 'read-4') {
+    return {
+      id: request.id,
+      ok: true,
+      output: {
+        frontier: { state: 'answered', turnId: 101 },
+        turns: [
+          { id: 100, question: 'What are you building?', answer: 'I want a spec assistant.', options: [] },
+          {
+            id: 101,
+            question: 'What should be specified first?',
+            answer: 'Acceptance criteria',
+            options: [],
+          },
+        ],
+      },
+    };
+  }
+  return { id: request.id, ok: true, output: { ok: true } };
+}
diff --git a/scripts/agent-probes/llm-user.ts b/scripts/agent-probes/llm-user.ts
new file mode 100644
index 00000000..d4bf4955
--- /dev/null
+++ b/scripts/agent-probes/llm-user.ts
@@ -0,0 +1,124 @@
+import type {
+  ProbeResponsePolicy,
+  ProbeResponsePolicyInput,
+  ProbeTurnResponse,
+  SimulatedUserEvent,
+} from './probe-runner.js';
+
+export interface SimulatedUserModelAdapter {
+  generateText(prompt: string): Promise<string>;
+}
+
+export function createModelBackedUserPolicy({
+  model,
+  events,
+}: {
+  model: 
SimulatedUserModelAdapter; + events: SimulatedUserEvent[]; +}): ProbeResponsePolicy { + return async (input) => { + const prompt = renderSimulatedUserPrompt(input); + const rawModelOutput = await model.generateText(prompt); + + try { + const parsedResponse = parseSimulatedUserResponse(rawModelOutput, input); + events.push({ + turnId: input.activeTurn.id, + prompt, + rawModelOutput, + parsedResponse, + status: 'parsed', + error: null, + }); + return parsedResponse; + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + events.push({ + turnId: input.activeTurn.id, + prompt, + rawModelOutput, + parsedResponse: null, + status: 'failed', + error: message, + }); + throw error; + } + }; +} + +function renderSimulatedUserPrompt(input: ProbeResponsePolicyInput): string { + const options = input.activeTurn.options?.length + ? input.activeTurn.options.map((option) => `${option.position}. ${option.content}`).join('\n') + : 'No options are available; answer with free text.'; + const priorTurns = input.priorAnsweredTurns.length + ? input.priorAnsweredTurns.map((turn) => `Q: ${turn.question}\nA: ${turn.answer ?? ''}`).join('\n\n') + : 'None yet.'; + + return [ + 'You are simulating the user, not the interviewer.', + 'Answer only as the user described by the scenario. Do not invent product state outside the prompt.', + 'Return exactly one JSON object and no Markdown.', + '', + 'Allowed response JSON:', + '- Free text: {"kind":"free-text","freeText":"your answer"}', + '- Option selection: {"kind":"select-options","positions":[0]}', + '', + `Scenario brief: ${input.scenario.brief ?? 
'No scenario brief provided.'}`, + `Specification name: ${input.scenario.specName}`, + '', + 'Earlier answered turns:', + priorTurns, + '', + 'Active question:', + input.activeTurn.question, + '', + 'Options:', + options, + ].join('\n'); +} + +function parseSimulatedUserResponse( + rawModelOutput: string, + input: ProbeResponsePolicyInput, +): ProbeTurnResponse { + let parsed: unknown; + try { + parsed = JSON.parse(rawModelOutput); + } catch { + throw new Error('Simulated user returned invalid JSON'); + } + + if (!isRecord(parsed) || typeof parsed.kind !== 'string') { + throw new Error('Simulated user response did not match an allowed response shape'); + } + + if (parsed.kind === 'free-text') { + if (typeof parsed.freeText !== 'string' || parsed.freeText.trim() === '') { + throw new Error('Simulated user free-text response was empty or invalid'); + } + return { kind: 'free-text', freeText: parsed.freeText }; + } + + if (parsed.kind === 'select-options') { + if ( + !Array.isArray(parsed.positions) || + parsed.positions.some((position) => typeof position !== 'number') + ) { + throw new Error('Simulated user option response had invalid positions'); + } + const allowedPositions = new Set(input.activeTurn.options?.map((option) => option.position) ?? 
[]);
+    if (
+      parsed.positions.length === 0 ||
+      parsed.positions.some((position) => !allowedPositions.has(position))
+    ) {
+      throw new Error('Simulated user option response selected unavailable positions');
+    }
+    return { kind: 'select-options', positions: parsed.positions };
+  }
+
+  throw new Error('Simulated user response did not match an allowed response shape');
+}
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === 'object' && value !== null && !Array.isArray(value);
+}
diff --git a/scripts/agent-probes/packaged-smoke.test.ts b/scripts/agent-probes/packaged-smoke.test.ts
new file mode 100644
index 00000000..f22f1bbe
--- /dev/null
+++ b/scripts/agent-probes/packaged-smoke.test.ts
@@ -0,0 +1,189 @@
+import { mkdtempSync, readFileSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join, resolve } from 'node:path';
+
+import { afterEach, describe, expect, it } from 'vitest';
+
+import { formatSmokeSummary, runPackagedLlmUserSmoke } from './packaged-smoke.js';
+import type { ProbeJsonlRequest, ProbeJsonlResponse, SpawnedJsonlProcess } from './probe-runner.js';
+
+describe('packaged LLM-as-user smoke helper', () => {
+  const tempDirs: string[] = [];
+
+  afterEach(() => {
+    for (const dir of tempDirs.splice(0)) {
+      rmSync(dir, { recursive: true, force: true });
+    }
+  });
+
+  function makeTempDir(prefix: string): string {
+    const dir = mkdtempSync(join(tmpdir(), prefix));
+    tempDirs.push(dir);
+    return dir;
+  }
+
+  it('builds first, drives the packaged agent command, preserves fixture state, and returns JSON-only summary data', async () => {
+    const outputDir = makeTempDir('brunch-smoke-output-');
+    const buildCommands: Array<{ command: string; args: string[] }> = [];
+    const spawnedCommands: Array<{ command: string; args: string[]; cwd: string }> = [];
+
+    const summary = await runPackagedLlmUserSmoke({
+      outputDir,
+      model: {
+        async generateText(prompt) {
+          if (prompt.includes('Options:') && prompt.includes('0. 
Acceptance criteria')) { + return JSON.stringify({ kind: 'select-options', positions: [0] }); + } + return JSON.stringify({ kind: 'free-text', freeText: 'A smoke-test spec assistant' }); + }, + }, + async runBuildCommand(command, args) { + buildCommands.push({ command, args }); + }, + spawnProcess(options) { + spawnedCommands.push({ command: options.command, args: options.args, cwd: options.cwd }); + return createFakeAgentProcess(); + }, + }); + + expect(buildCommands).toEqual([{ command: 'npm', args: ['run', 'build'] }]); + expect(spawnedCommands).toEqual([ + { + command: process.execPath, + args: [resolve('bin/brunch.js'), 'agent'], + cwd: expect.stringContaining('brunch-probe-workspace-'), + }, + ]); + expect(summary).toEqual({ + outputDir, + turnsAnswered: 2, + finalFrontierState: 'answered', + errors: [], + }); + expect(JSON.parse(formatSmokeSummary(summary))).toEqual(summary); + expect(readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8')).toContain('simulatedUserEvents'); + expect(readFileSync(join(outputDir, 'summary.json'), 'utf8')).toContain('"turnsAnswered": 2'); + }); + + it('writes redacted failure artifacts and returns JSON-safe errors when the model fails', async () => { + const outputDir = makeTempDir('brunch-smoke-failure-'); + + const summary = await runPackagedLlmUserSmoke({ + outputDir, + model: { + async generateText() { + throw new Error('Provider failed with ANTHROPIC_API_KEY=sk-ant-secret-value\nstack'); + }, + }, + async runBuildCommand() {}, + spawnProcess() { + return createFakeAgentProcess(); + }, + }); + + expect(summary).toEqual({ + outputDir, + turnsAnswered: 0, + finalFrontierState: 'awaiting_response', + errors: [ + { + requestId: 'policy-1', + capability: 'probe.responsePolicy', + code: 'policy_failed', + message: 'Provider failed with ANTHROPIC_API_KEY=[redacted]', + }, + ], + }); + expect(readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8')).not.toContain( + 'sk-ant-secret-value', + ); + }); +}); + +function 
createFakeAgentProcess(): SpawnedJsonlProcess { + let onStdoutData: ((chunk: string) => void) | null = null; + + return { + writeStdin(line) { + const request = JSON.parse(line) as ProbeJsonlRequest; + const response = getFakeAgentResponse(request); + onStdoutData?.(`${JSON.stringify(response)}\n`); + }, + endStdin() {}, + onStdoutData(listener) { + onStdoutData = listener; + }, + }; +} + +function getFakeAgentResponse(request: ProbeJsonlRequest): ProbeJsonlResponse { + if (request.capability === 'spec.create') { + return { id: request.id, ok: true, output: { specId: 1 } }; + } + if (request.capability === 'chat.getPrimary') { + return { id: request.id, ok: true, output: { chatId: 10 } }; + } + if (request.capability === 'chat.ensureReady') { + const turnId = request.id === 'ready-1' ? 100 : 101; + return { id: request.id, ok: true, output: { chatId: 10, state: 'awaiting_response', turnId } }; + } + if (request.id === 'read-1') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', turnId: 100 }, + turns: [{ id: 100, question: 'What are you building?', answer: null, options: [] }], + }, + }; + } + if (request.id === 'read-2') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'answered', turnId: 100 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A smoke-test spec assistant', options: [] }, + ], + }, + }; + } + if (request.id === 'read-3') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', turnId: 101 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A smoke-test spec assistant', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: null, + options: [{ position: 0, content: 'Acceptance criteria' }], + }, + ], + }, + }; + } + if (request.id === 'read-4') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'answered', turnId: 101 }, + turns: [ + { id: 
100, question: 'What are you building?', answer: 'A smoke-test spec assistant', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: 'Acceptance criteria', + options: [], + }, + ], + }, + }; + } + return { id: request.id, ok: true, output: { ok: true } }; +} diff --git a/scripts/agent-probes/packaged-smoke.ts b/scripts/agent-probes/packaged-smoke.ts new file mode 100644 index 00000000..5072f1eb --- /dev/null +++ b/scripts/agent-probes/packaged-smoke.ts @@ -0,0 +1,74 @@ +import { spawn } from 'node:child_process'; + +import { createModelBackedUserPolicy, type SimulatedUserModelAdapter } from './llm-user.js'; +import { + runProcessBackedProbe, + type ProbeProcessSpawner, + type ProbeRunError, + type SimulatedUserEvent, +} from './probe-runner.js'; + +export interface PackagedLlmUserSmokeSummary { + outputDir: string; + turnsAnswered: number; + finalFrontierState: string | null; + errors: ProbeRunError[]; +} + +export type SmokeBuildCommandRunner = (command: string, args: string[]) => Promise<void>; + +export interface PackagedLlmUserSmokeOptions { + outputDir: string; + model: SimulatedUserModelAdapter; + runBuildCommand?: SmokeBuildCommandRunner; + spawnProcess?: ProbeProcessSpawner; +} + +export async function runPackagedLlmUserSmoke({ + outputDir, + model, + runBuildCommand = runCommand, + spawnProcess, +}: PackagedLlmUserSmokeOptions): Promise<PackagedLlmUserSmokeSummary> { + await runBuildCommand('npm', ['run', 'build']); + + const simulatedUserEvents: SimulatedUserEvent[] = []; + const result = await runProcessBackedProbe({ + scenario: { + name: 'packaged-llm-user-smoke', + specName: 'LLM user smoke fixture candidate', + brief: 'Answer as a concise user who wants Brunch to help clarify a software specification.', + }, + scriptedAnswers: [], + outputDir, + preserveWorkspaceState: true, + responsePolicy: createModelBackedUserPolicy({ model, events: simulatedUserEvents }), + simulatedUserEvents, + spawnProcess, + }); + + return { + outputDir, + turnsAnswered: 
result.summary.turnsAnswered, + finalFrontierState: result.summary.finalFrontierState, + errors: result.errors, + }; +} + +export function formatSmokeSummary(summary: PackagedLlmUserSmokeSummary): string { + return `${JSON.stringify(summary)}\n`; +} + +function runCommand(command: string, args: string[]): Promise<void> { + return new Promise<void>((resolveCommand, rejectCommand) => { + const child = spawn(command, args, { stdio: 'inherit' }); + child.on('error', rejectCommand); + child.on('exit', (code) => { + if (code === 0) { + resolveCommand(); + return; + } + rejectCommand(new Error(`${command} ${args.join(' ')} exited with code ${code ?? 'unknown'}`)); + }); + }); +} diff --git a/scripts/agent-probes/probe-runner.test.ts b/scripts/agent-probes/probe-runner.test.ts new file mode 100644 index 00000000..249f61b0 --- /dev/null +++ b/scripts/agent-probes/probe-runner.test.ts @@ -0,0 +1,817 @@ +import { + existsSync, + mkdirSync, + mkdtempSync, + readFileSync, + readdirSync, + rmSync, + writeFileSync, +} from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { afterEach, describe, expect, it } from 'vitest'; + +import { + createProcessJsonlTransport, + runProcessBackedProbe, + runScriptedProbe, + type JsonlTransport, + type ProbeJsonlRequest, + type ProbeJsonlResponse, + type SpawnedJsonlProcess, +} from './probe-runner.js'; + +describe('probe runner', () => { + const tempDirs: string[] = []; + + afterEach(() => { + for (const dir of tempDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }); + } + }); + + function makeTempDir(prefix: string): string { + const dir = mkdtempSync(join(tmpdir(), prefix)); + tempDirs.push(dir); + return dir; + } + + it('drives two interview responses through an injected JSONL transport', async () => { + const requests: ProbeJsonlRequest[] = []; + const transport: JsonlTransport = { + async send(request) { + requests.push(request); + if (request.capability === 'spec.create') { + return { id: 
request.id, ok: true, output: { specId: 1 } }; + } + if (request.capability === 'chat.getPrimary') { + return { + id: request.id, + ok: true, + output: { specId: 1, chatId: 10, kind: 'interview', activeTurnId: null }, + }; + } + if (request.id === 'ready-1') { + return { + id: request.id, + ok: true, + output: { chatId: 10, specId: 1, state: 'awaiting_response', turnId: 100 }, + }; + } + if (request.id === 'read-1') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: 100 }, + turns: [{ id: 100, question: 'What are you building?', answer: null, options: [] }], + nextCommands: [{ capability: 'turn.submitResponse', input: { chatId: 10, turnId: 100 } }], + }, + }; + } + if (request.id === 'answer-1') { + return { id: request.id, ok: true, output: { response: { ok: true } } }; + } + if (request.id === 'read-2') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'answered', phase: 'grounding', turnId: 100 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A probeable spec tool', options: [] }, + ], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: 10 } }], + }, + }; + } + if (request.id === 'ready-2') { + return { + id: request.id, + ok: true, + output: { chatId: 10, specId: 1, state: 'awaiting_response', turnId: 101 }, + }; + } + if (request.id === 'read-3') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: 101 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A probeable spec tool', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: null, + options: [ + { id: 1, position: 0, content: 'Acceptance criteria' }, + { id: 2, position: 1, content: 'API shape' }, + ], + }, + ], + nextCommands: [{ capability: 'turn.submitResponse', input: { chatId: 10, turnId: 101 } }], + }, + }; + } + if (request.id === 
'answer-2') { + return { id: request.id, ok: true, output: { response: { ok: true } } }; + } + if (request.id === 'read-4') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'answered', phase: 'grounding', turnId: 101 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A probeable spec tool', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: 'Acceptance criteria', + options: [], + }, + ], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: 10 } }], + }, + }; + } + return { id: request.id, ok: false, error: { code: 'unexpected', message: request.id } }; + }, + }; + + const result = await runScriptedProbe({ + transport, + scenario: { name: 'proof', specName: 'Probe proof' }, + scriptedAnswers: ['A probeable spec tool'], + }); + + expect(requests.map((request) => request.capability)).toEqual([ + 'spec.create', + 'chat.getPrimary', + 'chat.ensureReady', + 'chat.read', + 'turn.submitResponse', + 'chat.read', + 'chat.ensureReady', + 'chat.read', + 'turn.submitResponse', + 'chat.read', + ]); + expect(requests[4]).toMatchObject({ + id: 'answer-1', + input: { chatId: 10, turnId: 100, response: { kind: 'free-text', freeText: 'A probeable spec tool' } }, + }); + expect(requests[8]).toMatchObject({ + id: 'answer-2', + input: { chatId: 10, turnId: 101, response: { kind: 'select-options', positions: [0] } }, + }); + expect(result.summary).toMatchObject({ turnsAnswered: 2, finalFrontierState: 'answered' }); + expect(result.errors).toEqual([]); + }); + + it('stops scripted probing after an explicit one-turn budget', async () => { + const requests: ProbeJsonlRequest[] = []; + + const result = await runScriptedProbe({ + transport: { + async send(request) { + requests.push(request); + return getFakeAgentResponse(request); + }, + }, + scenario: { name: 'one-turn', specName: 'One turn proof' }, + scriptedAnswers: ['A one-turn answer'], + turnBudget: 1, + }); + + 
expect(result.summary).toMatchObject({ turnsAnswered: 1, finalFrontierState: 'answered' }); + expect(requests.map((request) => request.id)).toEqual([ + 'create', + 'primary', + 'ready-1', + 'read-1', + 'answer-1', + 'read-2', + ]); + }); + + it('can answer turns through an injected response policy', async () => { + const policyInputs: Array<{ activeTurnId: number; priorAnswerCount: number; brief: string | undefined }> = + []; + const transport = createScriptedSuccessTransport(); + + const result = await runScriptedProbe({ + transport, + scenario: { name: 'policy-proof', specName: 'Policy proof', brief: 'answer like a user' }, + scriptedAnswers: [], + responsePolicy(input) { + policyInputs.push({ + activeTurnId: input.activeTurn.id, + priorAnswerCount: input.priorAnsweredTurns.length, + brief: input.scenario.brief, + }); + if (input.activeTurn.options?.[0]) { + return { kind: 'select-options', positions: [input.activeTurn.options[0].position] }; + } + return { kind: 'free-text', freeText: `Policy response to ${input.activeTurn.question}` }; + }, + }); + + expect(policyInputs).toEqual([ + { activeTurnId: 100, priorAnswerCount: 0, brief: 'answer like a user' }, + { activeTurnId: 101, priorAnswerCount: 1, brief: 'answer like a user' }, + ]); + expect(result.summary).toMatchObject({ turnsAnswered: 2, finalFrontierState: 'answered' }); + expect(result.errors).toEqual([]); + }); + + it('returns structured probe errors when the response policy fails', async () => { + const result = await runScriptedProbe({ + transport: createScriptedSuccessTransport(), + scenario: { name: 'policy-failure', specName: 'Policy failure proof' }, + scriptedAnswers: [], + responsePolicy() { + throw new Error('Simulated user could not answer\nwith stack details'); + }, + }); + + expect(result.summary.turnsAnswered).toBe(0); + expect(result.errors).toEqual([ + { + requestId: 'policy-1', + capability: 'probe.responsePolicy', + code: 'policy_failed', + message: 'Simulated user could not answer', + 
}, + ]); + }); + + it('uses a process JSONL transport to write requests and parse responses', async () => { + const written: string[] = []; + let onStdoutData: ((chunk: string) => void) | null = null; + const process: SpawnedJsonlProcess = { + writeStdin(line) { + written.push(line); + const request = JSON.parse(line) as ProbeJsonlRequest; + onStdoutData?.( + `${JSON.stringify({ id: request.id, ok: true, output: { echoed: request.capability } })}\n`, + ); + }, + endStdin() {}, + onStdoutData(listener) { + onStdoutData = listener; + }, + }; + + const transport = createProcessJsonlTransport(process); + const response = await transport.send({ + id: 'create', + capability: 'spec.create', + input: { name: 'Probe' }, + }); + + expect(written).toEqual([ + JSON.stringify({ id: 'create', capability: 'spec.create', input: { name: 'Probe' } }), + ]); + expect(response).toEqual({ id: 'create', ok: true, output: { echoed: 'spec.create' } }); + }); + + it('settles a pending process JSONL request when the child emits an id:null protocol error', async () => { + let onStdoutData: ((chunk: string) => void) | null = null; + const process: SpawnedJsonlProcess = { + writeStdin() { + onStdoutData?.( + `${JSON.stringify({ + id: null, + ok: false, + error: { code: 'bad_request', message: 'Malformed request envelope' }, + })}\n`, + ); + }, + endStdin() {}, + onStdoutData(listener) { + onStdoutData = listener; + }, + }; + + const transport = createProcessJsonlTransport(process); + const response = await expectSettledJsonlResponse( + transport.send({ id: 'create', capability: 'spec.create', input: { name: 'Probe' } }), + ); + + expect(response).toEqual({ + id: 'create', + ok: false, + error: { code: 'protocol_error', message: 'Unmatched id:null response: Malformed request envelope' }, + }); + }); + + it('settles a pending process JSONL request when the child emits malformed JSON', async () => { + let onStdoutData: ((chunk: string) => void) | null = null; + const process: SpawnedJsonlProcess 
= { + writeStdin() { + onStdoutData?.('{not-json}\n'); + }, + endStdin() {}, + onStdoutData(listener) { + onStdoutData = listener; + }, + }; + + const transport = createProcessJsonlTransport(process); + const response = await expectSettledJsonlResponse( + transport.send({ id: 'create', capability: 'spec.create', input: { name: 'Probe' } }), + ); + + expect(response).toEqual({ + id: 'create', + ok: false, + error: { code: 'malformed_json', message: 'Malformed JSONL response from child process' }, + }); + }); + + it('settles pending process JSONL requests when the child process exits', async () => { + let onExit: ((code: number | null) => void) | null = null; + const process: SpawnedJsonlProcess = { + writeStdin() { + onExit?.(17); + }, + endStdin() {}, + onStdoutData() {}, + onExit(listener) { + onExit = listener; + }, + }; + + const transport = createProcessJsonlTransport(process); + const response = await expectSettledJsonlResponse( + transport.send({ id: 'create', capability: 'spec.create', input: { name: 'Probe' } }), + ); + + expect(response).toEqual({ + id: 'create', + ok: false, + error: { code: 'process_exit', message: 'JSONL child process exited with code 17' }, + }); + }); + + it('settles pending process JSONL requests when the child never responds before timeout', async () => { + const process: SpawnedJsonlProcess = { + writeStdin() {}, + endStdin() {}, + onStdoutData() {}, + }; + const transportFactory = createProcessJsonlTransport as ( + process: SpawnedJsonlProcess, + options: { requestTimeoutMs: number }, + ) => JsonlTransport; + + const transport = transportFactory(process, { requestTimeoutMs: 1 }); + const response = await expectSettledJsonlResponse( + transport.send({ id: 'create', capability: 'spec.create', input: { name: 'Probe' } }), + 50, + ); + + expect(response).toEqual({ + id: 'create', + ok: false, + error: { code: 'request_timeout', message: 'JSONL child process did not respond within 1ms' }, + }); + }); + + it('passes an explicit one-turn 
budget through process-backed probes', async () => { + const outputDir = makeTempDir('brunch-probe-output-'); + + const result = await runProcessBackedProbe({ + scenario: { name: 'process-one-turn', specName: 'Process one turn' }, + scriptedAnswers: ['A one-turn process probe'], + outputDir, + turnBudget: 1, + spawnProcess() { + return createFakeAgentProcess(); + }, + }); + + const summary = JSON.parse(readFileSync(join(outputDir, 'summary.json'), 'utf8')) as unknown; + const bundle = JSON.parse(readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8')) as unknown; + + expect(result.summary).toMatchObject({ turnsAnswered: 1, finalFrontierState: 'answered' }); + expect(summary).toMatchObject({ turnsAnswered: 1, finalFrontierState: 'answered' }); + expect(bundle).toMatchObject({ + commandSequence: [ + 'spec.create', + 'chat.getPrimary', + 'chat.ensureReady', + 'chat.read', + 'turn.submitResponse', + 'chat.read', + ], + }); + }); + + it('creates an isolated workspace and writes minimal probe artifacts outside .brunch', async () => { + const outputDir = makeTempDir('brunch-probe-output-'); + const spawnedCwds: string[] = []; + + const result = await runProcessBackedProbe({ + scenario: { name: 'process-proof', specName: 'Process proof' }, + scriptedAnswers: ['A temp-workspace probe'], + outputDir, + spawnProcess({ cwd }) { + spawnedCwds.push(cwd); + return createFakeAgentProcess(); + }, + }); + + expect(result.summary).toMatchObject({ turnsAnswered: 2, finalFrontierState: 'answered' }); + expect(result.workspaceCwd).toBe(spawnedCwds[0]); + expect(spawnedCwds).toHaveLength(1); + expect(spawnedCwds[0]).toContain('brunch-probe-workspace-'); + expect(outputDir).not.toContain(`${spawnedCwds[0]}/.brunch`); + + const rawJsonl = readFileSync(join(outputDir, 'raw-jsonl.ndjson'), 'utf8'); + const finalChat = JSON.parse(readFileSync(join(outputDir, 'final-chat.json'), 'utf8')) as unknown; + const summary = JSON.parse(readFileSync(join(outputDir, 'summary.json'), 'utf8')) as 
unknown; + const bundle = JSON.parse(readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8')) as unknown; + + expect(rawJsonl).toContain('"direction":"request"'); + expect(rawJsonl).toContain('"direction":"response"'); + expect(finalChat).toMatchObject({ frontier: { state: 'answered' } }); + expect(summary).toMatchObject({ + turnsAnswered: 2, + finalFrontierState: 'answered', + questionAnswers: [ + { question: 'What are you building?', answer: 'A temp-workspace probe' }, + { question: 'What should be specified first?', answer: 'Acceptance criteria' }, + ], + }); + expect(bundle).toMatchObject({ + schemaVersion: 1, + scenario: { name: 'process-proof', brief: null }, + commandSequence: expect.arrayContaining(['spec.create', 'chat.getPrimary', 'chat.ensureReady']), + environment: { platform: process.platform, arch: process.arch }, + workspace: { cwd: spawnedCwds[0], preservedStatePath: null }, + }); + expect(existsSync(join(outputDir, 'workspace-state'))).toBe(false); + expect(existsSync(result.workspaceCwd ?? '')).toBe(false); + }); + + it('cleans up the temp workspace when process startup fails', async () => { + const outputDir = makeTempDir('brunch-probe-output-'); + let workspaceCwd: string | null = null; + + await expect( + runProcessBackedProbe({ + scenario: { name: 'startup-failure', specName: 'Startup failure' }, + scriptedAnswers: [], + outputDir, + spawnProcess({ cwd }) { + workspaceCwd = cwd; + throw new Error('spawn failed'); + }, + }), + ).rejects.toThrow('spawn failed'); + + expect(workspaceCwd).toContain('brunch-probe-workspace-'); + expect(existsSync(workspaceCwd ?? 
'')).toBe(false); + }); + + it('writes sanitized process-backed failure artifacts when JSONL protocol interaction fails', async () => { + const outputDir = makeTempDir('brunch-probe-output-'); + + const result = await runProcessBackedProbe({ + scenario: { name: 'process-protocol-failure', specName: 'Process protocol failure' }, + scriptedAnswers: [], + outputDir, + spawnProcess() { + let onStdoutData: ((chunk: string) => void) | null = null; + return { + writeStdin() { + onStdoutData?.( + `${JSON.stringify({ + id: null, + ok: false, + error: { code: 'bad_request', message: 'ANTHROPIC_API_KEY=sk-secret bad envelope' }, + })}\n`, + ); + }, + endStdin() {}, + onStdoutData(listener) { + onStdoutData = listener; + }, + }; + }, + }); + + const summary = JSON.parse(readFileSync(join(outputDir, 'summary.json'), 'utf8')) as unknown; + const artifactBundle = readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8'); + const bundle = JSON.parse(artifactBundle) as unknown; + const rawJsonl = readFileSync(join(outputDir, 'raw-jsonl.ndjson'), 'utf8'); + + expect(result.summary.turnsAnswered).toBe(0); + expect(result.responses).toEqual([ + { + id: 'create', + ok: false, + error: { + code: 'protocol_error', + message: 'Unmatched id:null response: ANTHROPIC_API_KEY=[redacted] bad envelope', + }, + }, + ]); + expect(result.errors).toEqual([ + { + requestId: 'create', + capability: 'spec.create', + code: 'protocol_error', + message: 'Unmatched id:null response: ANTHROPIC_API_KEY=[redacted] bad envelope', + }, + ]); + expect(summary).toMatchObject({ + turnsAnswered: 0, + errors: [ + { + requestId: 'create', + capability: 'spec.create', + code: 'protocol_error', + message: 'Unmatched id:null response: ANTHROPIC_API_KEY=[redacted] bad envelope', + }, + ], + }); + expect(bundle).toMatchObject({ + commandSequence: ['spec.create'], + errors: [ + { + requestId: 'create', + capability: 'spec.create', + code: 'protocol_error', + message: 'Unmatched id:null response: 
ANTHROPIC_API_KEY=[redacted] bad envelope', + }, + ], + }); + expect(rawJsonl).toContain('"direction":"request"'); + expect(rawJsonl).toContain('"direction":"response"'); + expect(rawJsonl).not.toContain('sk-secret'); + expect(artifactBundle).not.toContain('sk-secret'); + }); + + it('can preserve the temp workspace .brunch state into the artifact directory', async () => { + const outputDir = makeTempDir('brunch-probe-output-'); + let liveWorkspaceDbPath: string | null = null; + + const result = await runProcessBackedProbe({ + scenario: { name: 'preserve-fixture', specName: 'Preserve fixture proof' }, + scriptedAnswers: ['A fixture candidate'], + outputDir, + preserveWorkspaceState: true, + spawnProcess({ cwd }) { + const brunchDir = join(cwd, '.brunch'); + mkdirSync(brunchDir); + liveWorkspaceDbPath = join(brunchDir, 'brunch.db'); + writeFileSync(liveWorkspaceDbPath, 'sqlite fixture bytes'); + return createFakeAgentProcess(); + }, + }); + + const preservedDbPath = join(outputDir, 'workspace-state', '.brunch', 'brunch.db'); + const bundle = JSON.parse(readFileSync(join(outputDir, 'artifact-bundle.json'), 'utf8')) as unknown; + + expect(result.workspaceCwd).not.toBeNull(); + expect(result.workspaceCwd).not.toContain(outputDir); + expect(result.preservedWorkspaceStatePath).toBe(join(outputDir, 'workspace-state')); + expect(preservedDbPath).not.toBe(liveWorkspaceDbPath); + expect(readFileSync(preservedDbPath, 'utf8')).toBe('sqlite fixture bytes'); + rmSync(result.workspaceCwd ?? 
'', { recursive: true, force: true }); + expect(readFileSync(preservedDbPath, 'utf8')).toBe('sqlite fixture bytes'); + expect(bundle).toMatchObject({ + workspace: { cwd: result.workspaceCwd, preservedStatePath: join(outputDir, 'workspace-state') }, + }); + }); + + it('redacts secret-like failure summaries without provider stack dumps', async () => { + const transport: JsonlTransport = { + async send(request) { + if (request.capability === 'spec.create') { + return { id: request.id, ok: true, output: { specId: 1 } }; + } + return { + id: request.id, + ok: false, + error: { + code: 'handler_failed', + message: + 'Provider failed with ANTHROPIC_API_KEY=sk-ant-secret-value\n at internal/provider.ts:1', + }, + }; + }, + }; + + const result = await runScriptedProbe({ + transport, + scenario: { name: 'redaction', specName: 'Redaction proof', brief: 'check safe artifacts' }, + scriptedAnswers: [], + }); + + expect(result.errors).toEqual([ + { + requestId: 'primary', + capability: 'chat.getPrimary', + code: 'handler_failed', + message: 'Provider failed with ANTHROPIC_API_KEY=[redacted]', + }, + ]); + expect(result.summary).toMatchObject({ + errors: [ + { + requestId: 'primary', + capability: 'chat.getPrimary', + code: 'handler_failed', + message: 'Provider failed with ANTHROPIC_API_KEY=[redacted]', + }, + ], + }); + }); + + it('guards the agent-probes import boundary from server mutation authority modules', () => { + const sources = readdirSync(new URL('.', import.meta.url)) + .filter((fileName) => fileName.endsWith('.ts') && !fileName.endsWith('.test.ts')) + .map((fileName) => readFileSync(new URL(`./${fileName}`, import.meta.url), 'utf8')); + const forbiddenImports = [ + '@/server/db', + '@/server/capabilities', + '@/server/capability-registry', + '@/server/schema', + '@/server/core', + '@/server/chat-route-transition', + '@/server/turn-response-transition', + '../../src/server/db', + '../../src/server/capabilities', + '../../src/server/capability-registry', + 
'../../src/server/schema', + '../../src/server/core', + '../../src/server/chat-route-transition', + '../../src/server/turn-response-transition', + ]; + + for (const source of sources) { + for (const forbiddenImport of forbiddenImports) { + expect(source).not.toContain(`from '${forbiddenImport}`); + expect(source).not.toContain(`from "${forbiddenImport}`); + } + } + }); + + it('returns structured errors from failed JSONL responses', async () => { + const transport: JsonlTransport = { + async send(request) { + if (request.capability === 'spec.create') { + return { id: request.id, ok: true, output: { specId: 1 } }; + } + return { + id: request.id, + ok: false, + error: { code: 'handler_failed', message: 'Chat 10 not found' }, + }; + }, + }; + + const result = await runScriptedProbe({ + transport, + scenario: { name: 'failure', specName: 'Failure proof' }, + scriptedAnswers: [], + }); + + expect(result.summary.turnsAnswered).toBe(0); + expect(result.errors).toEqual([ + { + requestId: 'primary', + capability: 'chat.getPrimary', + code: 'handler_failed', + message: 'Chat 10 not found', + }, + ]); + }); +}); + +async function expectSettledJsonlResponse( + response: Promise<ProbeJsonlResponse>, + timeoutMs = 20, +): Promise<ProbeJsonlResponse> { + const timeout = new Promise<{ timedOut: true }>((resolve) => { + setTimeout(() => resolve({ timedOut: true }), timeoutMs); + }); + const settled = await Promise.race([response, timeout]); + + expect(settled).not.toEqual({ timedOut: true }); + return settled as ProbeJsonlResponse; +} + +function createScriptedSuccessTransport(): JsonlTransport { + return { + async send(request) { + return getFakeAgentResponse(request); + }, + }; +} + +function createFakeAgentProcess(): SpawnedJsonlProcess { + let onStdoutData: ((chunk: string) => void) | null = null; + + return { + writeStdin(line) { + const request = JSON.parse(line) as ProbeJsonlRequest; + const response = getFakeAgentResponse(request); + onStdoutData?.(`${JSON.stringify(response)}\n`); + }, + endStdin() {}, + 
onStdoutData(listener) { + onStdoutData = listener; + }, + }; +} + +function getFakeAgentResponse(request: ProbeJsonlRequest): ProbeJsonlResponse { + if (request.capability === 'spec.create') { + return { id: request.id, ok: true, output: { specId: 1 } }; + } + if (request.capability === 'chat.getPrimary') { + return { + id: request.id, + ok: true, + output: { specId: 1, chatId: 10, kind: 'interview', activeTurnId: null }, + }; + } + if (request.capability === 'chat.ensureReady') { + const turnId = request.id === 'ready-1' ? 100 : 101; + return { + id: request.id, + ok: true, + output: { chatId: 10, specId: 1, state: 'awaiting_response', turnId }, + }; + } + if (request.id === 'read-1') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: 100 }, + turns: [{ id: 100, question: 'What are you building?', answer: null, options: [] }], + nextCommands: [{ capability: 'turn.submitResponse', input: { chatId: 10, turnId: 100 } }], + }, + }; + } + if (request.id === 'read-2') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'answered', phase: 'grounding', turnId: 100 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A temp-workspace probe', options: [] }, + ], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: 10 } }], + }, + }; + } + if (request.id === 'read-3') { + return { + id: request.id, + ok: true, + output: { + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: 101 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A temp-workspace probe', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: null, + options: [{ id: 1, position: 0, content: 'Acceptance criteria' }], + }, + ], + nextCommands: [{ capability: 'turn.submitResponse', input: { chatId: 10, turnId: 101 } }], + }, + }; + } + if (request.id === 'read-4') { + return { + id: request.id, + ok: true, + 
output: { + frontier: { state: 'answered', phase: 'grounding', turnId: 101 }, + turns: [ + { id: 100, question: 'What are you building?', answer: 'A temp-workspace probe', options: [] }, + { + id: 101, + question: 'What should be specified first?', + answer: 'Acceptance criteria', + options: [], + }, + ], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: 10 } }], + }, + }; + } + return { id: request.id, ok: true, output: { response: { ok: true } } }; +} diff --git a/scripts/agent-probes/probe-runner.ts b/scripts/agent-probes/probe-runner.ts new file mode 100644 index 00000000..35f45393 --- /dev/null +++ b/scripts/agent-probes/probe-runner.ts @@ -0,0 +1,601 @@ +import { spawn } from 'node:child_process'; +import { cpSync, existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join, resolve } from 'node:path'; + +export interface ProbeJsonlRequest { + id: string; + capability: string; + input?: unknown; +} + +export type ProbeJsonlResponse = + | { id: string; ok: true; output: unknown } + | { id: string | null; ok: false; error: { code: string; message: string } }; + +export interface JsonlTransport { + send(request: ProbeJsonlRequest): Promise<ProbeJsonlResponse>; +} + +export interface SpawnedJsonlProcess { + writeStdin(line: string): void; + endStdin(): void; + onStdoutData(listener: (chunk: string) => void): void; + onStderrData?(listener: (chunk: string) => void): void; + onExit?(listener: (code: number | null) => void): void; +} + +export interface ProbeProcessSpawnOptions { + cwd: string; + command: string; + args: string[]; + env: NodeJS.ProcessEnv; +} + +export type ProbeProcessSpawner = (options: ProbeProcessSpawnOptions) => SpawnedJsonlProcess; + +export interface ScriptedProbeScenario { + name: string; + specName: string; + brief?: string; +} + +export interface ProbeRunError { + requestId: string; + capability: string; + code: string; + message: string; +} + +export interface 
ProbeQuestionAnswer { + question: string; + answer: string; +} + +export interface ProbeRunSummary { + turnsAnswered: number; + finalFrontierState: string | null; + durationMs: number; + questionAnswers: ProbeQuestionAnswer[]; + errors: ProbeRunError[]; +} + +export interface SimulatedUserEvent { + turnId: number; + prompt: string; + rawModelOutput: string; + parsedResponse: ProbeTurnResponse | null; + status: 'parsed' | 'failed'; + error: string | null; +} + +export interface ProbeArtifactBundle { + schemaVersion: 1; + scenario: { name: string; brief: string | null; specName: string }; + workspace: { cwd: string | null; preservedStatePath: string | null }; + commandSequence: string[]; + rawJsonlTranscript: Array<{ + direction: 'request' | 'response'; + payload: ProbeJsonlRequest | ProbeJsonlResponse | null; + }>; + parsedEvents: Array<{ index: number; request: ProbeJsonlRequest; response: ProbeJsonlResponse | null }>; + finalChat: AgentChatReadProjection | null; + summary: ProbeRunSummary; + errors: ProbeRunError[]; + simulatedUserEvents: SimulatedUserEvent[]; + environment: { nodeVersion: string; platform: NodeJS.Platform; arch: string }; +} + +export interface ProbeRunResult { + scenario: ScriptedProbeScenario; + workspaceCwd: string | null; + preservedWorkspaceStatePath: string | null; + requests: ProbeJsonlRequest[]; + responses: ProbeJsonlResponse[]; + finalChat: AgentChatReadProjection | null; + summary: ProbeRunSummary; + errors: ProbeRunError[]; + simulatedUserEvents: SimulatedUserEvent[]; +} + +interface SpecCreateOutput { + specId: number; +} + +interface ChatGetPrimaryOutput { + chatId: number; +} + +export interface AgentChatReadProjection { + frontier: { state: string; turnId: number | null }; + turns: AgentChatTurn[]; + nextCommands?: AgentNextCommand[]; +} + +export interface AgentChatTurn { + id: number; + question: string; + answer: string | null; + options?: AgentTurnOption[]; +} + +export interface AgentTurnOption { + position: number; + 
content: string; +} + +export interface AgentNextCommand { + capability: string; + input?: unknown; +} + +export type ProbeTurnResponse = + | { kind: 'free-text'; freeText: string } + | { kind: 'select-options'; positions: number[] }; + +export interface ProbeResponsePolicyInput { + scenario: ScriptedProbeScenario; + chat: AgentChatReadProjection; + activeTurn: AgentChatTurn; + priorAnsweredTurns: AgentChatTurn[]; + turnIndex: number; +} + +export type ProbeResponsePolicy = ( + input: ProbeResponsePolicyInput, +) => ProbeTurnResponse | Promise<ProbeTurnResponse>; + +interface RunScriptedProbeOptions { + transport: JsonlTransport; + scenario: ScriptedProbeScenario; + scriptedAnswers: string[]; + responsePolicy?: ProbeResponsePolicy; + simulatedUserEvents?: SimulatedUserEvent[]; + turnBudget?: number; +} + +export interface ProcessBackedProbeOptions { + scenario: ScriptedProbeScenario; + scriptedAnswers: string[]; + outputDir: string; + spawnProcess?: ProbeProcessSpawner; + command?: string; + args?: string[]; + env?: NodeJS.ProcessEnv; + preserveWorkspaceState?: boolean; + responsePolicy?: ProbeResponsePolicy; + simulatedUserEvents?: SimulatedUserEvent[]; + turnBudget?: number; +} + +export async function runProcessBackedProbe({ + scenario, + scriptedAnswers, + outputDir, + spawnProcess = spawnBrunchAgentProcess, + command = process.execPath, + args = [resolve('bin/brunch.js'), 'agent'], + env = process.env, + preserveWorkspaceState = false, + responsePolicy, + simulatedUserEvents, + turnBudget, +}: ProcessBackedProbeOptions): Promise<ProbeRunResult> { + const workspaceCwd = mkdtempSync(join(tmpdir(), 'brunch-probe-workspace-')); + let spawned: SpawnedJsonlProcess | null = null; + + try { + spawned = spawnProcess({ cwd: workspaceCwd, command, args, env }); + const transport = createProcessJsonlTransport(spawned); + const result = await runScriptedProbe({ + transport, + scenario, + scriptedAnswers, + responsePolicy, + simulatedUserEvents, + turnBudget, + }); + result.workspaceCwd = workspaceCwd; + if
(preserveWorkspaceState) { + result.preservedWorkspaceStatePath = copyWorkspaceState({ workspaceCwd, outputDir }); + } + writeProbeArtifacts(outputDir, result); + return result; + } finally { + spawned?.endStdin(); + rmSync(workspaceCwd, { recursive: true, force: true }); + } +} + +export function createProcessJsonlTransport( + process: SpawnedJsonlProcess, + { requestTimeoutMs = 30_000 }: { requestTimeoutMs?: number } = {}, +): JsonlTransport { + let buffer = ''; + const pending = new Map< + string, + { resolveResponse: (response: ProbeJsonlResponse) => void; timeout: ReturnType<typeof setTimeout> } + >(); + + function settle(requestId: string, response: ProbeJsonlResponse): void { + const pendingRequest = pending.get(requestId); + if (!pendingRequest) { + return; + } + clearTimeout(pendingRequest.timeout); + pending.delete(requestId); + pendingRequest.resolveResponse(response); + } + + function settleAll(error: { code: string; message: string }): void { + for (const requestId of Array.from(pending.keys())) { + settle(requestId, { id: requestId, ok: false, error }); + } + } + + process.onStdoutData((chunk) => { + buffer += chunk; + let newlineIndex = buffer.indexOf('\n'); + while (newlineIndex >= 0) { + const line = buffer.slice(0, newlineIndex).trim(); + buffer = buffer.slice(newlineIndex + 1); + if (line !== '') { + let response: ProbeJsonlResponse; + try { + response = JSON.parse(line) as ProbeJsonlResponse; + } catch { + settleAll({ code: 'malformed_json', message: 'Malformed JSONL response from child process' }); + newlineIndex = buffer.indexOf('\n'); + continue; + } + + if (response.id === null) { + const message = response.ok + ?
'Unmatched id:null response' + : `Unmatched id:null response: ${response.error.message}`; + settleAll({ code: 'protocol_error', message }); + } else { + settle(response.id, response); + } + } + newlineIndex = buffer.indexOf('\n'); + } + }); + + process.onStderrData?.((chunk) => { + const message = chunk.trim().split('\n')[0] || 'JSONL child process wrote to stderr'; + settleAll({ code: 'process_stderr', message }); + }); + + process.onExit?.((code) => { + settleAll({ code: 'process_exit', message: `JSONL child process exited with code ${code ?? 'null'}` }); + }); + + return { + send(request) { + return new Promise<ProbeJsonlResponse>((resolveResponse) => { + const timeout = setTimeout(() => { + settle(request.id, { + id: request.id, + ok: false, + error: { + code: 'request_timeout', + message: `JSONL child process did not respond within ${requestTimeoutMs}ms`, + }, + }); + }, requestTimeoutMs); + pending.set(request.id, { resolveResponse, timeout }); + try { + process.writeStdin(JSON.stringify(request)); + } catch (error) { + settle(request.id, { + id: request.id, + ok: false, + error: { + code: 'stdin_write_failed', + message: error instanceof Error ?
error.message : String(error), + }, + }); + } + }); + }, + }; +} + +export async function runScriptedProbe({ + transport, + scenario, + scriptedAnswers, + responsePolicy = createScriptedResponsePolicy(scriptedAnswers), + simulatedUserEvents = [], + turnBudget = 2, +}: RunScriptedProbeOptions): Promise<ProbeRunResult> { + const startedAt = Date.now(); + const state: ProbeRunResult = { + scenario, + workspaceCwd: null, + preservedWorkspaceStatePath: null, + requests: [], + responses: [], + finalChat: null, + summary: { turnsAnswered: 0, finalFrontierState: null, durationMs: 0, questionAnswers: [], errors: [] }, + errors: [], + simulatedUserEvents, + }; + + const created = await sendExpectingOutput<SpecCreateOutput>(state, transport, { + id: 'create', + capability: 'spec.create', + input: { name: scenario.specName }, + }); + if (!created) { + return finishRun(state, startedAt); + } + + const primary = await sendExpectingOutput<ChatGetPrimaryOutput>(state, transport, { + id: 'primary', + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + if (!primary) { + return finishRun(state, startedAt); + } + + for (let turnIndex = 0; turnIndex < turnBudget; turnIndex += 1) { + const ready = await sendExpectingOutput(state, transport, { + id: `ready-${turnIndex + 1}`, + capability: 'chat.ensureReady', + input: { chatId: primary.chatId }, + }); + if (!ready) { + return finishRun(state, startedAt); + } + + const readyRead = await sendExpectingOutput<AgentChatReadProjection>(state, transport, { + id: `read-${turnIndex * 2 + 1}`, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }); + if (!readyRead) { + return finishRun(state, startedAt); + } + state.finalChat = readyRead; + state.summary.finalFrontierState = readyRead.frontier.state; + + const activeTurn = getActiveTurn(readyRead); + if (!activeTurn) { + state.errors.push({ + requestId: `read-${turnIndex * 2 + 1}`, + capability: 'chat.read', + code: 'no_answerable_turn', + message: 'chat.read did not expose an awaiting-response frontier turn', + }); + return
finishRun(state, startedAt); + } + + const policyResponse = await getPolicyResponse(state, responsePolicy, { + scenario, + chat: readyRead, + activeTurn, + priorAnsweredTurns: readyRead.turns.filter((turn) => turn.answer !== null), + turnIndex, + }); + if (!policyResponse) { + return finishRun(state, startedAt); + } + + const submit = await sendExpectingOutput(state, transport, { + id: `answer-${turnIndex + 1}`, + capability: 'turn.submitResponse', + input: { + chatId: primary.chatId, + turnId: activeTurn.id, + response: policyResponse, + }, + }); + if (!submit) { + return finishRun(state, startedAt); + } + state.summary.turnsAnswered += 1; + + const afterAnswerRead = await sendExpectingOutput<AgentChatReadProjection>(state, transport, { + id: `read-${turnIndex * 2 + 2}`, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }); + if (!afterAnswerRead) { + return finishRun(state, startedAt); + } + state.finalChat = afterAnswerRead; + state.summary.finalFrontierState = afterAnswerRead.frontier.state; + } + + return finishRun(state, startedAt); +} + +async function sendExpectingOutput<T>( + state: ProbeRunResult, + transport: JsonlTransport, + request: ProbeJsonlRequest, +): Promise<T | null> { + state.requests.push(request); + const response = sanitizeProbeJsonlResponse(await transport.send(request)); + state.responses.push(response); + + if (!response.ok) { + state.errors.push({ + requestId: request.id, + capability: request.capability, + code: response.error.code, + message: sanitizeProbeErrorMessage(response.error.message), + }); + return null; + } + + return response.output as T; +} + +function getActiveTurn(read: AgentChatReadProjection): AgentChatTurn | null { + if (read.frontier.state !== 'awaiting_response' || read.frontier.turnId === null) { + return null; + } + return read.turns.find((turn) => turn.id === read.frontier.turnId) ??
null; +} + +function createScriptedResponsePolicy(scriptedAnswers: string[]): ProbeResponsePolicy { + return ({ activeTurn, turnIndex }) => buildScriptedResponse(activeTurn, scriptedAnswers[turnIndex]); +} + +function buildScriptedResponse(turn: AgentChatTurn, scriptedAnswer: string | undefined): ProbeTurnResponse { + const firstOption = turn.options?.[0]; + if (firstOption) { + return { kind: 'select-options', positions: [firstOption.position] }; + } + + return { + kind: 'free-text', + freeText: scriptedAnswer?.trim() || `Scripted response to: ${turn.question}`, + }; +} + +async function getPolicyResponse( + state: ProbeRunResult, + responsePolicy: ProbeResponsePolicy, + input: ProbeResponsePolicyInput, +): Promise<ProbeTurnResponse | null> { + try { + return await responsePolicy(input); + } catch (error) { + state.errors.push({ + requestId: `policy-${input.turnIndex + 1}`, + capability: 'probe.responsePolicy', + code: 'policy_failed', + message: sanitizeProbeErrorMessage(error instanceof Error ? error.message : String(error)), + }); + return null; + } +} + +function finishRun(state: ProbeRunResult, startedAt: number): ProbeRunResult { + state.summary.durationMs = Date.now() - startedAt; + state.summary.errors = state.errors; + state.summary.questionAnswers = extractQuestionAnswers(state.finalChat); + return state; +} + +function extractQuestionAnswers(finalChat: AgentChatReadProjection | null): ProbeQuestionAnswer[] { + return ( + finalChat?.turns + .filter((turn) => turn.answer !== null) + .map((turn) => ({ question: turn.question, answer: turn.answer ?? '' })) ??
[] + ); +} + +function sanitizeProbeErrorMessage(message: string): string { + return message + .split('\n')[0] + .replace(/(ANTHROPIC_API_KEY=)[^\s]+/gi, '$1[redacted]') + .replace(/(OPENAI_API_KEY=)[^\s]+/gi, '$1[redacted]') + .replace(/sk-[a-z0-9_-]+/gi, '[redacted]') + .slice(0, 300); +} + +export function buildProbeArtifactBundle(result: ProbeRunResult): ProbeArtifactBundle { + const rawJsonlTranscript = result.requests.flatMap((request, index) => [ + { direction: 'request' as const, payload: request }, + { direction: 'response' as const, payload: sanitizeJsonlResponse(result.responses[index] ?? null) }, + ]); + + return { + schemaVersion: 1, + scenario: { + name: result.scenario.name, + brief: result.scenario.brief ?? null, + specName: result.scenario.specName, + }, + workspace: { + cwd: result.workspaceCwd, + preservedStatePath: result.preservedWorkspaceStatePath, + }, + commandSequence: result.requests.map((request) => request.capability), + rawJsonlTranscript, + parsedEvents: result.requests.map((request, index) => ({ + index, + request, + response: sanitizeJsonlResponse(result.responses[index] ?? null), + })), + finalChat: result.finalChat, + summary: result.summary, + errors: result.errors, + simulatedUserEvents: result.simulatedUserEvents, + environment: { nodeVersion: process.version, platform: process.platform, arch: process.arch }, + }; +} + +function sanitizeProbeJsonlResponse(response: ProbeJsonlResponse): ProbeJsonlResponse { + if (response.ok) { + return response; + } + + return { + ...response, + error: { + ...response.error, + message: sanitizeProbeErrorMessage(response.error.message), + }, + }; +} + +function sanitizeJsonlResponse(response: ProbeJsonlResponse | null): ProbeJsonlResponse | null { + return response ? 
sanitizeProbeJsonlResponse(response) : null; +} + +function writeProbeArtifacts(outputDir: string, result: ProbeRunResult): void { + mkdirSync(outputDir, { recursive: true }); + const bundle = buildProbeArtifactBundle(result); + const rawJsonl = bundle.rawJsonlTranscript.map((entry) => JSON.stringify(entry)).join('\n'); + + writeFileSync(join(outputDir, 'artifact-bundle.json'), `${JSON.stringify(bundle, null, 2)}\n`); + writeFileSync(join(outputDir, 'raw-jsonl.ndjson'), `${rawJsonl}\n`); + writeFileSync(join(outputDir, 'final-chat.json'), `${JSON.stringify(bundle.finalChat, null, 2)}\n`); + writeFileSync(join(outputDir, 'summary.json'), `${JSON.stringify(bundle.summary, null, 2)}\n`); +} + +function copyWorkspaceState({ + workspaceCwd, + outputDir, +}: { + workspaceCwd: string; + outputDir: string; +}): string { + const source = join(workspaceCwd, '.brunch'); + const destination = join(outputDir, 'workspace-state'); + mkdirSync(destination, { recursive: true }); + + if (existsSync(source)) { + cpSync(source, join(destination, '.brunch'), { recursive: true }); + } + + return destination; +} + +function spawnBrunchAgentProcess({ cwd, command, args, env }: ProbeProcessSpawnOptions): SpawnedJsonlProcess { + const child = spawn(command, args, { cwd, env, stdio: ['pipe', 'pipe', 'pipe'] }); + return { + writeStdin(line) { + child.stdin.write(`${line}\n`); + }, + endStdin() { + child.stdin.end(); + }, + onStdoutData(listener) { + child.stdout.on('data', (chunk) => listener(chunk.toString())); + }, + onStderrData(listener) { + child.stderr.on('data', (chunk) => listener(chunk.toString())); + }, + onExit(listener) { + child.on('exit', listener); + }, + }; +} diff --git a/src/client/components/pending-review-section.tsx b/src/client/components/pending-review-section.tsx index 25f3b4d8..7bb3627a 100644 --- a/src/client/components/pending-review-section.tsx +++ b/src/client/components/pending-review-section.tsx @@ -4,17 +4,16 @@ // a per-row Resolve button. 
Driven by useSpecificationOpenReconciliationNeeds; // returns null when the queue is empty so the parent overlay can skip rendering. // -// V3.1 will add agent grouping (auto-confirm / auto-edit / substantive) and a -// substantive-walk surface; that work expands inside this component without -// affecting the patch-list-overlay's other regions. +// V3.1 adds agent grouping (auto-confirm / auto-edit / substantive), +// per-row agent actions, and bulk resolution while preserving the +// patch-list-overlay's surrounding staged-change regions. // // Card 4 polish: source diff is no longer rendered inline. Each row shows a // "↗ view source diff" chip that opens a . Action buttons shrink // to icon-only ghost (Edit) + small kind-accent solid (Resolve). The inline // edit form reuses the same toolbar contract as ItemEditTextarea (icon-only -// Cancel + small kind-accent Save). Until the listing endpoint is enriched -// with target_item_kind, the row left bar and Resolve fill use a neutral -// amber as a kind-accent fallback (deferred follow-up card). +// Cancel + small kind-accent Save). Rows use target_item_kind when present, +// with neutral amber as the nullable-kind fallback. 
import { Check, diff --git a/src/server/agent-jsonl.test.ts b/src/server/agent-jsonl.test.ts new file mode 100644 index 00000000..4b658575 --- /dev/null +++ b/src/server/agent-jsonl.test.ts @@ -0,0 +1,290 @@ +import { mkdtempSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { PassThrough } from 'node:stream'; + +import { afterEach, describe, expect, it } from 'vitest'; + +import { runAgentJsonlSession } from './agent-jsonl.js'; +import { createDb, type DB } from './db.js'; + +describe('agent JSONL session', () => { + const tempDirs: string[] = []; + let db: DB | null = null; + + afterEach(() => { + db?.$client.close(); + db = null; + for (const dir of tempDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }); + } + }); + + function createTempDb(): DB { + const dir = mkdtempSync(join(tmpdir(), 'brunch-agent-jsonl-')); + tempDirs.push(dir); + db = createDb(join(dir, 'brunch.db')); + return db; + } + + async function runSession( + lines: string[], + options: Partial< + Pick<Parameters<typeof runAgentJsonlSession>[0], 'generateAnswerableFrontier' | 'projectCwd'> + > = {}, + ) { + const input = new PassThrough(); + const output = new PassThrough(); + const chunks: string[] = []; + output.on('data', (chunk) => chunks.push(chunk.toString())); + + const session = runAgentJsonlSession({ + db: createTempDb(), + input, + output, + generateAnswerableFrontier: + options.generateAnswerableFrontier ?? + (async () => ({ + question: 'What are you trying to build?', + assistantParts: [{ type: 'text', text: 'What are you trying to build?'
}], + })), + projectCwd: options.projectCwd, + }); + for (const line of lines) { + input.write(`${line}\n`); + } + input.end(); + await session; + + return chunks + .join('') + .trim() + .split('\n') + .filter(Boolean) + .map((line) => JSON.parse(line) as unknown); + } + + it('creates and reads a real specification over JSONL without ambient selection', async () => { + const responses = await runSession([ + JSON.stringify({ id: 'create-1', capability: 'spec.create', input: { name: 'JSONL spec' } }), + JSON.stringify({ id: 'read-1', capability: 'spec.getStatus', input: { specId: 1 } }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ + id: 'create-1', + ok: true, + output: expect.objectContaining({ specId: 1 }), + }), + expect.objectContaining({ + id: 'read-1', + ok: true, + output: expect.objectContaining({ + specification: expect.objectContaining({ id: 1, name: 'JSONL spec' }), + }), + }), + ]); + }); + + it('creates and reads the primary chat projection over JSONL using explicit ids', async () => { + const responses = await runSession([ + JSON.stringify({ id: 'create-1', capability: 'spec.create', input: { name: 'JSONL chat spec' } }), + JSON.stringify({ id: 'primary-1', capability: 'chat.getPrimary', input: { specId: 1 } }), + JSON.stringify({ id: 'chat-1', capability: 'chat.read', input: { chatId: 1 } }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ id: 'create-1', ok: true, output: expect.objectContaining({ specId: 1 }) }), + expect.objectContaining({ + id: 'primary-1', + ok: true, + output: { specId: 1, chatId: 1, kind: 'interview', activeTurnId: null }, + }), + expect.objectContaining({ + id: 'chat-1', + ok: true, + output: expect.objectContaining({ + specification: { id: 1, name: 'JSONL chat spec', mode: 'greenfield' }, + chat: { id: 1, specificationId: 1, kind: 'interview', activeTurnId: null }, + frontier: { state: 'idle_no_frontier', phase: 'grounding', turnId: null }, + nextCommands: [{ capability: 
'chat.ensureReady', input: { chatId: 1 } }], + }), + }), + ]); + }); + + it('ensures chat readiness and then reads the active frontier over JSONL', async () => { + const responses = await runSession([ + JSON.stringify({ id: 'create-1', capability: 'spec.create', input: { name: 'JSONL ready spec' } }), + JSON.stringify({ id: 'primary-1', capability: 'chat.getPrimary', input: { specId: 1 } }), + JSON.stringify({ id: 'ready-1', capability: 'chat.ensureReady', input: { chatId: 1 } }), + JSON.stringify({ id: 'chat-1', capability: 'chat.read', input: { chatId: 1 } }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ id: 'create-1', ok: true, output: expect.objectContaining({ specId: 1 }) }), + expect.objectContaining({ id: 'primary-1', ok: true, output: expect.objectContaining({ chatId: 1 }) }), + expect.objectContaining({ + id: 'ready-1', + ok: true, + output: expect.objectContaining({ + chatId: 1, + specId: 1, + state: 'awaiting_response', + turnId: 1, + }), + }), + expect.objectContaining({ + id: 'chat-1', + ok: true, + output: expect.objectContaining({ + chat: { id: 1, specificationId: 1, kind: 'interview', activeTurnId: 1 }, + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: 1 }, + turns: [ + expect.objectContaining({ + id: 1, + phase: 'grounding', + question: expect.stringMatching(/What are you trying to build/), + answer: null, + }), + ], + nextCommands: [{ capability: 'turn.submitResponse', input: { chatId: 1, turnId: 1 } }], + }), + }), + ]); + }); + + it('passes project cwd into brownfield chat readiness generation', async () => { + const generationInputs: Array<{ modeOptions?: unknown }> = []; + + await runSession( + [ + JSON.stringify({ + id: 'create-1', + capability: 'spec.create', + input: { name: 'JSONL brownfield spec', mode: 'brownfield' }, + }), + JSON.stringify({ id: 'ready-1', capability: 'chat.ensureReady', input: { chatId: 1 } }), + ], + { + projectCwd: '/workspace/brunch', + generateAnswerableFrontier: async 
(input) => { + generationInputs.push({ modeOptions: input.modeOptions }); + return { + question: 'What are you trying to understand?', + assistantParts: [{ type: 'text', text: 'What are you trying to understand?' }], + }; + }, + }, + ); + + expect(generationInputs).toEqual([{ modeOptions: { mode: 'brownfield', cwd: '/workspace/brunch' } }]); + }); + + it('submits a turn response and reads the answered turn over JSONL', async () => { + const responses = await runSession([ + JSON.stringify({ id: 'create-1', capability: 'spec.create', input: { name: 'JSONL response spec' } }), + JSON.stringify({ id: 'primary-1', capability: 'chat.getPrimary', input: { specId: 1 } }), + JSON.stringify({ id: 'ready-1', capability: 'chat.ensureReady', input: { chatId: 1 } }), + JSON.stringify({ + id: 'response-1', + capability: 'turn.submitResponse', + input: { + chatId: 1, + turnId: 1, + response: { kind: 'free-text', freeText: 'A local spec elicitation tool' }, + }, + }), + JSON.stringify({ id: 'chat-1', capability: 'chat.read', input: { chatId: 1 } }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ id: 'create-1', ok: true, output: expect.objectContaining({ specId: 1 }) }), + expect.objectContaining({ id: 'primary-1', ok: true, output: expect.objectContaining({ chatId: 1 }) }), + expect.objectContaining({ id: 'ready-1', ok: true, output: expect.objectContaining({ turnId: 1 }) }), + expect.objectContaining({ + id: 'response-1', + ok: true, + output: expect.objectContaining({ + chatId: 1, + specId: 1, + turnId: 1, + response: { ok: true }, + }), + }), + expect.objectContaining({ + id: 'chat-1', + ok: true, + output: expect.objectContaining({ + frontier: { state: 'answered', phase: 'grounding', turnId: 1 }, + turns: [expect.objectContaining({ id: 1, answer: 'A local spec elicitation tool' })], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: 1 } }], + }), + }), + ]); + }); + + it('returns typed chat read errors without crashing the session', async () 
=> { + const responses = await runSession([ + JSON.stringify({ id: 'missing-chat', capability: 'chat.read', input: { chatId: 999 } }), + JSON.stringify({ id: 'invalid-chat', capability: 'chat.read', input: { chatId: 0 } }), + JSON.stringify({ id: 'missing-ready', capability: 'chat.ensureReady', input: { chatId: 999 } }), + JSON.stringify({ + id: 'create-after-chat-errors', + capability: 'spec.create', + input: { name: 'Still works' }, + }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ + id: 'missing-chat', + ok: false, + error: expect.objectContaining({ code: 'handler_failed' }), + }), + expect.objectContaining({ + id: 'invalid-chat', + ok: false, + error: expect.objectContaining({ code: 'invalid_input' }), + }), + expect.objectContaining({ + id: 'missing-ready', + ok: false, + error: expect.objectContaining({ code: 'handler_failed' }), + }), + expect.objectContaining({ + id: 'create-after-chat-errors', + ok: true, + output: expect.objectContaining({ specId: 1 }), + }), + ]); + }); + + it('returns typed error envelopes and keeps processing after recoverable errors', async () => { + const responses = await runSession([ + '{not json', + JSON.stringify({ id: 'unknown-1', capability: 'spec.delete', input: {} }), + JSON.stringify({ id: 'invalid-1', capability: 'spec.create', input: { name: '' } }), + JSON.stringify({ id: 'create-2', capability: 'spec.create', input: { name: 'Still works' } }), + ]); + + expect(responses).toEqual([ + expect.objectContaining({ + id: null, + ok: false, + error: expect.objectContaining({ code: 'invalid_json' }), + }), + expect.objectContaining({ + id: 'unknown-1', + ok: false, + error: expect.objectContaining({ code: 'unknown_capability' }), + }), + expect.objectContaining({ + id: 'invalid-1', + ok: false, + error: expect.objectContaining({ code: 'invalid_input' }), + }), + expect.objectContaining({ id: 'create-2', ok: true, output: expect.objectContaining({ specId: 1 }) }), + ]); + }); +}); diff --git 
a/src/server/agent-jsonl.ts b/src/server/agent-jsonl.ts new file mode 100644 index 00000000..5398fa6a --- /dev/null +++ b/src/server/agent-jsonl.ts @@ -0,0 +1,103 @@ +import { createInterface } from 'node:readline/promises'; +import type { Readable, Writable } from 'node:stream'; + +import { z } from 'zod'; + +import { + CapabilityDispatchError, + dispatchCapability, + type GenerateAnswerableFrontier, +} from './capabilities.js'; +import type { DB } from './db.js'; + +const agentJsonlRequestSchema = z.object({ + id: z.string().min(1), + capability: z.string().min(1), + input: z.unknown().optional(), +}); + +export interface AgentJsonlSessionOptions { + db: DB; + input: Readable; + output: Writable; + generateAnswerableFrontier?: GenerateAnswerableFrontier; + projectCwd?: string; +} + +type AgentJsonlResponse = + | { id: string; ok: true; output: unknown } + | { id: string | null; ok: false; error: { code: string; message: string } }; + +function writeResponse(output: Writable, response: AgentJsonlResponse): void { + output.write(`${JSON.stringify(response)}\n`); +} + +function toErrorResponse(id: string | null, code: string, message: string): AgentJsonlResponse { + return { + id, + ok: false, + error: { code, message }, + }; +} + +function getRecoverableErrorCode(error: unknown): string { + if (error instanceof CapabilityDispatchError) { + return error.code; + } + return 'handler_failed'; +} + +export async function runAgentJsonlSession({ + db, + input, + output, + generateAnswerableFrontier, + projectCwd, +}: AgentJsonlSessionOptions): Promise<void> { + const lines = createInterface({ input, crlfDelay: Infinity }); + + for await (const line of lines) { + if (line.trim() === '') { + continue; + } + + let rawRequest: unknown; + try { + rawRequest = JSON.parse(line); + } catch { + writeResponse(output, toErrorResponse(null, 'invalid_json', 'Invalid JSONL request')); + continue; + } + + const parsedRequest = agentJsonlRequestSchema.safeParse(rawRequest); + const requestId =
+ rawRequest && typeof rawRequest === 'object' && 'id' in rawRequest && typeof rawRequest.id === 'string' + ? rawRequest.id + : null; + + if (!parsedRequest.success) { + writeResponse(output, toErrorResponse(requestId, 'invalid_request', 'Invalid JSONL request envelope')); + continue; + } + + try { + const result = await dispatchCapability({ + db, + capability: parsedRequest.data.capability, + input: parsedRequest.data.input, + generateAnswerableFrontier, + projectCwd, + }); + writeResponse(output, { id: parsedRequest.data.id, ok: true, output: result }); + } catch (error) { + writeResponse( + output, + toErrorResponse( + parsedRequest.data.id, + getRecoverableErrorCode(error), + error instanceof Error ? error.message : 'Capability dispatch failed', + ), + ); + } + } +} diff --git a/src/server/capabilities.test.ts b/src/server/capabilities.test.ts new file mode 100644 index 00000000..b0e84ec3 --- /dev/null +++ b/src/server/capabilities.test.ts @@ -0,0 +1,489 @@ +import { mkdtempSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { eq } from 'drizzle-orm'; +import { afterEach, describe, expect, it, vi } from 'vitest'; + +import { dispatchCapability } from './capabilities.js'; +import { + advanceHead, + createConfirmedPhaseOutcome, + createDb, + createTurn, + getActivePath, + getSpecification, + getTurn, + listSpecifications, + type DB, +} from './db.js'; +import * as schema from './schema.js'; + +describe('agent capabilities', () => { + const tempDirs: string[] = []; + let db: DB | null = null; + + afterEach(() => { + db?.$client.close(); + db = null; + for (const dir of tempDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }); + } + }); + + function createTempDb(): DB { + const dir = mkdtempSync(join(tmpdir(), 'brunch-capabilities-')); + tempDirs.push(dir); + db = createDb(join(dir, 'brunch.db')); + return db; + } + + it('dispatches spec.create through a Brunch-owned handler', async () => { + 
const result = await dispatchCapability({ + db: createTempDb(), + capability: 'spec.create', + input: { name: 'Agent-made spec' }, + }); + + expect(result).toMatchObject({ + specId: expect.any(Number), + specification: expect.objectContaining({ name: 'Agent-made spec' }), + }); + expect(listSpecifications(db!)).toHaveLength(1); + }); + + it('dispatches spec.getStatus using an explicit spec id', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Readable spec' }, + }); + + const result = await dispatchCapability({ + db: activeDb, + capability: 'spec.getStatus', + input: { specId: created.specId }, + }); + + expect(result).toMatchObject({ + specification: expect.objectContaining({ id: created.specId, name: 'Readable spec' }), + workflow: expect.objectContaining({ + phases: expect.objectContaining({ grounding: expect.any(Object) }), + }), + }); + }); + + it('dispatches chat.getPrimary for an explicit spec id', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Chat owner' }, + }); + + const result = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + + expect(result).toEqual({ + specId: created.specId, + chatId: expect.any(Number), + kind: 'interview', + activeTurnId: null, + }); + }); + + it('dispatches chat.read as a compact agent-facing projection with next-command hints', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Chat readable' }, + }); + const turn = createTurn(activeDb, created.specId, { + parent_turn_id: null, + phase: 'grounding', + question: 'What are you trying to build?', + answer: null, + assistant_parts: null, + user_parts: null, + }); + advanceHead(activeDb, 
created.specId, turn.id); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + + const result = await dispatchCapability({ + db: activeDb, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }); + + expect(result).toEqual({ + specification: { id: created.specId, name: 'Chat readable', mode: 'greenfield' }, + chat: { + id: primary.chatId, + specificationId: created.specId, + kind: 'interview', + activeTurnId: turn.id, + }, + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: turn.id }, + turns: [ + { + id: turn.id, + phase: 'grounding', + kind: 'question', + question: 'What are you trying to build?', + answer: null, + isResolution: false, + options: [], + capturedItems: [], + }, + ], + nextCommands: [ + { capability: 'turn.submitResponse', input: { chatId: primary.chatId, turnId: turn.id } }, + ], + }); + }); + + it('reports the first open workflow phase when a chat has no active frontier', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Idle requirements spec' }, + }); + const groundingTurn = createTurn(activeDb, created.specId, { + parent_turn_id: null, + phase: 'grounding', + question: 'What are you trying to build?', + answer: 'A product planning tool', + }); + advanceHead(activeDb, created.specId, groundingTurn.id); + createConfirmedPhaseOutcome(activeDb, { + specificationId: created.specId, + phase: 'grounding', + proposal_turn_id: groundingTurn.id, + confirmation_turn_id: groundingTurn.id, + summary: 'Grounding closed.', + }); + const designTurn = createTurn(activeDb, created.specId, { + parent_turn_id: groundingTurn.id, + phase: 'design', + question: 'What should the design emphasize?', + answer: 'Agent-facing workflow affordances', + }); + advanceHead(activeDb, created.specId, designTurn.id); + createConfirmedPhaseOutcome(activeDb, { + 
specificationId: created.specId, + phase: 'design', + proposal_turn_id: designTurn.id, + confirmation_turn_id: designTurn.id, + summary: 'Design closed.', + }); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + activeDb + .update(schema.chat) + .set({ active_turn_id: null }) + .where(eq(schema.chat.id, primary.chatId)) + .run(); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }), + ).resolves.toMatchObject({ + frontier: { state: 'idle_no_frontier', phase: 'requirements', turnId: null }, + }); + }); + + it('dispatches chat.ensureReady by generating an answerable frontier', async () => { + const activeDb = createTempDb(); + const generateAnswerableFrontier = vi.fn(async () => ({ + question: '', + assistantParts: [ + { + type: 'tool-ask_question' as const, + toolCallId: 'question-1', + state: 'output-available' as const, + input: { + question: 'What are you trying to build?', + why: 'Grounding starts with the user goal.', + impact: 'high' as const, + options: [], + }, + output: { ok: true as const, turnId: 1, optionCount: 0 }, + }, + ], + })); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Ready spec' }, + }); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + + const result = await dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: primary.chatId }, + generateAnswerableFrontier, + }); + const activePath = getActivePath(activeDb, created.specId); + + expect(result).toEqual({ + chatId: primary.chatId, + specId: created.specId, + state: 'awaiting_response', + turnId: expect.any(Number), + nextCommands: [{ capability: 'chat.read', input: { chatId: primary.chatId } }], + }); + 
expect(generateAnswerableFrontier).toHaveBeenCalledOnce(); + expect(generateAnswerableFrontier).toHaveBeenCalledWith( + expect.objectContaining({ userMessage: 'Begin the grounding interview.' }), + ); + expect(activePath).toHaveLength(1); + expect(activePath[0]).toMatchObject({ + id: result.turnId, + phase: 'grounding', + question: 'What are you trying to build?', + answer: null, + }); + expect(activePath[0]?.assistant_parts).toContain('tool-ask_question'); + expect(getSpecification(activeDb, created.specId)?.active_turn_id).toBe(result.turnId); + expect( + await dispatchCapability({ + db: activeDb, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }), + ).toMatchObject({ + frontier: { state: 'awaiting_response', phase: 'grounding', turnId: result.turnId }, + turns: [expect.objectContaining({ question: 'What are you trying to build?' })], + nextCommands: [ + { capability: 'turn.submitResponse', input: { chatId: primary.chatId, turnId: result.turnId } }, + ], + }); + }); + + it('keeps chat.ensureReady idempotent when an answerable frontier already exists', async () => { + const activeDb = createTempDb(); + const generateAnswerableFrontier = vi.fn(async () => ({ + question: 'What should we clarify first?', + assistantParts: [{ type: 'text' as const, text: 'What should we clarify first?' 
}], + })); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Idempotent readiness' }, + }); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + + const first = await dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: primary.chatId }, + generateAnswerableFrontier, + }); + const second = await dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: primary.chatId }, + generateAnswerableFrontier, + }); + + expect(second).toEqual(first); + expect(second.state).toBe('awaiting_response'); + expect(generateAnswerableFrontier).toHaveBeenCalledOnce(); + expect(getActivePath(activeDb, created.specId)).toHaveLength(1); + }); + + it('dispatches turn.submitResponse through the existing turn-response transition', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Respondable spec' }, + }); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + const ready = await dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: primary.chatId }, + generateAnswerableFrontier: async () => ({ + question: 'What are you trying to build?', + assistantParts: [{ type: 'text' as const, text: 'What are you trying to build?' 
}], + }), + }); + + const result = await dispatchCapability({ + db: activeDb, + capability: 'turn.submitResponse', + input: { + chatId: primary.chatId, + turnId: ready.turnId, + response: { kind: 'free-text', freeText: 'A local spec elicitation tool' }, + }, + }); + + expect(result).toEqual({ + chatId: primary.chatId, + specId: created.specId, + turnId: ready.turnId, + response: { ok: true }, + nextCommands: [{ capability: 'chat.read', input: { chatId: primary.chatId } }], + }); + expect(getTurn(activeDb, ready.turnId)?.answer).toBe('A local spec elicitation tool'); + expect(getTurn(activeDb, ready.turnId)?.user_parts).toContain('data-turn-response'); + await expect( + dispatchCapability({ + db: activeDb, + capability: 'chat.read', + input: { chatId: primary.chatId }, + }), + ).resolves.toMatchObject({ + frontier: { state: 'answered', phase: 'grounding', turnId: ready.turnId }, + turns: [expect.objectContaining({ id: ready.turnId, answer: 'A local spec elicitation tool' })], + nextCommands: [{ capability: 'chat.ensureReady', input: { chatId: primary.chatId } }], + }); + }); + + it('accepts legacy spec-owned turns that predate chat id backfill', async () => { + const activeDb = createTempDb(); + const created = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Legacy turn owner' }, + }); + const primary = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: created.specId }, + }); + const turn = createTurn(activeDb, created.specId, { + parent_turn_id: null, + phase: 'grounding', + question: 'What are you trying to build?', + answer: null, + }); + advanceHead(activeDb, created.specId, turn.id); + activeDb.update(schema.turn).set({ chat_id: null }).where(eq(schema.turn.id, turn.id)).run(); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'turn.submitResponse', + input: { + chatId: primary.chatId, + turnId: turn.id, + response: { kind: 'free-text', freeText: 'A 
migrated pre-chat turn' }, + }, + }), + ).resolves.toMatchObject({ response: { ok: true } }); + expect(getTurn(activeDb, turn.id)?.answer).toBe('A migrated pre-chat turn'); + }); + + it('rejects turn.submitResponse for turns outside the explicit chat', async () => { + const activeDb = createTempDb(); + const first = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'First spec' }, + }); + const second = await dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: 'Second spec' }, + }); + const firstChat = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: first.specId }, + }); + const secondChat = await dispatchCapability({ + db: activeDb, + capability: 'chat.getPrimary', + input: { specId: second.specId }, + }); + const secondReady = await dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: secondChat.chatId }, + generateAnswerableFrontier: async () => ({ + question: 'What are you trying to build?', + assistantParts: [{ type: 'text' as const, text: 'What are you trying to build?' 
}], + }), + }); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'turn.submitResponse', + input: { + chatId: firstChat.chatId, + turnId: secondReady.turnId, + response: { kind: 'free-text', freeText: 'Wrong owner' }, + }, + }), + ).rejects.toThrow(`Turn ${secondReady.turnId} does not belong to chat ${firstChat.chatId}`); + expect(getTurn(activeDb, secondReady.turnId)?.answer).toBeNull(); + }); + + it('rejects unknown chat ids and schema-invalid capability input before calling handlers', async () => { + const activeDb = createTempDb(); + await expect( + dispatchCapability({ + db: activeDb, + capability: 'chat.read', + input: { chatId: 999 }, + }), + ).rejects.toThrow('Chat 999 not found'); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'chat.ensureReady', + input: { chatId: 999 }, + }), + ).rejects.toThrow('Chat 999 not found'); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'turn.submitResponse', + input: { chatId: 1, turnId: 1, response: { kind: 'free-text', freeText: '' } }, + }), + ).rejects.toThrow('Invalid input for capability turn.submitResponse'); + + await expect( + dispatchCapability({ + db: activeDb, + capability: 'spec.create', + input: { name: '' }, + }), + ).rejects.toThrow('Invalid input for capability spec.create'); + }); +}); diff --git a/src/server/capabilities.ts b/src/server/capabilities.ts new file mode 100644 index 00000000..9d674b77 --- /dev/null +++ b/src/server/capabilities.ts @@ -0,0 +1,622 @@ +import { readUIMessageStream } from 'ai'; +import { eq } from 'drizzle-orm'; +import { z } from 'zod'; + +import { submitTurnResponseRequestSchema } from '@/shared/api-types.js'; +import { extractTextFromMessage, structuredQuestionSchema, type BrunchUIMessage } from '@/shared/chat.js'; +import { getCurrentWorkflowPhase } from '@/shared/phase-close.js'; + +import { getCapabilityContract, type CapabilityId } from './capability-registry.js'; +import { applyChatRouteTransition } from 
'./chat-route-transition.js'; +import { createNewSpecification, finalizeTurn, getSpecificationState, type TurnWithOptions } from './core.js'; +import type { DB, Turn } from './db.js'; +import { getTurn, updateTurn } from './db.js'; +import { persistFallbackQuestionText, streamInterviewer, type InterviewerModeOptions } from './interview.js'; +import { serializeParts, type AssistantPart } from './parts.js'; +import * as schema from './schema.js'; +import { materializeTurnArtifacts } from './turn-artifacts.js'; +import { submitTurnResponseTransition } from './turn-response-transition.js'; + +const specCreateInputSchema = z.object({ + name: z.string().trim().min(1), + mode: z.enum(['greenfield', 'brownfield']).optional(), +}); + +const specGetStatusInputSchema = z.object({ + specId: z.number().int().positive(), +}); + +const chatGetPrimaryInputSchema = z.object({ + specId: z.number().int().positive(), +}); + +const chatReadInputSchema = z.object({ + chatId: z.number().int().positive(), +}); + +const chatEnsureReadyInputSchema = z.object({ + chatId: z.number().int().positive(), +}); + +const turnSubmitResponseInputSchema = z.object({ + chatId: z.number().int().positive(), + turnId: z.number().int().positive(), + response: submitTurnResponseRequestSchema, +}); + +const capabilityInputSchemas = { + 'spec.create': specCreateInputSchema, + 'spec.getStatus': specGetStatusInputSchema, + 'chat.getPrimary': chatGetPrimaryInputSchema, + 'chat.read': chatReadInputSchema, + 'chat.ensureReady': chatEnsureReadyInputSchema, + 'turn.submitResponse': turnSubmitResponseInputSchema, +} as const; + +export class CapabilityDispatchError extends Error { + constructor( + message: string, + public readonly code: 'unknown_capability' | 'invalid_input' | 'handler_failed', + ) { + super(message); + this.name = 'CapabilityDispatchError'; + } +} + +export interface GeneratedAnswerableFrontier { + question: string; + assistantParts: AssistantPart[]; +} + +export interface 
GenerateAnswerableFrontierInput {
+  db: DB;
+  turn: Turn;
+  activePath: TurnWithOptions[];
+  userMessage: string;
+  modeOptions?: InterviewerModeOptions;
+}
+
+export type GenerateAnswerableFrontier = (
+  input: GenerateAnswerableFrontierInput,
+) => Promise<GeneratedAnswerableFrontier>;
+
+export interface CapabilityDispatchContext {
+  db: DB;
+  generateAnswerableFrontier?: GenerateAnswerableFrontier;
+  projectCwd?: string;
+}
+
+export interface DispatchCapabilityInput extends CapabilityDispatchContext {
+  capability: string;
+  input: unknown;
+}
+
+type SpecCreateInput = z.infer<typeof specCreateInputSchema>;
+type SpecGetStatusInput = z.infer<typeof specGetStatusInputSchema>;
+type ChatGetPrimaryInput = z.infer<typeof chatGetPrimaryInputSchema>;
+type ChatReadInput = z.infer<typeof chatReadInputSchema>;
+type ChatEnsureReadyInput = z.infer<typeof chatEnsureReadyInputSchema>;
+type TurnSubmitResponseInput = z.infer<typeof turnSubmitResponseInputSchema>;
+type SpecCreateOutput = ReturnType<typeof createSpecificationFromCapability>;
+type SpecGetStatusOutput = ReturnType<typeof getSpecificationStatusFromCapability>;
+type ChatGetPrimaryOutput = ReturnType<typeof getPrimaryChatFromCapability>;
+type ChatReadOutput = ReturnType<typeof readChatFromCapability>;
+type ChatEnsureReadyOutput = Awaited<ReturnType<typeof ensureChatReadyFromCapability>>;
+type TurnSubmitResponseOutput = ReturnType<typeof submitTurnResponseFromCapability>;
+
+function parseSpecCreateInput(input: unknown): SpecCreateInput {
+  const parsed = specCreateInputSchema.safeParse(input);
+  if (!parsed.success) {
+    throw new CapabilityDispatchError('Invalid input for capability spec.create', 'invalid_input');
+  }
+  return parsed.data;
+}
+
+function parseSpecGetStatusInput(input: unknown): SpecGetStatusInput {
+  const parsed = specGetStatusInputSchema.safeParse(input);
+  if (!parsed.success) {
+    throw new CapabilityDispatchError('Invalid input for capability spec.getStatus', 'invalid_input');
+  }
+  return parsed.data;
+}
+
+function parseChatGetPrimaryInput(input: unknown): ChatGetPrimaryInput {
+  const parsed = chatGetPrimaryInputSchema.safeParse(input);
+  if (!parsed.success) {
+    throw new CapabilityDispatchError('Invalid input for capability chat.getPrimary', 'invalid_input');
+  }
+  return parsed.data;
+}
+
+function parseChatReadInput(input: unknown): ChatReadInput {
+  const parsed = chatReadInputSchema.safeParse(input);
+  if (!parsed.success) {
+    throw new
CapabilityDispatchError('Invalid input for capability chat.read', 'invalid_input'); + } + return parsed.data; +} + +function parseChatEnsureReadyInput(input: unknown): ChatEnsureReadyInput { + const parsed = chatEnsureReadyInputSchema.safeParse(input); + if (!parsed.success) { + throw new CapabilityDispatchError('Invalid input for capability chat.ensureReady', 'invalid_input'); + } + return parsed.data; +} + +function parseTurnSubmitResponseInput(input: unknown): TurnSubmitResponseInput { + const parsed = turnSubmitResponseInputSchema.safeParse(input); + if (!parsed.success) { + throw new CapabilityDispatchError('Invalid input for capability turn.submitResponse', 'invalid_input'); + } + return parsed.data; +} + +function assertExecutableCapability( + capability: string, +): asserts capability is keyof typeof capabilityInputSchemas { + try { + getCapabilityContract(capability as CapabilityId); + } catch { + throw new CapabilityDispatchError(`Unknown capability ${capability}`, 'unknown_capability'); + } + + if (!(capability in capabilityInputSchemas)) { + throw new CapabilityDispatchError( + `Capability ${capability} has no executable handler`, + 'unknown_capability', + ); + } +} + +function createSpecificationFromCapability(db: DB, input: SpecCreateInput) { + const specification = createNewSpecification( + db, + input.name, + input.mode === 'brownfield' ? 
{ mode: input.mode } : {}, + ); + return { + specId: specification.id, + specification, + }; +} + +function getSpecificationStatusFromCapability(db: DB, input: SpecGetStatusInput) { + const state = getSpecificationState(db, input.specId); + if (!state) { + throw new CapabilityDispatchError(`Specification ${input.specId} not found`, 'handler_failed'); + } + return state; +} + +function getPrimaryChatFromCapability(db: DB, input: ChatGetPrimaryInput) { + const specification = db + .select({ + id: schema.specification.id, + primary_chat_id: schema.specification.primary_chat_id, + }) + .from(schema.specification) + .where(eq(schema.specification.id, input.specId)) + .get(); + + if (!specification) { + throw new CapabilityDispatchError(`Specification ${input.specId} not found`, 'handler_failed'); + } + if (!specification.primary_chat_id) { + throw new CapabilityDispatchError(`Specification ${input.specId} has no primary chat`, 'handler_failed'); + } + + const chat = db + .select({ + id: schema.chat.id, + specification_id: schema.chat.specification_id, + kind: schema.chat.kind, + active_turn_id: schema.chat.active_turn_id, + }) + .from(schema.chat) + .where(eq(schema.chat.id, specification.primary_chat_id)) + .get(); + + if (!chat || chat.specification_id !== input.specId) { + throw new CapabilityDispatchError( + `Primary chat for specification ${input.specId} not found`, + 'handler_failed', + ); + } + + return { + specId: input.specId, + chatId: chat.id, + kind: chat.kind, + activeTurnId: chat.active_turn_id, + }; +} + +function getChatById(db: DB, chatId: number) { + return db + .select({ + id: schema.chat.id, + specification_id: schema.chat.specification_id, + kind: schema.chat.kind, + active_turn_id: schema.chat.active_turn_id, + }) + .from(schema.chat) + .where(eq(schema.chat.id, chatId)) + .get(); +} + +const INITIAL_INTERVIEWER_PROMPT = 'Begin the grounding interview.'; + +function getReadyStateForTurn(turn: { question: string; answer: string | null }) { + if 
(turn.answer !== null) {
+    return 'answered';
+  }
+  return turn.question.trim() === '' ? 'needs_generation' : 'awaiting_response';
+}
+
+async function generateAnswerableFrontierWithInterviewer({
+  db,
+  turn,
+  activePath,
+  userMessage,
+  modeOptions,
+}: GenerateAnswerableFrontierInput): Promise<GeneratedAnswerableFrontier> {
+  const startedAt = Date.now();
+  const interviewer = await streamInterviewer(db, turn, activePath, userMessage, turn.phase, modeOptions);
+  const stream = interviewer.toUIMessageStream({
+    sendReasoning: true,
+    sendFinish: false,
+  });
+  let responseMessage: BrunchUIMessage | null = null;
+  for await (const message of readUIMessageStream({ stream })) {
+    responseMessage = message;
+  }
+  await interviewer.finishReason;
+
+  if (!responseMessage) {
+    throw new Error(`Interviewer did not generate content for turn ${turn.id}`);
+  }
+
+  const assistantParts = materializeTurnArtifacts({
+    phase: turn.phase,
+    responseMessage,
+    elapsedMs: Date.now() - startedAt,
+  });
+  const question =
+    extractTextFromMessage(responseMessage) || extractQuestionFromAssistantParts(assistantParts);
+
+  return { question, assistantParts };
+}
+
+function extractQuestionFromAssistantParts(parts: AssistantPart[]): string {
+  const askQuestionPart = parts.find(
+    (part): part is Extract<AssistantPart, { type: 'tool-ask_question' }> =>
+      part.type === 'tool-ask_question' && 'input' in part,
+  );
+  if (!askQuestionPart) {
+    return '';
+  }
+
+  const parsedInput = structuredQuestionSchema.safeParse(askQuestionPart.input);
+  return parsedInput.success ? parsedInput.data.question : '';
+}
+
+async function persistGeneratedAnswerableFrontier(
+  db: DB,
+  turn: Turn,
+  generated: GeneratedAnswerableFrontier,
+): Promise<void> {
+  const currentQuestion = getTurn(db, turn.id)?.question ??
''; + const question = + generated.question || extractQuestionFromAssistantParts(generated.assistantParts) || currentQuestion; + if (question.trim() === '') { + throw new Error(`Interviewer generated an empty question for turn ${turn.id}`); + } + + persistFallbackQuestionText(db, turn.id, question); + updateTurn(db, turn.id, { + assistant_parts: serializeParts(generated.assistantParts), + }); +} + +function readChatFromCapability(db: DB, input: ChatReadInput) { + const chat = getChatById(db, input.chatId); + if (!chat) { + throw new CapabilityDispatchError(`Chat ${input.chatId} not found`, 'handler_failed'); + } + + const state = getSpecificationState(db, chat.specification_id); + if (!state) { + throw new CapabilityDispatchError(`Specification ${chat.specification_id} not found`, 'handler_failed'); + } + + const currentPhase = getCurrentWorkflowPhase(state.workflow); + const activeTurn = state.turns.find((turn) => turn.id === chat.active_turn_id) ?? null; + const frontier = activeTurn + ? { state: getReadyStateForTurn(activeTurn), phase: activeTurn.phase, turnId: activeTurn.id } + : { state: 'idle_no_frontier' as const, phase: currentPhase, turnId: null }; + const nextCommands = + activeTurn && frontier.state === 'awaiting_response' + ? [{ capability: 'turn.submitResponse', input: { chatId: chat.id, turnId: activeTurn.id } }] + : [{ capability: 'chat.ensureReady', input: { chatId: chat.id } }]; + + return { + specification: { + id: state.specification.id, + name: state.specification.name, + mode: state.specification.mode, + }, + chat: { + id: chat.id, + specificationId: chat.specification_id, + kind: chat.kind, + activeTurnId: chat.active_turn_id, + }, + frontier, + turns: state.turns.map((turn) => ({ + id: turn.id, + phase: turn.phase, + kind: turn.turn_kind ?? 'question', + question: turn.question, + answer: turn.answer, + isResolution: Boolean(turn.is_resolution), + options: turn.options ?? [], + capturedItems: turn.captured_items ?? 
[],
+    })),
+    nextCommands,
+  };
+}
+
+function submitTurnResponseFromCapability(db: DB, input: TurnSubmitResponseInput) {
+  const chat = getChatById(db, input.chatId);
+  if (!chat) {
+    throw new CapabilityDispatchError(`Chat ${input.chatId} not found`, 'handler_failed');
+  }
+
+  const turn = getTurn(db, input.turnId);
+  if (!turn) {
+    throw new CapabilityDispatchError(`Turn ${input.turnId} not found`, 'handler_failed');
+  }
+  const belongsToChat = turn.chat_id === chat.id;
+  const belongsToLegacySpecChat = turn.chat_id === null && turn.specification_id === chat.specification_id;
+  if ((!belongsToChat && !belongsToLegacySpecChat) || turn.specification_id !== chat.specification_id) {
+    throw new CapabilityDispatchError(
+      `Turn ${input.turnId} does not belong to chat ${input.chatId}`,
+      'handler_failed',
+    );
+  }
+
+  const response = submitTurnResponseTransition({
+    db,
+    specificationId: chat.specification_id,
+    turnId: turn.id,
+    request: input.response,
+  });
+
+  if (!response.ok) {
+    throw new CapabilityDispatchError(response.message, 'handler_failed');
+  }
+
+  return {
+    chatId: chat.id,
+    specId: chat.specification_id,
+    turnId: turn.id,
+    response,
+    nextCommands: [{ capability: 'chat.read', input: { chatId: chat.id } }],
+  };
+}
+
+async function ensureChatReadyFromCapability(
+  db: DB,
+  input: ChatEnsureReadyInput,
+  {
+    generateAnswerableFrontier = generateAnswerableFrontierWithInterviewer,
+    projectCwd,
+  }: Pick<CapabilityDispatchContext, 'generateAnswerableFrontier' | 'projectCwd'> = {},
+) {
+  const chat = getChatById(db, input.chatId);
+  if (!chat) {
+    throw new CapabilityDispatchError(`Chat ${input.chatId} not found`, 'handler_failed');
+  }
+
+  const state = getSpecificationState(db, chat.specification_id);
+  if (!state) {
+    throw new CapabilityDispatchError(`Specification ${chat.specification_id} not found`, 'handler_failed');
+  }
+
+  const modeOptions =
+    state.specification.mode === 'brownfield' && projectCwd
+      ?
{ mode: 'brownfield' as const, cwd: projectCwd } + : undefined; + const activeTurn = state.turns.find((turn) => turn.id === chat.active_turn_id) ?? null; + if (activeTurn) { + const activeState = getReadyStateForTurn(activeTurn); + if (activeState === 'awaiting_response') { + return { + chatId: chat.id, + specId: chat.specification_id, + state: 'awaiting_response' as const, + turnId: activeTurn.id, + nextCommands: [{ capability: 'chat.read', input: { chatId: chat.id } }], + }; + } + + if (activeState === 'needs_generation') { + const persistedActiveTurn = getTurn(db, activeTurn.id); + if (!persistedActiveTurn) { + throw new CapabilityDispatchError(`Turn ${activeTurn.id} not found`, 'handler_failed'); + } + const generated = await generateAnswerableFrontier({ + db, + turn: persistedActiveTurn, + activePath: state.turns, + userMessage: INITIAL_INTERVIEWER_PROMPT, + modeOptions, + }); + await persistGeneratedAnswerableFrontier(db, persistedActiveTurn, generated); + + return { + chatId: chat.id, + specId: chat.specification_id, + state: 'awaiting_response' as const, + turnId: activeTurn.id, + nextCommands: [{ capability: 'chat.read', input: { chatId: chat.id } }], + }; + } + + const answeredText = activeTurn.answer ?? 
''; + const transition = applyChatRouteTransition( + { db, specificationId: chat.specification_id }, + { + kind: 'continue', + reply: { text: answeredText, parts: [] }, + }, + ); + if (!transition.ok) { + throw new CapabilityDispatchError(transition.message, 'handler_failed'); + } + if (transition.kind !== 'interviewer-turn') { + throw new CapabilityDispatchError( + `Chat ${chat.id} did not produce an interviewer frontier`, + 'handler_failed', + ); + } + finalizeTurn(db, chat.specification_id, transition.prepared.turn.id); + const generated = await generateAnswerableFrontier({ + db, + turn: transition.prepared.turn, + activePath: transition.prepared.activePath, + userMessage: answeredText, + modeOptions, + }); + await persistGeneratedAnswerableFrontier(db, transition.prepared.turn, generated); + + return { + chatId: chat.id, + specId: chat.specification_id, + state: 'awaiting_response' as const, + turnId: transition.prepared.turn.id, + nextCommands: [{ capability: 'chat.read', input: { chatId: chat.id } }], + }; + } + + const landing = state.landing; + if (!landing || landing.kind === 'frontier-turn') { + throw new CapabilityDispatchError( + `Chat ${chat.id} is not ready for deterministic entry`, + 'handler_failed', + ); + } + + const request = + landing.kind === 'kickoff' + ? 
{ kind: 'phase-entry' as const, phase: landing.phase }
+      : { kind: 'phase-continue' as const, phase: landing.phase };
+  const transition = applyChatRouteTransition(
+    { db, specificationId: chat.specification_id },
+    {
+      kind: 'phase-entry',
+      request,
+    },
+  );
+
+  if (!transition.ok) {
+    throw new CapabilityDispatchError(transition.message, 'handler_failed');
+  }
+  if (transition.kind !== 'interviewer-turn') {
+    throw new CapabilityDispatchError(
+      `Chat ${chat.id} did not produce an interviewer frontier`,
+      'handler_failed',
+    );
+  }
+
+  finalizeTurn(db, chat.specification_id, transition.prepared.turn.id);
+  const generated = await generateAnswerableFrontier({
+    db,
+    turn: transition.prepared.turn,
+    activePath: transition.prepared.activePath,
+    userMessage: INITIAL_INTERVIEWER_PROMPT,
+    modeOptions,
+  });
+  await persistGeneratedAnswerableFrontier(db, transition.prepared.turn, generated);
+
+  return {
+    chatId: chat.id,
+    specId: chat.specification_id,
+    state: 'awaiting_response' as const,
+    turnId: transition.prepared.turn.id,
+    nextCommands: [{ capability: 'chat.read', input: { chatId: chat.id } }],
+  };
+}
+
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'spec.create';
+  input: unknown;
+}): Promise<SpecCreateOutput>;
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'spec.getStatus';
+  input: unknown;
+}): Promise<SpecGetStatusOutput>;
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'chat.getPrimary';
+  input: unknown;
+}): Promise<ChatGetPrimaryOutput>;
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'chat.read';
+  input: unknown;
+}): Promise<ChatReadOutput>;
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'chat.ensureReady';
+  input: unknown;
+  generateAnswerableFrontier?: GenerateAnswerableFrontier;
+  projectCwd?: string;
+}): Promise<ChatEnsureReadyOutput>;
+export function dispatchCapability(input: {
+  db: DB;
+  capability: 'turn.submitResponse';
+  input: unknown;
+}): Promise<TurnSubmitResponseOutput>;
+export function dispatchCapability(input: DispatchCapabilityInput):
Promise<unknown>;
+export async function dispatchCapability({
+  db,
+  capability,
+  input,
+  generateAnswerableFrontier,
+  projectCwd,
+}: DispatchCapabilityInput): Promise<unknown> {
+  assertExecutableCapability(capability);
+
+  if (capability === 'spec.create') {
+    return createSpecificationFromCapability(db, parseSpecCreateInput(input));
+  }
+
+  if (capability === 'spec.getStatus') {
+    return getSpecificationStatusFromCapability(db, parseSpecGetStatusInput(input));
+  }
+
+  if (capability === 'chat.getPrimary') {
+    return getPrimaryChatFromCapability(db, parseChatGetPrimaryInput(input));
+  }
+
+  if (capability === 'chat.read') {
+    return readChatFromCapability(db, parseChatReadInput(input));
+  }
+
+  if (capability === 'chat.ensureReady') {
+    return ensureChatReadyFromCapability(db, parseChatEnsureReadyInput(input), {
+      generateAnswerableFrontier,
+      projectCwd,
+    });
+  }
+
+  if (capability === 'turn.submitResponse') {
+    return submitTurnResponseFromCapability(db, parseTurnSubmitResponseInput(input));
+  }
+
+  throw new CapabilityDispatchError('Capability has no executable handler', 'unknown_capability');
+}
diff --git a/src/server/capability-registry.test.ts b/src/server/capability-registry.test.ts
index 60bb1bcb..21629a7e 100644
--- a/src/server/capability-registry.test.ts
+++ b/src/server/capability-registry.test.ts
@@ -41,6 +41,42 @@ describe('capability registry', () => {
         id: 'changeset.submit',
         authority: 'proposal_only',
       }),
+      expect.objectContaining({
+        id: 'spec.create',
+        authority: 'commit_truth',
+        inputSchema: 'spec.create.input.v1',
+        outputSchema: 'spec.create.output.v1',
+      }),
+      expect.objectContaining({
+        id: 'spec.getStatus',
+        authority: 'read_only',
+        inputSchema: 'spec.getStatus.input.v1',
+        outputSchema: 'spec.getStatus.output.v1',
+      }),
+      expect.objectContaining({
+        id: 'chat.getPrimary',
+        authority: 'read_only',
+        inputSchema: 'chat.getPrimary.input.v1',
+        outputSchema: 'chat.getPrimary.output.v1',
+      }),
+      expect.objectContaining({
+        id: 'chat.read',
authority: 'read_only', + inputSchema: 'chat.read.input.v1', + outputSchema: 'chat.read.output.v1', + }), + expect.objectContaining({ + id: 'chat.ensureReady', + authority: 'commit_truth', + inputSchema: 'chat.ensureReady.input.v1', + outputSchema: 'chat.ensureReady.output.v1', + }), + expect.objectContaining({ + id: 'turn.submitResponse', + authority: 'commit_truth', + inputSchema: 'turn.submitResponse.input.v1', + outputSchema: 'turn.submitResponse.output.v1', + }), ]); }); @@ -49,6 +85,8 @@ describe('capability registry', () => { id: 'workspace.readFile', authority: 'read_only', summary: 'Read a file from the workspace context.', + inputSchema: 'workspace.readFile.input.v1', + outputSchema: 'workspace.readFile.output.v1', handler: null, }); }); diff --git a/src/server/capability-registry.ts b/src/server/capability-registry.ts index 9000cac2..eff042bc 100644 --- a/src/server/capability-registry.ts +++ b/src/server/capability-registry.ts @@ -14,15 +14,23 @@ export type CapabilityId = | 'intentGraph.validateEdge' | 'scenario.render' | 'observer.captureTurnIntent' - | 'changeset.submit'; + | 'changeset.submit' + | 'spec.create' + | 'spec.getStatus' + | 'chat.getPrimary' + | 'chat.read' + | 'chat.ensureReady' + | 'turn.submitResponse'; export interface CapabilityContract { id: CapabilityId; authority: CapabilityAuthority; summary: string; + inputSchema: string; + outputSchema: string; /** - * Capability contracts are metadata only for now. Runtime handlers and adapter - * tool projections must be introduced explicitly in later slices. + * Capability contracts carry transport-safe metadata here. Executable handlers + * live behind the capability dispatcher so adapters do not own product semantics. 
*/ handler: null; } @@ -32,48 +40,112 @@ const capabilityContracts = [ id: 'workspace.readFile', authority: 'read_only', summary: 'Read a file from the workspace context.', + inputSchema: 'workspace.readFile.input.v1', + outputSchema: 'workspace.readFile.output.v1', handler: null, }, { id: 'workspace.search', authority: 'read_only', summary: 'Search workspace files without mutating project or Brunch state.', + inputSchema: 'workspace.search.input.v1', + outputSchema: 'workspace.search.output.v1', handler: null, }, { id: 'web.search', authority: 'read_only', summary: 'Search the web for current external context without mutating Brunch state.', + inputSchema: 'web.search.input.v1', + outputSchema: 'web.search.output.v1', handler: null, }, { id: 'web.fetchPage', authority: 'read_only', summary: 'Fetch a web page for research context without mutating Brunch state.', + inputSchema: 'web.fetchPage.input.v1', + outputSchema: 'web.fetchPage.output.v1', handler: null, }, { id: 'intentGraph.validateEdge', authority: 'read_only', summary: 'Validate an intent graph edge against relation policy without mutating graph truth.', + inputSchema: 'intentGraph.validateEdge.input.v1', + outputSchema: 'intentGraph.validateEdge.output.v1', handler: null, }, { id: 'scenario.render', authority: 'read_only', summary: 'Render prompt scenario inputs into a reviewable probe artifact.', + inputSchema: 'scenario.render.input.v1', + outputSchema: 'scenario.render.output.v1', handler: null, }, { id: 'observer.captureTurnIntent', authority: 'commit_truth', summary: 'Capture supported intent items and edges from a validated turn.', + inputSchema: 'observer.captureTurnIntent.input.v1', + outputSchema: 'observer.captureTurnIntent.output.v1', handler: null, }, { id: 'changeset.submit', authority: 'proposal_only', summary: 'Submit proposed semantic graph changes for later validation and application.', + inputSchema: 'changeset.submit.input.v1', + outputSchema: 'changeset.submit.output.v1', + handler: 
null, + }, + { + id: 'spec.create', + authority: 'commit_truth', + summary: 'Create a new Brunch specification in the local project store.', + inputSchema: 'spec.create.input.v1', + outputSchema: 'spec.create.output.v1', + handler: null, + }, + { + id: 'spec.getStatus', + authority: 'read_only', + summary: 'Read the current workflow and active-path projection for an explicit specification id.', + inputSchema: 'spec.getStatus.input.v1', + outputSchema: 'spec.getStatus.output.v1', + handler: null, + }, + { + id: 'chat.getPrimary', + authority: 'read_only', + summary: 'Read the primary interview chat identity for an explicit specification id.', + inputSchema: 'chat.getPrimary.input.v1', + outputSchema: 'chat.getPrimary.output.v1', + handler: null, + }, + { + id: 'chat.read', + authority: 'read_only', + summary: 'Read a compact agent-facing projection for an explicit chat id.', + inputSchema: 'chat.read.input.v1', + outputSchema: 'chat.read.output.v1', + handler: null, + }, + { + id: 'chat.ensureReady', + authority: 'commit_truth', + summary: 'Ensure an explicit chat has an answerable generated frontier.', + inputSchema: 'chat.ensureReady.input.v1', + outputSchema: 'chat.ensureReady.output.v1', + handler: null, + }, + { + id: 'turn.submitResponse', + authority: 'commit_truth', + summary: 'Submit a structured response to an explicit chat turn.', + inputSchema: 'turn.submitResponse.input.v1', + outputSchema: 'turn.submitResponse.output.v1', handler: null, }, ] as const satisfies readonly CapabilityContract[]; diff --git a/src/server/cli.test.ts b/src/server/cli.test.ts index bf211a19..35a42dbb 100644 --- a/src/server/cli.test.ts +++ b/src/server/cli.test.ts @@ -62,12 +62,13 @@ function runCommand( args: string[], cwd: string, env: NodeJS.ProcessEnv = process.env, + input?: string, ): Promise { return new Promise((resolve, reject) => { const child = spawn(command, args, { cwd, env, - stdio: ['ignore', 'pipe', 'pipe'], + stdio: [input === undefined ? 
'ignore' : 'pipe', 'pipe', 'pipe'], }); let stdout = ''; @@ -79,6 +80,9 @@ function runCommand( child.stderr?.on('data', (chunk) => { stderr += chunk.toString(); }); + if (input !== undefined) { + child.stdin?.end(input); + } child.once('error', reject); child.once('close', (code) => { resolve({ code, stdout, stderr }); @@ -117,8 +121,13 @@ async function packBuiltPackage(): Promise<{ filePaths: string[]; installedRoot: }; } -function runCli(args: string[], cwd: string, env: NodeJS.ProcessEnv = process.env): Promise { - return runCommand(process.execPath, [getInstalledBinEntrypoint(), ...args], cwd, env); +function runCli( + args: string[], + cwd: string, + env: NodeJS.ProcessEnv = process.env, + input?: string, +): Promise { + return runCommand(process.execPath, [getInstalledBinEntrypoint(), ...args], cwd, env, input); } describe('published CLI entrypoint', () => { @@ -183,6 +192,35 @@ describe('published CLI entrypoint', () => { expect(result.stdout).toContain('Usage: brunch'); }); + it('runs the packaged agent JSONL session without launching the web UI', async () => { + const workspaceCwd = makeTempDir('brunch-agent-workspace-'); + const input = `${JSON.stringify({ + id: 'create-1', + capability: 'spec.create', + input: { name: 'Packaged agent spec' }, + })}\n${JSON.stringify({ id: 'read-1', capability: 'spec.getStatus', input: { specId: 1 } })}\n`; + + const result = await runCli(['agent'], workspaceCwd, process.env, input); + const responses = result.stdout + .trim() + .split('\n') + .filter(Boolean) + .map((line) => JSON.parse(line) as unknown); + + expect(result.code).toBe(0); + expect(result.stderr).toBe(''); + expect(responses).toEqual([ + expect.objectContaining({ id: 'create-1', ok: true, output: expect.objectContaining({ specId: 1 }) }), + expect.objectContaining({ + id: 'read-1', + ok: true, + output: expect.objectContaining({ + specification: expect.objectContaining({ id: 1, name: 'Packaged agent spec' }), + }), + }), + ]); + }); + it('dry-runs the 
release flow against the packaged npm artifact seam', async () => { const result = await runCommand( 'npm', diff --git a/src/server/cli.ts b/src/server/cli.ts index afad1ea2..f879eb00 100644 --- a/src/server/cli.ts +++ b/src/server/cli.ts @@ -1,21 +1,42 @@ #!/usr/bin/env node +import { runAgentJsonlSession } from './agent-jsonl.js'; +import { createDb } from './db.js'; import { launch } from './launcher.js'; +import { resolveBrunchProject } from './project.js'; import { loadLocalEnvFile } from './runtime-config.js'; -const args = new Set(process.argv.slice(2)); +const rawArgs = process.argv.slice(2); +const args = new Set(rawArgs); const launchCwd = process.env.BRUNCH_LAUNCH_CWD || process.cwd(); loadLocalEnvFile(launchCwd); if (args.has('--help') || args.has('-h') || args.has('help')) { - console.log('Usage: brunch'); + console.log('Usage: brunch [agent]'); console.log(''); console.log('Launch the Brunch web UI in the current project directory.'); + console.log(''); + console.log('Commands:'); + console.log(' agent Run a JSONL capability session on stdin/stdout.'); process.exit(0); } -launch(launchCwd).catch((error) => { - console.error('Failed to start brunch:', error); - process.exit(1); -}); +if (rawArgs[0] === 'agent') { + const project = resolveBrunchProject(launchCwd); + const db = createDb(project.dbPath); + runAgentJsonlSession({ db, input: process.stdin, output: process.stdout, projectCwd: project.cwd }) + .then(() => { + db.$client.close(); + }) + .catch((error) => { + db.$client.close(); + console.error('Failed to run brunch agent session:', error); + process.exit(1); + }); +} else { + launch(launchCwd).catch((error) => { + console.error('Failed to start brunch:', error); + process.exit(1); + }); +} diff --git a/tsconfig.json b/tsconfig.json index 24423841..942661c0 100644 --- a/tsconfig.json +++ b/tsconfig.json @@ -11,5 +11,5 @@ "@/*": ["./src/*"] } }, - "include": ["src", ".ladle", "node_modules/vite/client.d.ts"] + "include": ["src", "scripts", 
".ladle", "node_modules/vite/client.d.ts"] } diff --git a/vite.config.ts b/vite.config.ts index 8779c82b..255fb1e8 100644 --- a/vite.config.ts +++ b/vite.config.ts @@ -44,7 +44,7 @@ export default defineConfig(({ command, mode }) => { dedupe: ['react', 'react-dom'], }, test: { - include: ['src/**/*.test.{js,ts,jsx,tsx}'], + include: ['src/**/*.test.{js,ts,jsx,tsx}', 'scripts/**/*.test.{js,ts,jsx,tsx}'], }, };