Merged
42 commits
ef5d2f4
initial sync plus first status-/semantic-reconciliation of PLAN vs PL…
lunelson May 13, 2026
1c551c3
reconcile SPEC, and delete fe-705 reference copies
lunelson May 13, 2026
0deb775
FE-705: Add agent JSONL lifecycle capabilities
lunelson May 11, 2026
27e28e5
FE-705: Add agent chat read capabilities
lunelson May 11, 2026
696c8b6
FE-705: Add deterministic chat readiness
lunelson May 11, 2026
32a65e9
FE-705: Generate agent chat readiness
lunelson May 11, 2026
642d57d
FE-705: Add agent turn response capability
lunelson May 11, 2026
61f2ca2
FE-705: Harden agent readiness smoke
lunelson May 11, 2026
9855375
FE-705: Add scripted probe runner core
lunelson May 12, 2026
0e7242c
FE-705: Add process-backed probe runner
lunelson May 12, 2026
ef3b2ea
FE-705: Harden probe artifacts
lunelson May 12, 2026
58076e1
FE-705: Guard probe runner imports
lunelson May 12, 2026
0688391
add the d3k skill, as potential better solution than agent-tail
lunelson May 12, 2026
5ff5d03
Move probe runner to scripts harness
lunelson May 12, 2026
d8477bf
Preserve probe workspace state
lunelson May 12, 2026
6ddc1b1
Add probe response policy seam
lunelson May 12, 2026
befc34a
Add model-backed probe user policy
lunelson May 12, 2026
6bedcca
Add packaged LLM user smoke helper
lunelson May 12, 2026
2db789d
Add fixture candidate checkpoint
lunelson May 12, 2026
2e086f0
Harden probe JSONL transport failures
lunelson May 12, 2026
c6d4bdf
Capture process probe failure artifacts
lunelson May 12, 2026
be28f27
Add probe runner turn budget
lunelson May 12, 2026
eff1b4b
Validate fixture candidate structure
lunelson May 12, 2026
8309e54
Split fixture readiness reporting
lunelson May 12, 2026
49214bb
first full grill of spec evolution strategies
lunelson May 12, 2026
36e72f8
RFC version of spec evolution, integrated in to spec and plan
lunelson May 12, 2026
125c84f
consolidation pass on design docs
lunelson May 13, 2026
fb68170
Map runtime design doc supersession
lunelson May 13, 2026
9bb18a7
Clarify side-chat shipped and horizon claims
lunelson May 13, 2026
34b23c3
Translate patch ledger doc to changeset vocabulary
lunelson May 13, 2026
8a946ba
Clarify multi-chat substrate authority
lunelson May 13, 2026
146c347
Refresh design doc navigation index
lunelson May 13, 2026
d7d8f97
Retire runtime docs refactor plan
lunelson May 13, 2026
a803322
update deferred reconciliations
lunelson May 13, 2026
2463617
first pass adoption of new pocock-derived skills
lunelson May 13, 2026
521e68d
activation density for new skills
lunelson May 13, 2026
c9a5aa4
refactor of the ln-plan skill and template + all skills that referenc…
lunelson May 13, 2026
ada5a47
migrate PLAN.md to the new structure
lunelson May 13, 2026
d80f37a
separate documentation of ln- skills vs product workflows
lunelson May 13, 2026
de8221e
document and synchronize policy WRT pre-release posture
lunelson May 13, 2026
1bcd5a7
plan a restructuring of SPEC doc and template
lunelson May 13, 2026
b7a6f9e
FE-705: Address agent capability review comments
lunelson May 15, 2026
145 changes: 145 additions & 0 deletions .agents/skills/d3k/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
name: "d3k"
description: "d3k assistant for debugging web apps"
---

# d3k Commands

d3k captures browser and server logs in a unified log file. Use these commands:

## Viewing Errors and Logs

```bash
d3k errors # Show recent errors (browser + server combined)
d3k errors --context # Show errors + user actions that preceded them
d3k errors -n 20 # Show last 20 errors

d3k logs # Show recent logs (browser + server combined)
d3k logs --type browser # Browser logs only
d3k logs --type server # Server logs only
```

## Other Commands

```bash
d3k fix # Deep analysis of application errors
d3k fix --focus build # Focus on build errors

d3k crawl # Discover app URLs
d3k crawl --depth all # Exhaustive crawl
```

## Browser Interaction

`d3k agent-browser` auto-connects to the active session's browser via CDP:

```bash
d3k agent-browser open http://localhost:3000/page
d3k agent-browser snapshot -i # Get element refs (@e1, @e2)
d3k agent-browser click @e2
d3k agent-browser fill @e3 "text"
d3k agent-browser screenshot /tmp/shot.png
```

To target a different browser, run `d3k agent-browser connect <port>` first.

## Codex Fresh Browser/Profile Startup

Use this workflow when the user asks Codex to start d3k with a fresh browser/profile.

1. Close any stale `agent-browser` daemon before launching with `--profile`. Otherwise `agent-browser` will reuse the existing daemon and print `--profile ignored`.
```bash
d3k agent-browser close --all
```

2. Start the app through d3k in `servers-only` mode and keep that command running. In Codex, this is more reliable than asking d3k to launch the browser itself when a fresh profile is required.
```bash
d3k --no-agent --no-skills --servers-only --command "npm run dev -- -H 127.0.0.1 -p 3000" --port 3000 --startup-timeout 90 --no-tui
```

Adjust the package-manager command and port for the project. Prefer `--command` over `--script` when passing framework flags. For npm scripts, put flags after `--`; otherwise tools like Next.js can interpret the port as a project directory.

3. Verify the server before opening more browser windows:
```bash
curl -I http://127.0.0.1:3000
```

4. Open the fresh profile as a separate browser step:
```bash
d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000
```

5. Sanity-check the opened page:
```bash
d3k agent-browser get title
d3k agent-browser snapshot -i
d3k errors
```

Practical rules:

- Prefer `127.0.0.1` for this workflow. If `localhost` hangs or flips between IPv4/IPv6 behavior, do not keep retrying browser launches.
- If `curl -I` hangs, the server is wedged even if the port appears occupied; restart the d3k server process before opening a browser.
- In `servers-only` mode there is no d3k-monitored CDP browser. Use regular `d3k agent-browser` commands, not `d3k cdp-port`.
- In sandboxed agent environments, rerun local-network checks and `agent-browser` opens outside the sandbox when sandbox networking blocks access to `127.0.0.1`.
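
The server check in step 3 and the `curl -I` rule above can be wrapped in a bounded poll. This is a minimal sketch, assuming a POSIX shell and `curl` on the PATH; `wait_for_server` and its argument order are hypothetical and not part of d3k:

```bash
# wait_for_server URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper (not a d3k command): poll URL with HEAD requests
# until one completes or the attempts run out.
wait_for_server() {
  wfs_url="$1"
  wfs_attempts="${2:-10}"
  wfs_delay="${3:-1}"
  wfs_i=1
  while [ "$wfs_i" -le "$wfs_attempts" ]; do
    # --max-time bounds each request, so a wedged server fails fast
    # instead of hanging the shell.
    if curl -s -I --max-time 5 -o /dev/null "$wfs_url"; then
      echo "ready after $wfs_i attempt(s)"
      return 0
    fi
    sleep "$wfs_delay"
    wfs_i=$((wfs_i + 1))
  done
  echo "not ready after $wfs_attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_server http://127.0.0.1:3000 30 1` could run before step 4. Any completed HTTP response counts as ready here, including an error status; a non-zero exit suggests restarting the d3k server process rather than opening a browser.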

## Browser Tool Choice

Use `agent-browser` for browser work.

Practical rule:

- To drive the same monitored browser session, use `agent-browser`, for example:

```bash
d3k agent-browser snapshot -i
d3k agent-browser click @e2
```

To make d3k prefer `agent-browser` locally when it launches helper browser commands, use:

```bash
d3k --browser-tool agent-browser
```

## Fix Workflow

1. `d3k errors --context` - See errors and what triggered them
2. Fix the code
3. `d3k agent-browser open <url>` then `d3k agent-browser click @e1` to replay
4. `d3k errors` - Verify fix worked

## Creating PRs with Before/After Screenshots

When creating a PR for visual changes, **always capture before/after screenshots** to show the impact:

1. **Before making changes**, screenshot the production site:
```bash
d3k agent-browser open https://production-url.com/affected-page
d3k agent-browser screenshot /tmp/before.png
```

2. **After making changes**, screenshot localhost:
```bash
d3k agent-browser open http://localhost:3000/affected-page
d3k agent-browser screenshot /tmp/after.png
```

3. **Or use the tooling API** to capture multiple routes at once:
```
capture_before_after_screenshots(
  productionUrl: "https://myapp.vercel.app",
  routes: ["/", "/about", "/contact"]
)
```

4. **Include in PR description** using markdown:
```markdown
### Visual Comparison
| Route | Before | After |
|-------|--------|-------|
| `/` | ![Before](before.png) | ![After](after.png) |
```

Upload screenshots by dragging them into the GitHub PR description.
27 changes: 18 additions & 9 deletions .agents/skills/ln-build/SKILL.md
@@ -14,7 +14,7 @@ A full or light scope card from `ln-scope`, the next ready card in `memory/CARDS

Extract: target behavior / objective, acceptance criteria, and verification approach.

Treat the scope card as the next implementation step inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item; the scope card is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive scope cards may land on the same branch.
Treat the scope card as the next implementation slice inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item and Linear/branch unit; the scope-card slice is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive slices may land on the same branch.

If `memory/CARDS.md` exists, treat it as a derivative execution queue, not canonical planning state. Start with the next card marked `next` or the first unfinished card in that file. If that card is already satisfied on the current branch, do **not** manufacture a no-op build commit; verify the acceptance criteria, mark the card `done` or `dropped` as appropriate, reconcile the queue, and either continue to the next honest build target or route back to `ln-scope` if no build remains.

@@ -35,7 +35,7 @@ Do not invent new planning docs, scratch histories, or alternate memory location

## Serial execution mode

When several prepared cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
When several prepared slice cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.

Loop shape:

@@ -62,18 +62,26 @@ Stop the serial loop immediately when any of these becomes true:

Translate acceptance criteria into failing tests when the change benefits from them. For bugfixes or subtle seam changes, prefer one high-leverage regression test. For trivial maintenance or doc-only work, tests may be unnecessary.

Test behavior through public interfaces, not implementation details. A good test describes what capability exists and would survive internal refactoring. Avoid tests that mock internal collaborators, assert private call order, or inspect storage directly when the public interface can prove the behavior.

Do not horizontal-slice TDD. Never write a batch of imagined tests first and then a batch of implementation. Use tracer bullets: one failing behavioral test → minimum code to pass → next failing behavioral test. Each new test should respond to what the previous cycle taught you.

Run the relevant checks. Confirm failures are meaningful. If the card is already green before any code change, treat that as evidence the queue item is already satisfied or stale — not as permission to create a ceremonial red/green cycle.

## Green

Write the minimum code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
Write the minimum coherent code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.

No speculative abstractions. Only extract when two concrete cases force it.
Honor the repo's pre-release posture: if the current schema, fixture shape, dummy data, or terminology is wrong for the model, change it and regenerate dependent artifacts rather than preserving accidental compatibility. Delete obsolete paths in the same slice when they are inside the active seam.

No speculative abstractions. Only extract when two concrete cases force it. Do not anticipate later tests or build shape-only scaffolding; let the current behavioral test pull the interface into existence.

## Refactor

With tests green, improve names, boundaries, and obvious local structure. Do not widen scope.

Refactor only while green. Keep the tests pinned to the public behavior so they protect the slice while allowing internals to move. If refactoring reveals that the test is coupled to implementation, fix the test seam before trusting it.

## Verify and commit

Run the project's verification harness. All checks must pass. If the card proved already satisfied and no code or canonical-state change was needed, do not create an empty commit.
@@ -93,10 +101,10 @@ After the build lands and verification passes, ask:

### If all answers are no

- Mark the work done in `memory/PLAN.md` **if it was tracked there**
- Mark the containing frontier done in `memory/PLAN.md` **if the build completed the frontier item**, usually by updating `Sequencing` / frontier status rather than moving definition blocks
- Update `Recently Completed` if the plan uses it
- Do **not** add new SPEC/PLAN bookkeeping just because work happened
- If the work was non-trivial, required manual verification, or leaves residual risk, record `Done / Verified / Watch` in `memory/PLAN.md` `Recently Completed` when that watch matters beyond the current session
- Do **not** add new SPEC/PLAN bookkeeping just because a slice happened
- If the slice was non-trivial, required manual verification, or leaves residual risk that matters beyond the current session, record it in the containing frontier definition or a terse `Recently Completed` entry only when it affects frontier-level re-entry

### If any answer is yes

@@ -111,8 +119,9 @@ Update only the touched traceability items.
#### Update rules

1. **PLAN**
- Mark the item done if it was tracked
- If the change closes or unblocks a frontier item, reflect that in `Active`, `Next`, or `Recently Completed`
- Mark the frontier item done if this slice completed it
- If the change closes, blocks, or unblocks a frontier item, reflect that in `Sequencing`, the affected `Frontier Definitions` entry, or `Recently Completed`
- Do not mirror detailed slice/card history into `memory/PLAN.md`; keep active execution queues in `memory/CARDS.md`

2. **Assumptions**
- evidence answered it → update to `validated` or `invalidated`
137 changes: 137 additions & 0 deletions .agents/skills/ln-diagnose/SKILL.md
@@ -0,0 +1,137 @@
---
name: ln-diagnose
description: "Scientific debugging for bugs, flakes, failures, and performance regressions. Use when something is broken, throwing, failing, slow, nondeterministic, or when the user says diagnose/debug this. Builds a trusted repro loop, tests falsifiable hypotheses, installs a regression oracle, and routes durable findings back into ln-* planning."
argument-hint: "[bug report, failing command, error, or regression description]"
---

# Ln Diagnose

Debug by scientific method: trusted repro loop, falsifiable hypotheses, one-variable probes, regression oracle. Do not fix by inspection unless the cause is already proven.

## Input

Bug, failure, flake, or regression to diagnose: $ARGUMENTS

Orient first:

1. Read `memory/SPEC.md` if present; use its lexicon and live invariants.
2. Read `memory/PLAN.md` if present; identify the containing frontier item if one exists.
3. Read `HANDOFF.md` if present.
4. For runtime/UI failures, read the relevant project praxis doc before inspecting logs or driving browsers.

Write a 2-4 bullet orientation note: symptom, suspected seam, current feedback loop, proof standard.

## 1. Build the repro loop

This is the skill. A fast deterministic pass/fail loop makes the rest mechanical. No loop, no diagnosis.

Try, in rough order:

1. failing unit/integration/e2e test at the seam that reaches the bug
2. CLI/script with fixture input and asserted output
3. HTTP/curl script against a running server
4. browser automation asserting DOM, console, or network
5. replayed artifact: request, trace, event log, fixture, HAR
6. throwaway harness around the smallest subsystem that exercises the path
7. property/fuzz loop for intermittent wrong output
8. bisection/differential loop across commits, versions, datasets, or configs
9. structured HITL loop only when a human must observe or click

Improve the loop before moving on: faster, sharper assertion, less flake. Pin time, randomness, network, filesystem, and concurrency. For nondeterministic bugs, raise reproduction rate with repetition/stress until it is debuggable.
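
For flakes, the reproduction rate can be measured by running the loop command repeatedly and counting failures. This is a minimal sketch, assuming a POSIX shell and that the repro command signals failure through its exit status; `repro_rate` is a hypothetical helper, not part of any `ln-*` skill:

```bash
# repro_rate RUNS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND RUNS times and report how many runs failed.
repro_rate() {
  rr_runs="$1"
  shift
  rr_fail=0
  rr_i=0
  while [ "$rr_i" -lt "$rr_runs" ]; do
    # The command's exit status is the only pass/fail signal used here.
    if ! "$@" >/dev/null 2>&1; then
      rr_fail=$((rr_fail + 1))
    fi
    rr_i=$((rr_i + 1))
  done
  echo "$rr_fail/$rr_runs runs failed"
}
```

A call such as `repro_rate 50 ./repro.sh` (where `./repro.sh` is a placeholder for the project's own repro script) gives a number to record before and after each attempt to sharpen the loop.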

If no loop can be built, stop. Report what you tried and ask for access, logs, traces, fixtures, timestamped recordings, or permission for temporary instrumentation.

## 2. Reproduce the user's bug

Run the loop. Confirm it demonstrates the reported bug, not a nearby failure.

Capture:

- command/script/test used
- exact symptom: error, diff, timing, screenshot, console/network evidence
- reproduction rate for flakes
- saved replay artifact, if any

Lack of reproduction is allowed only as an explicit diagnosis result.

## 3. Rank falsifiable hypotheses

Generate 3-5 hypotheses before testing any one of them. Each hypothesis must predict an observation:

```md
If [cause] is true, then [probe/change] will make [specific observation] happen.
```

Prefer hypotheses that distinguish seams or invariants from `memory/SPEC.md`. Show the ranking to the user when they are present; proceed if they are AFK.

## 4. Probe one variable at a time

Every probe maps to one prediction. Prefer debugger/REPL inspection, then targeted boundary logs, then temporary assertions/counters.

Tag temporary instrumentation with a unique prefix like `[DEBUG-a4f2]`. Cleanup must be grep-able. Never "log everything and grep".
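
A grep-able prefix makes the cleanup check scriptable. This is a minimal sketch, assuming a `grep` that supports `-r` (GNU and BSD grep do) and tags shaped like `[DEBUG-a4f2]`; `check_debug_tags` is a hypothetical helper name:

```bash
# check_debug_tags [DIR]
# Hypothetical helper: list any leftover [DEBUG-xxxx] tags under DIR
# (default: current directory) and fail if one is found.
check_debug_tags() {
  cdt_dir="${1:-.}"
  # -r recurses, -n prints line numbers, -E enables the extended regex.
  if grep -rnE '\[DEBUG-[A-Za-z0-9]+\]' "$cdt_dir"; then
    echo "temporary instrumentation still present" >&2
    return 1
  fi
  echo "no debug tags found"
}
```

Running `check_debug_tags src` before committing prints each remaining tag with its file and line number and exits non-zero if any remain; `src` is a placeholder for the project's source directory.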

Performance branch: measure first. Establish a baseline timing/profiler/query-plan signal, then bisect or compare. Do not optimize before the measurement identifies the seam.

## 5. Choose the fix route

Before coding, choose the route:

- **Direct fix / `ln-build`** — cause is proven and the change stays inside a settled seam.
- **`ln-scope` or `ln-spec`** — the fix changes a seam, invariant, requirement, assumption, or frontier shape.
- **`ln-spike` or `ln-design`** — diagnosis answered one question but the fix shape remains uncertain.
- **`ln-review` / `ln-refactor`** — no correct regression seam exists, or architecture contributed to the bug.

Install the regression oracle before the fix when a correct seam exists. A correct seam reproduces the real bug pattern as it occurs at the call site. Shallow tests that cannot fail for the original bug are false confidence.

## 6. Cleanup and postmortem

Before declaring done:

- [ ] original repro loop no longer reproduces the bug, or non-repro is the diagnosis
- [ ] regression oracle exists and passes, or absence of a correct seam is documented
- [ ] all `[DEBUG-...]` instrumentation is removed
- [ ] throwaway harnesses are deleted or visibly temporary
- [ ] confirmed causal hypothesis is stated in the report / commit message

Ask: what would have prevented this bug? Route missing invariants, unclear seams, weak oracles, and bad module shapes into the appropriate `ln-*` skill.

## Canonical reconciliation

Reconcile only durable truth:

- New/retired assumption → update `memory/SPEC.md` §Assumptions.
- New seam-level invariant or oracle gap → update `memory/SPEC.md` or route to `ln-oracles`.
- Frontier status changed → update `memory/PLAN.md`.
- Local bug with no durable implication → no canonical update beyond tracked PLAN status.

Do not create `CONTEXT.md`, ADRs, or alternate planning docs. Canonical docs are `memory/SPEC.md` and `memory/PLAN.md`.

## Output

```md
## Diagnosis: [symptom]

**Repro loop:** [command/script/test and reproduction rate]
**Confirmed cause:** [one sentence]
**Evidence:** [key observations]
**Fix route:** [direct fix | ln-build | ln-scope | ln-spec | ln-spike | ln-design | ln-review | ln-refactor]
**Regression oracle:** [test/harness or why unavailable]
**Canonical updates:** [none | specific SPEC/PLAN changes needed]
```

## Routing

After diagnosis, present these options to the user (use `tool-ask-question`):

| # | Label | Target | Why |
| --- | ---------------- | ------------ | --- |
| 1 | Scope the fix | `ln-scope` | The fix needs a buildable card or durable seam update |
| 2 | Build the fix | `ln-build` | The fix is settled and ready for red-green-refactor |
| 3 | Spike deeper | `ln-spike` | A hard question remains after reproduction |
| 4 | Review structure | `ln-review` | No good seam/regression oracle exists or architecture contributed |
| 5 | Back to triage | `ln-consult` | Diagnosis changed priority or scope |

Recommended: **2** only when cause and seam are proven; otherwise **1**.

---
*Adapted from [mattpocock/skills/engineering/diagnose](https://github.com/mattpocock/skills/tree/main/skills/engineering/diagnose).*
2 changes: 1 addition & 1 deletion .agents/skills/ln-grill/SKILL.md
@@ -26,7 +26,7 @@ When understanding is reached, present these options to the user (use `tool-ask-
| # | Label | Target | Why |
| --- | --------------- | ---------- | --------------------------------------- |
| 1 | Write a spec | `ln-spec` | Understanding is sufficient for a spec |
| 2 | Plan slices | `ln-plan` | Problem is clear, needs slice breakdown |
| 2 | Plan frontier | `ln-plan` | Problem is clear, needs frontier breakdown |
| 3 | Scope one slice | `ln-scope` | One slice is already obvious |

Recommended: choose based on how much structure the understanding needs.
2 changes: 1 addition & 1 deletion .agents/skills/ln-handoff/SKILL.md
@@ -38,7 +38,7 @@ This is the critical step. Scan the conversation for volatile artifacts — info
- **Queued scope cards** already persisted in `memory/CARDS.md` — capture only what is still volatile about them: which card is next, whether the queue is still valid, and any card-level corrections that have not been written back yet
- **Plan drafts** from `ln-plan` — slice lists, ordering decisions, dependency reasoning not yet in `memory/PLAN.md`
- **Design outputs** from `ln-design` — alternative module shapes considered, the chosen shape, and rejected tradeoffs
- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-slice verification approaches, acknowledged blind spots, and whether slice verification design is complete / pending / stale relative to the code
- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-frontier or per-slice verification approaches, acknowledged blind spots, and whether verification design is complete / pending / stale relative to the code
- **Spike state** from `ln-spike` — the question, what was tried, partial findings, verdict if reached
- **Review findings** from `ln-review` — **ALL findings, not just the one being acted on.** Review debt is critical context. Name every finding, its status (addressed / in-progress / deferred), and any remaining implications. A fresh thread that only knows about the active finding will lose track of deferred review debt.
- **Refactor state** from `ln-refactor` — commit sequence, target structure, and any constraints on safe ordering