hashintel · lunelson · May 15, 2026 · May 13, 2026 · May 13, 2026 · May 11, 2026
diff --git a/.agents/skills/d3k/SKILL.md b/.agents/skills/d3k/SKILL.md
@@ -0,0 +1,145 @@
+---
+name: "d3k"
+description: "d3k assistant for debugging web apps"
+---
+
+# d3k Commands
+
+d3k captures browser and server logs in a unified log file. Use these commands:
+
+## Viewing Errors and Logs
+
+```bash
+d3k errors              # Show recent errors (browser + server combined)
+d3k errors --context    # Show errors + user actions that preceded them
+d3k errors -n 20        # Show last 20 errors
+
+d3k logs                # Show recent logs (browser + server combined)
+d3k logs --type browser # Browser logs only
+d3k logs --type server  # Server logs only
+```
+
+## Other Commands
+
+```bash
+d3k fix                 # Deep analysis of application errors
+d3k fix --focus build   # Focus on build errors
+
+d3k crawl               # Discover app URLs
+d3k crawl --depth all   # Exhaustive crawl
+```
+
+## Browser Interaction
+
+`d3k agent-browser` auto-connects to the active session's browser via CDP:
+
+```bash
+d3k agent-browser open http://localhost:3000/page
+d3k agent-browser snapshot -i    # Get element refs (@e1, @e2)
+d3k agent-browser click @e2
+d3k agent-browser fill @e3 "text"
+d3k agent-browser screenshot /tmp/shot.png
+```
+
+To target a different browser, run `d3k agent-browser connect <port>` first.
+
+## Codex Fresh Browser/Profile Startup
+
+Use this workflow when the user asks Codex to start d3k with a fresh browser/profile.
+
+1. Close any stale `agent-browser` daemon before launching with `--profile`. Otherwise `agent-browser` will reuse the existing daemon and print `--profile ignored`.
+   ```bash
+   d3k agent-browser close --all
+   ```
+
+2. Start the app through d3k in `servers-only` mode and keep that command running. In Codex, this is more reliable than asking d3k to launch the browser itself when a fresh profile is required.
+   ```bash
+   d3k --no-agent --no-skills --servers-only --command "npm run dev -- -H 127.0.0.1 -p 3000" --port 3000 --startup-timeout 90 --no-tui
+   ```
+
+   Adjust the package-manager command and port for the project. Prefer `--command` over `--script` when passing framework flags. For npm scripts, put flags after `--`; otherwise tools like Next.js can interpret the port as a project directory.
+
+3. Verify the server before opening more browser windows:
+   ```bash
+   curl -I http://127.0.0.1:3000
+   ```
+
+4. Open the fresh profile as a separate browser step:
+   ```bash
+   d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000
+   ```
+
+5. Sanity-check the opened page:
+   ```bash
+   d3k agent-browser get title
+   d3k agent-browser snapshot -i
+   d3k errors
+   ```
+
+Practical rules:
+
+- Prefer `127.0.0.1` for this workflow. If `localhost` hangs or flips between IPv4/IPv6 behavior, do not keep retrying browser launches.
+- If `curl -I` hangs, the server is wedged even if the port appears occupied; restart the d3k server process before opening a browser.
+- In `servers-only` mode there is no d3k-monitored CDP browser. Use regular `d3k agent-browser` commands, not `d3k cdp-port`.
+- In sandboxed agent environments, rerun local-network checks and `agent-browser` opens outside the sandbox when sandbox networking blocks access to `127.0.0.1`.
+
+## Browser Tool Choice
+
+Use `agent-browser` for browser work.
+
+Practical rule:
+
+- Need to drive the same monitored browser session: use `agent-browser`.
+- Examples:
+
+```bash
+d3k agent-browser snapshot -i
+d3k agent-browser click @e2
+```
+
+To make d3k prefer `agent-browser` locally when it launches helper browser commands, use:
+
+```bash
+d3k --browser-tool agent-browser
+```
+
+## Fix Workflow
+
+1. `d3k errors --context` - See errors and what triggered them
+2. Fix the code
+3. `d3k agent-browser open <url>` then `d3k agent-browser click @e1` to replay
+4. `d3k errors` - Verify fix worked
+
+## Creating PRs with Before/After Screenshots
+
+When creating a PR for visual changes, **always capture before/after screenshots** to show the impact:
+
+1. **Before making changes**, screenshot the production site:
+   ```bash
+   d3k agent-browser open https://production-url.com/affected-page
+   d3k agent-browser screenshot /tmp/before.png
+   ```
+
+2. **After making changes**, screenshot localhost:
+   ```bash
+   d3k agent-browser open http://localhost:3000/affected-page
+   d3k agent-browser screenshot /tmp/after.png
+   ```
+
+3. **Or use the tooling API** to capture multiple routes at once:
+   ```
+   capture_before_after_screenshots(
+     productionUrl: "https://myapp.vercel.app",
+     routes: ["/", "/about", "/contact"]
+   )
+   ```
+
+4. **Include in PR description** using markdown:
+   ```markdown
+   ### Visual Comparison
+   | Route | Before | After |
+   |-------|--------|-------|
+   | `/` | ![Before](before.png) | ![After](after.png) |
+   ```
+
+   Upload screenshots by dragging them into the GitHub PR description.
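Review note on the d3k skill: step 3 of the fresh browser/profile workflow ("verify the server before opening more browser windows") can be made less manual with a small polling helper. The sketch below is only an illustration; `wait_until` is a hypothetical helper name and is not part of d3k, and the retry count and delay are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# wait_until TRIES DELAY CMD...
# Hypothetical helper (not a d3k command): rerun CMD until it succeeds
# or TRIES attempts have been used. Returns 0 on success, 1 on give-up.
wait_until() {
  local tries=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Assumed usage for step 3 (left commented out so the sketch has no
# network side effects). --max-time bounds each attempt, so a wedged
# server cannot hang the check indefinitely:
#
#   wait_until 30 1 curl -sS -I --max-time 2 -o /dev/null http://127.0.0.1:3000 \
#     || echo "server did not answer; restart the d3k server process" >&2
```

Bounding each `curl` attempt matches the skill's practical rule that a hanging `curl -I` means the server is wedged, so the caller should restart it rather than keep launching browsers.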
diff --git a/.agents/skills/ln-build/SKILL.md b/.agents/skills/ln-build/SKILL.md
@@ -14,7 +14,7 @@ A full or light scope card from `ln-scope`, the next ready card in `memory/CARDS
 
 Extract: target behavior / objective, acceptance criteria, and verification approach.
 
-Treat the scope card as the next implementation step inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item; the scope card is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive scope cards may land on the same branch.
+Treat the scope card as the next implementation slice inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item and Linear/branch unit; the scope-card slice is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive slices may land on the same branch.
 
 If `memory/CARDS.md` exists, treat it as a derivative execution queue, not canonical planning state. Start with the next card marked `next` or the first unfinished card in that file. If that card is already satisfied on the current branch, do **not** manufacture a no-op build commit; verify the acceptance criteria, mark the card `done` or `dropped` as appropriate, reconcile the queue, and either continue to the next honest build target or route back to `ln-scope` if no build remains.
 
@@ -35,7 +35,7 @@ Do not invent new planning docs, scratch histories, or alternate memory location
 
 ## Serial execution mode
 
-When several prepared cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
+When several prepared slice cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
 
 Loop shape:
 
@@ -62,18 +62,26 @@ Stop the serial loop immediately when any of these becomes true:
 
 Translate acceptance criteria into failing tests when the change benefits from them. For bugfixes or subtle seam changes, prefer one high-leverage regression test. For trivial maintenance or doc-only work, tests may be unnecessary.
 
+Test behavior through public interfaces, not implementation details. A good test describes what capability exists and would survive internal refactoring. Avoid tests that mock internal collaborators, assert private call order, or inspect storage directly when the public interface can prove the behavior.
+
+Do not horizontal-slice TDD. Never write a batch of imagined tests first and then a batch of implementation. Use tracer bullets: one failing behavioral test → minimum code to pass → next failing behavioral test. Each new test should respond to what the previous cycle taught you.
+
 Run the relevant checks. Confirm failures are meaningful. If the card is already green before any code change, treat that as evidence the queue item is already satisfied or stale — not as permission to create a ceremonial red/green cycle.
 
 ## Green
 
-Write the minimum code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
+Write the minimum coherent code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
 
-No speculative abstractions. Only extract when two concrete cases force it.
+Honor the repo's pre-release posture: if the current schema, fixture shape, dummy data, or terminology is wrong for the model, change it and regenerate dependent artifacts rather than preserving accidental compatibility. Delete obsolete paths in the same slice when they are inside the active seam.
+
+No speculative abstractions. Only extract when two concrete cases force it. Do not anticipate later tests or build shape-only scaffolding; let the current behavioral test pull the interface into existence.
 
 ## Refactor
 
 With tests green, improve names, boundaries, and obvious local structure. Do not widen scope.
 
+Refactor only while green. Keep the tests pinned to the public behavior so they protect the slice while allowing internals to move. If refactoring reveals that the test is coupled to implementation, fix the test seam before trusting it.
+
 ## Verify and commit
 
 Run the project's verification harness. All checks must pass. If the card proved already satisfied and no code or canonical-state change was needed, do not create an empty commit.
@@ -93,10 +101,10 @@ After the build lands and verification passes, ask:
 
 ### If all answers are no
 
-- Mark the work done in `memory/PLAN.md` **if it was tracked there**
+- Mark the containing frontier done in `memory/PLAN.md` **if the build completed the frontier item**, usually by updating `Sequencing` / frontier status rather than moving definition blocks
 - Update `Recently Completed` if the plan uses it
-- Do **not** add new SPEC/PLAN bookkeeping just because work happened
-- If the work was non-trivial, required manual verification, or leaves residual risk, record `Done / Verified / Watch` in `memory/PLAN.md` `Recently Completed` when that watch matters beyond the current session
+- Do **not** add new SPEC/PLAN bookkeeping just because a slice happened
+- If the slice was non-trivial, required manual verification, or leaves residual risk that matters beyond the current session, and that affects frontier-level re-entry, record it in the containing frontier definition or a terse `Recently Completed` entry
 
 ### If any answer is yes
 
@@ -111,8 +119,9 @@ Update only the touched traceability items.
 #### Update rules
 
 1. **PLAN**
-   - Mark the item done if it was tracked
-   - If the change closes or unblocks a frontier item, reflect that in `Active`, `Next`, or `Recently Completed`
+   - Mark the frontier item done if this slice completed it
+   - If the change closes, blocks, or unblocks a frontier item, reflect that in `Sequencing`, the affected `Frontier Definitions` entry, or `Recently Completed`
+   - Do not mirror detailed slice/card history into `memory/PLAN.md`; keep active execution queues in `memory/CARDS.md`
 
 2. **Assumptions**
    - evidence answered it → update to `validated` or `invalidated`

diff --git a/.agents/skills/ln-diagnose/SKILL.md b/.agents/skills/ln-diagnose/SKILL.md
@@ -0,0 +1,137 @@
+---
+name: ln-diagnose
+description: "Scientific debugging for bugs, flakes, failures, and performance regressions. Use when something is broken, throwing, failing, slow, nondeterministic, or when the user says diagnose/debug this. Builds a trusted repro loop, tests falsifiable hypotheses, installs a regression oracle, and routes durable findings back into ln-* planning."
+argument-hint: "[bug report, failing command, error, or regression description]"
+---
+
+# Ln Diagnose
+
+Debug by scientific method: trusted repro loop, falsifiable hypotheses, one-variable probes, regression oracle. Do not fix by inspection unless the cause is already proven.
+
+## Input
+
+Bug, failure, flake, or regression to diagnose: $ARGUMENTS
+
+Orient first:
+
+1. Read `memory/SPEC.md` if present; use its lexicon and live invariants.
+2. Read `memory/PLAN.md` if present; identify the containing frontier item if one exists.
+3. Read `HANDOFF.md` if present.
+4. For runtime/UI failures, read the relevant project praxis doc before inspecting logs or driving browsers.
+
+Write a 2-4 bullet orientation note: symptom, suspected seam, current feedback loop, proof standard.
+
+## 1. Build the repro loop
+
+This is the skill. A fast deterministic pass/fail loop makes the rest mechanical. No loop, no diagnosis.
+
+Try, in rough order:
+
+1. failing unit/integration/e2e test at the seam that reaches the bug
+2. CLI/script with fixture input and asserted output
+3. HTTP/curl script against a running server
+4. browser automation asserting DOM, console, or network
+5. replayed artifact: request, trace, event log, fixture, HAR
+6. throwaway harness around the smallest subsystem that exercises the path
+7. property/fuzz loop for intermittent wrong output
+8. bisection/differential loop across commits, versions, datasets, or configs
+9. structured HITL loop only when a human must observe or click
+
+Improve the loop before moving on: faster, sharper assertion, less flake. Pin time, randomness, network, filesystem, and concurrency. For nondeterministic bugs, raise reproduction rate with repetition/stress until it is debuggable.
+
+If no loop can be built, stop. Report what you tried and ask for access, logs, traces, fixtures, timestamped recordings, or permission for temporary instrumentation.
+
+## 2. Reproduce the user's bug
+
+Run the loop. Confirm it demonstrates the reported bug, not a nearby failure.
+
+Capture:
+
+- command/script/test used
+- exact symptom: error, diff, timing, screenshot, console/network evidence
+- reproduction rate for flakes
+- saved replay artifact, if any
+
+Lack of reproduction is allowed only as an explicit diagnosis result.
+
+## 3. Rank falsifiable hypotheses
+
+Generate 3-5 hypotheses before testing any one of them. Each hypothesis must predict an observation:
+
+```md
+If [cause] is true, then [probe/change] will make [specific observation] happen.
+```
+
+Prefer hypotheses that distinguish seams or invariants from `memory/SPEC.md`. Show the ranking to the user when they are present; proceed if they are AFK.
+
+## 4. Probe one variable at a time
+
+Every probe maps to one prediction. Prefer debugger/REPL inspection, then targeted boundary logs, then temporary assertions/counters.
+
+Tag temporary instrumentation with a unique prefix like `[DEBUG-a4f2]`. Cleanup must be grep-able. Never "log everything and grep".
+
+Performance branch: measure first. Establish a baseline timing/profiler/query-plan signal, then bisect or compare. Do not optimize before the measurement identifies the seam.
+
+## 5. Choose the fix route
+
+Before coding, choose the route:
+
+- **Direct fix / `ln-build`** — cause is proven and the change stays inside a settled seam.
+- **`ln-scope` or `ln-spec`** — the fix changes a seam, invariant, requirement, assumption, or frontier shape.
+- **`ln-spike` or `ln-design`** — diagnosis answered one question but the fix shape remains uncertain.
+- **`ln-review` / `ln-refactor`** — no correct regression seam exists, or architecture contributed to the bug.
+
+Install the regression oracle before the fix when a correct seam exists. A correct seam reproduces the real bug pattern as it occurs at the call site. Shallow tests that cannot fail for the original bug are false confidence.
+
+## 6. Cleanup and postmortem
+
+Before declaring done:
+
+- [ ] original repro loop no longer reproduces the bug, or non-repro is the diagnosis
+- [ ] regression oracle exists and passes, or absence of a correct seam is documented
+- [ ] all `[DEBUG-...]` instrumentation is removed
+- [ ] throwaway harnesses are deleted or visibly temporary
+- [ ] confirmed causal hypothesis is stated in the report / commit message
+
+Ask: what would have prevented this bug? Route missing invariants, unclear seams, weak oracles, and bad module shapes into the appropriate `ln-*` skill.
+
+## Canonical reconciliation
+
+Reconcile only durable truth:
+
+- New/retired assumption → update `memory/SPEC.md` §Assumptions.
+- New seam-level invariant or oracle gap → update `memory/SPEC.md` or route to `ln-oracles`.
+- Frontier status changed → update `memory/PLAN.md`.
+- Local bug with no durable implication → no canonical update beyond tracked PLAN status.
+
+Do not create `CONTEXT.md`, ADRs, or alternate planning docs. Canonical docs are `memory/SPEC.md` and `memory/PLAN.md`.
+
+## Output
+
+```md
+## Diagnosis: [symptom]
+
+**Repro loop:** [command/script/test and reproduction rate]
+**Confirmed cause:** [one sentence]
+**Evidence:** [key observations]
+**Fix route:** [direct fix | ln-build | ln-scope | ln-spec | ln-spike | ln-design | ln-review | ln-refactor]
+**Regression oracle:** [test/harness or why unavailable]
+**Canonical updates:** [none | specific SPEC/PLAN changes needed]
+```
+
+## Routing
+
+After diagnosis, present these options to the user (use `tool-ask-question`):
+
+| #   | Label            | Target       | Why |
+| --- | ---------------- | ------------ | --- |
+| 1   | Scope the fix    | `ln-scope`   | The fix needs a buildable card or durable seam update |
+| 2   | Build the fix    | `ln-build`   | The fix is settled and ready for red-green-refactor |
+| 3   | Spike deeper     | `ln-spike`   | A hard question remains after reproduction |
+| 4   | Review structure | `ln-review`  | No good seam/regression oracle exists or architecture contributed |
+| 5   | Back to triage   | `ln-consult` | Diagnosis changed priority or scope |
+
+Recommended: **2** only when cause and seam are proven; otherwise **1**.
+
+---
+*Adapted from [mattpocock/skills/engineering/diagnose](https://github.com/mattpocock/skills/tree/main/skills/engineering/diagnose).*
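Review note on the ln-diagnose skill: two of its mechanical steps, measuring a flake's reproduction rate and confirming that tagged instrumentation is gone, could be sketched in shell roughly as follows. Both function names (`repro_rate`, `leftover_debug_tags`) are hypothetical and exist only for illustration; the skill itself does not prescribe any script.

```bash
#!/usr/bin/env bash
# repro_rate RUNS CMD...
# Hypothetical helper: run CMD RUNS times, discard its output, and
# report how many runs exited non-zero.
repro_rate() {
  local runs=$1
  shift
  local failures=0 i
  for ((i = 0; i < runs; i++)); do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
  done
  echo "failed ${failures}/${runs} runs"
}

# leftover_debug_tags TAG DIR
# Hypothetical helper: print every line under DIR that still contains
# the literal TAG. Exit 0 when nothing is left, 1 when matches remain,
# 2 when grep itself failed (for example, DIR does not exist).
leftover_debug_tags() {
  local tag=$1 dir=$2
  # -F treats the tag as a fixed string, so "[" and "]" are not a
  # bracket expression; "--" keeps a tag starting with "-" from being
  # parsed as an option.
  grep -rnF -- "$tag" "$dir"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

Assumed usage: `repro_rate 50 npm test` before and after a probe gives a comparable reproduction rate, and `leftover_debug_tags '[DEBUG-a4f2]' src` can back the cleanup checklist item; the `npm test` command and `src` path are placeholders for whatever the project uses.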
diff --git a/.agents/skills/ln-grill/SKILL.md b/.agents/skills/ln-grill/SKILL.md
@@ -26,7 +26,7 @@ When understanding is reached, present these options to the user (use `tool-ask-
 | #   | Label           | Target     | Why                                     |
 | --- | --------------- | ---------- | --------------------------------------- |
 | 1   | Write a spec    | `ln-spec`  | Understanding is sufficient for a spec  |
-| 2   | Plan slices     | `ln-plan`  | Problem is clear, needs slice breakdown |
+| 2   | Plan frontier   | `ln-plan`  | Problem is clear, needs frontier breakdown |
 | 3   | Scope one slice | `ln-scope` | One slice is already obvious            |
 
 Recommended: choose based on how much structure the understanding needs.

diff --git a/.agents/skills/ln-handoff/SKILL.md b/.agents/skills/ln-handoff/SKILL.md
@@ -38,7 +38,7 @@ This is the critical step. Scan the conversation for volatile artifacts — info
 - **Queued scope cards** already persisted in `memory/CARDS.md` — capture only what is still volatile about them: which card is next, whether the queue is still valid, and any card-level corrections that have not been written back yet
 - **Plan drafts** from `ln-plan` — slice lists, ordering decisions, dependency reasoning not yet in `memory/PLAN.md`
 - **Design outputs** from `ln-design` — alternative module shapes considered, the chosen shape, and rejected tradeoffs
-- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-slice verification approaches, acknowledged blind spots, and whether slice verification design is complete / pending / stale relative to the code
+- **Oracle design outputs** from `ln-oracles` — O/R/C assessment, selected oracle families, per-frontier or per-slice verification approaches, acknowledged blind spots, and whether verification design is complete / pending / stale relative to the code
 - **Spike state** from `ln-spike` — the question, what was tried, partial findings, verdict if reached
 - **Review findings** from `ln-review` — **ALL findings, not just the one being acted on.** Review debt is critical context. Name every finding, its status (addressed / in-progress / deferred), and any remaining implications. A fresh thread that only knows about the active finding will lose track of deferred review debt.
 - **Refactor state** from `ln-refactor` — commit sequence, target structure, and any constraints on safe ordering