Revert trust-zones; stay on (verb, directory) ApprovalEntry model by Aaronontheweb · Pull Request #962 · netclaw-dev/netclaw

Aaronontheweb · 2026-05-12T22:25:15Z

Closes #940

Summary

Supersedes docs(openspec): propose approval-policy-v2 #940. Removes the unmerged approval-policy-trust-zones change proposal from openspec/changes/. Live spec at openspec/specs/tool-approval-gates/spec.md (v1.5 ApprovalEntry (verb, directory) model) is untouched and remains canonical.
Cherry-picks the orthogonal fixes that landed on openspec/approval-policy-v2 after the trust-zones rewrite began (serializer binding, JsonElement arg handling, swap-daemon systemd compatibility, Windows-skip on POSIX-only path tests).
Removes the dead trust-zones primitives (GateEvaluator, TrustState, TrustStateComposer, AudienceTrustStore, AudienceTrustState) and their test fixtures + DI registration. ~2,447 lines of unused code deleted.

Why

Trust-zones (a two-gate zone+verb workflow) over-prompted in practice. Sequential prompts on every untrusted path, plus a too-narrow safe-verbs list, meant the agent had to prompt for gh issue list, xargs, and every read-only diagnostic. The verb-in-folder pair model from v1.5 worked better: geography and command shape stay coupled in a single approval per (verb, dir) tuple, matching the user's mental model.

The openspec/approval-policy-v2 branch is preserved unmodified as an archive of the trust-zones experiment.

Test plan

dotnet build + full test suite green (Security 534, Configuration 288, Daemon 504, Actors 1545)
openspec validate tool-approval-gates → valid
Slopwatch clean + headers verified
Dogfood a session: shell calls prompt with the v1.5 (verb, directory) shape, not the zone+verb two-step
Memory distillation persists without protobuf serializer errors

Breaking redesign of the persistent approval store and prompt UX:

- typed (verb, directory) ApprovalEntry replaces the v1 flat string list; v1 file quarantines to .v1.bak on first read (no migration)
- safe-verb ∩ safe-space short-circuit (per-OS verb list, audience-aware roots from ToolAudienceProfileResolver) auto-runs read-only inspection inside session_dir / project_dir
- ShellTool cwd defaults to project_dir → session_dir (today inherits daemon-process cwd)
- ShellTokenizer refuses pattern extraction on bash control-flow / unbalanced input so junk fragments never persist
- 5-button prompt (Once / This chat / Always here / Always anywhere / Deny) with danger styling on the destructive options; one-line resolution message
- netclaw approvals trust-verb <verb> CLI for unattended/scheduled grants
- AGENTS.md + tool description + failure-path hint coordinate to push the agent toward set_working_directory; eval cases (positive/negative/recovery/schedule pre-approval) lock the behavior in

Foundation for the approval-policy-v2 storage refactor. Adds:

- ApprovalEntry record (Verb required, Directory nullable for global wildcard)
- ToolApprovalEntryComparer.Equals(ApprovalEntry, ApprovalEntry) overload that delegates to the existing platform-correct string comparison

No behavior change: ToolApprovalStore still operates on the v1 string-based API and the existing test suite (274 tests) passes unchanged. The actual storage cutover, matcher refactor, and caller updates land in subsequent commits per openspec/changes/approval-policy-v2/tasks.md sections 1-6.

Section 1 of the approval-policy-v2 OpenSpec change. Refactors ToolApprovalStore to a typed (verb, directory) ApprovalEntry model with a versioned on-disk schema, replacing the v1 flat string list. What changed: - ToolApprovalStore now serializes/deserializes ToolApprovalData with "version": 2 and List<ApprovalEntry> per (audience, tool). - Two-step Load(): peek schema version via JsonDocument; quarantine legacy v1 files to tool-approvals.json.v1.bak; quarantine unparseable files to .invalid; in either case, return an empty store. - AddApproval/RemoveApproval/RemoveAllForTool/Snapshot operate on ApprovalEntry. New GetApprovedEntries replaces GetApprovedPatterns. - AddApproval normalizes the directory portion (trims trailing separators while preserving "/" and "C:\") so the on-disk file does not accumulate trailing-slash variants of the same logical entry. - ToolApprovalEntryComparer gains NormalizeDirectory + Normalize(entry) helpers; Equals(ApprovalEntry, ApprovalEntry) normalizes both sides. Caller updates required to compile: - ToolApprovalActor: persistent writes wrap incoming verb strings as ApprovalEntry { Verb=pattern, Directory=null } (interim semantic preserved until section 2 lands the directory-aware matcher). - ApprovalsListView/ApprovalsCommand: list output renders entries as "<verb> in <dir>" or "<verb> anywhere"; --json emits the typed ApprovalEntry shape; --json uses IndentedOmitNull so the CLI shape matches the file shape (nulls omitted). - ApprovalsCommand.WarnIfQuarantined surfaces both .v1.bak and .invalid quarantine paths with distinct remediation guidance. - ApprovalsManagerViewModel/Page: rendering uses entry.DisplayText. - ToolAudienceProfilesDoctorCheck: drops the v1 stale-path-aware pattern detection (irrelevant under v2; v1 contents quarantine on first read). 
Tests: - ToolApprovalStoreTests rewritten for the v2 API and gain coverage for v1 quarantine, malformed quarantine, fresh-write-after-quarantine, trailing-slash normalization, and idempotent add. - ApprovalsCommand/ApprovalsManagerPage tests rewritten to use ApprovalEntry and the new "<verb> in <dir>" / "<verb> anywhere" rendering. - Stale-pattern doctor test removed. All 3348 tests pass; dotnet slopwatch analyze reports no new violations; file-header verification passes.
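The two-step load and the directory normalization described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the C# ToolApprovalStore: the function names, the `audiences` key, and the shape of the empty store are hypothetical; only the `.v1.bak` / `.invalid` suffixes, the `"version": 2` marker, and the "/" and "C:\" preservation rule come from the commit message.

```python
import json
from pathlib import Path
from typing import Optional


def normalize_directory(directory: Optional[str]) -> Optional[str]:
    """Trim trailing separators while preserving bare roots such as "/" and "C:\\"."""
    if directory is None:
        return None  # null directory means global wildcard
    trimmed = directory.rstrip("/\\")
    if trimmed == "":
        return directory[:1]  # "/" (or "///") collapses to the root itself
    if len(trimmed) == 2 and trimmed[1] == ":":
        return trimmed + "\\"  # "C:" must stay "C:\"
    return trimmed


def load_approvals(path: Path) -> dict:
    """Peek at the schema version; quarantine anything that is not a v2 file."""
    empty = {"version": 2, "audiences": {}}  # hypothetical empty-store shape
    if not path.exists():
        return empty
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Unparseable file: move it aside and start fresh.
        path.replace(path.with_name(path.name + ".invalid"))
        return empty
    if not isinstance(data, dict) or data.get("version") != 2:
        # Legacy v1 content (for example a flat string list): no migration.
        path.replace(path.with_name(path.name + ".v1.bak"))
        return empty
    return data
```

In both quarantine cases the caller gets an empty store, so the next write produces a clean v2 file next to the quarantined copy.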

Section 2 of the approval-policy-v2 OpenSpec change. Refactors the approval matcher and gate to consume v2 typed ApprovalEntry records, plumbs the candidate cwd through the execution context, and deletes the v1 string-shape inspection logic. Matcher contract changes: - IToolApprovalMatcher.ExtractDirectoryRoots is removed; the v2 matcher has no concept of "directory roots extracted from arguments." The directory half of every (verb, directory) pair is the candidate's cwd from ToolExecutionContext. - ExtractApprovalEntries renamed to ExtractCandidateVerbs and now returns pure verb chains. The v1 fallback to normalized commands or bare directory roots is gone. - IsApproved signature: now takes (toolName, args, IReadOnlyList<ApprovalEntry>, cwd) and dispatches to ApprovalPatternMatching.MatchesShellApproval which enforces verb equality + (directory null || cwd under directory) + no-symlink-segment. Cwd plumbing: - ToolExecutionContext gains a Cwd property the session pipeline sets from candidate args / WorkingContext.ProjectDirectory / session_dir (sections 4 + 5 cover the resolution side). - IToolApprovalService.GetUnapprovedPatternsAsync and RecordApprovalAsync take a cwd parameter; AkkaToolApprovalService threads it through GetUnapprovedPatterns and RecordToolApproval actor messages. - ToolApprovalContext: ApprovalEntries field renamed to CandidateVerbs; DirectoryRoots stays but is always populated empty by the gate (section 7's prompt redesign removes the field). SessionOutput, SessionOutputDto, ParentSessionApprovalBridge, PendingToolInteraction, and the protocol mapper rename consistently. Shared symlink-segment guard: - PathUtility.ContainsSymlinkSegment hoisted from ScopedFileAccessPolicy so the matcher and the file-access policy share one implementation. Tests: - Configuration.Tests, Cli.Tests, Daemon.Tests, MemoryRetrievalPoC.Tests, Search.Tests, Security.Tests (397 incl new matcher cases), and Actors.Tests (1483) all pass. 
- ShellApprovalMatcherTests rewritten to assert the v2 (verb, cwd, entries) semantics: global-wildcard matches anywhere, folder-scoped matches when cwd under directory, requires concrete cwd, recurses into bash -c. - ToolApprovalGateTests' v1 directory-roots assertions replaced with v2 candidate-verb assertions; DirectoryRoots is asserted empty. - ToolApprovalActor's session HashSet now uses ToolApprovalEntryComparer.Comparer so session approvals follow the same platform-correct case rules as the persistent store. - Test plumbing across the codebase passes cwd: null where the invocation isn't directory-anchored. Slopwatch clean; file headers verified.
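The v2 match rule (verb equality, then either a null directory or a cwd under the entry's directory) can be sketched like this. It is a simplified Python illustration, not ApprovalPatternMatching.MatchesShellApproval: it assumes POSIX-style, case-sensitive paths, omits the symlink-segment guard, and the names `is_under` and `matches_shell_approval` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Sequence


@dataclass(frozen=True)
class ApprovalEntry:
    verb: str
    directory: Optional[str] = None  # None = global wildcard ("anywhere")


def is_under(cwd: str, directory: str) -> bool:
    """True when cwd equals directory or is nested below it (component-wise)."""
    cwd_path, dir_path = PurePosixPath(cwd), PurePosixPath(directory)
    return cwd_path == dir_path or dir_path in cwd_path.parents


def matches_shell_approval(verb: str, cwd: Optional[str],
                           entries: Sequence[ApprovalEntry]) -> bool:
    for entry in entries:
        if entry.verb != verb:
            continue
        if entry.directory is None:
            return True  # global wildcard matches regardless of cwd
        if cwd is not None and is_under(cwd, entry.directory):
            return True  # folder-scoped entries need a concrete cwd
    return False
```

Comparing path components rather than string prefixes keeps `/home/u/repo2` from matching an entry scoped to `/home/u/repo`.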

Section 3 of the approval-policy-v2 OpenSpec change. Adds a cheap structural scan to ShellTokenizer that detects bash control-flow keywords and unbalanced quotes/brackets, refuses verb-chain extraction in those cases, and plumbs an IsMessy flag through the gate and protocol so the section 7 prompt builder can show "complex command" hints and omit persistent-grant buttons.

Detection (ShellTokenizer.IsMessyCompoundCommand):
- Single-pass scan that tracks quote state and (), [], {} balance.
- Flags any unquoted standalone token equal to one of: for, while, do, done, then, fi, case, esac.
- Flags unbalanced quotes (open without close) and unbalanced brackets (close without open OR open without close).
- Cheap structural only: no semantic bash parsing. Heredocs, command substitution, and process substitution are not analyzed beyond bracket balance.

SplitCompoundCommand:
- Returns an empty list when IsMessyCompoundCommand returns true. The matcher's ExtractCandidateVerbs and ExtractPatterns therefore both return empty for messy commands, and ShellApprovalMatcher.IsApproved short-circuits to false (cannot auto-approve what we cannot extract).

Gate / protocol plumbing:
- IToolApprovalMatcher gains IsMessy(toolName, args). Default-false for DefaultApprovalMatcher and FilePathApprovalMatcher; ShellApprovalMatcher delegates to ShellTokenizer.IsMessyCompoundCommand.
- ToolApprovalContext gains an IsMessy bool field.
- ToolInteractionRequest, SessionOutputDto (InteractionIsMessy), PendingToolInteraction, IParentApprovalBridge.RequestApprovalAsync, and ParentSessionApprovalBridge all carry the flag through.
- DispatchingToolExecutor short-circuits messy invocations to RequiresApproval regardless of empty CandidateVerbs, so the user always sees the prompt for messy input.

Trade-off accepted: a bare standalone `done`/`fi`/`esac` token at the end of a command (e.g. `git fetch && echo done`) is a false positive for the cheap heuristic; the user gets the "complex command" prompt (Once/Deny only) instead of the full 4-button row. The mitigation if this bites real usage is a smarter detector that requires the keyword to appear in a syntactically meaningful position; for now the trade favors a clean approval store over coverage of edge bash idioms. One existing test (SplitCompound_preserves_quoted_operators) updated accordingly to use a different sentinel word.

Tests:
- ShellTokenizerTests: positive cases (for/while/if/case/unbalanced quote/unbalanced bracket), negative cases (well-formed compounds, command substitution, brace expansion, trailing commands), and guards against keyword-substring false positives ("format", "fido"). SplitCompoundCommand returns empty for messy input; still splits well-formed compounds.
- ShellApprovalMatcherTests: IsMessy true for control-flow, IsMessy false for well-formed; IsApproved returns false for messy commands even when every conceivable verb is approved.
- All 3367 tests pass; slopwatch clean; file headers verified.
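A rough Python sketch of the single-pass scan described in this commit, for illustration only. It is not ShellTokenizer.IsMessyCompoundCommand: the treatment of `;`, `&`, and `|` as token separators and the lack of backslash-escape handling are assumptions of this sketch, and `is_messy` is a hypothetical name. The keyword list is the one given in the commit message.

```python
CONTROL_KEYWORDS = frozenset(
    {"for", "while", "do", "done", "then", "fi", "case", "esac"}
)
CLOSERS = {")": "(", "]": "[", "}": "{"}


def is_messy(command: str) -> bool:
    quote = None  # currently open quote character, if any
    stack = []    # open brackets still waiting for their close
    word = []     # characters of the current token

    def ends_keyword() -> bool:
        token = "".join(word)
        word.clear()
        return token in CONTROL_KEYWORDS

    for ch in command:
        if quote is not None:
            if ch == quote:
                quote = None
            word.append("\0")  # quoted text can never form a bare keyword
            continue
        if ch in "'\"":
            quote = ch
            word.append("\0")
        elif ch in "([{":
            if ends_keyword():
                return True
            stack.append(ch)
        elif ch in CLOSERS:
            if ends_keyword():
                return True
            if not stack or stack.pop() != CLOSERS[ch]:
                return True  # close without a matching open
        elif ch.isspace() or ch in ";&|":
            if ends_keyword():
                return True
        else:
            word.append(ch)
    # Trailing keyword, unterminated quote, or unclosed bracket all count as messy.
    return ends_keyword() or quote is not None or bool(stack)
```

The sketch reproduces the accepted trade-off: `git fetch && echo done` is flagged because `done` is a bare standalone token, while `echo "done"` and `format` are not.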

Section 4 of the approval-policy-v2 OpenSpec change. Establishes a deterministic cwd resolution chain for shell invocations so the approval policy can reason about safe-space membership and the spawned process never inherits the daemon's cwd.

Resolution order (ToolExecutionContext.ResolveShellCwd):
1. Explicit args.WorkingDirectory when the agent provided one.
2. WorkingContext.ProjectDirectory when set via set_working_directory.
3. SessionDirectory (the per-session ~/.netclaw/sessions/<id>/ scratch).
4. null only when none is available.

Plumbing:
- ToolExecutionContext gains ProjectDirectory and ResolveShellCwd. The session pipeline populates ProjectDirectory at context-build time from _state.WorkingContext.ProjectDirectory.
- SessionToolExecutionPipeline.ExecuteToolsAsync / ExecuteSingleToolAsync / BuildToolExecutionContext gain a projectDirectory parameter; LlmSessionActor passes _state.WorkingContext.ProjectDirectory at every dispatch.
- ShellTool.ExecuteAsync uses context.ResolveShellCwd(args.WorkingDirectory) to set psi.WorkingDirectory; never falls through to ProcessStartInfo's default-of-inheriting-the-daemon's-cwd, which is a footgun the approval policy cannot reason about.
- DispatchingToolExecutor.AuthorizeCoreAsync calls the same resolver and writes context.Cwd before GetUnapprovedPatternsAsync, so the approval gate evaluates folder-scoped ApprovalEntry records against the same cwd the spawned process will run in.

Tests:
- Cwd_falls_back_to_project_directory_when_no_explicit_arg
- Cwd_falls_back_to_session_directory_when_project_directory_null
- Cwd_explicit_arg_overrides_project_and_session_directories
- Cwd_does_not_inherit_daemon_process_directory (asserts the spawned pwd output is the resolved session_dir, not Environment.CurrentDirectory)

All 3371 tests pass; slopwatch clean; file headers verified.
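The resolution chain amounts to a first-non-empty fallback. A minimal Python sketch, assuming that an empty string counts as "not provided" (the commit message does not say how ResolveShellCwd treats empty values) and using a hypothetical function name:

```python
from typing import Optional


def resolve_shell_cwd(explicit: Optional[str],
                      project_dir: Optional[str],
                      session_dir: Optional[str]) -> Optional[str]:
    """Explicit arg, then project_dir, then session_dir; None only if all are absent."""
    for candidate in (explicit, project_dir, session_dir):
        if candidate:
            return candidate
    return None
```

Because the gate and the shell tool call the same resolver, the directory that approvals are evaluated against is the directory the process actually runs in.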

Section 5 of the approval-policy-v2 OpenSpec change. Adds the load-bearing friction-reduction layer: read-only verbs invoked inside declared safe spaces auto-allow without prompting, while every other combination still routes through the interactive approval gate. Three-position policy: layer 1 ToolPathPolicy hard-deny (unchanged) layer 1.5 NEW: safe-verb ∩ safe-space short-circuit (this commit) layer 2 interactive approval gate (unchanged) A candidate (verb, cwd) short-circuits to Allow when ALL hold: - verb is on the curated SafeVerbList for the current OS - cwd resolves under one of the audience-aware safe-space roots (Personal/Team: session_dir + project_dir; Public: session_dir) - no segment of the cwd path is a filesystem symlink (reparse point) Bundled lists (Netclaw.Configuration/SafeVerbs/safe-verbs.*.json embedded as resources, additive user override at ~/.netclaw/config/safe-verbs.<os>.json): Linux/macOS: ls, find, grep, egrep, fgrep, rg, cat, head, tail, wc, sort, uniq, cut, tr, awk, sed -n, file, pwd, which, stat, tree, du, df, git status, git log, git diff, git show, git branch, git remote, git rev-parse, git ls-files, git blame. Windows: dir, type, more, where, findstr, Get-ChildItem, Get-Content, Select-String, Get-Item, Test-Path, Get-Location, Resolve-Path, plus the same git read subcommands. Mutating verbs (git push, sed -i, awk -i inplace, rm, mv, etc.) are intentionally absent from both lists. sed is pinned to "sed -n" so the matcher refuses to short-circuit "sed -i". The verb-chain matcher means "awk" auto-allows but "awk -i inplace" hits the gate because ExtractVerbChain stops at the first flag. Plumbing: - New SafeVerbList (Configuration) with platform-correct comparer. - New SafeVerbLoader that reads the bundled JSON resource and merges the user override file additively. Malformed override → silently fall back to bundled defaults (the doctor will surface the problem out of band; we do not refuse to start the daemon). 
- New ScopedShellSafeVerbPolicy (Netclaw.Actors.Tools) mirroring ScopedFileAccessPolicy: takes (verb, cwd, context), returns a short-circuit decision; reuses PathUtility.ContainsSymlinkSegment and the audience model. - ToolAccessPolicy gains a SafeVerbList ctor parameter and runs the safe-verb check inline in CheckApprovalGate after the messy/Auto filters but before producing the approval-prompt context. The cwd it evaluates is resolved by ToolExecutionContext.ResolveShellCwd and written back to context.Cwd so the downstream gate and the spawned process agree on "where this runs." - DispatchingToolExecutor's duplicate cwd resolution removed — CheckApprovalGate now owns the write to context.Cwd. - Program.cs constructs a SafeVerbList at startup and registers it alongside ToolAccessPolicy. - NetclawPaths.SafeVerbsOverridePath returns the per-OS user file. Tests (3388 → 3398 across the suite): - SafeVerbLoaderTests: bundled defaults present per OS, user override extends additively, malformed override falls back, missing override ignored, platform-correct case rules. - ScopedShellSafeVerbPolicyTests: all seven scenarios from the spec — safe verb + project_dir → allow; safe verb + session_dir → allow; safe verb + outside → prompt; mutating verb in safe space → prompt; Public audience cannot use project_dir as safe space; symlink segment in cwd breaks short-circuit; AllShortCircuit fails-loud when any candidate is unsafe. Slopwatch clean; file headers verified.
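The three short-circuit conditions (safe verb, cwd under an audience-aware root, no symlink segment) can be sketched as below. This is an illustrative Python approximation, not ScopedShellSafeVerbPolicy: the verb set is a small subset of the bundled list, the audience names are lowercase strings chosen for the sketch, and all function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

# Subset of the bundled Linux/macOS list, for illustration only.
SAFE_VERBS = frozenset({"ls", "cat", "grep", "git status", "git log", "sed -n"})


def contains_symlink_segment(path: Path) -> bool:
    """Walk an absolute path from its root and report any symlinked segment."""
    current = Path(path.anchor)
    for part in path.parts[1:]:
        current = current / part
        if current.is_symlink():
            return True
    return False


def safe_space_roots(audience: str, session_dir: Optional[str],
                     project_dir: Optional[str]) -> List[Path]:
    roots = [session_dir]
    if audience in ("personal", "team"):
        roots.append(project_dir)  # Public never gets project_dir
    return [Path(os.path.abspath(r)) for r in roots if r]


def short_circuits(verb: str, cwd: Optional[str], audience: str,
                   session_dir: Optional[str], project_dir: Optional[str]) -> bool:
    if verb not in SAFE_VERBS or cwd is None:
        return False
    cwd_path = Path(os.path.abspath(cwd))  # abspath does not resolve symlinks
    roots = safe_space_roots(audience, session_dir, project_dir)
    under_root = any(cwd_path == r or r in cwd_path.parents for r in roots)
    return under_root and not contains_symlink_segment(cwd_path)
```

Anything that fails one of the checks simply falls through to the interactive gate, so the short-circuit can only remove prompts, never grant persistent approvals.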

Section 6 of the approval-policy-v2 OpenSpec change. Replaces the section 1 interim revoke parser with a strict parser for the user- visible scope labels emitted by 'list', and adds the 'trust-verb' subcommand for pre-approving global wildcards from the CLI. Revoke parser: - Accepts only the two forms 'list' emits: '<verb> in <directory>' -> (verb, directory) entry '<verb> anywhere' -> (verb, null) global wildcard - Anything else exits 1 with a clear message — bare verb input no longer silently treated as a global wildcard, so an operator typo cannot widen the intended scope. The TryParseRevokePattern helper is internal so tests can exercise the parser surface directly without the CLI shell. trust-verb subcommand: - 'netclaw approvals trust-verb <verb> [--audience <a>] [--tool <t>]' - Default audience = personal, default tool = shell_execute. - Writes a (verb, null) entry to tool-approvals.json — the global-wildcard form. Idempotent: existing entry exits zero with a "No changes" message; otherwise prints "Trusted '<verb> anywhere' for <audience> / <tool>". - This is the deliberate scriptable path the spec calls out for unattended/scheduled task pre-approval. Combined with section 5's safe-verb short-circuit it covers two distinct user goals: short-circuit (read-only verbs in safe spaces, no persistence) versus trust-verb (any verb, anywhere, persisted). Help text updated to document both new forms; quarantine note from section 1 already covers the .v1.bak case. Tests (Cli.Tests 620 -> 629): - Revoke folder-scoped form removes entry with matching directory; folder-scoped form does not match a global-wildcard entry; unrecognized pattern exits 1 with clear message. - trust-verb adds global wildcard with default audience/tool; idempotent on repeated invocation; honors --audience/--tool; missing verb argument exits 1 with usage; unknown audience flag exits 1. - Help output mentions trust-verb subcommand. 
TUI display already shows verb + directory via DisplayText (landed in section 1). The trust-verb-from-TUI affordance is deferred — the agent path is CLI-only and the CLI works for human operators too; revisit if friction surfaces. All 3397 tests pass; slopwatch clean; file headers verified.
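The strict revoke parser is the inverse of the scope labels that 'list' prints. A small Python sketch of that round-trip, under the assumption that the first " in " separates verb from directory; the function names and error text are hypothetical and do not reproduce TryParseRevokePattern.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ApprovalEntry:
    verb: str
    directory: Optional[str] = None  # None = global wildcard


def format_scope(entry: ApprovalEntry) -> str:
    if entry.directory is None:
        return f"{entry.verb} anywhere"
    return f"{entry.verb} in {entry.directory}"


def try_parse_scope(text: str) -> Tuple[Optional[ApprovalEntry], Optional[str]]:
    """Accept only '<verb> in <directory>' or '<verb> anywhere'; reject bare verbs."""
    text = text.strip()
    verb, sep, directory = text.partition(" in ")
    if sep and verb.strip() and directory.strip():
        return ApprovalEntry(verb.strip(), directory.strip()), None
    suffix = " anywhere"
    if text.endswith(suffix) and text[: -len(suffix)].strip():
        return ApprovalEntry(text[: -len(suffix)].strip(), None), None
    return None, (
        f"Unrecognized scope '{text}'; "
        "expected '<verb> in <directory>' or '<verb> anywhere'."
    )
```

Rejecting a bare verb such as `rm` is the point of the strict form: a typo produces an error instead of silently targeting the global wildcard.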

Section 7 of the approval-policy-v2 OpenSpec change. Replaces the v1 Slack approval prompt (4 buttons + Patterns/Directory Roots sections) with the v2 design: 5 buttons, danger styling on the elevated decisions, cwd in the header, verbs as bullets, and a single-line resolution message. Five-button row (ApprovalOptionKeys): Once (primary) - no persist This chat (default) - session-scoped only Always here (default) - persist (verb, cwd) Always anywhere (danger) - persist (verb, null) Deny (danger) - refuse this call ApprovalOptionKeys gains ApproveEverywhere/ApproveEverywhereLabel ("Always anywhere") and renames the existing labels to the spec spelling: "Once" / "This chat" / "Always here" / "Deny". The wire keys are unchanged so persisted resolutions still decode. ApprovalDecision and ParentApprovalDecision gain ApprovedEverywhere so the runtime can distinguish folder-scoped persistence from global wildcard. LlmSessionActor maps the new button key, picks cwd-or-null based on which decision was chosen, and threads through RecordApprovalAsync. ToolApprovalActor's persistent-write path now uses msg.Cwd directly (replacing the section 1 interim that always wrote null), so: Always here -> AddApproval(audience, tool, (verb, msg.Cwd)) Always anywhere -> AddApproval(audience, tool, (verb, null)) Button-row gating by IsMessy / cwd-shallow: IsMessy -> only Once + Deny (no persistence possible) cwd shallow -> Always here omitted (This chat / Always anywhere still available; matches the tool-approval-gates "Shallow directory prevents Always here" scenario) otherwise -> all five buttons Cwd-shallow check in ToolAccessPolicy: a path with fewer than two non-empty path segments under its root (e.g. /, /etc/, C:\) cannot host a folder-scoped grant; fail-closed on Always here so an operator cannot accidentally persist a too-shallow root. Slack prompt body changes: Header (single verb): "Approve git status in /home/user/repos/foo?" 
Header (multi-verb): "Approve in /home/user/repos/foo?" + "• git fetch / • git rebase / • git status" Messy: "_complex command — only one-shot approval available_" The Patterns and Directory Roots sections are gone; verb display flows from CandidateVerbs (the v2 matcher's pure verb-chain extraction) with a Patterns fallback for legacy callers. Resolution message single-line format: Always here -> "Saved: <verbs> in <cwd>" Always anywhere -> "Saved: <verbs> anywhere" This chat -> "Saved for this chat: <verbs> in <cwd>" Once -> "Approved (no save)" Deny -> "Denied" Tests (Actors.Tests 1497 -> 1507): - New SlackApprovalBlockBuilderTests covers all the spec scenarios: single-verb header, multi-verb bulleted header, messy hint, five-button row with danger styling on Always anywhere + Deny, legacy Directory Roots / Patterns sections gone, and all five resolution-message branches (Always here / Always anywhere / This chat / Once / Deny). - Existing DiscordApprovalPromptBuilderTests label expectations bumped to the new spelling ("Once" / "Always here"). All 3407 tests pass; slopwatch clean; file headers verified. Discord rendering still on v1 — section 8 mirrors this design over.
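The button-row gating, the shallow-cwd check, and the single-line resolution messages can be sketched together. This Python illustration uses the button labels and message formats quoted in the commit message; the function names, the use of labels as decision keys, and the comma-joined verb list are assumptions of the sketch.

```python
import re
from typing import List, Sequence

ALL_BUTTONS = ["Once", "This chat", "Always here", "Always anywhere", "Deny"]


def is_shallow(cwd: str) -> bool:
    """Fewer than two non-empty segments under the root (e.g. /, /etc/, C:\\)."""
    rest = re.sub(r"^[A-Za-z]:", "", cwd)  # drop a Windows drive prefix
    segments = [s for s in re.split(r"[\\/]+", rest) if s]
    return len(segments) < 2


def button_row(is_messy: bool, cwd: str) -> List[str]:
    if is_messy:
        return ["Once", "Deny"]  # no persistence possible
    if is_shallow(cwd):
        return [b for b in ALL_BUTTONS if b != "Always here"]
    return list(ALL_BUTTONS)


def resolution_message(decision: str, verbs: Sequence[str], cwd: str) -> str:
    joined = ", ".join(verbs)  # join style is an assumption
    if decision == "Always here":
        return f"Saved: {joined} in {cwd}"
    if decision == "Always anywhere":
        return f"Saved: {joined} anywhere"
    if decision == "This chat":
        return f"Saved for this chat: {joined} in {cwd}"
    if decision == "Once":
        return "Approved (no save)"
    return "Denied"
```

For example, a prompt raised from `/etc/` would offer four buttons, since a folder-scoped grant on such a shallow path is refused.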

Section 8 of the approval-policy-v2 OpenSpec change. Brings the Discord approval prompt to parity with the Slack v2 layout from section 7: same 5-button row, same danger styling rules, same header format, same single-line resolution message. DiscordApprovalPromptBuilder changes: - BuildButtonPrompt now renders the v2 header ("Approve git status in /home/user/repos/foo?" for single-verb, "Approve in /home/user/repos/foo?" + bulleted verbs for multi-verb) and surfaces the "complex command — only one-shot approval available" hint when IsMessy is true. - BuildResolvedPromptText emits the single-line resolution form identical to Slack: Always here -> "Saved: <verbs> in <cwd>" Always anywhere -> "Saved: <verbs> anywhere" This chat -> "Saved for this chat: <verbs> in <cwd>" Once -> "Approved (no save)" Deny -> "Denied" - GetButtonStyle applies DiscordButtonStyle.Danger to both ApproveEverywhere and Deny, mirroring Slack's danger pair. - Verb display sources from CandidateVerbs (v2) with a Patterns fallback for legacy callers. - GetDecisionLabel handles ApproveEverywhere alongside the existing keys. No Discord-side response-handler changes required: the transport decodes button values and forwards selectedKey to the session actor, and LlmSessionActor's switch (updated in section 7) already routes ApproveEverywhere for both channels. Tests (Actors.Tests 1507 -> 1514): - Existing two BuildResolvedPromptText cases bumped to assert the v2 single-line form ("Approved (no save)" / "Denied") instead of the v1 "Decision: <label>" string. - Seven new V2_ tests parallel to SlackApprovalBlockBuilderTests: single-verb header collapse, multi-verb generic header with bullets, messy-command hint with two-button row, five-button row with danger styling on Always anywhere and Deny, and the three persistent-resolution branches (Always here / Always anywhere / This chat). All 3414 tests pass; slopwatch clean; file headers verified. Both Slack and Discord approval flows now end-to-end on v2.

…hint Section 9 of the approval-policy-v2 OpenSpec change. Steers the agent toward declaring its project root early and gives it a self-correction path when a shell call is denied for cwd-outside- safe-spaces. netclaw-operations SKILL.md (bumped to v2.0.0): - Rewrote Approval Prompts around the v2 (verb, directory) model: three-layer gate (hard-deny / safe-verb short-circuit / interactive prompt), the five-button row and its scope semantics, when fewer buttons appear (messy / shallow cwd), and how set_working_directory affects prompt cadence. - Added "Pre-approving for unattended tasks (load-bearing)" section documenting the schedule-creation pre-approval flow. Replaces the v1 "run interactively first" pattern with the new 'netclaw approvals trust-verb <verb>' path; agent dialogue example shows how to ask the user before pre-approving. - Updated the Approval Requirements for Reminders/Webhooks section to point at trust-verb instead of interactive-first. - Updated the inspecting/revoking section: list emits typed entries ('<verb> in <dir>' / '<verb> anywhere'); revoke accepts those forms verbatim; trust-verb is the deliberate scriptable path. - Last-resort recovery now mentions both .v1.bak and .invalid quarantine paths. Resources/AGENTS.md (Personal+Team identity file): - New top-level "Declare Your Project Root Early (load-bearing)" section. Tells the agent its FIRST shell-related action MUST be set_working_directory when the task is project-scoped, with the consequence framing ("burns the user's attention and your token budget" if skipped). Includes a recovery rubric: when shell denial surfaces a set_working_directory hint, read it and self-correct rather than re-prompting the user. - AGENTS.public.md unchanged because set_working_directory is profile-managed away from Public. set_working_directory tool description: - Reframed from "set the project directory for this session" to "Declare your project root and expand your trusted scope." 
Spells out the safe-verb short-circuit consequence so the model sees *why* this tool matters for friction reduction. Removed the cd- style framing. - Added public ToolName constant so the failure-path hint logic can reference it without string duplication. Failure-path hint (SessionToolExecutionPipeline.BuildSetWorkingDirectoryHint): - Emits a one-line hint pointing at 'set_working_directory <cwd>' when: * tool is shell_execute * decision is Denied (not TimedOut, not hard-deny) * cwd is non-null * cwd is NOT inside SessionDirectory or ProjectDirectory * set_working_directory is exposed to the current audience - LlmSessionActor pre-computes setWorkingDirectoryAvailable from the ToolAccessPolicy's IsToolExposed check and threads the bool into ExecuteToolsAsync; the pipeline appends the hint to the deny-result text the model sees on its next turn. - Suppresses for non-shell tools, timeouts, hard-deny refusals, cwd already inside a safe space, and audiences without the tool — so Public sessions don't see misleading "use set_working_directory" guidance. Tests (Actors.Tests 1514 -> 1521): - Seven hint-helper unit tests cover all the spec scenarios: emitted on cwd-outside denial; suppressed when tool unavailable; suppressed for TimedOut; suppressed for non-shell tools; suppressed when cwd is inside session_dir; suppressed when cwd is inside project_dir; suppressed when cwd is null. All 3421 tests pass; slopwatch clean; file headers verified.
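The five emit conditions for the failure-path hint can be written as a single guard. The Python sketch below is illustrative only: the decision strings, the POSIX-style path comparison, and the hint wording are assumptions, and `build_set_working_directory_hint` is a hypothetical stand-in for SessionToolExecutionPipeline.BuildSetWorkingDirectoryHint.

```python
from pathlib import PurePosixPath
from typing import Optional


def _inside(cwd: str, root: Optional[str]) -> bool:
    if not root:
        return False
    cwd_path, root_path = PurePosixPath(cwd), PurePosixPath(root)
    return cwd_path == root_path or root_path in cwd_path.parents


def build_set_working_directory_hint(tool: str, decision: str, cwd: Optional[str],
                                     session_dir: Optional[str],
                                     project_dir: Optional[str],
                                     tool_available: bool) -> Optional[str]:
    if tool != "shell_execute":
        return None  # only shell calls get the hint
    if decision != "Denied":
        return None  # timeouts and hard-deny refusals are suppressed
    if cwd is None or not tool_available:
        return None  # nothing to point at, or audience lacks the tool
    if _inside(cwd, session_dir) or _inside(cwd, project_dir):
        return None  # already inside a safe space
    # Hypothetical wording; the real hint text is not quoted in the commit.
    return f"Hint: call set_working_directory {cwd} to declare your project root."
```

Returning None in every suppressed case keeps Public sessions, which do not have the tool, from ever seeing the guidance.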

Cleanup pass on the approval-policy-v2 PR. Two related dead-code removals that were marked "to delete in section 7" but never trimmed. Dead v1 directory-extraction helpers: - IShellApprovalSemantics.ExtractDirectoryRoots (interface + impl). - ShellApprovalSemanticsBase.TryCreateDirectoryApprovalRoot, ExtractDisplayDirectory, NormalizeDisplayDirectory, IsRelativeDisplayPath, EnsureTrailingSeparator, CountPathSegments, GetLastShellSeparatorIndex. - PosixShellApprovalSemantics.ExtractDisplayDirectory and EnsureTrailingSeparator overrides. - ShellTokenizer.ExtractDirectoryRoots and MinDirectoryScopeDepth. - DirectoryApprovalRoot record (file deleted). - ShellTokenizerTests.ExtractDirectoryRoots_* test methods plus the AbsoluteRootCases / RelativeRootCases / WindowsAbsoluteDirectoryRootCases TheoryData properties that fed them. These were the v1 "extract directory roots from path arguments" path. v2 derives directory exclusively from ToolExecutionContext.Cwd so nothing in production calls these anymore. DirectoryRoots field plumbing: - ToolApprovalContext, ToolInteractionRequest (SessionOutput), SessionOutputDto.InteractionDirectoryRoots, the mapper round-trip, PendingToolInteraction, IParentApprovalBridge.RequestApprovalAsync, ParentSessionApprovalBridge, SubAgentActor caller, the pipeline emit site, and the TUI rendering in ChatViewModel/ChatPage. - All carriers always passed [], per the spec's "REMOVED Requirement: Directory root extraction via IToolApprovalMatcher" and the section 7 prompt redesign which moved cwd into the prompt header. Tests updated: - DaemonClientMappingTests no longer round-trips DirectoryRoots. - ParentSessionApprovalBridgeTests passes a real verb chain instead of the synthetic "/tmp/work/logs/" placeholder it was carrying. - ToolApprovalGateTests drops Assert.Empty(DirectoryRoots) calls that only existed to document the empty-after-cutover state. 
- ChatPage approval prompt rendering updated to the v2 button labels ("Once / This chat / Always here / Always anywhere / Deny"). 3411 tests pass (10 fewer than before because the ExtractDirectoryRoots_* test methods were removed; nothing else changed). Slopwatch clean; file headers verified.

Followups from the simplification review pass. ApprovalEntry now owns its display + parse round-trip: - Format: ApprovalEntry.FormatScope() emits "<verb> in <dir>" or "<verb> anywhere". - Parse: ApprovalEntry.TryParseScope(input, out entry, out error) is the inverse, accepting only the two user-visible forms. - Both helpers replace duplicated implementations in ApprovalsCommand.FormatEntryForList, ApprovalsCommand.TryParseRevokePattern, and ApprovalDisplayItem.DisplayText. One round-trip source of truth. Hot-path: the actor's per-message file read. - ToolApprovalActor.GetUnapprovedPatterns now snapshots the persisted approvals once per message rather than re-reading + re-parsing tool-approvals.json per candidate verb. For a compound shell with N verbs that's N file reads → 1. Hot-path: per-verb cwd / safe-roots / symlink work. - ScopedShellSafeVerbPolicy.AllShortCircuit hoists Path.GetFullPath, ResolveSafeSpaceRoots, and ContainsSymlinkSegment out of the per-verb loop. The cwd doesn't change between verbs in the same invocation, so a 4-verb compound now does 1 path-normalize + 1 symlink scan instead of 4. ShortCircuitsApproval becomes a thin wrapper that forwards to AllShortCircuit. Wire ApprovalOptionKeys.IsDangerStyled in both channel builders instead of inlining the same `Deny or ApproveEverywhere` switch arm in two files. Consolidate WorkingDirectory/Command extraction in ShellApprovalMatcher to call ToolArgumentHelper.GetString — the helper already handles the PascalCase ↔ camelCase round-trip via key normalization, so the inline two-key TryGetValue duplication was needless and slightly inconsistent with the rest of the codebase. 3411 tests pass; slopwatch clean; file headers verified.

…approval Section 10 of the approval-policy-v2 OpenSpec change. Adds a new "Approval Policy v2" eval category covering the four behavioral guardrails introduced in sections 5 + 9: - approval_set_working_directory_positive Project-scoped prompt mentions a repo path. Asserts the agent calls set_working_directory before any shell tool call into that tree (order check: SWD line < first shell_execute line). - approval_set_working_directory_negative Unrelated prompts ("what's 2+2?", "explain a hash table"). Asserts the agent does NOT preemptively call set_working_directory just because AGENTS.md mentions it. - approval_recovery_hint (multi-turn) T1 plants the cwd-outside-safe-spaces denial hint into the conversation; T2 asserts the agent self-corrects by calling set_working_directory rather than re-prompting the user. Scripting an actual denial inside the eval container would require a preconfigured project_dir mismatch we don't have plumbing for; the hint-shape feed exercises the same self-correction code path. - approval_schedule_pre_approval User asks to schedule a daily reminder using the freshdesk verb. Asserts the agent calls `netclaw approvals trust-verb freshdesk` via shell_execute as part of schedule setup. Task 10.1 cross-checked: "Pre-approving for unattended tasks (load-bearing)" section in netclaw-operations SKILL.md (added in section 9) covers the agent-driven trust-verb flow with example dialogue. No additional skill text needed. Task 10.6 (run the suite, document baseline pass rate) is deferred to local execution — the suite needs NETCLAW_EVAL_PROVIDER_* env + Docker daemon container which only Aaron has set up. Listed in acceptance gates. `bash -n evals/run-evals.sh` parses cleanly.

Folds the change's delta specs into main specs and archives the change to openspec/changes/archive/2026-05-08-approval-policy-v2/.
- tool-approval-gates: rewrites shell pattern matching, persistent approval storage, and directory-root approvals around the v2 ApprovalEntry model; adds requirements for safe-verb short-circuit, five-button prompt, single-line resolution, and bash control-flow refusal.
- session-cwd: adds shell tool cwd defaults, failure-path hint, and the safe-space expansion contract that set_working_directory now carries; modifies set_working_directory tool to reflect the new framing. Also fixes a pre-existing structural defect (spec was authored with the delta '## ADDED Requirements' heading instead of '## Purpose' + '## Requirements').
- netclaw-cli: replaces the Operator CLI for persistent tool approvals requirement with the v2 version (scope-labeled list, strict revoke parser, trust-verb subcommand, .v1.bak quarantine note).

Two bugs together meant evals never tested in-repo skill changes:
1. The skill scanner expects '<skills>/.system/<skill-name>/SKILL.md' but the eval script copied to '.system/files/<skill-name>/SKILL.md' (matching the repo's feeds/ layout, not the runtime layout). The local copies were silently invisible.
2. The daemon then synced from the live R2 feed, which ships the last released set of skills. So evals always exercised whatever was published, not the source tree.
Result: a v2 'netclaw-operations' SKILL.md bumped in this PR was a no-op for evals. The model in the container saw the older 1.x copy from R2 and missed the new approval/trust-verb guidance entirely.
Fix:
- Copy '.../files/<skill>/' → '$EVAL_HOME/skills/.system/<skill>/'.
- Set 'NETCLAW_SkillSync__DisableSystemSkillSync=true' in the eval container so the daemon doesn't fetch+overwrite from the live feed.
Confirmed via re-run: skill_load("netclaw-operations") now succeeds inside the eval container (previously: "Skill not found"). The new v2 approval cases ('approval_set_working_directory_positive', 'approval_schedule_pre_approval') visibly improve once the model can see the bumped skill content.

Two cases had genuine eval-design problems independent of the v2 implementation, surfaced once N=5 baselines stabilized.
approval_set_working_directory_positive
  Old prompt: 'I'm working on the Netclaw repository at /tmp. List the files in that directory and tell me what's there.' This is ambiguous between sustained project work (which the v2 spec says SHOULD pre-declare) and a one-shot directory listing (which the spec explicitly says should NOT pre-declare). The model going straight to shell was arguably a correct read of the prompt, not a guidance failure. New prompt makes the sustained-work signal explicit ('debugging session... multiple shell commands across the tree').
approval_recovery_hint
  Old structure was multi-turn: T1 fed a denial message and instructed 'do not call any tools yet', T2 said 'now call the tool.' Several failure runs showed the model getting stuck in T1's no-tools conditioning and refusing T2 ('I will not call any tools.'). That tests prompt-flip resilience, not recovery-hint comprehension. Rewrote as a single conversational prompt that delivers the denial hint and asks 'how should I unblock this?' which is what a real recovery turn looks like.
Side note for the PR: full N=5 baselines on local provider show the inference endpoint is intermittently flaky (Dutchman-style stream stalls: 'out=3 tok_s=8' instead of normal 'out=80 tok_s=27'), which produces eval variance unrelated to either the v2 implementation or these prompts. Aaron will validate via binary-swap before merging.

…ually scopes
ToolApprovalContext was missing a Cwd field, so SessionToolExecutionPipeline emitted ToolInteractionRequest with Cwd=null even though ToolExecutionContext already had it resolved. PendingToolInteraction.Cwd was therefore always null on the session-actor side, and the persistence path
    var persistCwd = decision == ApprovedEverywhere ? null : pending.Cwd;
silently turned every "Always here" click (ApprovedAlways) into a global wildcard ("Always anywhere"). Confirmed in a live session: tool-approvals.json contained nine entries, all with directory=null, despite most coming from "Always here" button clicks.
The fix is small but the bug was load-bearing: folder-scoped trust is the whole point of the v2 button row. Without cwd flowing through, the "here" button was UX theater.
- Add 'Cwd' to ToolApprovalContext (string?, defaults null).
- Resolve cwd up-front in ToolAccessPolicy.CheckApprovalGate so it's populated for every shell approval path (not only the safe-verb short-circuit branch).
- Pass ctx.Cwd into ToolInteractionRequest.
- Regression test (SessionToolExecutionPipelineTests.Approval_request_propagates_cwd_from_approval_context) asserts the emitted request carries the cwd through.
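The failure mode is easy to see when the persistence expression is isolated. The following Python sketch mirrors the quoted `persistCwd` line; the function name `persist_directory` and the string constants are hypothetical, chosen only for illustration.

```python
from typing import Optional

APPROVED_ALWAYS = "ApprovedAlways"          # the "Always here" button
APPROVED_EVERYWHERE = "ApprovedEverywhere"  # the "Always anywhere" button


def persist_directory(decision: str, pending_cwd: Optional[str]) -> Optional[str]:
    # Mirrors: decision == ApprovedEverywhere ? null : pending.Cwd
    # A result of None is stored as directory=null, i.e. a global grant.
    return None if decision == APPROVED_EVERYWHERE else pending_cwd


# Before the fix, pending_cwd was always None, so "Always here"
# produced the same stored scope as "Always anywhere".
before_fix = persist_directory(APPROVED_ALWAYS, None)
# After the fix, the cwd flows through and the grant is folder-scoped.
after_fix = persist_directory(APPROVED_ALWAYS, "/repo")
```

The expression itself was never wrong; the bug sat upstream in the missing field, which is why the regression test asserts on the emitted request rather than on persistence.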

Three bug-fix pillars on top of v2 (which has not deployed beyond a single dogfood operator): verb extraction emits the command head only, the first path-like argument becomes the candidate's effective directory, and pure side-effect clauses (echo / printf / true / false) authorize once but do not pollute persistence. The dogfood evidence (D0AC6CKBK5K/1778303523.861279) showed nine 'tool-approvals.json' entries that never matched future calls because each call's path got baked into the persisted verb. The fix reframes the (verb, directory) pair so verbs are reusable and paths declare scope implicitly. Persistence shape is unchanged. Includes proposal.md, design.md, and a delta spec for the 'tool-approval-gates' capability with all five MODIFIED/ADDED requirements covering verb classification, effective-directory matching, file-parent inference, multi-path tiebreak, and the side-effect skip list. Tasks are next.

…nontheweb/netclaw into openspec/approval-policy-v2

Seven sections covering path classification, matcher updates, persistence, side-effect skip list, agent guidance, tests, and the sync/archive flow. Acceptance gates include a manual binary-swap check that explicitly validates the dogfood failure mode (find /repo → Always here → find /repo/sub auto-runs).

Implements sections 1-4 of approval-policy-path-extraction. The verb half of (verb, directory) is now the command head only; the directory half comes from the first path-like argument when present, falling back to cwd. Folder-scoped trust now compounds across deeper paths.
Section 1: Path classification + verb-only extraction
- Add IsPathToken predicate (conservative: /, ~/, ./, ../ prefixes or exact ~, ., ..). Internal-slash regexes and URLs are NOT paths.
- Strip the path-aware-append from ExtractVerbChain. For path-aware verbs (cat, grep, find, ls, ...) we now stop at the first verb so call-specific paths don't bake into the persisted verb chain.
- New ExtractFirstPathArgument applies the file-parent rule via Path.HasExtension so cat ~/.bashrc resolves to "~", not "~/.bashrc".
- New ApprovalCandidate(Verb, Directory?) record + ExtractCandidates on IToolApprovalMatcher.
Section 2: Matcher uses effective directory
- New MatchesShellApproval overload taking (candidateVerb, candidateDirectory, cwd, approvedEntries).
- Effective directory = candidateDirectory ?? cwd; relative paths resolve against cwd via PathUtility.ExpandAndNormalize.
- Backwards-compat overload retained for v2.0 callers.
- Symlink-segment guard runs against the resolved effective directory.
Section 3: Persistence on Always here uses effective directory
- New PersistApprovalCandidatesAsync in LlmSessionActor groups candidates by effective directory and writes one RecordApprovalAsync call per bucket. (find /repo + ls /repo) → one (verb=find, dir=/repo) + one (verb=ls, dir=/repo) entry.
- ApprovedEverywhere collapses all candidates to (verb, null).
- ApprovedSession uses the same per-bucket grouping with persistent=false.
Section 4: Side-effect skip list
- IsPureSideEffect(ApprovalCandidate): true when verb is one of {echo, printf, :, true, false} AND Directory is null. Redirect detection is implicit: a redirect target shows up as the candidate's directory via ExtractFirstPathArgument, so echo X > /tmp/log persists as (echo, /tmp) instead of being skipped.
- LlmSessionActor's persistence loop drops side-effect candidates entirely. They're authorized for the current call by the decision but don't pollute the store.
Threading + protocol
- ApprovalCandidate flows ToolApprovalContext → ToolInteractionRequest → PendingToolInteraction → persistence. Existing CandidateVerbs (verb-only) is now derived from Candidates for the renderers that bullet-list verbs in the prompt body.
Tests
- 12 new ShellApprovalMatcherPathExtractionTests covering verb-only extraction, file-parent rule, no-path-arg, compound clauses, effective-directory matching, side-effect skip.
- ShellTokenizer tests updated for verb-only ExtractVerbChain.
- ToolApprovalGateTests updated for new candidate shape.
- Full suite: 3,463 tests passing across 7 projects, 0 failures.
Deferred (binary-swap validation):
- 3.2 per-candidate shallow-path skip + resolution-line note
- 3.3, 3.4 LlmSessionActor end-to-end persistence tests
- 4.4 resolution-line "Saved" vs "Authorized for this call"
- 4.5 same: matcher-level coverage exists, end-to-end deferred
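The path classification and file-parent rule from Section 1, plus the side-effect predicate from Section 4, can be sketched in Python as below. This is an illustration under stated assumptions rather than the C# code: the function names are hypothetical, `has_extension` is a hand-rolled approximation of .NET's Path.HasExtension for POSIX-style paths, and redirect handling is left out.

```python
from typing import List, Optional

PATH_PREFIXES = ("/", "~/", "./", "../")
BARE_PATHS = {"~", ".", ".."}
SIDE_EFFECT_VERBS = {"echo", "printf", ":", "true", "false"}


def is_path_token(token: str) -> bool:
    # Conservative: only explicit prefixes or the bare forms count.
    # URLs and internal-slash regexes fall through as non-paths.
    return token in BARE_PATHS or token.startswith(PATH_PREFIXES)


def has_extension(path: str) -> bool:
    # Approximation: the last segment contains a period followed by
    # at least one character (so ".bashrc" counts, ".." does not).
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return dot != -1 and dot < len(name) - 1


def first_path_directory(args: List[str]) -> Optional[str]:
    # File-parent rule: a path that looks like a file scopes to its parent.
    for arg in args:
        if is_path_token(arg):
            if has_extension(arg):
                return arg.rsplit("/", 1)[0] or "/"
            return arg
    return None


def is_pure_side_effect(verb: str, directory: Optional[str]) -> bool:
    return verb in SIDE_EFFECT_VERBS and directory is None
```

With these definitions, `cat ~/.bashrc` yields the candidate (cat, "~") and `find /repo/sub -name x` yields (find, "/repo/sub"), while a bare `echo hello` has no directory and is classified as pure side effect.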

Section 5 of approval-policy-path-extraction. The runtime now treats path arguments as implicit scope declarations, and the agent-facing guidance has to match or the model will keep over-using set_working_directory and under-using folder-scoped trust.
- netclaw-operations SKILL.md (bumped 2.0.0 → 2.1.0):
  - Rewrote `verb` / `directory` definitions to reflect the v2.1 extractor: verb is the command head only; directory comes from the first path argument (with file-parent rule) when present, else cwd.
  - Added "Folder-scoped trust compounds" explanation so the model knows (find, /repo) covers find /repo/sub.
  - Added the side-effect-only clauses note (echo / printf / : / true / false get authorized but not persisted).
  - Reframed the troubleshooting suggestions: read-only re-prompts in a repo usually mean the commands aren't carrying a path arg; set_working_directory is the fix for that case specifically.
- Resources/AGENTS.md:
  - Renamed "Declare Your Project Root Early" to "Declaring Project Scope" because the imperative no longer applies universally.
  - Path arguments now declare scope implicitly. set_working_directory is repositioned as the fallback for sessions running multiple commands without explicit paths (REPL work, git status loops, make -C / git -C flag-hidden paths).
  - Kept the consequence framing ("burns the user's attention and your token budget") for the case where the agent skips scope declaration entirely.
- SetWorkingDirectoryTool description:
  - Added a one-sentence note that shell commands with path arguments declare scope automatically, framing the tool as most useful for multi-command sessions without explicit paths.

Two corrections after applying sections 1-5 of approval-policy-path-extraction:
- Revert the netclaw-operations SKILL.md version bump back to 2.0.0. v2.0.0 hasn't shipped to the live skills feed yet, so there's no installed-base to differentiate from. Bumping is only meaningful once the previous version has shipped.
- Mark sections 6 (evals) and 7 (sync/archive) tasks as deferred-with-rationale rather than open. Eval cases for click-driven approval persistence can't be scripted from the eval framework (which only checks model output text), and the existing flaky inference provider blocks 6.1 re-runs anyway. Section 7 needs manual binary-swap validation before sync/archive.

Aaron's first dogfood click on Always anywhere crashed the turn (corr id b48cc740). Trace:
    ToolApprovalRequiredException: Tool 'shell_execute' requires approval
      at DispatchingToolExecutor.AuthorizeCoreAsync
      at DispatchingToolExecutor.ExecuteAsync
      at SessionToolExecutionPipeline.ExecuteToolAttemptAsync
      at SessionToolExecutionPipeline.ExecuteSingleToolAsync
      at SessionToolExecutionPipeline.ExecuteToolsAsync
Sequence:
1. Compound command "cd ~/repo && git status && echo \"---\" && git remote -v && echo \"...\" && git branch ... && echo \"...\" && git log ..." prompts.
2. User clicks Always anywhere. Persistence runs:
   - 5 entries written: cd, git status, git remote, git branch, git log
   - echo skipped per side-effect rule (correct).
3. Pipeline retries the original call.
4. AuthorizeCoreAsync re-checks ALL 6 candidate verbs (incl. echo) against the store. echo isn't there → unapproved.Count > 0 → RequiresApproval → throw.
5. The throw escapes the catch at SessionToolExecutionPipeline.cs:269 because we're already inside that catch handling the ORIGINAL exception. Sibling catches (line 381 / 393 / 404) don't apply once execution is inside another catch. ExecuteToolsAsync's outer Exception handler tells the actor to FailCurrentTurn.
Root cause: the side-effect skip handled persistence but not match-time. Side-effect verbs need to be treated as authorized regardless of whether they're in the store, so the matcher and persistence agree on what's reachable.
Fixes:
- DispatchingToolExecutor.AuthorizeCoreAsync filters IsPureSideEffect candidates from the unapproved-check verb list, using approvalContext.Candidates when available. If every candidate is side-effect-only, the decision is Allow up-front.
- ShellApprovalMatcher.IsApproved skips side-effect candidates in the per-candidate match loop, mirroring the same intent at the matcher boundary used by GetUnapprovedPatternsAsync.
- ShellTokenizer.SingleTokenSideEffectVerbs added; ExtractVerbChain now caps at one token for these (in addition to PathAwareVerbs) so "echo hello" resolves to verb "echo" instead of a 2-token chain that the side-effect skip wouldn't match. Aaron's exact dogfood case ("echo \"---X---\"") was already capped because --- starts with - and breaks the loop on its own; the new cap protects against shapes like build-script "echo done"-equivalents.
Tests:
- New ShellApprovalMatcher regression: IsApproved_treats_side_effect_candidates_as_authorized covers the retry-after-Always-anywhere case end-to-end at the matcher level.
- New ExtractCandidates_caps_echo_at_one_token ensures the tokenizer cap holds.
- DispatchingToolExecutorTests.{One_time_approval_*, Session_approval_*} updated to use git status (non-side-effect) so they still exercise the approval flow rather than auto-allowing.
Full suite: 3,465 tests passing across 7 projects.
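The root cause and fix can be reproduced in a few lines. The Python sketch below is a simplified model, not the C# matcher: candidates and store entries are plain (verb, directory) tuples, paths are assumed to be already normalized, and the helper names (`is_under`, `is_approved`, `unapproved`) are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

Candidate = Tuple[str, Optional[str]]  # (verb, directory); None = no path arg
SIDE_EFFECT_VERBS = {"echo", "printf", ":", "true", "false"}


def is_pure_side_effect(candidate: Candidate) -> bool:
    verb, directory = candidate
    return verb in SIDE_EFFECT_VERBS and directory is None


def is_under(path: str, root: str) -> bool:
    # Directory-boundary-safe prefix check: /repo covers /repo/sub, not /repo2.
    root = root.rstrip("/") or "/"
    return path == root or path.startswith(root if root == "/" else root + "/")


def is_approved(candidate: Candidate, store: Set[Candidate], cwd: str) -> bool:
    verb, directory = candidate
    effective = directory or cwd  # effective directory falls back to cwd
    return any(v == verb and (d is None or is_under(effective, d)) for v, d in store)


def unapproved(candidates: Iterable[Candidate], store: Set[Candidate], cwd: str) -> List[Candidate]:
    # The fix: side-effect candidates are filtered at match time too,
    # so the matcher agrees with what persistence chose to skip.
    return [c for c in candidates
            if not is_pure_side_effect(c) and not is_approved(c, store, cwd)]


store = {("cd", None), ("git status", None), ("git remote", None),
         ("git branch", None), ("git log", None)}
candidates = [("cd", "~/repo"), ("git status", None), ("echo", None),
              ("git remote", None), ("git branch", None), ("git log", None)]
remaining = unapproved(candidates, store, "/session")
```

Dropping the `is_pure_side_effect` filter from `unapproved` leaves `("echo", None)` in the result, which corresponds to step 4 of the sequence above.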

Tactical patches for the v2.1 (verb, directory) approval model captured here in case the work is useful later. Likely superseded by the upcoming trust-zones architectural rewrite; see openspec/changes/approval-policy-path-extraction/CONTINUATION.md.
- ToolAccessPolicy: AllCandidatesResolveToSessionScratch helper + new BuildApprovalOptions argument so "Always here" gets pruned when every candidate's effective directory is the session_dir (dead-on-arrival folder-scoped grant).
- LlmSessionActor.PersistApprovalCandidatesAsync: belt-and-suspenders persistence guard. If a candidate's effective directory is the session_dir, skip it from persistence even if the button somehow got shown.
- Slack/Discord approval builders: ResolveHeaderLocation prefers the candidate's extracted directory over cwd. Single distinct directory across candidates → show that. Multiple → show "<N> directories". None → fall back to cwd, but render "this session" instead of the literal session_dir path so operators see the meaningful frame.
- ShellApprovalMatcher tests: ExtractCandidates_extracts_cd_target_as_directory regression for the cd-target-as-directory header fix (the live-session bug Aaron flagged where the header showed session_dir for a "cd /repo && git status" compound).
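The header-location rule for the Slack/Discord builders reads naturally as a three-way branch. A minimal Python sketch follows, assuming the candidates' directories arrive as a list with None for candidates that carry no path; the function name `resolve_header_location` is a hypothetical stand-in for ResolveHeaderLocation.

```python
from typing import List, Optional


def resolve_header_location(candidate_dirs: List[Optional[str]],
                            cwd: str, session_dir: str) -> str:
    distinct = sorted({d for d in candidate_dirs if d})
    if len(distinct) == 1:
        return distinct[0]                      # single extracted directory
    if len(distinct) > 1:
        return f"{len(distinct)} directories"   # several distinct directories
    # No extracted directory: fall back to cwd, but avoid showing the
    # literal session_dir path to the operator.
    return "this session" if cwd == session_dir else cwd
```

For a "cd /repo && git status" compound, the cd target is the only extracted directory, so the header shows /repo rather than the session directory.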

Long working session converged on a fundamental rewrite of the approval architecture that invalidates large parts of the existing approval-policy-path-extraction proposal/design/specs/tasks. Captured here so the next session can pick up cleanly without relitigating the architectural decisions:
- CWD has zero role in approval decisions; it is only psi.WorkingDirectory for the spawned subprocess.
- WorkingContext.ProjectDirectory + set_working_directory tool both deleted. Trust zones (audience config + session_dir + operator-extended) are the trust anchor; they're configuration, not state.
- Two independent gates: zone gate (per-path geographic check) + verb-pattern gate (per-verb action check). Read-only verbs auto-pass ONLY for paths inside trusted zones.
- (verb, directory) ApprovalEntry replaced by two independent stores: trusted-zones (per-audience directory globs) and approved-patterns (per-audience verb patterns).
- Project_instructions auto-injection removed; agent reads project context on demand via file_read.
- cd-in-compound parsing useful for layer-2 directory extraction + display, but never mutates session state.
- Multi-path commands resolve naturally: each path checked independently against zones.
Tactical findings from sst/opencode source (tree-sitter-bash AST parsing, BashArity dictionary, two-gate model with external_directory + bash patterns) included for the implementation phase. Strategic model is settled; implementation tactics still TBD.
Twelve open design questions captured for the next session to answer before re-drafting the OpenSpec change artifacts. Plus a list of decisions to NOT relitigate.
Live evidence from today's dogfood sessions referenced inline so the rewrite has real motivating examples to point at.
The work in the previous commit (wip: session-scratch hide + target-dir header) is tactical patching of the v2.1 model and likely won't survive the rewrite. It is committed for git history reference and will probably be replaced wholesale.

Archive approval-policy-path-extraction (v2 (verb, directory) cross-product model) and create approval-policy-trust-zones with a wholesale rewrite of the approval architecture.
Key locked decisions captured in the new artifacts:
- Three-layer gate: hard-deny + zone gate (per-path) + verb-pattern gate (per-command-shape). Two independent persistence stores per audience (trustedZones, verbPatterns), colocated in tool-approvals.json.
- Verb pattern format: globs (git push *, rm /tmp/*).
- Sequential 4-button prompts on both gates: [Once / Session / Always / Deny]. Identical button shape on both prompts; multi-path commands batch into one zone prompt.
- Read-only verbs auto-pass only inside trusted zones (tightening).
- Trust zones are configuration, not state. Agent cannot extend trust by issuing commands.
- Externalize shell parsing to new ShellSyntaxTree library (github.com/Aaronontheweb/ShellSyntaxTree, NuGet ShellSyntaxTree). Bash-first, multi-shell-ready via IShellParser interface. Develop in parallel with sibling ProjectReference, swap to PackageReference on v0.1.
- BREAKING: delete set_working_directory tool and WorkingContext.ProjectDirectory. Cwd no longer factors into approval matcher logic.
- BREAKING: delete project-instructions auto-injection. Agent reads .netclaw/AGENTS.md / AGENTS.md / CLAUDE.md / CONTEXT.md on demand via file_read per explicit lookup discipline added to Resources/AGENTS.md.
- TUI extension: netclaw approvals gains [Z]ones and [V]erbs tabs.
- No data migration; existing tool-approvals.json archived to .v2-discarded.bak on first daemon start.
Spec deltas:
- tool-approval-gates: 8 ADDED, 5 MODIFIED, 7 REMOVED requirements
- session-cwd: 6 REMOVED requirements (capability emptied)
- project-instructions: 2 REMOVED requirements (capability emptied)
Tasks (~85 verifiable) span ShellSyntaxTree library buildout, Netclaw consumption, two-store persistence, in-memory session-scope grants, three-layer gate evaluator, ToolApprovalWorkflow state machine, Slack and Discord adapter rendering, CLI/TUI updates, deletions, agent guidance refresh, eval suite updates, docs, and quality gates.
Old change preserved at openspec/changes/archive/2026-05-10-approval-policy-path-extraction/ for git history reference; CONTINUATION.md captures the design pivot rationale.

Consumes the ShellSyntaxTree NuGet package (the externalized shell parser from github.com/Aaronontheweb/ShellSyntaxTree). Adds:
- PackageVersion entry for ShellSyntaxTree 0.1.0-alpha in Directory.Packages.props
- PackageReference in Netclaw.Security.csproj
- IShellParser DI registration via SecurityServiceExtensions.AddShellParser() (resolves the package's BashParser as the IShellParser implementation)
- 7 integration smoke tests confirming the package's public API contract: DI resolution, simple verb extraction, multi-token verb (git push) collapse via BashArity, compound-clause splitting on &&, cd-in-compound cwd attribution to subsequent clauses, unparseable safe-fail without throwing, and dynamic-token skip on unresolved env vars.
These integration tests are the per-PR canary that fires before the larger parser-version-bump CI gate (task 14.7 of approval-policy-trust-zones) runs the entire ShellSyntaxTree corpus through Netclaw's live matcher. Contract regressions in the parser package surface here first.
Coexists with the legacy ShellTokenizer / ShellCommandPolicy code paths which remain in use by Netclaw consumers. Migration of those consumers to the AST-based matcher lands in subsequent commits (GateEvaluator, trust-zones tasks 2.3-2.5).
Implements tasks 2.1, 2.2, and partial 2.5 of approval-policy-trust-zones (we're on PackageReference from day one because the package shipped before sibling ProjectReference was needed).
Build: green. 518/518 security tests pass (7 new integration tests). Slopwatch clean. File headers present.

Picks up the upstream fd-dup redirect fix (`2>&1` no longer produces phantom `<cwd>/&1` file targets). Public API surface unchanged; existing Netclaw consumers that already skip redirects with IsDynamicSkip = true get correct behavior with no code changes. See https://github.com/Aaronontheweb/ShellSyntaxTree/releases/tag/0.1.1-alpha for release notes. Build: green. 7/7 ShellSyntaxTree integration tests pass against the new version.

Safe-verbs lists (read-only verb chains the approval gate auto-allows inside trusted zones) are now loaded exclusively from the embedded resource in Netclaw.Configuration. The previous additive-merge path from ~/.netclaw/config/safe-verbs.<os>.json is removed entirely.
Threat model: adding a verb to the safe-verbs list loosens the policy (more silent auto-pass cases). A prompt-injected agent that could extend the list could widen its own read-only auto-pass surface without prompting the user. The ToolPathPolicy ConfigDirectory deny (commit 43f7e86) already blocks agent writes to that path, but defense-in-depth says: if the file isn't read at all, no future regression can re-open the vector.
Changes:
- NetclawPaths.SafeVerbsOverridePath property deleted.
- SafeVerbLoader.Load(NetclawPaths?) public overload deleted; replaced with parameterless SafeVerbLoader.Load() that returns the bundled defaults for the current OS.
- SafeVerbLoader.TryLoadOverride private method deleted along with the merge logic.
- internal Load(bool isWindows, string? overrideFilePath) signature simplified to Load(bool isWindows).
- Program.cs caller updated.
- $comment strings in safe-verbs.{linux,windows}.json updated to reflect the immutable-at-runtime posture and the trust-zones vocabulary replacing v2 safe-space terminology.
Tests:
- SafeVerbLoaderTests rewritten: dropped 4 override-merge scenarios; kept 4 bundled-defaults coverage tests; added 1 architectural assertion that no public Load overload accepts a string or NetclawPaths parameter (compile-fail contract preventing future regressions).
Widening the safe-verbs list now requires a code review and a daemon release rather than a config edit. The agent has no runtime path to extend its own auto-pass surface.
Implements task 5.1 refinement of approval-policy-trust-zones (the read-only-verb-list source decision). Aaron's call: "should just have these be embedded resources inside the assembly - That way they're safe from modifications."
Build: green. Full solution test suite passes (Cli 631, Daemon 504, Actors 1522, plus Configuration/Security/Channels). Slopwatch clean. File headers present.

Adds cd, chdir, pushd, popd to safe-verbs.linux.json. Adds the same plus PowerShell equivalents (Set-Location, Push-Location, Pop-Location) to safe-verbs.windows.json.
Rationale: a cd clause has a path arg (the cd target) that the zone gate evaluates. If the target is untrusted the zone gate prompts; if trusted (or after user approval), the verb gate should auto-pass for the cd itself. cd has no side effects beyond changing the spawn cwd of subsequent compound clauses, which the parser already attributes to those clauses' path args. Making cd a read-only verb in the safe-verbs list lets the verb-pattern gate apply the existing "read-only in trusted zone → auto-pass" rule uniformly, instead of carving out a separate verb-class exception in code.
This is the data-driven implementation of the design choice locked in the GateEvaluator design conversation: cd auto-passes at Layer 3 when zone gate passes. Encoded in the shipped read-only list rather than as a code-level rule, so the policy is visible in one place and follows the same immutable-at-runtime posture as the rest of the safe-verbs.
Implements task 5.1 refinement of approval-policy-trust-zones (per the GateEvaluator design conversation: "add cd and other CWD verbs to the shipped read-only verb list so the rule is explicit/data-driven").
Build: green. 5/5 SafeVerbLoader tests pass.

…rd-deny)
Implements the core decision engine for the trust-zones approval architecture. Composes the three layers (hard-deny → zone gate → verb-pattern gate) into a single GateEvaluation the workflow consumes.
New types:
- TrustState: composed view of all four trust sources for one audience evaluation (audience baseline + persisted zones + session zones + session_dir, plus persisted verb patterns + session verb patterns + shipped read-only verb list). Normalizes globs at construction time (~ expansion, trailing-slash strip, /* → directory). Exposes IsPathInTrustedZone (path-prefix recursive with directory-boundary safety), IsReadOnlyVerb (against safe-verbs list), and MatchesVerbPattern (verb-chain prefix + arg-glob suffix).
- GateEvaluation + ClauseGateResult + ZonePromptInfo + VerbPromptInfo: output records the workflow walks. OverallGateDecision enum (Approved | NeedsPrompt | HardDenied), plus per-layer decision enums (ZoneGateDecision, VerbGateDecision) with explicit Skipped / NotEvaluated states for short-circuit cases.
- GateEvaluator: takes ShellCommandPolicy + IShellParser in constructor. Two Evaluate overloads (raw-command string or pre-parsed AST). Per-clause logic:
  - Layer 1 short-circuits the clause to Skipped/NotEvaluated.
  - Layer 2 extracts path args + redirect targets + cd-in-compound attribution; checks each against TrustState; collects untrusted paths.
  - Layer 3 unconditional read-only auto-pass (the zone gate handles geography; read-only is geography-independent at the verb layer); pattern match against verbPatterns; otherwise propose <verb-chain> * for the Always button.
- Cross-clause aggregation: dedup'd union of untrusted paths into a single ZonePromptInfo (multi-path batching for the trust-all-or-nothing button per the locked design); first verb-prompt-needing clause surfaces its proposed pattern in VerbPromptInfo.
- Unparseable input routes to safe-fail: hard-deny still consulted against raw source; ZonePrompt carries the raw command as a single untrusted path; VerbPrompt populated for Once/Deny-only rendering (workflow uses GateEvaluation.IsUnparseable to constrain options).
Tests (21 covering every spec scenario):
- Hard-deny short-circuit (first clause, second clause, against unparseable input via rawText fallback).
- Read-only verb in trusted zone → silent.
- Read-only verb outside zone → one zone prompt.
- Mutating verb in trusted zone → one verb prompt.
- Mutating verb outside zone → both prompts.
- Multi-path zone batching (cp /etc/foo /var/log/bar → batched prompt).
- Dedup across clauses (cd /etc && cat /etc/hosts → no path duplicates).
- cd-in-compound attribution propagates target to subsequent clauses.
- cd to untrusted dir → zone prompt only (read-only auto-pass at verb).
- Mixed-zone clause with read-only verb → zone prompt only.
- Persisted + session verb patterns → silent pass.
- Pattern doesn't match different verb (git pull won't match git push *).
- Session-scope zones → silent pass.
- Redirect target outside zone → zone prompt only.
- Dynamic-skip arg ($VAR) excluded from zone gate.
- Unparseable safe-fail (zone + verb prompts populated, IsUnparseable flag set, raw command carried into prompts).
- Audience wire-value surfaces in prompt info.
Notable design choices captured in code comments:
- Read-only verb auto-pass is UNCONDITIONAL at Layer 3 (not conditional on all-paths-trusted): the zone gate handles geography, the verb gate handles action shape. Read-only actions are geography-independent at the verb layer. This produces the spec-mandated "user sees exactly one prompt" behavior for read-only verbs on mixed-zone clauses.
- TrustState normalizes zones once at construction; matcher does no per-call resolution beyond path-prefix comparison.
- IsCwdAttribution args are excluded from MatchesVerbPattern but included in ExtractClausePaths. They're path context for zone gating but not args the verb pattern should match against.
Implements task 5.5 of approval-policy-trust-zones (the GateEvaluator itself plus its supporting record types). The trust-state composition plumbing (loading baseline from netclaw.json, wiring session zones from LlmSessionActor, etc.) lands in subsequent commits when the workflow is wired up.
Build: green. 539/539 security tests pass (21 new GateEvaluator tests). Slopwatch clean. File headers present.
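The interaction between the zone gate and the verb gate described in this commit message can be modeled in a few lines. The Python sketch below is a deliberately reduced illustration: hard-deny, glob-based verb patterns, parsing, and cross-clause aggregation are all omitted, approved verbs are an exact-match set, paths are assumed to be normalized, and the names `evaluate_clause` and `is_path_in_trusted_zone` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def is_under(path: str, root: str) -> bool:
    # Path-prefix check with directory-boundary safety.
    root = root.rstrip("/") or "/"
    return path == root or path.startswith(root if root == "/" else root + "/")


def is_path_in_trusted_zone(path: str, zones: Iterable[str]) -> bool:
    return any(is_under(path, zone) for zone in zones)


def evaluate_clause(verb: str, paths: List[str], zones: List[str],
                    read_only_verbs: Set[str],
                    approved_verbs: Set[str]) -> Tuple[List[str], bool]:
    """Returns (untrusted paths for the zone prompt, needs verb prompt)."""
    # Zone gate: every path is checked independently.
    untrusted = [p for p in paths if not is_path_in_trusted_zone(p, zones)]
    # Verb gate: read-only verbs auto-pass regardless of geography;
    # the zone gate has already accounted for where the command points.
    needs_verb_prompt = verb not in read_only_verbs and verb not in approved_verbs
    return untrusted, needs_verb_prompt


zones = ["/home/u/project"]
read_only = {"cat", "ls", "cd"}
```

Under this model a read-only verb outside a trusted zone yields exactly one prompt (the zone prompt), while a mutating verb outside a zone yields both, matching the scenarios in the test list above.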

Brings in two upstream fixes: - netclaw-dev#948 systemd unit PATH fix + SystemdUnitPathDoctorCheck - netclaw-dev#950 doctor warning suppression for explicit Personal posture Conflict resolution: netclaw-operations SKILL.md version bumped to 2.1.0 (combining the 2.0.0 path-extraction-era content already on this branch with dev's 1.28.0 systemd table-row addition). The skill content itself will be rewritten in task 11.4 of the trust-zones change when the implementation completes; this merge just preserves both sets of operational guidance for the interim. Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1522, plus all others). Slopwatch clean. Headers present.

…sion)

Implements task 5.7 of approval-policy-trust-zones (the composer that feeds GateEvaluator's TrustState input). Pulls together all four trust sources for one gate evaluation:

TrustState =
  audience-baseline-zones (netclaw.json trust profile ReadFiles.Roots)
  ∪ persisted zones (AudienceTrustStore.GetTrustedZones)
  ∪ session-scope zones (from LlmSessionActor in-memory, passed in per call)
  ∪ session_dir (always trusted)
  + verb patterns from AudienceTrustStore + session list
  + shipped SafeVerbList (embedded resource)

The composer is captured at DI registration time with the audience profiles, the AudienceTrustStore singleton, and the SafeVerbList. Per-call inputs (session_dir, session-scope grants) flow through the Compose() method. No mutable state inside the composer, so it is safe to register as a singleton and call concurrently across sessions.

Files added:
- src/Netclaw.Security/TrustStateComposer.cs (~85 LOC)
- src/Netclaw.Security.Tests/TrustStateComposerTests.cs (10 tests)

Tests cover:
- Audience-specific baseline zones come from the matching profile (Personal vs Team vs Public).
- Persisted zones from AudienceTrustStore overlay correctly.
- Session-scope zones passed per-call overlay correctly.
- session_dir is always trusted regardless of audience.
- Persisted + session verb patterns are surfaced.
- SafeVerbList is reachable via IsReadOnlyVerb.
- Tilde expansion in zones uses the composer's home override.
- Unknown audience values throw with a clear message.

Also adds tasks.md updates capturing the scripts-as-units-of-approval guidance (task 11.3 expansion + new task 11.5) per the design conversation: agent persists multi-step / reusable scripts into the project workspace (already a trusted zone), proposes execution as a single approval, and reminders/webhooks fire bash <workspace>/<script> against the same pre-approved verb pattern. Slack/Discord adapters will render script contents inline in the approval prompt body when the command is a script invocation.

Build: green. 549/549 security tests pass (10 new). Slopwatch clean. File headers present.
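For readers who want the union spelled out, here is a minimal Python sketch of the composition described above. It is an illustration only, not the C# TrustStateComposer: the function names, the frozenset representation, and the POSIX-only prefix check are assumptions, and tilde expansion and the SafeVerbList are left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TrustState:
    zones: frozenset          # trusted directory roots
    verb_patterns: frozenset  # persisted + session verb patterns


def compose(baseline_zones, persisted_zones, session_zones, session_dir,
            persisted_patterns=(), session_patterns=()):
    """Union the four zone sources; session_dir is always included."""
    zones = (frozenset(baseline_zones) | frozenset(persisted_zones)
             | frozenset(session_zones) | {session_dir})
    patterns = frozenset(persisted_patterns) | frozenset(session_patterns)
    return TrustState(zones=zones, verb_patterns=patterns)


def is_path_in_trusted_zone(state, path):
    """True when path equals a trusted zone or sits beneath one."""
    candidate = PurePosixPath(path)
    for zone in state.zones:
        root = PurePosixPath(zone)
        if candidate == root or root in candidate.parents:
            return True
    return False
```

Because compose() builds a new immutable value on every call and keeps no state of its own, the sketch mirrors the "safe to call concurrently" property claimed for the composer.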

Wires the trust-zones-rewrite components (HardDenyOverridesLoader, AudienceTrustStore, GateEvaluator, TrustStateComposer, IShellParser) into the daemon's DI container at Program.cs:648-722. v2 ToolApprovalStore remains the authoritative approval-decision path for shell tools until ToolAccessPolicy gets the gate-evaluator integration in a follow-up commit; the new services exist as singletons ready to consume.

Files:
- NetclawPaths: adds TrustZonesPath (~/.netclaw/config/trust-zones.json, sibling to the v2 tool-approvals.json) and HardDenyOverridesPath (~/.netclaw/config/hard-deny-overrides.json). The new store is at a sibling path during transition so the v2 store can keep handling existing entries without conflict; the two file shapes are incompatible (AudienceTrustStore.Load would archive a v2-shape file on first read).
- Program.cs:
  - HardDenyOverridesLoader.Load() invoked at startup; loaded rules flow into ShellCommandPolicy's new (additionalDenyPatterns, overrideRules) constructor. Malformed override file throws InvalidDataException with operator context and refuses startup rather than silently dropping rules.
  - AudienceTrustStore registered with TrustZonesPath as singleton.
  - IShellParser registered via services.AddShellParser() (BashParser impl from the ShellSyntaxTree package).
  - GateEvaluator registered as singleton wrapping ShellCommandPolicy + IShellParser.
  - TrustStateComposer registered as singleton wrapping toolConfig.AudienceProfiles + AudienceTrustStore + SafeVerbList. Per-call session_dir and session-scope grants flow through Compose().
  - using ShellSyntaxTree; added to the Program.cs import list.

Behavior on swap: nothing changes for existing sessions. v2 ToolApprovalStore still gates shell tools. The new services are constructed at startup but unused until ToolAccessPolicy integration lands. HardDenyOverrides will activate immediately if an operator drops a hard-deny-overrides.json file into ~/.netclaw/config/; the shipped defaults remain in force regardless.

Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1522, plus Configuration/Security/Channels). Slopwatch clean. File headers present.

Next: ToolAccessPolicy.AuthorizeInvocation gets the gate-evaluator integration path (feature-flagged so v2 remains the default while operators opt in for testing).

…essPolicy

Activates the trust-zones approval pipeline as an auto-allow fast path ahead of the v2 (verb, directory) matcher. The integration is deliberately minimal: GateEvaluator only short-circuits to silent execution when it returns Approved (read-only verb in trusted zone, or clause matches a persisted/session verb pattern). If the new evaluator hits HardDenied, the call denies with the gate's reason. For NeedsPrompt, control falls through to the existing v2 5-button prompt path so user-facing approval UX stays unchanged until the prompt builder rewrite lands.

This is the load-bearing change for reminder reliability. Operators configure broad audience baseline zones in netclaw.json (ToolAudienceProfile.ReadFiles.Roots) and pre-approve verb patterns via the CLI; reminders firing read-only verbs inside trusted zones now auto-allow at the GateEvaluator without ever entering the prompt-required v2 code path that previously blocked them.

ToolAccessPolicy.cs changes:
- Constructor accepts optional GateEvaluator + TrustStateComposer (back-compat: existing call sites without these parameters get identical v2-only behavior).
- CheckApprovalGate's shell branch invokes GateEvaluator ahead of the v2 safe-verb short-circuit when: isShell && !isMessy && both new services are present && context.SessionDirectory is set && shellCommand is non-empty. Builds TrustState via composer (with empty session-scope lists for now; those plumb through LlmSessionActor when the prompt UI moves to the new 4-button shape). Evaluates command; on Approved returns Allow; on HardDenied returns Deny with the GateEvaluator's category prefix; on NeedsPrompt falls through to v2.
- Existing safe-verb short-circuit retained as fallback for sessions that don't exercise the GateEvaluator fast path.

Program.cs changes:
- Constructs BashParser, GateEvaluator, TrustStateComposer locally; registers each as singleton.
- Passes GateEvaluator + TrustStateComposer into ToolAccessPolicy constructor.

What this delivers for binary-swap testing:
- Reminders that hit read-only verbs (grep, cat, ls, find, etc.) inside the audience's baseline trusted zones (configured in netclaw.json's ToolAudienceProfile.ReadFiles.Roots) now run silently without any approval prompt.
- Verb patterns pre-approved in the new AudienceTrustStore at ~/.netclaw/config/trust-zones.json auto-pass.
- Existing v2 approvals (~/.netclaw/config/tool-approvals.json) are still consulted for any command that falls through to v2.
- Hard-deny rules in the new structured DSL (compiled defaults + hard-deny-overrides.json) fire here.

What's NOT yet wired (deferred to subsequent commits):
- v2 button clicks (Once / This chat / Always here / Always anywhere) still write to the v2 ToolApprovalStore, not the new AudienceTrustStore. Operators populate trust-zones.json via direct CLI commands or config edit; the daemon never writes to it.
- Session-scope grants are passed as null to the composer; LlmSessionActor plumbing lands when prompts switch to the new 4-button shape.
- 4-button prompt UI not built yet; existing v2 5-button row renders.

Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1522, plus all others). Slopwatch clean. File headers present.
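The three-way dispatch described in this commit can be pictured with the short Python sketch below. It is not the ToolAccessPolicy code: the enum, the tuple return shape, and the callback name are assumptions chosen only to show that Approved and HardDenied terminate at the gate while NeedsPrompt defers to the v2 path.

```python
from enum import Enum


class GateResult(Enum):
    APPROVED = "approved"
    HARD_DENIED = "hard_denied"
    NEEDS_PROMPT = "needs_prompt"


def check_approval_gate(gate_result, deny_reason, fall_through_to_v2):
    """Fast path: only Approved and HardDenied are decided here."""
    if gate_result is GateResult.APPROVED:
        return ("allow", None)
    if gate_result is GateResult.HARD_DENIED:
        return ("deny", deny_reason)
    # NeedsPrompt: the existing v2 prompt path stays in charge.
    return fall_through_to_v2()
```

Keeping the NeedsPrompt branch as a plain fall-through is what lets the user-facing prompt stay unchanged while the fast path is being trialled.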

Approvals are user decisions, not clock-bounded operations. The 5-minute auto-deny manufactured a race condition: when a user took longer than 5 minutes to click an approval button, the workflow's TaskCompletionSource silently transitioned to TimedOut. The late click then arrived at an already-terminated workflow, routing through a buggy retry path that re-authorized the tool, didn't find a matching grant, and threw ToolApprovalRequiredException, surfacing to the user as "I encountered an error executing a tool" with a correlation ID.

No security benefit: the user is the authority. If they need 20 minutes (or 2 hours) to evaluate a prompt, that's their call. The system should wait, not pre-empt the decision on a timer.

Changes:
- SessionToolExecutionPipeline.cs:77: default for null approvalTimeout flipped from TimeSpan.FromMinutes(5) to Timeout.InfiniteTimeSpan, matching the existing default at line 274 and LlmSessionActor's explicit InfiniteTimeSpan pass at line 1678. Net: every caller that passes null now waits forever for user response.
- openspec/specs/tool-approval-gates/spec.md: Mid-turn approval pause requirement rewritten to mandate indefinite wait; the "Approval timeout auto-denies" scenario replaced with "Approval pause waits indefinitely for user response."
- openspec/changes/approval-policy-trust-zones/specs/tool-approval-gates/spec.md: same edits in the trust-zones delta spec.

Session passivation and daemon-restart approval recovery are tracked separately in netclaw-dev#939 and remain out of scope for this fix. This commit only removes the clock-driven auto-deny that was generating false errors on legitimate late clicks.

Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1522, plus all others). Slopwatch clean. Specs re-validate.

…licy-v2 # Conflicts: # src/Netclaw.Actors/Sessions/LlmSessionActor.cs # src/Netclaw.Cli.Tests/Cli/DaemonClientMappingTests.cs

Messy commands (bash control-flow `for/while/case`, unbalanced quotes/brackets) cannot have verb-chain patterns extracted, so the approval prompt only offers Once + Deny and ApprovalContext.Patterns is empty. The IsOneTimeApprovalSatisfied retry-bypass guard required both sides' patterns to match. Empty == empty technically passes the All() check, but the guards on lines 200/203 short-circuited to return false on empty inputs. Result: clicking Once on a complex bash command landed the retry into AuthorizeCoreAsync, hit the guards, and threw ToolApprovalRequiredException, surfacing to the user as "I encountered an error executing a tool" with a correlation ID.

Repro on production: session D0AC6CKBK5K/1778542266.328629 on 2026-05-11 ran a `for repo in ...; do ... worktree list ...; done` loop. Prompt fired with "complex command — only Once available." User clicked Once 28 minutes later (no longer auto-denied at 5 min since af645f7). Click landed on the now-living workflow, retry hit the bypass guard, failed with the same error message that previously came from late-click-on-timed-out-workflow.

Fix: reorder IsOneTimeApprovalSatisfied so messy commands take a tool-name-match-only fast path before the empty-patterns guards. The per-retry cleanup at SessionToolExecutionPipeline.cs:467-475 still clears OneTimeApprovedToolName afterward, so the messy bypass cannot be reused for subsequent calls; bypass scope stays per-call as designed.

Test added (DispatchingToolExecutorTests.One_time_approval_bypasses_for_messy_command_via_tool_name_match):
- Sets up shell_execute with `for i in 1 2 3; do echo $i; done`.
- First attempt throws ToolApprovalRequiredException with ApprovalContext.IsMessy=true and Patterns=[].
- Sets OneTimeApprovedToolName, leaves OneTimeApprovedPatterns empty (because Patterns is empty per messy contract).
- Second attempt succeeds (no exception).
- After bypass cleanup, third attempt re-throws (proves bypass is per-retry only).

Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1528, plus Configuration/Security/Channels). Slopwatch clean. File headers present.
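The guard reordering is easier to see in miniature. The Python sketch below is an assumption-laden stand-in for IsOneTimeApprovalSatisfied: the parameter names and the "every requested pattern must be approved" rule are illustrative guesses, and only the ordering (messy fast path before the empty-pattern guards) reflects the fix described above.

```python
def is_one_time_approval_satisfied(tool_name, is_messy, requested_patterns,
                                   approved_tool_name, approved_patterns):
    """Decide whether a retry may bypass the prompt exactly once."""
    if approved_tool_name is None or approved_tool_name != tool_name:
        return False
    # Messy commands carry no extractable patterns, so the tool-name
    # match above is the whole check. This must run BEFORE the guards.
    if is_messy:
        return True
    # Guards: an empty pattern list on either side never satisfies.
    if not requested_patterns or not approved_patterns:
        return False
    return all(p in approved_patterns for p in requested_patterns)
```

With the messy branch placed after the guards instead, the first call in the usage below would return False, which is the production failure this commit describes.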

Refactors the messy-command ApprovedOnce test to follow the akka-testing-patterns skill: inheriting from Akka.Hosting.TestKit.TestKit instead of manual ActorSystem.Create + try/finally + Terminate. Test moves to its own file (MessyCommandOneTimeApprovalTests.cs) with proper ConfigureServices / ConfigureAkka overrides and ActorRegistry-based actor retrieval.

The original DispatchingToolExecutorTests file's other one-time-approval tests still use the manual ActorSystem.Create pattern; those are a pre-existing test smell that could be similarly refactored in a follow-up but are out of scope for this fix. The messy-command test gets a fresh file as the seed for proper-pattern adoption; future test additions in this area should follow the same shape.

Per the akka-testing-patterns skill:
- Inherits from Akka.Hosting.TestKit.TestKit (framework class)
- ConfigureServices wires ToolConfig + EffectivePolicyDefaults + ToolAccessPolicy + ToolRegistry into the host's DI container
- ConfigureAkka registers ToolApprovalActor in the ActorRegistry
- Test method retrieves dependencies via Host.Services and ActorRegistry, with no manual lifecycle code
- Automatic actor system shutdown via TestKit fixture teardown (no try/finally for system.Terminate())

Functional behavior unchanged: same scenario, same assertions (IsMessy=true, empty Patterns, ApprovedOnce satisfies bypass on tool-name match alone, per-retry cleanup re-prompts).

1528/1528 Actors tests pass. Build: green. Slopwatch clean. File headers present.

Picks up the bash-comment lexing fix per Aaronontheweb/ShellSyntaxTree#25: `#` starting a token outside quotes now begins a comment that runs to end-of-line and produces no AST tokens. Closes the cascade where comment text was extracted as the leading verb of the clause it sat in, leading to:

1. Misleading approval prompts: "Approve `# Get` in <dir>?" instead of "Approve `git pull` in <dir>?"
2. Persistence-versus-recheck verb-set mismatches that broke ApprovedSession on commented commands. The persistence path saw one verb set ([# Get, echo]); the retry-authorization path saw a different finer-grained set ([# Get, curl, jq, echo]) when decomposed per-pipeline-segment. The unmatched verbs (curl, jq) threw ToolApprovalRequiredException, surfacing as "I encountered an error executing a tool" in Slack.

Both symptoms collapse once `BashLexer` strips comments at lex time: both extraction passes see the same clauses with the same leading verbs, so they agree.

Adds two regression smoke tests in ShellSyntaxTreeIntegrationTests:
- Leading_line_comment_is_stripped_from_clause_extraction: confirms `# fetch the latest\ngit pull origin main` produces one clause with verb chain `git pull` (not `# fetch`).
- Hash_inside_double_quotes_is_not_a_comment: confirms `#` inside double quotes is preserved as a literal arg character (POSIX rule that `#` is only a comment when starting a word AND outside quotes).

Build: green. Full test suite passes (Cli 640, Daemon 504, Actors 1528, plus Configuration/Security 9 new ShellSyntaxTree integration tests). Slopwatch clean. File headers present.
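As a rough illustration of the POSIX rule quoted above, the Python sketch below strips a `#` comment only when the `#` starts a word outside quotes. It is a hand-rolled approximation, not the BashLexer from ShellSyntaxTree: the function name and the set of word-separator characters are assumptions, and constructs such as here-documents and `$(...)` are ignored.

```python
def strip_comments(command):
    """Drop '#...' to end-of-line when '#' begins a word outside quotes."""
    out = []
    quote = None       # active quote character, or None when unquoted
    word_start = True  # True when the next character would begin a word
    i, n = 0, len(command)
    while i < n:
        ch = command[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and quote == '"' and i + 1 < n:
                out.append(command[i + 1])  # escaped char inside "..."
                i += 1
            elif ch == quote:
                quote = None
            word_start = False
        elif ch == "#" and word_start:
            while i < n and command[i] != "\n":
                i += 1
            continue  # the newline, if any, is handled next iteration
        elif ch == "\\" and i + 1 < n:
            out.append(command[i:i + 2])  # backslash-escaped character
            i += 1
            word_start = False
        else:
            out.append(ch)
            if ch in "'\"":
                quote = ch
                word_start = False
            else:
                word_start = ch in " \t\n;&|("
        i += 1
    return "".join(out)
```

Running any verb extraction on the stripped text, rather than the raw text, is what makes the prompt-time and retry-time passes agree on the leading verb.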

…oser When an audience's ReadFiles profile is configured with `Mode: All`, the composer was still only reading the (now-meaningless) Roots list, handing TrustState an empty zone set. The zone gate then prompted on every path operand even though the operator had explicitly declared the audience as filesystem-unrestricted at the profile layer. The composer now detects Mode=All and threads `trustsAllPaths: true` into TrustState, which short-circuits IsPathInTrustedZone. Mode=Roots and Mode=None still rely on the explicit Roots list (None typically empty). The verb-pattern gate is unaffected — mutating verbs without a pattern match still prompt regardless. Geography is the only thing waived here.
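A small Python sketch of the short-circuit, under stated assumptions: the mode strings `All`, `Roots`, and `None` come from the commit text, while the function names, the tuple return shape, and the POSIX prefix check are hypothetical and do not mirror the C# TrustState API.

```python
from pathlib import PurePosixPath


def zone_inputs(mode, roots):
    """Map a ReadFiles profile to (trusts_all_paths, zones)."""
    if mode == "All":
        # Roots is meaningless under Mode=All, so it is not consulted.
        return True, frozenset()
    return False, frozenset(roots)


def is_path_in_trusted_zone(path, trusts_all_paths, zones):
    """Geography check only; the verb-pattern gate is separate."""
    if trusts_all_paths:
        return True
    candidate = PurePosixPath(path)
    return any(candidate == PurePosixPath(z) or PurePosixPath(z) in candidate.parents
               for z in zones)
```

Nothing here decides whether a mutating verb may run; the sketch only covers the zone question that Mode=All waives.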

0.1.4-alpha lands the issue netclaw-dev#27 fix: the parser extends the verb chain greedily through every "verb-like" token until it hits a flag (-x) or a path (anything containing / or .). Production hit on `git worktree list` (extracted as `git worktree`, mismatching at retry time) is fixed; multi-token CLI subcommands now extract cleanly without per-CLI tables.

Side effects:
- Auto-proposed verb patterns are narrower. `git push origin main` now proposes `git push origin main *` instead of `git push *`. This is intentionally tighter: approving the specific argument set is safer than approving the whole verb family.
- Test expectations updated for the new shape. Three new integration cases cover stop-at-flag, stop-at-path, and the multi-token CLI subcommand regression directly.
- TrustState xmldoc updated: verb-pattern matching is exact verb-chain equality + arg-glob suffix, so a stale `git push *` no longer matches a `git push origin main` invocation. Operators with persisted `git push *` from older runs will be re-prompted on the new shape.

557 Security tests + 314 Configuration tests + 1528 Actors tests pass.
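To make the "exact verb-chain equality + arg-glob suffix" rule concrete, here is a hedged Python sketch. The stop-at-flag / stop-at-path rule is taken from the text above; the pattern grammar (verb-chain tokens followed by a glob) and all function names are assumptions, not the TrustState implementation.

```python
import shlex
from fnmatch import fnmatchcase


def extract_verb_chain(tokens):
    """Greedy chain: stop at the first flag or path-like token."""
    chain = []
    for tok in tokens:
        if tok.startswith("-") or "/" in tok or "." in tok:
            break
        chain.append(tok)
    return chain


def split_pattern(pattern):
    """Assumed shape: verb-chain tokens, then a trailing arg glob."""
    tokens = shlex.split(pattern)
    chain = []
    for tok in tokens:
        if "*" in tok:
            break
        chain.append(tok)
    return chain, " ".join(tokens[len(chain):])


def pattern_matches(pattern, command):
    """Chains must be equal; remaining args must satisfy the glob."""
    tokens = shlex.split(command)
    chain = extract_verb_chain(tokens)
    wanted_chain, arg_glob = split_pattern(pattern)
    if chain != wanted_chain:
        return False
    return fnmatchcase(" ".join(tokens[len(chain):]), arg_glob)
```

Under these assumptions `pattern_matches("git push *", "git push origin main")` is False, because the command's greedy chain is four tokens long while the stale pattern's chain is two.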

…licy-v2

The v2 approval matcher's `ExtractVerbChain` was capping at depth 2, truncating multi-token CLI subcommands like `freshdesk ticket list` and `git worktree list` to two tokens (`freshdesk ticket`, `git worktree`). The truncation surfaced two ways in production:
- Approval prompts displayed misleading verb names ("Approve `freshdesk ticket` in this session?" for what is really `freshdesk ticket list`).
- Verb-chain mismatch between approval-prompt time and retry time threw `ToolApprovalRequiredException` mid-flight, surfacing as "I encountered an error executing a tool" with a correlation ID.

The 0.1.4-alpha ShellSyntaxTree bump shipped a greedy verb-chain extractor (issue netclaw-dev#27), but it was only wired into the new GateEvaluator/TrustStateComposer code path, which isn't on the live runtime approval flow yet (trust-zones milestones B-N pending). This puts it on the v2 path so the live prompt benefits immediately.

`ShellApprovalSemanticsBase.ExtractVerbChain` now delegates to `BashParser.Parse(...).Clauses[0].Verb.Joined` for greedy extraction. The path-aware/side-effect short-circuit (cap at depth 1 for cat, grep, find, ls, echo, printf, etc.) is preserved as a post-check so positional search patterns and target paths don't bake into persisted approval keys (`grep secret /var/log/syslog` still extracts as `grep` alone). `maxDepth` is now an upper bound rather than a default cap; the default is `int.MaxValue` so callers get greedy extraction unless they explicitly request a tighter chain.

Tests:
- ShellTokenizerTests: existing depth-2 expectations updated to greedy shape (`git push origin main`, `kubectl delete pod my-pod`, `docker compose up`). Path-aware verbs unchanged (cat, grep, ls still cap at 1).
- New regression theory `ExtractVerbChain_extracts_multi_token_cli_subcommands` pins the production hits: freshdesk ticket list, git worktree list, gh pr view, kubectl get pods.
- ToolApprovalGateTests: gate test renamed and re-asserted to expect the greedy chain on `git push origin main`.

561 Security + 1530 Actors + 314 Configuration tests pass.
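The post-check and the `maxDepth` upper bound can be sketched in a few lines of Python. This is not the C# `ExtractVerbChain`, which delegates to BashParser; the tokenization via `shlex`, the flag/path stop rule (borrowed from the ShellSyntaxTree commit message), and the short verb set are simplifying assumptions.

```python
import shlex
import sys

# Only the verbs named in the commit message; the real list is longer.
PATH_AWARE_VERBS = frozenset({"cat", "grep", "find", "ls", "echo", "printf"})


def extract_verb_chain(command, max_depth=sys.maxsize):
    """Greedy verb chain with a depth-1 cap for path-aware verbs."""
    chain = []
    for tok in shlex.split(command):
        if tok.startswith("-") or "/" in tok or "." in tok:
            break  # flags and path-like tokens end the chain
        chain.append(tok)
    # Post-check: search patterns and targets must not enter the key.
    if chain and chain[0] in PATH_AWARE_VERBS:
        chain = chain[:1]
    # max_depth is an upper bound, not a default truncation.
    return " ".join(chain[:max_depth])
```

Calling `extract_verb_chain("git push origin main", max_depth=2)` reproduces the old two-token shape, which shows why the default had to become unbounded for the prompt-time and retry-time chains to agree.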

…licy-v2

…ates

Compound commands like `cd /repo && git checkout -b feature/foo` were failing ApprovedSession retry with ToolApprovalRequiredException. Root cause: the v2 matcher extracted each clause's path argument independently. The `git checkout` clause has no anchored path arg of its own (feature/foo's slash doesn't make it a path token), so the candidate landed as (git checkout, null). At persistence time the candidate inherited cwd → session_dir → the session-scratch guard at PersistApprovalCandidatesAsync dropped the grant on the floor → retry saw the verb as unapproved → throw → "I encountered an error executing a tool".

ShellSyntaxTree 0.1.4-alpha already tracks cd-in-compound cwd propagation: every clause that follows a `cd X` in the same compound (including across `bash -c "..."` boundaries) carries a synthetic IsCwdAttribution arg whose Resolved is the absolute tilde-expanded cd target. The matcher now consumes that signal:
- POSIX ExtractCandidates iterates BashParser.Parse(command).Clauses directly. For each clause: own anchored path arg wins over cd attribution, then falls back to cwd attribution. Consecutive Pipe clauses fold into one approval unit so `cat /etc/hosts | wc -l` stays one decision.
- Side-effect verbs (echo, printf, :, true, false) opt out of cd inheritance: they ignore cwd anyway and the IsPureSideEffect skip in ApprovalPatternMatching requires a null Directory.
- Windows keeps the legacy ShellTokenizer-based path; ShellSyntaxTree is bash-only.

Tests added in ShellApprovalMatcherTests:
- cd target propagates to verbs without anchored path args
- Multiple cd hops: latest wins
- bash -c "cd X && verb": verb inherits X
- Explicit own path beats cd attribution (`cd /tmp && dotnet test /home/foo` → /home/foo)
- Side-effect verbs do not inherit cd
- Pipe chain collapses into a single candidate

Pre-existing cd-target test updated to assert the new contract (subsequent verb's Directory inherits cd target instead of staying null) and to use generic path placeholders rather than developer-specific home directories.

567 Security + 1537 Actors + 314 Configuration tests pass.
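The precedence rules listed above (own path beats cd attribution, latest cd wins, side-effect verbs opt out) can be sketched in Python as follows. This is a deliberately naive stand-in for ExtractCandidates: it splits on `&&` textually, treats only the first token as the verb, recognizes only absolute paths, and skips pipes and `bash -c`, all of which are simplifying assumptions.

```python
import shlex

SIDE_EFFECT_VERBS = frozenset({"echo", "printf", ":", "true", "false"})


def extract_candidates(command):
    """Return [(verb, directory)] for each &&-separated clause."""
    candidates = []
    cwd = None  # target of the most recent cd in this compound
    for clause in command.split("&&"):
        tokens = shlex.split(clause)
        if not tokens:
            continue
        verb, args = tokens[0], tokens[1:]
        if verb == "cd" and args:
            cwd = args[0]  # a later cd overrides an earlier one
            candidates.append(("cd", cwd))
            continue
        own_path = next((a for a in args if a.startswith("/")), None)
        if verb in SIDE_EFFECT_VERBS:
            directory = None  # side-effect verbs never inherit cwd
        else:
            directory = own_path or cwd
        candidates.append((verb, directory))
    return candidates
```

With this shape, `cd /repo && git checkout -b feature/foo` yields a candidate whose directory is `/repo` rather than null, so the grant no longer falls back to the session directory.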

The approval prompt for `cd ~/x && git checkout -b feature/foo` was displaying "Saved for this chat: cd, git checkout in 2 directories" even though both clauses operate on the same folder. The header's distinct-directory counter saw two strings (`~/x` from cd's own path arg and `/home/<user>/x` from git checkout's inherited cwd attribution) and reported them as separate locations.

The mismatch was self-inflicted: ExtractCandidates' path branch read `arg.Raw` (user-facing form) while the cwd attribution branch reads `arg.Resolved` (parser-resolved absolute form, which is always pre-expanded for ~/, $HOME, and relative segments). For the same logical directory, the two branches produced different strings.

Switch the path branch to `arg.Resolved` when available (falling back to `arg.Raw` for arg kinds where the parser doesn't resolve). Path classification still runs on `arg.Raw` so branch names whose Resolved happens to look path-like don't get misclassified.

Tests:
- New: ExtractCandidates_normalizes_tilde_cd_to_absolute_path_so_clauses_share_one_directory pins the production header behavior.
- Updated: ExtractCandidates_applies_file_parent_rule now expects the absolute home directory (Environment.GetFolderPath) rather than the raw `~`, matching the new canonicalization contract.
- Sanitized: ExtractCandidates_strips_path_from_verb no longer uses /home/petabridge in its test fixture.

568 Security + 1537 Actors tests pass.
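The counting bug reduces to comparing un-normalized strings. A minimal Python sketch, assuming a caller-supplied home directory and POSIX paths (the function names are hypothetical and only `~` expansion is handled, not `$HOME` or relative segments against a cwd):

```python
import posixpath


def canonical_directory(raw, home):
    """Expand a leading ~ and normalize, mimicking a 'Resolved' form."""
    if raw == "~":
        return home
    if raw.startswith("~/"):
        return posixpath.normpath(posixpath.join(home, raw[2:]))
    return posixpath.normpath(raw)


def distinct_directory_count(directories, home):
    """Count logical directories, not distinct spellings."""
    return len({canonical_directory(d, home) for d in directories})
```

Counting the raw strings `~/x` and `/home/user/x` gives 2, while counting their canonical forms with `home="/home/user"` gives 1, which is the header behavior the new test pins.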

ApprovedSession on standalone shell verbs with no anchored path argument (curl https://..., gh pr list, git status) failed retry with ToolApprovalRequiredException. Repro: D0AC6CKBK5K/1778588682.018849, 13:37:31-39, four parallel curl calls to api.github.com, each ApprovedSession sequentially. 1ms after the final approval, the executor threw and the whole turn failed.

Root cause: in PersistApprovalCandidatesAsync, every candidate's effective directory fell back to pending.Cwd when candidate.Directory was null. For a curl call with a URL operand (URLs are explicitly excluded from IsPathToken) and no preceding `cd` to attribute, the candidate's Directory was null → effectiveDirectory resolved to the session directory → the session-scratch dead-on-arrival guard fired and skipped the verb entirely. The verb never landed in ToolApprovalActor._sessionApprovals; retry's IsApproved check returned false; throw.

Session-scope entries are matched verb-only at lookup time: ToolApprovalActor._sessionApprovals is keyed by (sessionId, audience, tool) and IsSessionApproved consults only the verb, never the directory. Threading session_dir through here just fed the dead-on-arrival filter that drops folder-scoped entries with no viable scope.

Fix: only apply the cwd fallback + session-scratch guard to persistent scope (Always / Everywhere); for session scope use candidate.Directory directly so verbs with null Directory persist under the null-directory bucket and remain queryable.

Refactored the bucketing branch into the internal-static `LlmSessionActor.BuildApprovalBuckets(...)` helper so the scope-vs-directory logic is unit-testable without spinning up an actor system. Added BuildApprovalBucketsTests pinning:
- Session scope + no path arg → verb persists in null-directory bucket
- Session scope + concrete directory → verb persists in that bucket
- Persistent scope + cwd resolving to session_dir → dropped (existing)
- Persistent scope + concrete directory → kept (existing)
- Global wildcard → directory=null regardless of inputs
- Pure side-effect verbs → never persisted at either scope

This bug class evaporates structurally under the trust-zones architecture, where session-scope grants are verb-pattern globs on `LlmSessionActor.SessionVerbPatterns` with no directory dimension at all. The fix here is necessary for v2's remaining lifespan and goes away on the trust-zones cutover.

568 Security + 1543 Actors tests pass.
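The six pinned cases can be reproduced with a small Python sketch of the bucketing rule. It is a guess at the shape of `BuildApprovalBuckets`, not a port: the scope strings (`session`, `always`, `everywhere`), the dict-of-lists return value, and the exact form of the session-scratch guard are assumptions made for illustration.

```python
SIDE_EFFECT_VERBS = frozenset({"echo", "printf", ":", "true", "false"})


def build_approval_buckets(candidates, scope, cwd, session_dir):
    """Group verbs by directory key; candidates are (verb, directory)."""
    buckets = {}
    for verb, directory in candidates:
        if verb in SIDE_EFFECT_VERBS:
            continue  # never persisted at any scope
        if scope == "everywhere":
            key = None  # global wildcard ignores directory inputs
        elif scope == "session":
            key = directory  # no cwd fallback, so None stays None
        else:
            # Folder-scoped persistent grant: fall back to cwd, then
            # drop entries that would land in the session scratch dir.
            key = directory if directory is not None else cwd
            if key == session_dir:
                continue
        buckets.setdefault(key, []).append(verb)
    return buckets
```

The important branch is `scope == "session"`: by skipping the cwd fallback there, a verb such as curl with a URL operand lands in the `None` bucket instead of being filtered out.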

Strip the unmerged trust-zones (two-gate) proposal from openspec/changes/. Live spec openspec/specs/tool-approval-gates/spec.md (the v1.5 ApprovalEntry (verb, directory) model) is untouched and remains canonical. Trust-zones overprompted in practice; product direction is to stay on (verb, directory) pairs.

`SessionMemoryObserverActor.cs` was creating `MemoriesDistilledV2` events at runtime but the type was never wired into the protobuf serializer, so every persistence attempt failed under strict serialization:

[ERR] Rejected to persist event type [MemoriesDistilledV2] due to [No serializer binding found for type MemoriesDistilledV2. Configure a binding in 'akka.actor.serialization-bindings' or set 'akka.actor.serialization-settings.allow-unregistered-types = true' to use the default fallback.]

Three sessions hit this in a four-minute window today. The memory observer's distillation events were being silently dropped from the journal.

Fix the four-step binding:
- Add `MemoriesDistilledV2Proto` and `ProposedMemoryContextProto` to `netclaw_messages.proto` so Grpc.Tools generates the wire types.
- Add `MemoriesDistilledV2 → MemoriesDistilledV2Proto` (and back) mappings in `NetclawProtoMapper`. ProposedMemoryContext maps through its three string fields; the parent type carries the repeated anchors, the repeated proposals, and the timestamp.
- Register `MemoriesDistilledV2Manifest = "mdv2-v1"` in `NetclawProtobufSerializer.TypeToManifest` and add the `FromBinary` branch.
- Append `typeof(MemoriesDistilledV2)` to the bound types list in `NetclawAkkaHostingExtensions.WithNetclawSerialization`.

Regression coverage in `SerializationRoundTripTests`: two new cases exercise a populated event and an empty-collections edge case through the real `Sys.Serialization` pipeline. The tests would have caught this gap before production.

Baseline regen via `dotnet slopwatch init --force`: trust-zones POSIX-only test skips (SW001) and upstream's `ConfigWatcherService` delays (SW004) are now baselined with current line numbers. Pre-existing entries retained with refreshed line numbers where the underlying files moved.

1586 Actors tests pass. Slopwatch clean.

See also netclaw-dev#961 for the broader test-coverage gap that let this slip through (Akka serialization verification not enabled in test configurations).

ToolAccessPolicy.ExtractShellCommand was using a direct `command is string text` pattern match that fails for the JsonElement values LLM-generated tool calls actually deserialize to. Real-world impact:
- The hard-deny pre-check at AuthorizeInvocation line 152 was silently a no-op for every real shell call: _shellCommandPolicy.Evaluate was gated on `shellCommand is not null`, so destructive-command rules weren't actually firing.
- The trust-zones GateEvaluator fast-path in CheckApprovalGate was also gated on `!string.IsNullOrEmpty(shellCommandForGate)`, so it never entered. The actor fell back to v2 for every approval, which is exactly the symptom diagnosed via the gate_fastpath_eval log showing `hasCommand=False` despite all other preconditions True (issue netclaw-dev#962).

Route through ToolArgumentHelper.GetString (the same helper the matcher's GetCommand uses), which properly unwraps JsonElement via .GetString(). Both call sites benefit:
- Hard-deny rules now evaluate against the real command text.
- GateEvaluator fast-path runs on shell calls, populating Gate on ApprovalContext, which routes through the trust-zones workflow on LlmSessionActor instead of v2.

1586 Actors tests pass. Slopwatch clean.

Worth a follow-up issue: three places extract "Command" from tool args (this one, ShellApprovalMatcher.GetCommand, ShellTool.cs). Consolidating to a single helper would prevent the JsonElement type-mismatch class of bug from recurring.

When the daemon is installed under user systemd (via the installer or `netclaw doctor`), `netclaw daemon stop` triggers SIGTERM but systemd's Restart policy immediately respawns a replacement. That races with the subsequent `cp` and produces "Text file busy" — or worse, a silent "already running" with the old binary still in place. Detect the systemd unit and use `systemctl --user stop/start` directly when present, falling back to the CLI commands when not. Also switch publish verbosity from `quiet` to `minimal` so users see incremental progress instead of ~50 seconds of dead air that looks like a hang.

The 100ms restartDrainTimeout in RequestConfigRestartAsync_reopens_ingress_when_coordination_fails races the stub's InvalidOperationException on slow CI runners — the Ask call gets cancelled before the stub throws, and the test sees TaskCanceledException instead of the expected InvalidOperationException. The timeout is irrelevant to the behavior under test (exception propagation + ingress reopen). Widen to 30s so the stub's throw wins the race deterministically.
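The race is generic enough to reproduce outside Akka. The asyncio sketch below is an analogy in Python, not the C# test: the stub, the delays, and the returned labels are invented for illustration, with `RuntimeError` standing in for the stub's InvalidOperationException and the timeout branch standing in for TaskCanceledException.

```python
import asyncio


async def failing_stub(delay):
    """Stand-in for the coordination stub: fails after a short delay."""
    await asyncio.sleep(delay)
    raise RuntimeError("coordination failed")


async def ask(timeout, stub_delay):
    """Report which outcome wins: the timeout or the stub's exception."""
    try:
        await asyncio.wait_for(failing_stub(stub_delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "cancelled"       # timeout fired before the stub threw
    except RuntimeError:
        return "stub exception"  # the failure the test wants to observe
    return "completed"
```

With a timeout shorter than the stub's delay the caller sees the cancellation; with a generous timeout the stub's own exception always arrives first, which is the deterministic outcome the widened 30s value is meant to guarantee.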

Windows CI runners on `pr_validation.yml` have been failing the trust-zones path-extraction tests since the BashParser integration landed. The failures aren't bugs: the v2 matcher gates the BashParser fast path behind `!OperatingSystem.IsWindows()`, so Windows falls through to the legacy ShellTokenizer which doesn't carry cwd attribution or `arg.Resolved` canonicalization. Tests pinning the trust-zones path semantics correctly fail to match POSIX-shaped inputs on a Windows host.

Uses xunit.v3 structured skip (`[Fact(SkipUnless = nameof(IsPosix), Skip = "...")]`) so the affected tests surface as "Skipped" in the CI log rather than hiding behind an early-return. This preserves the platform-gap signal and matches the convention used elsewhere in the suite (`ShellTokenizerTests.WindowsAnchoredPathCases` etc.).

Affected (10 fixtures, all under `Netclaw.Security.Tests`):
- ShellApprovalMatcherPathExtractionTests (8 new + 2 modified pre-existing tests that exercise BashParser cwd attribution and `arg.Resolved` canonicalization)
- TrustStateComposerTests.Compose_uses_home_directory_override_for_tilde_expansion (Path.Combine produces backslash-mixed paths on Windows)
- ShellTokenizerTests inline data for `cp /src/a.txt /dst/b.txt` / `cat /etc/hosts.conf` extracted into a POSIX-gated theory (Path.GetDirectoryName platform-aware separator)

The trust-zones path is POSIX-first by design: ShellSyntaxTree is bash-only and the legacy v2 path is being retired. Once the rewrite finishes, Windows shell handling moves to whatever shell-parser shape the trust-zones spec lands on for cmd.exe / PowerShell.

568 Security tests pass on Linux. Windows CI should now show 568 - 10 skipped = 558 passed instead of 9 failed.

The trust-zones experiment was reverted at the spec layer in ca29f5e and the user-facing workflow wiring never landed on this branch. The supporting primitives (GateEvaluator, TrustState, TrustStateComposer, AudienceTrustStore, AudienceTrustState) were still in the tree, registered in DI, and consulted as a fast-path auto-allow inside ToolAccessPolicy.CheckApprovalGate. None of that code path was reachable in the v1.5 ApprovalEntry model we're staying on.

Strip the lot:
- Delete TrustState.cs, TrustStateComposer.cs, GateEvaluator.cs, GateEvaluation.cs (Netclaw.Security)
- Delete AudienceTrustStore.cs, AudienceTrustState.cs (Netclaw.Configuration)
- Delete the matching test fixtures (GateEvaluatorTests, TrustStateComposerTests, AudienceTrustStoreTests)
- Strip the DI registrations + ToolAccessPolicy constructor params and fast-path block in Program.cs and ToolAccessPolicy.cs
- Drop NetclawPaths.TrustZonesPath (orphan)

Build clean, slopwatch clean, headers verified, all four major test suites pass with zero failures (Security 534, Configuration 288, Daemon 504, Actors 1545).

Aaronontheweb added 30 commits May 9, 2026 03:18

Merge branch 'dev' into openspec/approval-policy-v2

0154c61

Merge branch 'openspec/approval-policy-v2' of https://github.com/Aaro…

0a70f69

…nontheweb/netclaw into openspec/approval-policy-v2

Merge branch 'dev' into openspec/approval-policy-v2

3574a14

Aaronontheweb added 30 commits May 11, 2026 14:25

Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…

87a0ee8

…licy-v2 # Conflicts: # src/Netclaw.Actors/Sessions/LlmSessionActor.cs # src/Netclaw.Cli.Tests/Cli/DaemonClientMappingTests.cs

Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…

86c7460

…licy-v2

Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…

ee8ac4d

…licy-v2

Merge branch 'dev' into approvals/prompt-less

1381dc8

Revert trust-zones; stay on (verb, directory) ApprovalEntry model #962
Aaronontheweb wants to merge 64 commits into netclaw-dev:dev from Aaronontheweb:approvals/prompt-less
