Merged
64 commits
2c14d31
docs(openspec): propose approval-policy-v2
Aaronontheweb May 8, 2026
d82cf4f
feat(approvals): add ApprovalEntry foundation type for v2 store
Aaronontheweb May 8, 2026
3885f04
feat(approvals): cut over tool-approvals storage to v2 schema
Aaronontheweb May 8, 2026
ad95fab
feat(approvals): matcher operates on (verb, directory) ApprovalEntry
Aaronontheweb May 8, 2026
beace4a
feat(approvals): refuse pattern extraction for messy shell commands
Aaronontheweb May 8, 2026
b68137c
feat(approvals): ShellTool cwd defaults to project_dir then session_dir
Aaronontheweb May 8, 2026
6af4622
feat(approvals): safe-verbs ∩ safe-space approval short-circuit
Aaronontheweb May 8, 2026
406ae03
feat(approvals): netclaw approvals trust-verb + folder-scoped revoke
Aaronontheweb May 8, 2026
8de3f73
feat(approvals): five-button Slack approval prompt with cwd header
Aaronontheweb May 8, 2026
39aa01b
feat(approvals): five-button Discord approval prompt mirroring Slack
Aaronontheweb May 8, 2026
2132bb5
feat(approvals): agent guidance + set_working_directory failure-path …
Aaronontheweb May 8, 2026
dcf0fb0
refactor(approvals): remove dead v1 helpers and DirectoryRoots field
Aaronontheweb May 8, 2026
b619ddc
refactor(approvals): consolidate scope formatting + hot-path cleanups
Aaronontheweb May 8, 2026
064374a
test(approvals): eval cases for set_working_directory + schedule pre-…
Aaronontheweb May 8, 2026
12ec628
docs(openspec): sync approval-policy-v2 deltas + archive change
Aaronontheweb May 8, 2026
6c7e63a
fix(evals): load system skills from source, not published feed
Aaronontheweb May 9, 2026
1c96848
fix(evals): rewrite ambiguous approval-policy-v2 prompts
Aaronontheweb May 9, 2026
01a142e
fix(approvals): thread cwd into approval context so 'Always here' act…
Aaronontheweb May 9, 2026
f54c2e6
docs(openspec): propose approval-policy-path-extraction
Aaronontheweb May 9, 2026
0154c61
Merge branch 'dev' into openspec/approval-policy-v2
Aaronontheweb May 9, 2026
0a70f69
Merge branch 'openspec/approval-policy-v2' of https://github.com/Aaro…
Aaronontheweb May 9, 2026
ffb60ee
docs(openspec): add tasks for approval-policy-path-extraction
Aaronontheweb May 9, 2026
25e34f7
feat(approvals): path-extraction matcher + side-effect skip list
Aaronontheweb May 9, 2026
7e84da8
docs(approvals): align agent guidance with path-extraction model
Aaronontheweb May 9, 2026
c9721c1
docs(approvals): pin operations skill at 2.0.0 + clarify deferred tasks
Aaronontheweb May 9, 2026
3574a14
Merge branch 'dev' into openspec/approval-policy-v2
Aaronontheweb May 9, 2026
579a4f6
fix(approvals): side-effect candidates auto-allow at match time
Aaronontheweb May 9, 2026
f39b781
wip(approvals): session-scratch hide + target-dir header
Aaronontheweb May 9, 2026
771bb2c
docs(openspec): continuation memo for trust-zones rewrite
Aaronontheweb May 9, 2026
713d2c7
docs(openspec): rewrite approval policy as trust-zones change
Aaronontheweb May 10, 2026
6d843f6
docs(openspec): fold interview decisions into trust-zones change
Aaronontheweb May 10, 2026
43f7e86
fix(security): hard-deny agent writes to ~/.netclaw/config/
Aaronontheweb May 10, 2026
63e4cce
feat(security): add two-store audience trust persistence
Aaronontheweb May 10, 2026
fdcbd38
feat(security): structured hard-deny override DSL with raw-text escape
Aaronontheweb May 10, 2026
0e35b05
feat(security): wire ShellSyntaxTree 0.1.0-alpha into Netclaw.Security
Aaronontheweb May 11, 2026
7570af3
chore(deps): bump ShellSyntaxTree to 0.1.1-alpha
Aaronontheweb May 11, 2026
a0e7742
fix(security): remove safe-verbs on-disk override path
Aaronontheweb May 11, 2026
3f79813
fix(security): add CWD verbs to safe-verbs lists
Aaronontheweb May 11, 2026
7fe7598
feat(security): three-layer GateEvaluator (zones + verb patterns + ha…
Aaronontheweb May 11, 2026
aa71604
Merge branch 'dev' into openspec/approval-policy-v2
Aaronontheweb May 11, 2026
a88094e
feat(security): trust-state composer (audience baseline + store + ses…
Aaronontheweb May 11, 2026
70f356d
feat(security): register trust-zones DI services alongside v2 store
Aaronontheweb May 11, 2026
3cd7a52
feat(security): wire GateEvaluator as fast-path auto-allow in ToolAcc…
Aaronontheweb May 11, 2026
af645f7
fix(approvals): remove 5-minute approval timeout
Aaronontheweb May 11, 2026
87a0ee8
Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…
Aaronontheweb May 11, 2026
9a8e7c8
fix(approvals): ApprovedOnce on messy commands bypasses retry guard
Aaronontheweb May 12, 2026
33a515d
test(approvals): use Akka.Hosting.TestKit for messy-command bypass test
Aaronontheweb May 12, 2026
b5dfbe3
chore(deps): bump ShellSyntaxTree to 0.1.3-alpha
Aaronontheweb May 12, 2026
0316fba
fix(security): honor Mode=All on filesystem profile in TrustStateComp…
Aaronontheweb May 12, 2026
fd51f94
chore(deps): bump ShellSyntaxTree to 0.1.4-alpha (greedy verb-chain)
Aaronontheweb May 12, 2026
86c7460
Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…
Aaronontheweb May 12, 2026
c69ea0a
fix(security): use ShellSyntaxTree for v2 verb-chain extraction
Aaronontheweb May 12, 2026
ee8ac4d
Merge remote-tracking branch 'upstream/dev' into openspec/approval-po…
Aaronontheweb May 12, 2026
d8c26f5
fix(security): consume BashParser cwd attribution in v2 ExtractCandid…
Aaronontheweb May 12, 2026
4a566f1
fix(security): canonicalize candidate directory via parser-resolved path
Aaronontheweb May 12, 2026
00793d6
fix(approvals): persist session-scope grants for no-path-arg verbs
Aaronontheweb May 12, 2026
ca29f5e
revert(openspec): remove approval-policy-trust-zones change proposal
Aaronontheweb May 12, 2026
b5ce04d
fix(serialization): bind MemoriesDistilledV2 to netclaw-protobuf
Aaronontheweb May 12, 2026
b83126b
fix(security): ExtractShellCommand handles JsonElement-valued args
Aaronontheweb May 12, 2026
d283b61
fix(scripts): swap-daemon.sh handles systemd-managed daemons
Aaronontheweb May 12, 2026
3ac28f9
fix(tests): widen drain timeout in restart-coordinator failure test
Aaronontheweb May 13, 2026
7a4fcec
test(security): skip POSIX-only path tests on Windows runners
Aaronontheweb May 12, 2026
a6a2eee
chore(security): remove dead trust-zones primitives
Aaronontheweb May 13, 2026
1381dc8
Merge branch 'dev' into approvals/prompt-less
Aaronontheweb May 13, 2026
110 changes: 73 additions & 37 deletions .slopwatch/baseline.json
@@ -1,17 +1,17 @@
 {
 "version": 1,
-"createdAt": "2026-04-11T20:25:26.1743299+00:00",
-"updatedAt": "2026-04-11T20:25:26.1800921+00:00",
-"description": "Baseline created on 2026-04-11 20:25:26 UTC",
+"createdAt": "2026-05-12T17:20:55.7365203+00:00",
+"updatedAt": "2026-05-12T17:20:55.74213+00:00",
+"description": "Initial baseline created by 'slopwatch init' on 2026-05-12 17:20:55 UTC",
 "entries": [
 {
 "hash": "9d4a53dc5193e639",
 "ruleId": "SW005",
 "filePath": "Directory.Build.props",
-"lineNumber": 6,
+"lineNumber": 7,
 "codeSnippet": "<NoWarn>$(NoWarn);CS1591</NoWarn>",
 "message": "Adding warnings to NoWarn: CS1591",
-"baselinedAt": "2026-04-11T20:25:26.1799366+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7419017+00:00"
 },
 {
 "hash": "c70b817b8444deb3",
@@ -20,7 +20,7 @@
 "lineNumber": 12,
 "codeSnippet": "<NoWarn>$(NoWarn);OPENAI001</NoWarn>",
 "message": "Adding warnings to NoWarn: OPENAI001",
-"baselinedAt": "2026-04-11T20:25:26.1800377+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7420234+00:00"
 },
 {
 "hash": "e5c152257aa8816d",
@@ -29,79 +29,115 @@
 "lineNumber": 8,
 "codeSnippet": "<NoWarn>$(NoWarn);OPENAI001</NoWarn>",
 "message": "Adding warnings to NoWarn: OPENAI001",
-"baselinedAt": "2026-04-11T20:25:26.1800436+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7420301+00:00"
 },
+{
+"hash": "1a29ed65e4ed3efb",
+"ruleId": "SW004",
+"filePath": "src/Netclaw.Daemon.Tests/Services/ConfigWatcherServiceTests.cs",
+"lineNumber": 139,
+"codeSnippet": "Task.Delay(50, ct)",
+"message": "Test uses Task.Delay(50) which may indicate a timing-dependent test",
+"baselinedAt": "2026-05-12T17:20:55.7420338+00:00"
+},
+{
+"hash": "fcb5e461d7f70a7c",
+"ruleId": "SW004",
+"filePath": "src/Netclaw.Daemon.Tests/Services/ConfigWatcherServiceTests.cs",
+"lineNumber": 154,
+"codeSnippet": "Task.Delay(100, ct)",
+"message": "Test uses Task.Delay(100) which may indicate a timing-dependent test",
+"baselinedAt": "2026-05-12T17:20:55.7420454+00:00"
+},
 {
 "hash": "6ea5c8bbead4b59c",
 "ruleId": "SW004",
 "filePath": "src/Netclaw.Daemon.Tests/Gateway/DaemonRuntimeStatusServiceTests.cs",
-"lineNumber": 62,
+"lineNumber": 76,
 "codeSnippet": "Task.Delay(25 * (i + 1))",
 "message": "Test uses Task.Delay(25 * (i + 1)) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800482+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7420571+00:00"
 },
+{
+"hash": "87955c2f94cc69fb",
+"ruleId": "SW001",
+"filePath": "src/Netclaw.Security.Tests/ShellTokenizerTests.cs",
+"lineNumber": 281,
+"codeSnippet": "Theory(SkipUnless = nameof(IsPosix), Skip = \"POSIX-only Path.GetDirectoryName semantics\")",
+"message": "Test method 'ExtractFirstPathArgument_applies_file_parent_rule_posix' is disabled: POSIX-only Path.GetDirectoryName semantics",
+"baselinedAt": "2026-05-12T17:20:55.7420635+00:00"
+},
+{
+"hash": "2b5354a745d6eabb",
+"ruleId": "SW001",
+"filePath": "src/Netclaw.Security.Tests/TrustStateComposerTests.cs",
+"lineNumber": 147,
+"codeSnippet": "Fact(SkipUnless = nameof(IsPosix), Skip = \"POSIX-only path semantics\")",
+"message": "Test method 'Compose_uses_home_directory_override_for_tilde_expansion' is disabled: POSIX-only path semantics",
+"baselinedAt": "2026-05-12T17:20:55.7420675+00:00"
+},
+{
+"hash": "89f7104059c82e18",
+"ruleId": "SW001",
+"filePath": "src/Netclaw.Security.Tests/ShellApprovalMatcherTests.cs",
+"lineNumber": 231,
+"codeSnippet": "Fact(SkipUnless = nameof(IsPosix), Skip = \"POSIX-only path semantics\")",
+"message": "Test method 'ExtractCandidates_strips_path_from_verb' is disabled: POSIX-only path semantics",
+"baselinedAt": "2026-05-12T17:20:55.7420713+00:00"
+},
 {
 "hash": "6e43e1de0090c276",
 "ruleId": "SW003",
 "filePath": "src/Netclaw.Actors/Protocol/InboxWriter.cs",
-"lineNumber": 114,
+"lineNumber": 119,
 "codeSnippet": "catch\n {\n // best-effort cleanup; do not mask the original exception\n }",
 "message": "Empty catch block swallows exceptions without handling",
-"baselinedAt": "2026-04-11T20:25:26.1800601+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7421078+00:00"
 },
+{
+"hash": "8777b7954cd69fa1",
+"ruleId": "SW004",
+"filePath": "src/Netclaw.Actors.Tests/Sessions/LlmSessionIntegrationTests.cs",
+"lineNumber": 1879,
+"codeSnippet": "Task.Delay(Delay, cancellationToken)",
+"message": "Test uses Task.Delay(Delay) which may indicate a timing-dependent test",
+"baselinedAt": "2026-05-12T17:20:55.7421117+00:00"
+},
 {
 "hash": "16969d3453617fc8",
 "ruleId": "SW004",
 "filePath": "src/Netclaw.Actors.Tests/Sessions/MemoryRecallScenarioTests.cs",
-"lineNumber": 318,
+"lineNumber": 329,
 "codeSnippet": "Task.Delay(25 * (i + 1))",
 "message": "Test uses Task.Delay(25 * (i + 1)) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800637+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7421146+00:00"
 },
 {
 "hash": "c00fb5b6beafab8b",
 "ruleId": "SW004",
 "filePath": "src/Netclaw.Actors.Tests/SubAgents/SubAgentActorTests.cs",
-"lineNumber": 334,
+"lineNumber": 459,
 "codeSnippet": "Task.Delay(Delay, cancellationToken)",
 "message": "Test uses Task.Delay(Delay) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800701+00:00"
-},
-{
-"hash": "8777b7954cd69fa1",
-"ruleId": "SW004",
-"filePath": "src/Netclaw.Actors.Tests/Sessions/LlmSessionIntegrationTests.cs",
-"lineNumber": 1679,
-"codeSnippet": "Task.Delay(Delay, cancellationToken)",
-"message": "Test uses Task.Delay(Delay) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800748+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7421212+00:00"
 },
-{
-"hash": "b691cefe260611c6",
-"ruleId": "SW004",
-"filePath": "src/Netclaw.Actors.Tests/Memory/SQLiteMemoryStoreTests.cs",
-"lineNumber": 263,
-"codeSnippet": "Task.Delay(25 * (i + 1))",
-"message": "Test uses Task.Delay(25 * (i + 1)) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800788+00:00"
-},
 {
 "hash": "1305b3c2911b984b",
 "ruleId": "SW004",
 "filePath": "src/Netclaw.Actors.Tests/Memory/MemoryEvalSeedSuiteTests.cs",
-"lineNumber": 344,
+"lineNumber": 268,
 "codeSnippet": "Task.Delay(25 * (i + 1))",
 "message": "Test uses Task.Delay(25 * (i + 1)) which may indicate a timing-dependent test",
-"baselinedAt": "2026-04-11T20:25:26.1800844+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7421241+00:00"
 },
 {
 "hash": "5c3b00ebd2426d97",
 "ruleId": "SW003",
 "filePath": "src/Netclaw.Actors.Tests/Protocol/InboxWriterTests.cs",
-"lineNumber": 26,
+"lineNumber": 31,
 "codeSnippet": "catch\n {\n // best-effort cleanup\n }",
 "message": "Empty catch block swallows exceptions without handling",
-"baselinedAt": "2026-04-11T20:25:26.1800921+00:00"
+"baselinedAt": "2026-05-12T17:20:55.7421299+00:00"
 }
 ]
 }
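The baseline is a flat list of suppressed findings, one object per `hash`, each carrying a `ruleId`, `filePath` and `lineNumber`. Most of the churn in this file is re-baselining: `baselinedAt` is refreshed on every entry and several `lineNumber` values drift, while new SW001 (skipped POSIX-only tests) and SW004 (`Task.Delay`) entries are added. When reviewing a change like this, a per-rule tally is a quick way to see whether suppressions grew. A minimal sketch, assuming the file stays pretty-printed with one `"ruleId": "..."` pair per line; `count_rules` is a hypothetical helper, not part of slopwatch, and a real JSON tool such as `jq` would be more robust.

```bash
# Hypothetical helper (not part of slopwatch): tally baseline entries per
# ruleId. Assumes the baseline is pretty-printed with one "ruleId" pair per
# line; prefer a real JSON parser such as jq if the formatting may change.
count_rules() {
  local baseline="${1:-.slopwatch/baseline.json}"
  # Extract each `"ruleId": "SWnnn"` pair, keep only the quoted value,
  # then count occurrences and print "<rule> <count>" sorted by rule.
  grep -o '"ruleId": *"[^"]*"' "$baseline" \
    | sed -E 's/.*"([^"]*)"$/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Running `count_rules .slopwatch/baseline.json` before and after a re-baseline and diffing the two outputs shows which rules gained or lost suppressions.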
1 change: 1 addition & 0 deletions Directory.Packages.props
@@ -53,6 +53,7 @@
 <PackageVersion Include="SlackNet.Extensions.DependencyInjection" Version="$(SlackNetVersion)" />
 <PackageVersion Include="Cronos" Version="0.12.0" />
 <PackageVersion Include="Netclaw.SkillClient" Version="0.3.0" />
+<PackageVersion Include="ShellSyntaxTree" Version="0.1.4-alpha" />
 <PackageVersion Include="Termina" Version="0.8.0" />
 </ItemGroup>
 <!-- Serialization -->
88 changes: 86 additions & 2 deletions evals/run-evals.sh
Expand Up @@ -337,9 +337,12 @@ start_eval_daemon() {

# Copy system skills from the repo into the eval home so Skill Discovery
# tests use the skills being developed, not whatever is synced on the host.
mkdir -p "$EVAL_HOME/skills/.system/files"
# SkillScanner expects <skills>/.system/<skill-name>/SKILL.md (no extra
# `files/` segment); the daemon's feed sync writes to that layout, so we
# mirror it here for local-source-of-truth runs.
mkdir -p "$EVAL_HOME/skills/.system"
if [[ -d "$REPO_ROOT/feeds/skills/.system/files" ]]; then
cp -r "$REPO_ROOT/feeds/skills/.system/files/." "$EVAL_HOME/skills/.system/files/"
cp -r "$REPO_ROOT/feeds/skills/.system/files/." "$EVAL_HOME/skills/.system/"
else
echo "WARN: no system skills at $REPO_ROOT/feeds/skills/.system/files/ — Skill Discovery evals will fail." >&2
fi
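After this hunk, the contents of `feeds/skills/.system/files/` are copied straight into `$EVAL_HOME/skills/.system/`, so each skill lands at `<skills>/.system/<skill-name>/SKILL.md`, the layout the new comment says SkillScanner expects. A regression back to the nested `files/` layout would otherwise only surface as failing Skill Discovery evals, so a cheap post-copy check can fail earlier. The sketch is hypothetical (`check_system_skills` does not exist in `run-evals.sh`) and assumes every directory directly under `.system` is one skill.

```bash
# Hypothetical post-copy check (not in run-evals.sh). Assumes every directory
# directly under <skills>/.system is one skill that must contain SKILL.md, and
# that a leftover `files/` directory means the old nested layout was used.
check_system_skills() {
  local system_dir="$1/.system"
  local status=0 dir

  if [[ -d "$system_dir/files" ]]; then
    echo "unexpected files/ segment under $system_dir" >&2
    return 1
  fi

  for dir in "$system_dir"/*/; do
    [[ -d "$dir" ]] || continue   # unmatched glob stays literal; skip it
    if [[ ! -f "${dir%/}/SKILL.md" ]]; then
      echo "missing SKILL.md in ${dir%/}" >&2
      status=1
    fi
  done
  return "$status"
}
```

It could be called right after the `cp -r` as `check_system_skills "$EVAL_HOME/skills"`, with its non-zero status handled however the script treats other setup failures.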
@@ -382,6 +385,11 @@ start_eval_daemon() {
 -e "NETCLAW_Security__ShellExecutionMode=HostAllowed"
 -e "NETCLAW_Security__StrictDefaults=false"
 -e "NETCLAW_Tools__ShellMode=HostAllowed"
+# Evals test the source tree, not the published feed. Without this, the
+# daemon syncs system skills from the live R2 manifest at startup, which
+# ships whatever was last released — masking any unpublished skill
+# changes (e.g. version bumps in this PR) and the local copies above.
+-e "NETCLAW_SkillSync__DisableSystemSkillSync=true"
 )
 
 if [[ -n "$EVAL_CONTEXT_WINDOW" ]]; then
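The new `-e` flag follows the same naming scheme as its neighbours. In the .NET configuration environment-variable provider, a double underscore stands in for the `:` section separator; assuming the daemon registers that provider with a `NETCLAW_` prefix (the registration is not part of this diff), `NETCLAW_SkillSync__DisableSystemSkillSync=true` surfaces as the key `SkillSync:DisableSystemSkillSync`. The hypothetical helper below performs that mapping, which can help when checking an override against the options it is meant to bind.

```bash
# Hypothetical helper: show the configuration key a NETCLAW_* variable maps to.
# Assumes the daemon strips a "NETCLAW_" prefix; "__" is the .NET
# environment-variable stand-in for the ":" section separator.
env_to_config_key() {
  local name="${1%%=*}"          # drop any "=value" suffix
  name="${name#NETCLAW_}"        # strip the assumed prefix
  printf '%s\n' "${name//__/:}"  # "__" -> ":" between sections
}
```

For example, `env_to_config_key NETCLAW_Tools__ShellMode=HostAllowed` prints `Tools:ShellMode`.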
@@ -1067,6 +1075,57 @@ assert_multi_turn_conflicting_speakers() {
 stdout_contains 'block *= *bob'
 }
 
+# Category 9: Approval Policy v2
+# Exercises the load-bearing set_working_directory adoption guidance and the
+# schedule-creation pre-approval flow added in approval-policy-v2.
+
+# Positive: project-scoped prompt mentions a repo path. Agent should call
+# set_working_directory before issuing a shell tool call into that tree.
+# Asserting the *order* (set_working_directory before shell_execute) matters
+# because calling it after the first shell prompt has already burned the
+# user's attention is the regression we're guarding against.
+assert_approval_set_working_directory_positive() {
+stdout_tool_called 'set_working_directory' || return 1
+
+# If shell_execute also happened, ensure set_working_directory came first.
+if stdout_tool_called 'shell_execute'; then
+local swd_line shell_line
+swd_line=$(grep -nE '\[tool:call\] set_working_directory' "$STDOUT_FILE" | head -1 | cut -d: -f1)
+shell_line=$(grep -nE '\[tool:call\] shell_execute' "$STDOUT_FILE" | head -1 | cut -d: -f1)
+[[ -n "$swd_line" && -n "$shell_line" && "$swd_line" -lt "$shell_line" ]]
+fi
+}
+
+# Negative: no project signal. Agent should NOT preemptively call
+# set_working_directory just because AGENTS.md mentions it.
+assert_approval_set_working_directory_negative() {
+! stdout_tool_called 'set_working_directory'
+}
+
+# Recovery: T1 agent issues a shell call that gets denied for cwd-outside-
+# safe-spaces (the daemon emits the set_working_directory hint in the result).
+# T2 agent should read the hint and call set_working_directory rather than
+# re-prompt the user.
+#
+# Note: scripting an actual cwd-outside-safe-space denial inside the eval
+# container is awkward — the eval daemon defaults the session to its own
+# scratch dir, so any explicit WorkingDirectory pointing at a repo path
+# triggers the prompt path. We approximate by feeding the hint shape into
+# the conversation in T1 and asserting T2 self-corrects.
+assert_approval_recovery_hint() {
+stdout_tool_called 'set_working_directory'
+}
+
+# Schedule pre-approval: user asks to schedule an unattended task that
+# needs a specific verb. Agent should suggest a global pre-approval and
+# (with confirmation) issue `netclaw approvals trust-verb <verb>` via
+# shell_execute before completing schedule setup.
+assert_approval_schedule_pre_approval() {
+stdout_contains '\[tool:call\] shell_execute' && \
+stdout_contains 'netclaw approvals trust-verb' && \
+stdout_contains 'freshdesk'
+}
+
 # ─── Case & Category Runner ──────────────────────────────────────────────────
 
 print_category() {
@@ -1399,6 +1458,31 @@ run_all() {
 "Without using any tools, answer exactly in this format and nothing else: deploy=<name>; block=<name>."
 
 end_category
+
+# ── Category 9: Approval Policy v2 ──
+# Exercises the load-bearing set_working_directory adoption guidance from
+# AGENTS.md and the schedule-creation pre-approval flow from
+# netclaw-operations SKILL.md. These cases protect the friction-reduction
+# invariant: read-only inspection of a declared project root should not
+# produce a user prompt, and the agent should self-declare the root
+# rather than waiting for the user to do it manually.
+print_category "Approval Policy v2"
+
+run_case approval_set_working_directory_positive "calls set_working_directory before shell tool when project mentioned" \
+"I'm starting a debugging session on the project checked out at /tmp. Get oriented in that codebase — look at the layout, identify build files, and figure out what kind of project it is. We'll be running multiple shell commands across the tree." \
+"I want to start working on the Netclaw checkout at /tmp. Plan to run several commands across that tree — start by getting yourself oriented."
+
+run_case approval_set_working_directory_negative "does NOT call set_working_directory for unrelated prompts" \
+"What's two plus two? Just give me the number." \
+"Explain what a hash table is in one sentence."
+
+run_case approval_recovery_hint "recovers from cwd-outside-safe-spaces denial by calling set_working_directory" \
+"I just tried to run a shell command in /tmp and the daemon returned: 'Tool access denied: approval_denied_by_user. Hint: \"/tmp\" is outside the session'\\''s trusted scope. Call set_working_directory \"/tmp\" first, then retry — that brings the directory into your trusted scope so the approval policy can reason about it.' How should I unblock this so the next shell call works?"
+
+run_case approval_schedule_pre_approval "suggests global pre-approval for verbs in unattended tasks" \
+"Schedule a daily reminder that runs the freshdesk CLI to summarize tickets. The reminder fires unattended and won't be able to answer approval prompts, so the verb needs to be globally pre-approved before the schedule fires. Call netclaw approvals trust-verb freshdesk via shell_execute as part of the setup."
+
+end_category
 }
 
 # ─── Main ─────────────────────────────────────────────────────────────────────
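`assert_approval_set_working_directory_positive` enforces ordering by comparing the line numbers of the first `[tool:call] set_working_directory` and `[tool:call] shell_execute` entries in `$STDOUT_FILE`. The sketch below isolates that rule so it can be exercised against hand-written transcripts. It uses awk's `index()` for a fixed-string match instead of the `grep | head | cut` pipeline, and `first_call_line` / `called_before` are hypothetical names, not helpers from the eval script. Like the existing grep patterns, it assumes each tool call is logged on its own line.

```bash
# Hypothetical helpers (not from run-evals.sh) isolating the ordering rule.
# Print the 1-based line number of the first "[tool:call] <tool>" entry in
# <file>, or nothing if the tool was never called. index() is a fixed-string
# match, so the brackets need no escaping.
first_call_line() {
  local tool="$1" file="$2"
  awk -v needle="[tool:call] $tool" 'index($0, needle) { print NR; exit }' "$file"
}

# Succeed when <first> was called and either <second> was never called or the
# first <first> call precedes the first <second> call.
called_before() {
  local first="$1" second="$2" file="$3"
  local a b
  a=$(first_call_line "$first" "$file")
  [[ -n "$a" ]] || return 1
  b=$(first_call_line "$second" "$file")
  if [[ -z "$b" ]]; then
    return 0
  fi
  (( a < b ))
}
```

As in the original assertion, a transcript in which `shell_execute` never runs still passes as long as `set_working_directory` was called; only a missing or late `set_working_directory` fails.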