feat: one-click prune nodes outside geofilter (#669 M4)#738
Conversation
Kpa-clawbot
left a comment
There was a problem hiding this comment.
Review: one-click prune nodes outside geofilter (M4)
Good overall design: dry-run by default, API key required, no-GPS nodes always kept. Tests are thorough. A few items to address:
Must-fix
1. TOCTOU race between preview and confirm delete
The dry-run and confirm are two separate API calls. Between them, nodes could be ingested (or coordinates updated). The confirm call re-evaluates the filter independently — it doesn't use the preview list. This means:
- Nodes not shown in preview could be deleted on confirm (ingested between calls, with out-of-bounds coords)
- Nodes shown in preview might not be deleted (if their coords were updated to be inside)
The user sees a preview list, clicks "Delete N nodes", but a different set gets deleted. This violates the principle of least surprise for a destructive operation.
Fix: Either (a) pass the pubkey list from the preview to the confirm call (server-side or client-side), so only previewed nodes are deleted, or (b) return the actual deleted list in the confirm response so the UI shows what really happened, and document this behavior.
2. No cascade/cleanup of related data
DeleteNodesByPubkeys does DELETE FROM nodes WHERE public_key IN (...). If other tables reference these pubkeys (packets with source/dest fields, any future neighbor/edge table), orphaned references will remain. This isn't breaking today (no FK constraints), but it's technical debt that will bite when the schema evolves.
Fix: At minimum, add a code comment documenting that this only deletes from the nodes table and does not cascade. Ideally, also delete from any tables that reference the node pubkey (check transmissions/observations).
Should-fix
3. Irreversibility — no safety net
This is a permanent DELETE with no backup, no soft-delete, no undo. For a one-click operation in a UI panel, that's aggressive. The confirm() dialog is the only guard.
Suggestion: Log the deleted pubkeys + names at INFO level (not just the count) so an operator can at least see what was removed. The nodes will re-appear if the mesh re-advertises them, but historical metadata (first_seen, advert_count) is lost permanently.
4. Frontend prune confirm doesn't show the re-evaluated list
_gfPruneConfirm sends confirm=true but the response only contains { deleted: N }, not the list of what was deleted. If the set changed between preview and confirm (per issue #1), the user has no visibility into what actually happened.
Nits
handlePruneGeoFiltercallsNodePassesGeoFilter(lat, lon, ...)wherelat/lonare concretefloat64(defaulting to0when nil). This works correctly becausePassesFilterhas thelat == 0 && lon == 0escape hatch, but it's semantically confusing — a nil coordinate is being passed as0.0wrapped ininterface{}. Consider checkingn.Lat == nildirectly and skipping the filter call entirely for no-GPS nodes.geofilter-builder.htmlloads Leaflet from unpkg CDN — consider using the same CDN/version as the rest of the app for consistency and offline scenarios.
What's good
- Dry-run default is the right call — safe by default
requireAPIKeymiddleware +IsWeakAPIKeygate on the UI controls- Tests cover all the important paths: dry-run, confirm, no-filter-400, no-key-401
- No-GPS nodes correctly preserved (verified through the full call chain)
- Atomic config write (temp file + rename)
- Documentation is comprehensive
Verdict: Requesting changes for the TOCTOU issue (#1). The rest can be addressed in follow-up, but #1 is a data safety concern for a destructive endpoint.
- Fix TOCTOU race: confirm now requires pubkeys from preview in request body; server intersects with still-outside nodes so exactly the previewed set is deleted (no more, no less) - Add cascade comment to DeleteNodesByPubkeys documenting that only the nodes table is affected (no FK constraints today) - Log each deleted node by name + pubkey for operator visibility - Return deleted node list in confirm response so UI shows what happened - Check lat/lon nil directly instead of passing 0.0 to NodePassesGeoFilter - Update confirm test to send pubkeys body; add test for missing body → 400 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ebcfc18 to
b3d29be
Compare
- Fix TOCTOU race: confirm now requires pubkeys from preview in request body; server intersects with still-outside nodes so exactly the previewed set is deleted (no more, no less) - Add cascade comment to DeleteNodesByPubkeys documenting that only the nodes table is affected (no FK constraints today) - Log each deleted node by name + pubkey for operator visibility - Return deleted node list in confirm response so UI shows what happened - Check lat/lon nil directly instead of passing 0.0 to NodePassesGeoFilter - Update confirm test to send pubkeys body; add test for missing body → 400 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-review: PR #738 — geo-prune M4 (post
|
| # | Item | Status |
|---|---|---|
| 1 | TOCTOU race between preview and confirm — different node set could be deleted | ✅ Fixed. Confirm now requires pubkeys in request body; server intersects with current outside-filter set (routes.go:2565-2580). Frontend sends preview pubkeys in _gfPruneConfirm. Solid fix. |
| 2 | No cascade/cleanup comment on DeleteNodesByPubkeys | ✅ Fixed. Comment added in db.go:2355-2357 documenting no FK cascade. |
| 3 | Irreversibility — no audit trail | ✅ Fixed. Each deleted node logged individually at INFO: log.Printf("[geo-prune] deleted node %q (%s)") (routes.go:2596). |
| 4 | Confirm response didn't show what was deleted | ✅ Fixed. Response now includes "nodes": toDelete with full list (routes.go:2600). |
| 5 | Nil coords passed as 0.0 to filter | ✅ Fixed. NodeForGeoPrune uses *float64 pointers; handler checks n.Lat == nil before calling NodePassesGeoFilter (routes.go:2554). |
| 6 | Leaflet CDN in geofilter-builder.html | Minor — still uses unpkg, rest of app uses CDN link in index.html. Non-blocking. |
New findings (djb + munger)
No must-fix items found. The TOCTOU fix is well-designed — the intersection approach means confirm can never delete more than what was previewed, even if the polygon or GPS data changes between calls.
Observations (informational, not blocking):
confirm=trueis a query param on a POST — acceptable since the body carries the pubkey payload and the endpoint requiresX-API-Key. A future improvement could move it to the body.- Single
DELETE FROM nodes WHERE public_key IN (...)is atomic — no partial-delete risk. - Same
X-API-Keygates all admin endpoints. Acceptable for a single-operator tool; if multi-operator access is added later, consider per-endpoint permissions. - No soft-delete/undo — nodes re-appear when the mesh re-advertises them, but historical metadata (first_seen, advert_count) is lost. Documented behavior; operator is warned via
confirm()dialog.
Merge verdict
Ready after rebase. PR is currently CONFLICTING against master. Depends on PR #736 (M3) — merge that first, then rebase this. Zero must-fix items remain.
|
@efiten same comment as elsewhere - please clean up this PR and get it ready for merging or abandon it. |
…bot#669 M3) Backend: - Add PUT /api/config/geo-filter (requires X-API-Key) — saves geo_filter back to config.json atomically and updates in-memory config immediately, no restart needed - Add SaveGeoFilter() to config.go: reads config as raw map (preserving _comment fields), updates geo_filter key, writes back via temp+rename - Add writeEnabled field to GET /api/config/geo-filter response so the frontend can gate editing controls on server write capability - Add Server.configDir field; wired from -config-dir flag in main.go - Tests: TestPutConfigGeoFilter (4 cases) + TestSaveGeoFilter (3 cases) Frontend: - Add GeoFilter tab (🗺️) to the customizer between Display and Export - Tab shows current polygon on a Leaflet map (read-only for all users) - Editing controls (undo, clear, buffer km, API key input, save/remove) are only revealed when the server reports writeEnabled=true — i.e. the deployment has a write-capable apiKey configured. Public instances see a read-only polygon view. - Save calls PUT /api/config/geo-filter; Remove clears the filter - Map is destroyed on tab switch and panel close to avoid Leaflet leaks Docs: - Add docs/user-guide/geofilter.md (full guide: config, customizer, builder, prune script, API) - Update configuration.md and customization.md with geo_filter section - Update config.example.json _comment to mention the Customizer tab Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clicking the small inline map in the customizer GeoFilter tab now opens a full-screen modal (92vw × 86vh) with Undo/Clear/Done/Cancel controls. The inline map becomes a read-only preview. Both maps and the standalone geofilter-builder.html now use CartoDB Positron (light) instead of dark. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix data race on s.cfg.GeoFilter: add cfgMu RWMutex with getGeoFilter/ setGeoFilter accessors used in all handler goroutines - Add 1 MB MaxBytesReader cap on PUT /api/config/geo-filter request body - Validate polygon coordinate ranges (lat ∈ [-90,90], lon ∈ [-180,180]) and reject NaN/Inf values - Cap polygon at 1000 points to bound storage and computation - Document writeEnabled information-leak trade-off with a comment - Add tests for coordinate range rejection and oversized polygon rejection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds POST /api/admin/prune-geo-filter endpoint (dry-run by default, ?confirm=true to delete). Wires a Prune section into the GeoFilter customizer tab — preview lists affected nodes, confirm deletes them. Requires write-capable apiKey (writeEnabled gate, same as PUT). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix TOCTOU race: confirm now requires pubkeys from preview in request body; server intersects with still-outside nodes so exactly the previewed set is deleted (no more, no less) - Add cascade comment to DeleteNodesByPubkeys documenting that only the nodes table is affected (no FK constraints today) - Log each deleted node by name + pubkey for operator visibility - Return deleted node list in confirm response so UI shows what happened - Check lat/lon nil directly instead of passing 0.0 to NodePassesGeoFilter - Update confirm test to send pubkeys body; add test for missing body → 400 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
b3d29be to
0f565e4
Compare
|
Must-fix #1 (TOCTOU) is addressed:
Tests cover: dry-run preview, confirm delete with pubkeys, missing body → 400, no filter → 400, no API key → 401. Ready for re-review. |
…xample/builder Brings Kpa-clawbot#1289 (server read-only, prune moved to ingestor) and other master changes into the geo-prune branch. Resolutions: - cmd/server/config.go: keep both SaveGeoFilter (this branch) and obsBlacklistSet/AnalyticsConfig (master). - cmd/server/routes.go: drop /api/admin/prune handler+route per Kpa-clawbot#1283; retain /api/admin/prune-geo-filter (this branch). - cmd/server/db.go: drop server-side PruneOldPackets / PruneOldMetrics / RemoveStaleObservers (now ingestor-owned). DeleteNodesByPubkeys still present — moved to ingestor in the next commit. - config.example.json: prefer master's GeoFilter Builder wording. - public/geofilter-builder.html: take master version. - s.store.graph is now atomic.Pointer; add .Load() at the call site.
…pa-clawbot#738) After Kpa-clawbot#1283/Kpa-clawbot#1289 the server opens SQLite mode=ro. The geo-prune feature introduced in PR Kpa-clawbot#738 invokes DB.DeleteNodesByPubkeys from an HTTP handler; in production this would fail with 'attempt to write a readonly database' — the feature is dead on arrival. Tests pass only because setupTestDB uses :memory: without mode=ro. This commit reinstates the master-side cmd/server/readonly_invariant_test.go and extends TestServerDBHasNoWriteMethods to assert DeleteNodesByPubkeys is NOT a method on the server *DB. With the method still present (from the M4 feature commit), the test FAILS: readonly_invariant_test.go:85: server *DB exposes forbidden write method "DeleteNodesByPubkeys" — must be relocated to ingestor (Kpa-clawbot#1283) The next commit (GREEN) removes DeleteNodesByPubkeys from cmd/server, moves the DELETE to the ingestor via the new internal/prunequeue marker-file protocol, and rewrites handlePruneGeoFilter to enqueue requests rather than write directly.
…a-clawbot#738 / Kpa-clawbot#1289) GREEN for the invariant test added in the prior commit. Resolves the dead-on- arrival blocker introduced by PR Kpa-clawbot#738: the server opens SQLite mode=ro after Kpa-clawbot#1283/Kpa-clawbot#1289, so the `confirm=true` path of `/api/admin/prune-geo-filter` would fail in production with "attempt to write a readonly database". The author's tests only passed because setupTestDB uses `:memory:` without mode=ro. ## Architecture: file-marker queue (chosen over IPC/HTTP proxy) Server → marker file → ingestor → result file → server. New package `internal/prunequeue` defines the on-disk protocol: <dir(dbPath)>/prune-requests/request-<id>.json (written by server) <dir(dbPath)>/prune-requests/result-<id>.json (written by ingestor) Atomic via os.Rename. The ingestor removes the request file as part of writing the result, so the directory naturally drains. Why file markers over an internal localhost HTTP socket: * No new listener / port / auth surface to manage. * Inherits the same backup/permissions story as the SQLite file — they share a directory and a deployment. * Survives ingestor restarts: pending requests are processed on next boot. * Trivial to inspect / clear with `ls` / `trash` during ops. * Matches the existing maintenance-ticker pattern in cmd/ingestor/main.go. ## HTTP surface (unchanged for clients in spirit) POST /api/admin/prune-geo-filter * dry-run (default): same response shape as before — preview list. * confirm=true: now returns 202 Accepted + {requestId, statusUrl, count, nodes}. Body must still carry the pubkeys snapshot from the preview (TOCTOU guard preserved — the ingestor honors the list verbatim, does NOT re-evaluate geo_filter membership). GET /api/admin/prune-geo-filter/status?id=<id> (NEW, requires API key) * pending: 200 {requestId, status:"pending"} * done: 200 {requestId, status:"done", deleted, requestedAt, completedAt} * error: 200 {requestId, status:"error", error, ...} * 404 when neither marker nor result exist. * 400 for invalid ids (path-traversal guard: hex/[0-9a-f] only, ≤64 chars). ## TOCTOU + cascade-cleanup from prior CR * TOCTOU: server snapshots the preview list at confirm time and writes the PRUNED list (intersection of preview + still-outside) into the marker. The ingestor does NOT re-check geo_filter — it executes exactly what the operator confirmed. * Cascade cleanup: ingestor's DeleteNodesByPubkeys chunks under SQLite's variable limit (500 per stmt). It deliberately limits scope to the `nodes` table; the schema has no FKs, so observation/transmission rows are left for the regular packet-retention ticker. Documented in-line. ## Read-only invariant test Extends cmd/server/readonly_invariant_test.go (TestServerDBHasNoWriteMethods) to forbid `DeleteNodesByPubkeys` on the server `*DB`. With this commit the method is gone, so the test passes: $ go test -run TestServerDBHasNoWriteMethods ./... ok github.com/corescope/server 0.190s ## Tests * cmd/server/routes_test.go: rewrote the confirm-test to assert 202 + marker file existence; added pending→done status-roundtrip test; added 404 + path-traversal-id rejection tests. Old TestDeleteNodesByPubkeys removed (the method now lives in cmd/ingestor). * cmd/ingestor/prune_geofilter_test.go: end-to-end queue test (request file on disk → RunPendingPruneRequests → result file + DB rows actually deleted), plus empty-queue no-op. * internal/prunequeue/prunequeue_test.go: round-trip, path-traversal rejection, missing-result returns (nil, nil). ## Files * internal/prunequeue/{prunequeue.go,prunequeue_test.go,go.mod} — new package * cmd/ingestor/prune_geofilter.go — Store.DeleteNodesByPubkeys + RunPendingPruneRequests * cmd/ingestor/prune_geofilter_test.go — queue end-to-end test * cmd/ingestor/main.go — startup pass + 15s ticker for the queue, Stop on shutdown * cmd/server/db.go — DeleteNodesByPubkeys removed; only GetNodesForGeoPrune (read) remains * cmd/server/routes.go — handlePruneGeoFilter rewritten to enqueue; handlePruneGeoFilterStatus added; route registered * cmd/server/routes_test.go — new tests; old direct-DELETE coverage removed
Summary
POST /api/admin/prune-geo-filterendpoint — dry-run by default,?confirm=trueto permanently delete nodes outside the current geofilter polygon + buffer. RequiresX-API-Keyheader.writeEnabledgate as PUT). Preview lists affected nodes; Confirm delete removes them.GetNodesForGeoPruneandDeleteNodesByPubkeysDB helpers.docs/user-guide/geofilter.md— documents the UI button as primary workflow, CLI script as alternative.Test plan
cd cmd/server && go test ./...— all passapiKey— Prune section not visibleapiKey+ polygon active — Prune section visiblePOST /api/admin/prune-geo-filterwithoutX-API-Key→ 401POST /api/admin/prune-geo-filterwith no polygon configured → 400🤖 Generated with Claude Code