perf(pgsql): optimize shortest path harness functions #60

zinic wants to merge 1 commit into SpecterOps:main
Conversation
- Refactor `edges_to_path` to scan the edge table once via CTE instead of three separate scans; fix volatility from `immutable` to `stable`
- Drop primary keys on variable-length `int8[]` path columns in temp tables to eliminate O(depth) B-tree maintenance per insert
- Remove redundant satisfied/is_cycle partial indexes on frontier tables, keeping only the `next_id` btree needed for expansion joins
- Fix cross-join bug in SP harness `DELETE ... USING visited` that caused O(|next_front| × |visited|) evaluation; split into two statements
- Replace EXISTS + RETURN QUERY double-scans with RETURN QUERY + GET DIAGNOSTICS single-pass pattern in all ASP harness satisfaction checks
- Move cycle/null-satisfaction pruning from swap functions into harness callers to avoid redundant post-swap scans; retain only dead-end pruning in swap functions and correct its join direction
- Add per-direction visited sets to ASP harnesses to prevent exponential frontier growth from re-expanding nodes at every depth level
- Cache frontier sizes returned by swap functions and use them for loop termination and smaller-frontier selection, eliminating EXISTS and COUNT(*) full-table scans each iteration
- Remove unconditional COUNT(*) subqueries from raise debug statements
- Use writable CTE for single-pass dedup/split in SP harness instead of two sequential scans of next_front
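The RETURN QUERY + GET DIAGNOSTICS replacement mentioned above can be sketched in PL/pgSQL. This is an illustrative reconstruction, not the actual harness code: the function name and signature are hypothetical, while `next_front`, `path`, and `satisfied` follow the names used elsewhere in this review.

```sql
-- Sketch of the single-pass satisfaction check. In PL/pgSQL, RETURN QUERY
-- sets ROW_COUNT, so the "did we find anything?" check comes for free and
-- no separate EXISTS probe (a second scan of next_front) is needed.
create or replace function emit_satisfied_paths()
returns setof int8[] as $$
declare
    row_count int4;
begin
    -- Stream the satisfied paths to the caller in a single scan.
    return query
        select n.path
        from next_front n
        where n.satisfied;

    -- Read how many rows the RETURN QUERY just emitted.
    get diagnostics row_count = row_count;

    if row_count > 0 then
        raise debug 'satisfied paths found: %', row_count;
    end if;
end;
$$ language plpgsql;
```

The previous pattern ran an `EXISTS` query first and then re-ran essentially the same query under `RETURN QUERY`, scanning the frontier table twice per iteration.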
Walkthrough

The changes refactor PostgreSQL functions for graph path computation and traversal in a database schema.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant EdgePath[edges_to_path]
participant Harness[ASP Harness]
participant VisitedSet[Temporary Visited Set]
participant SwapFunc[swap_*_front()]
participant FrontierMgmt[Frontier Management]
Client->>Harness: Start expansion
Note over Harness: Initialize frontier, depth counter
loop Expansion iteration
Harness->>EdgePath: Compute paths from current frontier
EdgePath-->>Harness: Return pathComposite (single CTE)
Harness->>Harness: Load next_front nodes
Harness->>VisitedSet: Check if next_id already visited
VisitedSet-->>Harness: Return visited status
Note over Harness: Prune nodes with shallower depth visits
Harness->>FrontierMgmt: Deduplicate, prefer satisfied nodes
FrontierMgmt-->>Harness: Return deduplicated frontier
Harness->>SwapFunc: swap_*_front() for next iteration
SwapFunc->>SwapFunc: Delete unsatisfied frontier nodes
SwapFunc-->>Harness: Return int4 (frontier size)
Note over Harness: Use returned count (cached)
Harness->>VisitedSet: Update visited set with current depth
alt Bidirectional: meet-in-the-middle
Harness->>Harness: Join forward/backward frontiers
Harness-->>Client: Return meeting paths
end
end
Note over Harness: Exit when frontier empty or depth limit reached
Harness-->>Client: Return final results
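The meet-in-the-middle branch in the diagram above amounts to a single join between the two frontier tables. A hedged sketch — the table and column names (`forward_front`, `backward_front`, `next_id`, `path`) follow the review's terminology, and real path reconstruction is more involved than shown:

```sql
-- A forward and a backward frontier "meet" when both contain the same node id.
-- Names are assumptions taken from this review; the actual harness must also
-- orient and splice the two path halves, which this projection omits.
select f.path as forward_half,
       b.path as backward_half
from forward_front f
join backward_front b
  on f.next_id = b.next_id;
```

Because the join keys on `next_id`, the `next_id` btree retained on each frontier table (per the change summary) is the only index this step needs.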
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
drivers/pg/query/sql/schema_up.sql (1)
656-668: Effective RETURN QUERY + GET DIAGNOSTICS pattern.

The pattern correctly replaces the previous double-scan (EXISTS check followed by RETURN QUERY). However, the return value from `swap_forward_front()` is discarded here (line 668 uses `perform`), while the loop termination on line 635 still uses `exists(select 1 from forward_front)`.

For consistency with the bidirectional harness optimization, consider caching the swap return value to eliminate the EXISTS scan:
♻️ Optional: use swap return value for loop termination
```diff
 declare
     forward_front_depth int4 := 0;
     row_count int4;
+    forward_front_size int4 := 0;
 begin
     ...
-    while forward_front_depth < max_depth and (forward_front_depth = 0 or exists(select 1 from forward_front))
+    while forward_front_depth < max_depth and (forward_front_depth = 0 or forward_front_size > 0)
     loop
         ...
         -- Swap the next_front table into the forward_front
-        perform swap_forward_front();
+        select swap_forward_front() into forward_front_size;
     end loop;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/query/sql/schema_up.sql` around lines 656-668: the code discards the return value of `swap_forward_front()` (called with `perform`) and later uses an `exists(select 1 from forward_front)` scan to decide loop termination. Capture `swap_forward_front()`'s return value into a variable (e.g., `forward_front_size`) and use that cached value for the loop termination check instead of the EXISTS scan.
📒 Files selected for processing (1)
drivers/pg/query/sql/schema_up.sql
```sql
-- Remove dead-ends: unsatisfied frontier nodes that have no outgoing edges in the graph
-- and therefore can never extend a forward path. The forward expansion step joins on
-- e.start_id = frontier.next_id, so nodes without any outgoing edge (start_id) are
-- guaranteed dead-ends. Uses the edge_start_id_kind_id_id_end_id_index covering index.
delete from forward_front r
where not r.satisfied
  and not exists (select 1 from edge e where e.start_id = r.next_id);
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for the referenced index name in the codebase
rg -n "edge_start_id_kind_id" --type sql
```

Repository: SpecterOps/DAWGS
Length of output: 188
🏁 Script executed:
```shell
#!/bin/bash
# Search for all edge table indexes in SQL files
rg -n "create.*index" drivers/pg/query/sql/schema_up.sql | head -20
# Also search for the full index name in all file types
rg -n "edge_start_id_kind_id_id_end_id" .
```

Repository: SpecterOps/DAWGS
Length of output: 1128
🏁 Script executed:
```shell
#!/bin/bash
# Look at the area around line 183-193 to see the actual indexes defined
sed -n '180,200p' drivers/pg/query/sql/schema_up.sql
```

Repository: SpecterOps/DAWGS
Length of output: 854
Update comment to reference the correct index.
The comment on line 478 references `edge_start_id_kind_id_id_end_id_index`, but this index does not exist in the schema. The actual indexes defined on the edge table are:

- `edge_start_id_index` on `(start_id)` (line 183)
- `edge_start_kind_index` on `(start_id, kind_id)` (line 193)

Since the query filters on `e.start_id = r.next_id`, use `edge_start_id_index` in the comment instead of the non-existent index name.
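For reference, the two indexes the analysis chain located can be sketched as plain DDL. These definitions are inferred from the review output above (names, columns, approximate line numbers); the real statements in `schema_up.sql` may carry additional options:

```sql
-- Inferred from the review findings; actual DDL in schema_up.sql may differ.
create index edge_start_id_index on edge (start_id);               -- ~line 183
create index edge_start_kind_index on edge (start_id, kind_id);    -- ~line 193

-- The dead-end pruning predicate only needs equality on start_id,
--   not exists (select 1 from edge e where e.start_id = r.next_id)
-- so edge_start_id_index is the index the comment should cite.
```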
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@drivers/pg/query/sql/schema_up.sql` around lines 475 - 481, The comment above
the DELETE on table forward_front incorrectly names a non-existent index; update
the comment to reference the actual covering index on the edge table used by the
predicate e.start_id = r.next_id by replacing
edge_start_id_kind_id_id_end_id_index with edge_start_id_index (the existing
index on edge(start_id)), and keep the surrounding explanation about dead-ends
and the join condition involving forward_front.next_id and edge.start_id intact;
reference the tables/columns forward_front.next_id, edge.start_id and the index
edge_start_id_index so readers can locate the relevant objects.
Description
Resolves: BED-420
Type of Change
Testing
- `go test -tags manual_integration ./integration/...`

Screenshots (if appropriate):
Driver Impact
- PostgreSQL (`drivers/pg`)
- Neo4j (`drivers/neo4j`)

Checklist
- `go.mod`/`go.sum` are up to date if dependencies changed

Summary by CodeRabbit
Bug Fixes
Chores