perf(pgsql): optimize shortest path harness functions #60

zinic wants to merge 1 commit into SpecterOps:main
Conversation
- Refactor `edges_to_path` to scan the edge table once via CTE instead of three separate scans; fix volatility from `immutable` to `stable`
- Drop primary keys on variable-length `int8[]` path columns in temp tables to eliminate O(depth) B-tree maintenance per insert
- Remove redundant satisfied/is_cycle partial indexes on frontier tables, keeping only the `next_id` btree needed for expansion joins
- Fix cross-join bug in SP harness `DELETE ... USING visited` that caused O(|next_front| × |visited|) evaluation; split into two statements
- Replace EXISTS + RETURN QUERY double-scans with RETURN QUERY + GET DIAGNOSTICS single-pass pattern in all ASP harness satisfaction checks
- Move cycle/null-satisfaction pruning from swap functions into harness callers to avoid redundant post-swap scans; retain only dead-end pruning in swap functions and correct its join direction
- Add per-direction visited sets to ASP harnesses to prevent exponential frontier growth from re-expanding nodes at every depth level
- Cache frontier sizes returned by swap functions and use them for loop termination and smaller-frontier selection, eliminating EXISTS and COUNT(*) full-table scans each iteration
- Remove unconditional COUNT(*) subqueries from raise debug statements
- Use writable CTE for single-pass dedup/split in SP harness instead of two sequential scans of next_front
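The RETURN QUERY + GET DIAGNOSTICS replacement mentioned above can be sketched in PL/pgSQL. This is an illustrative reconstruction, not the actual harness code: the function name and signature are hypothetical, while `next_front`, `path`, and `satisfied` follow the names used elsewhere in this review.

```sql
-- Sketch of the single-pass satisfaction check. In PL/pgSQL, RETURN QUERY
-- sets ROW_COUNT, so the "did we find anything?" check comes for free and
-- no separate EXISTS probe (a second scan of next_front) is needed.
create or replace function emit_satisfied_paths()
returns setof int8[] as $$
declare
    row_count int4;
begin
    -- Stream the satisfied paths to the caller in a single scan.
    return query
        select n.path
        from next_front n
        where n.satisfied;

    -- Read how many rows the RETURN QUERY just emitted.
    get diagnostics row_count = row_count;

    if row_count > 0 then
        raise debug 'satisfied paths found: %', row_count;
    end if;
end;
$$ language plpgsql;
```

The previous pattern ran an `EXISTS` query first and then re-ran essentially the same query under `RETURN QUERY`, scanning the frontier table twice per iteration.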
Walkthrough

The changes refactor PostgreSQL functions for graph path computation and traversal in a database schema.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant EdgePath[edges_to_path]
participant Harness[ASP Harness]
participant VisitedSet[Temporary Visited Set]
participant SwapFunc[swap_*_front()]
participant FrontierMgmt[Frontier Management]
Client->>Harness: Start expansion
Note over Harness: Initialize frontier, depth counter
loop Expansion iteration
Harness->>EdgePath: Compute paths from current frontier
EdgePath-->>Harness: Return pathComposite (single CTE)
Harness->>Harness: Load next_front nodes
Harness->>VisitedSet: Check if next_id already visited
VisitedSet-->>Harness: Return visited status
Note over Harness: Prune nodes with shallower depth visits
Harness->>FrontierMgmt: Deduplicate, prefer satisfied nodes
FrontierMgmt-->>Harness: Return deduplicated frontier
Harness->>SwapFunc: swap_*_front() for next iteration
SwapFunc->>SwapFunc: Delete unsatisfied frontier nodes
SwapFunc-->>Harness: Return int4 (frontier size)
Note over Harness: Use returned count (cached)
Harness->>VisitedSet: Update visited set with current depth
alt Bidirectional: meet-in-the-middle
Harness->>Harness: Join forward/backward frontiers
Harness-->>Client: Return meeting paths
end
end
Note over Harness: Exit when frontier empty or depth limit reached
Harness-->>Client: Return final results
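The meet-in-the-middle branch in the diagram above amounts to a single join between the two frontier tables. A hedged sketch — the table and column names (`forward_front`, `backward_front`, `next_id`, `path`) follow the review's terminology, and real path reconstruction is more involved than shown:

```sql
-- A forward and a backward frontier "meet" when both contain the same node id.
-- Names are assumptions taken from this review; the actual harness must also
-- orient and splice the two path halves, which this projection omits.
select f.path as forward_half,
       b.path as backward_half
from forward_front f
join backward_front b
  on f.next_id = b.next_id;
```

Because the join keys on `next_id`, the `next_id` btree retained on each frontier table (per the change summary) is the only index this step needs.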
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
drivers/pg/query/sql/schema_up.sql (1)
656-668: Effective RETURN QUERY + GET DIAGNOSTICS pattern.

The pattern correctly replaces the previous double-scan (EXISTS check followed by RETURN QUERY). However, the return value from `swap_forward_front()` is discarded here (line 668 uses `perform`), while the loop termination on line 635 still uses `exists(select 1 from forward_front)`.

For consistency with the bidirectional harness optimization, consider caching the swap return value to eliminate the EXISTS scan:
♻️ Optional: use swap return value for loop termination
```diff
 declare
     forward_front_depth int4 := 0;
     row_count int4;
+    forward_front_size int4 := 0;
 begin
     ...
-    while forward_front_depth < max_depth and (forward_front_depth = 0 or exists(select 1 from forward_front))
+    while forward_front_depth < max_depth and (forward_front_depth = 0 or forward_front_size > 0)
     loop
         ...
         -- Swap the next_front table into the forward_front
-        perform swap_forward_front();
+        select swap_forward_front() into forward_front_size;
     end loop;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/query/sql/schema_up.sql` around lines 656-668: the code discards the return value of `swap_forward_front()` (called with `perform`) and later uses an `exists(select 1 from forward_front)` scan to decide loop termination. Capture `swap_forward_front()`'s return value into a variable (e.g., `forward_front_size`) and use that cached value for the loop termination check instead of the EXISTS scan.
📒 Files selected for processing (1)
drivers/pg/query/sql/schema_up.sql
```sql
-- Remove dead-ends: unsatisfied frontier nodes that have no outgoing edges in the graph
-- and therefore can never extend a forward path. The forward expansion step joins on
-- e.start_id = frontier.next_id, so nodes without any outgoing edge (start_id) are
-- guaranteed dead-ends. Uses the edge_start_id_kind_id_id_end_id_index covering index.
delete from forward_front r
where not r.satisfied
  and not exists (select 1 from edge e where e.start_id = r.next_id);
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for the referenced index name in the codebase
rg -n "edge_start_id_kind_id" --type sql
```

Repository: SpecterOps/DAWGS
Length of output: 188
🏁 Script executed:
```shell
#!/bin/bash
# Search for all edge table indexes in SQL files
rg -n "create.*index" drivers/pg/query/sql/schema_up.sql | head -20
# Also search for the full index name in all file types
rg -n "edge_start_id_kind_id_id_end_id" .
```

Repository: SpecterOps/DAWGS
Length of output: 1128
🏁 Script executed:
```shell
#!/bin/bash
# Look at the area around line 183-193 to see the actual indexes defined
sed -n '180,200p' drivers/pg/query/sql/schema_up.sql
```

Repository: SpecterOps/DAWGS
Length of output: 854
Update comment to reference the correct index.
The comment on line 478 references `edge_start_id_kind_id_id_end_id_index`, but this index does not exist in the schema. The actual indexes defined on the edge table are:

- `edge_start_id_index` on `(start_id)` (line 183)
- `edge_start_kind_index` on `(start_id, kind_id)` (line 193)

Since the query filters on `e.start_id = r.next_id`, use `edge_start_id_index` in the comment instead of the non-existent index name.
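For reference, the two indexes the analysis chain located can be sketched as plain DDL. These definitions are inferred from the review output above (names, columns, approximate line numbers); the real statements in `schema_up.sql` may carry additional options:

```sql
-- Inferred from the review findings; actual DDL in schema_up.sql may differ.
create index edge_start_id_index on edge (start_id);               -- ~line 183
create index edge_start_kind_index on edge (start_id, kind_id);    -- ~line 193

-- The dead-end pruning predicate only needs equality on start_id,
--   not exists (select 1 from edge e where e.start_id = r.next_id)
-- so edge_start_id_index is the index the comment should cite.
```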
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@drivers/pg/query/sql/schema_up.sql` around lines 475 - 481, The comment above
the DELETE on table forward_front incorrectly names a non-existent index; update
the comment to reference the actual covering index on the edge table used by the
predicate e.start_id = r.next_id by replacing
edge_start_id_kind_id_id_end_id_index with edge_start_id_index (the existing
index on edge(start_id)), and keep the surrounding explanation about dead-ends
and the join condition involving forward_front.next_id and edge.start_id intact;
reference the tables/columns forward_front.next_id, edge.start_id and the index
edge_start_id_index so readers can locate the relevant objects.
Description
Resolves: BED-420
Type of Change
Testing
- `go test -tags manual_integration ./integration/...`

Screenshots (if appropriate):
Driver Impact
- PostgreSQL (`drivers/pg`)
- Neo4j (`drivers/neo4j`)

Checklist
- `go.mod`/`go.sum` are up to date if dependencies changed

Summary by CodeRabbit
Bug Fixes
Chores