perf(pgsql): optimize shortest path harness functions #60

Open
zinic wants to merge 1 commit into SpecterOps:main from zinic:sp-harness-work

@zinic zinic commented Apr 10, 2026

Description

  • Refactor edges_to_path to scan the edge table once via CTE instead of three separate scans; fix volatility from immutable to stable
  • Drop primary keys on variable-length int8[] path columns in temp tables to eliminate O(depth) B-tree maintenance per insert
  • Remove redundant satisfied/is_cycle partial indexes on frontier tables, keeping only the next_id btree needed for expansion joins
  • Fix cross-join bug in SP harness DELETE USING visited that caused O(|next_front| × |visited|) evaluation; split into two statements
  • Replace EXISTS + RETURN QUERY double-scans with RETURN QUERY + GET DIAGNOSTICS single-pass pattern in all ASP harness satisfaction checks
  • Move cycle/null-satisfaction pruning from swap functions into harness callers to avoid redundant post-swap scans; retain only dead-end pruning in swap functions and correct its join direction
  • Add per-direction visited sets to ASP harnesses to prevent exponential frontier growth from re-expanding nodes at every depth level
  • Cache frontier sizes returned by swap functions and use them for loop termination and smaller-frontier selection, eliminating EXISTS and COUNT(*) full-table scans each iteration
  • Remove unconditional COUNT(*) subqueries from raise debug statements
  • Use writable CTE for single-pass dedup/split in SP harness instead of two sequential scans of next_front
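To illustrate the DELETE USING item in the list above, here is a minimal sketch of how an OR in the predicate can turn a join into a cross join, and what the split form looks like. The table names, columns, and the exact shape of the problematic predicate are assumptions made for illustration; they are not copied from schema_up.sql.

```sql
-- Hypothetical stand-ins for the harness temp tables.
create temporary table next_front_sketch (next_id int8 not null, is_cycle bool not null);
create temporary table visited_sketch (id int8 not null);

insert into next_front_sketch values (1, false), (2, true), (3, false);
insert into visited_sketch values (1), (10), (11);

-- Assumed problematic shape: because of the OR, the join clause is no longer a
-- plain equality, so the planner typically falls back to a nested loop that
-- evaluates the predicate for every (next_front, visited) pair.
--
--   delete from next_front_sketch r using visited_sketch v
--   where r.is_cycle or r.next_id = v.id;

-- Split form: one statement without a join, one with a plain equality join.
delete from next_front_sketch r where r.is_cycle;
delete from next_front_sketch r using visited_sketch v where r.next_id = v.id;

select next_id from next_front_sketch;  -- only next_id = 3 remains
```

Comparing the plans with explain on realistic table sizes is the most direct way to confirm whether a given predicate has this problem.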

Resolves: BED-420

Type of Change

  • Chore (a change that does not modify the application functionality)
  • Bug fix (a change that fixes an issue)
  • New feature / enhancement (a change that adds new functionality)
  • Refactor (no behaviour change)
  • Test coverage
  • Build / CI / tooling
  • Documentation

Testing

  • Unit tests added / updated
  • Integration tests added / updated
  • Manual integration tests run (go test -tags manual_integration ./integration/...)

Screenshots (if appropriate):

Driver Impact

  • PostgreSQL driver (drivers/pg)
  • Neo4j driver (drivers/neo4j)

Checklist

  • Code is formatted
  • All existing tests pass
  • go.mod / go.sum are up to date if dependencies changed

Summary by CodeRabbit

  • Bug Fixes

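    • Fixed a cross join in the SP harness DELETE USING visited statement and corrected the join direction of dead-end pruning in the swap functions.
    • Added per-direction visited sets to the ASP harnesses so nodes are not re-expanded at every depth level.
  • Chores

    • Removed primary keys and redundant partial indexes from the temporary pathspace tables to reduce per-insert index maintenance.
    • Changed swap_forward_front() and swap_backward_front() to return the frontier size (int4) so callers can avoid EXISTS and COUNT(*) scans.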

coderabbitai bot commented Apr 10, 2026

Walkthrough

The changes refactor PostgreSQL functions for graph path computation and traversal in a database schema. Key modifications include optimizing edges_to_path() with a single CTE and changing its volatility to stable, altering swap_forward_front() and swap_backward_front() to return int4 values instead of void, and restructuring expansion logic in ASP harness functions to use temporary visited sets and eliminate redundant COUNT operations.
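The int4 return value mentioned in the walkthrough could be produced roughly as follows. This is a sketch under stated assumptions: the function name swap_front_sketch() and both table definitions are hypothetical, and the real swap functions also perform dead-end pruning that is omitted here.

```sql
-- Hypothetical stand-ins for the harness frontier tables.
create temporary table forward_front_demo (next_id int8 not null);
create temporary table next_front_demo (next_id int8 not null);

-- Sketch of a swap function that reports the size of the new frontier.
create or replace function swap_front_sketch() returns int4 as
$$
declare
  front_size int4;
begin
  truncate table forward_front_demo;

  insert into forward_front_demo (next_id)
  select next_id from next_front_demo;

  -- ROW_COUNT reflects the insert above, so no count(*) scan is needed.
  get diagnostics front_size = row_count;

  truncate table next_front_demo;
  return front_size;
end;
$$ language plpgsql volatile;

insert into next_front_demo values (1), (2), (3);
select swap_front_sketch();  -- returns 3
```

A caller can keep the returned value in a local variable and test it in the loop condition instead of scanning the frontier table again.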

Changes

Cohort / File(s) | Summary
Graph Path & Frontier Management Functions (drivers/pg/query/sql/schema_up.sql) | Optimized edges_to_path() using single CTE (changed volatility to stable); modified swap_forward_front() and swap_backward_front() return types (void → int4) for frontier size caching; refactored create_unidirectional_pathspace_tables() and create_bidirectional_pathspace_tables() by removing primary keys and secondary indexes; reworked unidirectional and bidirectional ASP harness functions to introduce temporary visited sets, prune previously-visited nodes, eliminate COUNT subqueries from debug messages, and restructure frontier deduplication/routing and termination logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant EdgePath[edges_to_path]
    participant Harness[ASP Harness]
    participant VisitedSet[Temporary Visited Set]
    participant SwapFunc[swap_*_front()]
    participant FrontierMgmt[Frontier Management]
    
    Client->>Harness: Start expansion
    Note over Harness: Initialize frontier, depth counter
    
    loop Expansion iteration
        Harness->>EdgePath: Compute paths from current frontier
        EdgePath-->>Harness: Return pathComposite (single CTE)
        
        Harness->>Harness: Load next_front nodes
        Harness->>VisitedSet: Check if next_id already visited
        VisitedSet-->>Harness: Return visited status
        
        Note over Harness: Prune nodes with shallower depth visits
        
        Harness->>FrontierMgmt: Deduplicate, prefer satisfied nodes
        FrontierMgmt-->>Harness: Return deduplicated frontier
        
        Harness->>SwapFunc: swap_*_front() for next iteration
        SwapFunc->>SwapFunc: Delete unsatisfied frontier nodes
        SwapFunc-->>Harness: Return int4 (frontier size)
        
        Note over Harness: Use returned count (cached)
        
        Harness->>VisitedSet: Update visited set with current depth
        
        alt Bidirectional: meet-in-the-middle
            Harness->>Harness: Join forward/backward frontiers
            Harness-->>Client: Return meeting paths
        end
    end
    
    Note over Harness: Exit when frontier empty or depth limit reached
    Harness-->>Client: Return final results
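The visited-set steps in the diagram above can be sketched in plain SQL. The table names and the primary key on the visited table are assumptions for illustration; the actual harness tables and their constraints are defined in schema_up.sql and may differ.

```sql
-- Hypothetical stand-ins for a visited set and a candidate frontier.
create temporary table visited_demo (id int8 primary key);
create temporary table next_demo (next_id int8 not null);

insert into visited_demo values (1), (2);
insert into next_demo values (2), (3), (3), (4);

-- Drop candidates that were already expanded at a shallower depth.
delete from next_demo n using visited_demo v where n.next_id = v.id;

-- Record the surviving nodes so later depths do not expand them again.
insert into visited_demo (id)
select distinct next_id from next_demo
on conflict (id) do nothing;

select id from visited_demo order by id;  -- 1, 2, 3, 4
```

Without the pruning step, node 2 would be expanded again at this depth, which is the kind of repeated work the per-direction visited sets are meant to prevent.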

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy paths through SQL's grand maze,
Where visited sets mark our ways,
Frontiers swap and prune with care,
CTEs streamline everywhere,
Optimization makes graphs fare!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | Title clearly summarizes the main performance optimization of shortest path harness functions in PostgreSQL driver with appropriate specificity.
Description check | ✅ Passed | Description provides comprehensive details of all changes, includes the required issue reference (BED-420), appropriate type and driver selections, and confirms test/format completion.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
drivers/pg/query/sql/schema_up.sql (1)

656-668: Effective RETURN QUERY + GET DIAGNOSTICS pattern.

The pattern correctly replaces the previous double-scan (EXISTS check followed by RETURN QUERY). However, I notice that the return value from swap_forward_front() is discarded here (line 668 uses perform), while the loop termination on line 635 still uses exists(select 1 from forward_front).

For consistency with the bidirectional harness optimization, consider caching the swap return value to eliminate the EXISTS scan:

♻️ Optional: use swap return value for loop termination
 declare
   forward_front_depth int4 := 0;
   row_count           int4;
+  forward_front_size  int4 := 0;
 begin
   ...
-  while forward_front_depth < max_depth and (forward_front_depth = 0 or exists(select 1 from forward_front))
+  while forward_front_depth < max_depth and (forward_front_depth = 0 or forward_front_size > 0)
     loop
     ...
       -- Swap the next_front table into the forward_front
-      perform swap_forward_front();
+      select swap_forward_front() into forward_front_size;
     end loop;
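For readers unfamiliar with the RETURN QUERY + GET DIAGNOSTICS pattern this comment refers to, a minimal standalone sketch follows. The function name emit_satisfied_sketch() and the table are hypothetical and far simpler than the harness functions in this PR.

```sql
-- Hypothetical frontier table; the real harness tables have more columns.
create temporary table front_check_sketch (next_id int8 not null, satisfied bool not null);

create or replace function emit_satisfied_sketch() returns setof int8 as
$$
declare
  emitted int4;
begin
  -- One scan: append the satisfied rows to the function's result set...
  return query select f.next_id from front_check_sketch f where f.satisfied;

  -- ...then read how many rows that statement produced.
  get diagnostics emitted = row_count;

  if emitted > 0 then
    raise debug 'returned % satisfied rows', emitted;
  end if;

  return;
end;
$$ language plpgsql volatile;

insert into front_check_sketch values (1, false), (2, true), (3, true);
select * from emit_satisfied_sketch();  -- yields 2 and 3 (row order not guaranteed)
```

The earlier shape described in the PR (an EXISTS check followed by RETURN QUERY) would read the table twice; here the row count comes from the statement that already ran.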
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drivers/pg/query/sql/schema_up.sql` around lines 656 - 668, the code
currently discards the return value of swap_forward_front() (it is called with
PERFORM) and still uses an EXISTS(select 1 from forward_front) scan to decide
loop termination. Capture the int4 frontier size returned by
swap_forward_front() in a variable (e.g., forward_front_size) and use that
cached value in the loop termination check instead of EXISTS: assign the result
of the swap_forward_front() call (in the block that inserts into asp_visited and
then calls swap_forward_front()), and replace any subsequent EXISTS(select 1
from forward_front) logic with the captured variable (see swap_forward_front(),
forward_front, next_front, and asp_visited to locate the places to update).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7571324f-c399-4333-bd97-fdce24cccde1

📥 Commits

Reviewing files that changed from the base of the PR and between 4b82ee3 and 52632c7.

📒 Files selected for processing (1)
  • drivers/pg/query/sql/schema_up.sql

Comment on lines +475 to +481
-- Remove dead-ends: unsatisfied frontier nodes that have no outgoing edges in the graph
-- and therefore can never extend a forward path. The forward expansion step joins on
-- e.start_id = frontier.next_id, so nodes without any outgoing edge (start_id) are
-- guaranteed dead-ends. Uses the edge_start_id_kind_id_id_end_id_index covering index.
delete from forward_front r
where not r.satisfied
and not exists (select 1 from edge e where e.start_id = r.next_id);
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the referenced index name in the codebase
rg -n "edge_start_id_kind_id" --type sql

Repository: SpecterOps/DAWGS

Length of output: 188


🏁 Script executed:

#!/bin/bash
# Search for all edge table indexes in SQL files
rg -n "create.*index" drivers/pg/query/sql/schema_up.sql | head -20

# Also search for the full index name in all file types
rg -n "edge_start_id_kind_id_id_end_id" .

Repository: SpecterOps/DAWGS

Length of output: 1128


🏁 Script executed:

#!/bin/bash
# Look at the area around line 183-193 to see the actual indexes defined
sed -n '180,200p' drivers/pg/query/sql/schema_up.sql

Repository: SpecterOps/DAWGS

Length of output: 854


Update comment to reference the correct index.

The comment on line 478 references edge_start_id_kind_id_id_end_id_index, but this index does not exist in the schema. The actual indexes defined on the edge table are:

  • edge_start_id_index on (start_id) (line 183)
  • edge_start_kind_index on (start_id, kind_id) (line 193)

Since the query filters on e.start_id = r.next_id, use edge_start_id_index in the comment instead of the non-existent index name.
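A quick way to double-check which indexes exist, independent of the ripgrep searches above, is to query the catalog directly. This sketch assumes a database with the DAWGS schema applied and a table literally named edge; if the table is partitioned, the partitions may list their own indexes under different table names.

```sql
-- List the indexes PostgreSQL knows about for the edge table.
select indexname, indexdef
from pg_indexes
where tablename = 'edge'
order by indexname;

-- Inspect which index the planner picks for the dead-end probe.
-- The literal 1 is an arbitrary placeholder node id.
explain
select 1 from edge e where e.start_id = 1;
```

The plan output shows the index actually chosen, which is the name the code comment should cite.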

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drivers/pg/query/sql/schema_up.sql` around lines 475 - 481, The comment above
the DELETE on table forward_front incorrectly names a non-existent index; update
the comment to reference the actual covering index on the edge table used by the
predicate e.start_id = r.next_id by replacing
edge_start_id_kind_id_id_end_id_index with edge_start_id_index (the existing
index on edge(start_id)), and keep the surrounding explanation about dead-ends
and the join condition involving forward_front.next_id and edge.start_id intact;
reference the tables/columns forward_front.next_id, edge.start_id and the index
edge_start_id_index so readers can locate the relevant objects.
