feat (pg): optimize graph schema for larger datasets#59
feat (pg): optimize graph schema for larger datasets#59zinic wants to merge 1 commit intoSpecterOps:mainfrom
Conversation
WalkthroughUpdated edge uniqueness and conflict-target column ordering from (graph_id, start_id, end_id, kind_id) to (start_id, end_id, kind_id, graph_id); consolidated and adjusted indexes, set fillfactor for HOT updates, and removed primary keys from certain temporary path tables. Minor regex/group parsing and query formatting adjustments accompany these schema changes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
drivers/pg/query/sql/schema_up.sql (1)
461-485:⚠️ Potential issue | 🟡 MinorMisleading comment: this is not an anti-join pattern.
The comment on lines 462-463 states "The NOT EXISTS filter uses a LEFT JOIN anti-pattern for hash anti-join eligibility," but the actual
WHEREclause implements the opposite logic:where not r.is_cycle and r.satisfied is not null and (r.satisfied or e.end_id is not null);This keeps rows where either
r.satisfiedis true OR a matching edge exists (e.end_id is not null). This is a semi-join/existence check, not an anti-join. An anti-join would usee.end_id IS NULLto exclude rows with matching edges.Consider updating the comment to accurately describe the filtering logic: rows are retained if they're satisfied or have outgoing edges to explore.
📝 Suggested comment fix
--- swap_forward_front replaces the forward_front contents with filtered rows from next_front using -INSERT-SELECT instead of ALTER TABLE RENAME, avoiding AccessExclusiveLock catalog churn on temp tables. --- The NOT EXISTS filter uses a LEFT JOIN anti-pattern for hash anti-join eligibility. +-- swap_forward_front replaces the forward_front contents with filtered rows from next_front using +-- INSERT-SELECT instead of ALTER TABLE RENAME, avoiding AccessExclusiveLock catalog churn on temp tables. +-- Rows are retained if satisfied or if outgoing edges exist (checked via LEFT JOIN).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/query/sql/schema_up.sql` around lines 461 - 485, The comment above function swap_forward_front() is incorrect: the WHERE clause in the INSERT-select keeps rows where r.satisfied is true or a matching edge exists (e.end_id IS NOT NULL), so this is a semi-join/existence check, not an anti-join; update the comment to describe that rows are retained when they are satisfied or have outgoing edges to explore (or remove the misleading mention of a "NOT EXISTS/LEFT JOIN anti-pattern") and reference the INSERT into forward_front and the WHERE conditions (r.is_cycle, r.satisfied, e.end_id) to ensure the text matches the implemented logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@drivers/pg/query/sql/schema_up.sql`:
- Around line 461-485: The comment above function swap_forward_front() is
incorrect: the WHERE clause in the INSERT-select keeps rows where r.satisfied is
true or a matching edge exists (e.end_id IS NOT NULL), so this is a
semi-join/existence check, not an anti-join; update the comment to describe that
rows are retained when they are satisfied or have outgoing edges to explore (or
remove the misleading mention of a "NOT EXISTS/LEFT JOIN anti-pattern") and
reference the INSERT into forward_front and the WHERE conditions (r.is_cycle,
r.satisfied, e.end_id) to ensure the text matches the implemented logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 07f31577-843d-4652-9386-a70f26f53d2f
📒 Files selected for processing (3)
drivers/pg/query/format.godrivers/pg/query/sql/schema_up.sqldrivers/pg/statements.go
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
drivers/pg/query/sql/schema_up.sql (1)
366-387:⚠️ Potential issue | 🔴 CriticalAdd DISTINCT ON dedup to unidirectional_asp_harness and bidirectional_asp_harness before swaps, matching the pattern used in unidirectional_sp_harness.
After removing the primary key on the
pathcolumn (lines 366–387), deduplication responsibility shifts to harness-level logic. Theunidirectional_sp_harnessappliesDISTINCT ON (f.next_id)when copying data intoforward_front(line 555) and when inserting visited nodes (line 549). However,unidirectional_asp_harnessandbidirectional_asp_harnessexecute dynamic SQL parameters directly without equivalent dedup, and theswap_forward_front()andswap_backward_front()functions filter only by cycle/satisfaction conditions—not by deduplication. This allows duplicate paths to accumulate innext_front, potentially increasing memory usage and changing traversal behavior across iterations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/query/sql/schema_up.sql` around lines 366 - 387, The harnesses unidirectional_asp_harness and bidirectional_asp_harness no longer deduplicate before swapping fronts after the PK on path was removed, so duplicates can accumulate in forward_front/next_front; mirror the pattern from unidirectional_sp_harness by applying DISTINCT ON (f.next_id) when copying into forward_front and when inserting visited nodes (the same places where unidirectional_sp_harness uses DISTINCT ON) so that swap_forward_front() and swap_backward_front() operate on deduplicated rows and next_front does not grow with duplicate paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@drivers/pg/query/sql/schema_up.sql`:
- Around line 366-387: The harnesses unidirectional_asp_harness and
bidirectional_asp_harness no longer deduplicate before swapping fronts after the
PK on path was removed, so duplicates can accumulate in
forward_front/next_front; mirror the pattern from unidirectional_sp_harness by
applying DISTINCT ON (f.next_id) when copying into forward_front and when
inserting visited nodes (the same places where unidirectional_sp_harness uses
DISTINCT ON) so that swap_forward_front() and swap_backward_front() operate on
deduplicated rows and next_front does not grow with duplicate paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1333cc7e-0fa0-4a07-86c0-e05686847ab4
📒 Files selected for processing (5)
drivers/pg/query/definitions.godrivers/pg/query/format.godrivers/pg/query/query.godrivers/pg/query/sql/schema_up.sqldrivers/pg/statements.go
✅ Files skipped from review due to trivial changes (1)
- drivers/pg/query/format.go
🚧 Files skipped from review as they are similar to previous changes (1)
- drivers/pg/statements.go
Description
Resolves: <TICKET_OR_ISSUE_NUMBER>
edge_start_id_index,edge_end_id_index,edge_kind_index) that are already covered by composite index left-prefixes.edge_start_kind_indexandedge_end_kind_indexto enable index-only scans during traversal joins(start_id, end_id, kind_id, graph_id)so the useful columns form the B-tree left prefixpath int8[]arrays in pathspace temp tables; deduplication is already handled by DISTINCT ON in harness functionsALTER TABLE RENAMEswap pattern withTRUNCATE+INSERT SELECTusingLEFT JOINanti-join, avoiding AccessExclusiveLock catalog churnType of Change
Testing
go test -tags manual_integration ./integration/...)Screenshots (if appropriate):
Driver Impact
drivers/pg)drivers/neo4j)Checklist
go.mod/go.sumare up to date if dependencies changedSummary by CodeRabbit