
feat (pg): optimize graph schema for larger datasets #59

Open
zinic wants to merge 1 commit into SpecterOps:main from zinic:schema-updates

Conversation

@zinic
Contributor

@zinic zinic commented Apr 9, 2026

Description

Resolves: <TICKET_OR_ISSUE_NUMBER>

  • Drop redundant edge indexes (edge_start_id_index, edge_end_id_index, edge_kind_index) that are already covered by composite index left-prefixes.
  • Add INCLUDE columns to edge_start_kind_index and edge_end_kind_index to enable index-only scans during traversal joins
  • Set fillfactor=80 on node and edge tables for HOT update headroom
  • Rewrite edges_to_path to scan the edge table once via lateral join instead of twice via UNION subqueries
  • Reorder edge unique constraint to (start_id, end_id, kind_id, graph_id) so the useful columns form the B-tree left prefix
  • Remove primary key on path int8[] arrays in pathspace temp tables; deduplication is already handled by DISTINCT ON in harness functions
  • Replace the ALTER TABLE RENAME swap pattern with TRUNCATE + INSERT SELECT filtered through a LEFT JOIN existence check, avoiding AccessExclusiveLock catalog churn
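The first bullet relies on the B-tree left-prefix rule: an index on (a, b) can satisfy equality lookups on a alone, so a separate single-column index on a adds write cost without adding lookup coverage. The Go sketch below models only that rule. The helper coveredByLeftPrefix is hypothetical, and the key columns of edge_start_kind_index are assumed to be (start_id, kind_id) based on the index name, not confirmed by this PR. It ignores planner costs, so it says when an index can be used, not when it will be.

```go
package main

import "fmt"

// coveredByLeftPrefix reports whether equality lookups on the given columns
// can be served by the left prefix of a B-tree index with key columns
// indexKeys. Lookup column order does not matter, but the lookup columns must
// occupy the first len(lookup) index key positions.
// Hypothetical helper for illustration only.
func coveredByLeftPrefix(indexKeys, lookup []string) bool {
	if len(lookup) == 0 || len(lookup) > len(indexKeys) {
		return false
	}
	prefix := make(map[string]bool, len(lookup))
	for _, col := range indexKeys[:len(lookup)] {
		prefix[col] = true
	}
	for _, col := range lookup {
		if !prefix[col] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed key columns, inferred from the index name only.
	startKind := []string{"start_id", "kind_id"}

	fmt.Println(coveredByLeftPrefix(startKind, []string{"start_id"}))
	fmt.Println(coveredByLeftPrefix(startKind, []string{"kind_id", "start_id"}))
	fmt.Println(coveredByLeftPrefix(startKind, []string{"kind_id"}))
}
```

Under this model, a standalone index on start_id is redundant next to (start_id, kind_id), while a lookup on kind_id alone is not served by that composite's left prefix.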

Type of Change

  • Chore (a change that does not modify the application functionality)
  • Bug fix (a change that fixes an issue)
  • New feature / enhancement (a change that adds new functionality)
  • Refactor (no behaviour change)
  • Test coverage
  • Build / CI / tooling
  • Documentation

Testing

  • Unit tests added / updated
  • Integration tests added / updated
  • Manual integration tests run (go test -tags manual_integration ./integration/...)

Screenshots (if appropriate):

Driver Impact

  • PostgreSQL driver (drivers/pg)
  • Neo4j driver (drivers/neo4j)

Checklist

  • Code is formatted
  • All existing tests pass
  • go.mod / go.sum are up to date if dependencies changed

Summary by CodeRabbit

  • Optimizations
    • Restructured database indexes and uniqueness constraint ordering to improve graph query and path lookup performance and resource usage.
    • Reordered the ON CONFLICT target columns in batch relationship (edge) writes to match the new unique constraint order (start_id, end_id, kind_id, graph_id).

@coderabbitai

coderabbitai bot commented Apr 9, 2026

Walkthrough

Updated edge uniqueness and conflict-target column ordering from (graph_id, start_id, end_id, kind_id) to (start_id, end_id, kind_id, graph_id); consolidated and adjusted indexes, set fillfactor for HOT updates, and removed primary keys from certain temporary path tables. Minor regex/group parsing and query formatting adjustments accompany these schema changes.

Changes

  • Conflict target / INSERT formatting (drivers/pg/query/format.go, drivers/pg/statements.go): Reordered ON CONFLICT (...) target columns from (graph_id, start_id, end_id, kind_id) to (start_id, end_id, kind_id, graph_id) in upsert/insert formatting and the batch insert statement.
  • Schema migration & indexes (drivers/pg/query/sql/schema_up.sql): Reordered edge uniqueness constraint columns, dropped several single-column indexes and created two composite covering indexes, set table fillfactor for HOT-update behavior, and removed primary keys from temporary path tables.
  • Index regex & parsing (drivers/pg/query/definitions.go, drivers/pg/query/query.go): Extended pgColumnIndexRegex to capture optional INCLUDE (...) fields; added/renamed regex group constants and adjusted parsing in describeGraphPartition to use the new group for USING fields.
  • Minor metadata (small diffs across the above files): Single-line adjustments and renames to reflect new regex groups and ordering; no exported API signature changes.
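To illustrate the kind of parsing an optional INCLUDE (...) capture group enables, here is a Go sketch that splits a pg_indexes.indexdef string into key fields and INCLUDE fields. The regex, parseIndexDef, and the sample definition (including its INCLUDE column) are assumptions for illustration; this is not the pgColumnIndexRegex from drivers/pg/query/definitions.go. It does not handle expression indexes whose definitions contain nested parentheses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// indexDefRegex matches definitions shaped like pg_indexes.indexdef output.
// The INCLUDE group is optional; when absent its submatch is "".
// Hypothetical pattern, not pgColumnIndexRegex.
var indexDefRegex = regexp.MustCompile(
	`(?i)^CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?P<name>\S+)\s+ON\s+(?:ONLY\s+)?(?P<table>\S+)\s+USING\s+(?P<method>\w+)\s+\((?P<fields>[^)]*)\)(?:\s+INCLUDE\s+\((?P<include>[^)]*)\))?`,
)

type indexDef struct {
	Name, Table, Method string
	Fields, Include     []string
}

// splitColumns turns "a, b" into ["a", "b"]; an empty group yields nil.
func splitColumns(s string) []string {
	if strings.TrimSpace(s) == "" {
		return nil
	}
	parts := strings.Split(s, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts
}

// parseIndexDef extracts named groups from an index definition.
func parseIndexDef(def string) (indexDef, bool) {
	m := indexDefRegex.FindStringSubmatch(def)
	if m == nil {
		return indexDef{}, false
	}
	group := func(name string) string { return m[indexDefRegex.SubexpIndex(name)] }
	return indexDef{
		Name:    group("name"),
		Table:   group("table"),
		Method:  group("method"),
		Fields:  splitColumns(group("fields")),
		Include: splitColumns(group("include")),
	}, true
}

func main() {
	// Illustrative definition; the INCLUDE column is an assumption.
	def, ok := parseIndexDef("CREATE INDEX edge_start_kind_index ON public.edge USING btree (start_id, kind_id) INCLUDE (end_id)")
	fmt.Println(ok, def.Name, def.Table, def.Method, def.Fields, def.Include)
}
```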

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop through rows and indexes bright,
I swap the order, set things right.
Start then end now lead the way,
Graph ID follows — hip hooray!
Covering leaves, the queries sing.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title 'feat (pg): optimize graph schema for larger datasets' clearly and concisely summarizes the main objective of the pull request: optimizing the PostgreSQL schema for improved performance with larger datasets.
  • Description check: ✅ Passed. The pull request description is complete and well-structured, following the template with all major sections filled out, including type of change, testing, driver impact, and checklist items appropriately marked.



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
drivers/pg/query/sql/schema_up.sql (1)

461-485: ⚠️ Potential issue | 🟡 Minor

Misleading comment: this is not an anti-join pattern.

The comment on lines 462-463 states "The NOT EXISTS filter uses a LEFT JOIN anti-pattern for hash anti-join eligibility," but the actual WHERE clause implements the opposite logic:

where not r.is_cycle
  and r.satisfied is not null
  and (r.satisfied or e.end_id is not null);

This keeps rows where either r.satisfied is true OR a matching edge exists (e.end_id is not null). This is a semi-join/existence check, not an anti-join. An anti-join would use e.end_id IS NULL to exclude rows with matching edges.

Consider updating the comment to accurately describe the filtering logic: rows are retained if they're satisfied or have outgoing edges to explore.

📝 Suggested comment fix
--- swap_forward_front replaces the forward_front contents with filtered rows from next_front using
-INSERT-SELECT instead of ALTER TABLE RENAME, avoiding AccessExclusiveLock catalog churn on temp tables.
--- The NOT EXISTS filter uses a LEFT JOIN anti-pattern for hash anti-join eligibility.
+-- swap_forward_front replaces the forward_front contents with filtered rows from next_front using
+-- INSERT-SELECT instead of ALTER TABLE RENAME, avoiding AccessExclusiveLock catalog churn on temp tables.
+-- Rows are retained if satisfied or if outgoing edges exist (checked via LEFT JOIN).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drivers/pg/query/sql/schema_up.sql` around lines 461 - 485, The comment above
function swap_forward_front() is incorrect: the WHERE clause in the
INSERT-select keeps rows where r.satisfied is true or a matching edge exists
(e.end_id IS NOT NULL), so this is a semi-join/existence check, not an
anti-join; update the comment to describe that rows are retained when they are
satisfied or have outgoing edges to explore (or remove the misleading mention of
a "NOT EXISTS/LEFT JOIN anti-pattern") and reference the INSERT into
forward_front and the WHERE conditions (r.is_cycle, r.satisfied, e.end_id) to
ensure the text matches the implemented logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 07f31577-843d-4652-9386-a70f26f53d2f

📥 Commits

Reviewing files that changed from the base of the PR and between 4b82ee3 and 1e5bd3e.

📒 Files selected for processing (3)
  • drivers/pg/query/format.go
  • drivers/pg/query/sql/schema_up.sql
  • drivers/pg/statements.go


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
drivers/pg/query/sql/schema_up.sql (1)

366-387: ⚠️ Potential issue | 🔴 Critical

Add DISTINCT ON dedup to unidirectional_asp_harness and bidirectional_asp_harness before swaps, matching the pattern used in unidirectional_sp_harness.

After the primary key on the path column is removed (lines 366–387), deduplication becomes the responsibility of harness-level logic. unidirectional_sp_harness applies DISTINCT ON (f.next_id) when inserting visited nodes (line 549) and when copying data into forward_front (line 555). unidirectional_asp_harness and bidirectional_asp_harness, however, execute the dynamic SQL passed in as parameters directly, without equivalent deduplication, and swap_forward_front() and swap_backward_front() filter only on cycle and satisfaction conditions, not on duplicates. As a result, duplicate paths can accumulate in next_front, potentially increasing memory usage and changing traversal behavior across iterations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drivers/pg/query/sql/schema_up.sql` around lines 366 - 387, The harnesses
unidirectional_asp_harness and bidirectional_asp_harness no longer deduplicate
before swapping fronts after the PK on path was removed, so duplicates can
accumulate in forward_front/next_front; mirror the pattern from
unidirectional_sp_harness by applying DISTINCT ON (f.next_id) when copying into
forward_front and when inserting visited nodes (the same places where
unidirectional_sp_harness uses DISTINCT ON) so that swap_forward_front() and
swap_backward_front() operate on deduplicated rows and next_front does not grow
with duplicate paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1333cc7e-0fa0-4a07-86c0-e05686847ab4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5bd3e and 5f86443.

📒 Files selected for processing (5)
  • drivers/pg/query/definitions.go
  • drivers/pg/query/format.go
  • drivers/pg/query/query.go
  • drivers/pg/query/sql/schema_up.sql
  • drivers/pg/statements.go
✅ Files skipped from review due to trivial changes (1)
  • drivers/pg/query/format.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • drivers/pg/statements.go
