perf(index): batch edge writes in one transaction + show post-loop phases #5

Merged
prom3theu5 merged 1 commit into main from perf/index-resolve-phases on Jun 1, 2026

@prom3theu5

Problem

On large repos, indexing stalled hard at the very end. Two causes, both in the post-loop edge-resolution phase:

  1. Frozen progress bar. The bar tracks the per-file scan; once it hit N/N it sat there silently while the four post-loop passes (imports, type relations, references, project membership) ran, so indexing looked hung.
  2. One transaction per edge. Every edge was written with its own Connection::new + prepare + implicit commit. On a 1559-file React Native repo that meant 57,557 REFERENCES edges, i.e. about 57k separate commits/fsyncs, and indexing didn't finish in 15 minutes.

Fix

  • New GraphStore::link_edges(&[GraphEdge]) batch API. The LadybugDB backend writes the whole batch in one transaction (BEGIN/COMMIT) on a single connection, preparing one statement per edge kind and reusing it across rows (rolls back on error). The default trait impl falls back to per-edge writes, so the in-memory store needs no change. All four post-loop passes now accumulate edges and write a single batch each.
  • Phase progress. IndexProgress gains a phase field; the indexer reports resolving imports / resolving type relationships / resolving references / linking project membership through the existing callback, and the CLI shows the active phase instead of a frozen bar.
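
A minimal sketch of how a default `link_edges` can fall back to per-edge writes, so that a store with no batch support keeps working unchanged. Only `GraphStore`, `link_edges`, `GraphEdge` and the variant names come from this PR; the field types, the per-edge method names, the `String` error type and `MemoryStore` are assumptions made for illustration, and only two of the five edge kinds are shown.

```rust
use std::collections::HashSet;

/// Subset of the edge kinds named in the PR. Field names and types are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum GraphEdge {
    SymbolReferences { from: String, to: String },
    ProjectContainsFile { project: String, file: String },
}

pub trait GraphStore {
    // Hypothetical per-edge writers standing in for the existing link_* methods.
    fn link_symbol_references(&mut self, from: &str, to: &str) -> Result<(), String>;
    fn link_project_contains_file(&mut self, project: &str, file: &str) -> Result<(), String>;

    /// Default batch implementation: one per-edge write for each element.
    /// A backend with real transactions can override this method.
    fn link_edges(&mut self, edges: &[GraphEdge]) -> Result<(), String> {
        for edge in edges {
            match edge {
                GraphEdge::SymbolReferences { from, to } => {
                    self.link_symbol_references(from, to)?
                }
                GraphEdge::ProjectContainsFile { project, file } => {
                    self.link_project_contains_file(project, file)?
                }
            }
        }
        Ok(())
    }
}

/// Hypothetical in-memory store; it does not override `link_edges`.
#[derive(Default)]
pub struct MemoryStore {
    edges: HashSet<GraphEdge>,
}

impl MemoryStore {
    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }
}

impl GraphStore for MemoryStore {
    fn link_symbol_references(&mut self, from: &str, to: &str) -> Result<(), String> {
        // Inserting into a set makes repeated writes of the same edge a no-op.
        self.edges.insert(GraphEdge::SymbolReferences {
            from: from.to_string(),
            to: to.to_string(),
        });
        Ok(())
    }

    fn link_project_contains_file(&mut self, project: &str, file: &str) -> Result<(), String> {
        self.edges.insert(GraphEdge::ProjectContainsFile {
            project: project.to_string(),
            file: file.to_string(),
        });
        Ok(())
    }
}
```

With this shape a post-loop pass can push edges into a `Vec<GraphEdge>` and call `store.link_edges(&batch)` once, regardless of which backend is behind the trait.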

Results

| Repo | Before | After |
| --- | --- | --- |
| aeontis-backend (119 files, 648 ref edges) | 9.6s | 6.1s (~35% faster) |
| aeontis-new-reactnative (1559 files, 57,557 ref edges) | >15 min, never finished | ~4 min |

Edge/symbol/project counts are identical before/after — batching changed how edges are written, not what. Spot-checked the stored edges on the React Native index against ripgrep ground truth: declaration correctly excluded, all real usage sites found, zero false positives.

The win scales with edge count, which is exactly why the effect is dramatic on the reference-heavy repo.

Tests

New ladybug smoke test for the batched multi-kind link_edges path including MERGE idempotency. Full suite (79 tests), cargo fmt --check, and cargo clippy --all-targets all green. Bumps version to 0.1.4.

…ases

Two problems on large repos, both at the end of indexing:

1. The progress bar froze at N/N while the post-loop edge-resolution passes
   ran (imports, type relations, references, project membership), so indexing
   looked hung.
2. Each edge was written with its own connection + prepared statement + commit.
   On a 1559-file React Native repo that produced 57,557 REFERENCES edges —
   57k separate transactions — and indexing didn't finish in 15 minutes.

Fixes:
- New `GraphStore::link_edges(&[GraphEdge])` batch API. The LadybugDB backend
  writes the whole batch in ONE transaction (BEGIN/COMMIT) on a single
  connection, preparing one statement per edge kind and reusing it across rows;
  rolls back on error. The default trait impl falls back to per-edge writes, so
  the in-memory store is unchanged. All four post-loop passes
  (resolve_imports / resolve_supertypes / resolve_references / CONTAINS_FILE)
  now accumulate edges and write one batch each.
- IndexProgress gains a `phase` field; the indexer reports "resolving imports",
  "resolving type relationships", "resolving references", and "linking project
  membership" via the existing progress callback, and the CLI shows the active
  phase instead of a frozen bar.

Measured (aeontis-backend, 119 files, 648 reference edges): clean index 9.6s ->
6.1s (~35% faster), identical edge/symbol/project counts. On the React Native
repo (57,557 reference edges) the same change took indexing from
">15 min, never finished" to ~4 min, and a spot check confirms stored edges
match ripgrep ground truth (declaration excluded, all real usages found, no
false positives). Bumps version to 0.1.4.

Tests: new ladybug smoke test for the batched multi-kind link_edges path incl.
MERGE idempotency; full suite (79 tests), fmt and clippy all green.
@qodo-code-review

Review Summary by Qodo

Batch edge writes in transactions and show post-loop phases

✨ Enhancement

Walkthroughs

Description
• Batch edge writes in one transaction instead of per-edge commits
  - Reduces 57k separate transactions to one on large repos
  - Dramatically improves indexing performance (57k refs: >15min → ~4min)
• Add GraphStore::link_edges() batch API with transaction support
  - LadybugDB backend prepares statements once per edge kind, reuses across rows
  - Default trait impl falls back to per-edge writes for in-memory store
• Show post-loop phase progress in CLI instead of frozen bar
  - Reports "resolving imports/type relationships/references/project membership"
  - Prevents appearance of hung indexing during edge resolution
• Accumulate edges in all four post-loop passes before batch writing
  - Imports, supertypes, references, and project membership now use batching
Diagram
flowchart LR
  A["Per-file scan\n(existing)"] --> B["Accumulate edges\nin memory"]
  B --> C["Batch write\nto GraphStore"]
  C --> D["One transaction\nper phase"]
  D --> E["Progress UI\nshows phase"]
  F["LadybugDB backend"] --> G["Prepare statement\nonce per kind"]
  G --> H["Reuse across\nall rows"]
  H --> D

File Changes

1. src/graph/model.rs ✨ Enhancement +17/-0

Define GraphEdge enum for batch edge representation

• Add GraphEdge enum representing five edge types (SymbolReferences, SymbolInherits,
 SymbolImplements, FileImportsPackage, ProjectContainsFile)
• Each variant holds the two endpoint IDs needed for the edge
• Used by indexer post-passes to accumulate edges for batch writing

src/graph/model.rs


2. src/graph/store.rs ✨ Enhancement +27/-0

Add batch edge writing trait method

• Add link_edges() trait method accepting slice of GraphEdge
• Default impl dispatches to existing per-edge link_* methods for backward compatibility
• Allows backends to override with optimized batch implementations

src/graph/store.rs


3. src/graph/ladybug_store.rs ✨ Enhancement +86/-0

Implement batched edge writes in one transaction

• Implement link_edges() with transaction batching
• Prepare one statement per edge kind, reuse across all rows of that kind
• Wrap entire batch in BEGIN/COMMIT with rollback on error
• Reduces 57k commits to one for large reference-heavy repos

src/graph/ladybug_store.rs
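
The batching pattern described for this file can be sketched roughly as follows. This is not the LadybugDB API: `RecordingConn` is a hypothetical stand-in that only records the statements it is given, and the `(kind, from, to)` tuple replaces the real `GraphEdge` type. The sketch shows the shape of the technique: group rows by kind, open one transaction, prepare once per kind, reuse the statement for every row, then commit or roll back.

```rust
use std::collections::BTreeMap;

/// Hypothetical connection that logs statements instead of touching a database.
#[derive(Default)]
pub struct RecordingConn {
    pub log: Vec<String>,
    /// When set, executing a row whose target equals this value fails.
    pub fail_on: Option<String>,
}

impl RecordingConn {
    pub fn query(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }

    /// Returns a fake statement handle (just the edge kind).
    pub fn prepare(&mut self, kind: &str) -> Result<String, String> {
        self.log.push(format!("PREPARE {kind}"));
        Ok(kind.to_string())
    }

    pub fn execute(&mut self, stmt: &str, from: &str, to: &str) -> Result<(), String> {
        if self.fail_on.as_deref() == Some(to) {
            return Err(format!("execute {stmt} failed for {from}->{to}"));
        }
        self.log.push(format!("EXEC {stmt} {from}->{to}"));
        Ok(())
    }
}

/// Simplified edge for this sketch: (kind, from, to).
pub type Edge = (&'static str, String, String);

pub fn link_edges(conn: &mut RecordingConn, edges: &[Edge]) -> Result<(), String> {
    // Group rows by kind so each kind needs only one prepared statement.
    let mut by_kind: BTreeMap<&str, Vec<&Edge>> = BTreeMap::new();
    for edge in edges {
        by_kind.entry(edge.0).or_default().push(edge);
    }

    conn.query("BEGIN TRANSACTION")?;
    match write_all(conn, &by_kind) {
        Ok(()) => conn.query("COMMIT"),
        Err(err) => {
            // Best-effort rollback; the original error is what gets reported.
            let _ = conn.query("ROLLBACK");
            Err(err)
        }
    }
}

fn write_all(
    conn: &mut RecordingConn,
    by_kind: &BTreeMap<&str, Vec<&Edge>>,
) -> Result<(), String> {
    for (kind, rows) in by_kind {
        let stmt = conn.prepare(kind)?; // prepared once per kind
        for (_, from, to) in rows.iter().copied() {
            conn.execute(&stmt, from, to)?; // reused for every row of that kind
        }
    }
    Ok(())
}
```

However many rows a batch holds, the log contains one `BEGIN TRANSACTION`, one `PREPARE` per distinct kind, and a single `COMMIT` or `ROLLBACK`.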


4. src/indexer/mod.rs ✨ Enhancement +88/-18

Batch edges in post-loop passes with phase reporting

• Add phase: Option<&'static str> field to IndexProgress struct
• Add report_phase() helper to emit progress updates during post-loop passes
• Refactor all four post-loop passes (imports, supertypes, references, project membership) to
 accumulate edges in vectors
• Call store.link_edges() once per pass instead of per-edge
• Report phase names: "resolving imports", "resolving type relationships", "resolving references",
 "linking project membership"

src/indexer/mod.rs


5. src/main.rs ✨ Enhancement +10/-6

Display post-loop phase in progress bar

• Update progress callback to display active phase when available
• Show phase name (e.g. "resolving references…") instead of current file path during post-loop
 passes
• Prevents frozen progress bar appearance during edge resolution

src/main.rs
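
As an illustration of the phase reporting summarized for `src/indexer/mod.rs` and `src/main.rs`, here is a hedged sketch. The `phase: Option<&'static str>` field and the `report_phase` name come from the summary above; the other fields, the function signatures and `progress_label` are assumptions, and the real CLI may render progress differently.

```rust
/// Progress snapshot passed to the callback. Only `phase` is taken from the PR;
/// the remaining fields are assumed for this sketch.
#[derive(Debug, Clone)]
pub struct IndexProgress {
    pub done: usize,
    pub total: usize,
    pub current_file: Option<String>,
    pub phase: Option<&'static str>,
}

/// Emit a progress update for a post-loop pass through the existing callback.
pub fn report_phase(
    on_progress: &mut dyn FnMut(&IndexProgress),
    total: usize,
    phase: &'static str,
) {
    on_progress(&IndexProgress {
        done: total, // the per-file scan has already finished
        total,
        current_file: None,
        phase: Some(phase),
    });
}

/// Hypothetical CLI helper: prefer the active phase over the current file path.
pub fn progress_label(p: &IndexProgress) -> String {
    match (p.phase, &p.current_file) {
        (Some(phase), _) => format!("{phase}…"),
        (None, Some(path)) => path.clone(),
        (None, None) => String::new(),
    }
}
```

During the per-file scan `phase` stays `None` and the label is the file path; once a post-loop pass starts, the same callback receives a phase name and the label changes while the bar stays at N/N.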


6. tests/ladybug_smoke.rs 🧪 Tests +79/-0

Add smoke test for batched edge writing

• Add new test ladybug_link_edges_batch_roundtrips() validating batched edge writing
• Tests multiple edge kinds (SymbolInherits, SymbolReferences) in single batch
• Verifies edges are queryable after batch write
• Confirms MERGE idempotency when re-running same batch

tests/ladybug_smoke.rs


7. Cargo.toml ⚙️ Configuration changes +1/-1

Bump version to 0.1.4

• Bump version from 0.1.3 to 0.1.4

Cargo.toml


@qodo-code-review

qodo-code-review Bot commented Jun 1, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Action required

1. No rollback on commit 🐞 Bug ☼ Reliability
Description
In LadybugGraphStore::link_edges, if the batch executes successfully but the final COMMIT fails, the
function returns the commit error without attempting a rollback/cleanup of the open transaction.
This can leave the database in an open transaction state (and potentially holding locks) until the
connection is torn down.
Code

src/graph/ladybug_store.rs[R430-439]

Evidence
The function explicitly begins a transaction, but only issues ROLLBACK in the Err(err) branch
(batch execution failure). In the Ok(()) branch it calls COMMIT and directly returns its error
without rollback, so a failing commit has no cleanup path.

src/graph/ladybug_store.rs[404-440]
src/graph/ladybug_store.rs[42-44]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`LadybugGraphStore::link_edges` starts a transaction and rolls back on mid-batch execution errors, but does not roll back when `COMMIT` itself fails. This leaves error handling asymmetric and risks keeping an open transaction/locks around after a failed commit.

### Issue Context
The batched write path uses explicit `BEGIN TRANSACTION` / `COMMIT` statements and reuses prepared statements across edges.

### Fix Focus Areas
- src/graph/ladybug_store.rs[404-440]

### Suggested fix
- Capture the result of `conn.query("COMMIT")`.
- If `COMMIT` returns `Err`, perform a best-effort `ROLLBACK` (ignore rollback errors), then return the commit error with context.
- Optionally, consider a small RAII/guard pattern so any early-return after `BEGIN` triggers rollback unless a `committed` flag is set.
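
One possible shape for the suggested fix, sketched against a hypothetical `FlakyConn` rather than the real connection type. It assumes that issuing `ROLLBACK` after a failed `COMMIT` is acceptable for the backend, which this review does not confirm, and it uses `String` errors instead of `anyhow` to stay self-contained.

```rust
/// Hypothetical connection whose COMMIT can be made to fail.
pub struct FlakyConn {
    pub log: Vec<String>,
    pub fail_commit: bool,
}

impl FlakyConn {
    pub fn query(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        if sql == "COMMIT" && self.fail_commit {
            return Err("disk full".to_string());
        }
        Ok(())
    }
}

/// Close a transaction given the result of the batch body.
pub fn finish_transaction(
    conn: &mut FlakyConn,
    result: Result<(), String>,
) -> Result<(), String> {
    match result {
        Ok(()) => match conn.query("COMMIT") {
            Ok(()) => Ok(()),
            Err(e) => {
                // Best-effort cleanup; the commit error is the one reported.
                let _ = conn.query("ROLLBACK");
                Err(format!("commit transaction: {e}"))
            }
        },
        Err(err) => {
            // Best-effort rollback; report the original error.
            let _ = conn.query("ROLLBACK");
            Err(err)
        }
    }
}
```

Both failure paths now end with a rollback attempt, which removes the asymmetry described in this finding.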

@prom3theu5 prom3theu5 merged commit f2df053 into main Jun 1, 2026
1 check passed
@prom3theu5 prom3theu5 deleted the perf/index-resolve-phases branch June 1, 2026 21:22
Comment on lines +430 to +439
match result {
    Ok(()) => conn
        .query("COMMIT")
        .map(|_| ())
        .map_err(|e| anyhow!("commit transaction: {e}")),
    Err(err) => {
        // Best-effort rollback; report the original error.
        let _ = conn.query("ROLLBACK");
        Err(err)
    }
