perf(index): batch edge writes in one transaction + show post-loop phases#5
Merged
Conversation
…ases Two problems on large repos, both at the end of indexing: 1. The progress bar froze at N/N while the post-loop edge-resolution passes ran (imports, type relations, references, project membership), so indexing looked hung. 2. Each edge was written with its own connection + prepared statement + commit. On a 1559-file React Native repo that produced 57,557 REFERENCES edges — 57k separate transactions — and indexing didn't finish in 15 minutes. Fixes: - New `GraphStore::link_edges(&[GraphEdge])` batch API. The LadybugDB backend writes the whole batch in ONE transaction (BEGIN/COMMIT) on a single connection, preparing one statement per edge kind and reusing it across rows; rolls back on error. The default trait impl falls back to per-edge writes, so the in-memory store is unchanged. All four post-loop passes (resolve_imports / resolve_supertypes / resolve_references / CONTAINS_FILE) now accumulate edges and write one batch each. - IndexProgress gains a `phase` field; the indexer reports "resolving imports", "resolving type relationships", "resolving references", and "linking project membership" via the existing progress callback, and the CLI shows the active phase instead of a frozen bar. Measured (aeontis-backend, 119 files, 648 reference edges): clean index 9.6s -> 6.1s (~35% faster), identical edge/symbol/project counts. On the React Native repo (57,557 reference edges) the same change took indexing from ">15 min, never finished" to ~4 min, and a spot check confirms stored edges match ripgrep ground truth (declaration excluded, all real usages found, no false positives). Bumps version to 0.1.4. Tests: new ladybug smoke test for the batched multi-kind link_edges path incl. MERGE idempotency; full suite (79 tests), fmt and clippy all green.
Review Summary by QodoBatch edge writes in transactions and show post-loop phases
WalkthroughsDescription• Batch edge writes in one transaction instead of per-edge commits - Reduces 57k separate transactions to one on large repos - Dramatically improves indexing performance (57k refs: >15min → ~4min) • Add GraphStore::link_edges() batch API with transaction support - LadybugDB backend prepares statements once per edge kind, reuses across rows - Default trait impl falls back to per-edge writes for in-memory store • Show post-loop phase progress in CLI instead of frozen bar - Reports "resolving imports/type relationships/references/project membership" - Prevents appearance of hung indexing during edge resolution • Accumulate edges in all four post-loop passes before batch writing - Imports, supertypes, references, and project membership now use batching Diagramflowchart LR
A["Per-file scan\n(existing)"] --> B["Accumulate edges\nin memory"]
B --> C["Batch write\nto GraphStore"]
C --> D["One transaction\nper phase"]
D --> E["Progress UI\nshows phase"]
F["LadybugDB backend"] --> G["Prepare statement\nonce per kind"]
G --> H["Reuse across\nall rows"]
H --> D
File Changes1. src/graph/model.rs
|
Code Review by Qodo
1. No rollback on commit
|
Comment on lines
+430
to
+439
| match result { | ||
| Ok(()) => conn | ||
| .query("COMMIT") | ||
| .map(|_| ()) | ||
| .map_err(|e| anyhow!("commit transaction: {e}")), | ||
| Err(err) => { | ||
| // Best-effort rollback; report the original error. | ||
| let _ = conn.query("ROLLBACK"); | ||
| Err(err) | ||
| } |
There was a problem hiding this comment.
1. No rollback on commit 🐞 Bug ☼ Reliability
In LadybugGraphStore::link_edges, if the batch executes successfully but the final COMMIT fails, the function returns the commit error without attempting a rollback/cleanup of the open transaction. This can leave the database in an open transaction state (and potentially holding locks) until the connection is torn down.
Agent Prompt
### Issue description
`LadybugGraphStore::link_edges` starts a transaction and rolls back on mid-batch execution errors, but does not roll back when `COMMIT` itself fails. This leaves error handling asymmetric and risks keeping an open transaction/locks around after a failed commit.
### Issue Context
The batched write path uses explicit `BEGIN TRANSACTION` / `COMMIT` statements and reuses prepared statements across edges.
### Fix Focus Areas
- src/graph/ladybug_store.rs[404-440]
### Suggested fix
- Capture the result of `conn.query("COMMIT")`.
- If `COMMIT` returns `Err`, perform a best-effort `ROLLBACK` (ignore rollback errors), then return the commit error with context.
- Optionally, consider a small RAII/guard pattern so any early-return after `BEGIN` triggers rollback unless a `committed` flag is set.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
On large repos, indexing stalled hard at the very end. Two causes, both in the post-loop edge-resolution phase:
N/N, then sat there silently while the four post-loop passes (imports, type relations, references, project membership) ran — looking hung.Connection::new+prepare+ implicit commit. On a 1559-file React Native repo that's 57,557REFERENCESedges = 57k separate commits/fsyncs — indexing didn't finish in 15 minutes.Fix
GraphStore::link_edges(&[GraphEdge])batch API. The LadybugDB backend writes the whole batch in one transaction (BEGIN/COMMIT) on a single connection, preparing one statement per edge kind and reusing it across rows (rolls back on error). The default trait impl falls back to per-edge writes, so the in-memory store needs no change. All four post-loop passes now accumulate edges and write a single batch each.IndexProgressgains aphasefield; the indexer reportsresolving imports/resolving type relationships/resolving references/linking project membershipthrough the existing callback, and the CLI shows the active phase instead of a frozen bar.Results
Edge/symbol/project counts are identical before/after — batching changed how edges are written, not what. Spot-checked the stored edges on the React Native index against ripgrep ground truth: declaration correctly excluded, all real usage sites found, zero false positives.
The win scales with edge count, which is exactly why the effect is dramatic on the reference-heavy repo.
Tests
New ladybug smoke test for the batched multi-kind
link_edgespath includingMERGEidempotency. Full suite (79 tests),cargo fmt --check, andcargo clippy --all-targetsall green. Bumps version to 0.1.4.