State Store: Compact pruned key range after each prune #3675

Merged: masih merged 1 commit into main from kbhat/sto-602-prune-compaction on Jul 1, 2026

Kbhat1 (Contributor) commented Jun 30, 2026

Describe your changes and provide context

  • Pebble prune leaves tombstones uncompacted, so prune slows over uptime
  • Bump compaction concurrency to {1,4} and compact each pruned range right after

Testing performed to validate your change

  • Verifying in unit tests + on node

The State Store (pebbledb) prune scans the entire DB on every run. Deleted
keys linger as un-compacted tombstones, so each scan reads through more dead
data the longer a node stays up — prune latency creeps up and head-lag grows
(a restart temporarily relieves it because reopening triggers compaction).

Two changes address this:
- Raise Pebble's compaction concurrency from the default {1,1} to {1,4} so
  compaction is no longer limited to a single compactor and can keep up with
  the tombstone churn pruning generates.
- After each prune, compact only the key span that was actually deleted (and
  skip compaction entirely when the prune deleted nothing), reclaiming the
  tombstoned space immediately instead of letting it accumulate.

Applied to both the descending (default for new DBs) and ascending (legacy)
prune paths. Adds unit tests covering the range compaction, the skip guard,
and the single-key inclusive-bound edge case.
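
As a minimal sketch of the skip guard and the single-key bound described above (not the code from this PR): `rangeCompactor`, `recorder`, and `compactionEnd` are hypothetical names introduced for illustration, and the `Compact(start, end)` method shape is an assumption standing in for the real store handle.

```go
package main

import (
	"fmt"
	"slices"
)

// rangeCompactor is a hypothetical stand-in for the store handle. The method
// shape is an assumption made for this sketch, not taken from the PR diff.
type rangeCompactor interface {
	Compact(start, end []byte) error
}

// compactionEnd returns an exclusive upper bound that sorts strictly after
// last under plain bytewise ordering, so a single-key span still has
// start < end. The clone keeps the caller's slice untouched.
func compactionEnd(last []byte) []byte {
	return append(slices.Clone(last), 0)
}

// compactPrunedRange compacts the span covering first..last and does nothing
// when the prune deleted no keys (signalled here by nil bounds).
func compactPrunedRange(c rangeCompactor, first, last []byte) error {
	if first == nil || last == nil {
		return nil // skip guard: nothing was deleted
	}
	return c.Compact(first, compactionEnd(last))
}

// recorder captures the bounds it is asked to compact, for demonstration.
type recorder struct {
	calls      int
	start, end []byte
}

func (r *recorder) Compact(start, end []byte) error {
	r.calls++
	r.start, r.end = start, end
	return nil
}

func main() {
	r := &recorder{}
	_ = compactPrunedRange(r, nil, nil) // empty prune: no compaction issued
	fmt.Println("calls after empty prune:", r.calls)

	_ = compactPrunedRange(r, []byte("k"), []byte("k")) // single deleted key
	fmt.Printf("calls=%d start=%q end=%q\n", r.calls, r.start, r.end)
}
```

Using nil bounds as the "nothing deleted" signal is one possible convention; a separate boolean or counter would work equally well.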

STO-602

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot commented Jun 30, 2026

PR Summary

Medium Risk
Changes core state-store pruning and adds a synchronous compaction on every prune path, which can add I/O and CPU load during prune but targets a known production head-lag issue; compaction is skipped when no keys are deleted, and the behavior is covered by new tests.

Overview
Addresses prune latency growing with node uptime by reclaiming tombstones Pebble previously left behind after MVCC prune passes.

Pebble open options now allow 1–4 parallel compactions (CompactionConcurrencyRange) so background compaction can keep up with tombstone churn from pruning.

After the descending and ascending prune paths finish their deletes and update the earliest version, they call the new compactPrunedRange with the min/max encoded keys deleted during the scan; it runs Compact on that span and is skipped when nothing was deleted. The end bound is derived so that Pebble’s start < end requirement holds even for a single deleted key.
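
The min/max bookkeeping during the scan could look roughly like the sketch below. It assumes plain bytewise key ordering via `bytes.Compare`, whereas the real store uses an MVCC comparer; `deletedSpan` and `observe` are hypothetical names, not identifiers from the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// deletedSpan tracks the smallest and largest key deleted during one scan.
// Ordering is plain bytewise here, which is an assumption of this sketch.
type deletedSpan struct {
	first, last []byte
}

// observe widens the span to include key. The key is copied because iterator
// key buffers are commonly reused between steps.
func (s *deletedSpan) observe(key []byte) {
	if s.first == nil || bytes.Compare(key, s.first) < 0 {
		s.first = bytes.Clone(key)
	}
	if s.last == nil || bytes.Compare(key, s.last) > 0 {
		s.last = bytes.Clone(key)
	}
}

// empty reports whether no deletion has been observed yet.
func (s *deletedSpan) empty() bool { return s.first == nil }

func main() {
	var s deletedSpan
	fmt.Println("empty before scan:", s.empty())
	for _, k := range []string{"m", "c", "x", "f"} {
		s.observe([]byte(k))
	}
	fmt.Printf("first=%q last=%q\n", s.first, s.last)
}
```

Tracking both bounds explicitly keeps the bookkeeping independent of whether the scan runs in ascending or descending order.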

prune_compaction_test.go adds coverage for post-prune compaction, no-op when no deletions, single-key range math, and the legacy ascending path.

Reviewed by Cursor Bugbot for commit 6feb36d.

github-actions Bot commented Jun 30, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | Jul 1, 2026, 9:22 AM

codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.07%. Comparing base (2378fca) to head (6feb36d).
⚠️ Report is 6 commits behind head on main.

Files with missing lines | Patch % | Lines
sei-db/db_engine/pebbledb/mvcc/db.go | 88.88% | 1 Missing and 1 partial ⚠️
sei-db/db_engine/pebbledb/mvcc/db_ascending.go | 71.42% | 1 Missing and 1 partial ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3675      +/-   ##
==========================================
- Coverage   58.97%   58.07%   -0.91%     
==========================================
  Files        2263     2179      -84     
  Lines      187223   177592    -9631     
==========================================
- Hits       110421   103129    -7292     
+ Misses      66858    65304    -1554     
+ Partials     9944     9159     -785     
Flag | Coverage Δ
sei-chain-pr | 54.92% <84.00%> (?)
sei-db | 70.41% <ø> (ø)
sei-db-state-db | ?

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
sei-db/db_engine/pebbledb/mvcc/db.go | 67.65% <88.88%> (+1.32%) ⬆️
sei-db/db_engine/pebbledb/mvcc/db_ascending.go | 53.75% <71.42%> (+35.79%) ⬆️

... and 85 files with indirect coverage changes

seidroid Bot left a comment

Adds a synchronous compaction of the pruned key range after each prune pass (both descending and ascending paths) and raises Pebble's compaction concurrency to {1,4}. The logic is correct and well-tested; the only concerns are non-blocking performance trade-offs around running a synchronous compaction while the prune iterator is still open and over a range that can span the whole keyspace.

Findings: 0 blocking | 5 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Both Codex and Cursor produced no material findings (Codex additionally noted it could not run the focused tests because Go 1.25.6 was unavailable in its sandbox).
  • compactPrunedRange runs a synchronous, blocking Pebble Compact on every prune that deletes anything. Since the compacted span runs from the smallest to the largest deleted key across ALL stores, a prune that touches a low-keyspace store and a high-keyspace store will compact essentially the entire DB range (including live data between them) each cycle. This is functionally correct but may make prune latency worse rather than better in some workloads — worth confirming with the on-node validation mentioned in the PR description, and consider whether per-store ranges would better match the 'compact only the pruned range' intent.
  • Consider validating that the new CompactionConcurrencyRange (returning {1,4}) interacts well with existing L0 compaction settings under sustained write load, since bursting to 4 parallel compactions competes for I/O with foreground writes.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.
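
To illustrate the per-store alternative raised in the non-blocking findings, here is a rough sketch that keeps one deleted span per store instead of a single global span. The `store/key` layout with a `/` separator, and the names `span`, `storeOf`, and `perStoreSpans`, are assumptions made for this example; the actual key encoding in the state store is not shown in this PR conversation.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// span is the smallest and largest deleted key seen for one store.
type span struct{ first, last []byte }

// storeOf extracts the store name from a key. The "store/key" layout is a
// hypothetical encoding used only in this sketch.
func storeOf(key []byte) string {
	if i := bytes.IndexByte(key, '/'); i >= 0 {
		return string(key[:i])
	}
	return ""
}

// perStoreSpans groups deleted keys by store and returns one span per store,
// so live data lying between two stores is never part of a compacted range.
func perStoreSpans(deleted [][]byte) map[string]span {
	out := make(map[string]span)
	for _, k := range deleted {
		name := storeOf(k)
		s, ok := out[name]
		if !ok || bytes.Compare(k, s.first) < 0 {
			s.first = bytes.Clone(k)
		}
		if !ok || bytes.Compare(k, s.last) > 0 {
			s.last = bytes.Clone(k)
		}
		out[name] = s
	}
	return out
}

func main() {
	deleted := [][]byte{[]byte("acc/9"), []byte("wasm/3"), []byte("acc/1")}
	spans := perStoreSpans(deleted)

	// Sort store names so the printed order is deterministic.
	names := make([]string, 0, len(spans))
	for n := range spans {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s: [%q, %q]\n", n, spans[n].first, spans[n].last)
	}
}
```

Each span could then be compacted on its own. Whether that outperforms a single wide compaction would still need the on-node measurement the review asks for.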

if err := db.SetEarliestVersion(earliestVersion, false); err != nil {
	return err
}
return db.compactPrunedRange(firstDeletedKey, lastDeletedKey)

[suggestion] compactPrunedRange runs synchronously here, but the prune iterator (itr) and batch are only closed by the defers at the top of the function — so they are still open during the compaction. An open Pebble iterator pins the current LSM version, which prevents the obsolete sstables produced by this compaction from being physically deleted until the iterator is closed, partially defeating the immediate space-reclamation goal. Consider explicitly closing itr (and the already-committed batch) before calling compactPrunedRange.
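
The ordering issue behind this suggestion comes from Go's defer semantics: the operand of `return` is evaluated before deferred calls run. The sketch below demonstrates that with stand-in types only. `iter`, `compact`, and both prune functions are hypothetical, and the idempotent `Close` is an assumption of the sketch, not a statement about Pebble's iterator.

```go
package main

import "fmt"

// events records the order in which steps run.
var events []string

// iter is a stand-in for the prune iterator. Close is made idempotent here
// so that a deferred second Close is harmless (an assumption of this sketch).
type iter struct{ closed bool }

func (i *iter) Close() error {
	if !i.closed {
		i.closed = true
		events = append(events, "close iterator")
	}
	return nil
}

// compact is a stand-in for the synchronous range compaction.
func compact() error {
	events = append(events, "compact")
	return nil
}

// pruneDeferredClose mirrors the shape flagged in the review: compact() is
// evaluated as part of the return statement, before the deferred Close runs.
func pruneDeferredClose() error {
	it := &iter{}
	defer func() { _ = it.Close() }()
	return compact()
}

// pruneExplicitClose releases the iterator first, then compacts. The deferred
// Close remains as a safety net for early returns.
func pruneExplicitClose() error {
	it := &iter{}
	defer func() { _ = it.Close() }()
	if err := it.Close(); err != nil {
		return err
	}
	return compact()
}

func main() {
	_ = pruneDeferredClose()
	fmt.Println("deferred close:", events)

	events = nil
	_ = pruneExplicitClose()
	fmt.Println("explicit close:", events)
}
```

If the real iterator's Close is not safe to call twice, the explicit close would need a guard such as a closed flag or setting the variable to nil.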

Collaborator

@Kbhat1 I think this is worth addressing in a follow-up PR.

// start < end. Appending a zero byte extends the user-key portion of last,
// yielding a key strictly greater than it under both the MVCC and default
// comparers, so the entire deleted span is covered.
end := append(slices.Clone(last), 0)

[nit] Minor: append(slices.Clone(last), 0) is correct and safe (clone prevents mutating last). For readability you could note that this relies on 0x00 being the minimum byte so the extended key sorts immediately after last for any suffix — which the test asserts. No change required.

masih requested a review from sei-will July 1, 2026 09:15
masih added the "backport release/v6.6" label (Backport to release v6.6) Jul 1, 2026
masih added this pull request to the merge queue Jul 1, 2026
Merged via the queue into main with commit 42d7e20 Jul 1, 2026
74 of 77 checks passed
masih deleted the kbhat/sto-602-prune-compaction branch July 1, 2026 10:08
seidroid Bot commented Jul 1, 2026

Created backport PR for release/v6.6:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-3675-to-release/v6.6
git worktree add --checkout .worktree/backport-3675-to-release/v6.6 backport-3675-to-release/v6.6
cd .worktree/backport-3675-to-release/v6.6
git reset --hard HEAD^
git cherry-pick -x 42d7e20afddf039139502765bdb2ed50e4b45a8e
git push --force-with-lease

Kbhat1 added a commit that referenced this pull request Jul 1, 2026
- Pebble prune leaves tombstones uncompacted, so prune slows over uptime
- Bump compaction concurrency to {1,4} and compact each pruned range
right after

- Verifying in unit tests + on node

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 42d7e20)
masih pushed a commit that referenced this pull request Jul 1, 2026
…each prune (#3679)

Backport of #3675 to `release/v6.6`.

Co-authored-by: Kartik Bhat <kartikbhatri@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>