perf(cache): use foyer io_uring I/O engine on Linux #62
Merged
Both foyer-backed caches (clean block cache, pack-index cache) defaulted to foyer's psync engine, which dispatches every SSD-tier read to a tokio blocking thread (spawn_blocking -> pread). Switch to foyer's io_uring engine on Linux, falling back to psync if io_uring setup fails (e.g. seccomp/gVisor sandboxes) or on non-Linux. Selection is automatic; no config knob.

A new bench (benches/foyer_ssd_read.rs) isolates the SSD-tier read and measures the win: ~55.6 µs -> ~40.3 µs median per read (~27% lower latency, ~37% higher throughput) on cached reads, from eliminating the blocking-pool thread hop.

Shared helper build_preferring_uring() (block/foyer_engine.rs) centralizes the prefer-uring/fallback-to-psync logic; both cache constructors route their builder through it.

Also includes an unrelated gc.rs bench tweak and a ublk-core cleanup example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
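The prefer-uring / fallback-to-psync selection described in this commit can be sketched roughly as follows. This is an illustration only, not the code in block/foyer_engine.rs: `Engine`, `SetupError`, `select_engine`, and the `try_uring` closure are hypothetical names, and foyer's real builder API is not shown.

```rust
use std::fmt;

/// Hypothetical stand-in for the two I/O engines named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    IoUring,
    Psync,
}

/// Hypothetical error describing why io_uring setup failed
/// (for example, the syscall is blocked by a seccomp filter).
#[derive(Debug)]
pub struct SetupError(pub String);

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "io_uring setup failed: {}", self.0)
    }
}

/// Prefer io_uring on Linux; use psync on other platforms or when the
/// io_uring probe fails. `try_uring` is an assumed placeholder for
/// whatever actually attempts to set up the io_uring engine.
pub fn select_engine<F>(try_uring: F) -> Engine
where
    F: FnOnce() -> Result<(), SetupError>,
{
    // Non-Linux targets never attempt io_uring.
    if !cfg!(target_os = "linux") {
        return Engine::Psync;
    }
    match try_uring() {
        Ok(()) => Engine::IoUring,
        Err(err) => {
            // No config knob: the fallback is automatic, only logged.
            eprintln!("{err}; falling back to psync");
            Engine::Psync
        }
    }
}
```

Keeping the decision in one function means both cache constructors get the same behavior, and a sandboxed host degrades to psync instead of failing at startup.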
Sweep in-flight reads {1,16,64} and add a buffered-vs-O_DIRECT axis. The
direct (cold-media) variant only runs when GLIDEFS_BENCH_DIR points at a
real SSD-backed dir (O_DIRECT EINVALs on tmpfs), so the default CI run
stays buffered-only and portable.
Findings on an NVMe host: io_uring's win is largest on the page-cache-warm
concurrent path (4.5 vs 3.1 GiB/s at conc=64; psync hits a blocking-pool
cliff). Under O_DIRECT the NVMe device becomes the constraint and the two
engines converge (~3.4 GiB/s), confirming the engine is not the bottleneck
on the cold path.
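The sweep described in this commit (in-flight reads {1, 16, 64} crossed with buffered vs O_DIRECT, where the direct axis is gated on GLIDEFS_BENCH_DIR) might be enumerated like this. It is a sketch under assumptions: `IoMode`, `bench_cases`, and `bench_cases_from_env` are hypothetical names, and it only checks that the variable is set and non-empty, not that the directory is actually SSD-backed.

```rust
/// Hypothetical I/O mode axis for the sweep.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IoMode {
    Buffered,
    Direct,
}

/// In-flight read counts taken from the commit message.
pub const CONCURRENCY: [usize; 3] = [1, 16, 64];

/// Build the list of (mode, concurrency) cases. The O_DIRECT variant is
/// included only when a bench directory is supplied, so a default run
/// stays buffered-only.
pub fn bench_cases(bench_dir: Option<&str>) -> Vec<(IoMode, usize)> {
    let mut modes = vec![IoMode::Buffered];
    if matches!(bench_dir, Some(dir) if !dir.is_empty()) {
        modes.push(IoMode::Direct);
    }
    let mut cases = Vec::new();
    for mode in modes {
        for conc in CONCURRENCY {
            cases.push((mode, conc));
        }
    }
    cases
}

/// Read the gate from the environment variable named in the commit.
pub fn bench_cases_from_env() -> Vec<(IoMode, usize)> {
    let dir = std::env::var("GLIDEFS_BENCH_DIR").ok();
    bench_cases(dir.as_deref())
}
```

With no directory supplied this yields three buffered cases; with one supplied it yields six.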
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced May 29, 2026
What
Both foyer-backed caches (clean block cache, pack-index cache) defaulted to foyer's psync engine, which dispatches every SSD-tier read to a tokio blocking thread (spawn_blocking → pread). This switches to foyer's io_uring engine on Linux, submitting the read inline instead.
build_preferring_uring() (block/foyer_engine.rs) centralizes the prefer-uring / fallback-to-psync logic; both cache constructors route their builder through it.
Why
psync's blocking-pool hop adds latency on top of the actual pread for every clean-cache / pack-index SSD-tier read. io_uring removes that hop.
Measured
New bench benches/foyer_ssd_read.rs isolates the SSD-tier read (psync vs io_uring in one run): ~27% lower latency, ~37% higher throughput on cached reads.
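The two percentages are the same measurement seen from two sides. Assuming throughput is inversely proportional to per-read latency (a simplification that holds for a serial read loop), a fractional latency reduction r implies a throughput gain of 1/(1 - r) - 1. The helper names below are hypothetical, and the 55.6 µs and 40.3 µs inputs are the rounded medians quoted in the commit message.

```rust
/// Fractional latency reduction going from `before_us` to `after_us`.
pub fn latency_reduction(before_us: f64, after_us: f64) -> f64 {
    (before_us - after_us) / before_us
}

/// Throughput gain implied by a fractional latency reduction, assuming
/// throughput is inversely proportional to per-read latency.
pub fn throughput_gain(latency_reduction: f64) -> f64 {
    1.0 / (1.0 - latency_reduction) - 1.0
}
```

A 27% latency reduction gives 1/0.73 - 1 ≈ 0.37, matching the reported ~37%. Plugging in the rounded medians directly gives about 27.5% and about 38%, which agrees within rounding.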
Verification
cargo build -p glidefs is clean; clippy is clean on the lib.
block::cache (7), block::pack_index_cache (16, incl. test_ssd_persistence), block::write_cache (61) tests all pass through the io_uring path on a Linux host.
cargo bench --features test-utils --bench foyer_ssd_read produces the numbers above.
Notes
Also bundles an unrelated benches/gc.rs tweak and a ublk-core/examples/cleanup_stale.rs example.
🤖 Generated with Claude Code