perf(cache): use foyer io_uring I/O engine on Linux #62

Merged: jaredLunde merged 2 commits into main from jared/foyer-perf on May 29, 2026

@jaredLunde
Contributor

What

Both foyer-backed caches (clean block cache, pack-index cache) defaulted to foyer's psync engine, which dispatches every SSD-tier read to a tokio blocking thread (spawn_blocking -> pread). This PR switches them to foyer's io_uring engine on Linux, which submits the read inline instead.

  • Selection is automatic: io_uring on Linux, falling back to psync if io_uring setup fails (e.g. seccomp/gVisor sandboxes) or on non-Linux. No config knob.
  • Shared helper build_preferring_uring() (block/foyer_engine.rs) centralizes the prefer-uring / fallback-to-psync logic; both cache constructors route their builder through it.
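
The prefer-uring / fallback-to-psync selection described above could be sketched roughly as follows. This is not the code in block/foyer_engine.rs: only the helper name comes from this PR, while the signature, the `Engine` enum, `SetupError`, and the `try_build` closure are assumptions made for illustration, and no foyer API is used.

```rust
use std::fmt;

/// Which I/O engine ended up backing the cache (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    IoUring,
    Psync,
}

/// Hypothetical error returned when an engine cannot be set up.
#[derive(Debug)]
pub struct SetupError(pub String);

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "engine setup failed: {}", self.0)
    }
}

/// Try io_uring first (Linux only), then fall back to psync.
///
/// `try_build` stands in for "finish the cache builder with this engine".
/// It is called at most twice: once per engine attempted.
pub fn build_preferring_uring<T>(
    mut try_build: impl FnMut(Engine) -> Result<T, SetupError>,
) -> Result<(T, Engine), SetupError> {
    if cfg!(target_os = "linux") {
        match try_build(Engine::IoUring) {
            Ok(cache) => return Ok((cache, Engine::IoUring)),
            // For example, io_uring setup rejected by a seccomp or gVisor sandbox.
            Err(err) => eprintln!("io_uring unavailable ({err}); falling back to psync"),
        }
    }
    // Non-Linux targets, or io_uring setup failed: use psync.
    try_build(Engine::Psync).map(|cache| (cache, Engine::Psync))
}
```

In this sketch both cache constructors would pass their own builder step as `try_build`, so the fallback policy lives in one place and a psync failure is the only error that reaches the caller.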

Why

psync's blocking-pool hop adds latency on top of the actual pread for every clean-cache / pack-index SSD-tier read. io_uring removes that hop.
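
To make the "hop" concrete, here is a small Unix-only sketch of the two call shapes. It is illustrative only: `std::thread::spawn` stands in for tokio's blocking pool (a real pool reuses threads), the function names are hypothetical, and the direct variant is a plain positional read rather than an io_uring submission. It only shows the extra handoff that the psync path pays on every read.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // positional reads (pread on Unix)
use std::sync::Arc;
use std::thread;

/// Positional read issued by the calling thread itself.
pub fn pread_direct(file: &File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}

/// The same read, but handed to another thread and waited on.
/// The handoff and wake-up are the per-read overhead the PR calls the hop.
pub fn pread_with_thread_hop(file: Arc<File>, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    thread::spawn(move || pread_direct(&file, offset, len))
        .join()
        .expect("reader thread panicked")
}
```

Both functions return the same bytes; the difference is purely in scheduling cost, which matters when the read itself is only tens of microseconds.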

Measured

New bench benches/foyer_ssd_read.rs isolates the SSD-tier read (psync vs io_uring in one run):

| Engine | Median latency | Throughput |
|---|---|---|
| psync (old default) | 55.6 µs | 2.20 GiB/s |
| io_uring (new) | 40.3 µs | 3.03 GiB/s |

~27% lower latency, ~37% higher throughput on cached reads.
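
For reference, the percentages follow directly from the table: the latency drop is (55.6 - 40.3) / 55.6, roughly 27.5%, and the throughput gain is (3.03 - 2.20) / 2.20, roughly 37.7%. The two agree with each other, since 55.6 / 40.3 and 3.03 / 2.20 are both about 1.38. A minimal sketch of the arithmetic, with hypothetical helper names:

```rust
/// Percentage by which `new` is lower than `old`.
pub fn percent_reduction(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

/// Percentage by which `new` is higher than `old`.
pub fn percent_increase(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}
```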

Verification

  • cargo build -p glidefs clean; clippy clean on the lib.
  • block::cache (7), block::pack_index_cache (16, incl. test_ssd_persistence), block::write_cache (61) tests all pass through the io_uring path on a Linux host.
  • cargo bench --features test-utils --bench foyer_ssd_read produces the numbers above.

Notes

Also bundles an unrelated benches/gc.rs tweak and a ublk-core/examples/cleanup_stale.rs example.

🤖 Generated with Claude Code

jaredLunde and others added 2 commits May 28, 2026 16:35
Both foyer-backed caches (clean block cache, pack-index cache) defaulted
to foyer's psync engine, which dispatches every SSD-tier read to a tokio
blocking thread (spawn_blocking -> pread). Switch to foyer's io_uring
engine on Linux, falling back to psync if io_uring setup fails (e.g.
seccomp/gVisor sandboxes) or on non-Linux. Selection is automatic; no
config knob.

A new bench (benches/foyer_ssd_read.rs) isolates the SSD-tier read and
measures the win: ~55.6 µs -> ~40.3 µs median per read (~27% lower
latency, ~37% higher throughput) on cached reads, from eliminating the
blocking-pool thread hop.

Shared helper build_preferring_uring() (block/foyer_engine.rs) centralizes
the prefer-uring/fallback-to-psync logic; both cache constructors route
their builder through it.

Also includes an unrelated gc.rs bench tweak and a ublk-core cleanup
example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sweep in-flight reads {1,16,64} and add a buffered-vs-O_DIRECT axis. The
direct (cold-media) variant only runs when GLIDEFS_BENCH_DIR points at a
real SSD-backed dir (O_DIRECT EINVALs on tmpfs), so the default CI run
stays buffered-only and portable.
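
The resulting bench matrix could look roughly like the sketch below. It is not the code in benches/foyer_ssd_read.rs: `IoMode`, `IN_FLIGHT`, and `bench_matrix` are hypothetical names, and the sketch assumes the direct axis is enabled simply by GLIDEFS_BENCH_DIR being set and non-empty, without checking what backs the directory.

```rust
/// Whether reads go through the page cache or bypass it with O_DIRECT.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IoMode {
    Buffered,
    Direct,
}

/// In-flight read counts swept by the bench.
pub const IN_FLIGHT: [usize; 3] = [1, 16, 64];

/// Build the (mode, in-flight) combinations to run.
/// `bench_dir` is the value of GLIDEFS_BENCH_DIR, if any.
pub fn bench_matrix(bench_dir: Option<&str>) -> Vec<(IoMode, usize)> {
    let mut modes = vec![IoMode::Buffered];
    // Assumption: a non-empty value means an SSD-backed directory was supplied.
    if bench_dir.map_or(false, |dir| !dir.is_empty()) {
        modes.push(IoMode::Direct);
    }

    let mut matrix = Vec::new();
    for mode in modes {
        for in_flight in IN_FLIGHT {
            matrix.push((mode, in_flight));
        }
    }
    matrix
}
```

A caller would pass `std::env::var("GLIDEFS_BENCH_DIR").ok().as_deref()`, so a default CI run with the variable unset yields only the three buffered cases.
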

Findings on an NVMe host: io_uring's win is largest on the page-cache-warm
concurrent path (4.5 vs 3.1 GiB/s at conc=64; psync hits a blocking-pool
cliff). Under O_DIRECT the NVMe device becomes the constraint and the two
engines converge (~3.4 GiB/s), confirming the engine is not the bottleneck
on the cold path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde merged commit d448718 into main May 29, 2026
24 checks passed
@jaredLunde jaredLunde deleted the jared/foyer-perf branch May 29, 2026 00:26