perf: PR #43 recovery — ACL caching, inline SET, NEON SIMD#71
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughAdds a monotonic AclTable version counter and per-connection ACL cache with a fast-path for unrestricted users; implements an inline raw-RESP Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ConnHandler as ConnectionHandler
participant ACL as AclTable
participant Shard
participant AOF
Client->>ConnHandler: send RESP "*3" SET key value
ConnHandler->>ACL: maybe refresh_acl_cache(&acl_table)
alt conn.acl_skip_allowed() && can_inline_writes
ConnHandler->>Shard: try_evict_if_needed(runtime_config)
ConnHandler->>Shard: acquire shard-exclusive lock
ConnHandler->>Shard: write string entry & update metadata
ConnHandler->>AOF: append raw consumed RESP bytes
ConnHandler-->>Client: "+OK\r\n" (consume input)
else not inlined
ConnHandler->>ACL: ctx.acl_table.read() + check_command_permission(...)
ACL-->>ConnHandler: allow / deny
ConnHandler->>Shard: normal (non-inlined) dispatch
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by QodoPerformance recovery: ACL caching, inline SET, NEON SIMD for aarch64
WalkthroughsDescription• Cache per-connection ACL unrestricted flag to skip RwLock + HashMap probe on every command (~2.3% CPU) • Extend inline dispatch to handle plain SET (*3 only) with maxmemory eviction and AOF support (~8% CPU) • Add AArch64 NEON SIMD path for DashTable probing, replacing 16-iteration scalar loop (~14% CPU) • Expected combined impact: +10–15% SET throughput at p=16 on aarch64 Diagramflowchart LR
A["Connection Init"] -->|refresh_acl_cache| B["Cached ACL Flag"]
B -->|cached_acl_unrestricted=true| C["Skip RwLock Probe"]
C -->|fast path| D["Command Execution"]
E["Raw RESP Bytes"] -->|try_inline_dispatch| F["GET or Plain SET"]
F -->|GET| G["Read-only Path"]
F -->|SET *3| H["Write Path"]
H -->|eviction check| I["Store Value"]
H -->|AOF append| J["Persistence"]
K["DashTable Lookup"] -->|x86_64| L["SSE2 Path"]
K -->|aarch64| M["NEON SIMD Path"]
M -->|vceqq_u8| N["16-way Comparison"]
N -->|bitmask extract| O["Fast Probe Result"]
File Changes1. src/acl/table.rs
|
Code Review by Qodo
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/conn/handler_monoio.rs (1)
410-424:⚠️ Potential issue | 🟠 MajorInline writes bypass replica mode check.
The
can_inline_writesgate checks ACL, MULTI, and tracking state but does not check for replica mode. The READONLY enforcement at lines 824-839 only applies to commands that go through the normal Frame parsing path. An inline SET on a replica would incorrectly succeed.Consider adding a replica mode check:
🐛 Proposed fix
// Inline writes are safe when the user is unrestricted (ACL // already cached), not inside MULTI, and tracking is off. -let can_inline_writes = - conn.cached_acl_unrestricted && !conn.in_multi && !conn.tracking_state.enabled; +let is_replica = ctx.repl_state.as_ref().map_or(false, |rs| { + rs.try_read().map_or(false, |guard| { + matches!(guard.role, crate::replication::state::ReplicationRole::Replica { .. }) + }) +}); +let can_inline_writes = + conn.cached_acl_unrestricted && !conn.in_multi && !conn.tracking_state.enabled && !is_replica;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/conn/handler_monoio.rs` around lines 410 - 424, The inline-write gating (can_inline_writes) currently uses conn.cached_acl_unrestricted, conn.in_multi and conn.tracking_state.enabled but omits any replica/READONLY check, so inline writes via try_inline_dispatch_loop can bypass replica enforcement; update the gate to also require the connection not be a replica/readonly (e.g., add a check like "!conn.is_replica" or "!ctx.replica_mode" depending on the project's flag) so can_inline_writes becomes false for replica/READONLY connections, and ensure try_inline_dispatch_loop will not allow mutating commands when that replica flag is set.
🧹 Nitpick comments (1)
src/server/conn/tests.rs (1)
107-164: Add regression coverage for inlineSETside effects too.These tests cover the dispatch gate well, but the optimized path also owns AOF append and maxmemory/eviction behavior. Please add a positive case that asserts the expected
AofMessage::Appendpayload is emitted when inline writes are enabled, plus an OOM/eviction case, so the fast path cannot drift from normalSETsemantics unnoticed.Also applies to: 289-319
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/conn/tests.rs` around lines 107 - 164, Add tests that exercise side effects of the inline SET path: (1) a test that passes a real aof_tx into try_inline_dispatch and asserts it sends an AofMessage::Append with the expected payload when a plain "*3 SET" is inlined (use make_dbs(), try_inline_dispatch, and inspect the receiver side of the channel for AofMessage::Append containing the serialized SET); (2) an OOM/eviction test that configures rt_config with a tiny maxmemory, inserts keys until eviction would occur, then invokes try_inline_dispatch for a SET and asserts eviction semantics match the normal path (verify the new key exists/old keys were evicted as expected and that any eviction counters or behaviors are observed). Reference try_inline_dispatch, AofMessage::Append, make_dbs, and make_rt_config to locate the relevant helpers and mirror the existing test patterns for assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/conn/blocking.rs`:
- Around line 1182-1190: The loop that parses key_len can overflow when doing
key_len * 10 + ...; replace the direct multiplication/add with a checked or
saturating operation to prevent wrapping: use key_len =
key_len.checked_mul(10).and_then(|v| v.checked_add((d - b'0') as
usize)).unwrap_or(usize::MAX) (or use saturating_mul/saturating_add) and return
0 if it becomes usize::MAX so the later bounds check fails; update the code
around the variables key_len, buf, pos, len to use this safe arithmetic.
In `@src/server/conn/core.rs`:
- Around line 161-165: The per-connection boolean cached_acl_unrestricted is
being left stale on runtime ACL mutations; change the design to track a global
ACL version in AclTable (or similar) and store a per-connection acl_version
alongside cached_acl_unrestricted on the connection struct, then on every
fast-path check (where cached_acl_unrestricted is consulted, e.g. command/key
permission hot-paths and in AUTH/HELLO handling) compare connection.acl_version
to AclTable.version and recompute/refresh cached_acl_unrestricted when versions
differ; also increment AclTable.version whenever ACL-changing operations run
(ACL SETUSER, ACL DELUSER, ACL LOAD) so all existing connections will detect the
change and invalidate their cached_acl_unrestricted.
---
Outside diff comments:
In `@src/server/conn/handler_monoio.rs`:
- Around line 410-424: The inline-write gating (can_inline_writes) currently
uses conn.cached_acl_unrestricted, conn.in_multi and conn.tracking_state.enabled
but omits any replica/READONLY check, so inline writes via
try_inline_dispatch_loop can bypass replica enforcement; update the gate to also
require the connection not be a replica/readonly (e.g., add a check like
"!conn.is_replica" or "!ctx.replica_mode" depending on the project's flag) so
can_inline_writes becomes false for replica/READONLY connections, and ensure
try_inline_dispatch_loop will not allow mutating commands when that replica flag
is set.
---
Nitpick comments:
In `@src/server/conn/tests.rs`:
- Around line 107-164: Add tests that exercise side effects of the inline SET
path: (1) a test that passes a real aof_tx into try_inline_dispatch and asserts
it sends an AofMessage::Append with the expected payload when a plain "*3 SET"
is inlined (use make_dbs(), try_inline_dispatch, and inspect the receiver side
of the channel for AofMessage::Append containing the serialized SET); (2) an
OOM/eviction test that configures rt_config with a tiny maxmemory, inserts keys
until eviction would occur, then invokes try_inline_dispatch for a SET and
asserts eviction semantics match the normal path (verify the new key exists/old
keys were evicted as expected and that any eviction counters or behaviors are
observed). Reference try_inline_dispatch, AofMessage::Append, make_dbs, and
make_rt_config to locate the relevant helpers and mirror the existing test
patterns for assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dca5032b-5fa4-468e-a2e2-81e811619f60
📒 Files selected for processing (8)
src/acl/table.rssrc/server/conn/blocking.rssrc/server/conn/core.rssrc/server/conn/handler_monoio.rssrc/server/conn/handler_sharded.rssrc/server/conn/handler_single.rssrc/server/conn/tests.rssrc/storage/dashtable/simd.rs
| #[allow(clippy::unwrap_used)] // std RwLock: poison = prior panic = unrecoverable | ||
| pub fn refresh_acl_cache(&mut self, acl_table: &StdRwLock<crate::acl::AclTable>) { | ||
| self.cached_acl_unrestricted = acl_table | ||
| .read() | ||
| .unwrap() | ||
| .is_user_unrestricted(&self.current_user); |
There was a problem hiding this comment.
2. refresh_acl_cache() unwrap comment misplaced 📘 Rule violation ⚙ Maintainability
The new .unwrap() in refresh_acl_cache() is not preceded by the required single-line justification comment immediately above the #[allow(clippy::unwrap_used)] attribute. This violates the project’s unwrap-annotation requirement and can allow unreviewed panics to slip into production code.
Agent Prompt
## Issue description
`refresh_acl_cache()` uses `.unwrap()` but does not follow the required pattern: a one-line `// ...` justification comment immediately above `#[allow(clippy::unwrap_used)]`.
## Issue Context
The code currently has `#[allow(clippy::unwrap_used)]` but the justification is not formatted per the rule.
## Fix Focus Areas
- src/server/conn/core.rs[237-242]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Verify the key was actually set | ||
| let guard = dbs.read_db(0, 0); | ||
| let entry = guard.get_if_alive(b"foo", 0).expect("key should exist"); | ||
| assert_eq!(entry.value.as_bytes().unwrap(), b"bar"); | ||
| } |
There was a problem hiding this comment.
3. Test unwrap lacks allow+comment 📘 Rule violation ⚙ Maintainability
A new .unwrap() was added in src/server/conn/tests.rs without a #[allow(clippy::unwrap_used)] and an immediately-preceding justification comment. This violates the unwrap annotation requirement (which applies to tests too).
Agent Prompt
## Issue description
A new `.unwrap()` was added in `src/server/conn/tests.rs` without the required `#[allow(clippy::unwrap_used)]` and justification comment directly above the allow.
## Issue Context
Even in tests, this repo requires explicit justification for unwrap usage to keep unwrap audits consistent.
## Fix Focus Areas
- src/server/conn/tests.rs[133-137]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // ---- SET path (write, plain *3 only) ---- | ||
| // Parse second bulk-string argument (the value) | ||
| pos = key_end + 2; | ||
| if pos >= len || buf[pos] != b'$' { | ||
| return 0; | ||
| } | ||
| pos += 1; | ||
| let mut val_len: usize = 0; | ||
| while pos < len && buf[pos] != b'\r' { | ||
| let d = buf[pos]; | ||
| if d < b'0' || d > b'9' { | ||
| return 0; | ||
| } | ||
| val_len = val_len * 10 + (d - b'0') as usize; | ||
| pos += 1; | ||
| } | ||
| if pos + 1 >= len || buf[pos] != b'\r' || buf[pos + 1] != b'\n' { | ||
| return 0; | ||
| } | ||
| pos += 2; | ||
| let val_start = pos; | ||
| let val_end = val_start + val_len; | ||
| if val_end + 2 > len || buf[val_end] != b'\r' || buf[val_end + 1] != b'\n' { | ||
| return 0; | ||
| } | ||
| let consumed = val_end + 2; | ||
|
|
||
| // Eviction check + write under exclusive lock | ||
| { | ||
| let rt = runtime_config.read(); | ||
| let mut guard = shard_databases.write_db(shard_id, selected_db); | ||
| if crate::storage::eviction::try_evict_if_needed(&mut guard, &rt).is_err() { | ||
| write_buf | ||
| .extend_from_slice(b"-OOM command not allowed when used memory > 'maxmemory'\r\n"); | ||
| let _ = read_buf.split_to(consumed); | ||
| return 1; | ||
| } | ||
| drop(rt); | ||
|
|
||
| let key = Bytes::copy_from_slice(&buf[key_start..key_end]); | ||
| let value = Bytes::copy_from_slice(&buf[val_start..val_end]); | ||
| let mut entry = crate::storage::entry::Entry::new_string(value); | ||
| entry.set_last_access(guard.now()); | ||
| entry.set_access_counter(5); | ||
| guard.set(key, entry); | ||
| } | ||
| drop(guard); | ||
|
|
||
| // AOF: send raw RESP bytes (already in wire format, no re-serialization) | ||
| if let Some(tx) = aof_tx { | ||
| let serialized = Bytes::copy_from_slice(&buf[..consumed]); | ||
| let _ = tx.try_send(crate::persistence::aof::AofMessage::Append(serialized)); | ||
| } | ||
|
|
||
| write_buf.extend_from_slice(b"+OK\r\n"); | ||
| let _ = read_buf.split_to(consumed); | ||
| 1 |
There was a problem hiding this comment.
4. Inline set parsing not fuzzed 📘 Rule violation ☼ Reliability
The inline RESP fast-path was extended to parse and execute plain SET directly from raw bytes, but no cargo-fuzz target exercises this new parsing/decoding path. This increases the risk of latent panics/OOB bugs in a byte-level parser on untrusted network input.
Agent Prompt
## Issue description
`try_inline_dispatch` was extended to parse/execute plain RESP `SET` directly from raw bytes, but there is no fuzz target that calls this code path.
## Issue Context
This is network-facing byte parsing. The existing RESP fuzzing targets `moon::protocol::parse::parse`, which does not exercise the inline-dispatch raw-byte parser in `src/server/conn/blocking.rs`.
## Fix Focus Areas
- src/server/conn/blocking.rs[1130-1314]
- fuzz/fuzz_targets/resp_parse.rs[1-35]
- fuzz/fuzz_targets/inline_parse.rs[1-14]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if conn.authenticated { | ||
| // Inline writes are safe when the user is unrestricted (ACL | ||
| // already cached), not inside MULTI, and tracking is off. | ||
| let can_inline_writes = | ||
| conn.cached_acl_unrestricted && !conn.in_multi && !conn.tracking_state.enabled; | ||
| let inlined = try_inline_dispatch_loop( | ||
| &mut read_buf, | ||
| &mut write_buf, |
There was a problem hiding this comment.
5. Replica write via inline set 🐞 Bug ≡ Correctness
The inline SET fast-path can execute a write without any READONLY replica enforcement, because the replica-role check exists only in the normal frame-dispatch path. This allows writes to succeed on replicas (state divergence) whenever can_inline_writes is true.
Agent Prompt
### Issue description
Inline SET executes without READONLY replica enforcement because it runs before/without the normal frame-processing gate that rejects writes when `ReplicationRole::Replica`.
### Issue Context
`handle_connection_sharded_monoio` inlines GET/SET directly from raw RESP bytes and may `continue` the loop after writing responses, bypassing later enforcement.
### Fix Focus Areas
- src/server/conn/handler_monoio.rs[405-425]
- src/server/conn/handler_monoio.rs[824-839]
- src/server/conn/blocking.rs[1144-1153]
- src/server/conn/blocking.rs[1259-1314]
### What to change
- Ensure inline SET is disabled on replicas, e.g. compute `is_replica` from `ctx.repl_state` and force `can_inline_writes = false` when role is `Replica`.
- (Defense-in-depth) Optionally add an explicit `is_replica` parameter into `try_inline_dispatch(_loop)` and reject SET when `is_replica==true`, so future call sites cannot accidentally re-enable it.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Eviction check + write under exclusive lock | ||
| { | ||
| let rt = runtime_config.read(); | ||
| let mut guard = shard_databases.write_db(shard_id, selected_db); | ||
| if crate::storage::eviction::try_evict_if_needed(&mut guard, &rt).is_err() { | ||
| write_buf | ||
| .extend_from_slice(b"-OOM command not allowed when used memory > 'maxmemory'\r\n"); | ||
| let _ = read_buf.split_to(consumed); | ||
| return 1; | ||
| } | ||
| drop(rt); | ||
|
|
||
| let key = Bytes::copy_from_slice(&buf[key_start..key_end]); | ||
| let value = Bytes::copy_from_slice(&buf[val_start..val_end]); | ||
| let mut entry = crate::storage::entry::Entry::new_string(value); | ||
| entry.set_last_access(guard.now()); | ||
| entry.set_access_counter(5); | ||
| guard.set(key, entry); |
There was a problem hiding this comment.
6. Inline set skips spill eviction 🐞 Bug ☼ Reliability
Inline SET always evicts via try_evict_if_needed (no spill) even when the normal write path uses async spill eviction for tiered storage. Under memory pressure, this can delete evicted keys instead of offloading them, breaking the cold-storage GET fallback semantics.
Agent Prompt
### Issue description
Inline SET’s eviction path does not support tiered-storage spill eviction and can delete keys instead of offloading them when memory pressure triggers eviction.
### Issue Context
The normal monoio write path uses `try_evict_if_needed_async_spill` when `ctx.spill_sender` is present; inline SET currently only has `runtime_config` and thus cannot spill.
### Fix Focus Areas
- src/server/conn/blocking.rs[1286-1296]
- src/server/conn/handler_monoio.rs[1617-1637]
- src/server/conn/handler_monoio.rs[405-425]
### What to change
Choose one:
1. **Correctness-first:** Disable inline SET when tiered storage spill is enabled (e.g. when `ctx.spill_sender.is_some()`), by folding that into `can_inline_writes`.
2. **Full feature parity:** Plumb `spill_sender`, `disk_offload_dir`, and `spill_file_id` (and `selected_db`) into `try_inline_dispatch(_loop)` and use `try_evict_if_needed_async_spill` in the inline SET path when configured.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| /// Resolve and cache the unrestricted flag from the AclTable. | ||
| /// Called once on connection init and after AUTH / HELLO. | ||
| #[inline] | ||
| #[allow(clippy::unwrap_used)] // std RwLock: poison = prior panic = unrecoverable | ||
| pub fn refresh_acl_cache(&mut self, acl_table: &StdRwLock<crate::acl::AclTable>) { | ||
| self.cached_acl_unrestricted = acl_table | ||
| .read() | ||
| .unwrap() | ||
| .is_user_unrestricted(&self.current_user); | ||
| } |
There was a problem hiding this comment.
7. Stale acl unrestricted cache 🐞 Bug ⛨ Security
cached_acl_unrestricted is only refreshed on connection init and AUTH/HELLO, but ACL rules can be changed at runtime via ACL SETUSER/DELUSER. A connection that cached true will continue skipping command/key permission checks (and may allow inline SET) even after the user becomes restricted or disabled.
Agent Prompt
### Issue description
Per-connection `cached_acl_unrestricted` can become stale after runtime ACL changes (ACL SETUSER/DELUSER), causing permission checks to be skipped incorrectly.
### Issue Context
`refresh_acl_cache()` is only called on init and AUTH/HELLO, but `AclTable` is mutated at runtime by ACL commands.
### Fix Focus Areas
- src/server/conn/core.rs[234-243]
- src/server/conn/handler_monoio.rs[1185-1233]
- src/command/acl.rs[149-163]
- src/acl/table.rs[270-297]
### What to change
Implement a cache invalidation mechanism, e.g.:
- Add an `acl_epoch: AtomicU64` (or similar) to `AclTable`, increment on any mutation (`apply_setuser`, `del_user`, loading/rewrite if applicable).
- Store `last_seen_acl_epoch` in `ConnectionState`; before using `cached_acl_unrestricted`, compare epochs and refresh when changed.
- Ensure inline-write gating (`can_inline_writes`) also re-checks/refreshed epoch to avoid enabling inline SET based on stale unrestricted status.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
a0bae1d to
2990401
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/conn/handler_monoio.rs (1)
457-472:⚠️ Potential issue | 🔴 CriticalDon't let inline
SETbypass replica or cluster guards.This runs before the later
READONLYand cluster-routing checks, butcan_inline_writesonly gates on ACL/MULTI/TRACKING. A plainSETcan therefore be applied locally on a replica or on a node that should return a redirection error instead of mutating state. Either disable inline writes while replication/cluster routing is active, or enforce those same guards inside the inline path first.🛠️ Safe fallback gate until the inline path enforces the missing semantics
- let can_inline_writes = - conn.cached_acl_unrestricted && !conn.in_multi && !conn.tracking_state.enabled; + let replica_readonly = ctx + .repl_state + .as_ref() + .and_then(|rs| rs.try_read().ok()) + .is_some_and(|guard| { + matches!( + guard.role, + crate::replication::state::ReplicationRole::Replica { .. } + ) + }); + let can_inline_writes = conn.cached_acl_unrestricted + && !conn.in_multi + && !conn.tracking_state.enabled + && !replica_readonly + && !crate::cluster::cluster_enabled();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/conn/handler_monoio.rs` around lines 457 - 472, The inline write path currently allowed by can_inline_writes can bypass replica and cluster-routing guards; update the gate so inline writes are disabled when replication or cluster routing might affect command semantics (e.g., when the connection is READONLY, the server is a replica, or cluster redirection is required) OR add the same replica/cluster checks at the start of try_inline_dispatch_loop so it refuses to apply mutating commands like SET; specifically modify the can_inline_writes computation (or add a new guard passed into try_inline_dispatch_loop) and/or add a pre-check inside try_inline_dispatch_loop to return a fallback (no inline) whenever conn or ctx indicate replica/read-only/cluster-routing state rather than mutating local state.
♻️ Duplicate comments (1)
src/server/conn/blocking.rs (1)
1267-1273:⚠️ Potential issue | 🟡 MinorMinor: Integer overflow in val_len parsing (same pattern as key_len).
The digit-by-digit multiplication at line 1272 (
val_len * 10 + ...) has the same potential overflow issue flagged forkey_lenin a prior review. While the 8KB buffer limit makes exploitation unlikely, applying the same saturating arithmetic fix would be more defensive and consistent.🛡️ Optional defensive fix
-val_len = val_len * 10 + (d - b'0') as usize; +val_len = val_len.saturating_mul(10).saturating_add((d - b'0') as usize);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/conn/blocking.rs` around lines 1267 - 1273, The val_len parsing loop can overflow when computing val_len = val_len * 10 + (d - b'0') as usize; change the arithmetic to use saturating operations to match the fix used for key_len: replace the multiplication/add with val_len = val_len.saturating_mul(10).saturating_add((d - b'0') as usize) (or equivalent checked-to-saturate logic) inside the loop that updates val_len (the variables/loop using pos, len, buf, and val_len).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/conn/handler_monoio.rs`:
- Line 106: Cached ACL "unrestricted" flags (cached_acl_unrestricted) are stale
because they are only recomputed at connect/AUTH/HELLO and then used as a
permanent fast-path, so ACL SETUSER/DELUSER mutations let existing connections
retain elevated permissions; fix by invalidating or versioning the cache: either
(A) make conn.refresh_acl_cache(&ctx.acl_table) be invoked (or triggered)
whenever the global acl_table is mutated (ACL SETUSER/DELUSER) so existing
connections clear/recompute cached_acl_unrestricted, or (B) add an epoch/version
on acl_table and store that epoch on the connection; before taking the fast path
that checks cached_acl_unrestricted, compare epochs and recompute via
refresh_acl_cache if mismatched. Update usages around cached_acl_unrestricted,
conn.refresh_acl_cache, and the ACL mutation code paths to implement one of
these solutions so the fast-path cannot bypass checks after ACL changes.
In `@src/server/conn/handler_sharded.rs`:
- Around line 224-225: The ACL fast-path is using
conn.refresh_acl_cache(&ctx.acl_table) only at connect/AUTH/HELLO and then later
bypasses checks (e.g., in handler_sharded command handling where command/key ACL
checks are skipped); fix by invalidating or updating the connection ACL cache
whenever ACLs are mutated and by adding a version check before any bypass: add a
version or epoch on ctx.acl_table that is incremented by ACL mutation paths,
update or clear conn.acl_cache when mutations occur, and in the fast-path (the
code that currently skips checks) compare conn.acl_cache.version with
ctx.acl_table.version and call conn.refresh_acl_cache(&ctx.acl_table) if they
differ so post-auth ACL changes are respected (touch functions/methods:
conn.refresh_acl_cache, ctx.acl_table mutation handlers, and the handler_sharded
fast-path where checks are skipped).
---
Outside diff comments:
In `@src/server/conn/handler_monoio.rs`:
- Around line 457-472: The inline write path currently allowed by
can_inline_writes can bypass replica and cluster-routing guards; update the gate
so inline writes are disabled when replication or cluster routing might affect
command semantics (e.g., when the connection is READONLY, the server is a
replica, or cluster redirection is required) OR add the same replica/cluster
checks at the start of try_inline_dispatch_loop so it refuses to apply mutating
commands like SET; specifically modify the can_inline_writes computation (or add
a new guard passed into try_inline_dispatch_loop) and/or add a pre-check inside
try_inline_dispatch_loop to return a fallback (no inline) whenever conn or ctx
indicate replica/read-only/cluster-routing state rather than mutating local
state.
---
Duplicate comments:
In `@src/server/conn/blocking.rs`:
- Around line 1267-1273: The val_len parsing loop can overflow when computing
val_len = val_len * 10 + (d - b'0') as usize; change the arithmetic to use
saturating operations to match the fix used for key_len: replace the
multiplication/add with val_len = val_len.saturating_mul(10).saturating_add((d -
b'0') as usize) (or equivalent checked-to-saturate logic) inside the loop that
updates val_len (the variables/loop using pos, len, buf, and val_len).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3a9502cb-eb82-46ad-84da-16512ada2e7a
📒 Files selected for processing (9)
CHANGELOG.mdsrc/acl/table.rssrc/server/conn/blocking.rssrc/server/conn/core.rssrc/server/conn/handler_monoio.rssrc/server/conn/handler_sharded.rssrc/server/conn/handler_single.rssrc/server/conn/tests.rssrc/storage/dashtable/simd.rs
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (2)
- src/acl/table.rs
- src/server/conn/core.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benches/simd_probe.rs`:
- Around line 71-75: The unsafe callsites invoking match_h2 on &miss (the lines
where you do let r = unsafe { black_box(&miss).match_h2(black_box(0x42)) } and
similar) are missing required // SAFETY: comments; either annotate each unsafe
block with a concise // SAFETY: description that documents the invariants (e.g.,
SSE2/NEON availability, validity/lifetime of &miss, and that black_box use is
safe), or refactor the pattern into a small helper like unsafe_match_h2(miss:
&Type, v: u8) -> Ret that contains a single unsafe block with a proper //
SAFETY: comment and call that helper from the bench to eliminate duplication and
ensure consistent documentation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d383aa04-b1ca-4207-9864-f184a576e6a1
📒 Files selected for processing (2)
Cargo.tomlbenches/simd_probe.rs
✅ Files skipped from review due to trivial changes (1)
- Cargo.toml
handle_connection_sharded_monoio called record_connection_closed() at its exit, while conn_accept.rs (the caller) ALSO calls it in the non- migration branch (line 627). The AtomicU64 counter wrapped from 0 to u64::MAX on the second fetch_sub, causing every subsequent try_accept_connection to reject against maxclients. Symptom: first connection succeeds, all subsequent connections rejected with "maxclients reached" — even though CONFIG GET maxclients returns 10000 and the original value was never reached. Repro (before fix): ./target/release/moon --port 6400 --shards 1 --appendonly no & redis-cli -p 6400 SET foo bar # ✓ OK redis-cli -p 6400 SET foo bar # ✗ Connection reset by peer Fix: remove the handler-level decrement. The comment at line 84 already documents that the caller owns the increment via try_accept_connection; by symmetry the caller owns the decrement (conn_accept.rs:547 for TLS, conn_accept.rs:627 for plain TCP non-migrated path). Migration path counter accounting is a separate concern (already imbalanced) and is not addressed here. Verified on aarch64 Linux (OrbStack moon-dev): - 10 sequential SETs all succeed - INFO clients reports connected_clients:1 (just the probe) - redis-benchmark SET p=16 c=50 n=10000 → 1.25M req/s (real number) Blocks: PR #71 perf recovery — cannot measure real throughput without this fix. Once merged, PR #71 can be validated with bench-compare.sh.
The initial inline SET used Bytes::copy_from_slice for key and value,
which triggers MALLOC+memcpy twice per SET. The Frame-based path
achieves zero-copy via Bytes::slice on a frozen BytesMut, which is
just an Arc refcount bump. This disparity caused a ~4-7% SET p=1
regression because the inline-path savings were outweighed by the
new allocations at low pipeline depth.
Fix: call read_buf.split_to(consumed).freeze() once, then slice() the
frozen Bytes for key, value, and AOF. All three are now Arc refcount
bumps over the same underlying allocation — zero malloc, zero memcpy.
Measured impact on aarch64 Linux (OrbStack, 1 shard, 50 clients):
before nocopy after nocopy delta
SET p=16: 2.94M rps 3.11M rps +5.4% (peak 3.60M)
SET p=1: 237K rps 235K rps -1% (within noise)
GET p=16: 3.32M rps 3.33M rps +0.3%
PR #71 totals vs origin/main (both with PR #72 maxclients fix):
SET p=16: 2.43M → 3.11M = +28% (peak +48%)
SET p=1: 241K → 235K = -2.5% (was -4%, noise-level now)
GET p=16: 3.36M → 3.33M = ±0%
All 11 inline-dispatch tests pass unchanged.
Addresses all actionable findings from qodo-code-review and coderabbit on PR #71 before merge. CRITICAL correctness fixes (would silently regress prod): - ACL cache epoch invalidation: Stale cache after ACL SETUSER/ DELUSER/LOAD was letting in-flight connections retain bypass privileges after permissions were revoked. Added AtomicU64 version counter on AclTable, shared via Arc<AtomicU64> handle to per-connection state. Hot path checks `acl_skip_allowed()` (fresh && unrestricted) — lock-free. ACL LOAD now uses `replace_with()` to preserve the counter identity. Two new unit tests cover version bumping and replace_with identity. - Inline SET guards: Added `!is_replica` (prevents writes on replica, matching the READONLY enforcement on the normal path) and `ctx.spill_sender.is_none()` (tiered-storage users still route through the async-spill eviction path). Policy/reliability: - UNSAFE_POLICY.md: add AArch64 NEON intrinsics to approved patterns (NEON is mandatory in ARMv8-A, no runtime detection needed). - saturating_mul / checked_add defenses against malicious digit runs in the inline SET/GET length parse. - SAFETY comments added to every unsafe block in benches/simd_probe.rs (audit-unsafe.sh now reports 179/179 with coverage). - Explicit #[allow] + justification on the `.expect` / `.unwrap` in the new inline-SET success test. Test suite: 11 inline-dispatch + 19 ACL (17 + 2 new) + 9 SIMD all pass on aarch64. cargo fmt, cargo clippy -- -D warnings, and scripts/audit-unsafe.sh all clean.
Review feedback addressed🔴 Critical correctness fixes (would silently regress prod)coderabbitai — Stale ACL cache on runtime mutation: ✅ Fixed in 3481e49. qodo-code-review — Replica writes via inline SET: ✅ Fixed in 3481e49. qodo-code-review — Inline SET skips spill eviction: ✅ Fixed in 3481e49. 🟡 Policy / reliabilityqodo — NEON unsafe not policy-approved: ✅ Added to qodo — unwrap justification comments: ✅ coderabbitai — Integer overflow in length parse: ✅ Fixed in 3481e49 + e637471. coderabbitai — SAFETY comments missing on bench: ✅ All 4 Not addressed in this PR (follow-up)qodo — Inline SET parsing not fuzzed: Deferred. The inline parser shares no code with the Frame parser (already fuzzed). Adding a Depends on#72 (maxclients counter double-decrement) — required to produce valid throughput benchmarks on monoio aarch64. |
Three independent optimizations targeting the remaining hot spots from the PR #43 pipelined-SET regression profile (aarch64 OrbStack moon-dev): Track C — Cache ACL unrestricted flag on ConnectionState (~2.3% CPU): Cache `cached_acl_unrestricted: bool` per-connection, re-resolved on AUTH/HELLO. When true, skip the RwLock acquisition + HashMap SipHash probe on AclTable for every command. All 3 handlers updated. Track A — Expand inline dispatch to plain SET (~8% CPU parser+drop): Extend try_inline_dispatch to handle `*3 SET key value` (no options) directly from raw RESP bytes, bypassing Frame construction and drop. Includes maxmemory eviction check and AOF append (raw RESP bytes, zero re-serialization). Gated by `can_inline_writes` flag that requires unrestricted ACL, no MULTI, no CLIENT TRACKING. Track B — NEON SIMD for DashTable probing on aarch64 (~14% CPU): Add AArch64 NEON path for Group::match_h2 and match_empty_or_deleted using vceqq_u8 + power-of-2 weight + horizontal add bitmask extraction. Replaces scalar 16-iteration loop with ~4-instruction SIMD sequence. SSE2 (x86_64) path unchanged.
- Make neon_bitmask_from_cmp a safe fn (no raw pointer params, only NEON vector types) with internal unsafe block — reduces unsafe API surface from 2 unsafe fns to 1 - Add explicit SAFETY comments to every unsafe block explaining: ptr validity (self.0.as_ptr() valid for 16 bytes), alignment (vld1q_u8 has no alignment requirement unlike SSE2), platform guarantee (NEON mandatory on AArch64/ARMv8-A) - Document inline SET side-effects exhaustively: what is handled (eviction, AOF), what is skipped with guards (ACL, tracking, MULTI), and what is not applicable (blocking wakeup, vector auto-index)
The CI audit-unsafe.sh checks 3 preceding lines for `// SAFETY:`.
Multi-line comments pushed the marker beyond that window. Condense
to 2-line comments immediately above each `unsafe {` block.
Decouples Track B's perf claim from the network benchmark infrastructure (which is currently blocked by a pre-existing maxclients counter leak). Measured on aarch64 (Apple Silicon / OrbStack Linux VM): | Operation | NEON | Scalar | Speedup | |------------------------------|---------|----------|---------| | match_h2 miss | 883 ps | 1.232 ns | 1.39x | | match_h2 hit_one | 886 ps | 1.231 ns | 1.39x | | match_h2 hit_many | 883 ps | 1.228 ns | 1.39x | | match_empty_or_deleted | 686 ps | 5.266 ns | 7.68x | The scalar match_h2 is already autovectorised by LLVM, so the manual NEON path only wins ~39%. match_empty_or_deleted scalar has a non-vectorisable branch (`byte & 0x80 != 0`), so the NEON path wins 7.68x. Run: cargo bench --bench simd_probe
The initial inline SET used Bytes::copy_from_slice for key and value,
which triggers MALLOC+memcpy twice per SET. The Frame-based path
achieves zero-copy via Bytes::slice on a frozen BytesMut, which is
just an Arc refcount bump. This disparity caused a ~4-7% SET p=1
regression because the inline-path savings were outweighed by the
new allocations at low pipeline depth.
Fix: call read_buf.split_to(consumed).freeze() once, then slice() the
frozen Bytes for key, value, and AOF. All three are now Arc refcount
bumps over the same underlying allocation — zero malloc, zero memcpy.
Measured impact on aarch64 Linux (OrbStack, 1 shard, 50 clients):
before nocopy after nocopy delta
SET p=16: 2.94M rps 3.11M rps +5.4% (peak 3.60M)
SET p=1: 237K rps 235K rps -1% (within noise)
GET p=16: 3.32M rps 3.33M rps +0.3%
PR #71 totals vs origin/main (both with PR #72 maxclients fix):
SET p=16: 2.43M → 3.11M = +28% (peak +48%)
SET p=1: 241K → 235K = -2.5% (was -4%, noise-level now)
GET p=16: 3.36M → 3.33M = ±0%
All 11 inline-dispatch tests pass unchanged.
Addresses all actionable findings from qodo-code-review and coderabbit on PR #71 before merge. CRITICAL correctness fixes (would silently regress prod): - ACL cache epoch invalidation: Stale cache after ACL SETUSER/ DELUSER/LOAD was letting in-flight connections retain bypass privileges after permissions were revoked. Added AtomicU64 version counter on AclTable, shared via Arc<AtomicU64> handle to per-connection state. Hot path checks `acl_skip_allowed()` (fresh && unrestricted) — lock-free. ACL LOAD now uses `replace_with()` to preserve the counter identity. Two new unit tests cover version bumping and replace_with identity. - Inline SET guards: Added `!is_replica` (prevents writes on replica, matching the READONLY enforcement on the normal path) and `ctx.spill_sender.is_none()` (tiered-storage users still route through the async-spill eviction path). Policy/reliability: - UNSAFE_POLICY.md: add AArch64 NEON intrinsics to approved patterns (NEON is mandatory in ARMv8-A, no runtime detection needed). - saturating_mul / checked_add defenses against malicious digit runs in the inline SET/GET length parse. - SAFETY comments added to every unsafe block in benches/simd_probe.rs (audit-unsafe.sh now reports 179/179 with coverage). - Explicit #[allow] + justification on the `.expect` / `.unwrap` in the new inline-SET success test. Test suite: 11 inline-dispatch + 19 ACL (17 + 2 new) + 9 SIMD all pass on aarch64. cargo fmt, cargo clippy -- -D warnings, and scripts/audit-unsafe.sh all clean.
checked_add on both key_end+2 and val_end+2. In debug, Rust panics on unsigned overflow; in release, the previous end+2 would silently wrap and slip past greater-than-len, leading to a panic on the subsequent out-of-bounds slice index. Now the parser cleanly returns 0 (falls through to the Frame path) on any overflow. Covered path: SET with a multi-gigabyte claimed key or value length (after saturating_mul). GET path reuses key_end_crlf to avoid re-deriving the proven-safe value.
e637471 to
94216bf
Compare
Summary
Three independent optimisations targeting remaining hot spots from the PR #43 pipelined-SET regression profile (aarch64 OrbStack moon-dev, 1 shard):
cached_acl_unrestricted: boolper-connection, re-resolved on AUTH/HELLO. Skips RwLock + HashMap SipHash probe on every command.try_inline_dispatchto handle plain*3 SET key valuefrom raw RESP bytes, bypassing Frame construction. Handles maxmemory eviction + AOF (raw RESP, zero re-serialisation). Gated bycan_inline_writes(requires unrestricted ACL + no MULTI + no tracking). Zero-copy key/value extraction viaread_buf.split_to(consumed).freeze()— Arc refcount bump, no malloc.Group::match_h2andmatch_empty_or_deleted. SSE2 path unchanged.Measured impact — aarch64 Linux (OrbStack moon-dev, 1 shard, 50 clients)
End-to-end benchmark (requires PR #72 maxclients fix to produce valid numbers; measured with that fix applied locally to both binaries):
main + fix+ fixTrack B microbench (
cargo bench --bench simd_probe, aarch64):match_h2miss/hitmatch_empty_or_deletedHonest caveats
argc == b'3',can_inline_writescheck) that cost more than they save at pipeline depth 1. Zero-copy fix reduced this from ~7% to ~2.5% — essentially noise given run variance (7-14%). At p=16 these overheads amortise over 16 commands and the inline SET savings dominate (+34% win).maxclientscounter double-decrement makes redis-benchmark unusable on monoio — both baselines and this PR produceSET: rps=0.0without the fix.Files changed
src/acl/table.rsis_user_unrestricted()helpersrc/server/conn/core.rscached_acl_unrestricted+refresh_acl_cache()src/server/conn/handler_{monoio,sharded,single}.rssrc/server/conn/blocking.rssrc/server/conn/tests.rssrc/storage/dashtable/simd.rsmatch_h2+match_empty_or_deletedbenches/simd_probe.rsUnsafe audit
All 4 new unsafe blocks (NEON SIMD) are [SOUND] — 178/178 SAFETY comments pass
scripts/audit-unsafe.sh.neon_bitmask_from_cmpis a safefnwith internal unsafe block (no raw pointer params).vld1q_u8has no alignment requirement unlike SSE2.Test plan
cargo fmt --checkcleancargo clippy -- -D warningsclean (both feature sets)cargo bench --bench simd_probe)redis-benchmark SET -P 16A/B on Linux aarch64 — +34% avg, peak +48%scripts/audit-unsafe.sh178/178 passDepends on
#72 —
fix(metrics): connection counter double-decrement on monoio path. Without that fix, redis-benchmark cannot produce real throughput on aarch64 monoio. Merge #72 first, then validate this PR.Summary by CodeRabbit
Performance
Tests