netvsp: fix untrusted guest input errors#3183
netvsp: fix untrusted guest input errors#3183erfrimod wants to merge 6 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the netvsp VMBus NIC implementation against malformed/untrusted guest inputs (including fuzz-discovered edge cases) by replacing panic paths with explicit validation and structured errors, plus adding regression tests.
Changes:
- Add defensive validation for RX buffer partitioning (
RxBufferRanges) to prevent divide-by-zero/underflow and propagate typed errors. - Replace
GuestBuffers::newpanic/unwrap paths withGuestBuffersErrorand plumb that error into the worker error surface. - Fix
max_subchannels()to correctly exclude the primary channel and add a test for rejecting RSS params withindirection_table_size == 0.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| vm/devices/net/netvsp/src/test.rs | Adds regression tests covering RSS invalid input and RX buffer configuration validation. |
| vm/devices/net/netvsp/src/lib.rs | Adds RX buffer config validation + error wiring, fixes max_subchannels logic, and rejects RSS indirection table size of 0. |
| vm/devices/net/netvsp/src/buffers.rs | Introduces GuestBuffersError and converts prior assert/unwrap failure modes into recoverable errors with tests. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| mtu, | ||
| }); | ||
| } | ||
| if gpadl.first().is_none() { |
There was a problem hiding this comment.
should the gpadl empty check come before the other checks, since we have no need to do any math here? Which condition is more likely to hit, or there's not one?
There was a problem hiding this comment.
I'm happy to move it to the top. I have no idea which condition is more likely to hit. (I found these by fuzzing).
| #[error("GPADL has no ranges")] | ||
| EmptyGpadl, | ||
| #[error("guest memory error")] | ||
| Memory(#[source] GuestMemoryError), |
There was a problem hiding this comment.
is this really a lock error, not generic memory?
| if !active_queues.is_empty() { | ||
| active_queues.len() as u16 | ||
| } else { | ||
| tracelimit::warn_ratelimited!("Invalid RSS indirection table"); |
There was a problem hiding this comment.
do we need any more information with this breadcrumb?
| let locked_pages = mem.lock_gpns(false, &gpns)?; | ||
| let gpns = gpadl | ||
| .first() | ||
| .ok_or(GuestBuffersError::EmptyGpadl)? |
There was a problem hiding this comment.
This check is redundant since validate_config already checked this
| @@ -1254,7 +1286,9 @@ impl VmbusDevice for Nic { | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Nit: should we rename NETVSP_MAX_SUBCHANNELS_PER_VNIC to NETVSP_MAX_CHANNELS_PER_VNIC? The current name is a bit misleading.
There was a problem hiding this comment.
I am not against a rename, but not as part of this PR as I am focusing on turning panics into errors. ;)
There was a problem hiding this comment.
Interestingly I think subchannels is used correctly everywhere except when we use it to set max_queues here. I see ManaDevice::new called with max_bus_channels + 1 for instance. So maybe the problem is the max_queues value. I think we need to confidently answer this question so that we don't end up picking a different number than we have assigned to the hardware.
ben-zen
left a comment
There was a problem hiding this comment.
Otherwise looks good, but I'm not a fan of validate functions that don't then output the result of the validation; there's a few others in lib.rs, but they aren't new.
Fuzz testing found several places where netvsp and vmbus channel could panic on edge-case inputs: untrusted guest data, malformed save state, function calls with out-of-range channel indices. Adding defensive validation to turn those panics into Errors.