netvsp: fix untrusted guest input errors #3183

Open
erfrimod wants to merge 6 commits into microsoft:main from erfrimod:erfrimod/netvsp-arithmetic-safety

Contributor

@erfrimod erfrimod commented Apr 2, 2026

Fuzz testing found several places where netvsp and the vmbus channel could panic on edge-case inputs: untrusted guest data, malformed saved state, and function calls with out-of-range channel indices. This PR adds defensive validation to turn those panics into errors.

  • Added checks for invalid GuestBuffer allocations
  • Added checks to RxBufferRanges that prevent underflow and division by zero from causing restart_queues to fail
  • Corrected the logic in max_subchannels
  • The OID RSS set-parameters handler now rejects an indirection table size of zero, which would otherwise cause a division by zero
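
The underflow and division-by-zero checks described above can be sketched with checked arithmetic. This is a minimal illustration and not the code from this PR: `RxConfigError`, `buffers_per_queue`, and the `reserved` parameter are hypothetical names chosen for the example.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not the PR's actual type.
#[derive(Debug, PartialEq, Eq)]
pub enum RxConfigError {
    /// No queues were requested, so the buffers cannot be divided.
    ZeroQueues,
    /// Fewer buffers exist than the number reserved up front.
    TooFewBuffers { total: u32, reserved: u32 },
}

impl fmt::Display for RxConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ZeroQueues => write!(f, "queue count is zero"),
            Self::TooFewBuffers { total, reserved } => {
                write!(f, "{total} buffers is fewer than the {reserved} reserved")
            }
        }
    }
}

impl std::error::Error for RxConfigError {}

/// Splits the non-reserved buffers evenly across `queues`.
/// Returns an error instead of panicking on underflow or a zero divisor.
pub fn buffers_per_queue(total: u32, reserved: u32, queues: u32) -> Result<u32, RxConfigError> {
    // `checked_sub` yields None if `reserved > total` (would underflow).
    let usable = total
        .checked_sub(reserved)
        .ok_or(RxConfigError::TooFewBuffers { total, reserved })?;
    // `checked_div` yields None if `queues == 0`.
    usable.checked_div(queues).ok_or(RxConfigError::ZeroQueues)
}
```

A caller can then propagate the error with `?` rather than relying on an assert, which is the general shape of the change this PR describes.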

Copilot AI review requested due to automatic review settings April 2, 2026 22:56
@erfrimod erfrimod requested a review from a team as a code owner April 2, 2026 22:56
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the netvsp VMBus NIC implementation against malformed/untrusted guest inputs (including fuzz-discovered edge cases) by replacing panic paths with explicit validation and structured errors, plus adding regression tests.

Changes:

  • Add defensive validation for RX buffer partitioning (RxBufferRanges) to prevent divide-by-zero/underflow and propagate typed errors.
  • Replace GuestBuffers::new panic/unwrap paths with GuestBuffersError and plumb that error into the worker error surface.
  • Fix max_subchannels() to correctly exclude the primary channel and add a test for rejecting RSS params with indirection_table_size == 0.
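
To show why an indirection table size of zero is dangerous, here is a minimal sketch of a hash-to-queue lookup. The function name `queue_for_hash` and the slice-based table are assumptions made for this example; they are not taken from netvsp.

```rust
/// Maps an RSS hash to a queue index through an indirection table.
/// Returns None for an empty table, where `hash % len` would otherwise
/// panic with a division by zero.
pub fn queue_for_hash(table: &[u16], hash: u32) -> Option<u16> {
    // `checked_rem` yields None when the divisor (table length) is zero.
    let index = (hash as usize).checked_rem(table.len())?;
    table.get(index).copied()
}
```

Rejecting the empty table when the parameters are set, as the PR does, keeps this check out of the per-packet path.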

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| vm/devices/net/netvsp/src/test.rs | Adds regression tests covering RSS invalid input and RX buffer configuration validation. |
| vm/devices/net/netvsp/src/lib.rs | Adds RX buffer config validation + error wiring, fixes max_subchannels logic, and rejects RSS indirection table size of 0. |
| vm/devices/net/netvsp/src/buffers.rs | Introduces GuestBuffersError and converts prior assert/unwrap failure modes into recoverable errors with tests. |

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread vm/devices/net/netvsp/src/lib.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 2, 2026 23:28
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread vm/devices/net/netvsp/src/buffers.rs
Comment thread vm/devices/net/netvsp/src/buffers.rs Outdated
mtu,
});
}
if gpadl.first().is_none() {
Member

Should the empty-GPADL check come before the other checks, since it needs no math? Is one condition more likely to be hit than the other, or is there no difference?

Contributor Author

@erfrimod erfrimod Apr 6, 2026

I'm happy to move it to the top. I have no idea which condition is more likely to hit. (I found these by fuzzing).
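
For reference, the ordering being discussed might look like the sketch below. Only the `EmptyGpadl` variant name comes from this PR; `ConfigError`, the `SubAllocationTooSmall` variant, and the size rule it enforces are hypothetical stand-ins for the real arithmetic checks.

```rust
/// Error type for this sketch. `EmptyGpadl` mirrors the variant in the PR;
/// `SubAllocationTooSmall` is a made-up example of an arithmetic check.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    EmptyGpadl,
    SubAllocationTooSmall { sub_allocation_size: u32, mtu: u32 },
}

/// Validates a buffer configuration, running the cheapest check first.
pub fn validate_config(
    gpadl: &[Vec<u64>],
    sub_allocation_size: u32,
    mtu: u32,
) -> Result<(), ConfigError> {
    // Structural check first: rejecting an empty GPADL needs no arithmetic.
    if gpadl.is_empty() {
        return Err(ConfigError::EmptyGpadl);
    }
    // Hypothetical size rule, for illustration only.
    if sub_allocation_size < mtu {
        return Err(ConfigError::SubAllocationTooSmall {
            sub_allocation_size,
            mtu,
        });
    }
    Ok(())
}
```

With this ordering, an input that fails both checks is reported as `EmptyGpadl`, so the order also decides which error a caller sees.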

Comment thread vm/devices/net/netvsp/src/buffers.rs Outdated
#[error("GPADL has no ranges")]
EmptyGpadl,
#[error("guest memory error")]
Memory(#[source] GuestMemoryError),
Member

is this really a lock error, not generic memory?

Comment thread vm/devices/net/netvsp/src/lib.rs Outdated
if !active_queues.is_empty() {
active_queues.len() as u16
} else {
tracelimit::warn_ratelimited!("Invalid RSS indirection table");
Member

do we need any more information with this breadcrumb?

@benhillis benhillis added the bug Something isn't working label Apr 6, 2026
Comment thread vm/devices/net/netvsp/src/buffers.rs Outdated
let locked_pages = mem.lock_gpns(false, &gpns)?;
let gpns = gpadl
.first()
.ok_or(GuestBuffersError::EmptyGpadl)?
Contributor

This check is redundant since validate_config already checked this

@@ -1254,7 +1286,9 @@ impl VmbusDevice for Nic {
}

Contributor

Nit: should we rename NETVSP_MAX_SUBCHANNELS_PER_VNIC to NETVSP_MAX_CHANNELS_PER_VNIC? The current name is a bit misleading.

Contributor Author

I am not against a rename, but not as part of this PR as I am focusing on turning panics into errors. ;)

Contributor

Interestingly I think subchannels is used correctly everywhere except when we use it to set max_queues here. I see ManaDevice::new called with max_bus_channels + 1 for instance. So maybe the problem is the max_queues value. I think we need to confidently answer this question so that we don't end up picking a different number than we have assigned to the hardware.
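
The off-by-one at stake can be made concrete with a small sketch. It assumes the usual convention that a device has one primary channel plus zero or more subchannels; the helper names are hypothetical, and the sketch does not settle which of the two quantities the existing constant or `max_queues` is meant to hold.

```rust
/// Subchannels are all channels other than the primary one.
/// `saturating_sub` avoids underflow when `max_channels` is 0.
pub fn subchannels_from_channels(max_channels: u16) -> u16 {
    max_channels.saturating_sub(1)
}

/// One queue per channel: the primary channel plus every subchannel.
/// `saturating_add` avoids overflow at `u16::MAX`.
pub fn queues_from_subchannels(max_subchannels: u16) -> u16 {
    max_subchannels.saturating_add(1)
}
```

Whichever value is chosen, the number handed to the hardware and the number advertised to the guest should go through the same conversion so they cannot drift apart.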

Copilot AI review requested due to automatic review settings April 6, 2026 20:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Contributor

@ben-zen ben-zen left a comment

Otherwise looks good, but I'm not a fan of validate functions that don't then output the result of the validation; there's a few others in lib.rs, but they aren't new.
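
One way to have a validate function output its result is to return a type that can only be built by passing validation. The sketch below is an illustration of that pattern, not a proposal for the exact netvsp types: `ValidatedGpadl`, `validate_gpadl`, and the unit-struct `EmptyGpadl` error are hypothetical.

```rust
/// Hypothetical error for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct EmptyGpadl;

/// Proof that the GPADL had at least one range. The field is private,
/// so the only way to obtain a value is through `validate_gpadl`.
#[derive(Debug)]
pub struct ValidatedGpadl<'a> {
    first: &'a [u64],
}

impl<'a> ValidatedGpadl<'a> {
    /// Returns the first range; no Option or re-check is needed here.
    pub fn first_range(&self) -> &'a [u64] {
        self.first
    }
}

/// Validates once and hands back the data later code needs.
pub fn validate_gpadl(gpadl: &[Vec<u64>]) -> Result<ValidatedGpadl<'_>, EmptyGpadl> {
    let first = gpadl.first().ok_or(EmptyGpadl)?;
    Ok(ValidatedGpadl { first })
}
```

Code that receives a `ValidatedGpadl` does not need a second `first()` check, which also addresses the redundancy pointed out earlier in this thread.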

Comment thread vm/devices/net/netvsp/src/lib.rs
Contributor

@ben-zen ben-zen left a comment

LGTM

Labels

bug Something isn't working

7 participants