mana_driver: vf reconfiguration revokes vtl0 vf faster#3164
mana_driver: vf reconfiguration revokes vtl0 vf faster#3164erfrimod wants to merge 1 commit intomicrosoft:mainfrom
Conversation
|
Traces from my lab machine. Note the time between EQE and VF in the Guest is now about 1.39888 seconds. |
There was a problem hiding this comment.
Pull request overview
Improves VF Reconfiguration handling for the MANA driver / Underhill NetVSP path by treating the HWC channel as unavailable immediately after the VF reconfig EQE, avoiding long teardown timeouts and reducing noisy failing teardown attempts.
Changes:
- Set
hwc_timeout_in_msto0on VF reconfiguration EQE to make subsequent HWC waits fail fast, and gate timeout reporting to avoid extra work when timeout is0. - Update driver teardown behavior to skip HWC resource teardown after
hwc_failureis detected. - Update
test_gdma_reconfig_vfto validate the new “timeout becomes 0 after EQE 135” behavior and that teardown fails fast. - In VF reconfig handling, remove the VTL0 VF via
remove_vtl0_vf()and clear saved datapath/filter state.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vm/devices/net/mana_driver/src/tests.rs | Extends VF reconfig test to assert timeout transitions to 0 and hwc_failure behavior during deregister. |
| vm/devices/net/mana_driver/src/resources.rs | Skips HWC teardown commands when hwc_failure is set to avoid repeated failing requests/log spam. |
| vm/devices/net/mana_driver/src/gdma_driver.rs | Sets timeout to 0 on VF reconfig EQE; adds early timeout exit; preserves 0 timeout through deregister; exposes hwc_failure() and test getter. |
| openhcl/underhill_core/src/emuplat/netvsp.rs | Switches VF reconfig VTL0 VF removal path to remove_vtl0_vf() and clears saved direction_to_vtl0 state. |
| // When HWC has already failed, skip sending teardown commands for HWC resources: | ||
| // DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure". | ||
| // Device should reclaim resources on its own reset. |
There was a problem hiding this comment.
I would recommend moving this comment down to the '_ if skip_hwc ...' code because this one line will be easy to miss when skimming the code, so the comment will help draw attention to it
ben-zen
left a comment
There was a problem hiding this comment.
Looks good to me with the change Brian suggested to the comment location. That's a subtle choice, calling it out with a comment makes the logic clear.
127b33f to
fffdb4f
Compare
| self.report_hwc_timeout(wait_failed, interrupt_loss, eqe_wait_result.elapsed as u32) | ||
| // Don't report the timeout once VF reconfiguration is pending, | ||
| // since the SoC will not respond. | ||
| if !self.vf_reconfiguration_pending { |
There was a problem hiding this comment.
It would be good to keep a consistent check. In other places we are checking for hwc_failure. We should do the same here.
There was a problem hiding this comment.
And, if there is hwc_failure, why not return an error?
There was a problem hiding this comment.
We would lose reports if the wait times out (hwc_failure set), then eqe is found, and wait_failed is false. Soc is alive, but slow. This case is reflected in the check at the end of the function where if a reconfig is not pending, hwc_failure is set back to false. On one hand, making the change is entirely safe because all we lose is a log sent to soc. On the other hand, I think the intent of the log is to help the soc diagnose when responses are slow and timing out.
Edit: It's a little frustrating the hwc_failure is set to true and then back. A stronger design might have multiple states, or maybe a bool to track hwc_timeout that could get cleared...
| // When HWC has already failed, skip sending teardown commands for HWC resources: | ||
| // DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure". | ||
| // Device should reclaim resources on its own reset. | ||
| _ if skip_hwc => continue, |
There was a problem hiding this comment.
I am not sure we need to do anything here. GDMA driver has the logic to handle what to do when HWC has been marked for failure (during reconfig) and will handle this. So, for example, when this code makes a call to disable EQ or DMA region, gdma driver will error out if HWC failure is set to true (which will be in the case of reconfig)
There was a problem hiding this comment.
When testing on my lab machine, this check removed a dozen or so ignorable "previous hardware failure" traces. They are expected, but I would prefer future failure triage doesn't see them.
| let skip_hwc = gdma.get_vf_reconfiguration_pending(); | ||
| if skip_hwc { | ||
| tracing::info!( | ||
| count = self.resources.len(), | ||
| "skipping HWC resource teardown during VF reconfiguration" | ||
| ); | ||
| } |
There was a problem hiding this comment.
PR description says teardown is skipped when hwc_failure is set, but the implementation gates skipping on vf_reconfiguration_pending. If the intended contract is actually hwc_failure, consider switching the condition (and adding/accessing an hwc_failure getter) or update the PR description to match the code’s behavior.
| pub fn get_vf_reconfiguration_pending(&self) -> bool { | ||
| self.vf_reconfiguration_pending | ||
| } | ||
|
|
There was a problem hiding this comment.
This method previously consumed/cleared the pending flag (via mem::take), but now reads without clearing while keeping the same get_... name. That’s a behavioral change that can surprise callers and can change control flow in existing code. Consider either (a) renaming to is_vf_reconfiguration_pending() to make it clearly read-only, and/or (b) reintroducing a separate take_vf_reconfiguration_pending() for the previous consume-and-clear semantics where needed.
| pub fn get_vf_reconfiguration_pending(&self) -> bool { | |
| self.vf_reconfiguration_pending | |
| } | |
| pub fn is_vf_reconfiguration_pending(&self) -> bool { | |
| self.vf_reconfiguration_pending | |
| } | |
| pub fn get_vf_reconfiguration_pending(&mut self) -> bool { | |
| std::mem::take(&mut self.vf_reconfiguration_pending) | |
| } |
| // Don't report timeout once VF reconfiguration is pending, SoC will not respond. | ||
| if self.vf_reconfiguration_pending { | ||
| return; | ||
| } |
There was a problem hiding this comment.
The VF-reconfiguration guard is implemented both inside report_hwc_timeout() and at the call site. This duplication increases maintenance cost and risks the two checks diverging over time. Prefer keeping the guard in one place: either rely on the internal early-return and always call report_hwc_timeout(), or remove the internal guard and keep the conditional at the call site.
| // Don't report timeout once VF reconfiguration is pending, SoC will not respond. | |
| if self.vf_reconfiguration_pending { | |
| return; | |
| } |
| // Don't report the timeout once VF reconfiguration is pending, | ||
| // since the SoC will not respond. | ||
| if !self.vf_reconfiguration_pending { | ||
| self.report_hwc_timeout( | ||
| wait_failed, | ||
| interrupt_loss, | ||
| eqe_wait_result.elapsed as u32, | ||
| ) | ||
| .await; | ||
| } |
There was a problem hiding this comment.
The VF-reconfiguration guard is implemented both inside report_hwc_timeout() and at the call site. This duplication increases maintenance cost and risks the two checks diverging over time. Prefer keeping the guard in one place: either rely on the internal early-return and always call report_hwc_timeout(), or remove the internal guard and keep the conditional at the call site.
| // Don't report the timeout once VF reconfiguration is pending, | |
| // since the SoC will not respond. | |
| if !self.vf_reconfiguration_pending { | |
| self.report_hwc_timeout( | |
| wait_failed, | |
| interrupt_loss, | |
| eqe_wait_result.elapsed as u32, | |
| ) | |
| .await; | |
| } | |
| self.report_hwc_timeout( | |
| wait_failed, | |
| interrupt_loss, | |
| eqe_wait_result.elapsed as u32, | |
| ) | |
| .await; |
460bdb3 to
52bd2a2
Compare
When VF Reconfiguration attempts to revoke the VTL0 VF, it can get stuck attempting to send HWC commands which have no chance of succeeding. This is causing try_notify_guest_and_revoke_vtl0_vf() to timeout, leaving the Guest in an inconsistent state that can cause the Reconfiguration to fail to restore the VTL0 VF.
vf_reconfiguration_pendingis set bothrequest_version(),report_hwc_timeout()and devicedrop()will exit early to avoid making calls to the SoC.