Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 42 additions & 40 deletions vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs
Original file line number Diff line number Diff line change
Expand Up @@ -958,9 +958,9 @@ impl<D: DeviceBacking> NvmeDriver<D> {
.worker_data
.io
.iter()
.filter(|q| {
q.queue_data.qid == 1 || !q.queue_data.handler_data.pending_cmds.commands.is_empty()
})
// .filter(|q| {
// q.queue_data.qid == 1 || !q.queue_data.handler_data.pending_cmds.commands.is_empty()
// })
.flat_map(|q| -> Result<IoQueue<D>, anyhow::Error> {
Comment on lines +961 to 964
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue-restore selection filter has been commented out, leaving dead/commented code in the restore path. Please remove the commented-out code and update the surrounding restore-step comments to reflect the new behavior (restoring all I/O queues eagerly), or gate the behavior behind an explicit flag/config so this doesn’t ship as commented-out logic.

Copilot uses AI. Check for mistakes.
Comment on lines 957 to 964
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change alters restore semantics (empty queues are now restored eagerly instead of being deferred via proto_io), but there’s no unit/integration test covering save/restore behavior to catch regressions like a queue getting stuck in drain-after-restore. Please add a test in nvme_driver/src/tests.rs that saves state with a mix of empty/non-empty IO queues and verifies restore produces usable issuers/queues (and does not hang waiting for drain completion).

Copilot generated this review using guidance from repository custom instructions.
let qid = q.queue_data.qid;
let cpu = q.cpu;
Expand Down Expand Up @@ -989,12 +989,14 @@ impl<D: DeviceBacking> NvmeDriver<D> {
&pci_id,
q,
bounce_buffer,
if q.queue_data.handler_data.pending_cmds.commands.is_empty() {
if !q.queue_data.handler_data.pending_cmds.commands.is_empty() {
drain_after_restore_template.new_draining()
} else if q.queue_data.qid == 1 {
drain_after_restore_for_qid1
.take()
.expect("only QID 1 should be empty in eager restore")
.unwrap_or_else(|| drain_after_restore_template.new_self_drained())
} else {
drain_after_restore_template.new_draining()
drain_after_restore_template.new_self_drained()
},
Comment on lines +992 to 1000
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DrainAfterRestoreBuilder::new_self_drained() is used for empty restored queues. If the last Draining queue finishes and signals before a given empty queue is restored, that queue’s waiter will be created after the uncached signal and can remain stuck in SelfDrained forever (no new guest IO processed on that queue). Use DrainAfterRestoreBuilder::new_for_new_queue() (or pre-create/register all SelfDrained waiters before starting any draining queues) for empty restored queues to avoid missing the drain-complete signal when draining finishes early.

Copilot uses AI. Check for mistakes.
)?;
tracing::info!(qid, cpu, ?pci_id, "restoring queue: create issuer");
Expand All @@ -1009,40 +1011,40 @@ impl<D: DeviceBacking> NvmeDriver<D> {

// (2) Create prototype entries for any queues that don't currently have outstanding commands.
// They will be restored on demand later.
worker.proto_io = saved_state
.worker_data
.io
.iter()
.filter(|q| {
q.queue_data.qid != 1 && q.queue_data.handler_data.pending_cmds.commands.is_empty()
})
.zip(drain_after_restore_for_proto_queues)
.map(|(q, drain_after_restore)| {
// Create a prototype IO queue entry.
tracing::info!(
qid = q.queue_data.qid,
cpu = q.cpu,
?pci_id,
"creating prototype io queue entry",
);
max_seen_qid = max_seen_qid.max(q.queue_data.qid);
let mem_block = restored_memory
.iter()
.find(|mem| {
mem.len() == q.queue_data.mem_len && q.queue_data.base_pfn == mem.pfns()[0]
})
.expect("unable to find restored mem block")
.to_owned();
(
q.cpu,
ProtoIoQueue {
save_state: q.clone(),
mem: mem_block,
drain_after_restore,
},
)
})
.collect();
// worker.proto_io = saved_state
// .worker_data
// .io
// .iter()
// .filter(|q| {
// q.queue_data.qid != 1 && q.queue_data.handler_data.pending_cmds.commands.is_empty()
// })
// .zip(drain_after_restore_for_proto_queues)
// .map(|(q, drain_after_restore)| {
// // Create a prototype IO queue entry.
// tracing::info!(
// qid = q.queue_data.qid,
// cpu = q.cpu,
// ?pci_id,
// "creating prototype io queue entry",
// );
// max_seen_qid = max_seen_qid.max(q.queue_data.qid);
// let mem_block = restored_memory
// .iter()
// .find(|mem| {
// mem.len() == q.queue_data.mem_len && q.queue_data.base_pfn == mem.pfns()[0]
// })
// .expect("unable to find restored mem block")
// .to_owned();
// (
// q.cpu,
// ProtoIoQueue {
// save_state: q.clone(),
// mem: mem_block,
// drain_after_restore,
// },
// )
// })
// .collect();

Comment on lines +1029 to 1048
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prototype-queue restore block is entirely commented out. As a result, drain_after_restore_for_proto_queues (created above) becomes unused and the restore logic for proto_io is effectively disabled, but the surrounding code still carries the proto-queue scaffolding. Please either fully remove the proto-queue mechanism (and delete the now-unused precreation logic/variables), or re-enable it with the intended behavior—leaving it commented risks warnings and makes the restore semantics hard to reason about.

Suggested change
// );
// max_seen_qid = max_seen_qid.max(q.queue_data.qid);
// let mem_block = restored_memory
// .iter()
// .find(|mem| {
// mem.len() == q.queue_data.mem_len && q.queue_data.base_pfn == mem.pfns()[0]
// })
// .expect("unable to find restored mem block")
// .to_owned();
// (
// q.cpu,
// ProtoIoQueue {
// save_state: q.clone(),
// mem: mem_block,
// drain_after_restore,
// },
// )
// })
// .collect();

Copilot uses AI. Check for mistakes.
// Update next_ioq_id to avoid reusing qids.
worker.next_ioq_id = max_seen_qid + 1;
Expand Down
Loading