Skip to content

Take better advantage of CoW/snapshots#437

Open
syntactically wants to merge 2 commits intomainfrom
lm/cow-wasm-work
Open

Take better advantage of CoW/snapshots#437
syntactically wants to merge 2 commits intomainfrom
lm/cow-wasm-work

Conversation

@syntactically
Copy link
Copy Markdown
Member

This should finally get the main benefits of the hyperlight core snapshot-focused CoW workflow into wasm.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
When we are loading from a snapshot, we can get away with just a
restore and never going through a wasmtime load.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@syntactically syntactically added the kind/enhancement New feature or request label Mar 30, 2026
@danbugs danbugs requested a review from Copilot March 30, 2026 21:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to bring Hyperlight’s snapshot-driven copy-on-write (CoW) workflow into the Wasm path by deferring sandbox restores and enabling Wasmtime “memory image” support via VA remapping.

Changes:

  • Add needs_restore / restore_if_needed() to defer restoring the runtime snapshot until the next module load, reducing eager restore overhead.
  • Add WasmSandbox::load_from_snapshot() to create a loaded sandbox by restoring a snapshot taken from a LoadedWasmSandbox.
  • Implement Wasmtime memory image APIs by copying VA mappings and converting writable pages to CoW mappings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
src/hyperlight_wasm/src/sandbox/wasm_sandbox.rs Defers restore on unload via needs_restore, adds load_from_snapshot, and wires restore into some load paths.
src/hyperlight_wasm_runtime/src/platform.rs Implements Wasmtime memory image creation/mapping via page-table remaps and CoW conversion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +81 to +87
fn restore_if_needed(&mut self) -> Result<()> {
if self.needs_restore {
self.inner.as_mut().ok_or(new_error!("WasmSandbox is none"))?.restore(self.snapshot.as_ref().ok_or(new_error!("Snapshot is none"))?.clone())?;
self.needs_restore = false;
}
Ok(())
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_from_loaded() now defers restoring the runtime snapshot via needs_restore, but load_module_from_buffer() does not call restore_if_needed(). This can leave the sandbox in the post-module/poisoned state when loading from an in-memory buffer (e.g., after LoadedWasmSandbox::unload_module()), leading to incorrect module load behavior. Ensure all module-load entry points (including load_module_from_buffer) restore when needs_restore is set, or restore eagerly in new_from_loaded().

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +118
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_from_snapshot() uses unwrap() on self.inner and self.snapshot, which will panic if the sandbox is in an unexpected state. Prefer returning a structured Err (consistent with other methods in this file) so callers get a recoverable error rather than a process abort.

Suggested change
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
let mut sb = self.inner.take().ok_or(new_error!("WasmSandbox is none"))?;
sb.restore(snapshot)?;
let base_snapshot = self.snapshot.take().ok_or(new_error!("Snapshot is none"))?;
LoadedWasmSandbox::new(
sb,
base_snapshot,

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +119
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_from_snapshot() constructs LoadedWasmSandbox directly, bypassing finalize_module_load() and therefore skipping METRIC_SANDBOX_LOADS increment. If this path is a “load”, it should update the same metrics as other load paths (or explicitly document/rename if it’s intended to be excluded).

Suggested change
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
)
let sb = self
.inner
.as_mut()
.ok_or_else(|| new_error!("WasmSandbox is None"))?;
sb.restore(snapshot)?;
self.finalize_module_load()

Copilot uses AI. Check for mistakes.
// hyperlight_guest_bin::paging::virt_to_phys, and coalescing
// continuous ranges there.
let base_u = base as u64;
let va_page_bases = (base_u..(base_u + len as u64)).step_by(vmem::PAGE_SIZE);
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_va_mapping() relies on base, to_va, and len being page-aligned, but it does not enforce this at runtime. Since this is low-level mapping code (and the functions are extern "C"-reachable), add debug_assert!/checks for alignment (and potentially len != 0) to fail fast and avoid silent mis-mapping if the contract is violated.

Suggested change
let va_page_bases = (base_u..(base_u + len as u64)).step_by(vmem::PAGE_SIZE);
let to_va_u = to_va as u64;
let len_u = len as u64;
let page_size = vmem::PAGE_SIZE as u64;
// Enforce the documented contract: base, to_va, and len must be
// page-aligned, and len must be nonzero.
debug_assert!(len_u != 0, "copy_va_mapping: len must be nonzero");
debug_assert_eq!(
base_u % page_size,
0,
"copy_va_mapping: base must be page-aligned"
);
debug_assert_eq!(
to_va_u % page_size,
0,
"copy_va_mapping: to_va must be page-aligned"
);
debug_assert_eq!(
len_u % page_size,
0,
"copy_va_mapping: len must be a multiple of the page size"
);
if len_u == 0
|| base_u % page_size != 0
|| to_va_u % page_size != 0
|| len_u % page_size != 0
{
panic!("copy_va_mapping called with non page-aligned arguments or zero length");
}
let va_page_bases = (base_u..(base_u + len_u)).step_by(vmem::PAGE_SIZE);

Copilot uses AI. Check for mistakes.
new_kind,
);
}
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After modifying page tables in copy_va_mapping() via paging::map_region, there is no barrier/TLB synchronization call (contrast with map_buffer() which calls paging::barrier::first_valid_same_ctx()). If required on this platform, missing the barrier could lead to stale translations when Wasmtime immediately touches the mapped pages; consider adding the appropriate barrier after the mapping loop.

Suggested change
}
}
// Ensure that all page table updates performed above are visible
// before returning, to avoid stale translations when the new
// mappings are immediately accessed.
unsafe {
paging::barrier::first_valid_same_ctx();
}

Copilot uses AI. Check for mistakes.
Comment on lines +262 to 264
/* This should never be called in practice, because we simple
* restore the snapshot rather than actually unload/destroy instances */
panic!("wasmtime_memory_image_free");
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasmtime_memory_image_new() now returns a non-null image handle, which makes it likely that Wasmtime will eventually call wasmtime_memory_image_free(). Currently wasmtime_memory_image_free() always panics, which can crash the guest in normal operation. Implement a real free/unmap for the image VA range (or redesign to return NULL/feature-gate) so dropping images is safe.

Suggested change
/* This should never be called in practice, because we simple
* restore the snapshot rather than actually unload/destroy instances */
panic!("wasmtime_memory_image_free");
/* Currently, we do not unmap or destroy memory images explicitly.
* This function is intentionally a no-op so that it is safe for
* Wasmtime to call during normal cleanup without crashing the guest. */

Copilot uses AI. Check for mistakes.
pub extern "C" fn wasmtime_memory_image_free(_image: *mut c_void) {
/* This should never be called because wasmtime_memory_image_new
* returns NULL */
/* This should never be called in practice, because we simple
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "because we simple" -> "because we simply".

Suggested change
/* This should never be called in practice, because we simple
/* This should never be called in practice, because we simply

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +119
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New public API load_from_snapshot() is introduced but there are no tests covering its expected behavior (e.g., snapshot taken from a LoadedWasmSandbox, restore via WasmSandbox::load_from_snapshot, then successful guest calls / proper unload back to runtime snapshot). Since this module already has unit tests, adding coverage here would help prevent regressions in the snapshot-based CoW workflow.

Suggested change
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
sb,
self.snapshot.take().unwrap(),
)
let mut sb = self
.inner
.take()
.ok_or_else(|| new_error!("WasmSandbox is None"))?;
sb.restore(snapshot)?;
let runtime_snapshot = self
.snapshot
.take()
.ok_or_else(|| new_error!("Runtime snapshot is None"))?;
LoadedWasmSandbox::new(sb, runtime_snapshot)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants