Conversation
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
When we are loading from a snapshot, we can get away with just a restore and never going through a wasmtime load.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Pull request overview
This PR aims to bring Hyperlight’s snapshot-driven copy-on-write (CoW) workflow into the Wasm path by deferring sandbox restores and enabling Wasmtime “memory image” support via VA remapping.
Changes:
- Add `needs_restore`/`restore_if_needed()` to defer restoring the runtime snapshot until the next module load, reducing eager restore overhead.
- Add `WasmSandbox::load_from_snapshot()` to create a loaded sandbox by restoring a snapshot taken from a `LoadedWasmSandbox`.
- Implement Wasmtime memory image APIs by copying VA mappings and converting writable pages to CoW mappings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/hyperlight_wasm/src/sandbox/wasm_sandbox.rs` | Defers restore on unload via `needs_restore`, adds `load_from_snapshot`, and wires restore into some load paths. |
| `src/hyperlight_wasm_runtime/src/platform.rs` | Implements Wasmtime memory image creation/mapping via page-table remaps and CoW conversion. |
```rust
fn restore_if_needed(&mut self) -> Result<()> {
    if self.needs_restore {
        self.inner
            .as_mut()
            .ok_or(new_error!("WasmSandbox is none"))?
            .restore(self.snapshot.as_ref().ok_or(new_error!("Snapshot is none"))?.clone())?;
        self.needs_restore = false;
    }
    Ok(())
}
```
new_from_loaded() now defers restoring the runtime snapshot via needs_restore, but load_module_from_buffer() does not call restore_if_needed(). This can leave the sandbox in the post-module/poisoned state when loading from an in-memory buffer (e.g., after LoadedWasmSandbox::unload_module()), leading to incorrect module load behavior. Ensure all module-load entry points (including load_module_from_buffer) restore when needs_restore is set, or restore eagerly in new_from_loaded().
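The invariant this comment asks for can be sketched with a self-contained model (hypothetical `Sandbox`/`Snapshot` types, not the real hyperlight_wasm API): every module-load entry point funnels through `restore_if_needed()`, so the deferred restore can never be skipped on any path.

```rust
// Mock types standing in for the real sandbox; the point is the pattern,
// not the API: unload defers the restore, and every load path applies it.
#[derive(Clone)]
pub struct Snapshot(pub u64);

pub struct Sandbox {
    pub state: u64,
    pub snapshot: Snapshot,
    pub needs_restore: bool,
}

impl Sandbox {
    pub fn restore_if_needed(&mut self) {
        if self.needs_restore {
            self.state = self.snapshot.0; // roll back to the snapshot
            self.needs_restore = false;
        }
    }

    pub fn unload_module(&mut self) {
        self.state = 99; // leave a "post-module" dirty state behind
        self.needs_restore = true; // defer the actual restore
    }

    pub fn load_module_from_file(&mut self) -> u64 {
        self.restore_if_needed();
        self.state
    }

    // The review point: the in-memory buffer path must restore too.
    pub fn load_module_from_buffer(&mut self) -> u64 {
        self.restore_if_needed();
        self.state
    }
}
```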
```rust
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
    sb,
    self.snapshot.take().unwrap(),
```
load_from_snapshot() uses unwrap() on self.inner and self.snapshot, which will panic if the sandbox is in an unexpected state. Prefer returning a structured Err (consistent with other methods in this file) so callers get a recoverable error rather than a process abort.
Suggested change:

```diff
-let mut sb = self.inner.take().unwrap();
-sb.restore(snapshot)?;
-LoadedWasmSandbox::new(
-    sb,
-    self.snapshot.take().unwrap(),
+let mut sb = self.inner.take().ok_or(new_error!("WasmSandbox is none"))?;
+sb.restore(snapshot)?;
+let base_snapshot = self.snapshot.take().ok_or(new_error!("Snapshot is none"))?;
+LoadedWasmSandbox::new(
+    sb,
+    base_snapshot,
```
```rust
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
    sb,
    self.snapshot.take().unwrap(),
)
```
load_from_snapshot() constructs LoadedWasmSandbox directly, bypassing finalize_module_load() and therefore skipping METRIC_SANDBOX_LOADS increment. If this path is a “load”, it should update the same metrics as other load paths (or explicitly document/rename if it’s intended to be excluded).
Suggested change:

```diff
-let mut sb = self.inner.take().unwrap();
-sb.restore(snapshot)?;
-LoadedWasmSandbox::new(
-    sb,
-    self.snapshot.take().unwrap(),
-)
+let sb = self
+    .inner
+    .as_mut()
+    .ok_or_else(|| new_error!("WasmSandbox is None"))?;
+sb.restore(snapshot)?;
+self.finalize_module_load()
```
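The metrics concern can be illustrated with a minimal standalone sketch (hypothetical names; the real code uses `METRIC_SANDBOX_LOADS`): routing every load path through one shared finalizer means no path can forget to bump the counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for METRIC_SANDBOX_LOADS: a single counter that only the
// shared finalizer touches, so every load path updates it consistently.
pub static SANDBOX_LOADS: AtomicU64 = AtomicU64::new(0);

pub fn finalize_module_load() {
    SANDBOX_LOADS.fetch_add(1, Ordering::Relaxed);
}

pub fn load_module_from_buffer() {
    // ... load work elided ...
    finalize_module_load();
}

pub fn load_from_snapshot() {
    // ... restore work elided; the review point is that this path
    // must also go through the shared finalizer ...
    finalize_module_load();
}
```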
```rust
// hyperlight_guest_bin::paging::virt_to_phys, and coalescing
// continuous ranges there.
let base_u = base as u64;
let va_page_bases = (base_u..(base_u + len as u64)).step_by(vmem::PAGE_SIZE);
```
copy_va_mapping() relies on base, to_va, and len being page-aligned, but it does not enforce this at runtime. Since this is low-level mapping code (and the functions are extern "C"-reachable), add debug_assert!/checks for alignment (and potentially len != 0) to fail fast and avoid silent mis-mapping if the contract is violated.
Suggested change:

```diff
-let va_page_bases = (base_u..(base_u + len as u64)).step_by(vmem::PAGE_SIZE);
+let to_va_u = to_va as u64;
+let len_u = len as u64;
+let page_size = vmem::PAGE_SIZE as u64;
+// Enforce the documented contract: base, to_va, and len must be
+// page-aligned, and len must be nonzero.
+debug_assert!(len_u != 0, "copy_va_mapping: len must be nonzero");
+debug_assert_eq!(
+    base_u % page_size,
+    0,
+    "copy_va_mapping: base must be page-aligned"
+);
+debug_assert_eq!(
+    to_va_u % page_size,
+    0,
+    "copy_va_mapping: to_va must be page-aligned"
+);
+debug_assert_eq!(
+    len_u % page_size,
+    0,
+    "copy_va_mapping: len must be a multiple of the page size"
+);
+if len_u == 0
+    || base_u % page_size != 0
+    || to_va_u % page_size != 0
+    || len_u % page_size != 0
+{
+    panic!("copy_va_mapping called with non page-aligned arguments or zero length");
+}
+let va_page_bases = (base_u..(base_u + len_u)).step_by(vmem::PAGE_SIZE);
```
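The contract the suggestion enforces can also be factored into a small standalone checker (a hypothetical helper, assuming a 4 KiB page size) that fails fast on unaligned or empty ranges:

```rust
pub const PAGE_SIZE: u64 = 4096; // assumed page size for this sketch

// Returns Err if the (base, to_va, len) triple violates the mapping
// contract: all three must be page-aligned and len must be nonzero.
pub fn check_page_aligned(base: u64, to_va: u64, len: u64) -> Result<(), &'static str> {
    if len == 0 {
        return Err("len must be nonzero");
    }
    if base % PAGE_SIZE != 0 || to_va % PAGE_SIZE != 0 || len % PAGE_SIZE != 0 {
        return Err("base, to_va, and len must be page-aligned");
    }
    Ok(())
}
```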
```rust
            new_kind,
        );
    }
}
```
After modifying page tables in copy_va_mapping() via paging::map_region, there is no barrier/TLB synchronization call (contrast with map_buffer() which calls paging::barrier::first_valid_same_ctx()). If required on this platform, missing the barrier could lead to stale translations when Wasmtime immediately touches the mapped pages; consider adding the appropriate barrier after the mapping loop.
Suggested change:

```diff
     }
 }
+// Ensure that all page table updates performed above are visible
+// before returning, to avoid stale translations when the new
+// mappings are immediately accessed.
+unsafe {
+    paging::barrier::first_valid_same_ctx();
+}
```
```rust
/* This should never be called in practice, because we simple
 * restore the snapshot rather than actually unload/destroy instances */
panic!("wasmtime_memory_image_free");
```
wasmtime_memory_image_new() now returns a non-null image handle, which makes it likely that Wasmtime will eventually call wasmtime_memory_image_free(). Currently wasmtime_memory_image_free() always panics, which can crash the guest in normal operation. Implement a real free/unmap for the image VA range (or redesign to return NULL/feature-gate) so dropping images is safe.
Suggested change:

```diff
-/* This should never be called in practice, because we simple
- * restore the snapshot rather than actually unload/destroy instances */
-panic!("wasmtime_memory_image_free");
+/* Currently, we do not unmap or destroy memory images explicitly.
+ * This function is intentionally a no-op so that it is safe for
+ * Wasmtime to call during normal cleanup without crashing the guest. */
```
```diff
 pub extern "C" fn wasmtime_memory_image_free(_image: *mut c_void) {
-    /* This should never be called because wasmtime_memory_image_new
-     * returns NULL */
+    /* This should never be called in practice, because we simple
```
Typo in comment: "because we simple" -> "because we simply".
Suggested change:

```diff
-/* This should never be called in practice, because we simple
+/* This should never be called in practice, because we simply
```
```rust
let mut sb = self.inner.take().unwrap();
sb.restore(snapshot)?;
LoadedWasmSandbox::new(
    sb,
    self.snapshot.take().unwrap(),
)
```
New public API load_from_snapshot() is introduced but there are no tests covering its expected behavior (e.g., snapshot taken from a LoadedWasmSandbox, restore via WasmSandbox::load_from_snapshot, then successful guest calls / proper unload back to runtime snapshot). Since this module already has unit tests, adding coverage here would help prevent regressions in the snapshot-based CoW workflow.
Suggested change:

```diff
-let mut sb = self.inner.take().unwrap();
-sb.restore(snapshot)?;
-LoadedWasmSandbox::new(
-    sb,
-    self.snapshot.take().unwrap(),
-)
+let mut sb = self
+    .inner
+    .take()
+    .ok_or_else(|| new_error!("WasmSandbox is None"))?;
+sb.restore(snapshot)?;
+let runtime_snapshot = self
+    .snapshot
+    .take()
+    .ok_or_else(|| new_error!("Runtime snapshot is None"))?;
+LoadedWasmSandbox::new(sb, runtime_snapshot)
```
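The round-trip the comment asks to cover can be modeled with mock types (not the real `WasmSandbox`/`LoadedWasmSandbox` API): snapshot a loaded sandbox, dirty its state, restore via a `load_from_snapshot`-style call, and check the state matches.

```rust
// Mock sandbox whose entire state is a byte buffer; snapshot/restore are
// plain copies, which is enough to pin down the expected round-trip.
#[derive(Clone, PartialEq, Debug)]
pub struct MockSnapshot(Vec<u8>);

pub struct MockSandbox {
    pub memory: Vec<u8>,
}

impl MockSandbox {
    pub fn snapshot(&self) -> MockSnapshot {
        MockSnapshot(self.memory.clone())
    }

    // Analogue of WasmSandbox::load_from_snapshot: adopt the snapshot's
    // state wholesale instead of re-running a full module load.
    pub fn load_from_snapshot(&mut self, snap: &MockSnapshot) {
        self.memory = snap.0.clone();
    }
}
```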
This should finally bring the main benefits of the Hyperlight core snapshot-focused CoW workflow to Wasm.