fix(delete): handle already-exited processes during force-delete #752
Yashika0724 wants to merge 2 commits into
A force-delete can race with a guest shutdown or a previous kill attempt, leaving the VMM process already exited by the time killProcess() runs. In that case syscall.Kill(pid, SIGKILL) returns ESRCH, which currently propagates as an error even though the process is no longer running, causing the force-delete to abort before cleaning up the container's runtime state.

Treat ESRCH from the initial SIGKILL as a successful outcome, and continue with Delete() even if Kill() returns an error, since the purpose of --force is to reclaim runtime state regardless of whether the target process is still around.

Signed-off-by: Yashika0724 <ssyashika1311@gmail.com>
Hi @cmainas, while tracing the force-delete path, I noticed that killProcess() returns ESRCH when the VMM has already exited before the initial SIGKILL, causing delete -f to exit before cleanup runs. This change treats that case as a successful outcome and allows the force-delete path to continue with cleanup. Other kill errors remain unchanged. Please let me know if you'd prefer a different approach or any changes to the implementation.
…ady-exited # Conflicts: # pkg/unikontainers/hypervisors/utils.go
Update: I've rebased onto main to resolve the conflict. It turns out the What remains is the change in Happy to adjust the approach if you'd prefer a narrower error check instead of a blanket log-and-continue.
What changed
killProcess(): treat ESRCH from the initial SIGKILL as a successful outcome. A force-delete can race with a guest shutdown or a previous kill attempt, leaving the VMM process already exited by the time killProcess() runs. In that case syscall.Kill(pid, SIGKILL) returns ESRCH, which currently propagates as an error even though the process is no longer running.
delete --force: continue with Delete() even if Kill() returns an error. The purpose of --force is to reclaim runtime state. Returning early from the force-delete path can leave the container state directory behind even when the target process is already gone.
Reproducer
One way to reproduce this is to have the VMM process exit (before delete --force runs) and then run urunc delete -f <id>.
Before this change, Kill() may return ESRCH and the delete operation exits early.
After this change, an already-exited process is treated as a successful kill and cleanup proceeds normally.
Notes
This change only special-cases ESRCH ("no such process"). Other kill failures continue to be returned and logged as before.