OHOS: Implement a maximum number of failed runner spawns before exiting#121
Narfinger wants to merge 4 commits into
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Note: currently undergoing testing on hos-ci1.
Hmm, this doesn't seem to be working as intended: after deleting the image, I expected growing retry times and then self-exit. I also enabled debug logging later, but that didn't add any more useful information. It seems we don't see the docker spawn command failing, which is weird. This needs more investigation at a later time; maybe I'm missing something in the code.
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
// The above command will not give an error if the docker command exists
// but the command exited with failure.
std::thread::sleep(Duration::from_millis(100));
Looks like this might need to be ~2 seconds: `time` reports around 1.1s before `docker run non_existing_local_image --rm` returns.
I think perhaps we should move the retries.reset() to the place where we check whether the container exited. Exit code 125 means docker run failed. Can we get the actual runtime from the exit status? If so, besides checking the exit code, we could also apply a heuristic (a runtime shorter than 5 seconds probably means something went wrong).
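A minimal sketch of the check suggested above, not the code in this PR. As far as I know, std::process::ExitStatus only exposes the exit code and not the runtime, so the sketch measures elapsed time itself with Instant. The names looks_like_failed_spawn, run_and_time and MIN_HEALTHY_RUNTIME are hypothetical, and the 5 second threshold is simply the value floated in the comment.

```rust
use std::process::{Command, ExitStatus};
use std::time::{Duration, Instant};

/// Exit code that, per the comment above, means `docker run` itself failed.
const DOCKER_RUN_FAILED: i32 = 125;

/// Hypothetical threshold: a container that exits sooner than this is
/// treated as a failed spawn (the "shorter than 5 seconds" heuristic).
const MIN_HEALTHY_RUNTIME: Duration = Duration::from_secs(5);

/// Returns true if the exit looks like a failed spawn: either the exit
/// code is 125, or the process ended suspiciously quickly.
fn looks_like_failed_spawn(exit_code: Option<i32>, runtime: Duration) -> bool {
    exit_code == Some(DOCKER_RUN_FAILED) || runtime < MIN_HEALTHY_RUNTIME
}

/// Runs a command to completion and measures how long it took.
/// The elapsed time is taken on our side because `ExitStatus` carries
/// only the exit code.
fn run_and_time(cmd: &mut Command) -> std::io::Result<(ExitStatus, Duration)> {
    let start = Instant::now();
    let status = cmd.status()?;
    Ok((status, start.elapsed()))
}
```

With something like this, the retry counter would only be reset when looks_like_failed_spawn(status.code(), runtime) returns false, instead of right after the spawn call succeeds.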
Ok, then I need to rethink this a bit.
This will stop the docker_jit_container if we are spawning runners but spawning keeps failing because of a problem on our side.
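For illustration only, the intended behaviour (growing retry times, then self-exit after a maximum number of failed spawns) could look roughly like the sketch below. This is not the PR's implementation: SpawnRetries, record_failure, max_failures and base_delay are hypothetical names, and doubling the delay on each failure is an assumed policy. Only reset() echoes the retries.reset() call mentioned in the review.

```rust
use std::time::Duration;

/// Hypothetical tracker for consecutive failed runner spawns.
struct SpawnRetries {
    failures: u32,
    max_failures: u32,
    base_delay: Duration,
}

impl SpawnRetries {
    fn new(max_failures: u32, base_delay: Duration) -> Self {
        Self { failures: 0, max_failures, base_delay }
    }

    /// Records one failed spawn. Returns the delay to wait before the
    /// next attempt, or `None` once the maximum is reached and the
    /// caller should give up and exit.
    fn record_failure(&mut self) -> Option<Duration> {
        self.failures = self.failures.saturating_add(1);
        if self.failures >= self.max_failures {
            return None;
        }
        // Double the delay for each consecutive failure (assumed policy);
        // the shift is capped so the factor cannot overflow.
        let factor = 1u32 << (self.failures - 1).min(16);
        Some(self.base_delay.saturating_mul(factor))
    }

    /// Clears the failure count after a runner is known to have started.
    fn reset(&mut self) {
        self.failures = 0;
    }
}
```

A spawn loop would sleep for the returned delay on Some(delay) and exit the process on None, calling reset() only after a runner is known to have started properly.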