OHOS: Implement a maximum number of failed runner spawns before exiting#121

Draft
Narfinger wants to merge 4 commits into servo:main from Narfinger:loop-max

@Narfinger
Contributor

This stops the docker_jit_container when we keep trying to spawn runners but the spawn itself fails because of a problem on our side.

Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Comment thread docker/docker_jit_monitor/src/main.rs Outdated
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Comment thread docker/docker_jit_monitor/src/main.rs
Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
@jschwe
Member

jschwe commented Apr 1, 2026

Note: currently undergoing testing on hos-ci1.

@jschwe
Member

jschwe commented Apr 1, 2026

Hmm, this doesn't seem to be working as intended: after deleting the image, I expected growing retry times followed by self-exit. I also enabled debug logging later, but that didn't add any more useful information. It seems we don't see the docker spawn command failing, which is weird. This needs more investigation at a later time; maybe I'm missing something in the code.

Apr 01 09:31:46 drc-servo-ci-1 docker_jit_monitor[3122732]: √ Connected to GitHub
Apr 01 09:31:47 drc-servo-ci-1 docker_jit_monitor[3122732]: Current runner version: '2.333.0'
Apr 01 09:31:47 drc-servo-ci-1 docker_jit_monitor[3122732]: 2026-04-01 09:31:47Z: Listening for Jobs
Apr 01 09:31:47 drc-servo-ci-1 docker_jit_monitor[3122872]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:31:49 drc-servo-ci-1 docker_jit_monitor[3122872]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:31:49 drc-servo-ci-1 docker_jit_monitor[3122872]: Run 'docker run --help' for more information
Apr 01 09:31:50 drc-servo-ci-1 docker_jit_monitor[3122903]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:31:50 drc-servo-ci-1 docker_jit_monitor[3122732]: 2026-04-01 09:31:50Z: Running job: OpenHarmony (Bencher) / OpenHarmony / HarmonyOS Build (aarch64)
Apr 01 09:31:51 drc-servo-ci-1 docker_jit_monitor[3122903]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:31:51 drc-servo-ci-1 docker_jit_monitor[3122903]: Run 'docker run --help' for more information
Apr 01 09:31:53 drc-servo-ci-1 docker_jit_monitor[3122951]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:31:54 drc-servo-ci-1 docker_jit_monitor[3122951]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:31:54 drc-servo-ci-1 docker_jit_monitor[3122951]: Run 'docker run --help' for more information
Apr 01 09:31:55 drc-servo-ci-1 docker_jit_monitor[3123020]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:31:56 drc-servo-ci-1 docker_jit_monitor[3123020]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:31:56 drc-servo-ci-1 docker_jit_monitor[3123020]: Run 'docker run --help' for more information
Apr 01 09:31:58 drc-servo-ci-1 docker_jit_monitor[3123051]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:31:59 drc-servo-ci-1 docker_jit_monitor[3123051]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:31:59 drc-servo-ci-1 docker_jit_monitor[3123051]: Run 'docker run --help' for more information
Apr 01 09:32:00 drc-servo-ci-1 docker_jit_monitor[3123080]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:01 drc-servo-ci-1 docker_jit_monitor[3123080]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:01 drc-servo-ci-1 docker_jit_monitor[3123080]: Run 'docker run --help' for more information
Apr 01 09:32:03 drc-servo-ci-1 docker_jit_monitor[3123109]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:04 drc-servo-ci-1 docker_jit_monitor[3123109]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:04 drc-servo-ci-1 docker_jit_monitor[3123109]: Run 'docker run --help' for more information
Apr 01 09:32:06 drc-servo-ci-1 docker_jit_monitor[3123137]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:07 drc-servo-ci-1 docker_jit_monitor[3123137]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:07 drc-servo-ci-1 docker_jit_monitor[3123137]: Run 'docker run --help' for more information
Apr 01 09:32:09 drc-servo-ci-1 docker_jit_monitor[3123164]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:10 drc-servo-ci-1 docker_jit_monitor[3123164]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:10 drc-servo-ci-1 docker_jit_monitor[3123164]: Run 'docker run --help' for more information
Apr 01 09:32:11 drc-servo-ci-1 docker_jit_monitor[3123190]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:12 drc-servo-ci-1 docker_jit_monitor[3123190]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:12 drc-servo-ci-1 docker_jit_monitor[3123190]: Run 'docker run --help' for more information
Apr 01 09:32:14 drc-servo-ci-1 docker_jit_monitor[3123218]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:15 drc-servo-ci-1 docker_jit_monitor[3123218]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:15 drc-servo-ci-1 docker_jit_monitor[3123218]: Run 'docker run --help' for more information
Apr 01 09:32:16 drc-servo-ci-1 docker_jit_monitor[3123246]: Unable to find image 'servo_gha_hos_runner:latest' locally
Apr 01 09:32:17 drc-servo-ci-1 docker_jit_monitor[3123246]: docker: Error response from daemon: pull access denied for servo_gha_hos_runner, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Apr 01 09:32:17 drc-servo-ci-1 docker_jit_monitor[3123246]: Run 'docker run --help' for more information
Apr 01 09:32:19 drc-servo-ci-1 docker_jit_monitor[3123275]: Unable to find image 'servo_gha_hos_runner:latest' locally
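
For reference, the behavior expected above (growing retry times, then self-exit) could be sketched roughly as follows. This is only an illustration and not the code in this PR: the `SpawnRetries` type, its fields, and the doubling-with-cap policy are all assumptions made for the example.

```rust
use std::time::Duration;

/// Hypothetical tracker for consecutive failed runner spawns.
/// None of these names are taken from docker_jit_monitor.
#[derive(Debug)]
pub struct SpawnRetries {
    failures: u32,
    max_failures: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl SpawnRetries {
    pub fn new(max_failures: u32, base_delay: Duration, max_delay: Duration) -> Self {
        Self { failures: 0, max_failures, base_delay, max_delay }
    }

    /// Records one failed spawn. Returns the delay to wait before the next
    /// attempt, or `None` once the limit is reached and the monitor should exit.
    pub fn record_failure(&mut self) -> Option<Duration> {
        self.failures = self.failures.saturating_add(1);
        if self.failures >= self.max_failures {
            return None;
        }
        // Double the delay for each consecutive failure, capped at `max_delay`.
        let exponent = (self.failures - 1).min(16);
        let delay = self.base_delay.saturating_mul(1u32 << exponent);
        Some(delay.min(self.max_delay))
    }

    /// Clears the failure count, e.g. after a runner is known to have worked.
    pub fn reset(&mut self) {
        self.failures = 0;
    }
}
```

With a limit of 4, a base delay of 1 second, and a cap of 3 seconds, the first three failures would yield waits of 1s, 2s, and 3s, and the fourth would signal that the monitor should exit.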

Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>

// The above command will not give an error if the docker command exists
// but the command exited with failure.
std::thread::sleep(Duration::from_millis(100));
Member

Looks like this might need to be ~2 seconds: time reports around 1.1s before docker run non_existing_local_image --rm returns.
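
One possible alternative to a fixed sleep is to poll the child for an early exit over a configurable window. The sketch below is only an assumption about how that could look; `wait_for_early_exit` and its parameters are hypothetical names, and it relies only on `std::process::Child::try_wait`.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Polls `child` until it exits or `window` has elapsed.
/// Returns `Ok(Some(status))` if the process exited inside the window and
/// `Ok(None)` if it was still running when the window closed.
pub fn wait_for_early_exit(
    child: &mut Child,
    window: Duration,
    poll_interval: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + window;
    loop {
        // `try_wait` never blocks; it returns `Ok(None)` while the child is alive.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            return Ok(None);
        }
        thread::sleep(poll_interval);
    }
}
```

With a window of about 2 seconds, a docker run that fails after roughly 1.1s would be reported as an early exit, while a container that is still running keeps the monitor waiting no longer than the window.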

Member

I think perhaps we should move the retries.reset() to the place where we check whether the container exited. Exit code 125 means docker run itself failed. Can we get the actual runtime from the exit status? If yes, then besides checking the exit code we could also apply a heuristic (a runtime shorter than 5 seconds probably means something went wrong).
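
A rough sketch of that idea, under some assumptions: as far as I know, `std::process::ExitStatus` does not carry the runtime, so the sketch measures it with `Instant` around spawn and wait. The `RunnerOutcome` enum and both function names are hypothetical and not part of this PR.

```rust
use std::io;
use std::process::Command;
use std::time::{Duration, Instant};

/// Exit code reported when `docker run` itself failed.
const DOCKER_RUN_FAILED: i32 = 125;

/// Hypothetical classification of a finished runner container.
#[derive(Debug, PartialEq, Eq)]
pub enum RunnerOutcome {
    /// `docker run` failed before the container could do any work.
    SpawnFailed,
    /// The container exited faster than a real job plausibly could.
    SuspiciouslyShort,
    /// The container ran long enough to count as a normal run.
    Completed,
}

/// Combines the exit-code check with the runtime heuristic.
pub fn classify_exit(code: Option<i32>, runtime: Duration, min_runtime: Duration) -> RunnerOutcome {
    if code == Some(DOCKER_RUN_FAILED) {
        return RunnerOutcome::SpawnFailed;
    }
    if runtime < min_runtime {
        return RunnerOutcome::SuspiciouslyShort;
    }
    RunnerOutcome::Completed
}

/// Runs `cmd` to completion, measuring wall-clock runtime ourselves
/// because the exit status does not include it.
pub fn run_and_classify(cmd: &mut Command, min_runtime: Duration) -> io::Result<RunnerOutcome> {
    let started = Instant::now();
    let status = cmd.spawn()?.wait()?;
    Ok(classify_exit(status.code(), started.elapsed(), min_runtime))
}
```

Under this sketch, the retry counter would only be reset on `Completed`, and both other outcomes would count as a failed spawn.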

Contributor Author

OK, then I need to rethink this a bit.
