Summary
We experienced an issue where a self-hosted GitHub Actions runner remained in an active (busy) state indefinitely, preventing new jobs from being picked up. This impacted multiple repositories using the same runner infrastructure.
Impact
Affected 3 repositories sharing the same self-hosted runner host
Jobs remained queued for more than 3 hours
CI/CD pipelines (including SonarQube analysis) were blocked
Manual intervention (runner restart) was required to restore functionality
Observed Behavior
Runner appears as Active in GitHub UI
New jobs show:
Waiting for a runner to pick up this job...
Requested labels: self-hosted
No new jobs are picked up despite runner being online and connected
Runner logs show it is listening for jobs, but an existing worker process remains attached
System inspection shows a hanging process:
npm run test:co
└─ vitest run --coverage
Runner does not automatically recover or release the job
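For anyone trying to confirm the same symptom, the system inspection above can be approximated with a small filter over `ps` output. This is only a sketch: the helper name `filter_long_running` and the one-hour threshold are illustrative assumptions, and `etimes` (elapsed seconds) assumes the procps-ng `ps` shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the runner: prints the lines whose third
# column (elapsed time in seconds) exceeds the given threshold.
# Expects "pid ppid etimes args..." lines on stdin.
filter_long_running() {
  local threshold="$1"
  awk -v t="$threshold" '$3 + 0 > t + 0'
}

# Example on a live host: show test processes older than one hour.
# ps -eo pid=,ppid=,etimes=,args= | filter_long_running 3600 | grep -E 'vitest|npm'
```

A process such as `vitest run --coverage` showing up here long after the tests normally finish is a strong hint that the job is hung rather than slow.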
Expected Behavior
Runner should:
Detect and terminate stalled/hung jobs
Return to Idle state after job timeout or failure
Continue picking up queued jobs without manual restart
Temporary Resolution
Restarting the runner service resolves the issue:
sudo systemctl restart actions.runner...service
After restart:
Runner returns to Idle
Queued jobs are immediately picked up
Environment
Self-hosted runners on Linux (Ubuntu)
Multiple runners installed on same host (different repos)
Runner version: 2.333.0
Workloads include:
Node.js (Vitest tests with coverage)
SonarQube analysis
CI/CD pipelines
Suspected Cause
Long-running or hanging test process (vitest --coverage) does not exit cleanly
Runner does not enforce timeout or detect job inactivity
Worker process (Runner.Worker) remains attached indefinitely
No automatic cleanup or job termination
Suggested Improvements
Add automatic detection of stalled jobs (e.g., no output / no CPU activity threshold)
Enforce configurable job timeouts at runner level
Improve visibility in GitHub UI for:
currently running job duration
stuck/hung jobs
Allow runner to gracefully recover without requiring full service restart
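Until something like this exists in the runner itself, the stalled-job detection suggested above could be approximated by an external watchdog run from cron or a systemd timer. The following is a rough sketch only, not runner functionality: `MAX_JOB_SECONDS`, `RUNNER_SERVICE`, and all function names are assumptions made for illustration, and the service name must be supplied by the operator. It uses the age of the oldest `Runner.Worker` process as a crude stand-in for "job is stuck", so on a host with multiple runners it would need to be narrowed to one runner's process.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog sketch; not part of the actions/runner project.
MAX_JOB_SECONDS="${MAX_JOB_SECONDS:-7200}"   # assumed threshold, tune per workload

# Succeeds (exit 0) when the elapsed time in seconds exceeds the limit.
is_stalled() {
  local elapsed="$1" limit="$2"
  [ "$elapsed" -gt "$limit" ]
}

# Prints the age in seconds of the oldest Runner.Worker process,
# or nothing when no worker is running.
oldest_worker_age() {
  local pid
  pid="$(pgrep -o -f 'Runner.Worker')" || return 0
  ps -o etimes= -p "$pid" | tr -d ' '
}

# Restarts the runner service when the oldest worker exceeds the threshold.
watchdog() {
  : "${RUNNER_SERVICE:?set RUNNER_SERVICE to the runner systemd unit name}"
  local age
  age="$(oldest_worker_age)"
  [ -n "$age" ] || return 0
  if is_stalled "$age" "$MAX_JOB_SECONDS"; then
    echo "Runner.Worker has been running for ${age}s; restarting ${RUNNER_SERVICE}"
    sudo systemctl restart "$RUNNER_SERVICE"
  fi
}

# watchdog   # uncomment when invoking from cron or a systemd timer
```

This only automates the manual restart described under Temporary Resolution; it also kills legitimately long jobs, which is why a runner-level, job-aware timeout would be preferable.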
Additional Notes
This issue affected multiple repositories sharing the same infrastructure, indicating it is not isolated to a single workflow
Adding workflow-level timeout-minutes mitigates the issue but does not fully solve runner-level hangs
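As a complement to timeout-minutes, the test command itself can be given a hard limit with GNU coreutils `timeout`, so a hung process is killed even if nothing else notices. A minimal sketch, assuming a 30-second grace period before SIGKILL; the wrapper name `run_with_limit` and the 20-minute limit in the usage line are illustrative, not values from our setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under a hard time limit.
# Sends SIGTERM when the limit expires, then SIGKILL 30 seconds later
# if the command is still alive.
run_with_limit() {
  local limit="$1"
  shift
  timeout --signal=TERM --kill-after=30s "$limit" "$@"
}

# Example for the workflow step that runs the tests:
# run_with_limit 20m npm run test:co
```

When the limit is hit and the command exits on SIGTERM, `timeout` returns exit status 124, so the step fails visibly instead of hanging.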