Commit 4f56a16

Author: techartdev
feat: update version to 0.5.62 and fix gateway restart loop issues
1 parent 6a08c66 commit 4f56a16

3 files changed

Lines changed: 55 additions & 17 deletions

openclaw_assistant/CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@
 
 All notable changes to the OpenClaw Assistant Home Assistant Add-on will be documented in this file.
 
-## [0.5.61] - 2026-03-10
+## [0.5.62] - 2026-03-10
 
 ### Fixed
-- **Gateway restart loop** (issue #95): `openclaw gateway run` spawns `openclaw-gateway` as the actual long-running daemon; the launcher wrapper exits immediately. The old self-restart detection used `pgrep -f "openclaw.*(gateway|node).*run"` which never matched the live daemon name, so the supervisor always fell through to the restart path, found the port occupied, and looped forever with "already listening". Fixed by using the pattern `openclaw.*(gateway|node)` (without `.*run`) which correctly matches `openclaw-gateway`. Additionally, the loopback relay (tailnet mode) is now stopped before restarting the gateway and restarted after, preventing it from holding the port during supervisor-initiated restarts.
+- **Gateway restart loop** (issue #95): `openclaw gateway run` is a thin wrapper that spawns `openclaw-gateway` as a long-running daemon then exits. The supervisor had two bugs: (1) `pgrep` pattern `"openclaw.*(gateway|node).*run"` never matched the daemon name `openclaw-gateway`, so self-restarts were never detected; (2) after re-tracking a self-restarted PID, `wait` failed with "pid N is not a child of this shell" (exit 127) because the new daemon was spawned by the old one, not by run.sh. The supervisor loop now uses `pgrep -f "openclaw-gateway"` for reliable daemon detection and switches to `kill -0` polling for non-child PIDs instead of `wait`. The loopback relay (tailnet mode) is also stopped/restarted around supervisor-initiated gateway restarts to prevent port conflicts.
 
 ## [0.5.60] - 2026-03-10
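
The pattern mismatch described in the changelog entry can be reproduced without a running gateway. The sketch below uses `grep -E` as a stand-in for `pgrep -f` (both take POSIX extended regular expressions) and checks the old and new patterns against two command lines. The `matches` helper and the two command-line strings are illustrative assumptions, not output captured from a real system; the actual daemon's argument list may differ.

```shell
#!/bin/sh
# matches PATTERN CMDLINE
# Hypothetical helper: succeeds if the extended regex PATTERN matches CMDLINE.
# grep -E stands in for `pgrep -f`, which matches against the full command line.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

OLD_PATTERN='openclaw.*(gateway|node).*run'   # pattern quoted in the changelog
NEW_PATTERN='openclaw-gateway'                # pattern used after this commit

# Assumed command lines, for illustration only.
WRAPPER_CMD='openclaw gateway run'
DAEMON_CMD='openclaw-gateway'

for cmd in "$WRAPPER_CMD" "$DAEMON_CMD"; do
    for pat in "$OLD_PATTERN" "$NEW_PATTERN"; do
        if matches "$pat" "$cmd"; then
            result="match"
        else
            result="no match"
        fi
        printf '%-32s vs %-22s -> %s\n' "$pat" "$cmd" "$result"
    done
done
```

With these strings, the old pattern matches only the short-lived wrapper and the new pattern matches only the daemon, which is consistent with the supervisor never seeing a live process under the old check.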

openclaw_assistant/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 name: OpenClaw Assistant
-version: "0.5.61"
+version: "0.5.62"
 slug: openclaw_assistant
 description: Run OpenClaw Assistant (OpenClaw-compatible) as a Home Assistant add-on.
 url: https://github.com/techartdev/OpenClawHomeAssistant

openclaw_assistant/run.sh

Lines changed: 52 additions & 14 deletions
@@ -394,7 +394,7 @@ fi
 # ------------------------------------------------------------------------------
 
 gateway_running() {
-  pgrep -f "openclaw.*gateway.*run" >/dev/null 2>&1
+  pgrep -f "openclaw-gateway" >/dev/null 2>&1
 }
 
 cleanup_session_locks() {
@@ -489,7 +489,14 @@ shutdown() {
 
   if [ -n "${GW_PID}" ] && kill -0 "${GW_PID}" >/dev/null 2>&1; then
     kill -TERM "${GW_PID}" >/dev/null 2>&1 || true
-    wait "${GW_PID}" || true
+    # wait reaps child PIDs; for non-child (re-tracked) PIDs it fails instantly,
+    # so fall back to a timed kill -0 poll to let the gateway finish cleanly.
+    if ! wait "${GW_PID}" 2>/dev/null; then
+      for _i in 1 2 3 4 5; do
+        kill -0 "${GW_PID}" 2>/dev/null || break
+        sleep 1
+      done
+    fi
   fi
 
   stop_gw_relay
@@ -971,25 +978,55 @@ fi
 
 # Keep add-on alive even if gateway/node runtime restarts itself (e.g. during onboarding).
 # If runtime exits unexpectedly, restart it while nginx/ttyd stay up.
+#
+# Design notes (issue #95):
+#   `openclaw gateway run` is a thin wrapper that spawns `openclaw-gateway` as a
+#   long-running daemon and then exits. When the gateway self-restarts (SIGUSR1 /
+#   `openclaw gateway restart`), the old daemon exits and a NEW daemon is forked —
+#   the new PID is NOT a child of this shell so `wait` cannot block on it.
+# Strategy:
+#   1. `wait` for our child (the wrapper). When it exits, check if the daemon
+#      is still alive (`pgrep`). If yes → re-track and poll with `kill -0`.
+#   2. When the re-tracked daemon eventually exits (crash or another restart),
+#      `kill -0` fails, we check again for a live daemon to re-track, or restart.
+GW_IS_CHILD=true   # true only when GW_PID was started by us (can use `wait`)
+
 while true; do
-  GW_EXIT_CODE=0
-  wait "${GW_PID}" || GW_EXIT_CODE=$?
+  if [ "$GW_IS_CHILD" = "true" ]; then
+    # Efficient blocking wait on our child process.
+    GW_EXIT_CODE=0
+    wait "${GW_PID}" 2>/dev/null || GW_EXIT_CODE=$?
+  else
+    # GW_PID is NOT our child (re-tracked after a self-restart).
+    # Poll with kill -0 until it exits.
+    while kill -0 "$GW_PID" 2>/dev/null; do
+      if [ "$SHUTTING_DOWN" = "true" ]; then break 2; fi
+      sleep 5
+    done
+    GW_EXIT_CODE=0
+  fi
 
   if [ "$SHUTTING_DOWN" = "true" ]; then
     break
   fi
 
-  # Detect agent/user-initiated self-restart (e.g. 'openclaw gateway restart').
-  # 'openclaw gateway run' spawns 'openclaw-gateway' as the actual long-running
-  # daemon; the launcher wrapper exits immediately. The old pattern '.*run' never
-  # matched the live daemon name, so the supervisor always fell through to the
-  # restart path, hit the gateway still on the port, and looped forever.
-  # Use the broader pattern that matches both 'openclaw-gateway' and 'openclaw node run'.
-  sleep 1
-  RESTARTED_PID=$(pgrep -f "openclaw.*(gateway|node)" 2>/dev/null | head -1 || true)
-  if [ -n "$RESTARTED_PID" ] && [ "$RESTARTED_PID" != "$GW_PID" ]; then
-    echo "INFO: OpenClaw runtime restarted itself (new PID $RESTARTED_PID); re-tracking."
+  # Give a potential self-restart time to spawn the new daemon.
+  sleep 2
+
+  # Detect self-restart: look for a live `openclaw-gateway` process.
+  # If one exists, the runtime restarted itself — re-track and monitor
+  # instead of spawning a duplicate (which would collide on the port).
+  RESTARTED_PID=""
+  if [ "$GATEWAY_MODE" != "remote" ]; then
+    RESTARTED_PID=$(pgrep -f "openclaw-gateway" 2>/dev/null | head -1 || true)
+  else
+    RESTARTED_PID=$(pgrep -f "openclaw.*node.*run" 2>/dev/null | head -1 || true)
+  fi
+
+  if [ -n "$RESTARTED_PID" ]; then
+    echo "INFO: OpenClaw runtime restarted itself (new PID $RESTARTED_PID); monitoring."
     GW_PID="$RESTARTED_PID"
+    GW_IS_CHILD=false
     continue
   fi

@@ -1005,6 +1042,7 @@ while true; do
     echo "ERROR: Failed to restart OpenClaw runtime; retrying in 5s..."
     sleep 5
   else
+    GW_IS_CHILD=true
     start_gw_relay
   fi
 done
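
The switch between `wait` and `kill -0` polling in the supervisor loop can be tried in isolation. The following is a minimal sketch, not code from run.sh: `wait_for_pid`, the `pidfile` handshake, and the `sleep` commands standing in for the gateway are all hypothetical. It assumes only that `wait` can block on direct children of the current shell, while any other PID has to be polled.

```shell
#!/bin/sh
# wait_for_pid PID IS_CHILD [INTERVAL]
# Hypothetical helper (not part of run.sh). Blocks until PID has exited.
#   IS_CHILD=true -> use `wait`, which blocks and returns the child's exit status.
#   otherwise     -> poll with `kill -0`; the exit status is not observable, so return 0.
wait_for_pid() {
    _wfp_pid=$1
    _wfp_is_child=$2
    _wfp_interval=${3:-5}
    if [ "$_wfp_is_child" = "true" ]; then
        wait "$_wfp_pid" 2>/dev/null || return $?
        return 0
    fi
    while kill -0 "$_wfp_pid" 2>/dev/null; do
        sleep "$_wfp_interval"
    done
    return 0
}

# Case 1: a direct child. `wait` works and reports the exit status.
( sleep 1; exit 3 ) &
child_status=0
wait_for_pid "$!" true || child_status=$?
echo "child exit status: $child_status"

# Case 2: a grandchild. A helper `sh` starts it, writes its PID to a temporary
# file, and reaps it, so this shell is never the grandchild's parent.
pidfile=$(mktemp)
sh -c 'sleep 2 & echo "$!" > "$1"; wait' sh "$pidfile" &
helper_pid=$!
while [ ! -s "$pidfile" ]; do sleep 1; done
grandchild=$(cat "$pidfile")
rm -f "$pidfile"

grandchild_status=0
wait_for_pid "$grandchild" false 1 || grandchild_status=$?
echo "grandchild $grandchild has exited (status reported as $grandchild_status)"
wait "$helper_pid" 2>/dev/null || true
```

As in the commit, the polled path always reports status 0 because the real exit code of a non-child process is not available to the shell. The changelog reports that calling `wait` on such a PID failed immediately with exit 127, which is why the loop tracks whether the PID is a child rather than retrying `wait`.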
