Commit 6a08c66

feat: update version to 0.5.61 and fix gateway restart loop in tailnet mode
Author: techartdev
1 parent 02814db commit 6a08c66

3 files changed

Lines changed: 49 additions & 25 deletions

openclaw_assistant/CHANGELOG.md

Lines changed: 5 additions & 2 deletions
@@ -2,15 +2,18 @@

 All notable changes to the OpenClaw Assistant Home Assistant Add-on will be documented in this file.

+## [0.5.61] - 2026-03-10
+
+### Fixed
+- **Gateway restart loop** (issue #95): `openclaw gateway run` spawns `openclaw-gateway` as the actual long-running daemon; the launcher wrapper exits immediately. The old self-restart detection used `pgrep -f "openclaw.*(gateway|node).*run"` which never matched the live daemon name, so the supervisor always fell through to the restart path, found the port occupied, and looped forever with "already listening". Fixed by using the pattern `openclaw.*(gateway|node)` (without `.*run`) which correctly matches `openclaw-gateway`. Additionally, the loopback relay (tailnet mode) is now stopped before restarting the gateway and restarted after, preventing it from holding the port during supervisor-initiated restarts.
+
 ## [0.5.60] - 2026-03-10

 ### Fixed
 - **Session lock cleanup ignored non-default agents**: `cleanup_session_locks` was hardcoded to `agents/main/sessions`, skipping stale locks for any agent with a custom `forcedAgentId`. Stale locks could block the gateway from opening sessions for those agents, causing silent fallback to `main`. Cleanup now scans all `agents/*/sessions/` directories.

 ## [0.5.59] - 2026-03-10

-### Fixed
-- **Gateway restart loop** (issue #95): when the agent or user ran `openclaw gateway restart`, the supervisor loop detected the old PID exiting and immediately spawned a second gateway instance, which collided with the already-restarted one and looped with "another gateway instance is already listening". The supervisor now detects a self-restart (new PID already running on the same port) and re-tracks it instead of spawning a duplicate.
 - **Remote mode URL not propagated** (issue #93): `start_openclaw_runtime` was reading `gateway.remote.url` back via `openclaw config get`, which can time out (2 s limit at startup) or return an empty/redacted result. The function now uses `$GATEWAY_REMOTE_URL` directly from the already-parsed add-on options, which is the same value the config helper writes to `openclaw.json`.
 - **Terminal CLI unreachable in tailnet mode** (issue #90): when `gateway_bind_mode=tailnet` (or `access_mode=tailnet_https`), the gateway binds only to the Tailscale IP. The local CLI always connects via `ws://127.0.0.1:PORT`, causing "Gateway not running" inside the add-on terminal. A lightweight loopback relay (Node.js) is now started automatically to forward `127.0.0.1:PORT → TAILSCALE_IP:PORT`, making all terminal CLI commands work normally. Token auth is still enforced end-to-end by the gateway.
 - **Session lock cleanup ignored non-default agents**: `cleanup_session_locks` was hardcoded to `agents/main/sessions`, skipping stale locks for any agent with a custom `forcedAgentId`. Stale locks could block the gateway from opening sessions for those agents, causing silent fallback to `main`. Cleanup now scans all `agents/*/sessions/` directories.
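
The pattern change described in the 0.5.61 entry can be checked without a running gateway. The sketch below is only an approximation: it feeds sample command lines to `grep -E`, on the assumption that `pgrep -f` applies the same extended regular expression to the full command line. The helper `matches` and the sample command lines are illustrative and are not part of `run.sh`.

```shell
# Hypothetical helper (not in run.sh): prints "match" or "no match" for an
# extended regex applied to a sample command line, approximating pgrep -f.
matches() {
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

OLD_PATTERN='openclaw.*(gateway|node).*run'
NEW_PATTERN='openclaw.*(gateway|node)'

# The daemon name from the changelog contains no "run", so the old pattern misses it.
matches "$OLD_PATTERN" "openclaw-gateway"       # no match
matches "$NEW_PATTERN" "openclaw-gateway"       # match

# Command lines that do contain "run" are matched by both patterns.
matches "$OLD_PATTERN" "openclaw gateway run"   # match
matches "$NEW_PATTERN" "openclaw node run"      # match
```

One consequence of the broader pattern, by construction of the regex, is that it also matches any other command line containing `openclaw` followed by `gateway` or `node`; whether that matters in practice depends on what else runs in the container.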

openclaw_assistant/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 name: OpenClaw Assistant
-version: "0.5.60"
+version: "0.5.61"
 slug: openclaw_assistant
 description: Run OpenClaw Assistant (OpenClaw-compatible) as a Home Assistant add-on.
 url: https://github.com/techartdev/OpenClawHomeAssistant

openclaw_assistant/run.sh

Lines changed: 43 additions & 22 deletions
@@ -492,10 +492,7 @@ shutdown() {
     wait "${GW_PID}" || true
   fi

-  if [ -n "${GW_RELAY_PID}" ] && kill -0 "${GW_RELAY_PID}" >/dev/null 2>&1; then
-    kill -TERM "${GW_RELAY_PID}" >/dev/null 2>&1 || true
-    wait "${GW_RELAY_PID}" || true
-  fi
+  stop_gw_relay

   if [ "$CLEAN_LOCKS_ON_EXIT" = "true" ]; then
     cleanup_session_locks || true
@@ -811,25 +808,27 @@ PY
   return 0
 }

-if ! start_openclaw_runtime; then
-  exit 1
-fi
-
-# --- Loopback relay for tailnet bind mode (issue #90) ---
+# --- Loopback relay helpers for tailnet bind mode (issue #90) ---
 # When gateway.bind=tailnet the gateway only listens on the Tailscale IP.
 # The local CLI always tries ws://127.0.0.1:PORT and fails with
 # "Gateway not running" even though the gateway is healthy.
-# A lightweight Node.js relay (loopback-only) forwards those connections
-# to the Tailscale IP so terminal CLI commands work normally.
-# Token auth is still enforced end-to-end by the gateway.
-if [ "$GATEWAY_BIND_MODE" = "tailnet" ]; then
-  TAILSCALE_IP=$(ip -4 addr show tailscale0 2>/dev/null \
+# These functions start/stop a lightweight Node.js TCP relay on
+# 127.0.0.1:PORT -> TAILSCALE_IP:PORT so terminal CLI commands work.
+# IMPORTANT: stop_gw_relay must be called before restarting the gateway;
+# otherwise the relay holds the loopback port and the new gateway instance
+# detects it as "already listening" and exits with code 1.
+start_gw_relay() {
+  if [ "$GATEWAY_BIND_MODE" != "tailnet" ]; then
+    return 0
+  fi
+  local ts_ip
+  ts_ip=$(ip -4 addr show tailscale0 2>/dev/null \
     | awk '/inet /{gsub(/\/.*/,"",$2); print $2; exit}' || true)
-  if [[ "${TAILSCALE_IP:-}" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
-    echo "INFO: Starting loopback relay for tailnet gateway (127.0.0.1:${GATEWAY_PORT} -> ${TAILSCALE_IP}:${GATEWAY_PORT})"
+  if [[ "${ts_ip:-}" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+    echo "INFO: Starting loopback relay for tailnet gateway (127.0.0.1:${GATEWAY_PORT} -> ${ts_ip}:${GATEWAY_PORT})"
     node -e "
 const net = require('net');
-const TARGET_HOST = '${TAILSCALE_IP}';
+const TARGET_HOST = '${ts_ip}';
 const TARGET_PORT = ${GATEWAY_PORT};
 const server = net.createServer(function(c) {
   const t = net.createConnection(TARGET_PORT, TARGET_HOST);
@@ -844,8 +843,22 @@ server.listen(TARGET_PORT, '127.0.0.1');" &
     echo "WARN: tailnet bind mode active but Tailscale IP not found on tailscale0 interface."
     echo "WARN: Terminal CLI may show gateway as unreachable. Ensure Tailscale is running and restart."
   fi
+}
+
+stop_gw_relay() {
+  if [ -n "${GW_RELAY_PID}" ] && kill -0 "${GW_RELAY_PID}" >/dev/null 2>&1; then
+    kill -TERM "${GW_RELAY_PID}" >/dev/null 2>&1 || true
+    wait "${GW_RELAY_PID}" 2>/dev/null || true
+    GW_RELAY_PID=""
+  fi
+}
+
+if ! start_openclaw_runtime; then
+  exit 1
 fi

+start_gw_relay
+
 # Start web terminal (optional)
 TTYD_PID_FILE="/var/run/openclaw-ttyd.pid"

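
The Tailscale address lookup in `start_gw_relay` can be exercised offline. The sketch below reuses the same `awk` program from the diff but feeds it canned text instead of calling `ip`; the sample output and the address `100.64.0.7` are made up for illustration, and the helpers `extract_ipv4` and `looks_like_ipv4` are hypothetical names, with `grep -E` standing in for the script's `[[ ... =~ ... ]]` check.

```shell
# Canned text standing in for `ip -4 addr show tailscale0` (made-up address).
sample_output='4: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
    inet 100.64.0.7/32 scope global tailscale0
       valid_lft forever preferred_lft forever'

# Same awk program as in the diff: take field 2 of the first "inet " line
# and strip everything from the "/" onward (the prefix length).
extract_ipv4() {
  awk '/inet /{gsub(/\/.*/,"",$2); print $2; exit}'
}

# Portable stand-in for the dotted-quad shape check used in run.sh.
looks_like_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

ts_ip=$(printf '%s\n' "$sample_output" | extract_ipv4)
if looks_like_ipv4 "$ts_ip"; then
  echo "relay target: $ts_ip"
else
  echo "no usable Tailscale IPv4 address found"
fi
```

The regex only checks the shape of the address (four dot-separated digit groups), not that each octet is in range; its practical role here appears to be rejecting an empty or malformed result before the value is interpolated into the `node -e` script.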
@@ -967,12 +980,13 @@ while true; do
   fi

   # Detect agent/user-initiated self-restart (e.g. 'openclaw gateway restart').
-  # When the gateway restarts itself, the old PID exits but a new process immediately
-  # binds the same port. Without this check the supervisor would spawn a second
-  # instance, which fails with "already listening" and loops forever.
-  # Give the new process a moment to start, then re-track it instead of spawning a duplicate.
+  # 'openclaw gateway run' spawns 'openclaw-gateway' as the actual long-running
+  # daemon; the launcher wrapper exits immediately. The old pattern '.*run' never
+  # matched the live daemon name, so the supervisor always fell through to the
+  # restart path, hit the gateway still on the port, and looped forever.
+  # Use the broader pattern that matches both 'openclaw-gateway' and 'openclaw node run'.
   sleep 1
-  RESTARTED_PID=$(pgrep -f "openclaw.*(gateway|node).*run" 2>/dev/null | head -1 || true)
+  RESTARTED_PID=$(pgrep -f "openclaw.*(gateway|node)" 2>/dev/null | head -1 || true)
   if [ -n "$RESTARTED_PID" ] && [ "$RESTARTED_PID" != "$GW_PID" ]; then
     echo "INFO: OpenClaw runtime restarted itself (new PID $RESTARTED_PID); re-tracking."
     GW_PID="$RESTARTED_PID"
@@ -982,8 +996,15 @@ while true; do
   echo "WARN: OpenClaw runtime exited with code ${GW_EXIT_CODE}. Restarting in 2s..."
   sleep 2

+  # Stop the loopback relay BEFORE restarting the gateway (tailnet mode only).
+  # The relay holds 127.0.0.1:GATEWAY_PORT — leaving it up causes the new gateway
+  # to detect the port as occupied and exit with code 1, re-entering the loop.
+  stop_gw_relay
+
   if ! start_openclaw_runtime; then
     echo "ERROR: Failed to restart OpenClaw runtime; retrying in 5s..."
     sleep 5
+  else
+    start_gw_relay
   fi
 done
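
The stop-then-restart lifecycle that `stop_gw_relay` introduces can be tried in isolation. This is a minimal sketch, assuming a background `sleep` as a stand-in for the Node.js relay; `start_fake_relay`, `stop_relay`, and `relay_state` are hypothetical names, and only the body of `stop_relay` mirrors the function added in the diff.

```shell
GW_RELAY_PID=""

# Stand-in for the relay: a long-lived background process (assumption for the demo).
start_fake_relay() {
  sleep 300 &
  GW_RELAY_PID=$!
}

# Mirrors stop_gw_relay from the diff: signal only a live PID, reap it,
# then clear the variable so a second call is a harmless no-op.
stop_relay() {
  if [ -n "${GW_RELAY_PID}" ] && kill -0 "${GW_RELAY_PID}" >/dev/null 2>&1; then
    kill -TERM "${GW_RELAY_PID}" >/dev/null 2>&1 || true
    wait "${GW_RELAY_PID}" 2>/dev/null || true
    GW_RELAY_PID=""
  fi
}

# Reports whether the tracked relay process is currently alive.
relay_state() {
  if [ -n "${GW_RELAY_PID}" ] && kill -0 "${GW_RELAY_PID}" 2>/dev/null; then
    echo "running"
  else
    echo "stopped"
  fi
}

start_fake_relay
relay_state   # running
stop_relay
relay_state   # stopped
stop_relay    # already stopped: does nothing
```

Clearing `GW_RELAY_PID` after `wait` is what makes the function safe to call from both `shutdown` and the supervisor loop, since a stale PID is never signalled twice.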
