Despite the HostIP fix shipped in v0.3.6 (#15), the docker provisioner produces a container with NetworkSettings.Ports == {} when the released y-cluster binary is invoked via y-cluster provision -c <dir> from a bash script on a GitHub Actions ubuntu-latest runner. The container starts, k3s comes up internally, but no port forwards are published to the host, so the host-side /readyz probe added in v0.3.5 never resolves and kubectl --context=local get --raw=/readyz returns connection refused for the entire 60s deadline.
Log evidence
All three snippets below are from the same job run on the same ubuntu-latest instance, in chronological order. Timestamps preserved.
1. Plain docker run -p ... on the runner publishes all four bindings cleanly
This confirms the daemon is healthy, the ports are free, and there's no environmental obstruction. Run as a sanity check by the acceptance script before invoking y-cluster:
2026-05-01T19:58:06.297Z CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2026-05-01T19:58:06.297Z # Control: docker run -p with the four ystack port maps:
2026-05-01T19:58:06.319Z Unable to find image 'alpine:3.20' locally
2026-05-01T19:58:06.797Z Status: Downloaded newer image for alpine:3.20
2026-05-01T19:58:06.819Z 60e458c620bff00f330808dd6b6995e93a7021deaaabb37fbe78f696e2e1e34d
2026-05-01T19:58:06.981Z docker port yk-portbind-control:
2026-05-01T19:58:06.994Z 80/tcp -> 0.0.0.0:80
2026-05-01T19:58:06.994Z 80/tcp -> [::]:80
2026-05-01T19:58:06.994Z 443/tcp -> 0.0.0.0:443
2026-05-01T19:58:06.994Z 443/tcp -> [::]:443
2026-05-01T19:58:06.994Z 6443/tcp -> 0.0.0.0:6443
2026-05-01T19:58:06.994Z 6443/tcp -> [::]:6443
2026-05-01T19:58:06.994Z 8944/tcp -> 0.0.0.0:8944
2026-05-01T19:58:06.994Z 8944/tcp -> [::]:8944
2026-05-01T19:58:06.994Z docker inspect yk-portbind-control NetworkSettings.Ports:
2026-05-01T19:58:07.007Z {"443/tcp":[{"HostIp":"0.0.0.0","HostPort":"443"},{"HostIp":"::","HostPort":"443"}],
"6443/tcp":[{"HostIp":"0.0.0.0","HostPort":"6443"},{"HostIp":"::","HostPort":"6443"}],
"80/tcp":[{"HostIp":"0.0.0.0","HostPort":"80"},{"HostIp":"::","HostPort":"80"}],
"8944/tcp":[{"HostIp":"0.0.0.0","HostPort":"8944"},{"HostIp":"::","HostPort":"8944"}]}
The control container is removed immediately after.
2. y-cluster's container, started seconds later via y-cluster provision, has empty Ports
Same runner, same daemon, same k3s image. A backgrounded poller prints docker port local and docker inspect local --format '{{json .NetworkSettings.Ports}}' every 5 seconds while waitForHostAPIServer runs:
docker port local produces no output. NetworkSettings.Ports is the literal empty object {}. The container itself is up — docker ps shows local Up X seconds for every poll — but no ports are published to the host.
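A poller like the one described could check the inspect output programmatically instead of by eye. The following is a minimal sketch, assuming the JSON printed by docker inspect --format '{{json .NetworkSettings.Ports}}' is passed in as bytes; publishedPorts and portBinding are hypothetical names, not part of y-cluster or the acceptance script.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// portBinding mirrors one entry of NetworkSettings.Ports as printed by
// docker inspect (field names follow the JSON shown in snippet 1).
type portBinding struct {
	HostIp   string
	HostPort string
}

// publishedPorts parses the inspect JSON and returns "guest -> host" strings,
// sorted for stable output. An empty result means nothing is published.
func publishedPorts(inspectJSON []byte) ([]string, error) {
	var ports map[string][]portBinding
	if err := json.Unmarshal(inspectJSON, &ports); err != nil {
		return nil, fmt.Errorf("parse NetworkSettings.Ports: %w", err)
	}
	var out []string
	for guest, bindings := range ports {
		for _, b := range bindings {
			out = append(out, fmt.Sprintf("%s -> %s:%s", guest, b.HostIp, b.HostPort))
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	samples := []string{
		`{}`, // what the y-cluster container reports in the failing case
		`{"6443/tcp":[{"HostIp":"0.0.0.0","HostPort":"6443"},{"HostIp":"::","HostPort":"6443"}]}`,
	}
	for _, sample := range samples {
		ports, err := publishedPorts([]byte(sample))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d binding(s): %v\n", len(ports), ports)
	}
}
```

A zero-length result is the condition this issue is about: the container is running, but no host bindings exist.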
3. Inside the container, k3s is fully serving traffic
When waitForHostAPIServer times out, the captured container logs make clear the issue is purely the missing host bindings, not k3s being slow to come up. From the same run:
2026-05-01T19:50:33.0Z time="2026-05-01T19:50:32Z" level=info msg="Started tunnel to 172.17.0.2:6443"
2026-05-01T19:50:33.0Z time="2026-05-01T19:50:32Z" level=info msg="Stopped tunnel to 127.0.0.1:6443"
2026-05-01T19:50:33.0Z I0501 19:50:18.502 pod_startup_latency_tracker: kube-system/metrics-server-786d997795 ... podStartE2EDuration="15.5s" observedRunningTime
2026-05-01T19:50:33.0Z I0501 19:50:33.292 garbagecollector: ...
2026-05-01T19:50:33.0Z I0501 19:50:33.551 handler.go:304 Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
2026-05-01T19:50:33.0Z Error: wait for host apiserver: apiserver /readyz never returned 200 within 1m0s
on context "local": exit status 1: The connection to the server 127.0.0.1:6443 was
refused - did you specify the right host or port?
k3s reaches "metrics.k8s.io v1beta1 added to ResourceManager" inside the container. The host can't reach it because nothing is published.
What rules out an obvious fix
| Hypothesis | Tested in | Result |
|---|---|---|
| HostIP value still wrong post v0.3.6 | v0.3.6 with HostIP: netip.IPv4Unspecified() ("0.0.0.0") | NetworkSettings.Ports == {} |
| Privileged port collision (80, 443) | Two-port config: 6443:6443, 8944:8944 only (no privileged ports) | NetworkSettings.Ports == {} |
| Same host port as guest port | High-port config: 16443:6443, 18944:8944 (host ≠ guest, both unprivileged) | NetworkSettings.Ports == {} |
| Stale host-side listener | Pre-provision ss -lntp 'sport = :6443 or sport = :80 or sport = :443 or sport = :8944' | Empty |
| Stale docker container | Pre-provision docker ps -a | Empty (just header row) |
| Daemon can't bind these ports | Plain docker run -d -p 6443:6443 -p 80:80 -p 443:443 -p 8944:8944 alpine sleep 30 immediately before y-cluster | All four bindings publish to 0.0.0.0:* (snippet 1 above) |
Every iteration above is from a CI run with full diagnostics; trying the next hypothesis produced no behaviour change in the failing case.
CI runs:
Yolean/ystack actions/runs/25245245101 — v0.3.6 + high host ports 16443:6443, 18944:8944 — Ports={} (probe error: connection refused on 127.0.0.1:16443)
Yolean/ystack actions/runs/25230550615 — earlier diagnostic run with the side-by-side docker run -p control + y-cluster container; control publishes all four bindings, y-cluster container has {}
What works
The same v0.3.6 binary publishes bindings cleanly in two other contexts:
Mac Docker Desktop, same local-docker/y-cluster-provision.yaml shape (verified locally with host:16443 guest:6443 + host:18944 guest:8944):
2026-05-02T07:54:54Z INFO docker/docker.go:149 starting docker apiPort=16443
2026-05-02T07:54:56Z INFO docker/docker.go:414 waiting for host apiserver
2026-05-02T07:54:59Z INFO docker/docker.go:223 k3s ready
2026-05-02T07:54:59Z INFO envoygateway/install.go:119 applying envoy-gateway install manifest
customresourcedefinition.apiextensions.k8s.io/backends.gateway.envoyproxy.io serverside-applied
…
Total time from starting docker to k3s ready: ~5 seconds.
go test -tags 'e2e,docker' -run TestDocker_ProvisionTeardown ./e2e/ on the same ubuntu-latest runner image, exercised by every PR. CI for PR #15 (fix(provision/docker): set HostIP on PortBindings to bind on host; y-cluster actions/runs/25244259094) was green against the very commit that became v0.3.6. That test asserts docker port <name> 8080/tcp shows :38080 on the SDK-created container, and the assertion passed.
So the SDK call shape produced by pkg/provision/docker/docker.go:buildHostConfig can publish bindings on ubuntu-latest, just not in the path where the released binary is invoked from a bash script.
What's left as the differential
Same buildHostConfig, same image, same Privileged: true + Tmpfs, same Engine 28, same runner image — but the request the daemon receives via the released-binary-from-bash path differs from the in-process go test path enough that Engine 28 silently drops the port bindings.
I haven't been able to instrument that from outside the binary. Hypotheses that would explain the asymmetry but I can't confirm from the runner side:
Docker API version negotiation. Does the released binary negotiate a different API version than the test-built one? The test binary is built from the same go.mod, so they pin the same github.com/moby/moby/client v0.4.1, but client-side version probing is environment-dependent.
Config.ExposedPorts absence. buildHostConfig sets HostConfig.PortBindings but ContainerCreate never sets Config.ExposedPorts. The Docker CLI (which works on this runner) auto-adds ExposedPorts when you pass -p. Engine 28 may have started treating the absence differently in some request shapes.
Some env var the SDK reads. DOCKER_API_VERSION, DOCKER_HOST, runner-set proxies: the bash invocation environment differs from go test's parent env in non-obvious ways.
Daemon-side capture of the POST /containers/create body for one working e2e run vs one failing ystack run would identify the differential field within minutes.
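Once two request bodies are captured, finding the differing field is a structural JSON diff. The sketch below shows one way to do it, assuming both bodies are available as raw JSON; diffBodies and diffJSON are hypothetical helpers, and the two sample bodies in main are invented for illustration, not captured from a daemon.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// diffJSON walks two decoded JSON values and records every path whose
// values differ or that exists on only one side.
func diffJSON(path string, a, b any, out *[]string) {
	am, aIsMap := a.(map[string]any)
	bm, bIsMap := b.(map[string]any)
	if aIsMap && bIsMap {
		keys := map[string]bool{}
		for k := range am {
			keys[k] = true
		}
		for k := range bm {
			keys[k] = true
		}
		for k := range keys {
			av, inA := am[k]
			bv, inB := bm[k]
			child := path + "." + k
			switch {
			case !inA:
				*out = append(*out, fmt.Sprintf("%s: only in second (%v)", child, bv))
			case !inB:
				*out = append(*out, fmt.Sprintf("%s: only in first (%v)", child, av))
			default:
				diffJSON(child, av, bv, out)
			}
		}
		return
	}
	// Scalars, arrays, and mismatched types are compared as whole values.
	if !reflect.DeepEqual(a, b) {
		*out = append(*out, fmt.Sprintf("%s: %v != %v", path, a, b))
	}
}

// diffBodies decodes two request bodies and returns the sorted differences.
func diffBodies(first, second []byte) ([]string, error) {
	var a, b any
	if err := json.Unmarshal(first, &a); err != nil {
		return nil, fmt.Errorf("first body: %w", err)
	}
	if err := json.Unmarshal(second, &b); err != nil {
		return nil, fmt.Errorf("second body: %w", err)
	}
	var out []string
	diffJSON("$", a, b, &out)
	sort.Strings(out)
	return out, nil
}

func main() {
	// Invented bodies for illustration only.
	working := []byte(`{"Image":"k3s","ExposedPorts":{"6443/tcp":{}},"HostConfig":{"Privileged":true}}`)
	failing := []byte(`{"Image":"k3s","HostConfig":{"Privileged":true}}`)
	diffs, err := diffBodies(working, failing)
	if err != nil {
		panic(err)
	}
	for _, d := range diffs {
		fmt.Println(d)
	}
}
```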
Asks
Daemon-side dockerd -D capture (or equivalent) of the POST /containers/create request body for:
One successful y-cluster TestDocker_ProvisionTeardown run on ubuntu-latest
One failing ystack acceptance run on ubuntu-latest
Diffing the two requests should reveal the field that Engine 28 treats differently.
If the differential turns out to be Config.ExposedPorts: setting Config.ExposedPorts[guestPort]={} in buildHostConfig alongside HostConfig.PortBindings[guestPort]=... (mirroring what the Docker CLI does for every -p) would likely resolve both paths. Trivial change, low risk, would also be a good defensive thing regardless of root cause.
Workaround for ystack: until the SDK path is fixed, ystack could shell out to docker create / docker start directly instead of going through the SDK. Less elegant but unblocks PR #76's e2e gate.
Environment
y-cluster: v0.3.6 (y-cluster_v0.3.6_linux_amd64, sha256 576964a8825f23c56b633ea5cbc0b587d25931c17c462e0d77a4ae80553146ae)
Runner: GitHub Actions ubuntu-latest (Ubuntu 24.04 LTS, runner image 20260413.86.1)
Docker: Engine 28
Image: ghcr.io/yolean/k3s:v1.35.4-rc3-k3s1
Reproducer
Yolean/ystack PR #76, e2e-cluster job. The acceptance script, in essence, runs y-cluster provision -c <dir> from bash against a local-docker/y-cluster-provision.yaml config.
Background context
v0.3.5: added the waitForHostAPIServer host-side /readyz probe (fix(provision/docker): probe host apiserver before declaring ready #12), needed because the previous "k3s ready" log fired before the host port forward was actually reachable.
v0.3.6: added HostIP: netip.IPv4Unspecified() to PortBindings (fix(provision/docker): set HostIP on PortBindings to bind on host #15). The previous code left HostIP as the zero netip.Addr ("invalid IP"), which moby v1.54+ marshals as "" in JSON. That fix made bindings work on Mac Docker Desktop and in the y-cluster e2e on ubuntu-latest, but evidently doesn't address whatever Engine 28 is dropping in the released-binary-from-bash path.