
Commit 23a782b

Avoid returning early on agent join failures
When a gossip join fails, do not return early from the call chain: a join failure is most likely transient, and the retry logic built into networkdb will retry and succeed. Returning early skips the initialization of the ingress network/sandbox, which leaves it broken even after the gossip join succeeds on retry.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
1 parent bf3d9cc commit 23a782b

2 files changed

Lines changed: 5 additions & 2 deletions


agent.go

Lines changed: 1 addition & 2 deletions
@@ -191,8 +191,7 @@ func (c *controller) agentSetup() error {
 
 	if remoteAddr != "" {
 		if err := c.agentJoin(remoteAddr); err != nil {
-			logrus.Errorf("Error in agentJoin : %v", err)
-			return nil
+			logrus.Errorf("Error in joining gossip cluster : %v(join will be retried in background)", err)
 		}
 	}
 

networkdb/cluster.go

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ func (nDB *NetworkDB) retryJoin(members []string, stop <-chan struct{}) {
 				logrus.Errorf("Failed to join memberlist %s on retry: %v", members, err)
 				continue
 			}
+			if err := nDB.sendNodeEvent(NodeEventTypeJoin); err != nil {
+				logrus.Errorf("failed to send node join on retry: %v", err)
+				continue
+			}
 			return
 		case <-stop:
 			return
