dockerd panic: slice bounds out of range in raft transport.splitSnapshotData (snapshot send) #3231

@oneingan

What happened

We hit a recurring dockerd panic inside the swarmkit raft snapshot transport while operating a Swarm cluster. When the leader needs to send a raft snapshot to a peer (observed while joining/promoting additional managers), it crashes with a slice-bounds panic in transport.splitSnapshotData.

The panic repeats on restart while the node is still participating in the raft cluster.

Environment

  • Docker Engine: 29.2.1 (linux)
  • SwarmKit module version from stack trace: github.com/moby/swarmkit/v2@v2.1.2-0.20251110192100-17b8d222e7dd
  • Linux (systemd)

Trigger / suspected conditions

  • Observed when adding/promoting additional managers (leader sending raft snapshot to a peer).
  • In our case the swarm state had grown quite large (we observed ~1858 swarm config objects at one point, mostly generated by automation; only a small fraction were referenced by services). We removed the unused configs, but still saw the panic until rebuilding the swarm from scratch.

Expected

Sending a snapshot to a peer should not crash the manager.

Actual

Leader panics and exits; systemd restarts dockerd; the panic may repeat.

Crash excerpt

panic: runtime error: slice bounds out of range [:9065848] with capacity 4871629
goroutine 161470 [running]:
github.com/moby/swarmkit/v2/manager/state/raft/transport.splitSnapshotData(...)
        github.com/moby/swarmkit/v2@v2.1.2-0.20251110192100-17b8d222e7dd/manager/state/raft/transport/peer.go:177 +0x265
github.com/moby/swarmkit/v2/manager/state/raft/transport.(*peer).sendProcessMessage(...)
        github.com/moby/swarmkit/v2@v2.1.2-0.20251110192100-17b8d222e7dd/manager/state/raft/transport/peer.go:246 +0x22f
github.com/moby/swarmkit/v2/manager/state/raft/transport.(*peer).run(...)
        github.com/moby/swarmkit/v2@v2.1.2-0.20251110192100-17b8d222e7dd/manager/state/raft/transport/peer.go:420 +0x1d4
created by github.com/moby/swarmkit/v2/manager/state/raft/transport.newPeer
        github.com/moby/swarmkit/v2@v2.1.2-0.20251110192100-17b8d222e7dd/manager/state/raft/transport/peer.go:63 +0x332
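
For readers unfamiliar with this class of panic: the Go runtime message above means a slice expression asked for an upper bound of 9065848 on a slice whose capacity was only 4871629. I have not confirmed what peer.go:177 actually computes, so the following is only a minimal sketch of the general failure mode and of a clamped chunking loop. The names splitChunks and sliceTo are hypothetical and are not swarmkit functions.

```go
package main

import "fmt"

// splitChunks is a hypothetical helper, NOT the swarmkit implementation.
// It splits data into pieces of at most chunkSize bytes and clamps the
// end index so it can never exceed len(data).
func splitChunks(data []byte, chunkSize int) ([][]byte, error) {
	if chunkSize <= 0 {
		// A zero or negative chunk size would loop forever or slice
		// out of range, so reject it instead of panicking.
		return nil, fmt.Errorf("invalid chunk size %d", chunkSize)
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data) // clamp the final, shorter chunk
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, nil
}

// sliceTo is a hypothetical helper that evaluates data[:end] and turns the
// runtime panic into an error, to show the message the runtime produces.
func sliceTo(data []byte, end int) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return data[:end], nil
}

func main() {
	data := make([]byte, 10)

	// Same class of failure as in the crash excerpt, at a small scale.
	_, err := sliceTo(data, 25)
	fmt.Println(err)

	// Clamped chunking never slices past the end of the buffer.
	chunks, _ := splitChunks(data, 4)
	for _, c := range chunks {
		fmt.Println(len(c))
	}
}
```

If the real cause is a size computed from something other than the snapshot data buffer, a bounds check like the one in splitChunks would at least turn the crash into an error that can be logged.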

Workaround

Rebuilding the swarm from scratch avoided the crash for us.

If you’d like, I can try to reproduce in a minimal environment and provide more details (snapshot size, exact steps).
