supervisord — the per-container process supervisor for Weka. It loads a
supervisord-style config, brings the configured processes up by priority, keeps
them up (restart / backoff / fatal policy), and answers JSON-RPC over a unix
socket. The wekanode D client, supervisorctl, and QA tooling all drive it
through that socket.
The binary is named supervisord for drop-in compatibility with the container
bootstrap (weka_init runs /usr/local/bin/supervisord -n -c <conf>) and with
the supervisorctl / wekastop / bridge / eventlistener argv routing.
Weka previously ran a fork of the Python supervisord,
but its Python dependency and slow runtime were deemed a problem.
We explored a fork of ochinchina/supervisord,
which was the Python supervisord ported to Go with mutexes bolted on. That fork
produced a string of concurrency bugs — a concurrent-map-writes crash in the event
registry, a re-entrancy deadlock in the restart-on-test-request listener, and a
boot-time hang where the daemon went silent after dispatching the first
PROCESS_STATE_EXITED and never started weka-management. All three share one
root cause: mutable state shared across goroutines behind locks that were too
coarse, taken in the wrong order, or held across blocking I/O.
This is a clean-room rewrite that makes that bug class structurally hard to reintroduce: the orchestrator owns all mutable state in one goroutine, so there are no shared maps or lock-ordering hazards. (Inter-goroutine sends use bounded channels, so a cycle is not literally impossible under an extreme event storm mid-handler — but the waiter-continuation below removes the specific boot-hang, and the buffers are sized for real loads.)
This is also much more idiomatic for Go, and follows the paradigm of sharing state by communicating, rather than communicating by sharing state.
(As an aside, running "supervisorctl status" on the Python code could take ~600ms. The ochinchina Go port dramatically reduced that to about 10ms. This port reduces that substantially further to approximately 2ms. These improvements should yield quicker service start times, and be noticeably more responsive when querying service state.)
This port is also more faithful to the Python supervisor in its output and in its syntax for groups, which should substantially reduce friction for anyone used to the original.
One orchestrator goroutine owns all mutable state (the process table,
autorecovery ref-count, config snapshot, listener registry, event serials).
Everything else is a goroutine that talks to it only over channels. There are no
sync.Mutex/RWMutex, no shared maps, no atomics on shared state.
| Goroutine | Count | Owns |
|---|---|---|
| Orchestrator | 1 | all mutable state; policy / state-machine decisions |
| Process supervisor | 1 per program instance | one OS process: spawn, startsecs timer, Wait(), stop sequence, retry/backoff |
| Output-capture pump | 2 per running program | child stdout/stderr → log writer (+ PROCESS_LOG_* events) |
| Event-listener pump | 1 per eventlistener | childutils stdin/stdout protocol, bounded queue, drop-on-full |
| RPC server | 1 accept + 1 per in-flight request | decode JSON-RPC → message → encode reply |
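The "bounded queue, drop-on-full" behaviour in the event-listener row can be illustrated with a non-blocking channel send. This is a minimal sketch, not the actual internal/listener code: the boundedQueue type, its offer method, and the dropped counter are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// boundedQueue is a hypothetical stand-in for one listener's event queue.
// It is meant to be fed by a single owner goroutine.
type boundedQueue struct {
	ch      chan string
	dropped int
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{ch: make(chan string, capacity)}
}

// offer enqueues ev without ever blocking the caller. When the buffer is
// full the event is dropped, the drop is counted, and false is returned.
func (q *boundedQueue) offer(ev string) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := newBoundedQueue(2)
	events := []string{"PROCESS_STATE_STARTING", "PROCESS_STATE_RUNNING", "PROCESS_STATE_EXITED"}
	for _, ev := range events {
		fmt.Println(ev, "queued:", q.offer(ev))
	}
	fmt.Println("dropped:", q.dropped)
}
```

Because the producer never waits on a slow consumer, a stalled listener costs at most some dropped events rather than a stalled sender.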
The orchestrator never does blocking I/O, exec, or cmd.Wait() — it
selects and delegates blocking work to subordinate goroutines. Process
supervisors report facts (Spawned / Exited / StopComplete / SpawnError);
the orchestrator decides policy.
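The ownership rule above can be sketched as a single select loop. This is a simplified illustration under stated assumptions, not the real internal/orchestrator: the fact and query types, the orchestrate function, and the state strings chosen for each fact are all hypothetical, and only two fact kinds are handled.

```go
package main

import "fmt"

// fact is what a process supervisor reports (illustrative shape).
type fact struct {
	proc string
	kind string // e.g. "Spawned" or "Exited"
}

// query asks for a process state and carries its own reply channel.
type query struct {
	proc  string
	reply chan string // buffered(1), so the orchestrator never blocks replying
}

// orchestrate is the only goroutine that reads or writes the states map,
// so no mutex is needed.
func orchestrate(facts <-chan fact, queries <-chan query, done <-chan struct{}) {
	states := map[string]string{}
	for {
		select {
		case f := <-facts:
			switch f.kind {
			case "Spawned":
				states[f.proc] = "STARTING"
			case "Exited":
				states[f.proc] = "EXITED"
			}
		case q := <-queries:
			s, ok := states[q.proc]
			if !ok {
				s = "UNKNOWN"
			}
			q.reply <- s
		case <-done:
			return
		}
	}
}

func main() {
	facts := make(chan fact)
	queries := make(chan query)
	done := make(chan struct{})
	go orchestrate(facts, queries, done)

	facts <- fact{proc: "weka-management", kind: "Spawned"}
	q := query{proc: "weka-management", reply: make(chan string, 1)}
	queries <- q
	fmt.Println(<-q.reply)
	close(done)
}
```

Readers of the state go through the same channel interface as writers, which is what removes lock ordering from the picture.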
Waiter-continuation. A wait=true RPC (startProcess, stopProcessGroup, …)
does not block the orchestrator: it records the request's reply channel as a
pending waiter against the process, fires the command, and returns to its select
loop. When the matching state event arrives, the waiter is resolved and the reply
sent. The only goroutine that blocks is the per-request server goroutine. This is
the structural cure for the boot-hang.
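A minimal sketch of the waiter-continuation idea follows, assuming hypothetical names throughout (startReq, event, orchestrate, startAndWait, and a fire callback standing in for "send the command to the process supervisor"). It only models a start request resolved by a RUNNING event; the real implementation covers more RPCs and states.

```go
package main

import "fmt"

type startReq struct {
	proc  string
	reply chan string // buffered(1): resolving a waiter never blocks
}

type event struct {
	proc  string
	state string
}

// orchestrate records waiters and returns to its select loop immediately.
// fire must not block; here it stands in for handing work to a supervisor.
func orchestrate(reqs <-chan startReq, events <-chan event, fire func(proc string), done <-chan struct{}) {
	waiters := map[string][]chan string{} // owned by this goroutine only
	for {
		select {
		case r := <-reqs:
			waiters[r.proc] = append(waiters[r.proc], r.reply)
			fire(r.proc)
		case ev := <-events:
			if ev.state != "RUNNING" {
				continue
			}
			for _, w := range waiters[ev.proc] {
				w <- "started " + ev.proc
			}
			delete(waiters, ev.proc)
		case <-done:
			return
		}
	}
}

// startAndWait plays the per-request server goroutine: it is the only
// party that blocks while the process comes up.
func startAndWait(reqs chan<- startReq, proc string) string {
	r := startReq{proc: proc, reply: make(chan string, 1)}
	reqs <- r
	return <-r.reply
}

func main() {
	reqs := make(chan startReq)
	events := make(chan event, 8)
	done := make(chan struct{})
	fire := func(proc string) {
		// Pretend a supervisor spawned the process and it reached RUNNING.
		go func() { events <- event{proc: proc, state: "RUNNING"} }()
	}
	go orchestrate(reqs, events, fire, done)
	fmt.Println(startAndWait(reqs, "weka-management"))
	close(done)
}
```

The orchestrator stays free to dispatch other events between recording the waiter and resolving it, so a slow start cannot wedge the loop.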
The state machine itself is a functional core (internal/procsup): pure
methods that take an event and return a list of Actions, with the imperative
shell (timers, exec, signals) kept separate — so the full restart/backoff/stop
policy is unit-testable without spawning a single process.
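To show why this split makes the policy testable, here is a small sketch of a functional core. It assumes a value-receiver Handle method that returns the next machine plus a list of actions; the type names, event names, and the reduced set of transitions are illustrative and are not taken from internal/procsup.

```go
package main

import "fmt"

type State string
type Event string
type Action string

const (
	Stopped  State = "STOPPED"
	Starting State = "STARTING"
	Running  State = "RUNNING"
	Backoff  State = "BACKOFF"
	Exited   State = "EXITED"
	Fatal    State = "FATAL"

	EvStart            Event = "start"
	EvStartsecsElapsed Event = "startsecs-elapsed"
	EvExited           Event = "exited"
	EvRetry            Event = "retry"

	ActSpawn         Action = "spawn"
	ActArmStartTimer Action = "arm-start-timer"
	ActScheduleRetry Action = "schedule-retry"
)

// Machine holds only data; it never touches timers, exec, or signals.
type Machine struct {
	State      State
	Retries    int
	MaxRetries int
}

// Handle returns the next machine and the actions the shell should perform.
func (m Machine) Handle(ev Event) (Machine, []Action) {
	switch {
	case ev == EvStart && (m.State == Stopped || m.State == Exited || m.State == Fatal):
		m.State, m.Retries = Starting, 0
		return m, []Action{ActSpawn, ActArmStartTimer}
	case ev == EvStartsecsElapsed && m.State == Starting:
		m.State = Running
	case ev == EvExited && m.State == Starting:
		m.Retries++
		if m.Retries > m.MaxRetries {
			m.State = Fatal
			return m, nil
		}
		m.State = Backoff
		return m, []Action{ActScheduleRetry}
	case ev == EvRetry && m.State == Backoff:
		m.State = Starting
		return m, []Action{ActSpawn, ActArmStartTimer}
	case ev == EvExited && m.State == Running:
		m.State = Exited
	}
	return m, nil // no actions; events that do not apply are ignored
}

func main() {
	m := Machine{State: Stopped, MaxRetries: 1}
	for _, ev := range []Event{EvStart, EvExited, EvRetry, EvExited} {
		var acts []Action
		m, acts = m.Handle(ev)
		fmt.Println(ev, "->", m.State, acts)
	}
}
```

A test can feed any event sequence and assert on the returned states and actions without a child process ever being spawned.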
One transport: JSON-RPC 2.0 over the unix socket (/tmp/supervisor.sock,
POST /weka/jrpc). The dialect matches the wekanode D client
(weka/cluster/supervisor_client.d): {jsonrpc,method,params,id} →
{jsonrpc,result|error,id}, positional params, application/json, HTML-escaping
off, no trailing newline. XML-RPC is gone entirely — every consumer
(supervisorctl, the bridge, the test-request-restart listener) is our own code
and speaks JSON-RPC. The bridge subcommand still emits the QA-facing
{"faultCode":N,"faultString":"…"} JSON shape so existing QA tooling is unchanged.
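The wire details listed above (field order, positional params, HTML-escaping off, no trailing newline) map onto encoding/json roughly as follows. This is a sketch of the request side only; the request struct and encodeRequest helper are hypothetical, and the parameter list passed to startProcess in main is an assumption made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// request mirrors the {jsonrpc,method,params,id} shape with positional params.
type request struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  []any  `json:"params"`
	ID      int    `json:"id"`
}

// encodeRequest renders one JSON-RPC 2.0 request body.
func encodeRequest(id int, method string, params ...any) ([]byte, error) {
	if params == nil {
		params = []any{} // encode as [] rather than null
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep <, >, & literal instead of \u003c etc.
	req := request{JSONRPC: "2.0", Method: method, Params: params, ID: id}
	if err := enc.Encode(req); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a newline; strip it to match the dialect.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	body, err := encodeRequest(1, "startProcess", "group:name<0>", true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", body)
}
```

json.Marshal would escape the angle brackets, which is why the sketch goes through an Encoder with SetEscapeHTML(false).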
As with the ochinchina port, this port does not support the interactive REPL mode of supervisorctl; there appears to be very little use for it.
The single binary dispatches on argv[0] / first subcommand:
- supervisord -n -c <conf>: run the daemon.
- supervisorctl … (or supervisord ctl …): JSON-RPC client with status, start, stop, restart, tail [-f] <proc> [stdout|stderr] [bytes], maintail [-f] [bytes] (daemon log), and help.
- wekastop (or supervisord wekastop): disable autorecovery, stop the named processes, re-enable, so an operator stop isn't fought by the restart policy.
- supervisord bridge <method> [json-args]: one-shot JSON-RPC client for QA tooling; prints the raw result JSON, or {"faultCode":N,"faultString":"…"} on error (the shape QA's wepy/explore/supervisor.py parses).
- supervisord eventlistener --logger: childutils event listener that writes captured PROCESS_LOG_* output to its logfile.
- supervisord eventlistener --test-req-restart: the restart-on-test-request listener (disables AR at startup, restarts exit-64 nodes, re-enables on SIGTERM).
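Dispatching on argv[0] and then on the first subcommand can be sketched as below. The route function and the mode strings it returns are hypothetical; the sketch only mirrors the names listed above and does not parse any flags.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// route decides which mode to run and returns the remaining arguments.
// The compat symlink names are checked first, then the first subcommand.
func route(argv []string) (mode string, rest []string) {
	if len(argv) == 0 {
		return "daemon", nil
	}
	switch filepath.Base(argv[0]) {
	case "supervisorctl":
		return "ctl", argv[1:]
	case "wekastop":
		return "wekastop", argv[1:]
	}
	if len(argv) > 1 {
		switch argv[1] {
		case "ctl", "wekastop", "bridge", "eventlistener":
			return argv[1], argv[2:]
		}
	}
	return "daemon", argv[1:] // e.g. supervisord -n -c <conf>
}

func main() {
	mode, rest := route(os.Args)
	fmt.Println("mode:", mode, "args:", rest)
}
```

Using filepath.Base means the same binary behaves correctly whether it is invoked by bare name or through an absolute symlink path.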
    CGO_ENABLED=0 go build .   # → ./supervisord (one static binary)
Requires Go 1.26+. The module has zero external dependencies (stdlib only), so the build is fully offline/hermetic.
Lint/test gate (run at every step):
    gofmt -l . && go vet ./... && golangci-lint run ./... && go test ./...
golangci-lint runs the modernize suite (see .golangci.yml) so new code stays
on current Go idioms.
    main.go                argv[0] + subcommand routing; daemon wiring (log, listeners, signals)
    ctl.go                 supervisorctl JSON-RPC client (status/start/stop/restart/tail/maintail/help)
    bridge.go              one-shot bridge subcommand (raw result / faultCode-faultString JSON)
    eventlistener.go       --logger and --test-req-restart event-listener subcommands + wekastop
    internal/proc          process State enum (UPPERCASE statenames) + ProcessInfo wire struct
    internal/config        ini parse + interpolation + numprocs expansion + NONE→/dev/null + defaults
    internal/procsup       per-process state machine (functional core) + exec runner + supervisor shell + output capture
    internal/orchestrator  THE actor: select loop, process table (keyed group:name), RPC ops, event fan-out, waiter-continuation, autorecovery
    internal/event         supervisord 3.0 event wire encoding (headers/bodies) + subscription hierarchy
    internal/listener      per-listener childutils pump (bounded, drop-on-full) + manager (spawn/respawn, SUPERVISOR_* env)
    internal/logio         size-rotating log writer (NONE→/dev/null), backs daemon + per-process logs
    internal/jrpc          lightweight JSON-RPC 2.0 (request/response/error, positional params, no HTML-escape)
    internal/rpc           unix-socket server, /weka/jrpc handler, fault→faultCode/faultString mapping
All phases are complete: the rewrite is the binary shipped in the Weka containers
(pinned via SUPERVISORD_COMMIT in build/packages{,2}/supervisord/pack.py) and
validated end-to-end on a live cluster.
| Phase | Scope | State |
|---|---|---|
| 1 | Config + state machine + JSON-RPC core + supervisorctl | done: daemon boots, autostarts by priority, status/start/stop/restart work end-to-end |
| 2 | D-client path (weka.*, idempotent start, stopAllIoNodes) | done: wekanode D client drives the daemon |
| 3 | Events + --logger listener + crash-collector wiring | done: output capture → output.log; PROCESS_STATE_* to listeners; slow listener never stalls process management |
| 4 | Weka autorecovery extensions + --test-req-restart + wekastop | done: AR ref-count/boundary gate, exit-64 restart, the three original-fork bugs gone by construction |
| 5 | bridge / QA parity + compat symlinks | done: bridge faultCode/faultString shape; supervisorctl/wekastop symlinks |
| 6 | Protocol-container parity + NDU soak | done: clusterized + validated across container types on a live cluster |
UPPERCASE statenames · startsecs <= 0 ⇒ immediate RUNNING (no hang) ·
event dispatch fully decoupled from process management · SUPERVISOR_* env
injection for listeners · bridge faultCode/faultString JSON shape ·
ALREADY_STARTED=60 / BAD_NAME=10 preserved through to the bridge ·
autorecovery ref-count seeded to 1 at boot · exit-64 restarts even when
autorecovery is off · NONE logfile → /dev/null · JSON-RPC start is idempotent.