weka-supervisor

supervisord — the per-container process supervisor for Weka. It loads a supervisord-style config, brings the configured processes up by priority, keeps them up (restart / backoff / fatal policy), and answers JSON-RPC over a unix socket. The wekanode D client, supervisorctl, and QA tooling all drive it through that socket.

The binary is named supervisord for drop-in compatibility with the container bootstrap (weka_init runs /usr/local/bin/supervisord -n -c <conf>) and with the supervisorctl / wekastop / bridge / eventlistener argv routing.

Why this exists

Weka previously ran a fork of the Python supervisord, but its dependency on Python and its slow runtime were deemed a problem.

We explored a fork of ochinchina/supervisord, which was the Python supervisord ported to Go with mutexes bolted on. That fork produced a string of concurrency bugs — a concurrent-map-writes crash in the event registry, a re-entrancy deadlock in the restart-on-test-request listener, and a boot-time hang where the daemon went silent after dispatching the first PROCESS_STATE_EXITED and never started weka-management. All three share one root cause: mutable state shared across goroutines behind locks that were too coarse, taken in the wrong order, or held across blocking I/O.

This is a clean-room rewrite that makes that bug class structurally hard to reintroduce: the orchestrator owns all mutable state in one goroutine, so there are no shared maps or lock-ordering hazards. (Inter-goroutine sends use bounded channels, so a cycle is not literally impossible under an extreme event storm mid-handler — but the waiter-continuation below removes the specific boot-hang, and the buffers are sized for real loads.)

This is also much more idiomatic for Go, and follows the paradigm of sharing state by communicating, rather than communicating by sharing state.

(As an aside, running "supervisorctl status" on the Python code could take ~600ms. The ochinchina Go port dramatically reduced that to about 10ms. This port reduces that substantially further to approximately 2ms. These improvements should yield quicker service start times, and be noticeably more responsive when querying service state.)

This port is also more faithful to the Python supervisor in its output format and its group syntax, which should substantially reduce friction for anyone used to the original.

Architecture — single-owner orchestrator, channels everywhere

One orchestrator goroutine owns all mutable state (the process table, autorecovery ref-count, config snapshot, listener registry, event serials). Everything else is a goroutine that talks to it only over channels. There are no sync.Mutex/RWMutex, no shared maps, no atomics on shared state.

| Goroutine | Count | Owns |
| --- | --- | --- |
| Orchestrator | 1 | all mutable state; policy / state-machine decisions |
| Process supervisor | 1 per program instance | one OS process: spawn, startsecs timer, Wait(), stop sequence, retry/backoff |
| Output-capture pump | 2 per running program | child stdout/stderr → log writer (+ PROCESS_LOG_* events) |
| Event-listener pump | 1 per eventlistener | childutils stdin/stdout protocol, bounded queue, drop-on-full |
| RPC server | 1 accept + 1 per in-flight request | decode JSON-RPC → message → encode reply |
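
The "bounded queue, drop-on-full" behaviour of the event-listener pump can be illustrated with a non-blocking channel send. This is a minimal sketch, not the code in internal/listener: the Event and ListenerQueue types, the Offer method, and the drop counter are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for an encoded supervisor event.
type Event struct {
	Serial int
	Name   string
}

// ListenerQueue is a bounded mailbox feeding one event-listener pump.
type ListenerQueue struct {
	ch      chan Event
	dropped int // touched only by the single sending goroutine
}

// NewListenerQueue creates a queue that holds at most capacity events.
func NewListenerQueue(capacity int) *ListenerQueue {
	return &ListenerQueue{ch: make(chan Event, capacity)}
}

// Offer enqueues without ever blocking the caller. When the buffer is
// full, the event is dropped and counted instead of stalling the sender.
func (q *ListenerQueue) Offer(ev Event) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := NewListenerQueue(2)
	// Nothing drains the queue here, which models a stuck listener.
	for i := 1; i <= 5; i++ {
		ok := q.Offer(Event{Serial: i, Name: "PROCESS_STATE_EXITED"})
		fmt.Printf("serial=%d accepted=%v\n", i, ok)
	}
	fmt.Println("dropped:", q.dropped)
}
```

The point of the select with a default branch is that a slow or wedged listener costs at most a dropped event, never a blocked sender.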

The orchestrator never does blocking I/O, exec, or cmd.Wait() — it selects and delegates blocking work to subordinate goroutines. Process supervisors report facts (Spawned / Exited / StopComplete / SpawnError); the orchestrator decides policy.
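
The single-owner pattern described above can be sketched as follows. This is a simplified illustration under assumed names (Fact, StatusQuery, runOrchestrator), not the code in internal/orchestrator; it shows only that one goroutine owns the table and that everything else reaches it through channels.

```go
package main

import "fmt"

// Fact is what a process supervisor reports; it carries no policy.
type Fact struct {
	Name string
	Kind string // "Spawned" or "Exited" (a subset of the fact names above)
}

// StatusQuery asks the owner for a process state.
// Reply must be buffered by the caller so the owner never blocks on it.
type StatusQuery struct {
	Name  string
	Reply chan string
}

// runOrchestrator is the only goroutine that reads or writes table,
// so no mutex is needed.
func runOrchestrator(facts <-chan Fact, queries <-chan StatusQuery, done <-chan struct{}) {
	table := map[string]string{} // process name -> state
	for {
		select {
		case f := <-facts:
			switch f.Kind {
			case "Spawned":
				table[f.Name] = "RUNNING"
			case "Exited":
				table[f.Name] = "EXITED"
			}
		case q := <-queries:
			state, ok := table[q.Name]
			if !ok {
				state = "UNKNOWN"
			}
			q.Reply <- state // buffered reply channel: does not block the loop
		case <-done:
			return
		}
	}
}

func main() {
	facts := make(chan Fact)
	queries := make(chan StatusQuery)
	done := make(chan struct{})
	go runOrchestrator(facts, queries, done)

	facts <- Fact{Name: "weka-management", Kind: "Spawned"}
	reply := make(chan string, 1)
	queries <- StatusQuery{Name: "weka-management", Reply: reply}
	fmt.Println(<-reply)
	close(done)
}
```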

Waiter-continuation. A wait=true RPC (startProcess, stopProcessGroup, …) does not block the orchestrator: it records the request's reply channel as a pending waiter against the process, fires the command, and returns to its select loop. When the matching state event arrives, the waiter is resolved and the reply sent. The only goroutine that blocks is the per-request server goroutine. This is the structural cure for the boot-hang.
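
A rough sketch of the waiter-continuation idea, assuming hypothetical StartRequest and StateEvent types and a spawn callback that stands in for messaging a process supervisor. The real orchestrator handles more RPCs and states; this only shows the request being parked and resolved later.

```go
package main

import "fmt"

// StartRequest models a wait=true start RPC.
// Reply must have capacity 1 so resolving it never blocks the orchestrator.
type StartRequest struct {
	Name  string
	Reply chan string
}

// StateEvent is a state change reported for one process.
type StateEvent struct {
	Name  string
	State string
}

// orchestrate parks each request as a pending waiter, fires the command,
// and goes straight back to its select loop. Waiters are resolved only
// when a terminal state for that process arrives.
func orchestrate(starts <-chan StartRequest, events <-chan StateEvent,
	spawn func(name string), done <-chan struct{}) {
	waiters := map[string][]chan string{}
	for {
		select {
		case req := <-starts:
			waiters[req.Name] = append(waiters[req.Name], req.Reply)
			spawn(req.Name) // must return immediately
		case ev := <-events:
			if ev.State != "RUNNING" && ev.State != "FATAL" {
				continue // not a state that resolves a start waiter
			}
			for _, w := range waiters[ev.Name] {
				w <- ev.State
			}
			delete(waiters, ev.Name)
		case <-done:
			return
		}
	}
}

func main() {
	starts := make(chan StartRequest)
	events := make(chan StateEvent, 8)
	done := make(chan struct{})
	spawn := func(name string) {
		// Stand-in for a process supervisor reporting asynchronously.
		go func() { events <- StateEvent{Name: name, State: "RUNNING"} }()
	}
	go orchestrate(starts, events, spawn, done)

	// This goroutine plays the per-request RPC server: it alone blocks.
	reply := make(chan string, 1)
	starts <- StartRequest{Name: "weka-management", Reply: reply}
	fmt.Println("start result:", <-reply)
	close(done)
}
```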

The state machine itself is a functional core (internal/procsup): pure methods that take an event and return a list of Actions, with the imperative shell (timers, exec, signals) kept separate — so the full restart/backoff/stop policy is unit-testable without spawning a single process.
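
To show why this split makes the policy testable, here is a small sketch of a state machine whose handler performs no I/O and returns actions for a shell to execute. The type names, the action names, and the simplified retry policy are assumptions for illustration, not the contents of internal/procsup; only the start path is modelled.

```go
package main

import "fmt"

type State string

const (
	Stopped  State = "STOPPED"
	Starting State = "STARTING"
	Running  State = "RUNNING"
	Backoff  State = "BACKOFF"
	Fatal    State = "FATAL"
)

type Event int

const (
	Spawned Event = iota
	StartTimerFired
	Exited
)

// Action is an instruction for the imperative shell; the core never runs it.
type Action string

const (
	ActStartTimer Action = "start-startsecs-timer"
	ActRespawn    Action = "respawn-after-backoff"
	ActEmitState  Action = "emit-state-event"
)

// Machine holds per-process policy inputs and the current state.
type Machine struct {
	State        State
	StartSecs    int
	StartRetries int
	retries      int
}

// Handle updates the machine and returns the actions to perform.
// It touches no timers, processes, or signals. Transitions other than
// the start path are omitted in this sketch.
func (m *Machine) Handle(ev Event) []Action {
	switch {
	case ev == Spawned && m.StartSecs <= 0:
		m.State = Running // startsecs <= 0: RUNNING immediately
		return []Action{ActEmitState}
	case ev == Spawned:
		m.State = Starting
		return []Action{ActStartTimer, ActEmitState}
	case ev == StartTimerFired && m.State == Starting:
		m.State, m.retries = Running, 0
		return []Action{ActEmitState}
	case ev == Exited && m.State == Starting:
		m.retries++
		if m.retries > m.StartRetries {
			m.State = Fatal
			return []Action{ActEmitState}
		}
		m.State = Backoff
		return []Action{ActEmitState, ActRespawn}
	}
	return nil
}

func main() {
	m := &Machine{State: Stopped, StartSecs: 1, StartRetries: 1}
	for _, ev := range []Event{Spawned, Exited, Spawned, Exited} {
		actions := m.Handle(ev)
		fmt.Println(m.State, actions)
	}
}
```

Because Handle only returns data, a unit test can feed it a sequence of events and assert on states and actions without starting a process or waiting on a timer.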

RPC: JSON-RPC 2.0 only

One transport: JSON-RPC 2.0 over the unix socket (/tmp/supervisor.sock, POST /weka/jrpc). The dialect matches the wekanode D client (weka/cluster/supervisor_client.d): requests are {jsonrpc,method,params,id} and responses are {jsonrpc,result|error,id}, with positional params, application/json, HTML-escaping off, and no trailing newline. XML-RPC is gone entirely: every consumer (supervisorctl, the bridge, the test-request-restart listener) is our own code and speaks JSON-RPC. The bridge subcommand still emits the QA-facing {"faultCode":N,"faultString":"…"} JSON shape so existing QA tooling is unchanged.
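
The wire details (positional params, HTML-escaping off, no trailing newline) map onto Go's encoding/json roughly as sketched below. The Request type and encode helper are illustrative names, and the parameter values are made up; this is not the code in internal/jrpc.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Request is a JSON-RPC 2.0 request with positional (array) params.
type Request struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  []any  `json:"params"`
	ID      int    `json:"id"`
}

// encode marshals v with HTML escaping disabled and strips the newline
// that json.Encoder always appends.
func encode(v any) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep <, >, & literal instead of \u003c etc.
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// The params here are hypothetical: a process name and wait=true.
	req := Request{JSONRPC: "2.0", Method: "startProcess", Params: []any{"group:<name>", true}, ID: 1}
	b, err := encode(req)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Printf("%s\n", b)
}
```

json.Marshal would escape the angle brackets in the example name, which is why the sketch goes through json.Encoder with SetEscapeHTML(false) and then trims the trailing newline.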

No "REPL" mode

As with the ochinchina port, this port does not support the interactive REPL mode of supervisorctl. There appears to be very little demand for it.

Binary modes

The single binary dispatches on argv[0] / first subcommand:

  • supervisord -n -c <conf> — run the daemon.
  • supervisorctl … (or supervisord ctl …) — JSON-RPC client: status, start, stop, restart, tail [-f] <proc> [stdout|stderr] [bytes], maintail [-f] [bytes] (daemon log), help.
  • wekastop (or supervisord wekastop) — disable autorecovery, stop the named processes, re-enable — so an operator stop isn't fought by the restart policy.
  • supervisord bridge <method> [json-args] — one-shot JSON-RPC client for QA tooling; prints the raw result JSON, or {"faultCode":N,"faultString":"…"} on error (the shape QA's wepy/explore/supervisor.py parses).
  • supervisord eventlistener --logger — childutils event listener that writes captured PROCESS_LOG_* output to its logfile.
  • supervisord eventlistener --test-req-restart — the restart-on-test-request listener (disables AR at startup, restarts exit-64 nodes, re-enables on SIGTERM).
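
The argv[0] / first-subcommand routing listed above could look something like the following. The route function and its returned mode strings are hypothetical; the actual routing lives in main.go and may differ in detail.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// route decides which mode to run and returns the remaining arguments.
// The name the binary was invoked as (for example via a symlink) wins;
// otherwise the first argument is checked for a known subcommand.
func route(argv []string) (mode string, rest []string) {
	if len(argv) == 0 {
		return "daemon", nil
	}
	switch filepath.Base(argv[0]) {
	case "supervisorctl":
		return "ctl", argv[1:]
	case "wekastop":
		return "wekastop", argv[1:]
	}
	if len(argv) > 1 {
		switch argv[1] {
		case "ctl", "wekastop", "bridge", "eventlistener":
			return argv[1], argv[2:]
		}
	}
	// Anything else (for example -n -c <conf>) runs the daemon.
	return "daemon", argv[1:]
}

func main() {
	mode, rest := route(os.Args)
	fmt.Println("mode:", mode, "args:", rest)
}
```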

Build

CGO_ENABLED=0 go build .      # → ./supervisord (one static binary)

Requires Go 1.26+. The module has zero external dependencies (stdlib only), so the build is fully offline/hermetic.

Lint/test gate (run at every step):

gofmt -l . && go vet ./... && golangci-lint run ./... && go test ./...

golangci-lint runs the modernize suite (see .golangci.yml) so new code stays on current Go idioms.

Package layout

main.go                 argv[0] + subcommand routing; daemon wiring (log, listeners, signals)
ctl.go                  supervisorctl JSON-RPC client (status/start/stop/restart/tail/maintail/help)
bridge.go               one-shot bridge subcommand (raw result / faultCode-faultString JSON)
eventlistener.go        --logger and --test-req-restart event-listener subcommands + wekastop
internal/proc           process State enum (UPPERCASE statenames) + ProcessInfo wire struct
internal/config         ini parse + interpolation + numprocs expansion + NONE→/dev/null + defaults
internal/procsup        per-process state machine (functional core) + exec runner + supervisor shell + output capture
internal/orchestrator   THE actor: select loop, process table (keyed group:name), RPC ops, event fan-out, waiter-continuation, autorecovery
internal/event          supervisord 3.0 event wire encoding (headers/bodies) + subscription hierarchy
internal/listener       per-listener childutils pump (bounded, drop-on-full) + manager (spawn/respawn, SUPERVISOR_* env)
internal/logio          size-rotating log writer (NONE→/dev/null), backs daemon + per-process logs
internal/jrpc           lightweight JSON-RPC 2.0 (request/response/error, positional params, no HTML-escape)
internal/rpc            unix-socket server, /weka/jrpc handler, fault→faultCode/faultString mapping

Status

All phases are complete: the rewrite is the binary shipped in the Weka containers (pinned via SUPERVISORD_COMMIT in build/packages{,2}/supervisord/pack.py) and validated end-to-end on a live cluster.

| Phase | Scope | State |
| --- | --- | --- |
| 1 | Config + state machine + JSON-RPC core + supervisorctl | done: daemon boots, autostarts by priority, status/start/stop/restart work end-to-end |
| 2 | D-client path (weka.*, idempotent start, stopAllIoNodes) | done: wekanode D client drives the daemon |
| 3 | Events + --logger listener + crash-collector wiring | done: output capture → output.log; PROCESS_STATE_* to listeners; slow listener never stalls process management |
| 4 | Weka autorecovery extensions + --test-req-restart + wekastop | done: AR ref-count/boundary gate, exit-64 restart, the three original-fork bugs gone by construction |
| 5 | bridge / QA parity + compat symlinks | done: bridge faultCode/faultString shape; supervisorctl/wekastop symlinks |
| 6 | Protocol-container parity + NDU soak | done: clusterized + validated across container types on a live cluster |

Compatibility invariants (baked in, regression-guarded)

  • UPPERCASE statenames
  • startsecs <= 0 ⇒ immediate RUNNING (no hang)
  • event dispatch fully decoupled from process management
  • SUPERVISOR_* env injection for listeners
  • bridge faultCode/faultString JSON shape
  • ALREADY_STARTED=60 / BAD_NAME=10 preserved through to the bridge
  • autorecovery ref-count seeded to 1 at boot
  • exit-64 restarts even when autorecovery is off
  • NONE logfile → /dev/null
  • JSON-RPC start is idempotent
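
As an illustration of the bridge fault shape and the two preserved codes, the following sketch marshals a fault into the {"faultCode":N,"faultString":"…"} form. The Fault type, constant names, renderFault helper, and the example message text are assumptions; only the JSON field names and the numeric codes come from this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Fault codes named in the compatibility invariants.
const (
	FaultBadName        = 10
	FaultAlreadyStarted = 60
)

// Fault mirrors the JSON shape the bridge prints on error.
type Fault struct {
	FaultCode   int    `json:"faultCode"`
	FaultString string `json:"faultString"`
}

// renderFault returns the fault as a single JSON object.
func renderFault(code int, msg string) (string, error) {
	b, err := json.Marshal(Fault{FaultCode: code, FaultString: msg})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// The message text is a made-up example.
	out, err := renderFault(FaultAlreadyStarted, "ALREADY_STARTED: example")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(out)
}
```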

About

Pure, idiomatic, Go supervisor workalike, tuned for WEKA's use case
