Skip to content

runtime: make Slack bridge restart policy adaptive (backoff/jitter/failure threshold) #121

@benvinegar

Description

@benvinegar

Problem

start.sh runs the Slack bridge in a while true loop with a fixed 5s restart delay. Persistent faults can produce noisy restart storms and ambiguous operational state.

Proposed solution

  • Replace fixed-delay infinite restart with adaptive policy:
    • exponential backoff (+ jitter)
    • reset backoff after stable run window
    • max-consecutive-failure threshold with clear alert/status signal
  • Emit structured restart reason/attempt logs for easier triage.
  • Keep current behavior as fallback if policy env vars are unset.

Helpful context

  • start.sh launches tmux session slack-bridge and restarts bridge with hardcoded 5s sleep.
  • Broker health file exists (broker-health.json) but restart policy is still simplistic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions