Feat/abstract radio connection #246
Open
zindello wants to merge 8 commits into
HTTP server and web UI now always start regardless of radio hardware availability. RadioManager owns the full radio lifecycle (connection, exponential backoff retry, mid-run reconnect, and hardware cleanup), so the daemon is no longer blocked on hardware initialisation.
- Add RadioManager: asyncio task with retry loop, status tracking, on_connected/on_disconnected callbacks, and /api/stats reporting
- Refactor main.py: initialize() handles identity + sensors + GPS; _on_radio_connected() sets up dispatcher and all helpers; HTTP server starts before RadioManager so it is always reachable
- Move LocalIdentity setup to initialize() so public key is available from boot, not only after first radio connection
- Add engine.py async stop() for awaitable mid-run teardown
- Remove _kiss_transport_restart_required() from config_manager; radio_type and KISS config changes now trigger automatic reconnect via notify_config_changed() instead of requiring a service restart
- Fix SPI re-init bug: explicitly reset _initialized on SX1262Radio singleton after cleanup() so begin() is called on reconnect
- Move CH341 and radio.cleanup() out of _shutdown() into RadioManager
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

"Repeater handler not initialized" was a code-level description that appeared in the UI and logs whenever storage endpoints were called before the radio connected. Replaced with "Radio not available — connecting or hardware unavailable", which reflects the actual state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pymc_core calls sys.exit() when GPIO hardware is unavailable, raising SystemExit (a BaseException), which bypassed the except Exception handler in _connect_loop() and killed the process. It is now caught alongside Exception, so GPIO failure is treated as a normal connection error: RadioManager logs it, sets status to error, and retries with backoff. Required for KISS radio use on hardware without GPIO (e.g. an x86 server).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…stemExit workaround
KissModemWrapper now sets is_connected=False when worker threads die on I/O error (fixed in core), so _wait_for_disconnect() polls that attribute for radios that have it. Radios without is_connected (SX1262) fall back to the original asyncio event-wait; no polling is needed, as disconnect is signalled via the dispatcher exiting. Removes the except (Exception, SystemExit) workaround now that core raises a catchable exception instead of calling sys.exit() on GPIO hardware failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…init CancelledError
local_hash_bytes was not initialised to None in __init__ alongside local_hash and local_identity, risking AttributeError before initialize() completes. Moves the self._current_radio = radio assignment to before the post-init config block so that _cleanup_radio() correctly finds and cleans up the hardware handle if CancelledError fires during post-init. Previously _current_radio was still None at that point, leaking the radio object.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nt cleanup() and configure_radio()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nnect_loop
Consolidates three duplicated backoff state blocks into a single _enter_backoff() helper, and moves the post-init hasattr block into _apply_post_init_config(). _connect_loop shrinks from ~90 lines to ~45. All 20 RadioManager tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- _enter_backoff on disconnect now passes "Radio disconnected" so get_status().error is never null/stale after a mid-run radio loss
- notify_config_changed resets _retry_delay alongside _retry_count so get_status() is immediately consistent after a config save
- Move get_radio_for_board import out of the while loop (cached by Python, but importing inside a loop is unusual)
- ensure_future → create_task in _wait_for_disconnect (modern API)
- _apply_post_init_config docstring now mentions the RF parameter logging it performs alongside event-loop and CAD threshold setup
- Add retry_delay_seconds assertion to notify_config_changed test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
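The CancelledError fix in the commits above comes down to ordering: record the hardware handle before any awaitable post-init step, so a cancellation there can still be cleaned up. The sketch below is a minimal illustration under stated assumptions, not the branch's code. `_current_radio`, `_cleanup_radio()` and `_apply_post_init_config()` are names from the commit messages, while `ManagerSketch`, `FakeRadio`, `connect_once()` and the sleep standing in for post-init work are hypothetical.

```python
import asyncio
from typing import Optional


class FakeRadio:
    """Hypothetical radio handle that records whether cleanup() ran."""

    def __init__(self) -> None:
        self.cleaned_up = False

    def cleanup(self) -> None:
        self.cleaned_up = True


class ManagerSketch:
    """Hypothetical stand-in for the relevant part of RadioManager."""

    def __init__(self, post_init_delay: float) -> None:
        self._current_radio: Optional[FakeRadio] = None
        self._post_init_delay = post_init_delay

    def _cleanup_radio(self) -> None:
        # Can only release hardware it knows about: if _current_radio were
        # still None here, the handle would leak.
        if self._current_radio is not None:
            self._current_radio.cleanup()
            self._current_radio = None

    async def _apply_post_init_config(self, radio: FakeRadio) -> None:
        # Stand-in for event-loop / CAD threshold setup; cancellation can land here.
        await asyncio.sleep(self._post_init_delay)

    async def connect_once(self) -> FakeRadio:
        radio = FakeRadio()
        # Assign BEFORE post-init so a CancelledError during post-init is recoverable.
        self._current_radio = radio
        try:
            await self._apply_post_init_config(radio)
        except asyncio.CancelledError:
            self._cleanup_radio()
            raise  # keep cancellation semantics for the caller
        return radio
```

Swapping the assignment to after the `try` block makes the cancellation path skip `cleanup()`, which is the leak the commit describes.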
feat: Decouple radio hardware from service startup; add RadioManager
Requires pyMC-dev/pyMC_core#70 to be merged and released before this can be merged. That PR fixes silent thread death in KissModemWrapper and the sys.exit() call on GPIO failure; this branch depends on both fixes for correct KISS disconnect detection and GPIO error recovery. It also adds a missing endpoint to the sx1262 radio type for updating hardware config on the fly.
The Problem
Before this change, the pyMC Repeater had a fundamental usability issue: if radio hardware
was unavailable, misconfigured, or failed, the entire service failed to start. The HTTP
server and web UI came up only if the radio initialised successfully. This meant:
service — no web UI, no status, no way to diagnose the problem without SSH access.
including the web UI. The user had no visibility and no way to recover without a restart.
The net result: users who hit any hardware problem were forced to SSH into the device and
configure via the CLI. For a device intended to be deployed and managed through its web
interface, this was less than ideal.
What Changed
RadioManager — radio hardware now has its own lifecycle
A new RadioManager class owns the entire radio hardware lifecycle: connect, retry on
failure with increasing backoff (5s → 10s → 30s → 60s), report status, and reconnect
if the radio is lost mid-run. It runs as an asyncio background task and calls back into
the daemon when hardware comes up or goes away.
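A minimal sketch of that lifecycle as a single asyncio loop follows. It assumes the radio factory, the disconnect waiter and the callbacks are plain async callables injected by the daemon; `RadioManagerSketch` and its constructor are hypothetical, and only the on_connected/on_disconnected callback idea comes from this PR.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


class RadioManagerSketch:
    """Hypothetical outline of a connect / retry / reconnect background task.

    The injected callables and this constructor signature are assumptions;
    on_connected / on_disconnected mirror the callback names in the commits.
    """

    def __init__(
        self,
        factory: Callable[[], Awaitable[object]],
        wait_for_disconnect: Callable[[object], Awaitable[None]],
        on_connected: Callable[[object], Awaitable[None]],
        on_disconnected: Callable[[], Awaitable[None]],
        delays: Sequence[float] = (5.0, 10.0, 30.0, 60.0),
    ) -> None:
        self._factory = factory
        self._wait_for_disconnect = wait_for_disconnect
        self._on_connected = on_connected
        self._on_disconnected = on_disconnected
        self._delays = tuple(delays)
        self.status = "connecting"

    async def run(self) -> None:
        attempt = 0
        while True:  # runs until the task is cancelled at shutdown
            self.status = "connecting"
            try:
                radio = await self._factory()
            except Exception as exc:  # missing hardware, bad config, I/O error...
                self.status = f"error: {exc}"
                delay = self._delays[min(attempt, len(self._delays) - 1)]
                attempt += 1
                await asyncio.sleep(delay)
                continue

            attempt = 0
            self.status = "connected"
            await self._on_connected(radio)  # daemon builds dispatcher and helpers
            await self._wait_for_disconnect(radio)  # returns when the radio is lost
            self.status = "disconnected"
            await self._on_disconnected()  # daemon tears down radio users
            cleanup = getattr(radio, "cleanup", None)
            if callable(cleanup):
                cleanup()  # release the hardware handle before retrying
            # Fall through: loop back and try to reconnect.
```

Because `asyncio.CancelledError` derives from `BaseException` (Python 3.8+), the `except Exception` branch does not swallow shutdown cancellation.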
The HTTP server now starts unconditionally before any radio connection is attempted.
The web UI is always reachable. Radio hardware connects asynchronously in the background.
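The startup ordering can be sketched with the standard library alone. This is an illustration under assumptions: the PR does not say which HTTP framework the daemon uses, so `asyncio.start_server` with a toy handler stands in for it, and `radio_never_connects()` is a hypothetical placeholder for a RadioManager stuck retrying on missing hardware.

```python
import asyncio


async def handle_http(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Toy HTTP handler: always answers, whatever the radio is doing."""
    await reader.readuntil(b"\r\n\r\n")  # consume the request head
    body = b"ok"
    writer.write(
        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\n" + body
    )
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def radio_never_connects() -> None:
    """Hypothetical stand-in for a radio task that keeps retrying forever."""
    while True:
        await asyncio.sleep(3600)


async def start_daemon():
    # 1. Bring the HTTP server up first so the UI is reachable immediately.
    server = await asyncio.start_server(handle_http, "127.0.0.1", 0)
    # 2. Only then start the radio lifecycle as a background task.
    radio_task = asyncio.create_task(radio_never_connects())
    return server, radio_task
```

Since the radio work runs in its own task, nothing on the HTTP path ever awaits hardware initialisation.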
Automatic retry with backoff
If radio hardware is unavailable or fails to initialise, RadioManager retries
automatically. The user sees the current status through the web UI and
/api/stats rather than a blank page or a timeout.
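The retry bookkeeping behind that status could look roughly like this. `_enter_backoff`, `notify_config_changed`, `get_status()` and `retry_delay_seconds` are names mentioned in the commit messages; `RetryState`, `RadioStatus`, `BACKOFF_SCHEDULE` and holding at 60s once the schedule is exhausted are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Delay schedule from the PR description; holding at the last value is assumed.
BACKOFF_SCHEDULE = (5.0, 10.0, 30.0, 60.0)


@dataclass
class RadioStatus:
    state: str
    error: Optional[str]
    retry_count: int
    retry_delay_seconds: float


class RetryState:
    """Hypothetical stand-in for RadioManager's retry bookkeeping."""

    def __init__(self) -> None:
        self._state = "connecting"
        self._error: Optional[str] = None
        self._retry_count = 0
        self._retry_delay = BACKOFF_SCHEDULE[0]

    def _enter_backoff(self, error: str) -> float:
        """Record a failure and return how long to sleep before the next attempt."""
        self._state = "error"
        self._error = error  # always set, so get_status().error is never stale
        index = min(self._retry_count, len(BACKOFF_SCHEDULE) - 1)
        self._retry_delay = BACKOFF_SCHEDULE[index]
        self._retry_count += 1
        return self._retry_delay

    def notify_config_changed(self) -> None:
        """Reset count and delay together so get_status() is consistent immediately."""
        self._retry_count = 0
        self._retry_delay = BACKOFF_SCHEDULE[0]

    def get_status(self) -> RadioStatus:
        return RadioStatus(
            state=self._state,
            error=self._error,
            retry_count=self._retry_count,
            retry_delay_seconds=self._retry_delay,
        )
```

A retry loop would then call something like `await asyncio.sleep(state._enter_backoff(str(exc)))`, keeping every failure path on the same code.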
Mid-run disconnect and reconnect
If the radio is lost while running (e.g. USB hardware failure, power blip on a KISS
modem), RadioManager detects the failure, tears down the dispatcher and all helpers
cleanly, and re-enters the retry loop. The HTTP server, GPS, and sensors keep running
throughout. When hardware comes back, everything reinitialises automatically.
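On the daemon side, the split between radio-dependent and radio-independent services can be sketched as a pair of callbacks. `_on_radio_connected` is named in the commit messages; `DaemonSketch`, `Helper`, `_on_radio_disconnected` and `live_update()` are hypothetical. The `None` guard in `live_update()` mirrors the `config_manager.live_update_daemon()` fix described under Bugs Fixed.

```python
import asyncio
from typing import Any, Dict, List, Optional


class Helper:
    """Hypothetical per-radio helper that depends on the dispatcher."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False


class DaemonSketch:
    def __init__(self) -> None:
        # Radio-independent services: started once, untouched by radio loss.
        self.sensors_running = True
        self.gps_running = True
        # Radio-dependent state: None whenever no radio is connected.
        self.dispatcher: Optional[Dict[str, Any]] = None
        self.helpers: List[Helper] = []

    async def _on_radio_connected(self, radio: object) -> None:
        self.dispatcher = {"radio": radio}  # stand-in for the real dispatcher
        self.helpers = [Helper("a"), Helper("b")]
        for helper in self.helpers:
            await helper.start()

    async def _on_radio_disconnected(self) -> None:
        for helper in self.helpers:
            await helper.stop()
        self.helpers = []
        self.dispatcher = None  # legitimately None until the next reconnect

    def live_update(self, setting: str, value: object) -> bool:
        # Guard: config saves can arrive between reconnects.
        if self.dispatcher is None:
            return False
        self.dispatcher[setting] = value
        return True
```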
Companion app connections also recover from a mid-run radio disconnect: the companion disconnects
when the radio goes away and can reconnect and resume once the radio comes back.
Clearer error messaging
The "Repeater handler not initialized" error that appeared in the packet log when the
radio wasn't connected has been replaced with "Radio not available — connecting or
hardware unavailable", which tells the user what is actually happening.
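The change amounts to a clearer message on an existing `None` check. A minimal sketch, assuming a storage endpoint that receives the repeater handler (None until the radio connects) and returns a dict; `get_packet_log`, the response keys and `recent_packets()` are hypothetical, and only the message text comes from this PR.

```python
from typing import Any, Dict, Optional

# Message text from this PR.
RADIO_UNAVAILABLE = "Radio not available — connecting or hardware unavailable"


def get_packet_log(repeater_handler: Optional[Any]) -> Dict[str, Any]:
    """Hypothetical storage endpoint body."""
    if repeater_handler is None:
        # Previously: "Repeater handler not initialized" (an internal detail).
        return {"success": False, "error": RADIO_UNAVAILABLE}
    return {"success": True, "data": repeater_handler.recent_packets()}
```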
Bugs Fixed
KISS radio disconnect not detected
When a KISS modem loses its serial connection,
KissModemWrapper._rx_worker and _tx_worker die silently without setting is_connected = False. A paired fix in pymc_core now sets is_connected = False on I/O failure. RadioManager._wait_for_disconnect() polls
radio.is_connected every second for radios that expose it, and falls back to an asyncio event-wait for those that don't (SX1262).
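The two detection strategies can be sketched as one coroutine. This is a standalone, hypothetical version of `RadioManager._wait_for_disconnect()`. The `dispatcher_exited` event and the `poll_interval` parameter are assumptions, and the one-second default matches the description above.

```python
import asyncio


async def wait_for_disconnect(
    radio: object,
    dispatcher_exited: asyncio.Event,
    poll_interval: float = 1.0,
) -> None:
    """Return once the radio should be treated as gone (hypothetical sketch)."""
    if hasattr(radio, "is_connected"):
        # KISS-style radios: the wrapper flips is_connected to False on I/O error.
        while getattr(radio, "is_connected"):
            await asyncio.sleep(poll_interval)
    else:
        # SX1262-style radios: no flag, so wait for the dispatcher-exit signal.
        await dispatcher_exited.wait()
```

Polling once a second keeps the check cheap, and radios without the attribute never poll at all.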
CH341 singleton not reset on reconnect
The CH341Async singleton was not being reset on radio cleanup, causing sx1262_ch341 radios to fail on reconnect. Fixed in RadioManager._cleanup_radio(). This has not been tested yet, as I do not have CH341 hardware to test with.
local_hash_bytes not initialised in __init__
local_hash and local_identity were initialised to None in __init__, but local_hash_bytes was not, risking AttributeError before initialize() completes.
Radio hardware leaked on post-init CancelledError
self._current_radio was assigned after the post-init config block. If CancelledError fired during post-init (e.g. shutdown during startup), _cleanup_radio() found _current_radio = None and skipped hardware teardown, leaking the radio handle. The assignment now happens before the post-init block.
Dispatcher None during reconnect cycles
config_manager.live_update_daemon() was accessing dispatcher without a None guard. After RadioManager was introduced, dispatcher is legitimately None during reconnect cycles. Added a guard to prevent AttributeError.
New API Surface
Adds a radio key to the /api/stats response. Additive only: no existing fields are changed.
This field is not yet consumed by the web UI but is available for future use — dashboard
status indicators, external monitoring, companion app integration.
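The "additive only" contract can be stated as a small function: copy the existing payload, add one key, touch nothing else. A sketch under assumptions: `build_stats` is hypothetical, and the fields inside the `radio` object shown in the usage note are illustrative (the commit messages mention `error` and `retry_delay_seconds`, but the full shape is not specified here).

```python
from typing import Any, Dict


def build_stats(existing: Dict[str, Any], radio_status: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the existing /api/stats payload with a new 'radio' key."""
    stats = dict(existing)  # existing fields are carried over unchanged
    stats["radio"] = radio_status
    return stats
```

For example, `build_stats({"uptime_seconds": 120}, {"state": "error", "error": "Radio disconnected", "retry_delay_seconds": 10.0})` keeps `uptime_seconds` as-is and nests the radio status under `radio`, so existing consumers that ignore unknown keys are unaffected.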