Skip to content

fix silent RX drops: enlarge ring to 4 KiB, add drop counters#5

Closed
cpunt wants to merge 1 commit into
mainfrom
fix/rx-ring-silent-drops
Closed

fix silent RX drops: enlarge ring to 4 KiB, add drop counters#5
cpunt wants to merge 1 commit into
mainfrom
fix/rx-ring-silent-drops

Conversation

@cpunt
Copy link
Copy Markdown

@cpunt cpunt commented Apr 13, 2026

The issue

The software RX ring between the PL011 ISR and the foreground reader is
hard-coded to 128 bytes in uartx/ringbuffer.go:

const bufferSize uint8 = 128

At 115200 baud a byte arrives every ~87 µs, so 128 bytes is only
~11 ms of headroom between the interrupt producer and the foreground
consumer. Any stall longer than that on the consumer side causes
Buffer.Put() inside handleInterrupt to start returning false, at
which point bytes are silently discarded. The existing ISR code even
acknowledges the drop path:

if uart.Buffer.Put(byte(r & 0xFF)) {
    enq++
} else {
    // optional rxDrops++
}

…but the counter was never implemented, so applications had no way to
detect the loss. The symptom downstream is mid-frame data corruption
with no error path back to the caller.

How we hit it

We use this driver for a full-duplex fabric link between a CM5 host
and an RP2350 MCU at 115200 8N1, running a stop-and-wait firmware
update protocol that ships ~800-byte JSON chunks, writes each chunk
to flash on the MCU, acks, and waits for the next one.

The MCU side does a flash page program (5–10 ms) per accepted chunk,
plus occasional TinyGo GC cycles and periodic logger bursts. Any one
of those events pushed total foreground latency above the 11 ms
headroom and caused bytes to vanish from the middle of the next
incoming frame. Two surface symptoms:

  1. Single truncated frames — a JSON object arrived with a short
    data field (typically 76–96 bytes missing); sender and receiver
    both claimed the full chunk length.
  2. Merged frames — when the lost bytes happened to include the
    \n terminator, our line assembler kept reading past the broken
    frame and glued the sender's retry of the same seq onto the broken
    copy. We would see types=xfer_chunk,xfer_chunk seqs=N,N on the
    receive side.

Both classes disappeared the moment the RX ring was resized.

The fix

1) uartx/ringbuffer.go — widen and enlarge

  • bufferSize promoted from uint8 = 128 to uint16 = 4096.
  • head / tail switched from volatile.Register8 to
    volatile.Register16 so sizes above 255 work without wraparound
    ambiguity.
  • Size() and Used() now return uint16.

4096 is still a power of two so (h+1) % bufferSize compiles to a
bitmask. It gives ~355 ms of headroom at 115200 (scales linearly with
baud), which is enough to absorb any realistic foreground stall on
the MCUs this package currently targets. The extra RAM cost is
2 × (4096 − 128) = 7.75 KiB per UART instance (RX + TX ring); on a
dual-UART RP2350 config that is ~16 KiB total, negligible against the
520 KiB SRAM budget.

2) uartx/rp2_uart.go — drop counters

  • Two new fields on UART: rxHwDrops and rxSwDrops, each a
    volatile.Register32.
  • handleInterrupt increments rxHwDrops in the existing branch
    that drops PL011 error bytes (the most informative of OE/BE/PE/FE
    is UARTDR_OE, which means the hardware FIFO overflowed before the
    ISR could service it), and increments rxSwDrops when
    Buffer.Put returns false.
  • The ISR is the only writer so the non-atomic read-add-write
    sequence is race-free on a single core. On a dual-core MCU the
    reader may see a slightly stale value but aligned 32-bit
    loads/stores remain atomic.

3) uartx/uartx.go — public accessor

New (*UART).RXDrops() (hw, sw uint32) so applications can surface
the counters in whatever telemetry / logging they already have. In a
healthy deployment at the supported bauds both values should stay at
zero; a nonzero hw means the ISR latency is exceeding one character
time, and a nonzero sw means the application consumer is not
draining the ring fast enough.

Compat notes

  • RingBuffer.Size() and RingBuffer.Used() return type changed
    from uint8 to uint16. Every caller in this repo — including
    the Buffered() and TxFree() wrappers — already goes through
    int(...), so there is no in-repo break. External callers that
    explicitly stored the result as uint8 will need to widen the
    type. The package has no tagged release yet, so no semver break.
  • Total static RAM increase on a typical RP2350 dual-UART config is
    ~16 KiB. Trivial against the chip's 520 KiB SRAM.

Verification

  • tinygo build -target=pico2 ./cmd/uart_diag — builds cleanly with
    the uint16 widening.
  • tinygo build -target=pico2 ./examples/simple/uartx-polling
    builds cleanly; the example's Buffered() call is unaffected.
  • End-to-end downstream: a 336 KiB firmware-update transfer over
    115200 that used to corrupt 15–20 frames per run and take 60–180 s
    with retries now runs clean in ~55 s (stop-and-wait protocol floor)
    with rxHwDrops == rxSwDrops == 0 across the entire transfer.

Follow-ups (not in this PR)

  • Consider making bufferSize configurable via UARTConfig for
    callers who care about RAM more than headroom. That would require
    moving the ring from a fixed-size array to a slice and is a
    larger change.
  • Consider normalising Size() / Used() / Buffered() to all
    return int so the width is never a compat concern again.

The RX software ring between the PL011 ISR and the foreground reader
was hard-coded to 128 bytes. At 115200 baud that is only ~11 ms of
headroom, so any foreground stall longer than a flash page program or
a GC pause caused bytes to silently disappear mid-frame. The ISR had
a `// optional rxDrops++` placeholder in the overflow branch but no
actual counter, so applications had no way to observe the loss.

This commit:

  - widens the ring head/tail to uint16 and bumps bufferSize to 4096,
    giving ~355 ms of headroom at 115200. Still a power of two so the
    modulo compiles to an AND. Costs ~16 KiB of static RAM on a dual-
    UART RP2350 config; fits easily in the 520 KiB SRAM budget.

  - adds rxHwDrops and rxSwDrops counters on the UART struct and
    increments them from handleInterrupt when UARTDR arrives with
    per-byte error bits set (the OE bit indicates the hardware FIFO
    overflowed before the ISR serviced it) and when Buffer.Put returns
    false respectively. Both are volatile.Register32; the ISR is the
    only writer so increments are race-free on a single-core MCU, and
    aligned 32-bit loads/stores keep reads safe on dual-core.

  - exposes the counters via (*UART).RXDrops() so applications can
    surface them in telemetry. Both values should stay at zero under
    normal operation; nonzero hw indicates ISR latency exceeded one
    character time, nonzero sw indicates the application consumer is
    not draining the ring fast enough.

RingBuffer.Size() and RingBuffer.Used() now return uint16 instead of
uint8. Every caller in this repo already converts through int(), so
this is a no-op internally. External callers that stored the return
as uint8 will need to widen the type; there is no tagged release so
no semver break.

Verified:

  - tinygo build -target=pico2 ./cmd/uart_diag
  - tinygo build -target=pico2 ./examples/simple/uartx-polling
  - downstream: a 336 KiB firmware-update transfer over 115200 that
    used to corrupt 15-20 frames and take 60-180 s now runs clean in
    ~55 s (stop-and-wait floor) with rxHwDrops == rxSwDrops == 0.
@cpunt cpunt marked this pull request as draft April 13, 2026 17:19
cpunt added a commit to jangala-dev/devicecode-go that referenced this pull request Apr 14, 2026
The upstream tinygo-uartx software RX ring was hard-coded to 128
bytes — only ~11 ms of headroom at 115200 baud. Any reactor-side
stall longer than that (a flash page program, a TinyGo GC pause,
a periodic logger burst) caused bytes to be silently dropped at
the ISR's `Buffer.Put` and corrupted incoming `xfer_chunk` frames
mid-flight. This produced ~80-byte payload truncations and
duplicate-seq merged frames during firmware update transfers.

This commit:

  - Adds a replace directive in go.mod pointing at a vendored copy
    of tinygo-uartx that bumps the ring to 4 KiB with uint16 indices
    (~355 ms of headroom at 115200) and adds rxHwDrops / rxSwDrops
    counters on the UART struct. The replace stays in place until
    the upstream PR (jangala-dev/tinygo-uartx#5) lands and we can
    re-pin to a tagged release.

  - Plumbs the new uartx.RXDrops accessor through rp2SerialPort and
    surfaces it from serial_raw via a [serial-raw] log line that
    also reports cumulative RX bytes and shmring-full events. The
    line is throttled (1 Hz minimum, only on counter delta or 64
    KiB byte quantum) so it stays silent in steady state.

  - Logs the active A/B partition on the fabric session's
    "waiting for connection start" message so post-reboot debugging
    can confirm which slot the MCU is running.

  - Extends the existing decode_failed / size_mismatch xfer_chunk
    error logs with seq, off and data_len for clearer diagnostics
    on any future receive-side parse failure.

In a clean steady-state run all three drop counters
(ring_full, rx_hw_drops, rx_sw_drops) stay at zero across a full
~336 KiB firmware update transfer.
@cpunt
Copy link
Copy Markdown
Author

cpunt commented Apr 16, 2026

Closed as moved up in layers jangala-dev/devicecode-go#40

@cpunt cpunt closed this Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant