feat: add fabric session protocol with topic remapping and HAL integration #41

Draft
cpunt wants to merge 67 commits into main from fabric-protocol

cpunt commented Apr 13, 2026

Summary

  • Adds the Fabric JSON-lines session protocol used by the CM5/MCU UART link.
  • Implements the v1 hello/hello_ack, ping/pong, pub/call/reply, retained export replay, and static topic remap surface described in docs/fabric.md.
  • Wires Fabric into the reactor on the proto_1 UART path with bounded export draining and host-side protocol coverage.

Why

This is the base transport/protocol layer needed before update traffic can move over Fabric. The branch is deliberately limited to the link/session contract and static topic routing.

Testing

  • go test ./...

cpunt added 29 commits April 2, 2026 09:29

Add the fabric service for CM5/MCU communication over UART. This
implements the v1 JSON-lines protocol from fabric.md:

- Session state machine with hello/hello_ack handshake
- CM5-driven ping/pong heartbeat with 45s stale timeout
- Incoming pub/call/unretain dispatch with import rules
- Outgoing export replay gated on peer handshake
- Outgoing wire-call support for remote RPC
- Pending call tracking with timeout and correlation
- Ping guard: drop pings when link is not up
- Structured session logging (log/logKV)
- Build-tag-gated transport trace (fabric_trace)
- ShmringTransport for TinyGo cooperative scheduler
- RWTransport for host testing with buffered io

Add static topic remapping rules for import/export between the CM5 bus
and the MCU wire protocol. Import rules map incoming pub/call topics to
local bus addresses. Export rules forward local HAL state to the peer.
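
The import side of a static remap can be pictured as a first-match table lookup. This is a sketch only: `importRule`'s fields, the trailing-`/` subtree convention and the `cmd/` rule are hypothetical; the `config/device` to `config/hal` pair is the mapping a later commit in this PR adopts.

```go
package main

import (
	"fmt"
	"strings"
)

// importRule maps a wire topic (or a wire prefix ending in '/') to a local
// bus topic or prefix. Rule contents here are hypothetical examples.
type importRule struct {
	wire, local string
}

// importMatch returns the local topic for an incoming wire topic. An exact
// match wins; a rule whose wire side ends in '/' remaps a whole subtree.
// Unmatched topics are not imported (ok == false).
func importMatch(rules []importRule, topic string) (string, bool) {
	for _, r := range rules {
		if topic == r.wire {
			return r.local, true
		}
		if strings.HasSuffix(r.wire, "/") && strings.HasPrefix(topic, r.wire) {
			return r.local + strings.TrimPrefix(topic, r.wire), true
		}
	}
	return "", false
}

func main() {
	rules := []importRule{
		{wire: "config/device", local: "config/hal"},
		{wire: "cmd/", local: "local/cmd/"},
	}
	fmt.Println(importMatch(rules, "cmd/fan/set"))
}
```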

Add config bridge that translates config/device into config/hal format,
normalizing Lua empty-table encoding ({} -> []) for Go unmarshalling.
Register the fabric service in main.go and add a standalone
fabric-test command for host-side protocol testing.

Tune UART ring sizes for the pico_bb_proto_1 hardware setup:
- TX shmring: 512 -> 2048 bytes (prevents export replay overflow)
- RX shmring: 32 -> 256 bytes (prevents edge notification misses)

Add resources_host.go stub so fabric tests compile on host.
Update host helpers (fmtx, strconvx) used by fabric wire encoding.

- Replace all bare println() in session handlers with s.log()/s.logKV()
  so every log line includes the session ID
- Include actual error detail in JSON unmarshal failure logs
- Fix writeLine() indentation (cosmetic, logic was correct)
- Document timing constant relationships and interdependencies
- Document postHelloAckSettle as TinyGo scheduler constraint
- Document SID-change handling asymmetry vs Lua side
- Document hardcoded import rules as intentional v1 scope
- Document wireCaps as forward-compatibility stub

When a local retained publish is cleared (retain=true, payload=nil),
the export drain now sends a wireUnretain instead of a wirePub. This
lets the CM5 clear the corresponding retained topic on its side.

Bus queue length increased from 3 to 16 so retained state replay on
export subscribe does not overflow. With 3, only the last 3 of ~10
retained hal/cap topics survived, giving CM5 an incomplete snapshot.

Nil out call.sub after unsubscribe in drainPendingCalls so that a
subsequent writeLine failure -> handleLinkDown -> teardownPendingCalls
does not close the same subscription channel twice (panic).
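
The underlying hazard is standard Go behaviour: calling `close` on an already-closed channel panics. A minimal sketch of the nil-after-release guard, with hypothetical `pendingCall` and `subscription` types standing in for the session's call record and bus subscription:

```go
package main

import "fmt"

// subscription stands in for a bus subscription (hypothetical).
type subscription struct{ ch chan []byte }

// pendingCall stands in for an outbound call awaiting its reply.
type pendingCall struct{ sub *subscription }

// releaseSub closes the reply subscription at most once. Nil-ing the field
// turns any later teardown (e.g. after a link-down) into a no-op instead of
// a second close(), which would panic.
func (c *pendingCall) releaseSub() {
	if c.sub == nil {
		return
	}
	close(c.sub.ch)
	c.sub = nil
}

func main() {
	c := &pendingCall{sub: &subscription{ch: make(chan []byte)}}
	c.releaseSub() // e.g. the drain path
	c.releaseSub() // e.g. a later teardown path: no panic
	fmt.Println(c.sub == nil)
}
```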

Document rpc/hal/read_state import rule as a placeholder with no
handler — CM5 calls to this endpoint will timeout until implemented.

Increase bus queue from 16 to 32. pico_bb_proto_1 publishes ~26
retained topics across env/power domains (info + status + value per
capability). Queue of 16 still overflowed during export subscribe
replay.

Remove rpc/hal/read_state import call rule — no handler subscribes
to this topic, so CM5 calls would silently timeout. The export call
rule now routes rpc/hal/dump (matching the bridge handler).

Define msgHello, msgHelloAck, msgPing, msgPong, msgPub, msgUnretain,
msgCall, msgReply constants in wire.go. Replace all raw string literals
in dispatch, noteRx, and wire struct construction throughout session.go.

Rename remoteNode -> peerNode and RemoteID -> PeerNode in the
link state payload to match the Lua side's peer_* naming convention.
Update JSON tag from remote_id to peer_node.

…Waiting on start

Unmarshal incoming frames once in dispatch via wireMsg union struct
instead of double-parsing (wireType + per-handler unmarshal). markRx()
is now called once in dispatch rather than in every handler.

Remove helloSeen flag — logWaiting now checks peerSID != "" which is
already managed by notePeerIdentity and cleared on link down.

Remove redundant logWaiting() call at session start — the waitTick
prints the same message after 2 seconds.

Move the link-up check from individual handlers (onPing, onPub,
onUnretain, onCall) into dispatch. Only hello and hello_ack are
accepted before the link is established — they are the handshake.
All other message types (ping, pong, pub, unretain, call, reply)
are dropped with a single log line if the link is not up.

This removes scattered precondition checks from 4 handlers and
ensures pong/reply are also rejected before handshake, which they
previously were not.

Add validateInbound() to gate all non-handshake messages on link state
and session ID match. Removes SID-change teardown logic from onPing
and onPong — mismatched SIDs are now dropped at the dispatch level.

Add statusReady/statusOpening/statusDown constants for link state
payload strings.

Fix test helper unlockExports to use the CM5's SID (matching peerSID)
rather than the MCU's ack SID.

Exports are now enabled immediately in promoteLink when the link
transitions to up, instead of being deferred and re-triggered by
every incoming message handler.

Removes enableExports(), exportWaitFallback, exportWaitUntil, and
all exportTrigger* constants. The fallback timer and per-handler
arming were unnecessary complexity — exports should start when the
session is established, not when arbitrary messages arrive.

Handlers no longer return bool — dispatch always resets the stale
timer after any received frame (even malformed ones indicate the
peer is alive). This removes the meaningless return true/false from
all 8 handlers and simplifies the dispatch/handler contract.

Rename single-letter variable t to localTopic in onPub, onUnretain,
and onCall for clarity. Add statusReady/statusOpening/statusDown
constants for link state payload strings.

…Error

Inline drainOutboundNew and drainOutboundPending into drainOutbound
with comments separating the two phases (forward new calls, expire
pending calls).

Extract 50ms magic number into exportTickInterval const.

Optimise checkBusError: try types.ErrorReply type assertion first
(zero alloc) before falling back to JSON marshal/unmarshal for
ad-hoc error structs.
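
The two-tier check can be sketched as below. `ErrorReply` is a stand-in for `types.ErrorReply`, whose real fields are not shown in this PR; the `err` JSON key is borrowed from the reply shape (`{id,ok,value,err}`) described later. The fast path is a type switch, the slow path a JSON round-trip.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorReply stands in for types.ErrorReply (real fields not shown in the PR).
type ErrorReply struct {
	Err string `json:"err"`
}

// checkBusError reports whether a bus reply value carries an error string.
func checkBusError(v any) (string, bool) {
	// Fast path: the well-known type, no encoding work.
	switch er := v.(type) {
	case ErrorReply:
		return er.Err, er.Err != ""
	case *ErrorReply:
		if er != nil {
			return er.Err, er.Err != ""
		}
		return "", false
	}
	// Slow path: ad-hoc structs or maps; round-trip through JSON and look
	// for an "err" key. Values that are not JSON objects do not match.
	b, err := json.Marshal(v)
	if err != nil {
		return "", false
	}
	var probe struct {
		Err string `json:"err"`
	}
	if err := json.Unmarshal(b, &probe); err != nil {
		return "", false
	}
	return probe.Err, probe.Err != ""
}

func main() {
	fmt.Println(checkBusError(map[string]string{"err": "busy"}))
}
```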

Add comment explaining why writeLine returns true for oversized
frames (transport is still healthy, session continues).

Rename wire.go to protocol.go to match the Lua side (protocol.lua).
Rename all wire-prefixed structs and functions: wireHello -> protoHello,
wireMsg -> protoMsg, wireType -> protoType, wireImportRule -> importRule,
wireImport -> importMatch, etc.

Also renames session.writeLine to session.sendFrame to match Lua's
send_frame and avoid confusion with Transport.WriteLine.

The handshakeOnlyOutput const was set to true during fabric debugging to reduce console
noise. This silenced all power/thermal/charger/HAL logging at compile
time. Set to false to restore original logging behaviour. The const
and its guards can be fully removed in a follow-up cleanup.

Remove the handshakeOnlyOutput const and all 17 guard blocks that
silenced production logging. Restore original logging behaviour.

Remove stale buildTag const. Rename fabricSessionWaitLogEvery to
fabricWaitLogInterval.

The bridge was a separate goroutine handling config/device translation
and rpc/hal/dump replies independently of the session lifecycle. This
was not in the design spec (fabric.md) and ran even when no session
was active.

Config handling is now inline in onPub: when config/device arrives, the
session normalizes Lua empty tables, validates the HAL config, and
publishes to config/hal directly. Config state (apply count, errors)
is tracked on the session struct.

The rpc/hal/dump handler is now inline in onCall: the session builds
the reply directly using its cached HAL state and config tracking,
with no bus round-trip.

- Delete bridge.go and bridge_test.go
- Create config.go with decodeHALConfig and helpers (moved from bridge)
- Change import rule: incoming config/device now maps to local config/hal (previously config/device → config/device)
- Remove rpc/hal/dump from import call rules (handled directly)
- Remove RunBridge and bridgeConn from main.go
- Add halStateSub to session for HAL state caching
- Rewrite bridge tests as session integration tests

…nly on dump

Move dumpReply, inboundCall, outboundCall, readResult, linkStatePayload
types and tConfigHAL/dumpCallTopic vars to the top of the file with
other declarations.

Remove drainHALState from the 50ms export tick — it only needs to run
when a dump call arrives, not every tick.

Replace the persistent halStateSub subscription and drainHALState
polling loop with a one-shot subscribe/read/unsubscribe in onCall
when a dump is requested. The hal/state topic is retained on the bus,
so the subscribe immediately delivers the latest value.

Removes halStateSub and lastHALState from the session struct.

Implements the MCU-side receive path for binary file/firmware transfer
over fabric. Uses chunked streaming with CRC32 per-chunk and SHA256
whole-file verification. The RP2350 sink delegates to ab-bringup/abupdate
for A/B slot writes and reboot.

cpunt marked this pull request as draft April 13, 2026 11:23
cpunt added 30 commits April 16, 2026 15:35

main.go already logs the partition at boot and hal/dump queries it
directly. No need for a cached copy on the session struct.

Build-tagged beginTransfer() function replaces the factory interface
and struct. Tests override via a function field on the session.

Unmarshal just the type header first, then dispatch into the correct
typed struct via a generic helper. Each handler now receives its own
protocol type instead of a shared superset union. Removes protoMsg
and validateInbound.
gofmt also realigns session struct fields.
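
The typed-dispatch pattern (read the discriminator, then unmarshal the whole line into the matching struct through one generic helper) might look like this under standard Go. `decodeAs`, `header`, and the `protoPing`/`protoPub` field sets are hypothetical. Later in this PR the JSON header probe is replaced by a hand-written scanner because of a TinyGo decode problem on hardware; this sketch does not address that.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// header is a named struct used only to read the discriminator.
type header struct {
	Type string `json:"type"`
}

// Hypothetical per-message types; the real field sets live in protocol.go.
type protoPing struct {
	Type string `json:"type"`
	SID  string `json:"sid"`
}

type protoPub struct {
	Type    string          `json:"type"`
	Topic   string          `json:"topic"`
	Payload json.RawMessage `json:"payload"`
}

// decodeAs unmarshals line into a fresh T and hands it to handle.
func decodeAs[T any](line []byte, handle func(T)) error {
	var msg T
	if err := json.Unmarshal(line, &msg); err != nil {
		return err
	}
	handle(msg)
	return nil
}

// dispatch routes one frame to a typed handler and returns a summary.
func dispatch(line []byte) (string, error) {
	var h header
	if err := json.Unmarshal(line, &h); err != nil {
		return "", err
	}
	var out string
	var err error
	switch h.Type {
	case "ping":
		err = decodeAs(line, func(m protoPing) { out = "ping sid=" + m.SID })
	case "pub":
		err = decodeAs(line, func(m protoPub) { out = "pub " + m.Topic })
	default:
		err = fmt.Errorf("unknown type %q", h.Type)
	}
	return out, err
}

func main() {
	fmt.Println(dispatch([]byte(`{"type":"ping","sid":"a1"}`)))
}
```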

Unused by CM5 tooling; abupdate import pulls an RP2350-specific
dependency into board-agnostic fabric code. Boot-time log in main.go
remains.

Diagnostic added while chasing mid-frame byte-drop truncation; that
bug is fixed and data_head + seq/off/data_len/n give enough context
for any re-occurrence.

Hardcoded version string rots and has no consumer; slot reporting
belongs on the bus as telemetry (hal/cap/*), not as a board-specific
import in board-agnostic main.go.

Keep the runtime.GC() drop (deliberate stop-the-world every 2 s
corrupted in-flight transfers), but revert the one-site switch to
println. Rest of codebase uses log.Println in hot paths; single-site
divergence is noise without addressing the underlying concern.

TX drains at 115200 baud (~185 ms for 2048 B) and MCU does not
produce back-to-back max-size frames, so sizing TX to exactly
maxLineLen is principled and saves 2 KiB SRAM. RX stays at 4 KiB
to match the uartx software ring.

Trimming to 2048 coincided with an MCU→CM5 decode_failed during a
hardware transfer test. The theoretical derivation said 2048 should be
enough, but empirical beats analytical — revert to match RX.

rxBytesTotal + 64 KiB quantum existed only to emit a liveness
heartbeat; fabric-layer xfer_chunk logging covers the same signal.
Keep rxRingFull — that is the back-pressure alarm and is not
duplicated elsewhere.

Describes the trigger (count changed) honestly and drops the awkward
IfDue idiom.

Per maintainer guidance (buffering should live in the protocol/client
layer, not the HAL): rp2350TransferSink.stage (4 KiB, flash-sector
aligned) and the fabric session line queue are the real protocol
buffers. Shmring goes back to the 512 B serial_raw default.

The runtime.GC() drop in emitMemSnapshot stays — direct empirical
cause of byte drops per FABRIC_TRANSFER_FIX.md.

- Inline readyNext (one-shot call)
- Inline textPreview (only called from error logs; removed data_head)
- Drop infoPayload dead error branch (json.Marshal of 2-uint32 can't fail)
- Compact the six xfer_need error blocks (keep seq/off/data_len, drop data_head noise)
- Route sink.Abort + clearTransfer through abortTransfer for consistent logging
- Rename rp2350TransferSink/rp2350TransferStageSize -> transferSinkImpl/stageSize

Net 58 lines removed, wire behaviour unchanged, tests pass.

Implements W1, W2, W4, W8 from docs/firmware-alignment-protocol.md:

- xxhash32 port at x/xxhash/ with 4 KAT vectors (empty/a/abc/123456789)
- Frame discriminator t -> type; reply {corr,payload} -> {id,ok,value,err}
- Transfer wire fields renamed (id->xfer_id, etc.); xfer_chunk shape
  reduced to {xfer_id,offset,data}; checksum moves to xfer_begin/commit
- Wire integrity SHA-256 -> xxHash32 hex (no algorithm field)
- xfer_need.next is byte offset (not seq)
- maxLineLen 2048 -> 4096 (covers chunk_size=2048 + base64url + envelope)
- protoXferBegin.Meta preserved as opaque RawMessage (transfer_mgr passes
  meta.receiver through)
- onTransferChunk aborts on every chunk-level fault (matches Lua
  transfer_mgr.lua: unexpected_offset / decode_failed / empty_chunk /
  size_overflow / sink errors all clear active transfer + send xfer_abort)
- RP2350 default sink now refuses transfers (signed-image receiver lands
  in fabric-update); direct abupdate flashing gated behind flash_unsafe
  build tag
- FABRIC_TRANSFER_FIX GC fix preserved with regression-guard comment

Out of scope (deferred): W3 link config + idle-chunk watchdog, W5 3-lane
writer, W6 active ping/session_reset/bounded helper, W7 UART role swap.

- New LinkConfig{ChunkSize, PhaseTimeout} threaded via fabric.Run with
  release defaults from bigbox-v1-cm-2.json (2048 / 15s); zero-value
  config falls back to defaults via applyDefaults so direct session{}
  test construction stays safe.
- incomingTransfer.deadline armed on xfer_begin accept and refreshed on
  every accepted chunk, mirroring transfer_mgr.lua.
- New checkTransferTimeout fires from the existing 50ms drain tick;
  on expiry it aborts the local sink and emits xfer_abort{err="timeout"}
  to match Lua's clear_active('timeout') + outbound abort.
- Reactor caller now passes fabric.DefaultLinkConfig().
- Test fabric_test.go callers updated to pass DefaultLinkConfig().
- New TestTransferIdleChunkWatchdog with PhaseTimeout=100ms.

LinkConfig grows three fields with release defaults pulled from
bigbox-v1-cm-2.json `service.fabric.links.<id>`:
- PingInterval (10s)
- LivenessTimeout (30s) — replaces hardcoded staleTimeout=45s
- MaxInboundHelpers (64) — Lua's `max_pending_calls` fallback

Session lifecycle changes mirror session_ctl.lua / rpc_bridge.lua at
update-migration tip:
- Active outbound ping cadence: tickPing fires from the existing 50ms
  drain tick; sends `ping` and resets nextPingAt = now + PingInterval
  unconditionally (no TX-activity dependency). nextPingAt is armed in
  promoteLink so the first ping fires PingInterval after link-up.
- Stale timer now runs on cfg.LivenessTimeout (was 45s).
- Pending outbound calls now fail with err="session_reset" on peer SID
  change (renamed reasonPeerSessionChanged -> reasonSessionReset);
  matches rpc_bridge.lua's fail_pending(pending, 'session_reset').
- onCall now enforces capacity: if len(inboundCalls) >=
  cfg.MaxInboundHelpers, reply {ok=false, err="busy"} before route
  resolution. Mirrors rpc_bridge.lua's spawn_local_call_helper.

New tests:
- TestSessionPingsUnconditionally: 3 pings within 500ms at PingInterval=150ms
- TestInboundCallBusyAtCapacity: second concurrent call hits busy with
  MaxInboundHelpers=1 (uses test-scoped importCallRules entry so both
  calls actually route).
- TestCallExportPeerReset updated for the new session_reset string.

The receiver-half of "imported retained facts unretained on session-gen
change" is L2/fabric-update territory (it lives in the bridge's
imported-fact cache), so out of scope here. The pending-call cancel +
xfer abort halves of the session-reset semantics are now in place.

Mirrors src/services/fabric/writer.lua at update-migration tip. Frame
priority class follows protocol.lua's FRAME_CLASS:
  control: hello, hello_ack, ping, pong, xfer_{begin,ready,need,commit,
           done,abort}
  rpc:     pub, unretain, call, reply
  bulk:    xfer_chunk (MCU does not originate; bulk lane is wired in
           for symmetry but currently unused on MCU)

Implementation notes:
- new writer.go: txLane FIFO + enqueueFrame(lane, data) + flushWriter
  (drains controlQ fully, then weighted RR between rpcQ and bulkQ).
- session.go: per-lane buffers on session; sendFrame replaced by
  sendControl + sendRPC wrappers at every call site (transfer.go's
  xfer_* senders likewise).
- LinkConfig grows RPCQuantum (default 4) and BulkQuantum (default 1),
  matching writer.lua's release tuning. applyDefaults sets them.
- flushWriter floors zero quantums to 1 defensively so unit tests that
  construct session{} directly (without applyDefaults) still make
  forward progress instead of spinning the outer loop.

Test:
- TestWriterControlPreemptsRPCAndBulk pre-loads all 3 lanes and asserts
  the drain order: 2 control, then 4 rpc, 1 bulk, 1 rpc, 1 bulk.

The MCU's actual outbound traffic profile (1 frame per drain tick from
the existing single producer) doesn't currently exercise the RR
fairness — but the structure is in place for fabric-update's retained
state publishers, which will queue rpc-lane frames concurrently with
control-lane xfer_need / ping.

bigbox-v1-cm-2.json binds the CM5-facing fabric link to uart0; mirror
that on the MCU. Atomic cutover — no dual-run.

reactor.go (production / !qa_reactor):
- uart0 now carries fabric (was: legacy telemetry JSON).
- uart1 now carries the log mirror via log.SetUART1 (was: fabric link).
- OnCharger / OnBattery / OnTempDeciC / emitMemSnapshot strip their
  inline JSONWriter blocks; FSM state updates and human-readable log
  lines stay.
- humidSub / evSub handlers in Run drop their JSON branches.
- jsonOut / droppedUART0Bytes fields and jsonWrite helper deleted.

Retained-state publishers in fabric-update will replace the old
JSON-over-uart0 telemetry. qa_reactor.go (//go:build qa_reactor) is
intentionally unchanged — it remains the hardware bring-up path with
uart0=telemetry / uart1=log.

Build sizes (pico_bb_proto_1):
  default     : code 282892 -> 280388 (-2504 B)
  flash_unsafe: code 287028 -> 284500 (-2528 B)

W6 — gate Ready on rpc_ready, unretain imported facts on session reset:

- Track imported retained local topics (s.importedRetained). onPub
  appends, onUnretain removes; trackImportedRetain dedups.
- New teardownImportedRetained: nil-payload retained publish on every
  tracked topic. Called from promoteLink (session-reset path) and from
  handleLinkDown. Mirrors rpc_bridge.lua's invalidate_imported_retained
  on generation bump.
- New rpcReady flag, gated on the post-handshake export holdoff. New
  tickReady fires from the 50ms drain tick once exportReadyAt elapses,
  sets rpcReady=true, and republishes link state so consumers observe
  the ready edge. linkStatePayload.Ready is now `linkUp && rpcReady`;
  currentStatus() likewise returns "ready" only when rpcReady. Mirrors
  session_ctl.lua + rpc_bridge.lua's `ready == established and rpc_ready`.

W7 — debug uart policy gate:

- New debug_uart build tag. debug_uart_release.go (no tag) provides a
  no-op debugUARTLog stub; debug_uart_dev.go (with tag) opens uart1 and
  routes log.SetUART1 through the existing shmring drop-on-overflow
  mirror. reactor.go's Run uses the helper, so the uart1 lifecycle is
  compiled out of release builds entirely.
- log.SetUART1 path's existing TryWriteFrom drop-on-full is the
  rate-limit; documented in the dev-build comment.

New tests:
- TestReadyHeldUntilExportHoldoff asserts the Established+!Ready edge
  precedes Ready=true.
- TestSessionResetUnretainsImports forces a session-gen bump (new SID
  hello) and asserts a nil-payload retained publish lands on the
  imported topic.

Build sizes (pico_bb_proto_1):
  release default: code 280388 -> 279892 (-496 B; uart1 code stripped)
  release unsafe : code 284500 -> 284012 (-488 B)
  dev (debug_uart): code 281644 (uart1 code retained)

Codex review caught a parity bug: the previous fix untracked the local
topic on any non-retained pub arriving via the same import rule. Lua's
rpc_bridge.lua only mutates imported_retained on retain set/clear, and
the Go bus likewise only clears retained storage on an explicit
retained-nil publish. So a stale retained value could survive a
session reset because the tracking entry had been silently dropped.

Fix: drop the else-untrack branch in onPub. Untracking only happens
on onUnretain (the explicit retain clear).

Regression test:
- TestSessionResetUnretainsImportsAfterTransientPub adds a temp import
  rule, sequences retain=true → retain=false on the same topic, then
  forces a session reset and asserts the nil-payload retained edge
  still lands on the imported subscriber. Verified to fail with the
  buggy else-untrack restored.

The hardcoded "cm5-local" peer-id in fabric.Run never matched what any
CM5 actually sends. mcu-dev.json and bigbox-v1-cm-2.json both publish
the link with `"node_id": "cm5"`, and session_ctl.lua sends that exact
string in the hello frame's `node` field. The MCU's onHello rejects
hellos when `msg.Node != s.peerID`, so every hello was being dropped
with "wrong node" before promoteLink could fire.

The DEVICECODE_NODE_ID env var the dev command line was setting is not
read by any Lua code — it's a no-op on the CM5 side, so changing it
there does nothing. Aligning the MCU literal to the CM5 canon is the
correct fix.

Tests / fabric_test.go / cmd/fabric-test keep their internal "cm5-local"
pairings — they're self-consistent harness fixtures and don't talk to
a real CM5.

Hardware bring-up against a real CM5 surfaced every inbound frame
arriving as "malformed" with err empty:

  malformed frame dropped line_len 74 line_head
  {"sid":"…","node":"cm5","type":"hello"} err

The empty err is dispatch's "protoType returned empty" path. Standard
Go's json.Unmarshal happily extracts Type from this envelope (every
fabric_test.go exercises identical shapes); TinyGo's reflect path
silently leaves Type as the zero value when the anonymous-struct
target has the {Type string `json:"type"`} layout and the JSON has
preceding sibling keys. The send side (json.Marshal) is unaffected
and the named-struct decode in typedDispatch is also unaffected — only
the wire-discriminator probe was broken in production.

Replacing protoType with a manual byte scan that finds the first
top-level "type":"…" pair. The heuristic guard ("type must be preceded
by {, ',' or whitespace") rejects matches inside string values; we
trust the wire to use one of the well-known msg* constants for the
value, none of which contain escapes.

Tests still pass; the change does not alter typedDispatch's named-type
unmarshal, only the type probe.

Codex review of e445862 caught two issues:

1. The previous protoType heuristic ("preceded by '{', ',' or whitespace")
   would mis-route any envelope whose payload happened to contain a
   "type" key, since it scanned the whole line for the first match
   regardless of nesting. Example shape:

       {"payload":{"type":"x"},"type":"pub"}

   …would dispatch as "x" rather than "pub". Replacing the scanner
   with a depth-aware top-level walker:
   - skipJSONString honours backslash escapes
   - skipJSONContainer balances {/[/}/] while ignoring brace-like
     bytes inside string values
   - skipJSONValue dispatches by leading byte (string / container /
     literal-or-number)
   - protoType walks ONLY the top-level key/value pairs and returns
     the first depth-1 "type" string it sees

   New tests in TestWireTypeIgnoresNestedTypeKeys cover the nested
   payload, nested meta, type-as-array-element, type-as-substring, and
   the real-CM5 hello shape that was the original regression.

2. cmd/fabric-test/main.go didn't build after the W3 LinkConfig signature
   change and still hard-coded "cm5-local" as the peer id. Updated to
   pass fabric.DefaultLinkConfig() and to use the canonical "cm5"
   identity (matches the production reactor + every CM5 config).
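
A standalone sketch of the depth-aware walker described in item 1. The function names echo the commit text, but the bodies are illustrative rather than the PR's code; keys and the returned value are compared raw, without unescaping, on the same assumption that `type` values are plain msg* constants.

```go
package main

import "fmt"

func isWS(c byte) bool { return c == ' ' || c == '\t' || c == '\n' || c == '\r' }

func skipWS(b []byte, i int) int {
	for i < len(b) && isWS(b[i]) {
		i++
	}
	return i
}

// skipString: b[i] == '"'; returns the index past the closing quote,
// honouring backslash escapes, or -1 if the string is unterminated.
func skipString(b []byte, i int) int {
	for j := i + 1; j < len(b); j++ {
		if b[j] == '\\' {
			j++ // skip the escaped byte
		} else if b[j] == '"' {
			return j + 1
		}
	}
	return -1
}

// skipValue skips one JSON value. Containers are balanced by depth, and
// brace-like bytes inside strings are ignored.
func skipValue(b []byte, i int) int {
	depth, start := 0, i
	for i < len(b) {
		c := b[i]
		switch {
		case c == '"':
			if i = skipString(b, i); i < 0 || depth == 0 {
				return i
			}
			continue
		case c == '{' || c == '[':
			depth++
		case depth > 0 && (c == '}' || c == ']'):
			if depth--; depth == 0 {
				return i + 1
			}
		case depth == 0 && (c == ',' || c == '}' || c == ']' || isWS(c)):
			if i == start {
				return -1 // empty literal
			}
			return i // end of a number or true/false/null
		}
		i++
	}
	return -1
}

// protoType returns the string value of the first depth-1 "type" key, or
// "" if there is none or the envelope is malformed.
func protoType(b []byte) string {
	i := skipWS(b, 0)
	if i >= len(b) || b[i] != '{' {
		return ""
	}
	i++
	for {
		i = skipWS(b, i)
		if i >= len(b) || b[i] != '"' {
			return "" // includes '}' with no "type" seen
		}
		end := skipString(b, i)
		if end < 0 {
			return ""
		}
		key := string(b[i+1 : end-1])
		if i = skipWS(b, end); i >= len(b) || b[i] != ':' {
			return ""
		}
		i = skipWS(b, i+1)
		if key == "type" {
			if i >= len(b) || b[i] != '"' {
				return ""
			}
			if end = skipString(b, i); end < 0 {
				return ""
			}
			return string(b[i+1 : end-1])
		}
		if i = skipValue(b, i); i < 0 {
			return ""
		}
		if i = skipWS(b, i); i >= len(b) || b[i] != ',' {
			return ""
		}
		i++
	}
}

func main() {
	fmt.Println(protoType([]byte(`{"payload":{"type":"x"},"type":"pub"}`)))
}
```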

Verified:
- go test ./...                                                   pass
- tinygo build -target=pico2 -tags pico_bb_proto_1                pass
- tinygo build -target=pico2 -tags 'pico_bb_proto_1 flash_unsafe' pass
- tinygo build -target=pico2 -tags 'pico_bb_proto_1 debug_uart'   pass (unchanged)
- tinygo build -target=pico2 -tags 'qa_reactor pico_bb_proto_1'   pass (unchanged)
- tinygo build -target=pico2 -tags pico_bb_proto_1 ./cmd/fabric-test  pass

Hardware report after 7523093 still shows malformed-frame drops on
the exact same shape, just with `node` first instead of `sid` first. My
host test (TestProtoTypeExactFailingInput, exact bytes from log)
passes under standard Go, so either:

  (a) TinyGo is taking a different code path through the same scanner
  (b) the binary on hardware isn't the one we think it is

Debug instrumentation:

- Every bail point in protoType now emits "[fabric-debug] protoType
  bail at <where>" with the index, total length, and a 96-byte head.
  The names are `no_opening_brace`, `eof_after_brace`,
  `close_before_type`, `non_quote_at_key_start`,
  `scanstring_key_failed`, `missing_colon_after_key`,
  `eof_after_colon`, `type_value_not_string`,
  `scanstring_value_failed`, `skipvalue_failed`. After reflash, the
  hardware log will pinpoint which one fires.

- Session "run start" log now includes `build_tag=protoTypeScanV3`.
  If we don't see that string in the boot log after picotool load,
  the wrong .elf is on the device.

Both will be removed once the bug is diagnosed.

W7 (fc3b62d) moved fabric from uart1 to uart0 on the MCU. On the
proto_1 hardware the CM5 link is physically wired to GP4/GP5
(uart1) — not GP0/GP1 (uart0) — so the post-W7 image was listening
on the wrong pins and seeing zero RX.

Earlier hardware logs that showed "malformed frame dropped" were
the pre-W7 image still booting from slot A (because every flash
landed on slot B at the same `--major 10` and lost the version
tie-break). When `--major 11` finally let slot B win, fabric moved
to uart0 and the harness wires were left behind.

Until the harness moves, the MCU side has to mirror the wires:
- fabric stays on uart1
- the legacy CM5 telemetry JSON rip-out from W7 is preserved
- the optional debug log mirror moves to uart0
- doc plan W7 wording updated to reflect this hardware reality;
  CM5-side `bigbox-v1-cm-2.json` continues to bind its own end as
  `uart-0` (`/dev/ttyAMA0`) — the labels on the two sides are
  independent.

A future hardware revision can swap the wires and undo this; the
wire schema is unaffected.

Plan/doc: docs/firmware-alignment-protocol.md sections describing
the UART role swap and the W7 acceptance updated to match.

Hardware handshake confirmed working in practice (hello rx /
hello_ack tx / exports enabled / ping rx + pong tx all flowing on
both sides), so the diagnostic prints from f8d02ba are no longer
needed.

Removed:
- debugBail() and per-bail println at every protoType failure path
- the strconvx import in protocol.go (no longer used here)
- the build_tag=protoTypeScanV3 suffix on the session "run start" log

protoType's depth-aware logic is unchanged; this is purely
log-noise removal.