
first-pass fabric protocol for TinyGo peer #37

@lePereT

This describes the current intended behaviour of the devicecode fabric link, as implemented on the Lua side and intended for the matching TinyGo side.

This is still a first pass. The aim is a clean and reliable CM5 ↔ MCU control-plane link over UART, with room to evolve later.

The main design choices remain:

  • keep raw UART bytes out of the local in-process bus;
  • keep OS and UART ownership on the Lua side inside HAL;
  • make fabric the only service that knows about remote peers;
  • carry a small explicit protocol over a byte stream;
  • preserve useful bus semantics such as publish, retained state, unretain, and directed call/reply;
  • avoid pretending the remote side is just another in-process bus connection.

1. Big picture

On the Lua side there is one fabric service. It owns one session per configured link. For the MCU link, that session currently uses a UART transport.

HAL on the Lua side opens the UART and returns a Stream capability to fabric. fabric then reads and writes protocol messages on that stream.

On the TinyGo side, the matching component should be the peer session layer over the UART. It sits above a raw byte stream and below the MCU’s internal service/runtime environment.

So the UART link is:

  • Lua HAL owns UART fd / driver
  • Lua fabric speaks fabric protocol over the stream
  • TinyGo fabric speaks the same protocol over the raw UART
  • TinyGo fabric imports/exports messages to the MCU’s internal service world

The protocol meaning should remain transport-neutral even though, for now, it is carried over UART.


2. First-pass scope

Version 1 supports:

  • link bring-up and peer handshake
  • heartbeat
  • ordinary publish
  • retained publish
  • unretain
  • directed call / reply, equivalent to lane-B-style RPC proxying

Version 1 does not yet support:

  • distributed subscriptions
  • route advertisements
  • multi-hop mesh forwarding
  • firmware transfer
  • binary framing
  • on-wire authentication

3. Wire format

For the first implementation, the wire format is deliberately simple:

  • one compact JSON object per line
  • UTF-8 text
  • newline (\n) terminates one message

Conceptually:

{"t":"hello",...}\n
{"t":"pub",...}\n
{"t":"call",...}\n

Framing rule

Treat the UART as a byte stream. Accumulate bytes until newline, then decode that full line as one JSON message.

Encoding rule

Do not emit pretty-printed JSON. One compact JSON object per line only.

Practical note

JSON strings may contain escaped \n, but not literal frame-breaking newline bytes. A normal JSON encoder does the right thing.
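
The encoding rule can be sketched as below. This is a minimal illustration, assuming the standard encoding/json package is usable on the target (worth verifying for the chosen TinyGo version and board); encodeLine is a hypothetical helper name, not part of the existing code.

```go
package main

import "encoding/json"

// encodeLine marshals msg as compact JSON and appends the '\n' terminator.
// json.Marshal never emits raw newline bytes: a newline inside a string
// value is written as the two-character escape \n, so the result always
// contains exactly one frame.
func encodeLine(msg any) ([]byte, error) {
	b, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}
```

The returned slice can be written to the UART in one call, so a frame is never interleaved with another writer's bytes as long as writes are serialised.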


4. Protocol version

The current protocol version is:

proto = 1

This is carried in handshake messages.

A peer should reject or ignore incompatible protocol versions rather than trying to continue with mismatched assumptions.


5. Message types

All messages are JSON objects with a required string field t.

5.1 hello

Example:

{"t":"hello","node":"cm5-local","peer":"mcu-1","sid":"9e3b...","proto":1,"caps":{"pub":true,"call":true}}

Fields:

  • t: "hello"
  • node: sender node id
  • peer: intended remote peer id
  • sid: sender session id
  • proto: protocol version
  • caps: high-level capability flags

Semantics

The sender is saying:

  • this is who I am (node)
  • this is who I think you are (peer)
  • this is my current session id (sid)
  • this is the protocol version I am speaking (proto)
  • these are the capability families I support (caps)

Receiver behaviour

On receiving hello:

  • validate the shape;
  • verify peer is acceptable for this device;
  • verify proto is supported;
  • record remote node;
  • record remote sid;
  • treat a changed remote sid as a fresh peer session;
  • send back hello_ack.

A fresh peer session means any pending call/reply correlation state tied to the previous peer session should be discarded.
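
The receiver behaviour above might look like this on the TinyGo side. It is a sketch under stated assumptions: "peer is acceptable" is taken to mean "peer equals our own node id", only proto 1 is accepted, and the Session type and OnHello method are hypothetical names.

```go
package main

// Hello mirrors the hello message fields.
type Hello struct {
	T     string          `json:"t"`
	Node  string          `json:"node"`
	Peer  string          `json:"peer"`
	SID   string          `json:"sid"`
	Proto int             `json:"proto"`
	Caps  map[string]bool `json:"caps"`
}

// HelloAck mirrors the hello_ack message fields.
type HelloAck struct {
	T     string `json:"t"`
	Node  string `json:"node"`
	SID   string `json:"sid"`
	Proto int    `json:"proto"`
	OK    bool   `json:"ok"`
}

const protoVersion = 1

// Session holds the identity state recorded during the handshake.
type Session struct {
	LocalNode, LocalSID   string
	RemoteNode, RemoteSID string
}

// OnHello validates h, records the remote identity and builds the ack.
// fresh is true when a previously recorded remote sid has changed, in
// which case the caller should discard pending call/reply state.
// ok is false when the hello is rejected and no ack should be sent.
func (s *Session) OnHello(h Hello) (ack HelloAck, fresh bool, ok bool) {
	// Assumption: the only acceptable peer value is our own node id.
	if h.Node == "" || h.SID == "" || h.Peer != s.LocalNode || h.Proto != protoVersion {
		return HelloAck{}, false, false
	}
	fresh = s.RemoteSID != "" && s.RemoteSID != h.SID
	s.RemoteNode, s.RemoteSID = h.Node, h.SID
	ack = HelloAck{T: "hello_ack", Node: s.LocalNode, SID: s.LocalSID, Proto: protoVersion, OK: true}
	return ack, fresh, true
}
```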


5.2 hello_ack

Example:

{"t":"hello_ack","node":"mcu-1","sid":"a12f...","proto":1,"ok":true}

Fields:

  • t: "hello_ack"
  • node: sender node id
  • sid: sender session id
  • proto: protocol version
  • ok: boolean, currently expected to be true

Semantics

Acknowledges handshake and provides the sender’s own current session identity.

On receiving hello_ack, the peer should:

  • verify proto;
  • record remote node;
  • record remote sid;
  • treat a changed remote sid as a fresh peer session.

5.3 ping

Example:

{"t":"ping","ts":1712345678,"sid":"9e3b..."}

Fields:

  • t: "ping"
  • ts: sender timestamp, opaque in v1
  • sid: sender session id

Behaviour

Reply with pong.


5.4 pong

Example:

{"t":"pong","ts":1712345678,"sid":"a12f..."}

Fields:

  • t: "pong"
  • ts: opaque timestamp
  • sid: sender session id

Semantics

Heartbeat only. No strict clock semantics in v1.


5.5 pub

Example:

{"t":"pub","topic":["state","mcu","health"],"payload":{"ok":true},"retain":false}

Fields:

  • t: "pub"
  • topic: array of non-empty strings
  • payload: arbitrary JSON value
  • retain: boolean

Semantics

Publish one message to the peer. The receiver maps topic through its import rules to a local topic.

If retain is true, the receiver should treat this as retained state for the mapped local topic.

If retain is false, treat it as a transient publish.

Constraint

For v1, topic tokens on the wire are strings only.
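
A decoder for pub that enforces the topic constraint could be sketched as follows. It assumes encoding/json, keeps payload undecoded as json.RawMessage, and additionally rejects an empty topic array, which the text above does not state explicitly. decodePub is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Pub mirrors the pub message. Payload stays as raw JSON so the session
// layer does not need to understand application payloads.
type Pub struct {
	T       string          `json:"t"`
	Topic   []string        `json:"topic"`
	Payload json.RawMessage `json:"payload"`
	Retain  bool            `json:"retain"`
}

var errBadTopic = errors.New("fabric: topic must be a non-empty array of non-empty strings")

// decodePub decodes one line and validates the topic shape.
// Non-string topic tokens are rejected by json.Unmarshal itself.
func decodePub(line []byte) (Pub, error) {
	var p Pub
	if err := json.Unmarshal(line, &p); err != nil {
		return Pub{}, err
	}
	if len(p.Topic) == 0 { // assumption: an empty topic is malformed
		return Pub{}, errBadTopic
	}
	for _, tok := range p.Topic {
		if tok == "" {
			return Pub{}, errBadTopic
		}
	}
	return p, nil
}
```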


5.6 unretain

Example:

{"t":"unretain","topic":["state","mcu","health"]}

Fields:

  • t: "unretain"
  • topic: array of non-empty strings

Semantics

Clear retained state for the mapped local topic.


5.7 call

Example:

{"t":"call","id":"f6a2...","topic":["rpc","hal","read_state"],"payload":{"ns":"config","key":"services"},"timeout_ms":5000}

Fields:

  • t: "call"
  • id: correlation id generated by caller
  • topic: concrete topic array for the remote directed call target
  • payload: arbitrary JSON value
  • timeout_ms: advisory timeout in milliseconds

Semantics

This is a directed request to the remote peer. The receiver should map topic through import-call rules, invoke the corresponding local handler, and send exactly one reply.

Important rule

call.topic must be concrete in v1. No wildcards.


5.8 reply

Success example:

{"t":"reply","corr":"f6a2...","ok":true,"payload":{"found":true,"data":"..."}}

Failure example:

{"t":"reply","corr":"f6a2...","ok":false,"err":"timeout"}

Fields:

  • t: "reply"
  • corr: correlation id matching a previous call.id
  • ok: boolean
  • payload: present when ok=true
  • err: string when ok=false

Semantics

Completes one pending call.

Exactly one reply should be emitted per accepted call.

If the receiver cannot route or execute the call, it should still reply with ok=false.


6. Topic model

Topics on the wire are JSON arrays of strings.

Examples:

  • ["state","mcu","health"]
  • ["rpc","hal","dump"]
  • ["config","device"]

Do not encode topics as slash-separated strings on the wire.


7. Topic remapping

Each side uses static configured remapping rules.

A rule is conceptually:

  • local pattern ↔ remote pattern

with wildcard support:

  • + means one token
  • # means the remaining tail

Example

Remote:

{ "state", "#" }

maps to local:

{ "peer", "mcu-1", "state", "#" }

So a remote publish:

{"t":"pub","topic":["state","net","link","wan0"], ...}

becomes locally:

{"peer","mcu-1","state","net","link","wan0"}

Recommendation for TinyGo

Mirror the same mechanism:

  • export rules for what the MCU may send out
  • import rules for what the MCU accepts from the CM5
  • proxy-call rules for directed RPC

Keep these static in v1.
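
The remapping step can be sketched as a single function. This is an illustration rather than the Lua implementation: it assumes "#" appears only as the last token of a pattern, that "#" may match an empty tail, and that "+" captures are substituted into the target pattern in order. remap is a hypothetical name.

```go
package main

// remap matches topic against the pattern from and, on success, rewrites
// it according to the pattern to. "+" matches exactly one token and "#"
// matches the remaining tail.
func remap(from, to, topic []string) ([]string, bool) {
	var plus []string // tokens captured by "+", in order
	var tail []string // tokens captured by "#"
	matchedTail := false
	for i, tok := range from {
		if tok == "#" {
			tail = topic[i:] // i <= len(topic) because earlier tokens matched
			matchedTail = true
			break
		}
		if i >= len(topic) {
			return nil, false // topic is shorter than the pattern
		}
		if tok == "+" {
			plus = append(plus, topic[i])
		} else if tok != topic[i] {
			return nil, false
		}
	}
	if !matchedTail && len(topic) != len(from) {
		return nil, false // topic is longer than a pattern without "#"
	}
	out := make([]string, 0, len(to)+len(tail))
	for _, tok := range to {
		switch {
		case tok == "#":
			out = append(out, tail...)
		case tok == "+" && len(plus) > 0:
			out = append(out, plus[0])
			plus = plus[1:]
		default:
			out = append(out, tok)
		}
	}
	return out, true
}
```

With from = {"state","#"} and to = {"peer","mcu-1","state","#"}, the topic state/net/link/wan0 from the example above maps to peer/mcu-1/state/net/link/wan0. Export rules use the same function with the two patterns swapped.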


8. Directed call mapping

There are two directions.

8.1 Lua local → TinyGo remote

Lua fabric binds local proxy endpoints. When called locally, Lua fabric sends:

{"t":"call","id":"...","topic":["rpc","mcu","reboot_to_bootloader"],"payload":{"reason":"update"},"timeout_ms":5000}

TinyGo fabric should:

  • map that topic to a local MCU service handler;
  • invoke it;
  • send back reply.

8.2 TinyGo local → Lua remote

TinyGo fabric may send a call to a configured remote target, for example:

{"t":"call","id":"...","topic":["rpc","hal","read_state"],"payload":{"ns":"config","key":"services"},"timeout_ms":5000}

Lua fabric maps that to a local call target and returns a reply.

Rule for both sides

If no route matches, send:

{"t":"reply","corr":"...","ok":false,"err":"no_route"}

Do not silently drop a call.


9. Retained state semantics

Retained state is simple in v1.

If retain=true on a pub, the receiver should treat that as the current retained value for the mapped topic.

If an unretain arrives, the receiver should clear the retained value for the mapped topic.

Reconnect behaviour

On link-up, the exporter should replay the current retained exported state.

On the Lua side this is implemented by watching retained lifecycle with replay enabled. The TinyGo side should follow the same model conceptually:

  • on session establishment, emit current retained exported state again;
  • later retained updates become pub(..., retain=true);
  • later retained removals become unretain.

10. Session state

The Lua side currently uses these session states:

  • opening
  • session_up
  • ready
  • down

Meaning:

  • opening: transport is up, handshake/local setup incomplete
  • session_up: peer session established, local forwarding surfaces not yet all installed
  • ready: peer session established and local forwarding surfaces installed
  • down: terminal failure for this link session

The TinyGo side does not need to mirror the exact names, but should have equivalent internal distinctions.

Suggested minimal state

At minimum:

  • link status
  • remote node id
  • local session id
  • remote session id
  • last hello seen
  • last heartbeat seen
  • pending outgoing calls by correlation id
  • import/export/proxy rule tables

11. Session replacement

A change in peer session id is significant.

If a peer sends hello or hello_ack with a sid that differs from the currently recorded peer sid, treat it as a fresh peer session:

  • discard pending call correlation state;
  • replace the recorded peer-session state;
  • keep the transport up.

Late replies from the old peer session should be dropped.
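
Session replacement and late-reply dropping fall out naturally if pending calls live in a map that is emptied when the peer sid changes. A sketch, with hypothetical names throughout; failing local callers with the error string "session_replaced" is an assumption, since the text above only requires that the state be discarded.

```go
package main

// Peer tracks the remote session id and outgoing calls awaiting a reply.
type Peer struct {
	remoteSID string
	pending   map[string]func(ok bool, errText string)
}

// Track registers a completion callback for an outgoing call id.
func (p *Peer) Track(id string, done func(ok bool, errText string)) {
	if p.pending == nil {
		p.pending = make(map[string]func(ok bool, errText string))
	}
	p.pending[id] = done
}

// ObserveSID is called with the sid from every hello or hello_ack.
// It returns true when an existing peer session was replaced, after
// failing and clearing every pending call tied to the old session.
func (p *Peer) ObserveSID(sid string) bool {
	if p.remoteSID == sid {
		return false
	}
	replaced := p.remoteSID != ""
	p.remoteSID = sid
	if replaced {
		for id, done := range p.pending {
			done(false, "session_replaced") // assumed local error string
			delete(p.pending, id)
		}
	}
	return replaced
}

// Complete resolves one pending call from an incoming reply. It returns
// false for an unknown corr, which covers late replies from an old session.
func (p *Peer) Complete(corr string, ok bool, errText string) bool {
	done, found := p.pending[corr]
	if !found {
		return false
	}
	delete(p.pending, corr)
	done(ok, errText)
	return true
}
```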


12. Error handling rules

These should be followed on both sides.

Invalid JSON line

  • log it;
  • discard it;
  • do not immediately bring the whole session down.

Oversize line

  • log it;
  • discard it;
  • count it as a bad frame.

Unknown t

  • log it;
  • ignore it.

Malformed message of known type

  • log it;
  • ignore it;
  • if it is recognisably a call and has a usable id, send a best-effort reply with ok=false.

Call with no route

  • reply with ok=false, err="no_route".

Local handler failure

  • reply with ok=false, err="<reason>".

Timeout waiting for reply

  • local caller times out;
  • clear pending entry;
  • treat late reply as unknown and drop it.

Repeated bad frames

A few bad frames should not kill the session immediately. Repeated bad frames within a time window should.

Current Lua defaults are:

  • bad frame limit: 5
  • bad frame window: 30s

These are local policy defaults, not wire-level requirements.


13. Timeouts and liveness

Current Lua defaults are:

  • hello retry: 10s
  • idle ping interval: 15s
  • stale link timeout: 45s

Directed call timeout:

  • use timeout_ms if present and sensible;
  • otherwise use a local default, typically around 5s.

These are local policy defaults, not hard protocol guarantees.


14. Frame size limits

For v1, both sides should use a bounded maximum line size.

Current Lua default:

  • max_line_bytes = 4096

The TinyGo side should also enforce a fixed maximum line length and reject oversize input safely.

Do not allow unbounded line buffering.
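
A push-style framer suits an MCU UART receive path, where bytes arrive one at a time. The sketch below never buffers more than Max bytes: once a line overflows, it stops storing and simply drains until the next newline, then reports one oversize frame. Framer, Event and Push are hypothetical names; copying the line on return is a simplicity choice, not a requirement.

```go
package main

// Event describes the outcome of feeding one byte to the framer.
type Event int

const (
	Incomplete Event = iota // no full line yet
	Line                    // a complete line was returned
	Oversize                // an overlong line ended and was discarded
)

// Framer accumulates bytes into newline-terminated lines of at most Max
// bytes (terminator excluded).
type Framer struct {
	Max      int
	buf      []byte
	overflow bool
}

// Push feeds one received byte and reports whether a frame completed.
func (f *Framer) Push(c byte) ([]byte, Event) {
	if c != '\n' {
		if f.overflow {
			return nil, Incomplete // draining an oversize line
		}
		if len(f.buf) >= f.Max {
			f.overflow = true
			f.buf = f.buf[:0] // drop what was buffered so far
			return nil, Incomplete
		}
		f.buf = append(f.buf, c)
		return nil, Incomplete
	}
	if f.overflow {
		f.overflow = false
		return nil, Oversize
	}
	line := append([]byte(nil), f.buf...) // copy so the buffer can be reused
	f.buf = f.buf[:0]
	return line, Line
}
```

With Max set to 4096 this matches the Lua default. An Oversize event would be logged and counted against the bad-frame budget.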


15. Payload shape

Payloads are JSON values. In practice, use JSON objects for application-facing messages.

Do not put binary blobs directly into this v1 control-plane protocol. Firmware transfer is a later subprotocol.


16. Transport support

Current implemented transport on the Lua side is:

  • UART

Other transport kinds may appear later, but should not be assumed for current interop work.


17. TinyGo implementation structure

A sensible structure is:

UART transport

Responsible for:

  • reading bytes until newline
  • enforcing max line size
  • writing one JSON line plus newline

Session layer

Responsible for:

  • hello / hello_ack
  • ping / pong
  • peer session id tracking
  • pending call map
  • bad-frame budget
  • dispatch by t

Router

Responsible for:

  • applying import/export rules
  • mapping incoming pub to local topics
  • mapping incoming call to local handlers
  • forwarding local exported publishes to the wire
  • replaying retained exported state on session establishment

Local integration

Responsible for:

  • local publish
  • local retained update / clear
  • local directed call handling
  • local retained replay source

18. Deliberate non-feature in v1

Firmware transfer is not part of this first control protocol.

Once the control path is solid, a separate bulk-transfer subprotocol can be added with messages such as:

  • begin
  • ready
  • need
  • chunk
  • done
  • abort

Firmware transfer should not be implemented as “a very large JSON message”.


19. Practical examples

19.1 CM5 announces retained config to MCU

{"t":"pub","topic":["config","device"],"payload":{"schema":"devicecode.mcu/1","rev":3,"data":{"mode":"normal"}},"retain":true}

19.2 MCU publishes retained health to CM5

{"t":"pub","topic":["state","mcu","health"],"payload":{"ok":true,"temp_c":41.2},"retain":true}

19.3 MCU clears retained health

{"t":"unretain","topic":["state","mcu","health"]}

19.4 CM5 calls remote MCU method

{"t":"call","id":"1234","topic":["rpc","mcu","reboot_to_bootloader"],"payload":{"reason":"update"},"timeout_ms":5000}

Reply:

{"t":"reply","corr":"1234","ok":true,"payload":{"accepted":true}}

20. Minimal implementation checklist

For the first working milestone, the TinyGo side should do all of this:

  • UART open
  • bounded line reader
  • send hello
  • respond to hello with hello_ack
  • record peer sid
  • reset pending state on peer sid change
  • respond to ping with pong
  • accept incoming pub
  • accept incoming unretain
  • send outgoing pub
  • send outgoing unretain
  • accept incoming call
  • return reply
  • send outgoing call
  • match incoming reply to pending map
  • apply static import/export/proxy rules
  • replay retained exported state on session establishment
  • log and safely drop malformed messages
  • close or reset session after repeated bad frames
