---
id: architecture
title: Architecture
slug: /reference/architecture
---

Aster is a peer-to-peer RPC framework built on top of iroh. This document explains the full stack, from the decorators you write down to the QUIC packets on the wire.

## The stack

```mermaid
graph TD
    subgraph PY["Python / TypeScript layer"]
        A["AsterServer, AsterClient, @service, @rpc<br/>ForyCodec, interceptors, trust/admission"]
    end

    subgraph RS["Rust bindings (PyO3 / NAPI-RS)"]
        B["IrohNode, NetClient, BlobsClient,<br/>DocsClient, GossipClient, HookManager"]
    end

    subgraph CORE["Core (aster_transport_core)"]
        C["CoreNode, CoreNetClient, CoreBlobsClient,<br/>CoreDocsClient, CoreGossipClient, AsterQueueHandler"]
    end

    subgraph IROH["Iroh crates"]
        D["iroh (endpoint, router), iroh-blobs,<br/>iroh-docs, iroh-gossip"]
    end

    subgraph NET["Network"]
        E["UDP, relay servers, NAT traversal"]
    end

    PY -- "PyO3 / NAPI-RS async bridge" --> RS
    RS -- "Rust function calls" --> CORE
    CORE -- "async Rust (tokio)" --> IROH
    IROH -- "QUIC (quinn)" --> NET
```

Each layer has a clear responsibility:

- **Language layer** -- Application-facing API. Service definitions, RPC call/dispatch, serialization (Fory), interceptors, trust/admission logic. All pure Python/TypeScript except for the native bindings import.
- **Rust bindings** -- Thin PyO3/NAPI-RS wrappers that expose Rust async functions as language-native awaitables. No business logic lives here. Each wrapper calls into core and bridges the async boundary.
- **Core** -- The authoritative backend. `CoreNode` manages the iroh `Router`, which dispatches connections by ALPN. All real logic (node creation, blob operations, doc sync, gossip, connection management) lives here.
- **Iroh crates** -- The underlying P2P networking stack. Provides QUIC endpoints with relay fallback, content-addressed blob storage, CRDT document sync, and gossip pub-sub.

## The unified node

The central design decision in Aster is that everything runs through a single iroh endpoint. When `AsterServer.start()` creates the node, it registers all protocol handlers on one `Router`:

```mermaid
graph TD
    NODE["IrohNode"]
    ROUTER["Router"]

    NODE --> ROUTER

    ROUTER --> BLOBS["iroh-blobs<br/><code>BLOBS_ALPN</code>"]
    ROUTER --> DOCS["iroh-docs<br/><code>DOCS_ALPN</code>"]
    ROUTER --> GOSSIP["iroh-gossip<br/><code>GOSSIP_ALPN</code>"]
    ROUTER --> RPC["Aster RPC<br/><code>aster/1</code>"]
    ROUTER --> CA["Consumer Admission<br/><code>aster.consumer_admission</code>"]
    ROUTER --> PA["Producer Admission<br/><code>aster.producer_admission</code>"]

    RPC --> QH1["AsterQueueHandler"]
    CA --> QH2["AsterQueueHandler"]
    PA --> QH3["AsterQueueHandler"]
```

### How the Router works

Iroh's `Router` is the central connection dispatcher. When a peer opens a QUIC connection, the ALPN (Application-Layer Protocol Negotiation) string in the TLS handshake determines which `ProtocolHandler` receives the connection.

Built-in iroh protocols register their own handlers:
- `iroh-blobs` registers for `BLOBS_ALPN` (blob transfer protocol)
- `iroh-docs` registers for `DOCS_ALPN` (document sync protocol)
- `iroh-gossip` registers for `GOSSIP_ALPN` (gossip broadcast protocol)

### AsterQueueHandler

Custom Aster ALPNs (`aster/1`, `aster.consumer_admission`, `aster.producer_admission`) are each registered as an `AsterQueueHandler` instance. This is a thin `ProtocolHandler` implementation that does one thing: when a connection arrives, it places `(alpn, connection)` on a bounded tokio mpsc channel and returns.

All `AsterQueueHandler` instances share a single sender, so from the consumer's perspective there is one unified stream of incoming connections. The language layer calls `node.accept_aster()` to pull from this channel:

```rust
// Rust core (simplified)
struct AsterQueueHandler {
    alpn: Vec<u8>,
    tx: mpsc::Sender<(Vec<u8>, Connection)>,
}

impl ProtocolHandler for AsterQueueHandler {
    async fn accept(&self, conn: Connection) -> Result<()> {
        self.tx.send((self.alpn.clone(), conn)).await?;
        Ok(())
    }
}
```

The Python accept loop in `AsterServer._accept_loop()` then dispatches based on the ALPN:

```python
# Python (simplified)
while True:
    alpn, conn = await self._node.accept_aster()
    if alpn == b"aster/1":
        asyncio.create_task(self._server.handle_connection(conn))
    elif alpn == b"aster.consumer_admission":
        asyncio.create_task(handle_consumer_admission(conn, ...))
    elif alpn == b"aster.producer_admission":
        asyncio.create_task(handle_producer_admission(conn, ...))
```
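
The shared-sender design can be modeled in plain Python: several per-ALPN handlers push into one `asyncio.Queue`, and a single loop drains it as one unified stream. This is a self-contained sketch of the pattern only -- `queue_handler`, the connection strings, and the queue size here are made up for illustration, not Aster's API:

```python
import asyncio


async def main() -> list[tuple[bytes, str]]:
    # One shared queue plays the role of the bounded tokio mpsc channel.
    queue: asyncio.Queue = asyncio.Queue(maxsize=64)

    # Each per-ALPN handler does one thing: tag the connection and enqueue it.
    async def queue_handler(alpn: bytes, conn: str) -> None:
        await queue.put((alpn, conn))

    # Simulate connections arriving on three different ALPNs.
    await queue_handler(b"aster/1", "conn-1")
    await queue_handler(b"aster.consumer_admission", "conn-2")
    await queue_handler(b"aster/1", "conn-3")

    # The unified accept loop sees a single stream of (alpn, connection) pairs.
    dispatched = []
    while not queue.empty():
        dispatched.append(await queue.get())
    return dispatched


print(asyncio.run(main()))
```

Because every handler holds a clone of the same sender, backpressure is also unified: if the accept loop stalls, all ALPNs slow down together.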

### Why one endpoint matters

A single endpoint means:
- **One node ID** -- peers only need one address to reach you. The same `NodeAddr` works for RPC calls, blob downloads, doc sync, and gossip.
- **One relay connection** -- iroh maintains a persistent connection to a relay server for NAT traversal. One endpoint = one relay connection, not N.
- **Connect once, use any protocol** -- a peer that has been admitted via consumer admission can subsequently call RPC methods, fetch blobs, sync docs, and join gossip topics. No additional connections needed.
- **Gate 0 applies to everything** -- the connection-level admission hook gates ALL protocols. You cannot bypass RPC admission by connecting on the blobs ALPN.

## The async bridge

### Language bindings to Rust

The PyO3 module init starts a single shared tokio runtime. Every async Rust function exposed to Python uses `pyo3_async_runtimes::tokio::future_into_py` to convert a Rust `Future` into a Python awaitable. TypeScript uses NAPI-RS with a similar pattern.

```mermaid
sequenceDiagram
    participant Py as Python asyncio
    participant PyO3 as PyO3 wrapper
    participant Tokio as Tokio runtime
    participant Iroh as Iroh (Rust)

    Py->>PyO3: await node.accept_aster()
    PyO3->>Tokio: future_into_py(async { ... })
    Tokio->>Iroh: poll Rust future
    Iroh-->>Tokio: connection arrives
    Tokio-->>PyO3: result ready
    PyO3-->>Py: return to event loop
```

This means:
- Python's `asyncio` event loop (or Node.js's event loop) drives the application.
- Rust's tokio runtime executes the actual I/O.
- The bridge converts between them without blocking either.
- Python tests use `asyncio_mode = "auto"` (pytest-asyncio) so every `async def test_*` just works.
| 149 | + |
| 150 | +### The FFI story for other languages |
| 151 | + |
| 152 | +The core crate is designed to support multiple language bindings: |
| 153 | + |
| 154 | +```mermaid |
| 155 | +graph TD |
| 156 | + CORE["core/src/lib.rs<br/>(tokio-async API)"] |
| 157 | +
|
| 158 | + CORE --> PYO3["PyO3 bridge<br/>future_into_py"] |
| 159 | + CORE --> NAPI["NAPI-RS bridge<br/>napi async"] |
| 160 | + CORE --> CFFI["C FFI (ffi/)<br/>runtime.spawn +<br/>completion event queue"] |
| 161 | + CORE --> JNI["Java JNI (future)<br/>runtime.spawn +<br/>completion event queue"] |
| 162 | +``` |
| 163 | + |
| 164 | +- **Python** -- `future_into_py` bridges directly because Python has native async/await. |
| 165 | +- **TypeScript** -- NAPI-RS async tasks bridge to Node.js promises. |
| 166 | +- **C FFI** -- `runtime.spawn()` kicks off the async operation, then signals completion via an event queue. The C caller polls for results. |
| 167 | +- **Java/Go** (future) -- Same `runtime.spawn()` + completion pattern. |
| 168 | + |
| 169 | +The key insight: no new concurrency primitives cross the FFI boundary. The core crate speaks pure tokio async. Each binding layer adapts that to its language's concurrency model. |
| 170 | + |

## Identity and profile architecture

Aster separates three concerns in its trust and identity model:

```mermaid
graph LR
    subgraph OP["Operator (workstation)"]
        A["~/.aster/config.toml<br/>profile + root_pubkey"]
        B["OS keyring<br/>root_privkey (never on nodes)"]
    end

    subgraph NODE["Node (deployed)"]
        C[".aster-identity<br/>secret_key + endpoint_id<br/>signed peer entries"]
    end

    subgraph MESH["Mesh (shared trust)"]
        D["root_pubkey<br/>(32-byte ed25519 public key)<br/>verifies all credentials"]
    end

    OP -- "aster enroll node" --> NODE
    NODE -- "loads root_pubkey<br/>from peer entry" --> MESH
    OP -- "signs credentials<br/>with root_privkey" --> MESH
```

### Operator layer

The operator manages deployment meshes from a workstation. The CLI profile system (`aster profile create/list/use/show/delete`) stores mesh metadata in `~/.aster/config.toml`. The root private key -- the most sensitive secret -- is stored in the OS keyring via the `keyring` package, scoped by profile name (`root_privkey:{name}`). It never touches a running Aster node.

### Node layer

Each deployed node has an `.aster-identity` file containing its secret key (for stable EndpointId) and one or more signed peer entries (enrollment credentials). The file is generated by `aster enroll node` on the operator's machine and deployed alongside the application. A single identity file can hold credentials for multiple meshes and roles.

### Mesh layer

A mesh is defined by its root public key. All nodes in the same mesh share the same root public key and use it to verify enrollment credentials. When `AsterServer` or `AsterClient` loads a peer entry from `.aster-identity`, the `root_pubkey` field in that entry tells the node which mesh it belongs to.
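
The verification relationship can be sketched with Ed25519 from the `cryptography` package. This illustrates the general scheme only -- the credential payload, its encoding, and the key-handling shown below are assumptions for illustration, not Aster's actual credential format:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Operator workstation: generate the mesh root keypair. The private half
# would live in the OS keyring; only the public key is shared with nodes.
root_privkey = Ed25519PrivateKey.generate()
root_pubkey = root_privkey.public_key()

# The mesh is identified by the raw 32-byte public key.
assert len(root_pubkey.public_bytes(Encoding.Raw, PublicFormat.Raw)) == 32

# Enrollment: the operator signs a node's credential bytes (payload made up).
credential = b"endpoint_id=ab12cd34;role=producer;mesh=prod"
signature = root_privkey.sign(credential)

# On any node in the mesh: verify against the shared root public key.
# verify() raises InvalidSignature if this root did not sign the credential.
root_pubkey.verify(signature, credential)
print("credential verified")
```

Note the asymmetry: nodes only ever hold the public half, so compromising a node never yields the ability to mint new credentials.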

### Configuration flow

```mermaid
sequenceDiagram
    participant Op as Operator (workstation)
    participant KR as OS Keyring
    participant File as .aster-identity
    participant Srv as AsterServer (node)

    Op->>Op: aster profile create prod
    Op->>KR: aster keygen root --profile prod
    KR-->>Op: root_privkey stored

    Op->>KR: aster enroll node --role producer
    KR-->>Op: read root_privkey
    Op->>Op: generate node keypair + sign credential
    Op->>File: write .aster-identity

    Srv->>File: AsterServer(peer="billing-producer")
    File-->>Srv: secret_key + peer entry
    Srv->>Srv: configure admission gates
    Note over Srv: root_privkey is NOT present on the node
```

In dev mode (no `.aster-identity`), the entire identity/profile system is bypassed. Ephemeral keys are generated in memory, consumer gates are opened, and everything works without any files.


## Contract identity

Every `@service`-decorated class gets a deterministic `contract_id` -- a BLAKE3 hash of the service's canonical form. This is how Aster knows two services are the same, even across languages.

### How it works

1. The `@service` decorator scans the class and extracts a `ServiceInfo`: method names, parameter types, return types, patterns (unary/streaming), version.

2. `contract_id_from_service(cls)` builds a `ServiceContract` -- a normalized data structure containing:
   - Service name (NFC-normalized Unicode)
   - Version number
   - Scope (shared or stream)
   - Each method: name, pattern, request type graph, response type graph
   - Type definitions: field names, field types, containers (list/set/map), nested references

3. The `ServiceContract` is serialized to bytes using a canonical binary format:
   - Deterministic field ordering (alphabetical by spec)
   - varint length-prefixed strings
   - zigzag-encoded integers
   - No padding, no alignment

4. The bytes are hashed with BLAKE3, producing a 32-byte digest (displayed as 64-char hex).
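
The steps above can be sketched end to end. This toy encoder is not Aster's actual canonical format (the real spec covers many more fields, and its ordering rules are richer), and `hashlib.blake2b` stands in for BLAKE3, which is not in the Python standard library:

```python
import hashlib
import unicodedata


def zigzag(n: int) -> int:
    # Map signed ints to unsigned so small magnitudes encode small.
    return (n << 1) ^ (n >> 63)


def varint(n: int) -> bytes:
    # Unsigned LEB128: 7 bits per byte, high bit signals continuation.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def canonical_str(s: str) -> bytes:
    # NFC-normalize, then varint-length-prefix the UTF-8 bytes.
    data = unicodedata.normalize("NFC", s).encode("utf-8")
    return varint(len(data)) + data


def contract_id(name: str, version: int, methods: list[str]) -> str:
    buf = canonical_str(name) + varint(zigzag(version))
    for m in sorted(methods):  # deterministic ordering
        buf += canonical_str(m)
    return hashlib.blake2b(buf, digest_size=32).hexdigest()


# Same definition, different declaration order -> identical digest.
a = contract_id("HelloService", 1, ["say_hello", "stream_greetings"])
b = contract_id("HelloService", 1, ["stream_greetings", "say_hello"])
assert a == b and len(a) == 64
print(a)
```

The properties that matter are visible even in this sketch: normalization plus deterministic ordering plus an unambiguous encoding means any faithful implementation, in any language, produces the same bytes and therefore the same hash.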

### Why this matters

The contract_id is the same for any implementation that defines the same methods with the same types at the same version. A Python service and a TypeScript service with identical method signatures produce identical contract IDs. This enables:

- **Safe client/server version checking** -- the consumer's contract_id must match the producer's.
- **Registry deduplication** -- two producers serving the same contract are interchangeable.
- **Cross-language interop** -- a Python client can call a TypeScript server (or vice versa) as long as contract IDs match.

### Type graph resolution

Types can be recursive (e.g., a tree node that contains child tree nodes). The contract identity system handles this using Tarjan's algorithm for strongly connected components (SCC):

```mermaid
graph TD
    A["Build directed graph<br/>of type references"] --> B["Find SCCs<br/>(Tarjan's algorithm)"]
    B --> C["Break cycles with<br/>SELF_REF markers"]
    C --> D["Hash types in reverse<br/>topological order<br/>(leaves first)"]
    D --> E["Reference types by<br/>their BLAKE3 hash"]
```

This ensures that even recursive types produce a stable, deterministic contract_id.
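
The SCC step can be illustrated with a compact Tarjan implementation over a toy type-reference graph. This shows the algorithm itself, not Aster's code; note that Tarjan naturally emits components leaves-first, which is exactly the "reverse topological order" the hashing step needs:

```python
def tarjan_scc(graph: dict[str, list[str]]) -> list[list[str]]:
    """Return strongly connected components in reverse topological order."""
    index: dict[str, int] = {}   # discovery order of each node
    low: dict[str, int] = {}     # lowest index reachable from the node
    on_stack: set[str] = set()
    stack: list[str] = []
    sccs: list[list[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC; pop the whole component.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


# Toy type graph: TreeNode references itself (a cycle of size one),
# Forest references TreeNode, Leaf references nothing.
graph = {"Forest": ["TreeNode"], "TreeNode": ["TreeNode", "Leaf"], "Leaf": []}
print(tarjan_scc(graph))  # → [['Leaf'], ['TreeNode'], ['Forest']]
```

In the real system, the self-cycle on `TreeNode` is where a SELF_REF marker would be substituted before hashing, so the component's bytes no longer depend on its own not-yet-computed hash.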


## RPC wire protocol

When a client calls an RPC method, here is what happens on the wire:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: Open QUIC bidirectional stream
    C->>S: Frame: StreamHeader (HEADER flag)<br/>service, method, version, metadata
    C->>S: Frame: Request payload (Fory-serialized)
    S->>C: Frame: Response payload (Fory-serialized)
    S->>C: Frame: RpcStatus trailer (TRAILER flag)<br/>code: OK
```

Each frame is length-prefixed with flags:
- `HEADER` (0x04) -- first frame contains the `StreamHeader`
- `TRAILER` (0x02) -- last frame contains `RpcStatus` (success or error)
- `COMPRESSED` (0x01) -- payload is zstd-compressed (auto for payloads > 4KB)
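
A toy encoder/decoder for this style of framing, using the flag values above. The 4-byte length prefix and exact byte layout are assumptions for illustration -- the authoritative layout is Aster's protocol spec, not this sketch:

```python
import struct

COMPRESSED, TRAILER, HEADER = 0x01, 0x02, 0x04


def encode_frame(flags: int, payload: bytes) -> bytes:
    # Assumed layout: 4-byte big-endian length, 1-byte flags, then payload.
    return struct.pack(">IB", len(payload), flags) + payload


def decode_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    # Returns (flags, payload, remaining bytes after this frame).
    length, flags = struct.unpack_from(">IB", buf)
    start = struct.calcsize(">IB")
    return flags, buf[start:start + length], buf[start + length:]


# Client side of a unary call: a header frame, then the request payload.
wire = (
    encode_frame(HEADER, b"service=HelloService;method=say_hello")
    + encode_frame(0, b"<fory-serialized request>")
)

flags, payload, rest = decode_frame(wire)
assert flags & HEADER and payload.startswith(b"service=")
flags, payload, rest = decode_frame(rest)
assert flags == 0 and rest == b""
```

Because every frame carries its own length, the receiver never needs a delimiter scan: it reads the prefix, reads exactly that many payload bytes, and checks the flags to decide whether the frame is a header, a trailer, or data.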

Serialization uses Apache Fory in XLANG (cross-language) mode by default. The `ForyCodec` handles encoding/decoding and transparent compression.


## Putting it all together

Here is the complete flow of an RPC call from `AsterClient` to `AsterServer`:

```mermaid
sequenceDiagram
    participant AC as AsterClient
    participant EP as QUIC Endpoint
    participant AS as AsterServer
    participant Svc as HelloService

    Note over AC: 1. Connect
    AC->>EP: Create QUIC endpoint
    AC->>AS: Connect on aster.consumer_admission ALPN
    AC->>AS: Send signed credential
    AS-->>AC: Service list + admission
    AS->>AS: Add client to Gate 0 allowlist

    Note over AC: 2. Get client stub
    AC->>AC: client = await c.client(HelloService)
    AC->>EP: Open connection on aster/1 ALPN

    Note over AC: 3. RPC call
    AC->>AS: Open bidirectional stream
    AC->>AS: StreamHeader + Fory-serialized request
    AS->>Svc: Dispatch to say_hello()
    Svc-->>AS: HelloResponse
    AS-->>AC: Fory-serialized response + OK trailer

    Note over AC: 4. Close
    AC->>EP: Close connections + endpoint
```

All of this happens over iroh's QUIC transport with automatic relay fallback, NAT traversal, and connection migration. The application code sees none of this complexity -- just `await hello.say_hello(request)`.