---
id: architecture
title: Architecture
slug: /reference/architecture
---

Aster is a peer-to-peer RPC framework built on top of iroh. This document explains the full stack, from the decorators you write down to the QUIC packets on the wire.

## The stack

```mermaid
graph TD
    subgraph PY["Python / TypeScript layer"]
        A["AsterServer, AsterClient, @service, @rpc<br/>ForyCodec, interceptors, trust/admission"]
    end

    subgraph RS["Rust bindings (PyO3 / NAPI-RS)"]
        B["IrohNode, NetClient, BlobsClient,<br/>DocsClient, GossipClient, HookManager"]
    end

    subgraph CORE["Core (aster_transport_core)"]
        C["CoreNode, CoreNetClient, CoreBlobsClient,<br/>CoreDocsClient, CoreGossipClient, AsterQueueHandler"]
    end

    subgraph IROH["Iroh crates"]
        D["iroh (endpoint, router), iroh-blobs,<br/>iroh-docs, iroh-gossip"]
    end

    subgraph NET["Network"]
        E["UDP, relay servers, NAT traversal"]
    end

    PY -- "PyO3 / NAPI-RS async bridge" --> RS
    RS -- "Rust function calls" --> CORE
    CORE -- "async Rust (tokio)" --> IROH
    IROH -- "QUIC (quinn)" --> NET
```

Each layer has a clear responsibility:

- **Language layer** -- Application-facing API. Service definitions, RPC call/dispatch, serialization (Fory), interceptors, trust/admission logic. All pure Python/TypeScript except for the native bindings import.
- **Rust bindings** -- Thin PyO3/NAPI-RS wrappers that expose Rust async functions as language-native awaitables. No business logic lives here. Each wrapper calls into core and bridges the async boundary.
- **Core** -- The authoritative backend. `CoreNode` manages the iroh `Router`, which dispatches connections by ALPN. All real logic (node creation, blob operations, doc sync, gossip, connection management) lives here.
- **Iroh crates** -- The underlying P2P networking stack. Provides QUIC endpoints with relay fallback, content-addressed blob storage, CRDT document sync, and gossip pub-sub.
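
To make the language layer concrete, here is a minimal, self-contained sketch of what a service definition at that layer looks like. The `@service`/`@rpc` decorators below are illustrative stand-ins (Aster's real decorators record much richer metadata, and their exact signatures are an assumption here); the `HelloService`/`say_hello` names match the example used later in this document.

```python
import asyncio
from dataclasses import dataclass


# Stand-in decorators so this sketch runs on its own. The real @service
# and @rpc come from the aster package; signatures here are assumptions.
def service(version: int = 1):
    def wrap(cls):
        cls.__aster_version__ = version  # illustrative metadata marker
        return cls
    return wrap


def rpc(fn):
    fn.__aster_rpc__ = True  # mark the method as an RPC endpoint
    return fn


@dataclass
class HelloRequest:
    name: str


@dataclass
class HelloResponse:
    greeting: str


@service(version=1)
class HelloService:
    @rpc
    async def say_hello(self, req: HelloRequest) -> HelloResponse:
        return HelloResponse(greeting=f"Hello, {req.name}!")


resp = asyncio.run(HelloService().say_hello(HelloRequest(name="world")))
assert resp.greeting == "Hello, world!"
```

Everything below the decorators is plain application code: the framework's job is to carry `HelloRequest` bytes to the node that actually implements `say_hello`.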


## The unified node

The central design decision in Aster is that everything runs through a single iroh endpoint. When `AsterServer.start()` creates the node, it registers all protocol handlers on one `Router`:

```mermaid
graph TD
    NODE["IrohNode"]
    ROUTER["Router"]

    NODE --> ROUTER

    ROUTER --> BLOBS["iroh-blobs<br/><code>BLOBS_ALPN</code>"]
    ROUTER --> DOCS["iroh-docs<br/><code>DOCS_ALPN</code>"]
    ROUTER --> GOSSIP["iroh-gossip<br/><code>GOSSIP_ALPN</code>"]
    ROUTER --> RPC["Aster RPC<br/><code>aster/1</code>"]
    ROUTER --> CA["Consumer Admission<br/><code>aster.consumer_admission</code>"]
    ROUTER --> PA["Producer Admission<br/><code>aster.producer_admission</code>"]

    RPC --> QH1["AsterQueueHandler"]
    CA --> QH2["AsterQueueHandler"]
    PA --> QH3["AsterQueueHandler"]
```

### How the Router works

Iroh's `Router` is the central connection dispatcher. When a peer opens a QUIC connection, the ALPN (Application-Layer Protocol Negotiation) string in the TLS handshake determines which `ProtocolHandler` receives the connection.

Built-in iroh protocols register their own handlers:

- `iroh-blobs` registers for `BLOBS_ALPN` (blob transfer protocol)
- `iroh-docs` registers for `DOCS_ALPN` (document sync protocol)
- `iroh-gossip` registers for `GOSSIP_ALPN` (gossip broadcast protocol)

### AsterQueueHandler

Custom Aster ALPNs (`aster/1`, `aster.consumer_admission`, `aster.producer_admission`) are each registered as an `AsterQueueHandler` instance. This is a thin `ProtocolHandler` implementation that does one thing: when a connection arrives, it places `(alpn, connection)` on a bounded tokio mpsc channel and returns.

All `AsterQueueHandler` instances share a single sender, so from the consumer's perspective there is one unified stream of incoming connections. The language layer calls `node.accept_aster()` to pull from this channel:

```rust
// Rust core (simplified)
struct AsterQueueHandler {
    alpn: Vec<u8>,
    tx: mpsc::Sender<(Vec<u8>, Connection)>,
}

impl ProtocolHandler for AsterQueueHandler {
    async fn accept(&self, conn: Connection) -> Result<()> {
        self.tx.send((self.alpn.clone(), conn)).await?;
        Ok(())
    }
}
```

The Python accept loop in `AsterServer._accept_loop()` then dispatches based on the ALPN:

```python
# Python (simplified)
while True:
    alpn, conn = await self._node.accept_aster()
    if alpn == b"aster/1":
        asyncio.create_task(self._server.handle_connection(conn))
    elif alpn == b"aster.consumer_admission":
        asyncio.create_task(handle_consumer_admission(conn, ...))
    elif alpn == b"aster.producer_admission":
        asyncio.create_task(handle_producer_admission(conn, ...))
```

### Why one endpoint matters

A single endpoint means:

- **One node ID** -- peers only need one address to reach you. The same `NodeAddr` works for RPC calls, blob downloads, doc sync, and gossip.
- **One relay connection** -- iroh maintains a persistent connection to a relay server for NAT traversal. One endpoint = one relay connection, not N.
- **Connect once, use any protocol** -- a peer that has been admitted via consumer admission can subsequently call RPC methods, fetch blobs, sync docs, and join gossip topics. No additional connections needed.
- **Gate 0 applies to everything** -- the connection-level admission hook gates ALL protocols. You cannot bypass RPC admission by connecting on the blobs ALPN.


## The async bridge

### Language bindings to Rust

The PyO3 module init starts a single shared tokio runtime. Every async Rust function exposed to Python uses `pyo3_async_runtimes::tokio::future_into_py` (from the `pyo3-async-runtimes` crate) to convert a Rust `Future` into a Python awaitable. TypeScript uses NAPI-RS with a similar pattern.

```mermaid
sequenceDiagram
    participant Py as Python asyncio
    participant PyO3 as PyO3 wrapper
    participant Tokio as Tokio runtime
    participant Iroh as Iroh (Rust)

    Py->>PyO3: await node.accept_aster()
    PyO3->>Tokio: future_into_py(async { ... })
    Tokio->>Iroh: poll Rust future
    Iroh-->>Tokio: connection arrives
    Tokio-->>PyO3: result ready
    PyO3-->>Py: return to event loop
```

This means:

- Python's `asyncio` event loop (or Node.js's event loop) drives the application.
- Rust's tokio runtime executes the actual I/O.
- The bridge converts between them without blocking either.
- Python tests use `asyncio_mode = "auto"` (pytest-asyncio) so every `async def test_*` just works.
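
The shape of the bridge can be mimicked in pure Python: a worker thread plays the role of the tokio runtime, and `loop.call_soon_threadsafe` hands the result back to the event loop without blocking it. This is an analogy, not Aster code; the function name and plumbing are illustrative.

```python
import asyncio
import threading


def spawn_on_runtime(loop: asyncio.AbstractEventLoop, blocking_io) -> asyncio.Future:
    """Run blocking_io on a worker thread and resolve an asyncio Future
    on the caller's event loop -- the same shape as future_into_py."""
    fut = loop.create_future()

    def worker():
        result = blocking_io()  # stands in for the Rust future doing real I/O
        # Futures are not thread-safe; marshal the result back to the loop.
        loop.call_soon_threadsafe(fut.set_result, result)

    threading.Thread(target=worker, daemon=True).start()
    return fut


async def main():
    loop = asyncio.get_running_loop()
    # The event loop stays free to run other tasks while the "runtime"
    # (the worker thread) does the waiting.
    return await spawn_on_runtime(loop, lambda: "connection arrives")


result = asyncio.run(main())
assert result == "connection arrives"
```

The real bridge does the same handoff across the FFI boundary: tokio polls the future, and the binding layer wakes the Python (or Node.js) future when the result is ready.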

### The FFI story for other languages

The core crate is designed to support multiple language bindings:

```mermaid
graph TD
    CORE["core/src/lib.rs<br/>(tokio-async API)"]

    CORE --> PYO3["PyO3 bridge<br/>future_into_py"]
    CORE --> NAPI["NAPI-RS bridge<br/>napi async"]
    CORE --> CFFI["C FFI (ffi/)<br/>runtime.spawn +<br/>completion event queue"]
    CORE --> JNI["Java JNI (future)<br/>runtime.spawn +<br/>completion event queue"]
```

- **Python** -- `future_into_py` bridges directly because Python has native async/await.
- **TypeScript** -- NAPI-RS async tasks bridge to Node.js promises.
- **C FFI** -- `runtime.spawn()` kicks off the async operation, then signals completion via an event queue. The C caller polls for results.
- **Java/Go** (future) -- Same `runtime.spawn()` + completion pattern.

The key insight: no new concurrency primitives cross the FFI boundary. The core crate speaks pure tokio async. Each binding layer adapts that to its language's concurrency model.
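
The spawn-plus-completion-queue pattern used by the C FFI can be sketched in a few lines. This is a model of the shape, not the actual `ffi/` API: `runtime_spawn` and `poll_completion` are illustrative names, and a thread stands in for the tokio runtime.

```python
import queue
import threading

# One shared completion queue -- the FFI equivalent of the event queue
# the C caller polls.
completion_events: "queue.Queue[tuple[int, object]]" = queue.Queue()
_next_handle = 0


def runtime_spawn(work) -> int:
    """Start async work and return a handle the caller can correlate later."""
    global _next_handle
    _next_handle += 1
    handle = _next_handle

    def worker():
        result = work()
        completion_events.put((handle, result))  # signal completion

    threading.Thread(target=worker, daemon=True).start()
    return handle


def poll_completion(timeout: float = 5.0) -> tuple[int, object]:
    """What a C caller would do in its own event loop: poll for results."""
    return completion_events.get(timeout=timeout)


h = runtime_spawn(lambda: 40 + 2)
handle, result = poll_completion()
assert handle == h and result == 42
```

Because only handles and plain result values cross the boundary, the same pattern ports to any language with a way to call C functions, which is exactly the point made above.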


## Identity and profile architecture

Aster separates three concerns in its trust and identity model:

```mermaid
graph LR
    subgraph OP["Operator (workstation)"]
        A["~/.aster/config.toml<br/>profile + root_pubkey"]
        B["OS keyring<br/>root_privkey (never on nodes)"]
    end

    subgraph NODE["Node (deployed)"]
        C[".aster-identity<br/>secret_key + endpoint_id<br/>signed peer entries"]
    end

    subgraph MESH["Mesh (shared trust)"]
        D["root_pubkey<br/>(32-byte ed25519 public key)<br/>verifies all credentials"]
    end

    OP -- "aster enroll node" --> NODE
    NODE -- "loads root_pubkey<br/>from peer entry" --> MESH
    OP -- "signs credentials<br/>with root_privkey" --> MESH
```

### Operator layer

The operator manages deployment meshes from a workstation. The CLI profile system (`aster profile create/list/use/show/delete`) stores mesh metadata in `~/.aster/config.toml`. The root private key -- the most sensitive secret -- is stored in the OS keyring via the `keyring` package, scoped by profile name (`root_privkey:{name}`). It never touches a running Aster node.

### Node layer

Each deployed node has an `.aster-identity` file containing its secret key (for stable EndpointId) and one or more signed peer entries (enrollment credentials). The file is generated by `aster enroll node` on the operator's machine and deployed alongside the application. A single identity file can hold credentials for multiple meshes and roles.

### Mesh layer

A mesh is defined by its root public key. All nodes in the same mesh share the same root public key and use it to verify enrollment credentials. When `AsterServer` or `AsterClient` loads a peer entry from `.aster-identity`, the `root_pubkey` field in that entry tells the node which mesh it belongs to.

### Configuration flow

```mermaid
sequenceDiagram
    participant Op as Operator (workstation)
    participant KR as OS Keyring
    participant File as .aster-identity
    participant Srv as AsterServer (node)

    Op->>Op: aster profile create prod
    Op->>KR: aster keygen root --profile prod
    KR-->>Op: root_privkey stored

    Op->>KR: aster enroll node --role producer
    KR-->>Op: read root_privkey
    Op->>Op: generate node keypair + sign credential
    Op->>File: write .aster-identity

    Srv->>File: AsterServer(peer="billing-producer")
    File-->>Srv: secret_key + peer entry
    Srv->>Srv: configure admission gates
    Note over Srv: root_privkey is NOT present on the node
```

In dev mode (no `.aster-identity`), the entire identity/profile system is bypassed. Ephemeral keys are generated in memory, consumer gates are opened, and everything works without any files.


## Contract identity

Every `@service`-decorated class gets a deterministic `contract_id` -- a BLAKE3 hash of the service's canonical form. This is how Aster knows two services are the same, even across languages.

### How it works

1. The `@service` decorator scans the class and extracts a `ServiceInfo`: method names, parameter types, return types, patterns (unary/streaming), version.

2. `contract_id_from_service(cls)` builds a `ServiceContract` -- a normalized data structure containing:
   - Service name (NFC-normalized Unicode)
   - Version number
   - Scope (shared or stream)
   - Each method: name, pattern, request type graph, response type graph
   - Type definitions: field names, field types, containers (list/set/map), nested references

3. The `ServiceContract` is serialized to bytes using a canonical binary format:
   - Deterministic field ordering (alphabetical by spec)
   - varint length-prefixed strings
   - zigzag-encoded integers
   - No padding, no alignment

4. The bytes are hashed with BLAKE3, producing a 32-byte digest (displayed as 64-char hex).

### Why this matters

The contract_id is the same for any implementation that defines the same methods with the same types at the same version. A Python service and a TypeScript service with identical method signatures produce identical contract IDs. This enables:

- **Safe client/server version checking** -- the consumer's contract_id must match the producer's.
- **Registry deduplication** -- two producers serving the same contract are interchangeable.
- **Cross-language interop** -- a Python client can call a TypeScript server (or vice versa) as long as contract IDs match.

### Type graph resolution

Types can be recursive (e.g., a tree node that contains child tree nodes). The contract identity system handles this using Tarjan's algorithm for strongly connected components (SCC):

```mermaid
graph TD
    A["Build directed graph<br/>of type references"] --> B["Find SCCs<br/>(Tarjan's algorithm)"]
    B --> C["Break cycles with<br/>SELF_REF markers"]
    C --> D["Hash types in reverse<br/>topological order<br/>(leaves first)"]
    D --> E["Reference types by<br/>their BLAKE3 hash"]
```

This ensures that even recursive types produce a stable, deterministic contract_id.
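
A textbook Tarjan implementation shows why the hashing order falls out for free: the algorithm emits SCCs in reverse topological order of the condensation graph, i.e. leaves first. The type names below are illustrative.

```python
def tarjan_scc(graph: dict[str, list[str]]) -> list[list[str]]:
    """Strongly connected components, emitted in reverse topological
    order (leaves first) -- the order types can be hashed in."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    on_stack: set[str] = set()
    stack: list[str] = []
    sccs: list[list[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


# A recursive type: TreeNode references itself and a plain Leaf type.
refs = {"TreeNode": ["TreeNode", "Leaf"], "Leaf": []}
assert tarjan_scc(refs) == [["Leaf"], ["TreeNode"]]  # leaves first
```

Each singleton SCC is hashed directly; a multi-node SCC (mutually recursive types) is collapsed into one unit whose internal self-references are replaced with `SELF_REF` markers before hashing.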


## RPC wire protocol

When a client calls an RPC method, here is what happens on the wire:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: Open QUIC bidirectional stream
    C->>S: Frame: StreamHeader (HEADER flag)<br/>service, method, version, metadata
    C->>S: Frame: Request payload (Fory-serialized)
    S->>C: Frame: Response payload (Fory-serialized)
    S->>C: Frame: RpcStatus trailer (TRAILER flag)<br/>code: OK
```

Each frame is length-prefixed with flags:

- `HEADER` (0x04) -- first frame contains the `StreamHeader`
- `TRAILER` (0x02) -- last frame contains `RpcStatus` (success or error)
- `COMPRESSED` (0x01) -- payload is zstd-compressed (auto for payloads > 4KB)
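
A framing round-trip can be sketched as follows. The flag values come from the list above; the exact prefix layout (4-byte big-endian length followed by a 1-byte flags field) is an assumption for illustration, and stdlib `zlib` stands in for zstd.

```python
import struct
import zlib

HEADER, TRAILER, COMPRESSED = 0x04, 0x02, 0x01
COMPRESS_THRESHOLD = 4 * 1024  # auto-compress payloads > 4KB


def encode_frame(payload: bytes, flags: int = 0) -> bytes:
    if len(payload) > COMPRESS_THRESHOLD:
        payload = zlib.compress(payload)  # stand-in for zstd
        flags |= COMPRESSED
    # Assumed layout: u32 big-endian payload length, u8 flags, payload.
    return struct.pack(">IB", len(payload), flags) + payload


def decode_frame(buf: bytes) -> tuple[int, bytes]:
    length, flags = struct.unpack_from(">IB", buf)
    payload = buf[5:5 + length]
    if flags & COMPRESSED:
        payload = zlib.decompress(payload)
    return flags, payload


# A large request payload gets compressed transparently.
frame = encode_frame(b"x" * 10_000, flags=HEADER)
flags, payload = decode_frame(frame)
assert flags & HEADER and flags & COMPRESSED and payload == b"x" * 10_000
```

A small trailer frame takes the other branch: below the threshold the payload travels uncompressed, and the `COMPRESSED` bit stays clear.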

Serialization uses Apache Fory in XLANG (cross-language) mode by default. The `ForyCodec` handles encoding/decoding and transparent compression.


## Putting it all together

Here is the complete flow of an RPC call from `AsterClient` to `AsterServer`:

```mermaid
sequenceDiagram
    participant AC as AsterClient
    participant EP as QUIC Endpoint
    participant AS as AsterServer
    participant Svc as HelloService

    Note over AC: 1. Connect
    AC->>EP: Create QUIC endpoint
    AC->>AS: Connect on aster.consumer_admission ALPN
    AC->>AS: Send signed credential
    AS-->>AC: Service list + admission
    AS->>AS: Add client to Gate 0 allowlist

    Note over AC: 2. Get client stub
    AC->>AC: client = await c.client(HelloService)
    AC->>EP: Open connection on aster/1 ALPN

    Note over AC: 3. RPC call
    AC->>AS: Open bidirectional stream
    AC->>AS: StreamHeader + Fory-serialized request
    AS->>Svc: Dispatch to say_hello()
    Svc-->>AS: HelloResponse
    AS-->>AC: Fory-serialized response + OK trailer

    Note over AC: 4. Close
    AC->>EP: Close connections + endpoint
```

All of this happens over iroh's QUIC transport with automatic relay fallback, NAT traversal, and connection migration. The application code sees none of this complexity -- just `await hello.say_hello(request)`.
