research: mesh-llm integration — p2p inference mesh + model sharding under obol-stack economic layer

## Summary

Explore integrating [mesh-llm](https://github.com/michaelneale/mesh-llm) (by @michaelneale at Block, signal-boosted by Jack Dorsey) as the p2p networking and inference routing layer underneath obol-stack's economic layer.

mesh-llm provides what obol-stack currently lacks (p2p mesh networking, model sharding across nodes, smart inference routing). obol-stack provides what mesh-llm lacks (payments, economic incentives, on-chain identity).

Source: https://x.com/jack/status/2039736688457507251 (523 likes, 261 bookmarks)

## What mesh-llm does

Repo: https://github.com/michaelneale/mesh-llm (Rust, actively maintained, last push Apr 2 2026)
Docs: https://docs.anarchai.org

### Discovery — Two mechanisms

**Private mesh (invite token):**
- First node prints a base64 token containing its iroh QUIC endpoint address
- Others join via `--join <token>`
- Direct peer connection, no registry needed

**Public mesh (Nostr relay):**
- Nodes publish mesh listings to Nostr relays (kind 31990 events)
- Listings contain: invite token, served models, VRAM, node count
- `--auto` flag discovers meshes from Nostr
- Named meshes use deterministic IDs: `sha256("mesh-llm:" + name + nostr_pubkey)`
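
A minimal sketch of the named-mesh ID derivation above. The formula comes from the bullet; the UTF-8 encoding, the hex-string pubkey, and the hex-encoded output are assumptions, and `named_mesh_id` is a hypothetical helper name, not a mesh-llm function.

```python
import hashlib


def named_mesh_id(name: str, nostr_pubkey: str) -> str:
    """Derive a deterministic mesh ID from a mesh name and a Nostr pubkey."""
    # Assumption: the three parts are concatenated as UTF-8 text
    # and the digest is rendered as lowercase hex.
    preimage = "mesh-llm:" + name + nostr_pubkey
    return hashlib.sha256(preimage.encode("utf-8")).hexdigest()
```

Because the ID depends only on the name and the publisher's key, two nodes can compute the same ID independently, while a different publisher reusing the name gets a different ID.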

### Networking

- Built on **iroh** (Rust QUIC library) with persistent QUIC connections
- ALPN negotiation: `mesh-llm/1` (protobuf v1) with backward compat
- Bi-streams multiplexed by first byte: gossip (0x01), tunnel/RPC (0x02), tunnel map (0x03), tunnel/HTTP (0x04), route request (0x05), peer down (0x06), peer leaving (0x07), blackboard (0x08), plugin channel (0x09)
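
The first-byte multiplexing can be sketched as a small dispatcher. The byte values are the ones listed above; the enum member names and `split_stream` are hypothetical, and the real framing after the first byte is not covered here.

```python
from enum import IntEnum


class StreamType(IntEnum):
    """Stream kinds keyed by the first byte of a bi-stream."""
    GOSSIP = 0x01
    TUNNEL_RPC = 0x02
    TUNNEL_MAP = 0x03
    TUNNEL_HTTP = 0x04
    ROUTE_REQUEST = 0x05
    PEER_DOWN = 0x06
    PEER_LEAVING = 0x07
    BLACKBOARD = 0x08
    PLUGIN_CHANNEL = 0x09


def split_stream(data: bytes) -> tuple[StreamType, bytes]:
    """Return the stream type and the bytes that follow the type byte."""
    if not data:
        raise ValueError("empty stream: missing type byte")
    try:
        kind = StreamType(data[0])
    except ValueError:
        raise ValueError(f"unknown stream type byte 0x{data[0]:02x}") from None
    return kind, data[1:]
```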

### Gossip Protocol

- Protobuf `GossipFrame` with `PeerAnnouncement` messages
- Contains: endpoint ID, role, VRAM, serving models, model demand, RTT, GPU info
- Identity cryptographically verified against QUIC connection
- Model demand signals propagate infectiously with 24h TTL decay
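
One possible reading of "infectious propagation with 24h TTL decay", sketched under assumptions: each node keeps a last-seen timestamp per model, merges in whatever its peers announce, and drops entries older than 24 hours. The data layout and `merge_demand` are hypothetical; the actual `PeerAnnouncement` fields may differ.

```python
DEMAND_TTL_SECONDS = 24 * 60 * 60  # 24h TTL from the gossip description


def merge_demand(
    local: dict[str, float], remote: dict[str, float], now: float
) -> dict[str, float]:
    """Merge a peer's demand table into ours and expire stale entries.

    Both tables map model name -> UNIX timestamp of the latest demand seen.
    """
    merged = dict(local)
    for model, seen_at in remote.items():
        # Keep the freshest sighting so demand spreads peer to peer.
        merged[model] = max(merged.get(model, seen_at), seen_at)
    # Drop anything that has outlived the TTL.
    return {m: t for m, t in merged.items() if now - t < DEMAND_TTL_SECONDS}
```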

### Inference Routing — Multi-layer (the clever part)

1. **API Proxy** (port 9337): OpenAI-compatible HTTP endpoint, reads `model` field for routing
2. **Smart Router**: Classifies requests by category (Code, Reasoning, Chat, ToolCall, Creative, Info) and complexity (Quick/Moderate/Deep), scores available models
3. **Per-model host election**: Highest-VRAM node wins (deterministic tiebreak by node ID). Three modes:
   - **Solo**: Model fits on one machine, runs locally
   - **Pipeline parallel**: Dense model split across nodes proportional to VRAM, 80ms RTT cap
   - **MoE expert parallel**: Expert shards distributed, sessions hash-routed for KV cache locality
4. **Prefix affinity**: Routes repeat requests to same node for KV cache reuse
5. **Load balancing**: Round-robin for replicated models, session-sticky hash for MoE
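
The election, proportional split, and session-sticky routing described above can be sketched as follows. This is an illustration, not mesh-llm's code: the `Peer` type, the function names, the "smallest node ID wins a tie" rule, and the rounding strategy are all assumptions. Only the highest-VRAM rule, the VRAM-proportional split, the 80ms cap, and hash-based session routing come from the list.

```python
import hashlib
from dataclasses import dataclass

RTT_CAP_MS = 80.0  # pipeline-parallel RTT cap quoted above


@dataclass(frozen=True)
class Peer:
    node_id: str
    vram_gb: float
    rtt_ms: float


def elect_host(peers: list[Peer]) -> Peer:
    """Highest VRAM wins; ties are broken deterministically by node ID."""
    # Assumption: the lexicographically smallest node ID wins a tie.
    return min(peers, key=lambda p: (-p.vram_gb, p.node_id))


def split_layers(peers: list[Peer], n_layers: int) -> dict[str, int]:
    """Assign layers proportionally to VRAM among peers within the RTT cap."""
    eligible = [p for p in peers if p.rtt_ms <= RTT_CAP_MS]
    if not eligible:
        raise ValueError("no peers within the RTT cap")
    total = sum(p.vram_gb for p in eligible)
    shares = {p.node_id: int(n_layers * p.vram_gb / total) for p in eligible}
    # Hand out layers lost to rounding down, largest VRAM first.
    leftover = n_layers - sum(shares.values())
    for p in sorted(eligible, key=lambda p: (-p.vram_gb, p.node_id))[:leftover]:
        shares[p.node_id] += 1
    return shares


def pick_for_session(session_id: str, node_ids: list[str]) -> str:
    """Map a session to one node so repeat requests reach the same KV cache."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    ordered = sorted(node_ids)
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]
```

With nodes of 16, 24, and 16 GB inside the cap, an 80-layer model splits 23/35/22, and a fourth node at 200ms RTT is left out of the pipeline.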

### Tunnel System

The tunnel is a bidirectional TCP-to-QUIC relay. Each node opens one local TCP port per peer; llama.cpp connects to that local port, and the traffic is relayed over QUIC bi-streams to the remote node's rpc-server.
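
To show the shape of the relay, here is a sketch that uses plain TCP on both sides. That is a deliberate substitution: mesh-llm relays over iroh QUIC bi-streams, which this example does not implement. `start_relay` and `_pipe` are hypothetical names.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination."""
    # Simplification: closing here does not preserve TCP half-close semantics.
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def start_relay(remote_host: str, remote_port: int, local_port: int = 0):
    """Listen on a local TCP port and relay each connection to the remote.

    In mesh-llm the remote leg would be a QUIC bi-stream to the peer's
    rpc-server; a TCP connection stands in for it here.
    """

    async def handle(local_reader, local_writer):
        remote_reader, remote_writer = await asyncio.open_connection(
            remote_host, remote_port
        )
        # Run both directions concurrently; stop when both sides are done.
        await asyncio.gather(
            _pipe(local_reader, remote_writer),
            _pipe(remote_reader, local_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, "127.0.0.1", local_port)
```

The local process (llama.cpp in mesh-llm's case) only ever sees a localhost port, which is why it needs no knowledge of the mesh transport.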

## Gap Analysis: mesh-llm vs obol-stack

| Capability | mesh-llm | obol-stack | Integration opportunity |
|---|---|---|---|
| Node discovery | Nostr relays + invite tokens | ERC-8004 on-chain registry | Dual discovery: Nostr for fast p2p, ERC-8004 for persistent identity |
| Networking | iroh QUIC p2p | HTTP/REST | mesh-llm's QUIC mesh for inter-node, obol HTTP API for external clients |
| Payments | None (free sharing) | x402 USDC per-request | Add x402 payment gate at mesh-llm's API proxy layer |
| Model sharding | Pipeline + MoE parallel | Single node only | **Major gap** — mesh-llm enables running 70B+ across multiple consumer GPUs |
| Smart routing | Category + complexity scoring | Not yet | Adopt mesh-llm's router for obol-stack inference |
| Identity | QUIC crypto keys | EVM addresses + ERC-8004 NFTs | Bridge: ERC-8004 identity → iroh node identity |
| Incentives | Altruistic sharing | OPOW rewards + escrow | Add reward engine on top of mesh participation |
| GPU profiling | VRAM + RTT announcements via gossip | Hardware detection (PR #308) | Share hardware profiles across both layers |

## Integration Architecture

```
                    External Clients
                           │
                    ┌──────▼──────┐
                    │  obol-stack │
                    │  HTTP API   │
                    │  + x402     │
                    │  payments   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  mesh-llm   │
                    │  API proxy  │
                    │  (port 9337)│
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────▼────┐ ┌─────▼────┐ ┌─────▼────┐
        │ Node A   │ │ Node B   │ │ Node C   │
        │ 16GB GPU │ │ 24GB GPU │ │ 16GB GPU │
        │ iroh p2p │ │ iroh p2p │ │ iroh p2p │
        │ gossip   │ │ gossip   │ │ gossip   │
        └─────┬────┘ └─────┬────┘ └─────┬────┘
              │            │            │
              └────────────┴────────────┘
                  QUIC mesh (model sharding,
                  pipeline parallel, MoE parallel)
```

**Flow:**
1. External client hits obol-stack HTTP API with x402 payment
2. obol-stack validates payment, forwards to mesh-llm API proxy
3. mesh-llm smart router picks best node(s) based on model, VRAM, latency
4. If model needs sharding: pipeline/MoE parallel across mesh nodes
5. Response flows back through proxy → obol-stack → client
6. obol-stack settles payment, distributes rewards to participating nodes via OPOW
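
The flow above, reduced to a payment gate in front of the mesh proxy. This is a sketch under assumptions: the `X-PAYMENT` header name is assumed from x402 conventions, and `verify_payment`, `forward_to_mesh`, and `settle` are hypothetical callables standing in for obol-stack's verifier, the mesh-llm API proxy, and the settlement step.

```python
from typing import Callable, Mapping

PAYMENT_HEADER = "X-PAYMENT"  # assumption: header carrying the x402 payment payload


def handle_inference_request(
    headers: Mapping[str, str],
    body: bytes,
    verify_payment: Callable[[str], bool],
    forward_to_mesh: Callable[[bytes], tuple[int, bytes]],
    settle: Callable[[str], None],
) -> tuple[int, bytes]:
    """Gate a request on payment, forward it to the mesh, settle on success."""
    payment = headers.get(PAYMENT_HEADER)
    if payment is None or not verify_payment(payment):
        # Steps 1-2: no valid payment, so the request never reaches the mesh.
        return 402, b'{"error": "payment required"}'
    # Steps 3-5: the mesh picks nodes and returns the completion.
    status, response = forward_to_mesh(body)
    if status == 200:
        # Step 6: settle only after a successful response.
        settle(payment)
    return status, response
```

Settling only after a successful response is one design choice; charging for failed requests or partial work would need a different rule.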

## What This Enables

1. **Pool consumer GPUs to run large models**: Three 16GB GPUs (48GB combined) could run a 70B Q4 model (roughly 40GB of weights) via pipeline parallel, which is impossible on any single one of those nodes
2. **Smart model placement**: mesh-llm's election system automatically puts models on the best-equipped nodes
3. **Economic incentive for the mesh**: obol-stack's x402 + reward engine pays nodes proportional to their contribution
4. **Hybrid discovery**: Nostr for fast local mesh formation, ERC-8004 for persistent network-wide identity and reputation
5. **KV cache optimization**: Prefix affinity + session routing means repeat users hit warm caches
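
A back-of-envelope check of the pooled-GPU claim in item 1. The 4.5 bits per weight (4-bit values plus per-block scales) and the 6GB reserve for KV cache and runtime buffers are assumptions; real headroom depends on context length and on how evenly layers split across nodes.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: list[float],
    reserve_gb: float,
) -> bool:
    """True if pooled VRAM covers the weights plus a reserve."""
    needed = weights_gb(params_billion, bits_per_weight) + reserve_gb
    return needed <= sum(vram_gb)


print(weights_gb(70, 4.5))            # 39.375
print(fits(70, 4.5, [16, 16, 16], 6))  # True
print(fits(70, 4.5, [16], 6))          # False
```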

## Implementation Path

### Phase 1: Evaluate (this issue)
- [ ] Clone and run mesh-llm locally between SilverMesh + SilverXPS
- [ ] Benchmark pipeline parallel: split Qwen3.5-27B across two machines
- [ ] Measure overhead: latency added by QUIC tunnel vs direct inference
- [ ] Test Nostr discovery: how fast do nodes find each other?

### Phase 2: Bridge
- [ ] Add x402 payment middleware at mesh-llm's API proxy
- [ ] Map iroh node IDs to ERC-8004 identities
- [ ] Forward hardware announcements from mesh-llm gossip to obol-stack registry
- [ ] Add obol-stack reward distribution for mesh participants

### Phase 3: Integrate
- [ ] Ship mesh-llm as optional obol-stack component (Helm chart or binary)
- [ ] `obol mesh join` / `obol mesh create` commands
- [ ] Unified discovery: ERC-8004 registered nodes auto-join mesh
- [ ] Pipeline parallel for dense models and MoE expert parallel for MoE models, enabled by default when multiple nodes are detected

## Key Questions

1. **Rust ↔ Go bridge**: mesh-llm is Rust, obol-stack is Go. Integration options: (a) sidecar process with an HTTP/gRPC bridge, (b) FFI via cgo, (c) separate binary managed by obol-stack's Helm chart. Sidecar looks cleanest: it avoids cgo build complications and keeps the two codebases independently deployable.

2. **Nostr vs ERC-8004 for discovery**: Do we need both? Nostr is fast and free (no gas), ERC-8004 is persistent and composable. Probably both — Nostr for real-time mesh topology, ERC-8004 for identity and reputation that persists across mesh sessions.

3. **Payment granularity**: x402 is per-request. For pipeline parallel, a single request involves multiple nodes. How to split payment? Options: (a) proportional to VRAM contributed, (b) proportional to layers processed, (c) fixed split defined by election.
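
   Options (a) and (b) both reduce to a weighted split. A sketch, assuming amounts are integer micro-USDC (USDC has 6 decimals) and that leftover units from rounding go to the largest contributors; `split_payment` is a hypothetical helper, and the weights can be VRAM or layer counts.

```python
def split_payment(amount_micro_usdc: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a payment across nodes proportionally to integer weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Integer math only, so no micro-unit is created or lost to float error.
    payouts = {n: amount_micro_usdc * w // total for n, w in weights.items()}
    remainder = amount_micro_usdc - sum(payouts.values())
    # Assumption: rounding leftovers go to the largest contributors first.
    for node in sorted(weights, key=lambda n: (-weights[n], n))[:remainder]:
        payouts[node] += 1
    return payouts


# 1000 micro-USDC split by layers processed (23 / 35 / 22 of 80 layers).
print(split_payment(1000, {"a": 23, "b": 35, "c": 22}))
# {'a': 287, 'b': 438, 'c': 275}
```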

4. **RTT budget**: mesh-llm caps pipeline parallel at 80ms RTT, which limits geographical distribution. obol-stack nodes could be anywhere, so we need to define mesh topology constraints.

5. **Trust model**: mesh-llm authenticates peers (identity is verified against the QUIC connection) but otherwise trusts every mesh participant. obol-stack has commit-reveal verification and slashing. How to reconcile? Probably: mesh-llm handles networking, obol-stack handles economic trust.

## References

- mesh-llm repo: https://github.com/michaelneale/mesh-llm
- mesh-llm docs: https://docs.anarchai.org
- Jack Dorsey signal: https://x.com/jack/status/2039736688457507251
- iroh QUIC library: https://github.com/n0-computer/iroh
- obol-stack PR #288: inference monetization (M1 sell flow)
- obol-stack Issue #300: inference lifecycle
- obol-stack Issue #307: first-run wizard + buyer flow
- Nostr NIP for mesh discovery: kind 31990 (parameterized replaceable event)

## Labels

`research` `component:inference` `component:networking` `priority:medium` `size:XL`
