research: mesh-llm integration — p2p inference mesh + model sharding under obol-stack economic layer #309

@bussyjd

Summary

Explore integrating mesh-llm (by @michaelneale at Block, signal-boosted by Jack Dorsey) as the p2p networking and inference routing layer underneath obol-stack's economic layer.

mesh-llm provides what obol-stack currently lacks (p2p mesh networking, model sharding across nodes, smart inference routing). obol-stack provides what mesh-llm lacks (payments, economic incentives, on-chain identity).

Source: https://x.com/jack/status/2039736688457507251 (523 likes, 261 bookmarks)

What mesh-llm does

Repo: https://github.com/michaelneale/mesh-llm (Rust, actively maintained, last push Apr 2 2026)
Docs: https://docs.anarchai.org

Discovery — Two mechanisms

Private mesh (invite token):

  • First node prints a base64 token containing its iroh QUIC endpoint address
  • Others join via --join <token>
  • Direct peer connection, no registry needed

Public mesh (Nostr relay):

  • Nodes publish mesh listings to Nostr relays (kind 31990 events)
  • Listings contain: invite token, served models, VRAM, node count
  • --auto flag discovers meshes from Nostr
  • Named meshes use deterministic IDs: sha256("mesh-llm:" + name + nostr_pubkey)
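
The deterministic ID rule above can be sketched in a few lines. This is a minimal illustration, assuming the three parts are concatenated as UTF-8 text, the public key is passed as a hex string, and the digest is hex-encoded; mesh-llm's actual byte layout may differ, and `named_mesh_id` is a hypothetical helper name.

```python
import hashlib


def named_mesh_id(name: str, nostr_pubkey_hex: str) -> str:
    """Derive a deterministic mesh ID from a mesh name and a Nostr public key.

    Assumption: plain string concatenation, UTF-8 encoded, hex digest out.
    """
    preimage = ("mesh-llm:" + name + nostr_pubkey_hex).encode("utf-8")
    return hashlib.sha256(preimage).hexdigest()


# Hypothetical inputs, for illustration only.
pubkey = "ab" * 32
mesh_id = named_mesh_id("home-lab", pubkey)
```

Because the ID depends only on the name and the publisher's key, any node that knows both can recompute it without contacting a registry.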

Networking

  • Built on iroh (Rust QUIC library) with persistent QUIC connections
  • ALPN negotiation: mesh-llm/1 (protobuf v1) with backward compat
  • Bi-streams multiplexed by first byte: gossip (0x01), tunnel/RPC (0x02), tunnel map (0x03), tunnel/HTTP (0x04), route request (0x05), peer down (0x06), peer leaving (0x07), blackboard (0x08), plugin channel (0x09)
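
To make the first-byte multiplexing concrete, here is a small sketch of how a receiver might classify an incoming bi-stream. The byte values come from the list above; the enum names and the `classify_stream` helper are hypothetical and are not taken from mesh-llm's Rust code.

```python
from enum import IntEnum


class StreamType(IntEnum):
    """Stream kinds keyed by the first byte of each bi-stream."""
    GOSSIP = 0x01
    TUNNEL_RPC = 0x02
    TUNNEL_MAP = 0x03
    TUNNEL_HTTP = 0x04
    ROUTE_REQUEST = 0x05
    PEER_DOWN = 0x06
    PEER_LEAVING = 0x07
    BLACKBOARD = 0x08
    PLUGIN_CHANNEL = 0x09


def classify_stream(first_chunk: bytes) -> tuple[StreamType, bytes]:
    """Split the type byte from the rest of the first chunk."""
    if not first_chunk:
        raise ValueError("empty stream: no type byte")
    try:
        kind = StreamType(first_chunk[0])
    except ValueError:
        raise ValueError(f"unknown stream type byte 0x{first_chunk[0]:02x}") from None
    return kind, first_chunk[1:]


kind, payload = classify_stream(b"\x01hello")
```

Rejecting unknown type bytes explicitly keeps a newer peer's stream kinds from being misread as an existing one.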

Gossip Protocol

  • Protobuf GossipFrame with PeerAnnouncement messages
  • Contains: endpoint ID, role, VRAM, serving models, model demand, RTT, GPU info
  • Identity cryptographically verified against QUIC connection
  • Model demand signals propagate infectiously with 24h TTL decay
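
One plausible reading of "24h TTL decay" is that each demand signal carries a last-seen timestamp and is dropped once it is older than 24 hours. The sketch below assumes exactly that, plus a "newest timestamp wins" merge when two peers exchange demand maps. The map shape and `merge_demand` are hypothetical; mesh-llm may weight or decay demand differently.

```python
DEMAND_TTL_SECONDS = 24 * 60 * 60  # 24h TTL from the gossip description


def merge_demand(
    local: dict[str, float], remote: dict[str, float], now: float
) -> dict[str, float]:
    """Merge two {model: last_requested_unix_time} maps.

    Keeps the newest timestamp per model, then drops entries older than the TTL.
    """
    merged = dict(local)
    for model, seen_at in remote.items():
        merged[model] = max(seen_at, merged.get(model, seen_at))
    return {m: t for m, t in merged.items() if now - t <= DEMAND_TTL_SECONDS}


now = 100_000.0
local = {"model-a": now - 3600}                            # requested an hour ago
remote = {"model-a": now - 60, "model-b": now - 90_000}    # model-b is past the TTL
demand = merge_demand(local, remote, now)
```

Since the merge only takes a maximum per key, repeated exchanges between peers converge on the same map regardless of the order in which gossip arrives.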

Inference Routing — Multi-layer (the clever part)

  1. API Proxy (port 9337): OpenAI-compatible HTTP endpoint, reads model field for routing
  2. Smart Router: Classifies requests by category (Code, Reasoning, Chat, ToolCall, Creative, Info) and complexity (Quick/Moderate/Deep), scores available models
  3. Per-model host election: Highest-VRAM node wins (deterministic tiebreak by node ID). Three modes:
    • Solo: Model fits on one machine, runs locally
    • Pipeline parallel: Dense model split across nodes proportional to VRAM, 80ms RTT cap
    • MoE expert parallel: Expert shards distributed, sessions hash-routed for KV cache locality
  4. Prefix affinity: Routes repeat requests to same node for KV cache reuse
  5. Load balancing: Round-robin for replicated models, session-sticky hash for MoE
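
The election, VRAM-proportional split, and session-sticky routing described above can be sketched as three small functions. This is an illustration under stated assumptions, not mesh-llm's implementation: ties are broken by the lexicographically smallest node ID, fractional layers are assigned by largest remainder, and sessions are hashed with SHA-256. `Node`, `elect_host`, `split_layers`, and `sticky_node` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    vram_gb: float


def elect_host(nodes: list[Node]) -> Node:
    """Highest VRAM wins; ties go to the smallest node ID (assumed tiebreak)."""
    return min(nodes, key=lambda n: (-n.vram_gb, n.node_id))


def split_layers(nodes: list[Node], total_layers: int) -> dict[str, int]:
    """Assign layers proportionally to VRAM using the largest-remainder method."""
    total_vram = sum(n.vram_gb for n in nodes)
    exact = {n.node_id: total_layers * n.vram_gb / total_vram for n in nodes}
    shares = {nid: int(x) for nid, x in exact.items()}
    leftover = total_layers - sum(shares.values())
    # Hand out the remaining layers to the largest fractional parts first.
    by_remainder = sorted(exact, key=lambda nid: (-(exact[nid] - shares[nid]), nid))
    for nid in by_remainder[:leftover]:
        shares[nid] += 1
    return shares


def sticky_node(session_id: str, nodes: list[Node]) -> Node:
    """Map a session to a stable node so its KV cache stays warm."""
    ordered = sorted(nodes, key=lambda n: n.node_id)
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


nodes = [Node("A", 16), Node("B", 24), Node("C", 16)]
host = elect_host(nodes)
layers = split_layers(nodes, 80)
```

With the 16/24/16 GB nodes used in this issue and a hypothetical 80-layer model, node B is elected host and receives the largest slice of layers. A simple modulo hash like `sticky_node` reshuffles most sessions when the node set changes; a consistent-hash ring would reduce that churn.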

Tunnel System

A bidirectional TCP-to-QUIC relay. mesh-llm opens a local TCP port per peer; llama.cpp connects to that local port, and the traffic is relayed over QUIC bi-streams to the remote node's rpc-server.

Gap Analysis: mesh-llm vs obol-stack

| Capability | mesh-llm | obol-stack | Integration opportunity |
| --- | --- | --- | --- |
| Node discovery | Nostr relays + invite tokens | ERC-8004 on-chain registry | Dual discovery: Nostr for fast p2p, ERC-8004 for persistent identity |
| Networking | iroh QUIC p2p | HTTP/REST | mesh-llm's QUIC mesh for inter-node, obol HTTP API for external clients |
| Payments | None (free sharing) | x402 USDC per-request | Add x402 payment gate at mesh-llm's API proxy layer |
| Model sharding | Pipeline + MoE parallel | Single node only | Major gap: mesh-llm enables running 70B+ across multiple consumer GPUs |
| Smart routing | Category + complexity scoring | Not yet | Adopt mesh-llm's router for obol-stack inference |
| Identity | QUIC crypto keys | EVM addresses + ERC-8004 NFTs | Bridge: ERC-8004 identity → iroh node identity |
| Incentives | Altruistic sharing | OPOW rewards + escrow | Add reward engine on top of mesh participation |
| GPU profiling | VRAM + RTT announcements via gossip | Hardware detection (PR #308) | Share hardware profiles across both layers |

Integration Architecture

                    External Clients
                          │
                    ┌──────▼──────┐
                    │  obol-stack │
                    │  HTTP API   │
                    │  + x402     │
                    │  payments   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  mesh-llm   │
                    │  API proxy  │
                    │  (port 9337)│
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────▼────┐ ┌────▼─────┐ ┌───▼──────┐
        │ Node A   │ │ Node B   │ │ Node C   │
        │ 16GB GPU │ │ 24GB GPU │ │ 16GB GPU │
        │ iroh p2p │ │ iroh p2p │ │ iroh p2p │
        │ gossip   │ │ gossip   │ │ gossip   │
        └──────────┘ └──────────┘ └──────────┘
              │            │            │
              └────────────┴────────────┘
                  QUIC mesh (model sharding,
                  pipeline parallel, MoE parallel)

Flow:

  1. External client hits obol-stack HTTP API with x402 payment
  2. obol-stack validates payment, forwards to mesh-llm API proxy
  3. mesh-llm smart router picks best node(s) based on model, VRAM, latency
  4. If model needs sharding: pipeline/MoE parallel across mesh nodes
  5. Response flows back through proxy → obol-stack → client
  6. obol-stack settles payment, distributes rewards to participating nodes via OPOW
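
The flow above can be sketched as a single gate function in front of the mesh-llm proxy. This is a shape-only illustration: `handle_request` and its three injected callables are hypothetical, and real x402 verification, forwarding to port 9337, and OPOW settlement would replace the stubs. The only protocol fact relied on is that x402 is built around the HTTP 402 Payment Required status.

```python
from typing import Callable, Optional


def handle_request(
    body: dict,
    payment_header: Optional[str],
    verify_payment: Callable[[str], bool],
    forward_to_mesh: Callable[[dict], dict],
    settle: Callable[[str], None],
) -> tuple[int, dict]:
    """Validate payment, forward to the mesh proxy, then settle (steps 1-6)."""
    if payment_header is None or not verify_payment(payment_header):
        # No valid payment: ask the client to pay before any inference runs.
        return 402, {"error": "payment required"}
    response = forward_to_mesh(body)
    # Settle only after the mesh produced a response.
    settle(payment_header)
    return 200, response


# Stubs standing in for the real payment verifier, mesh proxy, and settlement.
settled: list[str] = []
status, reply = handle_request(
    {"model": "example-model", "messages": []},
    "signed-payment-payload",
    verify_payment=lambda header: header == "signed-payment-payload",
    forward_to_mesh=lambda body: {"model": body["model"], "choices": []},
    settle=settled.append,
)
```

Settling after the response means a failed inference is never charged, at the cost of serving one request on credit; whether that trade-off is acceptable is part of the payment questions below.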

What This Enables

  1. Pool consumer GPUs to run large models: Three 16GB GPUs (48GB combined) could run a 70B Q4 model via pipeline parallel, which does not fit on any one of those nodes alone
  2. Smart model placement: mesh-llm's election system automatically puts models on the best-equipped nodes
  3. Economic incentive for the mesh: obol-stack's x402 + reward engine pays nodes proportional to their contribution
  4. Hybrid discovery: Nostr for fast local mesh formation, ERC-8004 for persistent network-wide identity and reputation
  5. KV cache optimization: Prefix affinity + session routing means repeat users hit warm caches
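
A back-of-the-envelope check of the first claim, assuming an idealized 4 bits per weight. Real Q4 quantization formats use somewhat more than 4 bits per weight, and the KV cache and runtime buffers need extra memory, so treat the result as a lower bound rather than a measurement.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


q4_weights = weights_gb(70, 4.0)        # 70B parameters at an idealized 4 bits
pooled_vram = 3 * 16                    # three 16GB GPUs

fits_on_one_gpu = q4_weights <= 16
fits_in_pool = q4_weights <= pooled_vram
headroom_gb = pooled_vram - q4_weights  # left for KV cache and overhead
```

Under these assumptions the weights alone are more than twice what a single 16GB card holds, while the pooled 48GB leaves a modest margin for everything else.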

Implementation Path

Phase 1: Evaluate (this issue)

  • Clone and run mesh-llm locally between SilverMesh + SilverXPS
  • Benchmark pipeline parallel: split Qwen3.5-27B across two machines
  • Measure overhead: latency added by QUIC tunnel vs direct inference
  • Test Nostr discovery: how fast do nodes find each other?

Phase 2: Bridge

  • Add x402 payment middleware at mesh-llm's API proxy
  • Map iroh node IDs to ERC-8004 identities
  • Forward hardware announcements from mesh-llm gossip to obol-stack registry
  • Add obol-stack reward distribution for mesh participants

Phase 3: Integrate

  • Ship mesh-llm as optional obol-stack component (Helm chart or binary)
  • obol mesh join / obol mesh create commands
  • Unified discovery: ERC-8004 registered nodes auto-join mesh
  • Pipeline parallel for dense models, MoE parallel for MoE — enabled by default when multiple nodes detected

Key Questions

  1. Rust ↔ Go bridge: mesh-llm is Rust, obol-stack is Go. Integration options: (a) sidecar process with HTTP/gRPC bridge, (b) FFI via cgo, (c) separate binary managed by obol-stack's Helm chart. Sidecar is cleanest.

  2. Nostr vs ERC-8004 for discovery: Do we need both? Nostr is fast and free (no gas), ERC-8004 is persistent and composable. Probably both — Nostr for real-time mesh topology, ERC-8004 for identity and reputation that persists across mesh sessions.

  3. Payment granularity: x402 is per-request. For pipeline parallel, a single request involves multiple nodes. How to split payment? Options: (a) proportional to VRAM contributed, (b) proportional to layers processed, (c) fixed split defined by election.

  4. RTT budget: mesh-llm caps pipeline parallel at 80ms RTT. This limits geographical distribution, whereas obol-stack nodes could be anywhere. Need to define mesh topology constraints.

  5. Trust model: mesh-llm trusts all mesh participants (verified by QUIC crypto). Obol-stack has commit-reveal verification and slashing. How to reconcile? Probably: mesh-llm handles networking, obol-stack handles economic trust.
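
For the payment-granularity question, options (a) and (b) are both proportional splits and differ only in which weight is used. A minimal sketch, assuming amounts are handled as integer base units (USDC uses 6 decimals) and rounding leftovers are assigned by largest remainder so the shares always sum to the original amount. `split_payment` is a hypothetical helper, not an existing obol-stack function.

```python
def split_payment(amount_units: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer amount across nodes in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    exact = {node: amount_units * w / total for node, w in weights.items()}
    shares = {node: int(x) for node, x in exact.items()}
    leftover = amount_units - sum(shares.values())
    # Give the remaining base units to the largest fractional parts first.
    by_remainder = sorted(exact, key=lambda n: (-(exact[n] - shares[n]), n))
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Option (a): weight by VRAM contributed. 1000 base units = 0.001 USDC.
by_vram = split_payment(1000, {"A": 16, "B": 24, "C": 16})
# Option (b): weight by layers processed (hypothetical layer counts).
by_layers = split_payment(1000, {"A": 23, "B": 34, "C": 23})
```

Working in integer base units avoids floating-point dust in the payouts, which matters if the split is later enforced on-chain.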

Labels

research component:inference component:networking priority:medium size:XL
