eRPC: add consensus failsafe (maxParticipants=3, agreementThreshold=2) for state-read methods #358

@bussyjd

Problem

The obol-stack eRPC config (internal/embed/infrastructure/values/erpc.yaml.gotmpl) routes each RPC request to a single upstream chosen by eRPC's selectionPolicy + score, with hedge for latency fallback. For four EVM networks (mainnet, hoodi, base, base-sepolia) we currently rely on one upstream's answer per request.

For our read paths that underpin payment/registration correctness — agent-registration document fetches, ERC-8004 registry reads, USDC balance checks, eth_call of payment requirements — a single malicious or desynced upstream can return a wrong answer that no layer above detects. Consensus validation between multiple upstreams catches this cheaply.

Proposed config

Add a consensus entry to the failsafe list for high-trust read methods on each EVM network:

failsafe:
  - matchMethod: "eth_call|eth_getLogs|eth_getTransactionReceipt|eth_getTransactionByHash|eth_getBlockByNumber|eth_getBlockByHash|eth_chainId"
    consensus:
      maxParticipants: 3        # fan out to 3 upstreams in parallel
      agreementThreshold: 2     # 2 of 3 must match → return majority answer
      punishMisbehavior:
        disputeThreshold: 3
        disputeWindow: 10s
        sitOutPenalty: 5m
  - matchMethod: "*"            # non-consensus path stays for latency-sensitive reads
    timeout:
      duration: 30s
    retry:
      maxAttempts: 2
      delay: 100ms
    hedge:
      delay: 500ms
      maxCount: 1

Apply to all four EVM network blocks (lines ~80-145 of the gotmpl). Keep the existing selectionPolicy intact for eth_sendRawTransaction routing; consensus only activates for read methods.

Why 3/2, not 2/2

  • 2/2 means every paid request fails as soon as one upstream is flaky → negates the resilience we already have.
  • 3/2 tolerates one upstream failure/disagreement per request, returns the majority answer, and the punishMisbehavior block auto-quarantines consistently-misbehaving upstreams for 5 min.
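The difference between the two settings can be illustrated with a small agreement check. This is a minimal sketch of the idea only, not eRPC's actual consensus implementation; `agree` is a hypothetical helper, and responses are assumed to be comparable as raw strings, with failed or timed-out upstreams simply missing from the slice.

```go
package main

// agree returns the first response shared by at least `threshold`
// participants, or false when no response reaches the threshold.
// Hypothetical helper for illustration; not part of eRPC.
func agree(responses []string, threshold int) (string, bool) {
	counts := make(map[string]int, len(responses))
	for _, r := range responses {
		counts[r]++
		if counts[r] >= threshold {
			return r, true
		}
	}
	return "", false
}
```

With three participants and a threshold of 2, one wrong answer (`"0x01", "0xff", "0x01"`) still yields `"0x01"`. With two participants and a threshold of 2, a single missing or divergent answer leaves nothing at the threshold, so the request fails.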

Upstream prerequisite

Each affected chain must have ≥ 3 upstreams configured in the upstreams: array; with fewer, consensus cannot fan out to three participants and a single bad upstream can no longer be outvoted. Current state:

  • chainId: 1 (mainnet) — verify count; add more public RPCs via obol network add if needed.
  • chainId: 560048 (hoodi) — likely only 1 today.
  • chainId: 8453 (base) — verify.
  • chainId: 84532 (base-sepolia) — 1 (base-sepolia-publicnode) + whatever is added by obol network add.

When count < 3, eRPC degrades gracefully — it queries however many exist — but the resilience goal isn't met. So this issue should include bumping the default ChainList seed count or guaranteeing a minimum.
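A guard for the minimum could look roughly like the following. It is a sketch under assumptions: the `map[int][]string` shape (chain ID to upstream IDs) and the names `chainsBelowMinimum` and `minConsensusUpstreams` are hypothetical and do not reflect the actual obol-stack data model.

```go
package main

import "sort"

// minConsensusUpstreams mirrors maxParticipants: 3 from the proposed config.
const minConsensusUpstreams = 3

// chainsBelowMinimum returns, in ascending order, the chain IDs whose
// upstream list is shorter than minimum. The map shape is an assumption
// made for this sketch.
func chainsBelowMinimum(upstreams map[int][]string, minimum int) []int {
	var short []int
	for chainID, ups := range upstreams {
		if len(ups) < minimum {
			short = append(short, chainID)
		}
	}
	sort.Ints(short)
	return short
}
```

A check like this could run at template-render time or in obol network add, so that a chain with too few upstreams is reported before the consensus block is enabled for it.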

Explicit non-goals

  • Do not apply consensus to eth_sendRawTransaction — routing stays single-upstream (already handled by selectionPolicy).
  • Do not apply to eth_blockNumber, eth_syncing, or other latency-critical head checks; let those fall through to the matchMethod: "*" entry.
  • Do not use agreementThreshold: 3 (of 3) — one slow upstream fails every call.

Optional: nonce handling

For eth_getTransactionCount, lagging replicas routinely disagree. Instead of strict consensus, use:

- matchMethod: "eth_getTransactionCount"
  consensus:
    maxParticipants: 3
    agreementThreshold: 1
    preferHighestValueFor:
      eth_getTransactionCount:
        - result

This returns the highest observed nonce, preventing stale-nonce transaction failures.
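The selection rule described above amounts to taking the maximum over the returned hex quantities. The sketch below shows that rule in isolation; `highestNonce` is a hypothetical helper and is not how eRPC implements preferHighestValueFor internally.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// highestNonce parses hex quantities such as "0x1a" (the encoding used by
// eth_getTransactionCount results) and returns the largest one.
// Hypothetical helper for illustration only.
func highestNonce(results []string) (uint64, error) {
	if len(results) == 0 {
		return 0, errors.New("no upstream responses")
	}
	var highest uint64
	for _, r := range results {
		n, err := strconv.ParseUint(strings.TrimPrefix(r, "0x"), 16, 64)
		if err != nil {
			return 0, err
		}
		if n > highest {
			highest = n
		}
	}
	return highest, nil
}
```

For example, if three upstreams report `0x5`, `0x7`, and `0x6`, the caller gets 7, so a lagging replica cannot hand back a nonce that has already been used.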

Validation plan

  • Unit: internal/network/erpc_test.go — template render with the new failsafe block.
  • Integration: seed 3 base-sepolia RPCs (publicnode + alchemy public + drpc public). Probe eth_call against the registry contract; flip one upstream to return wrong data (mock); confirm request still returns majority answer.
  • Observability: eRPC emits metrics on consensus participation — expose them in Grafana.
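The unit check in the first bullet could start from something like the sketch below. It assumes the values file can be exercised with Go's text/template; the inline template, the `consensusValues` struct, and `renderConsensus` are hypothetical stand-ins, since the real erpc.yaml.gotmpl and its inputs are not shown in this issue.

```go
package main

import (
	"strings"
	"text/template"
)

// consensusValues is a hypothetical input struct for this sketch.
type consensusValues struct {
	MaxParticipants    int
	AgreementThreshold int
}

// consensusTmpl is a stand-in for the consensus part of the real template.
const consensusTmpl = "consensus:\n" +
	"  maxParticipants: {{ .MaxParticipants }}\n" +
	"  agreementThreshold: {{ .AgreementThreshold }}\n"

// renderConsensus renders the stand-in template with the given values.
func renderConsensus(v consensusValues) (string, error) {
	tmpl, err := template.New("consensus").Parse(consensusTmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, v); err != nil {
		return "", err
	}
	return out.String(), nil
}
```

A test in internal/network/erpc_test.go would then render with `MaxParticipants: 3` and `AgreementThreshold: 2` and assert that both keys appear with those values in the output for each of the four networks.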

Labels

enhancement (New feature or request)
