cl: cap req/resp response body size#21562
Conversation
Caplin's req/resp handler buffered a peer's entire response into an in-memory httptest.ResponseRecorder via an uncapped io.Copy, bounded only by a read deadline (10s, since no caller sets REQRESP-EXPECTED-CHUNKS). A peer can flood the response stream and drive the victim's heap to many GiB. The cheapest trigger is the Status handshake the victim sends back to every inbound connection (discovery.onConnection -> ValidatePeer), so it is remotely and unauthenticatedly reachable. Wrap the response copy in an io.LimitReader sized per topic: single-chunk protocols (status, ping, metadata, goodbye, light-client singles) are capped at one MAX_CHUNK_SIZE (15 MiB), and by_range / by_root protocols at MAX_REQUEST_BLOCKS chunks. Legitimate responses sit far below these ceilings, so a flooding peer is bounded instead of unbounded; the Status handshake vector in particular is tightly capped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit capped single-chunk responses at one MAX_CHUNK_SIZE (15 MiB). Single-object protocols (status, ping, metadata, goodbye, light-client singles) are at most tens of KiB, so 15 MiB left a large amplification: a 128-peer Status-handshake flood could still force ~3.8 GiB of transient heap. Cap single-object responses at 1 MiB (tens of x margin over the largest such object), cutting the residual to ~256 MiB. The by_range / by_root ceilings are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens Caplin’s cl/sentinel/httpreqresp req/resp HTTP bridge against unbounded memory growth by capping how much response data is buffered/read from a libp2p stream (especially impactful for the automatic Status handshake path).
Changes:
- Add per-topic maximum response body sizing (
maxResponseBodySize) with a tight cap for single-chunk protocols and a larger cap for_by_range/_by_rootmulti-chunk protocols. - Apply the cap by wrapping the stream reader with
io.LimitReaderin the responseio.Copy. - Add tests validating the topic→ceiling mapping and that a flooding peer cannot cause buffering beyond
maxChunkSizefor a single-chunk (Status) response.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
cl/sentinel/httpreqresp/server.go |
Introduces max response sizing logic and enforces it via io.LimitReader during response copy. |
cl/sentinel/httpreqresp/server_test.go |
Adds tests for the sizing logic and a libp2p-based flood regression test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
beacon_blocks_by_head is multi-chunk (up to MAX_REQUEST_BLOCKS_DENEB blocks), but its topic contains neither _by_range nor _by_root, so maxResponseBodySize classified it as a single-object response and would cap it at 1 MiB. Match _by_head alongside _by_range / _by_root so the ceiling reflects the protocol's real shape. Also fix a stale flood-test comment that still referenced the pre-tightening maxChunkSize cap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review follow-ups on the response-body size cap: - Reuse clparams.MaxChunkSize (spec MAX_CHUNK_SIZE) instead of a duplicated 15 MiB literal so the two can't drift. - Make TestMaxResponseBodySize exhaustive over the protocol constants in communication/topics.go. The cap classifies by substring, so a new multi-chunk protocol whose topic lacks _by_range/_by_root/_by_head would be silently capped at the single-object ceiling and truncated; this test now fails if such a protocol is added without updating maxResponseBodySize. - Add TestMultiChunkResponseNotCappedAtSingleObject: a legitimate by_range response larger than the single-object cap must pass through the capped receive path uncapped. The handlers tests only exercise the serving side via direct streams, not httpreqresp's NewRequestHandler. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ount The multi-chunk response cap was a flat MAX_REQUEST_BLOCKS x MAX_CHUNK_SIZE (~15 GiB) for every by_range/by_root/by_head topic, regardless of how many items the request actually asked for. Plumb the caller's expected response chunk count through RequestData (new max_response_chunks field) and the Reqresp-Max-Response-Chunks header so the handler caps at expectedChunks x MAX_CHUNK_SIZE, clamped to the old ceiling. A request for N blocks is now bounded at N x MAX_CHUNK_SIZE instead of the full 15 GiB. The cap never truncates a compliant response: the responder returns at most the requested number of chunks and each chunk is at most MAX_CHUNK_SIZE, so the bound is an upper bound on the real response. An absent or zero count falls back to the old ceiling, so any path that doesn't set it (e.g. column-by-root via RequestDataWithPeer) is unchanged and safe. The read deadline is untouched (the count rides a separate header, not REQRESP-EXPECTED-CHUNKS). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- fetchPeerResponse now bounds victim.Connect and the request with a 30s timeout context, so a handler regression fails fast instead of hanging until the Go test timeout. - Write exactly payloadSize bytes (cap the final write to the remainder) so the helper is correct for sizes that aren't a multiple of 1 MiB. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The prior commit bounded multi-chunk responses by requested chunk count x MAX_CHUNK_SIZE. That stays tight for blocks (which can legitimately approach MAX_CHUNK_SIZE) but is ~100x too loose for blob/column sidecars — a blob sidecar is a fixed ~129 KiB, far below the 15 MiB per-chunk max — so a blob by_range request still permitted a multi-GB cap. Replace the plumbed chunk count with a per-protocol byte budget: - blocks / execution payloads: count x MAX_CHUNK_SIZE (items can be large) - blob sidecars: numSidecars x fixed BlobSidecar SSZ size - data-column sidecars: numSidecars x (MaxBlobsPerBlockUpperBound cells + slack), via a new clparams helper so the bound tracks the fork/BPO blob limit Each budget is wrapped in snappy.MaxEncodedLen + framing: the cap is on the on-wire (snappy-framed) bytes and blob/column payloads are incompressible, so a raw-size budget would truncate. RequestData.max_response_chunks becomes max_response_bytes; the field is also added to RequestDataWithPeer so column-by-root is bounded. A blob by_range flood now caps at tens-to-hundreds of MiB instead of multiple GiB. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eiling maxResponseBodySize compared the caller's on-wire byte budget against a ceiling computed from MAX_CHUNK_SIZE (the uncompressed per-chunk limit), so a large but valid multi-chunk budget could be clamped below its intended on-wire value and truncate the response. The budget is computed by cl/rpc with snappy-aware sizing, is not peer-controllable and is already bounded, so use it as-is; the MAX_REQUEST_BLOCKS × MAX_CHUNK_SIZE ceiling now applies only as the fallback when no budget is provided. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for working on this. The Status/single-object flood case looks much safer now: the handler caps the read, returns 413 on overflow, and avoids setting I think the main remaining issue is that multi-chunk responses are still fully buffered. respBody, err := io.ReadAll(io.LimitReader(stream, limit+1))For multi-chunk protocols, I’d suggest changing this path to streaming decode instead of full-response buffering:
That would make Two smaller points:
httpReq, err := http.NewRequestWithContext(ctx, "GET", "http://service.internal/", bytes.NewBuffer(req.Data))
|
- Document the Reqresp-Max-Response-Bytes and REQRESP-FALLBACK-BODY request headers in Do's header list, which had drifted from the code. - Replace the dead non-captureWriter io.Copy fallback with an explicit 500. Do always supplies a captureWriter and the streaming hand-off can't carry a body through an arbitrary ResponseWriter, so a different writer is a wiring error, not a degraded-but-working path; the old fallback also called http.Error after the stream had started, which can't change a committed status. - Add TestDoClosesStreamingBodyOnContextCancel: when Do returns early on a cancelled request context, the streaming body the handler publishes afterwards owns a live stream, so Do must still drain and close it. Covers the otherwise unexercised cancel branch.
…cess only Two review follow-ups: - Do now recovers a panic from handler.ServeHTTP in the goroutine and publishes a 500, so a handler panic can't crash the process or leave Do blocked forever on a background-context request. The handler's deferred stream.Close still runs during the unwind, so no stream is leaked. Covered by TestDoRecoversHandlerPanic. - NewRequestHandler checks the captureWriter precondition right after reading the response code, so the CONTENT-ENCODING: snappy/stream and the other response headers are set only on the success path, not ahead of an error response.
|
Thanks @domiwei — I think you caught the revision just before the streaming refactor I pushed earlier today (89083db): the handler no longer does On the remaining points:
|
No behavior change: - streamBody embeds io.Reader/io.Closer instead of hand-written Read/Close forwarders. - Do publishes the response from a single site in the deferred func; the panic-recovery path swaps the writer to a 500 rather than duplicating the channel send.
The ctx-cancel branch only closes the late-published response body (which releases the libp2p stream); it does not drain it. Match the comment.
The flood/budget tests invoked NewRequestHandler directly, but production reaches it through a chi.Router (sentinel.go). Route the e2e helper through chi and add a focused guard so that middleware wrapping the ResponseWriter — which would 500 every req/resp via the *captureWriter type assertion — fails the suite instead of silently breaking peer validation.
|
I can't tell if the 413 is returned if the response body exceeds the limit but response header was already returned. There's 2 ways this can be detected by http: the response body doesn't match the advertised length, or the response body fails some other check specific to the content. The first is preferable, but requires you to know the response body length in advance, which can require two passes, or buffering. There is a helper to buffer with a max size in memory before switching to writing to a temporary file. It's probably overkill at this point, I'm satisfied with not buffering unnecessarily if people are satisfied the size limits are respected. |
|
False negative in MQ Sonar 😭 |
@anacrolix fix - #21632 |
… jar (erigontech#21632) ## Problem The sonar job pulls two artifacts from `scanner.sonarcloud.io` on every scan: a JRE ("JRE provisioning") and the scanner-engine jar. That CDN intermittently 403s GitHub-runner IPs, and the blocks outlive the spaced retry added in erigontech#21604: - [CI Gate merge-queue run](https://github.com/erigontech/erigon/actions/runs/26994431020/job/79661154435): scan and retry both hit `HTTP 403 Forbidden` on the JRE tarball, failing the gate and bouncing erigontech#21562 out of the merge queue — after the coverage suite had already passed. - [CI Gate run on this PR](https://github.com/erigontech/erigon/actions/runs/27003084716/job/79687976307): with JRE provisioning eliminated, scan and retry both 403'd on the engine jar instead. The 403s are IP-scoped blocking, not artifact availability: the exact jar URL that failed serves 200 from outside the runners (published Jun 1, still on the CDN), and `api.sonarcloud.io` answered fine in the same failing run — only the `scanner.sonarcloud.io` host blocks, and for longer than the 90s retry spacing, so a same-runner retry cannot ride it out. ## Fix Remove both per-scan dependencies on that host. **JRE**: skip provisioning and point the scanner at the JDK already baked into the runner image, via the scanner's documented switches: - `SONAR_SCANNER_SKIP_JRE_PROVISIONING=true` - `SONAR_SCANNER_JAVA_EXE_PATH=$JAVA_HOME_21_X64/bin/java` The ubuntu-24.04 image ships Temurin 21 — the same major version Sonar provisions (the failing artifact was `OpenJDK21U-jre_...21.0.9`). The env vars go through `$GITHUB_ENV`, so both the scan and the retry step inherit them. `cleanup-space` in setup-erigon does not remove the preinstalled JDKs. **Engine jar**: seed it into the actions cache from cache-warming push runs; PR and merge-queue scans restore it. The seed step queries `api.sonarcloud.io/analysis/engine` (the host that stays reachable; returns `{filename, sha256, downloadUrl}`), downloads the jar with retries, sha256-verifies it, and saves it in the scanner's content-addressed download-cache layout — `~/.sonar/cache/<sha256>/<filename>`, cache key `sonar-scanner-engine|<sha256>`. At scan time the bootstrapper asks the API for the prescribed sha and, finding it in the local cache, never contacts the CDN. Verified end-to-end with scanner CLI 8.1.0.6389 against a cache seeded exactly as the workflow does it: the debug log shows the metadata call, zero requests to `scanner.sonarcloud.io`, and the engine launched straight from the cached jar. Cache-warming runs on every push to main/release, and SonarCloud rotates engines every few days (12.37 published Jun 1, 12.38 on Jun 5), so seeds refresh within hours of a rotation. ## Failure modes considered - Runner image drops Temurin 21: the `[ -x ... ]` guard leaves the env vars unset and the scanner falls back to downloading, i.e. current behavior. - SonarCloud raises its minimum JRE above 21: the scan fails deterministically with a version error (historically preceded by months of deprecation warnings in the scan log); fix is bumping the env var to `JAVA_HOME_25_X64`, which the image already ships. - Engine cache miss (version rotated since the last base-branch push, or cache evicted): the scanner falls back to the direct download plus the existing retry — today's behavior, never worse. - Seeding fails (the CDN 403s the cache-warming runner too, or the bootstrap API contract changes): the lookup and download steps are `continue-on-error`, no cache is saved, and the next push to the branch retries; scans fall back as above. The scanner CLI zip and GPG key still come per run from `binaries.sonarsource.com` and the keyserver; those hosts have not been the ones failing, and the existing retry covers them. Note: the first engine seed only materializes once this merges (cache-warming triggers on push to main), so this PR's own sonar runs still use the fallback download path.
What
Caplin's req/resp handler (
cl/sentinel/httpreqresp) buffered a peer's entire response into memory via an uncappedio.Copy, bounded only by a 10s read deadline. A peer that floods the response stream could drive the victim's heap to many GiB. The cheapest trigger is unauthenticated and automatic: on every inbound connection the victim sends a Status request back to the peer (discovery.onConnection→handshake.ValidatePeer), built withhttp.NewRequest(background context), so the handler runs the full copy with no caller-side timeout.Fix
Stream the response through a per-topic size cap instead of buffering it all up front (the reply never needs Content-Length): a compliant response passes through untouched, and one that exceeds its cap surfaces an explicit
ErrResponseTooLargeread error rather than being silently truncated and accepted. The cap is an erroringmaxBytesReader(net/http.MaxBytesReaderwithout the ResponseWriter coupling), notio.LimitReader, so a valid prefix followed by junk is rejected, not accepted. Sizes per topic:by_range/by_root/by_head: blocks, blobs, data columns, exec payloads) → a per-request byte budget =chunkCount × max-on-wire-size-of-one-chunk, computed by the typedcl/rpcbuilders (which know both the requested item count and the per-protocol chunk size) and plumbed through a newmax_response_bytesfield onRequestData/RequestDataWithPeerand theReqresp-Max-Response-Bytesheader. The handler clamps that budget to an absolute on-wireMAX_REQUEST_BLOCKS × MAX_CHUNK_SIZEceiling, so a miscomputed or overflowed budget can't make it stream more than a spec-maximal response. A multi-chunk request that arrives without a budget falls back to the tight single-object cap (not the ceiling), so a forgotten budget fails loud withErrResponseTooLargerather than risking an OOM.Per-chunk sizing: blocks and execution payloads can fill a chunk; a blob sidecar is a fixed ~129 KiB; a data-column sidecar scales with the fork's max blobs (
clparams.MaxBlobsPerBlockUpperBound, which covers the BPO schedule). Because the cap is on on-wire (snappy-framed) bytes and blob/column payloads are incompressible, each budget is wrapped insnappy.MaxEncodedLen + framing(saturating toMaxUint64rather than overflowing) so it never truncates a compliant response.Protocol classification and the wire-size helper live in
cl/sentinel/communication(AllProtocols,MaxWireResponseBytes) as a single source of truth shared by thecl/rpcbudget and thehttpreqrespbackstop. A unit test cross-checks the registry against theby_range/_by_root/_by_headnaming convention, so a mis-classified protocol fails the build rather than silently getting the wrong cap.Because the response is streamed, an over-cap flood is no longer a
413(by the time the overflow is seen, the200status and the peer's response code have already been committed) but anErrResponseTooLargefrom the body read.handshake.ValidatePeeralready fails closed on any body read/decode error;service.requestPeerdrops the peer on it, preserving the peer removal the413path used to give.The
Dobridge is also hardened: a handler panic is recovered and returned as a500instead of crashing the process, and a streamed body is closed even if the request context is cancelled before the handler publishes it, so the stream isn't leaked.The read deadline (and sync timing) is unchanged — the budget rides its own header. A single connection can no longer OOM the node: the worst-case 128-peer Status flood drops from unbounded to ~256 MiB, and a blob
by_rangeflood from multiple GiB to tens-to-hundreds of MiB.Test
Flood/cap tests drive the real
NewRequestHandlerreceive path over two libp2p hosts:TestResponseBodyRejectedOnFlood— a compliant Status response streams through in full (200); a 40 MiB Status flood is bounded at the cap and surfacesErrResponseTooLarge, not truncated and accepted.TestMultiChunkResponseCappedWithoutBudget— an unbudgeted 5 MiBby_rangeresponse is bounded at the single-object cap (not the ceiling) and surfacesErrResponseTooLarge.TestMultiChunkResponseRejectedOverByteBudget— a response within the 8 MiB budget passes through in full (200); a 40 MiB flood over it surfacesErrResponseTooLarge.TestMaxBytesReader— the capping reader streams bytes up to its limit unchanged, then returnsErrResponseTooLarge(rather than truncating at EOF) for anything over it.TestDoCopiesHandlerWriteBuffer—Docopies the handler's write buffer rather than retaining it, so a caller that reuses its buffer (e.g.http.Errorviafmt) can't corrupt the response.TestDoClosesStreamingBodyOnContextCancel— ifDoreturns early on a cancelled request context, the streaming body the handler publishes afterwards is still closed, so the libp2p stream isn't leaked.TestDoRecoversHandlerPanic— a handler panic is recovered and surfaced as a500rather than crashing the process.TestMaxResponseBodySize— iteratescommunication.AllProtocols: single-object protocols and unbudgeted multi-chunk requests get the tight cap; a budget below the ceiling is honored, above it is clamped.communication.TestResponseShapeConvention— cross-checks every protocol's multi-chunk flag against theby_range/_by_root/_by_headnaming convention, so a new mis-classified protocol fails the build.communication.TestMaxWireResponseBytes— the on-wire budget saturates toMaxUint64on overflow instead of wrapping to a smaller (truncating) value.communication.TestMaxWireResponseBytesBoundsStreamEncoding— the budget upper-bounds the real snappy-stream framed size for items up toMAX_CHUNK_SIZE.communication.TestIsMultiChunkProtocol— known protocols classify correctly; unknown ones fail closed to single-object.clparams.TestMaxBlobsPerBlockUpperBound— the fork/BPO max-blobs helper.Fixes ethereum-bounty/erigon#13