fix: build request body up-front so http2 retry works on GOAWAY #275
Open
olegkrutikov wants to merge 1 commit into
rawRequest streamed the multipart body through an io.Pipe, which is not replayable. http.NewRequestWithContext therefore couldn't derive Request.GetBody, and http2.Transport had no way to retry a POST when Telegram's server sent a GOAWAY frame mid-flight (a routine part of HTTP/2 connection draining). Every POST issued on a draining connection failed with:

    http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error

Long-poll getUpdates (a GET, no body) was unaffected, so a bot would keep receiving updates while silently failing every sendMessage / editMessageText / answerCallback for the duration of the bad connection. In one production deployment this manifested as the bot going completely mute for ~36 hours after Telegram rotated a server behind it.

Build the body into a bytes.Buffer and pass bytes.NewReader to NewRequestWithContext; net/http then auto-populates GetBody and ContentLength, and http2.Transport retries transparently on GOAWAY.

Trade-off: the entire request body is held in memory until the request is sent. Most methods are a few KB; file uploads are bounded by Telegram's per-request limit (~50 MB) and only held transiently.

Tests: Test_rawRequest_setsGetBody asserts GetBody is non-nil, ContentLength is correct, and two GetBody calls each return bytes identical to the original body.
Problem
rawRequest streams the multipart request body through an io.Pipe. Pipe readers aren't replayable, so http.NewRequestWithContext cannot derive Request.GetBody, and http2.Transport therefore has no way to retry a POST when Telegram's server sends a GOAWAY frame mid-flight.

GOAWAY is a routine part of HTTP/2 connection draining: Telegram emits it during normal load-balancer reassignment / server rotation. Whenever it arrives between the time client.Do writes the request line and headers and the time it would receive the response, every in-flight POST on that connection fails with:

    http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error

Long-poll getUpdates is a GET (no body), so it isn't affected: net/http retries it transparently. The result is a bot that keeps receiving updates but silently fails every sendMessage / editMessageText / answerCallbackQuery for as long as the bad connection is reused. In one production deployment this manifested as the bot going completely mute for ~36 hours after Telegram rotated a server behind it; the journal showed the GOAWAY error fanning out across every outbound call simultaneously while getUpdates continued uninterrupted.

This affects every user of the library who reaches Telegram over HTTP/2 (i.e. nearly everyone, since Go's default transport negotiates h2 via ALPN). The current workaround is to disable HTTP/2 client-side by setting Transport.TLSNextProto to an empty map, which forces HTTP/1.1 keep-alive and avoids the GOAWAY-retry trap entirely.

Fix
Build the multipart body into a bytes.Buffer and pass bytes.NewReader(buf.Bytes()) to NewRequestWithContext. net/http then auto-populates Request.GetBody and Request.ContentLength for *bytes.Reader, and http2.Transport retries transparently on GOAWAY.

The goroutine that wrote into the pipe is gone, and so is the error-handling that closed it on partial failures. Both became unnecessary once the body is materialised before client.Do is called.

Trade-off
The entire request body is held in memory until the request is sent. For the vast majority of methods (sendMessage, editMessageText, callback answers, …) the body is a few KB. For file uploads the body is bounded by Telegram's bot-API per-request limit (~50 MB) and only held transiently, comparable to the OS socket buffer the streaming version would queue anyway, and small enough to be a non-issue on any modern host. If a future user needs zero-copy streaming for high-volume large uploads, the seam to reintroduce a streaming path is clear (branch on whether params contains an upload), but I'd argue the current behaviour was a strict bug for everyone else and shouldn't be preserved by default.

Test
Test_rawRequest_setsGetBody builds a sendMessage request, captures the resulting *http.Request, and asserts:

- req.GetBody != nil (precondition for h2 retry).
- req.ContentLength > 0 and matches the body length.
- Two req.GetBody() calls each return bytes identical to req.Body.
- The body contains the expected multipart content (name="chat_id", name="text", the literal text value).

That covers the precise contract http2.Transport relies on; the actual retry-on-GOAWAY logic is exercised by net/http's own tests and doesn't need to be re-tested here.

go test ./..., go test -race, and go vet ./... all clean.

Reproduction
Run for long enough that Telegram cycles a connection (typically minutes to hours depending on which datacenter you hit). On main you'll eventually see the cannot retry err [http2: ... GOAWAY] error and every subsequent send on that connection fails. With this patch, client.Do retries silently and the loop keeps going.