fix(go): back off before re-polling failed batch #284
Draft
NikolayS wants to merge 1 commit into
When a batch was received but finalization failed (a Nack failed, or Ack returned an error), the consumer loop re-polled immediately. The unfinished batch is redelivered by pgque.next_batch at once, so a persistent nack/ack failure (e.g. partial grants) produced a tight loop re-running every handler at full speed, and even one transient ack failure re-executed the whole batch with zero delay. Sleep pollInterval (respecting ctx cancellation) before re-polling on both the nack-failure and ack-error paths. The Ack n==0 stale/double ack case stays warning-only with no sleep. Verified red/green: the new tests counted 52k/236k Receive calls in 400 ms on the unfixed code; with the fix the count stays within the pollInterval bound. https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv
Bug
In `clients/go/consumer.go`, the consumer loop slept `pollInterval` only on receive error or empty batch. When a batch was received but finalization failed, the loop re-polled immediately: either `nackFailed` was true (the loop did `continue` with no sleep) or `Ack` returned an error (control fell through to the loop top with no sleep). Because the batch was never finished, `pgque.next_batch` returns the same batch instantly, so a persistent nack/ack failure (e.g. partial grants where receive works but nack/ack don't) produced a tight loop at full speed: re-receive the same batch, re-run all handlers (duplicate side effects), fail again, repeat. Even one transient ack failure re-executed the whole batch's handlers with zero delay.
Fix
Sleep `pollInterval` (respecting ctx cancellation, using the same `select { case <-ctx.Done() ... case <-time.After(...) }` pattern as the receive-error and empty paths) before re-polling on both the nack-failure path and the Ack-error path. The `Ack` n==0 stale/double-ack case stays warning-only and gains no sleep.
Red/green TDD: added `TestConsumer_NackFailure_BacksOffBeforeRepoll` and `TestConsumer_AckFailure_BacksOffBeforeRepoll` with a stub backend (`redeliverStubBackend`) whose `Receive` always returns the same batch and whose `Nack`/`Ack` fail; the tests bound the number of `Receive` calls in a 400 ms window with a 50 ms poll interval.
Verification
Red (on unfixed code, `go test -run BacksOffBeforeRepoll ./...` in `clients/go`):
Green (with fix, in `clients/go`):
Addresses finding B1 (Go) of #283
https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv
Generated by Claude Code