
fix(messages): prevent infinite re-read loop when polling range is empty#1685

Closed
Mohoki wants to merge 3 commits into kafbat:main from Mohoki:issues/1684

Conversation


@Mohoki Mohoki commented Feb 10, 2026

  • Breaking change? (if so, please describe the impact and migration path for existing application instances)

What changes did you make? (Give an overview)

When messagesPerPage (limit) was 0, msgsToPollPerPartition became 0, producing an empty range (from, from). nextPollingRange then returned the same range, so the consumer never advanced and re-read the same data until timeout, causing hundreds of millions of messages to be consumed with zero results ("No messages found").
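The failure mode can be illustrated with a minimal, hypothetical model of the range stepping. The names mirror the description above (nextPollingRange, msgsToPollPerPartition), but this is not the actual kafbat emitter code:

```java
// Minimal, hypothetical model of the polling-range advance described above;
// not the actual kafbat emitter code.
public class ZeroStepLoop {
    // Returns the next half-open range [from, from + step).
    static long[] nextPollingRange(long from, long step) {
        return new long[] {from, from + step};
    }

    public static void main(String[] args) {
        long msgsToPollPerPartition = 0; // limit 0 divided across partitions
        long[] range = nextPollingRange(100, msgsToPollPerPartition);
        // Empty range (100, 100): the next poll starts from the same offset,
        // so the consumer re-reads the same data until it times out.
        System.out.println(range[0] + ".." + range[1]); // prints "100..100"
    }
}
```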

Changes:

  • ForwardEmitter/BackwardEmitter: ensure msgsToPollPerPartition is at least 1 via Math.max(1, ...) so the range is never empty.
  • MessagesService: when loading by cursor, pass fixPageSize(cursor.limit()) so a cursor with limit 0 is normalized to defaultPageSize and never reaches the emitter.
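Both guards can be sketched as follows. The default page size and method signatures here are assumptions for illustration, not the project's actual MessagesService/emitter code:

```java
// Illustrative sketch of the two guards described above; the constant and
// signatures are assumptions, not the actual kafbat code.
public class PollingGuards {
    static final int DEFAULT_PAGE_SIZE = 25; // hypothetical default

    // Cursor path: a cursor carrying limit <= 0 is normalized to the
    // default page size before it can reach the emitters.
    static int fixPageSize(int limit) {
        return limit > 0 ? limit : DEFAULT_PAGE_SIZE;
    }

    // Emitter path: the per-partition step is clamped to at least 1,
    // so the polling range is never empty.
    static int msgsToPollPerPartition(int limit, int partitions) {
        return Math.max(1, limit / partitions);
    }
}
```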

Tests:

  • RecordEmitterTest: add forwardAndBackwardEmitterWithZeroMessagesPerPageCompleteWithoutHanging
  • CursorTest: add forwardEmitterWithZeroLimitCompletesWithoutHanging and backwardEmitterWithZeroLimitCompletesWithoutHanging

Is there anything you'd like reviewers to focus on?

How Has This Been Tested? (put an "x" (case-sensitive!) next to an item)

  • No need to
  • Manually (please, describe, if necessary)
  • Unit checks
  • Integration checks
  • Covered by existing automation

Checklist (put an "x" (case-sensitive!) next to all the items, otherwise the build will fail)

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. ENVIRONMENT VARIABLES)
  • My changes generate no new warnings (e.g. Sonar is happy)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Check out Contributing and Code of Conduct

A picture of a cute animal (not mandatory but encouraged)

Resolves: #1684

@Mohoki Mohoki requested a review from a team as a code owner February 10, 2026 12:54
@kapybro kapybro Bot added status/triage Issues pending maintainers triage status/triage/manual Manual triage in progress status/triage/completed Automatic triage completed and removed status/triage Issues pending maintainers triage labels Feb 10, 2026

@github-actions github-actions Bot left a comment


Hi Mohoki! 👋

Welcome, and thank you for opening your first PR in the repo!

Please wait for triaging by our maintainers.

Please take a look at our contributing guide.

@Haarolean
Member

When messagesPerPage (limit) was 0

Please elaborate on how this can happen.

@Mohoki
Author

Mohoki commented Feb 11, 2026

Good point.

limit=0 shouldn't happen in the normal first-page flow (we already run fixPageSize(...) there), but the cursor flow previously used cursor.limit() as-is. So if an old or invalid cursor carried 0, it went straight to the emitters and triggered the zero-step behavior.

That’s why we fixed both:

  • normalize limit in cursor path too;
  • force emitter step to be at least 1.

Meanwhile, newer logs show another issue: offsets were advancing and polling finished, so that case was not the limit=0 loop. There is a separate range-polling efficiency problem causing the huge amount of consumed data.

So this PR fixes a real edge case, but it is not the full root-cause fix for the heavy/slow reads. It's better not to merge it as the "final fix" yet.

@Mohoki Mohoki requested a review from a team as a code owner February 11, 2026 11:49
@Haarolean
Member

Can you provide steps to reproduce this issue? Let's start with that.

@Mohoki
Author

Mohoki commented Feb 11, 2026

Hi @Haarolean, repro steps for the original issue:

1. Create a topic with many partitions and a large backlog (e.g. ~30 partitions, ~200k+ records total).
2. Open the topic's messages in the UI.
3. Apply a filter that matches no records (or just one, so the UI keeps searching).
4. Trigger the search and watch the backend logs/consumption stats (messagesConsumed, bytesConsumed).

@Mohoki Mohoki marked this pull request as draft February 11, 2026 13:25
- Added maxMessagesToScanPerPoll property to ClustersProperties and PollingSettings.
- Updated BackwardEmitter and ForwardEmitter to use nextChunkSizePerPartition for calculating messages to poll.
- Enhanced ConsumerGroupService to create consumers with a configurable poll limit.
- Updated MessagesService to pass the limit when creating consumers.
- Added tests for PollingSettings to ensure correct behavior with maxMessagesToScanPerPoll.
- Updated API documentation and contract specifications to reflect the new configuration.

This change improves the flexibility of message polling by allowing configuration of the maximum number of messages to scan per poll.
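A hypothetical example of configuring the new property; the exact property path within the application YAML is an assumption based on the commit's mention of ClustersProperties and PollingSettings, and may differ from the real layout:

```yaml
# Hypothetical placement of maxMessagesToScanPerPoll; the exact path in the
# application config may differ from this sketch.
kafka:
  clusters:
    - name: local
      bootstrapServers: localhost:9092
      # Upper bound on the number of messages scanned in a single poll.
      maxMessagesToScanPerPoll: 500
```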
- Added methods to AbstractEmitter, MessagesProcessing, and ConsumingStats to support sending consuming stats for in-range records.
- Updated RangePollingEmitter to utilize new methods for reporting in-range-only stats during polling.
- Introduced a test case in RecordEmitterTest to verify that consuming stats reflect only in-range records.

This change improves the accuracy of consuming statistics reported by range emitters, ensuring they only account for records within the specified range.
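A minimal sketch of in-range-only accounting, with assumed names and shapes (not the actual ConsumingStats/RangePollingEmitter code):

```java
import java.util.List;

// Minimal sketch of "in-range-only" stats accounting; names and shapes are
// assumptions, not the actual kafbat code.
public class InRangeStats {
    // Count only records whose offsets fall inside the half-open range
    // [from, to); records polled outside the range do not enter the stats.
    static long countInRange(List<Long> polledOffsets, long from, long to) {
        return polledOffsets.stream()
            .filter(o -> o >= from && o < to)
            .count();
    }
}
```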
@Haarolean Haarolean closed this Apr 15, 2026

Development

Successfully merging this pull request may close these issues.

Huge consumed bytes/messages
