
[server] Throttle auto-partition drop to protect coordinator queue #3174

Open
swuferhong wants to merge 1 commit into apache:main from swuferhong:remove-drop-part-jett

swuferhong commented Apr 23, 2026

Purpose

Linked issue: close #3173

Two related changes on the auto-partition path:

  1. Fix: dropPartitions previously used createPartitionInstant, which included the random jitter that is only meant to spread partition creation load. As a result, expired partitions were not always cleaned up on time. Use the actual current time for drops so retention is honored promptly.

  2. Throttle: removing the jitter from drops means every auto-partition table rotates its day partition at the same instant (typically midnight). One drop fans out into numBuckets * replicationFactor bucket-deletion events on the coordinator event queue, so a simultaneous burst across many tables floods the queue and starves normal coordinator work (leader election, metadata updates).
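The first change can be illustrated with a small sketch of how a jittered reference time holds back a DAY-partition cutoff. It is a minimal sketch under stated assumptions, not the Fluss code. It assumes that the jittered `createPartitionInstant` lags the wall clock by a few minutes, that DAY partitions are named `yyyyMMdd` in UTC, and that retention keeps the reference day plus `numRetention` earlier days. `RetentionCutoff` and `expiredPartitions` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the drop cutoff; not the Fluss implementation. */
public final class RetentionCutoff {
    // DAY partitions are assumed to be named yyyyMMdd, e.g. "20260520".
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /** Assumed rule: keep the reference day plus numRetention earlier days; anything older is expired. */
    public static List<String> expiredPartitions(Instant reference, int numRetention, List<String> partitions) {
        LocalDate referenceDay = LocalDate.ofInstant(reference, ZoneOffset.UTC);
        LocalDate oldestKept = referenceDay.minusDays(numRetention);
        List<String> expired = new ArrayList<>();
        for (String partition : partitions) {
            if (LocalDate.parse(partition, DAY).isBefore(oldestKept)) {
                expired.add(partition);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("20260517", "20260518", "20260519", "20260520");
        Instant now = Instant.parse("2026-05-20T00:00:30Z");
        // Assumed jitter: the creation instant lags the wall clock by 10 minutes.
        Instant jittered = now.minus(Duration.ofMinutes(10));

        System.out.println(expiredPartitions(now, 1, partitions));      // [20260517, 20260518]
        System.out.println(expiredPartitions(jittered, 1, partitions)); // [20260517]
    }
}
```

Just after midnight, the jittered instant still falls on the previous day. As a result, `20260518` survives another check even though it is already past retention. Using the wall clock drops it on time.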

Add an adaptive throttle that decides, each round, how aggressively to drop, based on both coordinator queue pressure and pending drop volume:

  • Queue-aware backpressure: skip drops when the coordinator event queue size crosses a configurable threshold, retry next round.
  • Per-round bucket-deletion budget shared across all auto-partition tables. Accounting in buckets (not partitions) gives uniform protection regardless of each table's bucket count.
  • Starvation guard: tables whose single-partition bucket count exceeds the remaining budget can still drop one partition per round, so very large tables cannot be permanently blocked.
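The three rules can be combined into a single per-round planner. The following is a minimal sketch under stated assumptions, not the code in this PR. `DropThrottle`, `planRound`, `queueSizeThreshold`, and `bucketBudgetPerRound` are hypothetical names. Each partition is charged its bucket count against the budget. The starvation guard is read as firing only while some budget is left, so a guarded drop ends the round.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-round drop planner following the three rules above; not the Fluss API. */
public final class DropThrottle {

    /** One auto-partition table and its expired partitions, oldest first. */
    public record Table(String name, int bucketsPerPartition, List<String> expiredPartitions) {
        public Table {
            if (bucketsPerPartition < 1) {
                throw new IllegalArgumentException("bucketsPerPartition must be >= 1");
            }
        }
    }

    private final int queueSizeThreshold;   // hypothetical: skip the round at or above this queue size
    private final int bucketBudgetPerRound; // hypothetical: bucket deletions allowed per round, all tables

    public DropThrottle(int queueSizeThreshold, int bucketBudgetPerRound) {
        this.queueSizeThreshold = queueSizeThreshold;
        this.bucketBudgetPerRound = bucketBudgetPerRound;
    }

    /** Returns the partitions to drop this round, per table; anything left waits for the next round. */
    public Map<String, List<String>> planRound(int coordinatorQueueSize, List<Table> tables) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        // Queue-aware backpressure: drop nothing while the coordinator queue is congested.
        if (coordinatorQueueSize >= queueSizeThreshold) {
            return plan;
        }
        int remaining = bucketBudgetPerRound;
        for (Table table : tables) {
            if (remaining <= 0) {
                break; // budget spent; later tables are retried next round
            }
            List<String> expired = table.expiredPartitions();
            if (expired.isEmpty()) {
                continue;
            }
            int cost = table.bucketsPerPartition();
            // Starvation guard: if one partition costs more than what is left, drop exactly one anyway.
            int n = cost > remaining ? 1 : Math.min(expired.size(), remaining / cost);
            remaining -= n * cost; // can go negative after the guard, which ends the round
            plan.put(table.name(), List.copyOf(expired.subList(0, n)));
        }
        return plan;
    }
}
```

Under this reading of the guard, each round overshoots the budget by at most one partition. The sketch visits tables in the order given. For the guard to reach every large table, a real planner would also need some fairness across rounds, for example rotating the starting table. The PR does not describe that part, so it is left out.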

Pre-creation of new partitions is unaffected. Leftover expired partitions are picked up in the next check interval, which preserves timely cleanup without bursting the coordinator queue.
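A back-of-envelope sketch puts numbers on the midnight burst from item 2 and on how long the leftovers take to clear. Every input is hypothetical: 200 tables, 16 buckets, replication factor 3, a 500-bucket budget per round, and a 10-minute check interval. The estimate also assumes uniform tables and a coordinator queue that stays below the backpressure threshold. `DropBacklogEstimate` and its methods are illustrative names only.

```java
import java.time.Duration;

/** Back-of-envelope numbers for the midnight burst; all inputs are hypothetical. */
public final class DropBacklogEstimate {

    /** Coordinator events when `tables` uniform tables each drop one partition at once. */
    public static long burstEvents(long tables, long numBuckets, long replicationFactor) {
        return tables * numBuckets * replicationFactor;
    }

    /** Lower bound on partitions dropped per round: whole partitions that fit the budget, at least one. */
    public static long partitionsPerRound(long budgetPerRound, long bucketsPerPartition) {
        if (bucketsPerPartition < 1) {
            throw new IllegalArgumentException("bucketsPerPartition must be >= 1");
        }
        return Math.max(1, budgetPerRound / bucketsPerPartition);
    }

    /** Upper bound on rounds to drain the backlog, ignoring rounds skipped for queue pressure. */
    public static long roundsToDrain(long pendingPartitions, long budgetPerRound, long bucketsPerPartition) {
        long perRound = partitionsPerRound(budgetPerRound, bucketsPerPartition);
        return (pendingPartitions + perRound - 1) / perRound; // ceiling division
    }

    /** Delay of the last drop after the first round: one check interval per additional round. */
    public static Duration lastDropDelay(long pendingPartitions, long budgetPerRound,
                                         long bucketsPerPartition, Duration checkInterval) {
        long rounds = roundsToDrain(pendingPartitions, budgetPerRound, bucketsPerPartition);
        return checkInterval.multipliedBy(Math.max(0, rounds - 1));
    }

    public static void main(String[] args) {
        // Hypothetical fleet: 200 tables, 16 buckets, replication factor 3, one expired partition each.
        System.out.println(burstEvents(200, 16, 3));                             // 9600
        // Hypothetical throttle: 500-bucket budget per round, 10-minute check interval.
        System.out.println(roundsToDrain(200, 500, 16));                         // 7
        System.out.println(lastDropDelay(200, 500, 16, Duration.ofMinutes(10))); // PT1H
    }
}
```

Under these assumptions, a single 9,600-event burst becomes seven rounds of roughly 500 bucket deletions each. The last expired partition is dropped about an hour after the first.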

Brief change log

Tests

API and Format

Documentation

swuferhong force-pushed the remove-drop-part-jett branch from 8369aee to af71f36 on May 20, 2026 11:59
swuferhong changed the title from "[server] Fix auto partition drop delayed by DAY creation jitter" to "[server] Throttle auto-partition drop to protect coordinator queue" on May 20, 2026

Successfully merging this pull request may close: [server] Auto partition retention cleanup delayed by DAY partition creation jitter
