refactor(inkless): Split InitDisklessLogManager logic into separate components #561

Merged
jeqo merged 3 commits into main from giuseppelillo/refactor-init-diskless-log-manager on Apr 8, 2026

@giuseppelillo
Contributor

@giuseppelillo giuseppelillo commented Apr 1, 2026

Extract logic related to the states and state machine into InitDisklessLogState, and the logic related to request batching and sending into InitDisklessLogBatchQueue.

@giuseppelillo giuseppelillo force-pushed the giuseppelillo/refactor-init-diskless-log-manager branch from 00dc642 to 6765d64 Compare April 3, 2026 09:42
@giuseppelillo giuseppelillo changed the title refactor(inkless): Split InitDisklessLogManager logic into its own state classes refactor(inkless): Split InitDisklessLogManager logic into state classes and batch queue class Apr 3, 2026
@giuseppelillo giuseppelillo force-pushed the giuseppelillo/refactor-init-diskless-log-manager branch 2 times, most recently from b557015 to 7b6fd8a Compare April 3, 2026 13:26
@giuseppelillo giuseppelillo changed the title refactor(inkless): Split InitDisklessLogManager logic into state classes and batch queue class refactor(inkless): Split InitDisklessLogManager logic into separate components Apr 3, 2026
@giuseppelillo giuseppelillo requested a review from Copilot April 3, 2026 13:27
Contributor

Copilot AI left a comment

Pull request overview

Refactors the diskless log initialization flow by extracting the per-partition state machine into InitDisklessLogState and extracting batched request scheduling/sending/retry into InitDisklessLogBatchQueue, with corresponding updates to unit/integration-style tests.

Changes:

  • Introduces a sealed state model (WaitingForReplication, SendingToController, AwaitingMetadata) and a protocol for sending SendingToController batches to the controller.
  • Adds a generic retriable batch queue to coalesce work (linger) and retry failures with capped exponential delay.
  • Updates tests to assert against state types rather than the removed InitState enum.
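
As a rough illustration of the sealed state model described above, the sketch below models the three named states as case classes under one sealed trait. Only the state names come from the review summary; the `InitState` trait name, the `partition` field, and the helper functions are assumptions made for this example and are not taken from the PR.

```scala
// Hypothetical sketch: state names mirror the summary; fields and helpers are assumptions.
sealed trait InitState
final case class WaitingForReplication(partition: String) extends InitState
final case class SendingToController(partition: String) extends InitState
final case class AwaitingMetadata(partition: String) extends InitState

object InitStateExample {
  // Because the trait is sealed, the compiler can warn when a case is missing here.
  def describe(state: InitState): String = state match {
    case WaitingForReplication(p) => s"$p: waiting for replication"
    case SendingToController(p)   => s"$p: queued for the controller"
    case AwaitingMetadata(p)      => s"$p: awaiting metadata"
  }

  // A test can check the tracked state by type instead of comparing enum values.
  def isSending(state: InitState): Boolean = state.isInstanceOf[SendingToController]
}
```

With states as types, each one can carry its own data, and a test assertion reduces to a type check such as `isSending` rather than an equality check against an enum constant.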

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| core/src/main/scala/kafka/server/InitDisklessLogManager.scala | Switches manager to track InitDisklessLogState objects and delegates batching/retries to the new queue. |
| core/src/main/scala/kafka/server/InitDisklessLogState.scala | Adds the extracted state machine and the controller batch protocol for SendingToController. |
| core/src/main/scala/kafka/server/InitDisklessLogBatchQueue.scala | Adds a generic batch queue with linger + retry scheduling and per-partition result futures. |
| core/src/test/scala/unit/kafka/server/metadata/InitDisklessLogFlowTest.scala | Updates assertions to validate tracked state by class/type. |
| core/src/test/scala/unit/kafka/server/InitDisklessLogManagerTest.scala | Updates tests for new state model and adds additional retry/backoff-related test coverage. |

Comment thread core/src/main/scala/kafka/server/InitDisklessLogState.scala Outdated
Comment thread core/src/main/scala/kafka/server/InitDisklessLogState.scala
Comment thread core/src/main/scala/kafka/server/InitDisklessLogBatchQueue.scala Outdated
Comment thread core/src/main/scala/kafka/server/InitDisklessLogBatchQueue.scala Outdated
Comment thread core/src/main/scala/kafka/server/InitDisklessLogManager.scala
…omponents

Extract logic related to the states and state machine into InitDisklessLogState,
and the logic related to request batching and sending into
InitDisklessLogBatchQueue.
@giuseppelillo giuseppelillo force-pushed the giuseppelillo/refactor-init-diskless-log-manager branch from 7b6fd8a to f68b6e9 Compare April 3, 2026 13:51
@giuseppelillo giuseppelillo marked this pull request as ready for review April 3, 2026 14:01
Comment thread core/src/main/scala/kafka/server/InitDisklessLogManager.scala
Comment thread core/src/main/scala/kafka/server/InitDisklessLogBatchQueue.scala
viktorsomogyi previously approved these changes Apr 7, 2026
Contributor

@jeqo jeqo left a comment

Most of the refactoring looks good to me, though I think we could reconsider the generic batching framework and go further on the PartitionListener-related changes.

def validate(): Unit = {
  if (!partition.isSealed) {
    error(s"Partition is not sealed, which should never happen. Skipping migration.")
    throw new IllegalArgumentException()
Contributor

Throwing an error in a case class constructor is not a common pattern in the Kafka codebase; maybe it's better to leave this validation in the manager?

Contributor Author

Moved the validation into the advance-state logic, so a failed validation now transitions to the Failed state instead of being handled by a separate check.
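
One way to picture this change is a transition function that returns a Failed state when the partition is not sealed, rather than throwing from a constructor. The sketch below is only an assumption about the shape of such logic: `PartitionInfo`, `State`, `Advance.advance`, the `replicated` flag, and the choice of which state performs the check are all hypothetical names and details, not the PR's code.

```scala
// Hypothetical sketch, not the PR's code: all names and fields here are illustrative.
final case class PartitionInfo(name: String, isSealed: Boolean, replicated: Boolean)

sealed trait State
case object WaitingForReplication extends State
case object SendingToController extends State
final case class Failed(reason: String) extends State

object Advance {
  // Validation is part of the transition: an unsealed partition yields Failed
  // instead of an exception thrown while constructing a state object.
  def advance(current: State, p: PartitionInfo): State = current match {
    case WaitingForReplication if !p.isSealed =>
      Failed(s"Partition ${p.name} is not sealed")
    case WaitingForReplication if p.replicated =>
      SendingToController
    case other =>
      other // no transition applies; stay in the current state
  }
}
```

Handled this way, the failure becomes an ordinary value that the manager can track and log like any other state.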

Comment thread core/src/main/scala/kafka/server/InitDisklessLogState.scala
Contributor

I wonder if we should consider dropping this generic batch queue framework. It introduces some behavioral changes and doesn't get much reuse in the upcoming PR #563, as most features are bypassed (no batching, no parsing, sync calls, etc.).
It also diverges from the AlterPartitionManager patterns, which already cope with similar requirements and are already followed in the current implementation, so it increases cognitive load.

Contributor Author

I've refactored the batch queue code to be less complex. It's now just an abstract class that can be extended. I still think there's value in having an abstract class that implements most of the machinery (batching, retries, backoff), because that code would otherwise be duplicated in the implementation of the InitDisklessLog call for the control plane. #563 had not been updated to use batching yet, but now it has. It also needs the same mechanism for retries and backoff calculation. Let me know what you think about this new abstraction.
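
To make the trade-off concrete, here is a minimal sketch of an abstract class that owns batching, retry bookkeeping, and capped exponential backoff, leaving only the send step to subclasses. It is not the PR's `InitDisklessLogBatchQueue`: the class name `RetriableBatchQueue`, its methods, and the synchronous `flush` are assumptions for illustration, and linger scheduling is left to the caller.

```scala
import scala.collection.mutable

// Hypothetical sketch: class and method names are assumptions, not the PR's API.
abstract class RetriableBatchQueue[T](baseBackoffMs: Long, maxBackoffMs: Long) {
  private val pending = mutable.Queue.empty[T]
  private var failures = 0

  // Subclasses decide how a batch is sent; return true on success.
  protected def send(batch: Seq[T]): Boolean

  def enqueue(item: T): Unit = pending += item

  // Capped exponential backoff: base * 2^(failures - 1), never above maxBackoffMs.
  // The shift is bounded so the sketch does not overflow for small base values.
  def backoffMs: Long =
    if (failures == 0) 0L
    else math.min(maxBackoffMs, baseBackoffMs << math.min(failures - 1, 20))

  // Sends everything queued so far as one batch. On failure the items are
  // re-queued, and the returned delay tells the caller when to try again.
  def flush(): Long = {
    val batch = pending.toList
    pending.clear()
    if (batch.nonEmpty) {
      if (send(batch)) failures = 0
      else {
        failures += 1
        pending ++= batch
      }
    }
    backoffMs
  }
}
```

A subclass for a specific request type would only implement `send`, so a second caller with the same retry and backoff needs would not have to duplicate the bookkeeping.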

Contributor

Let's go for this approach for now. We can reassess if needed after the next PR lands.

@jeqo jeqo merged commit 68f438a into main Apr 8, 2026
8 checks passed
@jeqo jeqo deleted the giuseppelillo/refactor-init-diskless-log-manager branch April 8, 2026 14:59
jeqo pushed a commit that referenced this pull request Apr 14, 2026
…omponents (#561)

Extract logic related to the states and state machine into InitDisklessLogState,
and the logic related to request batching and sending into
InitDisklessLogBatchQueue.
jeqo pushed a commit that referenced this pull request Apr 14, 2026
…omponents (#561)

Extract logic related to the states and state machine into InitDisklessLogState,
and the logic related to request batching and sending into
InitDisklessLogBatchQueue.