Enhance warmup service with lifetime limit and error handling #58
Merged
EliteScouter merged 1 commit into EliteScouter:main from Mar 30, 2026
Added a maximum warmup lifetime constant and improved error handling in the polling mechanism to prevent task cancellation due to uncaught exceptions.
Fix: WarmupService poller silently dies from uncaught exception, permanently breaking all teleports
Summary
A single uncaught exception in WarmupService.pollWarmups() permanently kills the warmup poller, causing all warmup-based teleports (/home, /warp, /tpa, /spawn, /back, /rtp) to silently stop working for every player on the server until restart. Admin teleports with 0 warmup are unaffected, which masks the issue.
Bug Description
WarmupService uses ScheduledExecutorService.scheduleAtFixedRate() to poll pending warmups every 100ms. Java documents the relevant behavior: if any execution of the scheduled Runnable throws an uncaught exception, the executor permanently suppresses all subsequent executions. The executor itself logs nothing; the task simply stops running.
pollWarmups() currently has no try/catch protection. If world.execute() throws for any warmup entry (e.g., the world was destroyed between the null check and the execute call during dungeon instance teardown), the exception propagates out of pollWarmups(), and the executor kills the task. The warmup system is now dead.
To make it worse, ensurePollerRunning() only checks pollTask.isCancelled() to decide whether to restart the poller. When a scheduled task dies from an exception, isCancelled() returns false: it wasn't cancelled, it crashed. isDone() would return true, but nobody checks it. So even when new players trigger startWarmup(), the method thinks the poller is still running and doesn't restart it.
The result: a single exception permanently bricks the entire teleport warmup system for all players until server restart.
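The executor behavior described above can be checked in isolation. The following is a minimal sketch, not the plugin's code: the class name PollerDeathDemo, the 10ms period, and the simulated failure on the third run are all assumptions chosen for illustration. It schedules a task that throws once, then reports how many times the task ran and what isCancelled() and isDone() return afterwards.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PollerDeathDemo {

    /** Snapshot of the scheduled task's state after it has thrown. */
    static final class Result {
        final int runs;
        final boolean cancelled;
        final boolean done;

        Result(int runs, boolean cancelled, boolean done) {
            this.runs = runs;
            this.cancelled = cancelled;
            this.done = done;
        }
    }

    static Result run() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            // Hypothetical stand-in for pollWarmups(): the third execution throws.
            ScheduledFuture<?> pollTask = executor.scheduleAtFixedRate(() -> {
                if (runs.incrementAndGet() == 3) {
                    throw new IllegalStateException("simulated world.execute() failure");
                }
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Wait long enough for many more than three cycles to have been due.
            Thread.sleep(300);

            return new Result(runs.get(), pollTask.isCancelled(), pollTask.isDone());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("runs=" + r.runs + " cancelled=" + r.cancelled + " done=" + r.done);
    }
}
```

The run count stops at 3 even though roughly 30 cycles were due, isCancelled() is false, and isDone() is true. A restart check based only on isCancelled() therefore treats the dead task as still alive.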
What Players Experience
- startWarmup() still adds entries to the pending map. But tickWarmup() never runs because the poller is dead, so the teleport never completes. The entry stays in pending forever.
- Commands that check hasActiveWarmup() before starting (/home, /warp, /back, /tpa, /rtp) reject the player with "You already have a teleport in progress!" because the stale entry from the first attempt is still in the map.
- /spawn: Doesn't check hasActiveWarmup(), so it replaces the pending entry each time, but the new warmup also never completes because the poller is still dead.
- Admin teleports with 0 warmup: unaffected, because they run from startWarmup() via world.execute(onComplete) without going through the poller at all. This makes the bug appear player-specific when it's actually global.
- … pending map state is on the server.
Evidence From Production
This bug was observed on a 50+ player Hytale server (build 2026.03.26) running EliteEssentials 2.0.1. The server had heavy dungeon instance churn (instances being created and destroyed every 30-60 seconds) with frequent cross-world player transfers.
At approximately 16:25 UTC, all player teleports stopped working simultaneously. Server logs show:
- World.consumeTaskQueue throwing IllegalStateException: Window id 1 is invalid! (the world task queue was actively failing tasks during this period)
- … world.execute() calls racing with world destruction
No [Warmup] error was logged because the exception was swallowed silently by ScheduledExecutorService.
The Fix
Three changes, all in WarmupService.java:
1. pollWarmups(): Two layers of try/catch
An inner per-warmup try/catch ensures one bad warmup (e.g., a destroyed world) can't kill processing for other players. The failed entry is removed and logged. An outer try/catch acts as an absolute last resort so the poller can never die.
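The two-layer structure can be sketched as follows. This is an illustrative outline under stated assumptions, not the merged code: the Warmup interface, the in-memory log list, and the class name GuardedPollSketch are hypothetical, and the choice to catch Exception in the inner layer and Throwable in the outer layer is one reasonable reading of "absolute last resort".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class GuardedPollSketch {

    /** Hypothetical stand-in for a pending warmup; tick() may throw. */
    interface Warmup {
        void tick();
    }

    final Map<UUID, Warmup> pending = new ConcurrentHashMap<>();

    /** Stand-in for the server logger so the sketch stays self-contained. */
    final List<String> log = new ArrayList<>();

    void pollWarmups() {
        try {
            // ConcurrentHashMap iterators tolerate removal during iteration.
            for (Map.Entry<UUID, Warmup> entry : pending.entrySet()) {
                try {
                    entry.getValue().tick();
                } catch (Exception e) {
                    // Inner layer: drop only the failing entry, keep processing the rest.
                    pending.remove(entry.getKey());
                    log.add("[Warmup] removed " + entry.getKey() + ": " + e.getMessage());
                }
            }
        } catch (Throwable t) {
            // Outer layer: nothing may escape, or the executor suppresses future runs.
            log.add("[Warmup] poll cycle failed: " + t);
        }
    }
}
```

With this shape, a warmup whose tick() throws is removed on the first cycle, while every other entry keeps ticking on that cycle and all later ones.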
2. ensurePollerRunning(): Check isDone() in addition to isCancelled()
This is a defense-in-depth measure. With the try/catch fix, the poller should never die. But if it somehow does, the next startWarmup() call will now correctly detect the dead task and restart it.
3. Stale warmup cleanup: 60-second maximum lifetime
A MAX_WARMUP_LIFETIME_NANOS constant (60 seconds) is added. During each poll cycle, any warmup entry older than 60 seconds is force-removed with a warning log. No legitimate warmup should ever take 60 seconds. This catches edge cases where world.execute() silently drops a task without throwing, leaving the entry stuck in pending forever.
A createdAtNanos field is added to PendingWarmup to support this check.
Impact
WarmupService.java only.
How to Reproduce
The bug requires world.execute() to throw inside pollWarmups(). The most reliable trigger is heavy dungeon instance churn with cross-world player transfers: a world being destroyed between the world == null check and the world.execute() call. On a busy server with 40+ players running instances, this is a matter of when, not if.
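For the case where world.execute() drops a task without throwing, the 60-second lifetime limit from the fix is the only safety net. A minimal sketch of that cleanup, assuming a pending map keyed by player UUID: the names MAX_WARMUP_LIFETIME_NANOS, createdAtNanos, and PendingWarmup come from this PR, while removeStale(), its nowNanos parameter, and the class name StaleWarmupSketch are hypothetical and exist only to keep the example testable.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class StaleWarmupSketch {

    static final long MAX_WARMUP_LIFETIME_NANOS = TimeUnit.SECONDS.toNanos(60);

    /** Minimal stand-in for PendingWarmup; only the creation timestamp matters here. */
    static final class PendingWarmup {
        final long createdAtNanos;

        PendingWarmup(long createdAtNanos) {
            this.createdAtNanos = createdAtNanos;
        }
    }

    final Map<UUID, PendingWarmup> pending = new ConcurrentHashMap<>();

    /**
     * Removes entries older than the lifetime limit and returns how many were removed.
     * In a real poll cycle, nowNanos would be System.nanoTime().
     */
    int removeStale(long nowNanos) {
        int removed = 0;
        for (Map.Entry<UUID, PendingWarmup> entry : pending.entrySet()) {
            // Compare elapsed time via subtraction; nanoTime values are only meaningful as differences.
            long ageNanos = nowNanos - entry.getValue().createdAtNanos;
            if (ageNanos > MAX_WARMUP_LIFETIME_NANOS
                    && pending.remove(entry.getKey(), entry.getValue())) {
                // The two-argument remove avoids deleting a newer entry for the same player.
                removed++;
            }
        }
        return removed;
    }
}
```

Calling removeStale() at the start of each poll cycle means a stuck entry blocks a player's commands for at most about one minute instead of until restart.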