Enhance warmup service with lifetime limit and error handling #58

Merged
EliteScouter merged 1 commit into EliteScouter:main from Dimotai:patch-1
Mar 30, 2026

Dimotai commented Mar 29, 2026

Fix: WarmupService poller silently dies from uncaught exception, permanently breaking all teleports

Summary

A single uncaught exception in WarmupService.pollWarmups() permanently stops the warmup poller, causing all warmup-based teleports (/home, /warp, /tpa, /spawn, /back, /rtp) to silently stop working for every player on the server until restart. Admin teleports with 0 warmup are unaffected, which masks the issue.

Bug Description

WarmupService uses ScheduledExecutorService.scheduleAtFixedRate() to poll pending warmups every 100ms. The documented behavior of that method is that if any execution of the scheduled Runnable throws, all subsequent executions are suppressed. The executor itself logs nothing; the task simply stops running.
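
The suppression can be reproduced outside the plugin. The following is a minimal standalone sketch, not code from WarmupService: the class name SuppressedTaskDemo, the 10 ms period, and the simulated exception are illustrative assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of EliteEssentials.
final class SuppressedTaskDemo {

    /** Schedules a fixed-rate task that throws on its third run and returns the total run count. */
    static int countRuns() throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            executor.scheduleAtFixedRate(() -> {
                if (runs.incrementAndGet() == 3) {
                    // Stand-in for world.execute() throwing inside pollWarmups().
                    throw new IllegalStateException("simulated failure in poll");
                }
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Long enough for dozens of further runs if the task were still scheduled.
            Thread.sleep(500);
            return runs.get();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs = " + countRuns());
    }
}
```

The counter stops at the run that threw, and nothing is printed to stderr by the executor, which matches the silent failure described above.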

pollWarmups() currently has no try/catch protection. If world.execute() throws for any warmup entry (e.g., the world was destroyed between the null check and the execute call during dungeon instance teardown), the exception propagates out of pollWarmups(), and the executor kills the task. The warmup system is now dead.

To make matters worse, ensurePollerRunning() only checks pollTask.isCancelled() to decide whether to restart the poller. When a scheduled task dies from an exception, isCancelled() returns false: it wasn't cancelled, it crashed. isDone() would return true, but nothing checks it. So even when new players trigger startWarmup(), the method assumes the poller is still running and doesn't restart it.
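
The difference between the two flags can be checked directly on the ScheduledFuture. This is a hedged standalone sketch; CrashedFutureDemo and its method name are hypothetical, and the task body only simulates the crash.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of EliteEssentials.
final class CrashedFutureDemo {

    /** Returns {isCancelled, isDone} for a fixed-rate task whose first run throws. */
    static boolean[] stateAfterCrash() throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> task = executor.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated failure in poll");
            }, 0, 10, TimeUnit.MILLISECONDS);

            try {
                task.get(); // blocks until the periodic task terminates
            } catch (ExecutionException crashed) {
                // The original exception is only visible here, via crashed.getCause().
            }
            return new boolean[] { task.isCancelled(), task.isDone() };
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] state = stateAfterCrash();
        System.out.println("isCancelled = " + state[0] + ", isDone = " + state[1]);
    }
}
```

A check based on isCancelled() alone therefore treats a crashed task as healthy.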

The result: a single exception permanently bricks the entire teleport warmup system for all players until server restart.

What Players Experience

  • Players with warmup > 0: They see the warmup countdown message ("Teleporting in 5 seconds... stand still!") because startWarmup() still adds entries to the pending map. But tickWarmup() never runs because the poller is dead, so the teleport never completes. The entry stays in pending forever.
  • Subsequent attempts: Commands that check hasActiveWarmup() before starting (/home, /warp, /back, /tpa, /rtp) reject the player with "You already have a teleport in progress!" because the stale entry from the first attempt is still in the map.
  • /spawn: Doesn't check hasActiveWarmup(), so it replaces the pending entry each time — but the new warmup also never completes because the poller is still dead.
  • Admins with warmup bypass (warmup = 0): Unaffected. Zero-warmup teleports execute immediately in startWarmup() via world.execute(onComplete) without going through the poller at all. This makes the bug appear player-specific when it's actually global.
  • Restarting the game client: Does not help. The bug is server-side: the poller task is dead and the pending map state lives on the server.

Evidence From Production

This bug was observed on a 50+ player Hytale server (build 2026.03.26) running EliteEssentials 2.0.1. The server had heavy dungeon instance churn (instances being created and destroyed every 30-60 seconds) with frequent cross-world player transfers.

At approximately 16:25 UTC, all player teleports stopped working simultaneously. Server logs show:

  1. World.consumeTaskQueue throwing IllegalStateException: Window id 1 is invalid! — the world task queue was actively failing tasks during this period
  2. Frequent cross-world thread mismatches during dungeon instance transfers — world.execute() calls racing with world destruction
  3. Multiple players reporting the issue simultaneously — confirming it's a global system failure, not per-player
  4. Admin with instant-tp perms could teleport fine — confirming the warmup path specifically is broken
  5. Player reports matching exactly: "it says 'teleporting to spawn in 5 seconds... stand still!' and thats it" (warmup starts, never completes) and "i cant teleport to spawn lol i have you already have a teleport in proGress" (stale pending entry blocking)

No [Warmup] error was logged because the exception was swallowed silently by ScheduledExecutorService.

The Fix

Three changes, all in WarmupService.java:

1. pollWarmups() — Two layers of try/catch

An inner per-warmup try/catch ensures one bad warmup (e.g., a destroyed world) can't kill processing for other players. The failed entry is removed and logged. An outer try/catch acts as an absolute last resort so the poller can never die.
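
The two layers could look roughly like the sketch below. It is not the merged code: the Warmup interface, the in-memory log list, and the class name GuardedPollSketch are hypothetical stand-ins so the example runs on its own, and catching Throwable in the outer layer is one possible reading of "absolute last resort".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a guarded poll loop; names are illustrative only.
final class GuardedPollSketch {

    /** Stand-in for a pending warmup; tick() returns true once the warmup has completed. */
    interface Warmup {
        boolean tick();
    }

    final Map<UUID, Warmup> pending = new ConcurrentHashMap<>();
    final List<String> log = new ArrayList<>(); // stand-in for the plugin logger

    void pollWarmups() {
        try { // outer layer: nothing may escape to the executor
            Iterator<Map.Entry<UUID, Warmup>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<UUID, Warmup> entry = it.next();
                try { // inner layer: one bad entry must not affect the others
                    if (entry.getValue().tick()) {
                        it.remove();
                    }
                } catch (Exception e) {
                    it.remove();
                    log.add("[Warmup] removed failed warmup for " + entry.getKey() + ": " + e);
                }
            }
        } catch (Throwable t) {
            log.add("[Warmup] unexpected error in poll loop: " + t);
        }
    }

    public static void main(String[] args) {
        GuardedPollSketch sketch = new GuardedPollSketch();
        sketch.pending.put(UUID.randomUUID(), () -> {
            throw new IllegalStateException("world destroyed");
        });
        sketch.pending.put(UUID.randomUUID(), () -> false); // still counting down
        sketch.pollWarmups();
        System.out.println("still pending: " + sketch.pending.size());
        sketch.log.forEach(System.out::println);
    }
}
```

Removing the failed entry in the inner catch also prevents it from blocking the player with a stale "teleport in progress" state.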

2. ensurePollerRunning() — Check isDone() in addition to isCancelled()

// Before (broken):
if (pollTask != null && !pollTask.isCancelled()) { return; }

// After (fixed):
if (pollTask != null && !pollTask.isCancelled() && !pollTask.isDone()) { return; }

This is a defense-in-depth measure. With the try/catch fix, the poller should never die. But if it somehow does, the next startWarmup() call will now correctly detect the dead task and restart it.
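
To illustrate the restart path in isolation, here is a hedged sketch of a poller wrapper that uses the fixed condition. RestartablePoller, its start counter, and the injected poll body are assumptions made for the example; only the condition itself and the 100ms period are taken from this PR.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper; not the actual WarmupService.
final class RestartablePoller {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Runnable pollBody;
    private ScheduledFuture<?> pollTask;
    private int startCount; // only here so the demo can observe restarts

    RestartablePoller(Runnable pollBody) {
        this.pollBody = pollBody;
    }

    synchronized void ensurePollerRunning() {
        // Same condition as the fixed check: a crashed task reports isDone() == true.
        if (pollTask != null && !pollTask.isCancelled() && !pollTask.isDone()) {
            return;
        }
        pollTask = executor.scheduleAtFixedRate(pollBody, 0, 100, TimeUnit.MILLISECONDS);
        startCount++;
    }

    synchronized int startCount() {
        return startCount;
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        RestartablePoller poller = new RestartablePoller(() -> {
            throw new IllegalStateException("simulated crash");
        });
        poller.ensurePollerRunning(); // first start
        Thread.sleep(200);            // the first run has thrown by now
        poller.ensurePollerRunning(); // detects the dead task and reschedules it
        System.out.println("starts = " + poller.startCount());
        poller.shutdown();
    }
}
```

Per the Future Javadoc, isDone() also returns true for a cancelled task, so keeping the isCancelled() check is redundant but harmless.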

3. Stale warmup cleanup — 60-second maximum lifetime

A new MAX_WARMUP_LIFETIME_NANOS constant caps the lifetime of a warmup entry at 60 seconds. During each poll cycle, any entry older than that is force-removed with a warning log. No legitimate warmup should ever take 60 seconds. This catches edge cases where world.execute() silently drops a task without throwing, which would otherwise leave the entry stuck in pending forever.

A createdAtNanos field is added to PendingWarmup to support this check.
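
A stale-entry sweep along these lines might look like the sketch below. The trimmed-down PendingWarmup class, the removeStale helper, and the explicit nowNanos parameter are assumptions made so the example is testable; only the constant name, the createdAtNanos field name, and the 60-second limit come from this PR.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the stale-warmup sweep; not the merged implementation.
final class StaleWarmupSketch {
    static final long MAX_WARMUP_LIFETIME_NANOS = TimeUnit.SECONDS.toNanos(60);

    /** Trimmed-down stand-in for PendingWarmup, keeping only the creation timestamp. */
    static final class PendingWarmup {
        final long createdAtNanos;

        PendingWarmup(long createdAtNanos) {
            this.createdAtNanos = createdAtNanos;
        }
    }

    /** Removes entries older than the maximum lifetime and returns how many were removed. */
    static int removeStale(Map<UUID, PendingWarmup> pending, long nowNanos) {
        int removed = 0;
        Iterator<Map.Entry<UUID, PendingWarmup>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<UUID, PendingWarmup> entry = it.next();
            // Compare the difference, not the raw values: System.nanoTime() readings
            // are only meaningful relative to each other and may overflow.
            if (nowNanos - entry.getValue().createdAtNanos > MAX_WARMUP_LIFETIME_NANOS) {
                it.remove();
                removed++;
                System.err.println("[Warmup] force-removed stale warmup for " + entry.getKey());
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<UUID, PendingWarmup> pending = new ConcurrentHashMap<>();
        long now = System.nanoTime();
        pending.put(UUID.randomUUID(), new PendingWarmup(now - TimeUnit.SECONDS.toNanos(90)));
        pending.put(UUID.randomUUID(), new PendingWarmup(now));
        System.out.println("removed = " + removeStale(pending, now));
    }
}
```

In a real poll cycle the caller would pass System.nanoTime() as nowNanos; taking it as a parameter here just keeps the check deterministic.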

Impact

  • Risk: Very low. The fix only adds defensive error handling around existing code paths. No behavioral changes for the happy path.
  • Backwards compatible: No config changes, no API changes, no message changes.
  • Files changed: WarmupService.java only.

How to Reproduce

The bug requires world.execute() to throw inside pollWarmups(). The most reliable trigger is heavy dungeon instance churn with cross-world player transfers — a world being destroyed between the world == null check and the world.execute() call. On a busy server with 40+ players running instances, this is a matter of when, not if.

Added a maximum warmup lifetime constant and improved error handling in the polling mechanism to prevent task cancellation due to uncaught exceptions.
@EliteScouter EliteScouter merged commit aa07904 into EliteScouter:main Mar 30, 2026