Desync on Apple Silicon (ARM64): deferred stack tracer can't parse ARM64 method prologues → corrupted RNG stream → "Wrong random state" #944

@samstreet

Summary

On Apple Silicon Macs running the native ARM64 build of RimWorld 1.6, the Multiplayer mod's deferred stack tracing subsystem (DeferredStackTracingImpl / DeferredStackTracing.Postfix) fails to parse the native machine code of patched methods, because the parser is written for x86-64 instruction layout. The Rand postfix then throws on essentially every RNG call. The original Rand value is still produced (RNG advances), but the throwing postfix aborts the consuming logic, so the ARM64 client performs a different number of subsequent RNG draws than an x86-64 host. The result is a guaranteed Wrong random state desync within a few thousand ticks.

This is a desync-detector failure that itself corrupts the RNG stream, not a content-level desync. The x86-64 host is unaffected.

Environment

  • Affected client: Apple Silicon Mac (M3 Pro), macOS, running RimWorld's native ARM64 binary
  • Host: Windows (x86-64) — unaffected
  • RimWorld: 1.6.4633 rev1269
  • Multiplayer: 0.11.5+4a3be27
  • Topology: LAN, Windows host + Mac client

Key log evidence

Multiplayer version 0.11.5+4a3be27
Arch: Arm64/OS: Arm64

The tracer reports it cannot find the expected prologue and falls back:

MpServerLog Error: Unexpected GetRpb asm structure. Couldn't find a magic bytes match. Using fallback offset (-8). Asm dump for the method:
0000: FD 7B BD A9 FD 03 00 91 BF 0B 00 F9 E0 BD 99 D2
0010: 60 35 B1 F2 E0 AC C8 F2 60 24 E0 F2 A0 0B 00 F9
0020: A0 43 00 91 A0 13 00 F9 00 08 81 D2 A0 7E A7 F2
0030: 60 00 C0 F2 10 B4 80 39 50 00 00 B5 0C 00 00 94
0040: A0 13 40 F9 01 BB 85 D2 61 44 A5 F2 61 00 C0 F2
0050: 21 00 80 39 21 7C 40 93 00 00 01 CB 00 00 40 F9
0060: BF 03 00 91 FD 7B C3 A8 C0 03 5F D6

That dump is an ARM64 (AArch64) function, not x86-64. FD 7B BD A9 is stp x29, x30, [sp, #-0x30]! (the standard AArch64 frame setup), and the final C0 03 5F D6 is ret. The x86-64 parser is looking for an x86 prologue (e.g. 55 48 89 E5), never finds its magic bytes, takes the fallback offset, then throws when it tries to walk the "stack" it computed from misinterpreted bytes.
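
As a sanity check on that reading, here is a minimal Python sketch (illustrative only, not the mod's C# code) that classifies the first bytes of a method body. The mask and pattern are derived from the A64 STP pre-index encoding with the 7-bit immediate masked out; all names (classify_prologue, the constants) are made up for this example.

```python
import struct

# First and last instruction words of the asm dump quoted in the log above.
DUMP_HEAD = bytes.fromhex("FD7BBDA9")
DUMP_TAIL = bytes.fromhex("C0035FD6")

# push rbp; mov rbp, rsp
X86_64_PROLOGUE = bytes.fromhex("554889E5")

# stp x29, x30, [sp, #imm]! with the imm7 field (bits 21..15) masked out.
STP_FP_LR_PRE_MASK = 0xFFC07FFF
STP_FP_LR_PRE_BITS = 0xA9807BFD
AARCH64_RET = 0xD65F03C0  # ret (returns via x30)


def word(code, offset=0):
    """Read one little-endian 32-bit instruction word."""
    return struct.unpack_from("<I", code, offset)[0]


def classify_prologue(code):
    """Guess the architecture from the first bytes of a JITted method."""
    if code.startswith(X86_64_PROLOGUE):
        return "x86-64"
    if len(code) >= 4 and word(code) & STP_FP_LR_PRE_MASK == STP_FP_LR_PRE_BITS:
        return "aarch64"
    return "unknown"


print(classify_prologue(DUMP_HEAD))        # aarch64
print(word(DUMP_TAIL) == AARCH64_RET)      # True
```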

The throw surfaces as:

System.Exception: Deferred stack tracing: Unknown function header 2847767549
  at Multiplayer.Client.Desyncs.DeferredStackTracing.Postfix () in Multiplayer/Desyncs/DeferredStackTracing.cs:42
  at Multiplayer.Client.Desyncs.DeferredStackTracingImpl.GetStackUsage (System.Int64 addr) in MultiplayerCommon/DeferredStackTracingImpl.cs:243
  at Multiplayer.Client.Desyncs.DeferredStackTracingImpl.UpdateNewElement (...) in MultiplayerCommon/DeferredStackTracingImpl.cs:197
  at Multiplayer.Client.Desyncs.DeferredStackTracingImpl.TraceImpl (...) in MultiplayerCommon/DeferredStackTracingImpl.cs:132
  at Multiplayer.Client.Desyncs.DeferredStackTracing.Postfix () in Multiplayer/Desyncs/DeferredStackTracing.cs:37
  at Verse.Rand.get_Int ()
    - POSTFIX multiplayer: Void Multiplayer.Client.Desyncs.DeferredStackTracing:Postfix()
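
One detail worth noting: the number in "Unknown function header 2847767549" appears to be the first four bytes of the dump read as a little-endian 32-bit integer, i.e. the stp instruction itself. A quick check in Python (variable names are just for illustration):

```python
# The value reported in the exception message.
header = 2847767549

# Re-encode it as 4 little-endian bytes and compare with the start of the asm dump.
header_bytes = header.to_bytes(4, "little")
dump_start = bytes.fromhex("FD 7B BD A9")

print(hex(header))                     # 0xa9bd7bfd
print(header_bytes.hex(" ").upper())   # FD 7B BD A9
print(header_bytes == dump_start)      # True
```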

After the first throw, it degrades to a NullReferenceException at DeferredStackTracingImpl.cs:142 on every RNG call, fired from Verse.Rand.get_Int / get_Value postfixes across thousands of pawn/thing ticks (Pawn_CallTracker.ResetTicksToNextCall, JobGiver_Wander, HediffGiver_RandomAgeCurved, CompThrownFleckEmitter, etc.). Pawns then fail to get jobs ("issued IdleError wait job"), and the tick ends with:

xGRAFIKx 109380 Desynced after last valid tick 109114: Wrong random state on map 0

Rejoining immediately re-desyncs because the underlying condition is unchanged.

Root cause

RimWorld 1.6 ships native ARM64 binaries on Apple Silicon (the Mac executable is a universal binary, and on Apple Silicon the ARM64 slice runs by default). Mono therefore JITs methods to AArch64. The deferred stack tracer reads the raw native bytes of patched methods to hash call stacks cheaply, and that byte-level reader assumes x86-64:

  • It looks for an x86-64 prologue / "magic bytes" that never appear in AArch64 code.
  • The fallback offset produces a garbage address, and walking it throws.
  • Because the throw happens in a postfix on Rand.get_Int / get_Value, the RNG itself has already advanced, but the code that would have consumed that value (and made further RNG calls) is aborted.
  • The x86-64 host completes that logic and makes the additional RNG draws, so the two RNG streams diverge → Wrong random state.
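
The divergence mechanism in the bullets above can be reproduced with a toy model. This is a Python sketch with a stand-in RNG and made-up names (traced_draw, consumer, simulate), not RimWorld's Rand or the mod's code; the only assumption is that the postfix throws after the draw has already happened.

```python
import random


class TracerError(Exception):
    """Stand-in for the exception thrown by the tracing postfix."""


def traced_draw(rng, tracer_works):
    value = rng.random()          # the RNG advances first...
    if not tracer_works:
        raise TracerError()       # ...then the postfix throws
    return value


def consumer(rng, tracer_works):
    """Game logic that needs two draws; returns how many actually happened."""
    draws = 0
    try:
        traced_draw(rng, tracer_works)
        draws += 1
        traced_draw(rng, tracer_works)
        draws += 1
    except TracerError:
        draws += 1                # the first draw happened, the second never does
    return draws


def simulate(seed, ticks, tracer_works):
    """Run the consumer for a number of ticks from a shared seed."""
    rng = random.Random(seed)
    total = sum(consumer(rng, tracer_works) for _ in range(ticks))
    return total, rng.getstate()


host_draws, host_state = simulate(seed=1234, ticks=10, tracer_works=True)
client_draws, client_state = simulate(seed=1234, ticks=10, tracer_works=False)
print(host_draws, client_draws)      # 20 10
print(host_state == client_state)    # False
```

Both peers start from the same seed, but the peer with the throwing postfix makes half as many draws, so the RNG states no longer match at the end of the run.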

This is consistent with reports that Harmony/MonoMod's native-code handling was written with x86-64 in mind and does not understand AArch64 prologues; the MP mod's bespoke tracer has the same assumption baked in.

Reproduction

  1. Host a multiplayer session from an x86-64 machine (Windows or Intel Mac).
  2. Join from an Apple Silicon Mac running the native ARM64 RimWorld build.
  3. Let the game tick. The Mac client logs the "magic bytes" error and the ARM64 asm dump immediately, then desyncs within a few thousand ticks with Wrong random state.

Confirmed workaround (user side)

Forcing the Mac to run RimWorld's x86-64 slice under Rosetta 2 fixes it completely — Mono then JITs to x86-64, the tracer's parser matches, the postfix stops throwing, and sync holds.

Per-game Steam launch options and the "Open using Rosetta" Get Info checkbox both fail: Steam execs the inner binary directly, so arch and %command% are passed as game arguments rather than used to launch. The reliable way to force Rosetta is to strip the ARM64 slice from the binary so macOS has no native option:

cd "$HOME/Library/Application Support/Steam/steamapps/common/RimWorld/RimWorldMac.app/Contents/MacOS"
cp "RimWorld by Ludeon Studios" "RimWorld by Ludeon Studios.universal.bak"
lipo "RimWorld by Ludeon Studios" -thin x86_64 -output "RimWorld_x86"
mv -f "RimWorld_x86" "RimWorld by Ludeon Studios"
codesign --force --sign - "RimWorld by Ludeon Studios"
file "RimWorld by Ludeon Studios"   # should now report x86_64 only

(Any Steam update or "Verify integrity of game files" reverts this change. To undo it manually, restore the original from the .universal.bak copy.)

This works but it's a per-user hack and shouldn't be the answer for Apple Silicon players generally.

Suggested fixes (mod side)

In rough order of effort/robustness:

  1. Add an AArch64 code path to the deferred stack tracer. Detect architecture at runtime (the mod already logs Arch: Arm64) and parse the AArch64 prologue/epilogue instead of the x86-64 one in DeferredStackTracingImpl.GetStackUsage / UpdateNewElement. The prologue to recognize is the standard stp x29, x30, [sp, #imm]! frame setup (FD 7B ?? A9 family), with stack-size decoding from the stp/sub sp, sp, #imm immediates, rather than the x86 push rbp; mov rbp, rsp pattern. This is the real fix.

  2. Fail safe instead of throwing in the postfix. Whatever the architecture, a tracer that can't parse a method should not let an exception escape the Rand postfix, because the throw is what desyncs the client (RNG advanced, consumer aborted). Catching internally and skipping the trace for that frame would at minimum stop the parser bug from corrupting the RNG stream — it would lose stack-hash fidelity in the desync report but keep clients in sync.

  3. Gate/disable deferred stack tracing on unsupported architectures. If Arch == Arm64 and no AArch64 parser is present, automatically fall back to the non-native tracing path (or disable stack tracing) rather than running the x86 parser against AArch64 code. A user-facing toggle for "deferred stack tracing" in MP settings would also let affected players self-mitigate without the Rosetta hack.
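
For option 1, the stack-size decoding might be sketched as below, in Python for illustration only. It assumes the prologue starts with stp x29, x30, [sp, #-N]! and may be followed within the next few instructions by an unshifted sub sp, sp, #M; real Mono JIT output may use other shapes that this does not cover. The function names are hypothetical.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def aarch64_frame_size(code):
    """Return the stack bytes reserved by a recognized prologue, else None."""
    if len(code) < 4:
        return None
    first = struct.unpack_from("<I", code, 0)[0]
    # stp x29, x30, [sp, #imm]! with imm7 (bits 21..15) masked out.
    if first & 0xFFC07FFF != 0xA9807BFD:
        return None
    offset = sign_extend((first >> 15) & 0x7F, 7) * 8   # imm7 is scaled by 8
    if offset >= 0:
        return None
    size = -offset
    # Assumption: an extra "sub sp, sp, #imm12", if any, is among the next 3 words.
    for pos in range(4, min(len(code), 16) - 3, 4):
        insn = struct.unpack_from("<I", code, pos)[0]
        if insn & 0xFFC003FF == 0xD10003FF:
            size += (insn >> 10) & 0xFFF
            break
    return size


# First 16 bytes of the dump from the log: stp, mov x29, sp, and no sub sp.
dump = bytes.fromhex("FD7BBDA9FD030091BF0B00F9E0BD99D2")
print(aarch64_frame_size(dump))   # 48
```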

Options 2 and 3 are small and would immediately unblock Apple Silicon multiplayer; option 1 is the proper long-term fix.
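
Options 2 and 3 could look roughly like the following. This is a Python sketch of the control flow only; the real change would be C# inside the postfix. tracing_supported and safe_trace are hypothetical names, and the set of supported architecture strings is an assumption.

```python
import platform

# Assumption: the byte-level parser only understands x86-64 code.
SUPPORTED_ARCHES = {"x86_64", "amd64"}


def tracing_supported(machine=None):
    """Option 3: gate the native tracer on the CPU architecture."""
    machine = (machine or platform.machine()).lower()
    return machine in SUPPORTED_ARCHES


def safe_trace(trace_fn, machine=None):
    """Option 2: never let a tracer failure escape into the RNG caller.

    Returns the trace result, or None when tracing is skipped or fails.
    """
    if not tracing_supported(machine):
        return None
    try:
        return trace_fn()
    except Exception:
        return None   # lose this stack hash, keep the simulation running


print(safe_trace(lambda: 42, machine="AMD64"))     # 42
print(safe_trace(lambda: 42, machine="arm64"))     # None
print(safe_trace(lambda: 1 / 0, machine="x86_64")) # None
```

The trade-off is the one described above: a skipped trace costs some fidelity in the desync report, but the code consuming the Rand value is never interrupted.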

Notes

  • The same arch mismatch in the other direction (ARM64 host + x86-64 client) would presumably put the fault on whichever peer runs the native ARM64 build, since the throwing postfix fires wherever ticks are simulated, so swapping hosts is not a fix.
  • An obfuscated Player.log is attached:

Player.log
