Description
MLX's automatic element-wise kernel fusion (Compiled kernels) crashes with SIGSEGV (KERN_INVALID_ADDRESS at 0x0) when any input array has a null MTLBuffer. This happens when weights are loaded via mmap (lazy loading) and MLX auto-fuses an element-wise operation that includes those weights before they have been promoted to a Metal buffer.
The crash is inside CommandEncoder::set_input_array, which dereferences the Metal buffer pointer without checking for null.
Environment
- MLX version: 0.31.1
- macOS: 26.4 (Darwin 25.4.0)
- Hardware: Apple M1 Max (MacBookPro18,2)
- Python: 3.14.3
Stack trace (from macOS crash reporter)
Thread 0 Crashed:
0 libmlx.dylib mlx::core::metal::CommandEncoder::set_input_array(mlx::core::array const&, int, long long) + 48
1 libmlx.dylib mlx::core::Compiled::eval_gpu(std::vector<mlx::core::array> const&, std::vector<mlx::core::array>&) + 1032
2 libmlx.dylib mlx::core::gpu::eval(mlx::core::array&) + 204
3 libmlx.dylib mlx::core::eval_impl(std::vector<mlx::core::array>, bool) + 5020
4 libmlx.dylib mlx::core::async_eval(std::vector<mlx::core::array>) + 108
5 core.cpython-314-darwin.so [mlx Python binding for async_eval]
Exception: EXC_BAD_ACCESS (SIGSEGV) — KERN_INVALID_ADDRESS at 0x0000000000000000
Background threads at crash time are in ParallelFileReader::read → Load::eval_cpu (actively reading mmap'd weights from disk via pread).
Reproduction scenario
The crash happens when:
- Weights are loaded from a .safetensors file using MLX's mmap/lazy loading (ParallelFileReader).
- mx.eval(weights) is called to pre-materialize them (this appears to complete without error).
- A forward pass is run through the model, and the automatic element-wise fusion creates Compiled kernels.
- When those kernels execute, set_input_array receives an array whose MTLBuffer is still null.
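A minimal Python sketch of the scenario above may help narrow things down. It has not been confirmed to trigger the crash on its own, because the reported crash came from a full model forward pass. The file name, the array shapes, and the use of an explicit mx.compile as a stand-in for automatic fusion are all assumptions; both paths should produce a Compiled kernel. mx.save_safetensors, mx.load, mx.eval, mx.async_eval and mx.compile are existing MLX calls.

```python
import os
import tempfile


def reproduce(path=None, size=1024):
    """Hypothetical repro: lazily load weights, eval them, then run a fused op."""
    import mlx.core as mx  # imported here so the file can be read without MLX

    if path is None:
        # Write a throwaway .safetensors file (hypothetical names and shapes).
        path = os.path.join(tempfile.mkdtemp(), "weights.safetensors")
        mx.save_safetensors(
            path, {"w": mx.random.normal((size, size)), "b": mx.zeros((size,))}
        )

    weights = mx.load(path)  # dict of lazily loaded arrays
    mx.eval(weights)         # step 2 of the report: pre-materialize

    @mx.compile
    def fused(w, b):
        # Purely element-wise, so it can be fused into a single Compiled kernel.
        return mx.tanh(w) * 0.5 + b

    out = fused(weights["w"], weights["b"])
    mx.async_eval(out)  # the crashing frame in the report is async_eval
    mx.eval(out)
    return out
```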
Root cause
MLX's auto-compilation fuses chains of element-wise operations into Compiled Metal kernels. Compiled::eval_gpu calls CommandEncoder::set_input_array for each input, which reads array.buffer().ptr() directly without checking for null. If any input array has a null MTLBuffer (e.g., because it was lazy/mmap'd and its Metal buffer was not yet allocated), this dereferences a null pointer.
The surprising part is that calling mx.eval(weights) beforehand does not reliably prevent the crash. The MTLBuffer still appears to be null when the compiled kernel executes, even after eval has returned.
Workaround
This prevents auto-fusion and the creation of Compiled kernels entirely, so set_input_array is never called with lazy arrays.
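MLX exposes a global switch that should have the effect described above: mx.disable_compile(), or the MLX_DISABLE_COMPILE environment variable. It is an assumption that this is the workaround referred to here. A minimal sketch follows. The helper name disable_mlx_compile is hypothetical, and setting the variable before mlx.core is imported is also an assumption.

```python
import os

# Assumption: set before mlx.core is imported so MLX picks it up at startup.
os.environ.setdefault("MLX_DISABLE_COMPILE", "1")


def disable_mlx_compile() -> bool:
    """Hypothetical helper: globally disable MLX compilation.

    Returns False if MLX cannot be imported on this machine.
    """
    try:
        import mlx.core as mx
    except (ImportError, OSError):
        return False
    # Compiled functions then run as ordinary (unfused) graphs.
    mx.disable_compile()
    return True
```

Disabling compilation trades away fusion speedups for stability, so it makes sense as a temporary measure until set_input_array guards against null buffers.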
Expected behavior
Compiled::eval_gpu should either:
- Check for a null MTLBuffer and trigger a synchronous eval/promotion of the input array before proceeding, or
- Fall back to a non-compiled evaluation path.
Or, mx.eval(array) should guarantee that all subsequent GPU operations on that array (including as inputs to Compiled kernels) have a valid MTLBuffer.
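The requested control flow is sketched below as a toy Python model. It is not MLX code. ToyArray, dispatch_compiled, and the callbacks are hypothetical names, and a buffer of None stands in for a null MTLBuffer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyArray:
    name: str
    buffer: Optional[bytes]  # None models a null MTLBuffer


def dispatch_compiled(
    inputs: List[ToyArray],
    materialize: Callable[[ToyArray], None],
    run_fused: Callable[[List[ToyArray]], str],
    run_unfused: Callable[[List[ToyArray]], str],
    fallback: bool = False,
) -> str:
    missing = [a for a in inputs if a.buffer is None]
    if missing and fallback:
        # Option 2: avoid the fused kernel entirely.
        return run_unfused(inputs)
    for a in missing:
        # Option 1: synchronously promote the input before binding it.
        materialize(a)
        if a.buffer is None:
            raise RuntimeError(f"input {a.name!r} still has no buffer")
    # Every input now has a buffer, so binding cannot dereference null.
    return run_fused(inputs)
```

Either branch replaces the unchecked dereference with a defined outcome: a fused run on valid buffers, an unfused run, or an explicit error.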
Notes
- mflux already works around a related issue in its flux2_klein.py via AppleSiliconUtil.is_m1_or_m2(): it avoids mx.compile(predict) on M1/M2. However, the check inadvertently excludes M1 Max/Ultra, whose chip names contain "max" or "ultra". On those chips mx.compile is still used, so they still hit this crash.
- The auto-compilation version of this bug (not explicit mx.compile) was observed with MLX 0.31.1, which was believed to have fixed an earlier explicit-compile version of the same crash.
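The chip-detection pitfall in the first note can be illustrated with a small sketch. Both functions are hypothetical; mflux's actual AppleSiliconUtil.is_m1_or_m2() may be implemented differently. The input is a brand string such as "Apple M1 Max", which on macOS typically comes from `sysctl -n machdep.cpu.brand_string`.

```python
import re


def is_m1_or_m2_buggy(brand: str) -> bool:
    # Hypothetical reconstruction of the failure mode: Max/Ultra variants are
    # rejected, so compilation stays enabled on those chips.
    name = brand.lower()
    if "max" in name or "ultra" in name:
        return False
    return "m1" in name or "m2" in name


def is_m1_or_m2(brand: str) -> bool:
    # Match the chip generation only, ignoring any Pro/Max/Ultra suffix.
    return re.search(r"\bm[12]\b", brand.lower()) is not None
```

Matching on the generation token rather than the exact name keeps every M1/M2 variant on the same code path.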