
Expose CoreML compute units as gpu device IDs in onnx-coreml backend#2401

Merged
borg323 merged 2 commits into LeelaChessZero:master from ChinChangYang:coreml-ml-compute-units
Mar 29, 2026

Conversation

ChinChangYang (Contributor) commented Mar 16, 2026

Summary

  • Exposes CoreML compute unit selection through the standard gpu device ID parameter in the onnx-coreml backend, allowing users to control which Apple hardware accelerators CoreML uses for inference.
  • Removes ProfileComputePlan=1 from CoreML provider options (previously caused ~111 s warm session creation; without it warm startup is ~46 s — a ~58% improvement).

Usage

The gpu parameter selects which CoreML compute units to use:

  • gpu=0 (default) — CPU + GPU (CPUAndGPU)
  • gpu=1 — CPU + Neural Engine (CPUAndNeuralEngine)
  • gpu=2 or higher — all available hardware (ALL: CPU, GPU, Neural Engine)

This is most useful with the multiplexing backend to run separate instances targeting different compute units simultaneously:

./lc0 backendbench -w <model> \
  -b multiplexing \
  -o "gpu(backend=onnx-coreml,gpu=0,threads=2,max_batch=64),npu(backend=onnx-coreml,gpu=1,threads=2,max_batch=64)" \
  --batch-step=16 --start-batch-size=16 --max-batch-size=64 -t 4
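
The mapping behind the gpu parameter can be sketched as a small helper. This is an illustrative sketch, not the actual lc0 code: the function name is hypothetical, and the string values follow the MLComputeUnits names quoted in this PR description (CPUAndGPU, CPUAndNeuralEngine, ALL).

```cpp
#include <string>

// Hypothetical helper mirroring the mapping this PR describes: the standard
// "gpu" integer option selects the MLComputeUnits value handed to the
// ONNX Runtime CoreML execution provider.
std::string ComputeUnitsForGpuId(int gpu_id) {
  switch (gpu_id) {
    case 0:
      return "CPUAndGPU";  // default: CPU + GPU
    case 1:
      return "CPUAndNeuralEngine";  // CPU + Neural Engine
    default:
      return "ALL";  // gpu=2 or higher: CPU, GPU, and Neural Engine
  }
}
```

Keeping the selection behind a single integer is what lets the multiplexing example above address the two accelerators as if they were two devices.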

Motivation

On Apple Silicon, the GPU and Neural Engine are separate hardware units that can run simultaneously. By running two onnx-coreml instances via the multiplexing backend — one targeting gpu=0 (CPUAndGPU) and one targeting gpu=1 (CPUAndNeuralEngine) — both accelerators can be kept busy in parallel, increasing overall inference throughput compared to a single instance.

Using the existing gpu parameter (instead of a custom compute_units string option) keeps the interface consistent with other backends like CUDA, where gpu=N selects a device.

Test plan

  • Build on macOS with ONNX Runtime CoreML provider
  • Run onnx-coreml backend with gpu=0 and gpu=1 and verify inference produces valid results
  • Confirm session creation time is reduced compared to ProfileComputePlan=1
  • Verify error histogram between gpu=0 and gpu=1 shows acceptable numerical differences
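
The last check can be sketched as a histogram over per-output absolute differences between two runs (e.g. gpu=0 vs gpu=1). This is an illustrative sketch, not lc0's actual verification code; the bucket edges are arbitrary choices for the example.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Bucket absolute differences between two backends' outputs into four bins:
// [0, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, inf).
std::array<int, 4> ErrorHistogram(const std::vector<float>& a,
                                  const std::vector<float>& b) {
  std::array<int, 4> bins{};
  for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
    const float err = std::fabs(a[i] - b[i]);
    if (err < 1e-4f) {
      ++bins[0];
    } else if (err < 1e-3f) {
      ++bins[1];
    } else if (err < 1e-2f) {
      ++bins[2];
    } else {
      ++bins[3];
    }
  }
  return bins;
}
```

A mostly-empty top bin would indicate the GPU and Neural Engine paths agree to within acceptable numerical tolerance.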

🤖 Generated with Claude Code

Replace the hardcoded ProfileComputePlan=1 provider option with a
configurable MLComputeUnits option (default: ALL). Accepts the same
values as the CoreML MLComputeUnits enum: ALL, CPU_AND_NE, CPU_ONLY,
CPU_AND_GPU.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gsobala (Contributor) commented Mar 16, 2026

On a MacBook Pro M5 Max this PR gives a substantial improvement in nps using multiplexing compared to simple onnx-coreml.

Results for nets 792013, 11248, 771473 and BT4-1024x15x32h-swa-6147500.pb.gz :

       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/792013
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,    10648,   1.503, 0.1334, 0.0888,   11449,   11138,    8433
  32,    17628,   1.815, 0.0348, 0.0192,   18145,   17641,   16285
  48,    22268,   2.156, 0.0303, 0.0141,   22661,   22360,   21211
  64,    24411,   2.622, 0.0522, 0.0199,   25129,   24600,   23583
(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w ~/nets/792013 \
  -b multiplexing \
  -o "gpu(backend=onnx-coreml,compute_units=CPUAndGPU,threads=2,max_batch=64),npu(backend=onnx-coreml,compute_units=CPUAndNeuralEngine,threads=2,max_batch=64)" \
  --batch-step=16 --start-batch-size=16 --max-batch-size=64 -t 4
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/792013
Creating backend [onnx-coreml]...
Creating backend [onnx-coreml]...
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,    16133,  0.9917, 0.9198, 0.9275,   31425,   21211,    1816
  32,    24198,   1.322, 0.6204, 0.4691,   36078,   35202,   14128
  48,    29909,   1.605, 0.9525, 0.5935,   45265,   44699,   14475
  64,    31363,   2.041, 1.4780, 0.7243,   50340,   49783,   12861

===============

(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w ~/nets/11248 -b onnx-coreml --batch-step=16 --start-batch-size=16 --max-batch-size=64
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/11248
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,     9847,   1.625, 0.1290, 0.0794,   10421,   10238,    7198
  32,    12766,   2.507, 0.0372, 0.0148,   12996,   12838,   12095
  48,    17450,   2.751, 0.0312, 0.0114,   17883,   17465,   16786
  64,    17649,   3.626, 0.0264, 0.0073,   17941,   17633,   17239
(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w ~/nets/11248 \                                                                       
  -b multiplexing \
  -o "gpu(backend=onnx-coreml,compute_units=CPUAndGPU,threads=2,max_batch=64),npu(backend=onnx-coreml,compute_units=CPUAndNeuralEngine,threads=2,max_batch=64)" \
  --batch-step=16 --start-batch-size=16 --max-batch-size=64 -t 4
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/11248
Creating backend [onnx-coreml]...
Creating backend [onnx-coreml]...
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,    15351,   1.042, 0.4065, 0.3900,   34930,   20180,    4857
  32,    18855,   1.697, 0.6864, 0.4044,   26197,   25834,   11683
  48,    23482,   2.044, 1.1654, 0.5701,   35785,   35317,   11934
  64,    23723,   2.698, 1.5842, 0.5872,   36116,   35769,   11721

==================

(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w ~/nets/771473 -b onnx-coreml --batch-step=16 --start-batch-size=16 --max-batch-size=64 
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/771473
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,    11455,   1.397, 0.1726, 0.1235,   12496,   12255,    8396
  32,    19503,   1.641, 0.0287, 0.0175,   19862,   19597,   17893
  48,    24508,   1.959, 0.0357, 0.0182,   24920,   24621,   21452
  64,    26660,   2.401, 0.0547, 0.0228,   27665,   26388,   25113
(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w ~/nets/771473 \                                                                       
  -b multiplexing \
  -o "gpu(backend=onnx-coreml,compute_units=CPUAndGPU,threads=2,max_batch=64),npu(backend=onnx-coreml,compute_units=CPUAndNeuralEngine,threads=2,max_batch=64)" \
  --batch-step=16 --start-batch-size=16 --max-batch-size=64 -t 4
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: /Users/george/nets/771473
Creating backend [onnx-coreml]...
Creating backend [onnx-coreml]...
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,    20916,   0.765, 0.3384, 0.4424,   39735,   20115,    4313
  32,    28857,   1.109, 0.3778, 0.3407,   69628,   35032,   14486
  48,    34207,   1.403, 0.6503, 0.4634,   49860,   49017,   19835
  64,    37409,   1.711, 0.8957, 0.5236,   55543,   54635,   19939

==================

(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w BT4 -b onnx-coreml --batch-step=16 --start-batch-size=16 --max-batch-size=64
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: BT4
Weights file has multihead format, updating format flag
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,     1979,   8.083, 0.0379, 0.0047,    1992,    1982,    1918
  32,     2252,   14.21, 0.0314, 0.0022,    2259,    2254,    2227
  48,     2421,   19.83, 0.0561, 0.0028,    2431,    2422,    2385
  64,     2463,   25.99, 0.0942, 0.0036,    2475,    2464,    2413
(venv) george@MacBook-Pro-M5 lc0 % build/release/lc0 backendbench -w BT4 \                                                                       
  -b multiplexing \
  -o "gpu(backend=onnx-coreml,compute_units=CPUAndGPU,threads=2,max_batch=64),npu(backend=onnx-coreml,compute_units=CPUAndNeuralEngine,threads=2,max_batch=64)" \
  --batch-step=16 --start-batch-size=16 --max-batch-size=64 -t 4
       _
|   _ | |
|_ |_ |_| v0.33.0-dev+git.cb97caf built Mar 16 2026
Loading weights file from: BT4
Weights file has multihead format, updating format flag
Creating backend [onnx-coreml]...
Creating backend [onnx-coreml]...
size, mean nps, mean ms,   sdev,     cv, max nps,  median, min nps,
  16,     2249,   7.115,  6.9841, 0.9816,    7927,    3985,     535
  32,     2624,    12.2, 11.4768, 0.9410,    8893,    4472,     841
  48,     2763,   17.37, 16.9928, 0.9781,    4779,    4752,     846
  64,     2857,    22.4, 22.0272, 0.9832,    4816,    4799,     848

@ChinChangYang ChinChangYang marked this pull request as ready for review March 16, 2026 11:46
batch_size_ = opts.GetOrDefault<int>("batch", default_batch);
steps_ = opts.GetOrDefault<int>("steps", default_steps);
min_batch_size_ = opts.GetOrDefault<int>("min_batch", default_min_batch);
compute_units_ = opts.GetOrDefault<std::string>("compute_units", "ALL");
Contributor

Backend can expose different compute units as separate GPUs. GPU should get id 0 and neural engine id 1. This would use an existing option in a logical way.

Contributor Author

Fixed in 634ad6e.

Replace the custom `compute_units` string option with the standard `gpu`
integer option used by all other GPU backends. gpu=0 (default) selects
CPUAndGPU, gpu=1 selects CPUAndNeuralEngine, and any other value uses ALL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ChinChangYang ChinChangYang changed the title Add configurable MLComputeUnits option to onnx-coreml backend Expose CoreML compute units as gpu device IDs in onnx-coreml backend Mar 22, 2026
@borg323 borg323 merged commit 5d8b306 into LeelaChessZero:master Mar 29, 2026
4 checks passed
4 participants