[CoreML EP] Support bool Cast in ML Program by maxwbuckley · Pull Request #28595 · microsoft/onnxruntime

maxwbuckley · 2026-05-20T17:56:13Z

Summary

Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The
   ML Program `cast` op already handles bool, and AddToModelBuilderImpl already
   maps `to == BOOL`; only the input/output type gate omitted it.
2. Move the "no preceding node" check after the ML Program early-return. That
   check is legacy gating for the NeuralNetwork ArgMax-only path (which
   dereferences InputEdgesBegin()); on the ML Program path a Cast fed directly
   by a graph input is fine, and rejecting it forced needless CPU fallback.
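
The ordering change can be sketched in isolation. This is a simplified model, not the code in the PR: `DType`, `CastNodeSketch`, `IsMlProgramCastType`, and `IsCastSupportedSketch` are hypothetical names, and the set of non-bool types accepted on the ML Program path is an assumption.

```cpp
// Hypothetical stand-ins for illustration; these are not ONNX Runtime types.
enum class DType { kFloat, kFloat16, kInt32, kInt64, kBool, kString };

struct CastNodeSketch {
  DType from;
  DType to;
  bool has_preceding_node;   // false when the Cast is fed directly by a graph input
  bool preceding_is_argmax;  // only meaningful when has_preceding_node is true
};

// Assumed ML Program type gate; the exact set of non-bool types is a guess.
bool IsMlProgramCastType(DType t) {
  switch (t) {
    case DType::kFloat:
    case DType::kFloat16:
    case DType::kInt32:
    case DType::kInt64:
    case DType::kBool:  // the dtype this PR adds to the gate
      return true;
    default:
      return false;
  }
}

bool IsCastSupportedSketch(const CastNodeSketch& node, bool ml_program) {
  if (ml_program) {
    // Early return: no producer-node requirement on this path.
    return IsMlProgramCastType(node.from) && IsMlProgramCastType(node.to);
  }
  // NeuralNetwork path: Cast is only accepted right after ArgMax, so the
  // "has a preceding node" check must run before the producer is inspected.
  if (!node.has_preceding_node) return false;
  return node.preceding_is_argmax;
}
```

With this ordering, an int64 → bool Cast fed by a graph input is accepted when `ml_program` is true and rejected when it is false, which is the behavior the two changes aim for.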

Why

This is the first of a 4-PR series giving the CoreML EP the op coverage to run
transformer and diffusion graphs as a single CoreML partition instead of
fragmenting across CPU.

Transformer attention-mask graphs are a Cast → GatherND → And → Where chain over
bool tensors. A CoreML partition cannot have a bool input/output (CoreML
MLMultiArray has no bool type), so bool must stay internal — which makes Cast
(the int↔bool boundary) the prerequisite for the rest of the series.

Combined impact of the series

With all four PRs plus #28278 (scalar-Gather), every model below goes from 2
CoreML partitions to 1, with zero graph breaks — the whole graph runs on
CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | Partitions (before → after) | CoreML speedup vs CPU |
| --- | --- | --- |
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what
CoreML already delivers once a model is no longer fragmented.
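
As a rough illustration of how one unsupported op produces the "2 → 1" column, the sketch below counts partitions over a linear chain of nodes. It is a deliberate simplification: the real partitioner works on a graph rather than a chain, and `CountPartitions` is a hypothetical helper, not an ONNX Runtime function.

```cpp
#include <vector>

// Counts maximal runs of consecutive supported nodes in a linear chain.
// Each run stands in for one CoreML partition; every unsupported node
// between two runs is a graph break that falls back to CPU.
int CountPartitions(const std::vector<bool>& supported) {
  int partitions = 0;
  bool in_run = false;
  for (bool s : supported) {
    if (s && !in_run) ++partitions;  // a new run of supported nodes starts here
    in_run = s;
  }
  return partitions;
}
```

A chain with one unsupported node in the middle, such as `{true, true, false, true}`, gives 2 partitions; once that node is supported the same chain gives 1.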

Tests (`coreml_basic_test.cc`)

CastNonArgMaxNeuralNetworkNotSupported — an int64 → bool → float cast chain
falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl
reordering.

Positive bool-Cast coverage is in the dependent PRs: Cast → GatherND → Cast
(#28598's GatherNDBoolData_MLProgram) and Cast → And → Cast (#28597's
And_MLProgram). Both place a non-Cast op between the int↔bool casts and check
the result against the CPU EP. A standalone int64 → Cast(bool) → Cast(float)
round-trip can't be verified here — CoreML's compiler fuses back-to-back cast
ops and drops the bool clamp — so the pattern needs that intervening op, which
only the dependent PRs provide.
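
The numeric effect of that fusion can be shown with plain scalar casts. This is a sketch of the arithmetic only, not CoreML code; `CastViaBool` and `CastFused` are hypothetical names.

```cpp
#include <cstdint>

// Unfused chain: int64 -> bool -> float. The bool step clamps every
// nonzero value to 1, so the result is always 0.0f or 1.0f.
float CastViaBool(std::int64_t x) {
  const bool b = (x != 0);
  return static_cast<float>(b);
}

// Fused chain: the intermediate bool cast is dropped, so the raw input
// value is converted straight to float.
float CastFused(std::int64_t x) {
  return static_cast<float>(x);
}
```

For an input of 5 the unfused chain yields 1.0f and the fused one yields 5.0f. Inputs of 0 and 1 give the same answer either way, so a test that only feeds 0/1 values would not notice the fusion.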

Series — CoreML EP coverage for transformer / diffusion graphs

- #28595: [CoreML EP] Support bool Cast in ML Program (this PR; prerequisite)
- #28596: [CoreML EP] Add Sin and Cos unary ops (independent)
- #28597: [CoreML EP] Add Where and And builders (depends on #28595)
- #28598: [CoreML EP] Add GatherND builder (depends on #28595)

Together with #28278 (scalar-Gather), the series takes BERT / GPT-2 / ViT /
diffusion-UNet graphs — tiny and full-size — from 2 CoreML partitions to 1, with
zero graph breaks.

Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The ML Program `cast` op already handles bool, and AddToModelBuilderImpl already maps `to == BOOL`; only the input/output type gate omitted it. This lets int64<->bool<->float casts (transformer attention-mask graphs) stay on CoreML.
2. Move the "no preceding node" check after the ML Program early-return. It was legacy gating for the NeuralNetwork ArgMax-only path (which dereferences InputEdgesBegin()); on the ML Program path a Cast fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

Tests (coreml_basic_test.cc):

- CastBoolRoundTrip_MLProgram: an int64->bool->float cast chain runs fully on CoreML and matches the CPU reference. The bool tensor is internal (a CoreML partition cannot have bool I/O) and the first Cast is graph-input fed.
- CastNonArgMaxNeuralNetworkNotSupported: the same chain falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CastBoolRoundTrip_MLProgram exercised int64 -> Cast(bool) -> Cast(float). CoreML's compiler fuses the two back-to-back `cast` ops and drops the bool clamp (cast(cast(x,bool),fp32) collapses to cast(x,fp32)), so the round-trip produces the raw input value instead of 0/1 -- the test can't be numerically verified standalone.

The bool-Cast support itself is correct: it is exercised end to end by the dependent PRs, where a non-Cast op sits between the int<->bool casts so no fusion occurs -- Cast->And->Cast (Where/And PR) and Cast->GatherND->Cast (GatherND PR), both numerically verified against the CPU EP.

CastNonArgMaxNeuralNetworkNotSupported (the NeuralNetwork-format negative test) is kept; it guards the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 20, 2026:

- [CoreML EP] Add Sin and Cos unary ops #28596 (Draft)
- [CoreML EP] Add Where and And builders #28597 (Draft)
- [CoreML EP] Add GatherND builder #28598 (Draft)

maxwbuckley and others added 2 commits May 21, 2026 09:34

Merge remote-tracking branch 'origin/main' into coreml-cast-bool

56ce3ca

maxwbuckley wants to merge 3 commits into microsoft:main from maxwbuckley:coreml-cast-bool
