
[CoreML] Add ONE_BLOB multimethod weight sharing strategy #18531

Open
metascroy wants to merge 2 commits into main from weight-sharing

Conversation

@metascroy
Contributor

Adds a new ONE_BLOB weight sharing strategy for CoreML multifunction models that combines all partitions from all methods into a single multifunction model, stored as one entry in NamedDataStore.

Motivation

The existing POSITIONAL strategy requires all methods to have the same number of partitions and creates one multifunction model per partition index. This works when each method's partitions are aligned so that partitions at the same index naturally share weights, but we want to relax that restriction.

Design
POSITIONAL (existing): N blobs, one per partition index. Each blob contains that partition from every method. Requires partition count alignment.

combined_partition_0.mlpackage → functions: {forward, prefill}
combined_partition_1.mlpackage → functions: {forward, prefill}
combined_partition_2.mlpackage → functions: {forward, prefill}

ONE_BLOB (new): 1 blob containing all partition × method combinations. No partition count alignment required. Function names use {method}__{partition_idx} encoding for CoreML dispatch; metadata is keyed by method name for runtime compatibility.

combined_all.mlpackage → functions: {forward__0, forward__1, forward__2,
prefill__0, prefill__1, prefill__2}
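
A minimal sketch of the name encoding (the {method}__{partition_idx} format comes from this PR; the helper names are hypothetical):

```python
def encode_function_name(method: str, partition_idx: int) -> str:
    # e.g. ("prefill", 2) -> "prefill__2"
    return f"{method}__{partition_idx}"

def decode_function_name(function_name: str) -> tuple[str, int]:
    # Split from the right so method names containing "__" still parse.
    method, idx = function_name.rsplit("__", 1)
    return method, int(idx)
```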

No runtime changes required. The existing CMJR JSON reference mechanism (functionName field) was designed to support arbitrary function names.
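
For illustration, a processed-data reference under this scheme might pair the single blob's key with the per-partition function to dispatch; only the functionName field is confirmed by this PR, the other key is an illustrative assumption:

```python
# Hypothetical shape of a CMJR JSON reference entry. Only "functionName"
# is named in this PR; "named_data_key" is an assumed field name.
reference = {
    "named_data_key": "combined_all",  # assumed: the single NamedDataStore entry
    "functionName": "prefill__1",      # method "prefill", partition index 1
}
```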

Test plan

Existing CI + new unit test

@pytorch-bot

pytorch-bot bot commented Mar 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18531

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 8 Unrelated Failures

As of commit ea5ee0a with merge base e0e10cc:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Mar 26, 2026.
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@metascroy
Contributor Author

Any issues @lucylq?


def test_multifunction_one_blob_simple_model(self):
"""Test exporting a simple model using ONE_BLOB weight sharing strategy."""
model = self.SimpleModel()
Contributor

For ONE_BLOB, I think it's also worth testing a model with different partition counts in forward/prefill to make sure it still works - I guess that is the major difference between it and the POSITIONAL strategy.
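
A sketch of what such a test could look like; the export helpers are assumptions, only MULTIMETHOD_WEIGHT_SHARING_STRATEGY.ONE_BLOB and the {method}__{partition_idx} naming come from this PR:

```python
def test_multifunction_one_blob_asymmetric_partitions(self):
    """ONE_BLOB should accept methods whose partition counts differ."""
    # Hypothetical helper that lowers "forward" into 3 CoreML partitions
    # and "prefill" into 2 - invalid under POSITIONAL, allowed here.
    program = self._export_asymmetric_model(  # assumed helper
        strategy=MULTIMETHOD_WEIGHT_SHARING_STRATEGY.ONE_BLOB,
    )
    # Every (method, partition) pair becomes one function in the blob;
    # the per-method partition counts do not need to match.
    self.assertEqual(
        self._combined_function_names(program),  # assumed helper
        {"forward__0", "forward__1", "forward__2", "prefill__0", "prefill__1"},
    )
```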

f"'{first_method}' has {num_partitions}. POSITIONAL weight sharing "
"strategy requires all methods to have the same number of partitions. "
"Use MULTIMETHOD_WEIGHT_SHARING_STRATEGY.DISABLED if methods should "
"be processed independently."
Contributor

nit: user can also select MULTIMETHOD_WEIGHT_SHARING_STRATEGY.ONE_BLOB now
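
One possible rewording that addresses the nit (a sketch, not the author's actual follow-up):

```python
f"'{first_method}' has {num_partitions}. POSITIONAL weight sharing "
"strategy requires all methods to have the same number of partitions. "
"Use MULTIMETHOD_WEIGHT_SHARING_STRATEGY.ONE_BLOB to combine methods "
"with different partition counts, or "
"MULTIMETHOD_WEIGHT_SHARING_STRATEGY.DISABLED if methods should be "
"processed independently."
```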

method_spec = method_model.get_spec()
input_names = [inp.name for inp in method_spec.description.input]
output_names = [out.name for out in method_spec.description.output]
methods_metadata[method_name] = MethodMetadata(
Contributor

Is this a problem? If a single method has multiple partitions, this is overwritten.

Contributor Author

good catch!
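
A minimal sketch of one possible fix, accumulating one entry per partition instead of overwriting the method's key (the MethodMetadata field names are assumptions carried over from the excerpt above):

```python
# Inside the loop over a method's partitioned models:
method_spec = method_model.get_spec()
input_names = [inp.name for inp in method_spec.description.input]
output_names = [out.name for out in method_spec.description.output]
# setdefault + append keeps one MethodMetadata per partition rather than
# clobbering methods_metadata[method_name] on every iteration.
methods_metadata.setdefault(method_name, []).append(
    MethodMetadata(input_names=input_names, output_names=output_names)  # assumed fields
)
```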

)

return MULTIMETHOD_WEIGHT_SHARING_STRATEGY.POSITIONAL
return MULTIMETHOD_WEIGHT_SHARING_STRATEGY.DISABLED
Contributor

Import to internal? See if this change breaks anything.

Contributor Author

I'll import to internal. I think RL is explicitly setting this, though.

Comment thread on backends/apple/coreml/compiler/coreml_preprocess.py
@meta-codesync
Contributor

meta-codesync bot commented Apr 7, 2026

@metascroy has imported this pull request. If you are a Meta employee, you can view this in D99755766.


// Resolve metadata for this ExecuTorch method by name.
std::string method_name_str = [methodName UTF8String];
const MethodMetadata* method_metadata = metadataValue.get_method_metadata(method_name_str);
// CMJR references may omit functionName; treat nil and empty the same.
if (functionName == nil || functionName.length == 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: make sure these changes are tested against new cache system stack.


Labels

ciflow/trunk, CLA Signed


2 participants