patches/mps_compat: re-enable decode-time cumesh on Apple Silicon (closes the see-through-mesh bug) #11
Open
NoahBPeterson wants to merge 1 commit into
Now that pedronaugusto/mtlmesh#1 ships an Apple7/8 (M1/M2) atomic-fallback for the simplify and atlas kernels, cumesh runs end-to-end on every Apple Silicon GPU family, including the M1 hardware where it previously crashed with:

    [MtlMesh] Failed to create pipeline for 'propagate_cost_kernel': Unsupported float atomic operation for given target.

This previously forced the patcher to unconditionally skip cumesh's three decode-time ops (fill_holes, remove_faces, simplify) on Apple Silicon. Skipping fill_holes specifically had a measurable visual cost: the decoder mesh ships with ~4128 boundary loops worth of small holes, and the bake-time Metal stack only partially closes them after decimation, so the output GLB rendered with see-through cylinders / mesh holes in any glTF viewer.

Changes:

- patch_mesh_base() no longer inserts an unconditional `return # Skip` at the top of fill_holes / remove_faces / simplify. Instead it:
  * Replaces `.cuda()` with `.to(self.device)` (the original bug: bare `.cuda()` crashes on MPS regardless of cumesh availability).
  * Adds an `if cumesh is None: return` guard so CPU-only builds without the Metal stack still skip cleanly.
  * Leaves the actual cumesh body intact so it runs normally when cumesh is importable.
- new patch_pipeline_runtime_fallback() wraps the decode-time `m.fill_holes()` call in pipelines/trellis2_image_to_3d.py in a try/except RuntimeError. Users on unpatched mtlmesh (Apple7/8 without the atomic-fallback fix) get a one-time `warnings.warn` and degrade gracefully to "ship mesh with small holes" rather than crashing the whole pipeline. This is the safety net for users who pip-install before the upstream mtlmesh PR lands.

Validation on Apple M1 Pro 16 GB (Apple7) with a patched mtlmesh:

- cumesh.fill_holes runs in <1s on a 1,438,207-vertex / 2,988,108-face decoder mesh, closing 4128 boundary loops and adding ~26K faces.
- Final GLB face count: 994,484 vs upstream CUDA's 997,498, a difference of about 0.3%.
- Visual: see-through artifacts gone; renders match upstream upstream_ref.glb in donmccurdy.com/three.js viewer.

Without an mtlmesh patch (legacy install), pipeline still completes; output ships with the original small holes and a one-line warning.

Depends on: pedronaugusto/mtlmesh#1
Summary
Removes the unconditional skip of cumesh's decode-time `fill_holes` / `remove_faces` / `simplify` on Apple Silicon. The skip was a workaround for a real bug (Metal cumesh used `atomic_min` / `atomic_max` on `float`, which is an Apple9-only feature that crashed on M1/M2), but the workaround has been measurably hurting output quality.
With pedronaugusto/mtlmesh#1 in place (Apple7/8 atomic-fallback for the
simplify and atlas kernels), cumesh runs end-to-end on every Apple
Silicon GPU family. There's no longer a reason to skip these ops.
Why this matters
Skipping `fill_holes` specifically had a visible cost. On the reference `T.png` input on an M1 Pro:
- The decoder mesh ships with ~4128 boundary loops' worth of small holes.
- The bake-time Metal stack (`o_voxel.postprocess.to_glb`) does its own hole-fill on the decimated 200K-face mesh, but it can only close holes that survived the decimation step; many of the original decoder holes are gone before that point.
- The output GLB renders with see-through cylinders / mesh holes and z-sorting artifacts in any glTF viewer (donmccurdy, three.js, Blender's importer).
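To make the "boundary loops" figure concrete, here is a minimal sketch of how such loops can be counted on a triangle mesh. It is not cumesh's implementation; `count_boundary_loops` is a hypothetical helper, and it assumes distinct loops never share a vertex, so each connected group of boundary edges counts as one loop.

```python
from collections import defaultdict


def count_boundary_loops(faces):
    """Return (boundary_edge_count, boundary_loop_count) for a triangle mesh.

    faces: iterable of (i, j, k) vertex-index triples.
    Assumption: separate boundary loops do not share vertices.
    """
    # Count how many faces use each undirected edge.
    edge_use = defaultdict(int)
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edge_use[(min(a, b), max(a, b))] += 1

    # A boundary edge is used by exactly one face.
    boundary = [edge for edge, uses in edge_use.items() if uses == 1]

    # Group boundary edges into connected components with union-find.
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in boundary:
        parent[find(a)] = find(b)

    loops = {find(a) for a, _ in boundary}
    return len(boundary), len(loops)
```

A closed tetrahedron has no boundary edges; deleting one of its faces leaves a single three-edge loop, which is the kind of hole a decode-time fill is meant to close.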
After re-enabling decode-time `fill_holes`, the see-through artifacts are gone and renders match `upstream_ref.glb` in donmccurdy's viewer.
Changes
`patches/mps_compat.py`: `patch_mesh_base()`
- No longer adds the unconditional `return # Skip — Metal cumesh segfaults...` to the three methods. Instead:
  * Replaces `.cuda()` with `.to(self.device)`, which fixes the original bare-`.cuda()` bug for MPS that the skip masked.
  * Adds an `if cumesh is None: return` guard so CPU-only builds without the Metal stack continue to skip cleanly.
  * Leaves the actual cumesh body intact so it runs normally when cumesh is importable.
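The two text rewrites described above can be sketched as a small source-to-source patch. This is an illustration only, not the code in `patches/mps_compat.py`: `patch_mesh_source` is a hypothetical name, and the sketch assumes each target method has a single-line `def ...(self...):` signature with no docstring.

```python
import re


def patch_mesh_source(src, methods=("fill_holes", "remove_faces", "simplify")):
    """Return `src` with device-agnostic tensor moves and a cumesh guard."""
    # 1. Replace the hard-coded CUDA move with a device-agnostic one.
    src = src.replace(".cuda()", ".to(self.device)")

    # 2. Insert the guard as the first statement of each target method.
    def add_guard(match):
        indent = match.group("indent") + "    "
        return f"{match.group(0)}{indent}if cumesh is None:\n{indent}    return\n"

    names = "|".join(map(re.escape, methods))
    pattern = re.compile(
        rf"^(?P<indent>[ \t]*)def (?:{names})\(self[^\n]*\):[ \t]*\n"
        r"(?![ \t]*if cumesh is None:)",  # skip methods that are already guarded
        re.MULTILINE,
    )
    return pattern.sub(add_guard, src)


# Hypothetical input; the real method bodies are not shown in this PR text.
SAMPLE = (
    "class Mesh:\n"
    "    def fill_holes(self):\n"
    "        vertices = self.vertices.cuda()\n"
    "        return vertices\n"
)
```

The negative lookahead keeps the rewrite idempotent, so running the patcher twice against the same file does not stack guards.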
`patches/mps_compat.py`: `patch_pipeline_runtime_fallback()` (new)
- Wraps the decode-time `m.fill_holes()` call site in `pipelines/trellis2_image_to_3d.py` in a `try/except RuntimeError`. Users on unpatched mtlmesh (Apple7/8 without the atomic-fallback fix) get a one-time `warnings.warn` and degrade gracefully: "ship mesh with small holes" rather than crash the whole pipeline.
- This is the safety net for users who pip-install before the upstream mtlmesh PR lands.
Calls the new patcher in `main()`.
Validation
Apple M1 Pro 16 GB (Apple7), patched mtlmesh installed:
- `cumesh.fill_holes` runs in <1s on the 1,438,207-vertex / 2,988,108-face decoder mesh, closing 4128 boundary loops, adding ~26K faces.
- `1.000 × 0.923 × 0.381` vs upstream CUDA's `1.000 × 0.918 × 0.386`. Visual parity confirmed in donmccurdy's viewer.
Apple M1 Pro 16 GB, unpatched mtlmesh (legacy install simulating a user
who hasn't updated):
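- The pipeline still completes with a one-line warning: `UserWarning: cumesh.fill_holes failed (...); skipping. On M1/M2 update mtlmesh to a build with pedronaugusto/mtlmesh#1 (Apple7/8 atomic fallback).`
- The output ships with the original small holes (the old behavior).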
Dependency
Depends on pedronaugusto/mtlmesh#1. The runtime fallback preserves the old behavior on unpatched mtlmesh.
Related
`alphaMode='OPAQUE'` default; separate bug fix needed for end-to-end correctness on Apple Silicon)
surfaced both fixes)