patches/mps_compat: re-enable decode-time cumesh on Apple Silicon (closes the see-through-mesh bug) by NoahBPeterson · Pull Request #11 · shivampkumar/trellis-mac

NoahBPeterson · 2026-05-01T18:44:22Z

Summary

Removes the unconditional skip of cumesh's decode-time fill_holes /
remove_faces / simplify on Apple Silicon. The skip was a workaround
for a real bug — Metal cumesh used atomic_min / atomic_max on float,
which is an Apple9-only feature that crashed on M1/M2 — but the
workaround has been measurably hurting output quality.

With pedronaugusto/mtlmesh#1 in place (Apple7/8 atomic-fallback for the
simplify and atlas kernels), cumesh runs end-to-end on every Apple
Silicon GPU family. There's no longer a reason to skip these ops.
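
For context on the root cause: one commonly used way to get a float min/max out of integer-only atomics is to map each float to an integer key whose ordering matches the float ordering, take the integer min/max, and map the result back. The Python sketch below only illustrates that key mapping on the CPU. It is an assumption for illustration and is not taken from pedronaugusto/mtlmesh#1, whose actual fallback may work differently; the function names are hypothetical and NaN is not handled.

```python
import struct


def float_to_ordered_u32(x: float) -> int:
    """Map a float32 value to a uint32 whose unsigned order matches float order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        # Negative floats: flip every bit so larger magnitudes sort lower.
        return ~bits & 0xFFFFFFFF
    # Non-negative floats: set the top bit so they sort above all negatives.
    return bits | 0x80000000


def ordered_u32_to_float(key: int) -> float:
    """Inverse of float_to_ordered_u32."""
    bits = key ^ 0x80000000 if key & 0x80000000 else ~key & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# An integer min over the keys picks the same element as a float min.
costs = [2.0, -3.5, 0.25, -1.0, 100.0]
min_key = min(float_to_ordered_u32(c) for c in costs)
assert ordered_u32_to_float(min_key) == min(costs)
```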

Why this matters

Skipping fill_holes specifically had a visible cost. On the reference
T.png input on an M1 Pro:

- Decoder mesh ships with 167,529 boundary edges / 4,128 hole loops.
- The bake-time Metal stack (o_voxel.postprocess.to_glb) does its own
  hole-fill on the decimated 200K-face mesh, but it can only close holes
  that survived the decimation step; many of the original decoder holes
  are gone before that point.
- Final GLB renders with see-through cylinders, gear-edge fragmentation,
  and z-sorting artifacts in any glTF viewer (donmccurdy, three.js,
  Blender's importer).
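
For readers who want to reproduce counts like these on their own mesh, here is a minimal pure-Python sketch. It assumes the usual definition (a boundary edge belongs to exactly one triangle) and counts a hole loop as one connected component of boundary edges, which holds only when loops do not touch each other. The function names are hypothetical and this is not the code used to produce the numbers above.

```python
from collections import Counter


def boundary_edges(faces):
    """Return the undirected edges that are used by exactly one triangle."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return [edge for edge, n in counts.items() if n == 1]


def count_boundary_loops(edges):
    """Count connected components of the boundary-edge graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in list(parent)})


# A tetrahedron with one face removed has a single triangular hole.
open_tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2)]
edges = boundary_edges(open_tetra)
print(len(edges), count_boundary_loops(edges))  # prints "3 1"
```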

After re-enabling decode-time fill_holes:

- Closes the 4,128 hole loops in <1s on the full 1.44M-vertex mesh.
- Final GLB face count: 994,484 vs upstream CUDA's 997,498 (about 0.3% fewer).
- Visual parity with upstream_ref.glb in donmccurdy's viewer.
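
The 0.3% figure follows directly from the two face counts quoted above; a quick check, with variable names chosen here for illustration:

```python
metal_faces = 994_484  # final GLB after this PR, from the list above
cuda_faces = 997_498   # upstream CUDA reference, from the list above

# Relative shortfall of the Metal result, measured against the CUDA count.
rel_diff = (cuda_faces - metal_faces) / cuda_faces
print(f"{cuda_faces - metal_faces} faces, {rel_diff:.3%}")  # prints "3014 faces, 0.302%"
```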

Changes

patches/mps_compat.py: patch_mesh_base()
- No longer prepends `return # Skip — Metal cumesh segfaults...` to
  the three methods. Instead:
  - Replaces `.cuda()` with `.to(self.device)`. This fixes the original
    bare-`.cuda()` bug for MPS, which the skip masked.
  - Adds an `if cumesh is None: return` guard so CPU-only builds
    without the Metal stack continue to skip cleanly.
  - Leaves the actual cumesh body in place so it runs normally when
    cumesh is importable.
patches/mps_compat.py: patch_pipeline_runtime_fallback() (new)
- Wraps the decode-time `m.fill_holes()` call site in
  pipelines/trellis2_image_to_3d.py in a `try/except RuntimeError`.
  Users on unpatched mtlmesh (Apple7/8 without the atomic-fallback
  fix) get a one-time `warnings.warn` and degrade gracefully:
  "ship mesh with small holes" rather than crash the whole pipeline.
- This is the safety net for users who pip-install before the
  upstream mtlmesh PR lands.
main() now calls patch_pipeline_runtime_fallback().
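
As a rough illustration of the patch_mesh_base() change, the sketch below applies the two textual edits to a made-up method: swapping `.cuda()` for `.to(self.device)` and inserting the `cumesh is None` guard right after the `def` line. The helper name `patch_method_source` and the sample method body are hypothetical stand-ins, not the real patcher or the real TRELLIS source, and the sketch assumes a single-line `def` signature.

```python
def patch_method_source(src: str) -> str:
    """Hypothetical helper: make one method's source text MPS-friendly."""
    # Edit 1: a bare .cuda() fails on MPS, so route through the mesh's device.
    src = src.replace(".cuda()", ".to(self.device)")

    out, guarded = [], False
    for line in src.splitlines():
        out.append(line)
        # Edit 2: right after the def line, skip cleanly when cumesh is absent.
        if not guarded and line.lstrip().startswith("def "):
            indent = " " * (len(line) - len(line.lstrip()) + 4)
            out.append(f"{indent}if cumesh is None:")
            out.append(f"{indent}    return")
            guarded = True
    return "\n".join(out) + "\n"


# Made-up method used only to exercise the helper.
SAMPLE = (
    "def fill_holes(self):\n"
    "    verts = self.vertices.cuda()\n"
    "    return verts  # stand-in for the real cumesh body\n"
)

print(patch_method_source(SAMPLE))
```

The rest of the method body is left untouched, so it runs normally whenever `cumesh` is importable.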

Validation

Apple M1 Pro 16 GB (Apple7), patched mtlmesh installed:

- cumesh.fill_holes runs in <1s on the 1,438,207-vertex /
  2,988,108-face decoder mesh, closing 4,128 boundary loops, adding
  ~26K faces.
- Final GLB: 994,484 faces, extents 1.000 × 0.923 × 0.381 vs
  upstream CUDA's 1.000 × 0.918 × 0.386. Visual parity confirmed
  in donmccurdy's viewer.

Apple M1 Pro 16 GB, unpatched mtlmesh (legacy install simulating a user
who hasn't updated):

- Pipeline completes successfully.
- One-time warning emitted:
  UserWarning: cumesh.fill_holes failed (...); skipping. On M1/M2 update mtlmesh to a build with pedronaugusto/mtlmesh#1 (Apple7/8 atomic fallback).
- Output GLB ships with the original small holes (matches pre-PR
  behavior).

Dependency

- Hard requirement for the quality improvement: pedronaugusto/mtlmesh#1
  ("Float-atomic + per-face propagation fallback for Apple7/8 GPUs (M1/M2)").
- Soft requirement for everything else: this PR works without it; the
  try/except just keeps the old behavior on unpatched mtlmesh.

Related

- pedronaugusto/mtlmesh#1, "Float-atomic + per-face propagation fallback
  for Apple7/8 GPUs (M1/M2)" (Apple7/8 atomic-fallback)
- pedronaugusto/trellis2-apple#1, "to_glb: percentile-based alphaMode
  auto-detection (default OPAQUE-friendly)" (alphaMode='OPAQUE' default;
  separate bug fix needed for end-to-end correctness on Apple Silicon)
- #5, "bug (or my mistake)" (FotisK's original investigation that
  surfaced both fixes)

Now that pedronaugusto/mtlmesh#1 ships an Apple7/8 (M1/M2) atomic-fallback for the simplify and atlas kernels, cumesh runs end-to-end on every Apple Silicon GPU family, including the M1 hardware where it previously crashed with:

    [MtlMesh] Failed to create pipeline for 'propagate_cost_kernel': Unsupported float atomic operation for given target.

This previously forced the patcher to unconditionally skip cumesh's three decode-time ops (fill_holes, remove_faces, simplify) on Apple Silicon.

Skipping fill_holes specifically had a measurable visual cost: the decoder mesh ships with ~4,128 boundary loops worth of small holes, and the bake-time Metal stack only partially closes them after decimation, so the output GLB rendered with see-through cylinders / mesh holes in any glTF viewer.

Changes:
- patch_mesh_base() no longer inserts an unconditional `return # Skip` at the top of fill_holes / remove_faces / simplify. Instead it:
  * Replaces `.cuda()` with `.to(self.device)` (the original bug: bare `.cuda()` crashes on MPS regardless of cumesh availability).
  * Adds an `if cumesh is None: return` guard so CPU-only builds without the Metal stack still skip cleanly.
  * Leaves the actual cumesh body intact so it runs normally when cumesh is importable.
- new patch_pipeline_runtime_fallback() wraps the decode-time `m.fill_holes()` call in pipelines/trellis2_image_to_3d.py in a try/except RuntimeError. Users on unpatched mtlmesh (Apple7/8 without the atomic-fallback fix) get a one-time `warnings.warn` and degrade gracefully to "ship mesh with small holes" rather than crashing the whole pipeline. This is the safety net for users who pip-install before the upstream mtlmesh PR lands.

Validation on Apple M1 Pro 16 GB (Apple7) with a patched mtlmesh:
- cumesh.fill_holes runs in <1s on a 1,438,207-vertex / 2,988,108-face decoder mesh, closing 4,128 boundary loops and adding ~26K faces.
- Final GLB face count: 994,484 vs upstream CUDA's 997,498 (about 0.3% fewer).
- Visual: see-through artifacts gone; renders match upstream_ref.glb in donmccurdy.com/three.js viewer.

Without an mtlmesh patch (legacy install), pipeline still completes; output ships with the original small holes and a one-line warning.

Depends on: pedronaugusto/mtlmesh#1

NoahBPeterson mentioned this pull request May 1, 2026

generate.py: --save-intermediate / --load-intermediate for fast bake iteration #13

Open

NoahBPeterson wants to merge 1 commit into shivampkumar:main from NoahBPeterson:reenable-cumesh-on-apple-silicon
