patches/mps_compat: re-enable decode-time cumesh on Apple Silicon (closes the see-through-mesh bug) #11

Open
NoahBPeterson wants to merge 1 commit into shivampkumar:main from NoahBPeterson:reenable-cumesh-on-apple-silicon

@NoahBPeterson

Summary

Removes the unconditional skip of cumesh's decode-time fill_holes /
remove_faces / simplify on Apple Silicon. The skip was a workaround
for a real bug: Metal cumesh used atomic_min / atomic_max on float,
which is an Apple9-only feature, so it crashed on M1/M2 (Apple7/8).
The workaround, however, has been measurably hurting output quality.

With pedronaugusto/mtlmesh#1 in place (Apple7/8 atomic-fallback for the
simplify and atlas kernels), cumesh runs end-to-end on every Apple
Silicon GPU family. There's no longer a reason to skip these ops.

Why this matters

Skipping fill_holes specifically had a visible cost. On the reference
T.png input on an M1 Pro:

  • Decoder mesh ships with 167,529 boundary edges / 4,128 hole loops.
  • The bake-time Metal stack (o_voxel.postprocess.to_glb) does its own
    hole-fill on the decimated 200K-face mesh, but it can only close holes
    that survived the decimation step — many of the original decoder holes
    are gone before that point.
  • Final GLB renders with see-through cylinders, gear-edge fragmentation,
    and z-sorting artifacts in any glTF viewer (donmccurdy, three.js,
    Blender's importer).

After re-enabling decode-time fill_holes:

  • Closes the 4,128 hole loops in <1s on the full 1.44M-vertex mesh.
  • Final GLB face count: 994,484 vs upstream CUDA's 997,498 (a difference of about 0.3%).
  • Visual parity with upstream_ref.glb in donmccurdy's viewer.
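The boundary-edge and hole-loop counts quoted above can be checked on any triangle mesh with a small script. The following is a minimal sketch, not the tool used for this PR's measurements; `boundary_stats` is a hypothetical helper, and it assumes a manifold boundary where each connected group of boundary edges forms one loop.

```python
from collections import defaultdict


def boundary_stats(faces):
    """Return (boundary_edge_count, boundary_loop_count) for a triangle list."""
    # Count how many faces use each undirected edge.
    edge_use = defaultdict(int)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_use[(min(u, v), max(u, v))] += 1

    # A boundary edge belongs to exactly one face.
    boundary = [edge for edge, count in edge_use.items() if count == 1]

    # Union-find over boundary vertices: each connected component of
    # boundary edges is counted as one loop (manifold-boundary assumption).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in boundary:
        parent[find(u)] = find(v)

    loops = len({find(v) for v in list(parent)})
    return len(boundary), loops


# A tetrahedron with one face removed has a single triangular hole.
open_tetra = [(0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(boundary_stats(open_tetra))  # → (3, 1)
```

A watertight mesh returns `(0, 0)`, which is the state fill_holes is meant to move the decoder mesh toward.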

Changes

  1. patches/mps_compat.py: patch_mesh_base()

    • No longer prepends `return # Skip — Metal cumesh segfaults...` to
      the three methods. Instead:
      • Replaces `.cuda()` with `.to(self.device)`, which fixes the original
        bare-`.cuda()` bug for MPS that the skip masked.
      • Adds an `if cumesh is None: return` guard so CPU-only builds
        without the Metal stack continue to skip cleanly.
      • Leaves the actual cumesh body in place so it runs normally when
        cumesh is importable.
  2. patches/mps_compat.py: patch_pipeline_runtime_fallback() (new)

    • Wraps the decode-time `m.fill_holes()` call site in
      pipelines/trellis2_image_to_3d.py in a `try/except RuntimeError`.
      Users on unpatched mtlmesh (Apple7/8 without the atomic-fallback
      fix) get a one-time `warnings.warn` and degrade gracefully:
      the mesh ships with small holes rather than crashing the whole pipeline.
    • This is the safety net for users who pip-install before the
      upstream mtlmesh PR lands.
  3. Calls the new patcher in main().
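The first change can be pictured as a text-level rewrite of the mesh source. The sketch below assumes the patcher edits source text, which the description suggests but does not show; `patch_mesh_source` and `GUARDED_METHODS` are hypothetical names, not the actual contents of patch_mesh_base().

```python
import re

# The three decode-time methods named in this PR.
GUARDED_METHODS = ("fill_holes", "remove_faces", "simplify")


def patch_mesh_source(source: str) -> str:
    """Hypothetical text-level patch: device-agnostic transfers plus a cumesh guard."""
    # Swap bare .cuda() calls for a device-agnostic transfer. This assumes
    # every occurrence sits inside a method where `self.device` exists.
    source = source.replace(".cuda()", ".to(self.device)")

    def add_guard(match):
        body_indent = match.group("indent") + "    "
        # Keep the def line, then insert the guard as the first statement.
        return (
            f"{match.group(0)}\n"
            f"{body_indent}if cumesh is None:\n"
            f"{body_indent}    return"
        )

    # Only single-line signatures are handled in this sketch.
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)def (?:%s)\(self[^)]*\)[^:\n]*:[ \t]*$"
        % "|".join(GUARDED_METHODS),
        re.MULTILINE,
    )
    return pattern.sub(add_guard, source)
```

Unlike the old unconditional `return`, the guard only short-circuits when cumesh failed to import, so the original method body still runs whenever the Metal stack is available.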

Validation

Apple M1 Pro 16 GB (Apple7), patched mtlmesh installed:

  • cumesh.fill_holes runs in <1s on the 1,438,207-vertex /
    2,988,108-face decoder mesh, closing 4,128 boundary loops, adding
    ~26K faces.
  • Final GLB: 994,484 faces, extents 1.000 × 0.923 × 0.381 vs
    upstream CUDA's 1.000 × 0.918 × 0.386. Visual parity confirmed
    in donmccurdy's viewer.
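The parity figures above reduce to simple relative differences. As a quick arithmetic check using only the numbers quoted in this section (`relative_difference` is a hypothetical helper):

```python
def relative_difference(value: float, reference: float) -> float:
    """Absolute difference expressed as a fraction of the reference value."""
    return abs(value - reference) / abs(reference)


# Face counts: this PR on M1 Pro vs upstream CUDA.
face_gap = relative_difference(994_484, 997_498)

# Bounding-box extents per axis, same comparison.
extent_gaps = [
    relative_difference(metal, cuda)
    for metal, cuda in zip((1.000, 0.923, 0.381), (1.000, 0.918, 0.386))
]

print(f"faces: {face_gap:.2%}")  # prints "faces: 0.30%"
print("extents:", ", ".join(f"{gap:.2%}" for gap in extent_gaps))
```

The face-count gap is about 0.3%, and the largest per-axis extent gap stays under 1.5%.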

Apple M1 Pro 16 GB, unpatched mtlmesh (legacy install simulating a user
who hasn't updated):

  • Pipeline completes successfully.
  • One-time warning emitted:
    UserWarning: cumesh.fill_holes failed (...); skipping. On M1/M2 update mtlmesh to a build with pedronaugusto/mtlmesh#1 (Apple7/8 atomic fallback).
  • Output GLB ships with the original small holes (matches pre-PR
    behavior).
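The fallback behavior exercised above follows a common warn-once pattern. This is a minimal sketch of that pattern, not the code in patch_pipeline_runtime_fallback(); `fill_holes_or_skip` and `_FILL_HOLES_WARNED` are hypothetical names, and the message text is copied from the warning shown above.

```python
import warnings

_FILL_HOLES_WARNED = False  # hypothetical module-level flag: warn at most once


def fill_holes_or_skip(mesh) -> bool:
    """Try mesh.fill_holes(); on RuntimeError warn once and carry on."""
    global _FILL_HOLES_WARNED
    try:
        mesh.fill_holes()
        return True
    except RuntimeError as exc:
        if not _FILL_HOLES_WARNED:
            _FILL_HOLES_WARNED = True
            warnings.warn(
                f"cumesh.fill_holes failed ({exc}); skipping. On M1/M2 update "
                "mtlmesh to a build with pedronaugusto/mtlmesh#1 "
                "(Apple7/8 atomic fallback).",
                UserWarning,
                stacklevel=2,
            )
        # The mesh keeps its original holes, matching the pre-PR behavior.
        return False
```

Python's default warning filter already suppresses repeats from the same call site, but an explicit flag keeps the once-only behavior independent of how the user has configured warning filters.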

Dependency

Depends on pedronaugusto/mtlmesh#1.

Related

Now that pedronaugusto/mtlmesh#1 ships an Apple7/8 (M1/M2) atomic-fallback
for the simplify and atlas kernels, cumesh runs end-to-end on every
Apple Silicon GPU family — including the M1 hardware where it previously
crashed with:

    [MtlMesh] Failed to create pipeline for 'propagate_cost_kernel':
    Unsupported float atomic operation for given target.

This previously forced the patcher to unconditionally skip cumesh's three
decode-time ops (fill_holes, remove_faces, simplify) on Apple Silicon.
Skipping fill_holes specifically had a measurable visual cost: the
decoder mesh ships with 4,128 boundary loops of small holes, and
the bake-time Metal stack only partially closes them after decimation,
so the output GLB rendered with see-through cylinders / mesh holes in
any glTF viewer.

Changes:

  - patch_mesh_base() no longer inserts an unconditional `return  # Skip`
    at the top of fill_holes / remove_faces / simplify. Instead it:
      * Replaces `.cuda()` with `.to(self.device)` (the original bug —
        bare `.cuda()` crashes on MPS regardless of cumesh availability).
      * Adds an `if cumesh is None: return` guard so CPU-only builds
        without the Metal stack still skip cleanly.
      * Leaves the actual cumesh body intact so it runs normally when
        cumesh is importable.

  - new patch_pipeline_runtime_fallback() wraps the decode-time
    `m.fill_holes()` call in pipelines/trellis2_image_to_3d.py in a
    try/except RuntimeError. Users on unpatched mtlmesh (Apple7/8 without
    the atomic-fallback fix) get a one-time `warnings.warn` and degrade
    gracefully to "ship mesh with small holes" rather than crashing the
    whole pipeline. This is the safety net for users who pip-install
    before the upstream mtlmesh PR lands.

Validation on Apple M1 Pro 16 GB (Apple7) with a patched mtlmesh:

  - cumesh.fill_holes runs in <1s on a 1,438,207-vertex / 2,988,108-face
    decoder mesh, closing 4,128 boundary loops and adding ~26K faces.
  - Final GLB face count: 994,484 vs upstream CUDA's 997,498, a difference of about 0.3%.
  - Visual: see-through artifacts gone; renders match
    upstream_ref.glb in the donmccurdy.com/three.js viewer.

Without an mtlmesh patch (legacy install), pipeline still completes;
output ships with the original small holes and a one-line warning.

Depends on: pedronaugusto/mtlmesh#1