generate.py: --save-intermediate / --load-intermediate for fast bake iteration #13

Open
NoahBPeterson wants to merge 1 commit into shivampkumar:main from NoahBPeterson:save-load-intermediate

@NoahBPeterson

Summary

Adds two flags that decouple the expensive sampling+decode phase from
the texture bake, so bake-only iterations cost ~1 minute instead of
~20+ minutes.

--save-intermediate PATH
    After pipeline.run() succeeds, dump the decoded mesh and voxel
    attributes to a torch .pt file. Same run still produces the GLB;
    this is a side effect that captures the state the bake needs.

--load-intermediate PATH
    Skip pipeline construction (the 90-200s cold load) and the full
    sampling+decode loop (~15-20+ min on M1 Pro). Load a previously
    saved blob and proceed straight to texture bake. The image argument
    and any --pipeline-type / --seed / --steps / --dit-dtype values are
    ignored: the cached mesh already represents whatever those values
    produced when the cache was made.

Why

The bake has three knobs worth iterating on (--texture-size,
alpha-mode heuristic, decimation target) but is gated on a 15-20 min
sampling phase. With a saved intermediate, bake-only iterations drop
from ~20 min to ~1 min on M1 Pro.

Use cases:

  • Comparing texture resolutions (--texture-size 1024 vs 2048)
    on the same generated mesh.
  • A/B testing the alpha-mode auto-detection heuristic (relevant after
    pedronaugusto/trellis2-apple#1, "to_glb: percentile-based alphaMode
    auto-detection (default OPAQUE-friendly)").
  • Trying different o_voxel.postprocess.to_glb(decimation_target=...)
    values without resampling.
  • Fast iteration on any future post-bake processing (e.g. our work
    finding appropriate alphaMode defaults: with these flags it is a
    ~1 hr loop, versus 12+ hrs without them).
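
The 1 hr vs 12+ hr figure is consistent with simple arithmetic on the per-run timings quoted above. A back-of-the-envelope sketch, assuming ~20 min per full sample+decode+bake run, ~1 min per bake-only run, and a hypothetical 36 experiments (the number of full runs that fit in 12 hours); `loop_minutes` is an illustrative helper, not part of generate.py:

```python
# Rough iteration-cost model for a bake-tuning loop on a single generated mesh.
# Assumed figures, rounded from the PR text.
FULL_RUN_MIN = 20.0   # sampling + decode + bake
BAKE_ONLY_MIN = 1.0   # --load-intermediate + bake


def loop_minutes(n_iters: int, cached: bool) -> float:
    """Total wall-clock minutes for n_iters bake experiments."""
    if not cached:
        # Every experiment pays for sampling + decode + bake.
        return n_iters * FULL_RUN_MIN
    # One full run (with --save-intermediate) yields the cache and the first
    # result; every later experiment reuses the cache.
    return FULL_RUN_MIN + (n_iters - 1) * BAKE_ONLY_MIN


n = 36  # hypothetical experiment count: 12 h / 20 min per full run
print(loop_minutes(n, cached=False) / 60)  # 12.0 hours without the flags
print(loop_minutes(n, cached=True))        # 55.0 minutes with the flags
```

The single up-front full run dominates the cached total, which is why the saving grows with the number of bake experiments.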

Smoke test

On a saved 1.44M-vertex intermediate (M1 Pro 16 GB):

    Loading intermediate from .../T_mac.intermediate.filled.pt...
    Loaded intermediate in 0.1s
    Mesh: 1,442,251 vertices, 3,014,498 triangles
    Bake time: 21s
    Saved: .../T_mac_pr5_smoke.glb

vs ~22 minutes for the equivalent end-to-end run.

Implementation

  • --load-intermediate makes args.image effectively optional. We
    keep the argparse signature backward-compatible (image stays a
    required positional argument) but gate the file-existence check on
    args.load_intermediate is None.
  • The saved blob is intentionally minimal — just the post-decoder
    mesh + voxel attributes that to_glb consumes. No pipeline weights,
    no RNG state, no input image. Typical size: ~250 MB for a
    1.44M-vertex / 2.99M-face mesh + 6-channel voxel attrs at fp32.
  • Layout (the attr_layout slice dict) is preserved as-is. On load
    the blob is wrapped in a SimpleNamespace so downstream code
    accessing mesh_out.vertices, mesh_out.attrs, etc., works
    unchanged.
  • Existing watchdog-error handling and empty-mesh checks are preserved
    on the generation path; the load path bypasses them since by
    definition the cached mesh is non-empty.
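
A minimal sketch of the argument handling described above, assuming a stripped-down parser that contains only the image positional and the two new flags (the real generate.py has more options, and the `build_parser` / `validate` helper names are hypothetical). It shows that old command lines still parse with both flags set to None, and that the existence check is skipped when a cache is loaded:

```python
import argparse
import os
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, stripped-down parser: only the flag names come from the PR.
    p = argparse.ArgumentParser(prog="generate.py")
    # Still positional and required, so existing command lines keep working.
    p.add_argument("image", help="input image (ignored with --load-intermediate)")
    p.add_argument("--save-intermediate", default=None, metavar="PATH")
    p.add_argument("--load-intermediate", default=None, metavar="PATH")
    return p


def validate(args: argparse.Namespace) -> None:
    # Only require the image to exist when we will actually sample from it.
    if args.load_intermediate is None and not os.path.exists(args.image):
        sys.exit(f"error: image not found: {args.image}")


# Pre-existing invocation: both new flags fall back to None.
old = build_parser().parse_args(["photo.png"])
print(old.save_intermediate, old.load_intermediate)  # None None

# Bake-only invocation: the image positional is a placeholder and is not checked.
new = build_parser().parse_args(["--load-intermediate", "cache.pt", "unused.png"])
validate(new)
print(new.load_intermediate)  # cache.pt
```

Under this layout a bake-only run still has to pass some value for the image positional (e.g. a hypothetical `python generate.py --load-intermediate cache.pt unused.png`), since the parser itself is unchanged.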

API impact

No breakage: both flags default to None, so existing scripts run
unchanged.

Related

Pairs nicely with #11 (re-enable cumesh on M1)
and pedronaugusto/trellis2-apple#1 (alphaMode auto-detection) for
quick verification of bake-side improvements without paying sampling
cost on every iteration.

…ake iteration

Adds two flags that decouple the expensive sampling+decode phase from the
texture bake:

  --save-intermediate PATH
      After pipeline.run() succeeds, dump the decoded mesh and voxel
      attributes (vertices/faces/attrs/coords/origin/voxel_size/layout)
      to a torch .pt file. Same run still produces the GLB; this is a
      side effect that captures the state the bake needs.

  --load-intermediate PATH
      Skip pipeline construction (the 90-200s cold load) and the full
      sampling+decode loop (~15-20+ min on M1 Pro). Load a previously
      saved blob and proceed straight to texture bake. The image
      argument and any --pipeline-type / --seed / --steps / --dit-dtype
      values are ignored on this path — the cached mesh already
      represents whatever those values produced when the cache was made.
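
A sketch of the save side, assuming the decoded mesh exposes attributes under the same seven names the commit message lists (the real decoder output may name them differently; the PR describes the layout as an attr_layout slice dict). The PR writes a torch .pt file with torch.save; this sketch uses the stdlib pickle module instead so it runs without PyTorch. The toy mesh and its layout channel names are hypothetical:

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Blob keys as listed in the commit message.
INTERMEDIATE_KEYS = ("vertices", "faces", "attrs", "coords", "origin", "voxel_size", "layout")


def save_intermediate(path: str, mesh_out) -> None:
    """Dump only the post-decoder state the bake consumes (no weights, RNG, or image)."""
    # Assumes mesh_out has attributes with exactly these names.
    blob = {k: getattr(mesh_out, k) for k in INTERMEDIATE_KEYS}
    # generate.py uses torch.save(...) to a .pt file; pickle keeps this sketch dependency-free.
    with open(path, "wb") as f:
        pickle.dump(blob, f)


# Toy stand-in for the decoded mesh: one triangle plus one voxel with 6 attribute channels.
toy = SimpleNamespace(
    vertices=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    faces=[[0, 1, 2]],
    attrs=[[0.5] * 6],
    coords=[[0, 0, 0]],
    origin=[-0.5, -0.5, -0.5],
    voxel_size=1.0 / 512,
    layout={"base_color": slice(0, 3), "metallic": slice(3, 4)},  # hypothetical channel names
)

path = os.path.join(tempfile.mkdtemp(), "toy.intermediate.pkl")
save_intermediate(path, toy)
```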

Why
---

The bake has three knobs worth iterating on (--texture-size, alpha-mode
heuristic, decimation target) but is gated on a 15-20 min sampling
phase. With a saved intermediate, bake-only iterations drop from ~20
min to ~1 min on M1 Pro.

Use case: comparing 1024² vs 2048² textures, A/B-ing alpha-mode
thresholds (relevant after pedronaugusto/trellis2-apple#1), trying
different mesh decimation targets.

Smoke test
----------

On a saved 1.44M-vertex intermediate:

    Loading intermediate from artifacts/trellis_mac/T_mac.intermediate.filled.pt...
    Loaded intermediate in 0.1s
    Mesh: 1,442,251 vertices, 3,014,498 triangles
    Bake time: 21s
    Saved: artifacts/trellis_mac/T_mac_pr5_smoke.glb

vs ~22 minutes for the equivalent end-to-end run. The bake reproduces
exactly what the original generation+bake would have produced (same
mesh, same textures), modulo whatever bake-time flags were changed.

Implementation
--------------

- The flag pair is mutually exclusive in spirit but not enforced (only
  one path runs at a time): --load-intermediate makes args.image
  optional and skips the pipeline-load + .to(MPS) + run() block. We
  keep the argparse signature backward-compatible by leaving the image
  positional required at the parser level and gating the existence
  check on `args.load_intermediate is None`.
- The blob saved is intentionally minimal (no pipeline weights, no
  RNG state, no input image). Just the post-decoder mesh + voxel
  attributes that to_glb consumes. Typical size: ~250 MB for a
  1.44M-vertex / 2.99M-face mesh + 6-channel voxel attrs at fp32.
- Layout is preserved as-is; loaded back into a SimpleNamespace so
  downstream code accessing `mesh_out.vertices`, `mesh_out.attrs`,
  etc., works unchanged.
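
The load side can be sketched the same way: read the blob back and wrap it in a SimpleNamespace so attribute access matches the decoder output. As above, pickle stands in for the torch.load call that reads the real .pt file, and `load_intermediate` is an illustrative name; the log line mirrors the smoke-test output:

```python
import os
import pickle
import tempfile
import time
from types import SimpleNamespace


def load_intermediate(path: str) -> SimpleNamespace:
    """Load a saved blob and expose its keys as attributes, like the decoder output."""
    t0 = time.perf_counter()
    # generate.py reads a .pt file with torch.load(...); pickle stands in here.
    with open(path, "rb") as f:
        blob = pickle.load(f)
    print(f"Loaded intermediate in {time.perf_counter() - t0:.1f}s")
    # Downstream code can keep using mesh_out.vertices, mesh_out.attrs, etc.
    return SimpleNamespace(**blob)


# Write a tiny toy blob so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "toy.intermediate.pkl")
with open(path, "wb") as f:
    pickle.dump({"vertices": [[0.0, 0.0, 0.0]], "faces": [], "layout": {"base_color": slice(0, 3)}}, f)

mesh_out = load_intermediate(path)
print(len(mesh_out.vertices))  # 1
```

Because SimpleNamespace only adds attribute access on top of the stored values, the layout slice dict comes back unchanged and no code after the load needs to know whether the mesh was freshly decoded or cached.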

API impact
----------

No breakage: both flags default to None, so existing scripts run
unchanged. Existing watchdog-error handling and the empty-mesh check
are preserved on the generation path.