generate.py: --save-intermediate / --load-intermediate for fast bake iteration #13
Open
NoahBPeterson wants to merge 1 commit into
Adds two flags that decouple the expensive sampling+decode phase from the
texture bake:
--save-intermediate PATH
After pipeline.run() succeeds, dump the decoded mesh and voxel
attributes (vertices/faces/attrs/coords/origin/voxel_size/layout)
to a torch .pt file. Same run still produces the GLB; this is a
side effect that captures the state the bake needs.
--load-intermediate PATH
Skip pipeline construction (the 90-200s cold load) and the full
sampling+decode loop (~15-20+ min on M1 Pro). Load a previously
saved blob and proceed straight to texture bake. The image
argument and any --pipeline-type / --seed / --steps / --dit-dtype
values are ignored on this path — the cached mesh already
represents whatever those values produced when the cache was made.
Why
---
The bake has three knobs worth iterating on (--texture-size, alpha-mode
heuristic, decimation target) but is gated on a 15-20 min sampling
phase. With a saved intermediate, bake-only iterations drop from ~20
min to ~1 min on M1 Pro.
Use case: comparing 1024² vs 2048² textures, A/B-ing alpha-mode
thresholds (relevant after pedronaugusto/trellis2-apple#1), trying
different mesh decimation targets.
Smoke test
----------
On a saved 1.44M-vertex intermediate:
Loading intermediate from artifacts/trellis_mac/T_mac.intermediate.filled.pt...
Loaded intermediate in 0.1s
Mesh: 1,442,251 vertices, 3,014,498 triangles
Bake time: 21s
Saved: artifacts/trellis_mac/T_mac_pr5_smoke.glb
vs ~22 minutes for the equivalent end-to-end run. The bake reproduces
exactly what the original generation+bake would have produced (same
mesh, same textures), modulo whatever bake-time flags were changed.
Implementation
--------------
- The two flags are mutually exclusive in spirit, but this is not
enforced; only one path runs at a time. --load-intermediate makes
args.image effectively optional and skips the pipeline-load +
.to(MPS) + run() block. We keep the argparse signature
backward-compatible by leaving the image positional required at the
parser level and gating the image existence check on
`args.load_intermediate is None`.
- The blob saved is intentionally minimal (no pipeline weights, no
RNG state, no input image). Just the post-decoder mesh + voxel
attributes that to_glb consumes. Typical size: ~250 MB for a
1.44M-vertex / 2.99M-face mesh + 6-channel voxel attrs at fp32.
- Layout is preserved as-is; loaded back into a SimpleNamespace so
downstream code accessing `mesh_out.vertices`, `mesh_out.attrs`,
etc., works unchanged.
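
A minimal sketch of the gating described in the first bullet, assuming the image existence check is a plain `os.path.exists` test. The helper name `check_inputs` and the help strings are hypothetical, and the real generate.py may inline this logic differently:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate.py")
    # The image positional stays required at the parser level, as before.
    parser.add_argument("image", help="input image (ignored with --load-intermediate)")
    # Both new flags default to None, so existing invocations are unaffected.
    parser.add_argument("--save-intermediate", metavar="PATH", default=None)
    parser.add_argument("--load-intermediate", metavar="PATH", default=None)
    return parser


def check_inputs(args: argparse.Namespace) -> str:
    """Return which path will run: "load" or "generate".

    Hypothetical helper, written only to illustrate the gating.
    """
    if args.load_intermediate is None:
        # Generation path: the image must actually exist on disk.
        if not os.path.exists(args.image):
            raise FileNotFoundError(f"image not found: {args.image}")
        return "generate"
    # Load path: the image argument is ignored, so it is not checked.
    return "load"
```

Because the positional is still required by the parser, a load-path invocation under this sketch still passes some placeholder for the image; it is simply never opened.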
API impact
----------
No breakage: both flags default to None, so existing scripts run
unchanged. Existing watchdog-error handling and the empty-mesh check
are preserved on the generation path.
Summary
Adds two flags that decouple the expensive sampling+decode phase from
the texture bake, so bake-only iterations cost ~1 minute instead of
~20+ minutes.
Why
The bake has three knobs worth iterating on (`--texture-size`,
alpha-mode heuristic, decimation target) but is gated on a 15-20 min
sampling phase. With a saved intermediate, bake-only iterations drop
from ~20 min to ~1 min on M1 Pro.
Use cases:
- Comparing texture sizes (`--texture-size 1024` vs `2048`) on the same generated mesh.
- A/B-ing alpha-mode thresholds (relevant after "to_glb: percentile-based alphaMode auto-detection (default OPAQUE-friendly)", pedronaugusto/trellis2-apple#1).
- Trying different `o_voxel.postprocess.to_glb(decimation_target=...)` values without resampling.
- (… finding the appropriate `alphaMode` defaults; this would have been a 1 hr loop instead of 12+ hrs without these flags).
Smoke test
On a saved 1.44M-vertex intermediate (M1 Pro 16 GB), the bake took 21s
(see the log in the commit message above),
vs ~22 minutes for the equivalent end-to-end run.
Implementation
- `--load-intermediate` makes `args.image` effectively optional. We
keep the argparse signature backward-compatible (image stays
positional required) but gate the existence check on
`args.load_intermediate is None`.
- The saved blob is intentionally minimal: just the post-decoder
mesh + voxel attributes that `to_glb` consumes. No pipeline weights,
no RNG state, no input image. Typical size: ~250 MB for a
1.44M-vertex / 2.99M-face mesh + 6-channel voxel attrs at fp32.
- Layout (`attr_layout` slice dict) is preserved as-is. On load
the blob is wrapped in a `SimpleNamespace` so downstream code
accessing `mesh_out.vertices`, `mesh_out.attrs`, etc., works
unchanged.
- Existing watchdog-error handling and the empty-mesh check are
preserved on the generation path; the load path bypasses them since by
definition the cached mesh is non-empty.
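
To illustrate the `SimpleNamespace` wrapping, here is a rough sketch using the field names listed in the commit message. The helper names `to_blob` and `from_blob` are hypothetical, the missing-field check is an added assumption, and plain Python values stand in for tensors. In the actual script the dict is written to and read from a torch .pt file rather than kept in memory:

```python
from types import SimpleNamespace

# Field names as listed in the commit message.
INTERMEDIATE_KEYS = (
    "vertices", "faces", "attrs", "coords", "origin", "voxel_size", "layout",
)


def to_blob(mesh_out) -> dict:
    """Collect only the fields the bake needs into a plain dict."""
    return {key: getattr(mesh_out, key) for key in INTERMEDIATE_KEYS}


def from_blob(blob: dict) -> SimpleNamespace:
    """Wrap a loaded dict so fields are reachable by attribute access."""
    missing = [key for key in INTERMEDIATE_KEYS if key not in blob]
    if missing:
        # Assumption: fail early rather than let the bake hit an AttributeError.
        raise KeyError(f"intermediate is missing fields: {missing}")
    return SimpleNamespace(**blob)
```

Downstream code that reads `mesh_out.vertices` or `mesh_out.attrs` then works the same way on the wrapped blob as on the object returned by the pipeline, which is the point of the wrapper.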
API impact
No breakage: both flags default to None, so existing scripts run
unchanged.
Related
Pairs nicely with #11 (re-enable cumesh on M1)
and pedronaugusto/trellis2-apple#1 (alphaMode auto-detection) for
quick verification of bake-side improvements without paying sampling
cost on every iteration.