Add SPZ v4 (NGSP / ZSTD multi-stream) read and write support #332

Open
udwinj wants to merge 8 commits into sparkjsdev:main from udwinj:add-spz-v4-support

@udwinj

@udwinj udwinj commented May 4, 2026

Summary

Adds support for SPZ v4 to spark — both reading and writing — bringing parity with the latest nianticlabs/spz reference encoder.

SPZ v4 replaces the single gzip-wrapped payload (v1–v3) with a 32-byte NGSP header followed by per-attribute ZSTD-compressed streams. The wire format is identical to upstream's saveSpz() so files round-trip cleanly between the C++ encoder and spark.

Changes

Dependencies

  • Adds @bokuweb/zstd-wasm (~50 KB WASM blob, lazy-loaded). Used for both ZSTD compression and decompression. Native CompressionStream("zstd") was considered but dropped because it isn't yet supported in Firefox or Safari.

src/SplatLoader.ts

  • getSplatFileType recognizes the NGSP magic at offset 0 (v4) in addition to the existing gzip-wrapped detection (v1–v3).

src/spz.ts — read path

  • SpzReader detects v4 in the constructor by inspecting the first 4 bytes; legacy files continue to flow through GunzipReader unchanged.
  • For v4, parseHeader() parses the 32-byte NGSP header, awaits ZSTD WASM init, and decompresses every attribute stream up front into v4Streams: Uint8Array[].
  • parseSplats() uses a small read() abstraction so v3 (gzip stream) and v4 (pre-decompressed buffers) share the same decode logic.
  • The smallest-three quaternion branch now triggers on version >= 3 (was === 3) since v4 uses the same encoding as v3.
  • The LOD tail (a spark-only extension to the gzip format) is correctly skipped for v4 files.
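The smallest-three quaternion encoding mentioned in the read-path notes can be sketched as below. This is only an illustration: the bit layout (a 2-bit index of the omitted component followed by three 10-bit fields of 9 magnitude bits plus a sign bit) and the function name `unpack_smallest_three` are assumptions, not taken from this PR.

```rust
/// Decode a unit quaternion stored with a "smallest three" scheme.
/// ASSUMED layout (not confirmed by this PR): a 32-bit little-endian word whose
/// top 2 bits hold the index of the omitted (largest) component, followed by
/// three 10-bit fields, each 9 bits of magnitude plus 1 sign bit.
pub fn unpack_smallest_three(bytes: [u8; 4]) -> [f32; 4] {
    const MAG_MASK: u32 = (1 << 9) - 1;
    let mut packed = u32::from_le_bytes(bytes);
    let largest = (packed >> 30) as usize;
    let mut q = [0.0f32; 4];
    let mut sum_squares = 0.0f32;
    // Fields are consumed from the low bits, highest component index first,
    // skipping the omitted component.
    for i in (0..4).rev() {
        if i == largest {
            continue;
        }
        let mag = packed & MAG_MASK;
        let negative = (packed >> 9) & 1 == 1;
        packed >>= 10;
        // The three stored components each lie in [-1/sqrt(2), 1/sqrt(2)].
        let v = std::f32::consts::FRAC_1_SQRT_2 * mag as f32 / MAG_MASK as f32;
        q[i] = if negative { -v } else { v };
        sum_squares += q[i] * q[i];
    }
    // The omitted component is recovered from the unit-length constraint.
    q[largest] = (1.0 - sum_squares).max(0.0).sqrt();
    q
}
```

Because the omitted component is always reconstructed as non-negative, an encoder following this scheme would need to flip the quaternion's sign before packing whenever its largest component is negative.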

src/spz.ts — write path

  • SPZ_VERSION bumped to 4.
  • SpzWriter now stores each attribute in its own Uint8Array (positions, alphas, colors, scales, rotations, sh) and assembles [32-byte header][TOC][ZSTD streams] in finalize().
  • Setter API (setCenter / setAlpha / setRgb / setScale / setQuat / setSh) is unchanged, so transcodeSpz and other callers don't need updates.
  • TOC entries are [u64 compressedSize LE][u64 uncompressedSize LE], matching the reference encoder.
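As a rough sketch of the layout described above, the following assembles `[32-byte header][TOC][streams]` from streams that have already been compressed. The function name `assemble_v4` is hypothetical, the header is treated as an opaque 32-byte block the caller has filled in, and the sketch assumes the TOC directly follows the header (no extensions).

```rust
/// Assemble a v4-style container from an already-built 32-byte header and
/// already-compressed streams. Each stream is `(compressed_bytes, uncompressed_len)`.
/// ASSUMPTION: the TOC sits immediately after the header and the streams are
/// concatenated immediately after the TOC.
pub fn assemble_v4(header: &[u8; 32], streams: &[(Vec<u8>, u64)]) -> Vec<u8> {
    let payload: usize = streams.iter().map(|(c, _)| c.len()).sum();
    let mut out = Vec::with_capacity(32 + streams.len() * 16 + payload);
    out.extend_from_slice(header);
    // TOC: one [u64 compressedSize LE][u64 uncompressedSize LE] pair per stream.
    for (compressed, uncompressed_len) in streams {
        out.extend_from_slice(&(compressed.len() as u64).to_le_bytes());
        out.extend_from_slice(&uncompressed_len.to_le_bytes());
    }
    // Compressed streams, in the same order as their TOC entries.
    for (compressed, _) in streams {
        out.extend_from_slice(compressed);
    }
    out
}
```

Keeping compression out of this function means the layout logic can be exercised without a ZSTD codec.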

Backward compatibility

  • v2, v3 file reads are unchanged (same code path).
  • v3 → v4 is the only writer behavior change; the writer no longer emits gzip files. Callers depending on writing legacy v3 files would need a flag added in a follow-up.

Known gaps (out of scope)

  • Extensions (FlagHasExtensions = 0x2): not read or written. Reader correctly skips over them via the tocByteOffset field, so files containing extensions still load.
  • SH degree 4 (upstream SH_MAX_DEGREE = 4): pre-existing spark limitation — SH_DEGREE_TO_VECS only goes up to 3. Files with shDegree == 4 already failed to load before this PR; behavior is unchanged.

judwin and others added 2 commits May 4, 2026 14:41
SPZ v4 files use a 32-byte NGSP header with per-attribute ZSTD-compressed
streams instead of a single gzip-wrapped payload (v1-v3). This adds
@bokuweb/zstd-wasm for ZSTD codec support (works in all browsers via
WASM, no CompressionStream("zstd") browser dependency) and updates both
the reader and writer to handle v4.

Read path:
- getSplatFileType: detect v4 files that start with NGSP magic directly
  (not gzip-wrapped) and return SplatFileType.SPZ
- SpzReader: detect v4 in constructor via magic bytes; parse the 32-byte
  header and decompress all attribute streams upfront in parseHeader()
- SpzReader: unified read() abstraction in parseSplats() so v3 and v4
  share identical decode logic
- SpzReader: extend smallest-three quaternion path to version >= 3
  (was === 3), since v4 uses the same encoding as v3
- SpzReader: legacy v1-v3 gzip path is unchanged

Write path:
- SPZ_VERSION bumped from 3 to 4
- SpzWriter rewritten to keep per-attribute Uint8Array buffers and emit
  the v4 file layout: [32-byte header][TOC][concatenated ZSTD streams]
- Setter API (setCenter, setAlpha, setRgb, setScale, setQuat, setSh) is
  unchanged, so transcodeSpz and other callers don't need updates
- finalize() ZSTD-compresses each attribute stream independently and
  assembles the output, mirroring the C++ saveSpz() reference encoder

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The viewer's load path goes through the spark-worker-rs WASM module,
which uses spark-lib's Rust SpzDecoder. Without this change, v4 files
fail with "Invalid gzip header" even though the TS SpzReader handles
them, because the worker never invokes the TS path.

This adds a parallel v4 path to SpzDecoder that mirrors the C++
reference implementation:

- New SpzFormat enum (Unknown / Gzip / Ngsp). The decoder detects which
  on the first 4 bytes of input — NGSP magic = v4, gzip magic = legacy.
- For v4: accumulate raw bytes, parse the 32-byte NgspFileHeader, walk
  the TOC (numStreams × [u64 compressedSize][u64 uncompressedSize]),
  ZSTD-decompress each attribute stream with ruzstd, concatenate the
  decompressed bytes in stream order, then run the existing per-stage
  state machine (Centers/Alphas/Rgb/Scales/Quats/Sh).
- For v1-v3: gzip path is unchanged.
- Smallest-three quaternion branch now triggers on version >= 3, since
  v4 uses the same encoding as v3.

decoder.rs: MultiDecoder routes files starting with NGSP magic directly
to SpzDecoder (in addition to the existing gzip-wrapped detection).

Dependencies:
- ruzstd 0.7 (pure-Rust ZSTD decoder; works in WASM with no C bindings)

The Rust SpzEncoder is intentionally untouched — only the build-lod
CLI uses it. SPZ writing from spark.js goes through the TypeScript
SpzWriter, which already produces v4 files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@udwinj
Author

udwinj commented May 4, 2026

Update: Rust WASM decoder now also supports SPZ v4

After initial testing, we found that the viewer's load path (SplatLoader → workerPool → worker.ts) goes through the Rust WASM decoder (rust/spark-lib/src/spz.rs), not the TypeScript SpzReader. The first commit's TS changes only covered the transcodeSpz / legacy worker path. V4 files were failing with "Invalid gzip header" because the Rust decoder was hitting the NGSP magic bytes and treating them as malformed gzip.

A second commit adds full v4 support to the Rust decoder.


What the second commit changes

New dependency — ruzstd 0.7.3

Pure-Rust ZSTD decoder, no C bindings, compiles cleanly to WASM.

# rust/Cargo.toml (workspace)
ruzstd = { version = "0.7.3", default-features = false, features = ["std"] }

rust/spark-lib/src/decoder.rs

MultiDecoder now recognises NGSP magic at the start of a file (in addition to the existing gzip-wrapped detection):

if magic == SPZ_MAGIC {
    // NGSP magic at file start — SPZ v4 (ZSTD multi-stream, not gzip-wrapped)
    return self.init_file_type(SplatFileType::SPZ);
}

rust/spark-lib/src/spz.rs

Format detection — new SpzFormat enum; the decoder self-detects on the first 4 bytes of input:

enum SpzFormat { Unknown, Gzip, Ngsp }
  • First 4 bytes = NGSP → v4 path (accumulate raw bytes, decode via try_decode_v4())
  • First 4 bytes = gzip magic → v1–v3 path (existing streaming gzip, unchanged)
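The detection rule in the two bullets above might look roughly like this. The sketch assumes the NGSP magic appears on disk as the four ASCII bytes `NGSP`; the gzip magic `0x1f 0x8b` is standard. `detect_format` is a hypothetical name, and here `Unknown` means "not enough bytes yet".

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SpzFormat {
    Unknown,
    Gzip,
    Ngsp,
}

/// Classify an input by its leading bytes.
/// Returns `Ok(Unknown)` while fewer than 4 bytes are available, and an error
/// when the first bytes match neither container.
pub fn detect_format(bytes: &[u8]) -> Result<SpzFormat, String> {
    if bytes.len() < 4 {
        return Ok(SpzFormat::Unknown);
    }
    if &bytes[..4] == b"NGSP" {
        // ASSUMPTION: the v4 magic is stored as ASCII "NGSP" at offset 0.
        Ok(SpzFormat::Ngsp)
    } else if bytes[0] == 0x1f && bytes[1] == 0x8b {
        // Standard gzip magic: legacy v1-v3 files are gzip-wrapped.
        Ok(SpzFormat::Gzip)
    } else {
        Err(format!("unrecognized SPZ container magic: {:02x?}", &bytes[..4]))
    }
}
```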

try_decode_v4() — once all bytes are buffered, parses the 32-byte NGSP header, walks the TOC, ZSTD-decompresses each attribute stream with ruzstd, then feeds decompressed bytes into the existing per-stage state machine (Centers → Alphas → Rgb → Scales → Quats → Sh):

fn try_decode_v4(&mut self) -> anyhow::Result<()> {
    // parse 32-byte header, validate magic + version
    // walk TOC: numStreams × [u64 compressedSize][u64 uncompressedSize]
    for (offset, size) in &compressed_offsets {
        let compressed = &self.raw[*offset..*offset + *size];
        let mut decoder = ruzstd::StreamingDecoder::new(compressed)?;
        decoder.read_to_end(&mut self.buffer)?;
    }
    self.init_state(version, num_splats, sh_degree, fractional_bits, flags)?;
    self.poll_sections()?;
    self.done = true;
    Ok(())
}
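The excerpt above relies on `compressed_offsets`, which comes from walking the TOC. A minimal sketch of that step follows, assuming the compressed streams are packed back to back immediately after the TOC; `walk_toc` and its return shape are illustrative and not the PR's actual helper.

```rust
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Walk a TOC of `num_streams` x [u64 compressedSize LE][u64 uncompressedSize LE].
/// Returns `(offset, compressed_size, uncompressed_size)` per stream, or
/// `Ok(None)` when `raw` does not yet contain the whole TOC.
pub fn walk_toc(
    raw: &[u8],
    toc_byte_offset: usize,
    num_streams: usize,
) -> Result<Option<Vec<(usize, usize, usize)>>, String> {
    let toc_end = toc_byte_offset + num_streams * 16;
    if raw.len() < toc_end {
        return Ok(None); // need more bytes
    }
    let mut entries = Vec::with_capacity(num_streams);
    // ASSUMPTION: stream data starts right after the last TOC entry.
    let mut offset = toc_end;
    for i in 0..num_streams {
        let base = toc_byte_offset + i * 16;
        let compressed = read_u64_le(&raw[base..base + 8]);
        let uncompressed = read_u64_le(&raw[base + 8..base + 16]);
        // Guard against sizes that do not fit in usize (e.g. on wasm32).
        if compressed > usize::MAX as u64 || uncompressed > usize::MAX as u64 {
            return Err(format!("stream {} is too large for this platform", i));
        }
        entries.push((offset, compressed as usize, uncompressed as usize));
        offset = offset
            .checked_add(compressed as usize)
            .ok_or_else(|| "stream offsets overflow".to_string())?;
    }
    Ok(Some(entries))
}
```

The uncompressed size is returned alongside each offset so a caller could pre-size its output buffer or sanity-check the decompressed length.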

Quaternion branch extended to version >= 3 (was == 3), since v4 uses the same smallest-three encoding as v3.

The v1–v3 gzip path is byte-for-byte unchanged.

@lukeyreyno

Would it be preferable to use the npm spz package?

https://www.npmjs.com/package/@adobe/spz

@udwinj
Author

udwinj commented May 11, 2026

Thanks for the feedback @lukeyreyno. I looked at @adobe/spz carefully and concluded it isn't the right fit for this PR.

Two reasons:

1. It can't replace the Rust WASM path (the main viewer load path)

Spark has two SPZ decoders, one of which is in Rust:

SplatLoader → workerPool → worker.ts → rust/spark-lib/src/spz.rs (compiled to WASM)

This Rust decoder is baked into spark-worker-rs and runs inside the splat-loading worker. @adobe/spz is a JS+WASM package (~820 KB) designed to be called from the main thread / JS context. It can't be swapped into the Rust pipeline without gutting the worker architecture and shipping a second WASM module inside the worker. So the Rust-side v4 support added in this PR is necessary regardless of what we do on the TS side.

2. It would regress LOD support on the TS path

The TypeScript SpzReader / SpzWriter (used by oldWorker.ts and transcodeSpz) supports spark's flagLod extension, which is a numSplats × 2 byte childCounts block plus a numSplats × 4 byte childStarts block appended to the splat data inside the gzip stream of v1–v3 files. oldWorker.ts actively reads these via the childCounts / childStarts callbacks when spz.flagLod is set:

if (spz.flagLod) {
  const childCounts = new Uint16Array(numSplats);
  const childStarts = new Uint32Array(numSplats);
  // ... wired into parseSplats callbacks
}

flagLod is spark-specific, not part of the official SPZ spec, so @adobe/spz doesn't expose this trailer. Swapping the TS reader to @adobe/spz would silently drop child-count/child-start data for any LOD-encoded SPZ that goes through oldWorker. Working around it (gunzip ourselves with fflate, extract the trailer at a known offset, re-gzip a flag-cleared buffer before handing to @adobe/spz) is possible but adds a gunzip+regzip round-trip and a parallel code path. At that point we've kept most of the complexity and given up the simplification the swap was supposed to buy.

@asundqui
Contributor

@dmarcos @mrxz could either of you also give this a look-through? Nothing stuck out to me... In principle I think it's great to add this support to Spark! I see there's both a Rust and TS implementation, nice. I almost feel like we should have a validation test set for file types + versions :)

Comment thread rust/spark-lib/src/spz.rs Outdated
out_pos: 0,
done: false,
raw: Vec::new(),
v4_decoded: false,
Collaborator

This flag can be merged with the done flag.

Author

@udwinj udwinj May 18, 2026

Addressed by merging the v4_decoded state into the existing done flow: c42cb9f

Comment thread rust/spark-lib/src/spz.rs Outdated
Comment on lines +168 to +170
if self.raw.len() < toc_end {
return Ok(()); // need more bytes
}
Collaborator

There are now three points in this function that "need more bytes", the header, the table of contents and the compressed streams. Since they happen in sequence, the header and TOC are now read repeatedly while waiting on bytes from the compressed streams.

Similar to poll_sections, this could be handled as a state machine that goes through the above three parts (header, TOC, compressed streams).

Author

Addressed in 5faeb6b by restructuring the v4 decoder into an explicit staged state machine:

try_decode_v4() now progresses through NeedHeader -> NeedToc -> NeedStreams -> Done, carrying parsed state forward between push() calls instead of reparsing the header and rewalking the TOC while waiting on stream bytes.
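To illustrate the staged approach, here is a simplified sketch that tracks container progress only. It assumes the header offsets quoted elsewhere in this review (num_streams at byte 15, toc_byte_offset as a u32 LE at bytes 16..20) and that streams follow the TOC directly. `V4Progress` is a hypothetical type, and validation and ZSTD decompression are deliberately left out.

```rust
/// Stages of a hypothetical incremental v4 container parser. Each variant
/// carries what the previous stage learned, so nothing is reparsed.
#[derive(Debug, PartialEq, Eq)]
pub enum V4Stage {
    NeedHeader,
    NeedToc { num_streams: usize, toc_byte_offset: usize },
    NeedStreams { streams_end: usize },
    Done,
}

pub struct V4Progress {
    pub raw: Vec<u8>,
    pub stage: V4Stage,
}

fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

impl V4Progress {
    pub fn new() -> Self {
        V4Progress { raw: Vec::new(), stage: V4Stage::NeedHeader }
    }

    /// Append a chunk, then advance through as many stages as the data allows.
    pub fn push(&mut self, chunk: &[u8]) {
        self.raw.extend_from_slice(chunk);
        loop {
            let next = match self.stage {
                V4Stage::NeedHeader => {
                    if self.raw.len() < 32 {
                        return; // need more bytes
                    }
                    let r = &self.raw;
                    let toc = u32::from_le_bytes([r[16], r[17], r[18], r[19]]);
                    V4Stage::NeedToc {
                        num_streams: r[15] as usize,
                        toc_byte_offset: toc as usize,
                    }
                }
                V4Stage::NeedToc { num_streams, toc_byte_offset } => {
                    let toc_end = toc_byte_offset + num_streams * 16;
                    if self.raw.len() < toc_end {
                        return; // need more bytes
                    }
                    // ASSUMPTION: streams are packed directly after the TOC.
                    let mut streams_end = toc_end;
                    for i in 0..num_streams {
                        let base = toc_byte_offset + i * 16;
                        streams_end += read_u64_le(&self.raw[base..base + 8]) as usize;
                    }
                    V4Stage::NeedStreams { streams_end }
                }
                V4Stage::NeedStreams { streams_end } => {
                    if self.raw.len() < streams_end {
                        return; // need more bytes
                    }
                    V4Stage::Done // a real decoder would decompress here
                }
                V4Stage::Done => return,
            };
            self.stage = next;
        }
    }
}
```

Since each stage stores its results in the enum variant, a later `push()` resumes from where the previous one stopped rather than from the start of the buffer.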

Comment thread rust/spark-lib/src/spz.rs Outdated
let fractional_bits = self.raw[13];
let flags = self.raw[14];
let num_streams = self.raw[15] as usize;
let toc_byte_offset = read_u32_le(&self.raw[16..20]) as usize;
Collaborator

Since the v4 header is building upon the header format from v1-3, it should be doable to extend and re-use the pre-existing poll_header method reducing code duplication. The num_streams and toc_byte_offset can be added to the state.

Author

37fdc4b - Extracted the shared 15-byte SPZ header prefix parsing into a common parse_common_header() helper that is now used by both the legacy gzip poll_header() path and the v4 parse_v4_header() flow.

That consolidates the shared magic/version/metadata parsing while still letting each path own its version validation and any format-specific fields (num_streams, toc_byte_offset, gzip-specific handling, etc.) beyond the common prefix.
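A minimal sketch of what such a shared prefix parser could look like, using the byte offsets visible in the reviewed code (num_splats at 8..12, sh_degree at 12, fractional_bits at 13, flags at 14). The `CommonHeader` struct, the ASCII `NGSP` magic check, and the `String` error type are assumptions made to keep the example standalone; they are not the PR's actual signature.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct CommonHeader {
    pub version: u32,
    pub num_splats: u32,
    pub sh_degree: u8,
    pub fractional_bits: u8,
    pub flags: u8,
}

fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Parse the 15-byte prefix shared by the legacy and v4 headers:
/// magic (4), version (4), numSplats (4), shDegree (1), fractionalBits (1), flags (1).
/// Returns `Ok(None)` while fewer than 15 bytes are available. Version range
/// checks are intentionally left to the caller.
pub fn parse_common_header(bytes: &[u8]) -> Result<Option<CommonHeader>, String> {
    if bytes.len() < 15 {
        return Ok(None);
    }
    // ASSUMPTION: the magic is the ASCII bytes "NGSP".
    if &bytes[0..4] != b"NGSP" {
        return Err("invalid SPZ magic".to_string());
    }
    Ok(Some(CommonHeader {
        version: read_u32_le(&bytes[4..8]),
        num_splats: read_u32_le(&bytes[8..12]),
        sh_degree: bytes[12],
        fractional_bits: bytes[13],
        flags: bytes[14],
    }))
}
```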

I'd prefer not to add num_streams and toc_byte_offset to the shared state. They're v4-internal plumbing used only by walk_v4_toc to compute compressed stream offsets, then never referenced again. The downstream section state machine (SpzDecoderState) doesn't need them to decode Centers/Alphas/Rgb/Scales/Quats/SH.

Putting them on SpzDecoderState would add two permanently None/zero fields to every v1–v3 decode and would logically couple "where the v4 file lays out its streams" to "how to decode splat bytes" without functional benefit. Both fields currently live on V4HeaderInfo (inside V4Stage::NeedToc / V4Stage::NeedStreams), which matches their natural lifetime.

Happy to revisit if you feel strongly, or if there is something I'm missing.

Comment thread rust/spark-lib/src/spz.rs Outdated
self.format = SpzFormat::Ngsp;
// Try to decode if we already have enough bytes.
self.try_decode_v4()?;
return Ok(());
Collaborator

Instead of calling try_decode_v4 (and poll_decompress in the GZIP path) here, it'd be cleaner if this SpzFormat::Unknown block would focus on format detection, set self.format and simply fall-through to the match self.format code below.

Author

b836655 - Restructured push() into explicit append/advance phases per your suggestion. The Unknown block now only handles format detection and buffer placement, then falls through to a single dispatch match.

Comment thread rust/spark-lib/src/spz.rs Outdated
Comment on lines +719 to +720
// Force a decode attempt; will error if file is truncated.
self.try_decode_v4()?;
Collaborator

What is the purpose of this method call? AFAICT between the last push and this finish there isn't going to be any new data to try and decode/parse. Either the stream is incomplete or the decoder stage isn't in a terminal state, both cases will already be handled.

Author

Removed as this is a no-op: ecc2ae5

@mrxz
Collaborator

mrxz commented May 18, 2026

> New dependency — ruzstd 0.7.3
>
> Pure-Rust ZSTD decoder, no C bindings, compiles cleanly to WASM.

Since I'm not too familiar with the Rust ecosystem, why prefer ruzstd over the zstd crate? Does zstd not compile to WASM? From what I gather from ruzstd's own description it's still somewhat work in progress and hasn't reached performance parity yet (emphasis mine):

> Measuring with the 'time' utility the original zstd and my decoder both decoding the same enwik9.zst file from a ramfs, my decoder is about 3.5 times slower. Enwik9 is highly compressible, for less compressible data (like a ubuntu installation .iso) my decoder comes close to only being 1.4 times slower.

Of course the performance characteristics might be different when compiled to WASM, but if there is a clear speed benefit to the zstd crate, I'd rather go for that.


SPZv4 has added support for 4th degree spherical harmonics. Spark does not support this, so these files should either be rejected or handled in such a way that the 0-3 SH coefficients are extracted from the stream. Haven't tested it myself with this PR (don't have a sample .spz file with SH degree 4 at hand), but based on the code I believe it'll go wrong at the moment.

As @asundqui mentions, ideally we'd have a set of validation files. The spz repo does have sample files, but these don't appear to be v4. I did see this issue on their repo: nianticlabs/spz#87. If such files are added in the future we might be able to use them.

Comment thread rust/spark-lib/src/spz.rs Outdated
let num_splats = read_u32_le(&self.raw[8..12]) as usize;
let sh_degree = self.raw[12] as usize;
let fractional_bits = self.raw[13];
let flags = self.raw[14];
Collaborator

@mrxz mrxz May 18, 2026

Similar to the official spz library, we should detect if extensions are used for the file and log a warning if so, see https://github.com/nianticlabs/spz/blob/7ae1621e54e4b42c3c9c192b366d09116e558e19/src/cc/load-spz.cc#L676-L680

Author

Done in a6ed593
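For reference, the check amounts to testing one bit of the header's flag byte. The sketch below assumes the 0x02 value named in the PR description; the function names and the exact warning text are illustrative only.

```rust
/// Bit in the header flag byte that marks the presence of extensions
/// (0x02 per the PR description).
pub const FLAG_HAS_EXTENSIONS: u8 = 0x02;

pub fn has_extensions(flags: u8) -> bool {
    flags & FLAG_HAS_EXTENSIONS != 0
}

/// Emit a warning to stderr when extensions are declared, then let the caller
/// continue decoding. Returns whether the warning fired.
/// The message wording here is hypothetical.
pub fn warn_if_extensions(flags: u8) -> bool {
    let present = has_extensions(flags);
    if present {
        eprintln!("[SPZ WARNING] file declares extensions; they will be ignored");
    }
    present
}
```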

udwinj added 6 commits May 18, 2026 09:54
The two booleans always carried the same value on the v4 path, so the
extra flag was dead weight. finish() and try_decode_v4() now both check
self.done for both gzip and v4 streams.

Previously try_decode_v4() reparsed the 32-byte header and rewalked the
TOC on every push() while bytes were still trickling in for the
compressed streams. Refactor to mirror poll_sections: a V4Stage enum
(NeedHeader / NeedToc / NeedStreams / Done) carries parsed outputs
between calls, so each stage runs exactly once.

Header parsing and TOC walking move into free helpers parse_v4_header
and walk_v4_toc, sitting next to parse_gzip_header.

The first 15 bytes of the v1-v3 (gzip) and v4 (NGSP) SPZ headers share
an identical layout: magic, version, numSplats, shDegree, fractionalBits,
flags. Pull the magic check + 5-field read into a single
parse_common_header() helper used by both poll_header (legacy gzip path)
and parse_v4_header. Each caller still owns its own version-range check
and any header-specific fields beyond byte 15.

The format-detection block previously called try_decode_v4 (or
poll_decompress) and returned early, duplicating the decode-advance
calls in the steady-state match below. Restructure into two phases:
(1) get incoming bytes into the format-appropriate buffer, with the
Unknown branch focused solely on format detection and buffer placement,
and (2) advance the decoder via a single match that dispatches to
poll_decompress or try_decode_v4. Both functions are now called from
exactly one site within push().

No new bytes arrive between the last push() and finish(), and every
push() already calls try_decode_v4() at the end of its dispatch match.
The v4 state machine is therefore as advanced as the buffered data
permits by the time finish() runs, so the extra call was a no-op. The
self.done check alone is sufficient to detect truncation.

Mirror the reference SPZ library's behaviour: detect the
FLAG_HAS_EXTENSIONS (0x02) bit in the v4 header's flag byte and emit an
[SPZ WARNING] to stderr if it is set, then continue decoding.
@udwinj
Author

udwinj commented May 21, 2026

> New dependency — ruzstd 0.7.3
>
> Pure-Rust ZSTD decoder, no C bindings, compiles cleanly to WASM.
>
> Since I'm not too familiar with the Rust ecosystem, why prefer ruzstd over the zstd crate? Does zstd not compile to WASM? From what I gather from ruzstd's own description it's still somewhat work in progress and hasn't reached performance parity yet (emphasis mine):
>
> Measuring with the 'time' utility the original zstd and my decoder both decoding the same enwik9.zst file from a ramfs, my decoder is about 3.5 times slower. Enwik9 is highly compressible, for less compressible data (like a ubuntu installation .iso) my decoder comes close to only being 1.4 times slower.
>
> Of course the performance characteristics might be different when compiled to WASM, but if there is a clear speed benefit to the zstd crate, I'd rather go for that.
>
> SPZv4 has added support for 4th degree spherical harmonics. Spark does not support this, so these files should either be rejected or handled in such a way that the 0-3 SH coefficients are extracted from the stream. Haven't tested it myself with this PR (don't have a sample .spz file with SH degree 4 at hand), but based on the code I believe it'll go wrong at the moment.
>
> As @asundqui mentions, ideally we'd have a set of validation files. The spz repo does have sample files, but these don't appear to be v4. I did see this issue on their repo: nianticlabs/spz#87. If such files are added in the future we might be able to use them.

ruzstd vs zstd

@mrxz could use your input here. I originally went with ruzstd for toolchain simplicity: it's pure Rust and compiles cleanly to wasm32-unknown-unknown with no additional build dependencies; rustup target add wasm32-unknown-unknown is all that's needed. The zstd crate's zstd-sys dependency invokes a C compiler in its build script, and for the wasm32-unknown-unknown target it requires a clang that targets wasm32 (from either wasi-sdk or Emscripten), which isn't currently set up in spark's rust/build_wasm.js / wasm-pack flow.

Are we okay with switching to the zstd crate, which would make the toolchain more complex but would likely speed up SPZ decoding? The proposed toolchain changes would be:

  • Add wasi-sdk or Emscripten as a build prerequisite documented in the repo
  • Update rust/build_wasm.js / CI to make the C-to-wasm32 compiler available before invoking wasm-pack

SH degree 4

This is already handled in the Rust path. SpzDecoderState::new rejects sh_degree > 3 with anyhow::anyhow!("Invalid SH degree: {}", sh_degree), and init_state calls into that on the v4 path before any SH bytes are touched. So a v4 file with sh_degree == 4 fails fast with an error.

I'll improve the error message to be more specific about v4/SH degree 4 (something like "SPZ SH degree 4 is not supported by this decoder (handles 0–3)").
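A sketch of how the more specific error might be produced is shown below. It uses a plain `String` error instead of anyhow so the example stands alone, and `sh_vecs_for_degree` is a hypothetical helper; only the table values (0, 3, 8, 15 coefficient vectors for degrees 0 to 3) are standard.

```rust
/// Number of SH coefficient vectors per splat for degrees 0..=3,
/// i.e. (degree + 1)^2 - 1.
const SH_DEGREE_TO_VECS: [usize; 4] = [0, 3, 8, 15];

/// Look up the SH vector count, rejecting unsupported degrees up front so
/// that no SH bytes are consumed for a file the decoder cannot handle.
pub fn sh_vecs_for_degree(sh_degree: usize) -> Result<usize, String> {
    SH_DEGREE_TO_VECS.get(sh_degree).copied().ok_or_else(|| {
        format!(
            "SPZ SH degree {} is not supported by this decoder (handles 0-3)",
            sh_degree
        )
    })
}
```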

Extracting degrees 0–3 from a degree-4 file by consuming and discarding the degree-4 bytes in the section state machine is doable but non-trivial and not something I'd want to bundle into this PR. Happy to follow up with a separate PR for that if there's appetite.
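For a sense of what that follow-up could involve, here is a sketch for the simpler case where the whole SH stream is already decompressed in memory. It assumes one byte per colour channel, 24 coefficient vectors per splat at degree 4, and a per-splat layout with lower-degree coefficients first. None of that is confirmed by this PR, and the streaming section state machine would need a different, incremental approach.

```rust
/// Keep only the degree 1..=3 coefficients from a fully buffered degree-4 SH stream.
/// ASSUMPTIONS (not confirmed by this PR): each splat stores its coefficients
/// contiguously, lower degrees first, with 3 bytes (one per colour channel)
/// per coefficient vector.
pub fn truncate_sh_to_degree3(sh: &[u8], num_splats: usize) -> Result<Vec<u8>, String> {
    const SRC_BYTES: usize = 24 * 3; // degree 4: (4 + 1)^2 - 1 = 24 vectors
    const DST_BYTES: usize = 15 * 3; // degree 3: (3 + 1)^2 - 1 = 15 vectors
    if sh.len() != num_splats * SRC_BYTES {
        return Err(format!(
            "expected {} SH bytes, got {}",
            num_splats * SRC_BYTES,
            sh.len()
        ));
    }
    let mut out = Vec::with_capacity(num_splats * DST_BYTES);
    for splat in sh.chunks_exact(SRC_BYTES) {
        // Copy the leading degree 1..=3 bytes and drop the degree-4 tail.
        out.extend_from_slice(&splat[..DST_BYTES]);
    }
    Ok(out)
}
```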
