Skip to content

Butterfly warp-reduction for packet scatter-reduce#500

Open
lnuic wants to merge 1 commit into
masterfrom
scatter_packet_local
Open

Butterfly warp-reduction for packet scatter-reduce#500
lnuic wants to merge 1 commit into
masterfrom
scatter_packet_local

Conversation

@lnuic
Copy link
Copy Markdown
Contributor

@lnuic lnuic commented May 18, 2026

Adds scatter_reduce_packet_dynamic, the runtime-N analogue of gather_packet_dynamic.

Bumps ext/drjit-core for the butterfly packet path.

@wjakob
Copy link
Copy Markdown
Member

wjakob commented May 19, 2026

Some (unordered) feedback:

I think that the packet atomics in OptiX aren't supported in all driver versions.
I was surprised that the guards just gate on "ts->compute_capability >= 70" -- are you sure that's correct?

Some of the comments feel a bit AI-generated, e.g. the "amortizing .. scaffolding" and "mirrors" .. are mentioned several times. One comment even mentions pixels in the Dr.Jit-Core PTX generation:

// Butterfly warp-reduction over the whole packet collapses up to 32
// lanes' contributions targeting the same pixel down to a single set

@wjakob
Copy link
Copy Markdown
Member

wjakob commented May 20, 2026

(Updated the comment). Note that this is actually mostly about the Dr.Jit-Core parts.

@lnuic lnuic force-pushed the scatter_packet_local branch from d096ba3 to df57807 Compare May 25, 2026 12:56
@lnuic lnuic force-pushed the scatter_packet_local branch from df57807 to 8304ded Compare May 25, 2026 13:07
@lnuic
Copy link
Copy Markdown
Contributor Author

lnuic commented May 25, 2026

The cc >= 70 / ptx >= 63 gate is for match.any.sync (used for pre-aggregation), not for the vector atomics.
Vector atomics use the same supports_vector_reduction check as before; if they're unavailable the butterfly still runs and the leader just emits N scalar atomics.
Also cleaned up the comments.

@lnuic lnuic marked this pull request as ready for review May 25, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants