-
Notifications
You must be signed in to change notification settings - Fork 80
Expand file tree
/
Copy pathchallenge.html
More file actions
116 lines (94 loc) · 6.26 KB
/
challenge.html
File metadata and controls
116 lines (94 loc) · 6.26 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
<p>
In Mixture-of-Experts (MoE) neural networks, a router assigns each input token to one of
<code>E</code> expert networks. Before those experts can execute in parallel, tokens must be
<em>dispatched</em>: reorganized from their original sequence order into contiguous per-expert
buffers. Given <code>T</code> tokens of feature dimension <code>D</code> and their expert
assignments, fill each expert's slice of the output buffer with the tokens routed to that
expert, preserving the original token order within each expert's group.
</p>
<svg width="540" height="230" viewBox="0 0 540 230" style="display:block; margin:20px auto;" xmlns="http://www.w3.org/2000/svg">
<rect width="540" height="230" rx="8" fill="#222"/>
<!-- Title -->
<text x="270" y="22" text-anchor="middle" fill="#ccc" font-family="monospace" font-size="12">MoE Token Dispatch (T=4, E=2)</text>
<!-- Left: input tokens -->
<text x="80" y="48" text-anchor="middle" fill="#aaa" font-family="monospace" font-size="11">x [T, D]</text>
<!-- Token 0 → expert 0 (blue) -->
<rect x="20" y="55" width="120" height="28" rx="4" fill="#1a3a5c" stroke="#4a9eff" stroke-width="1.5"/>
<text x="80" y="73" text-anchor="middle" fill="#4a9eff" font-family="monospace" font-size="11">token 0 → expert 0</text>
<!-- Token 1 → expert 1 (green) -->
<rect x="20" y="90" width="120" height="28" rx="4" fill="#1a3a2a" stroke="#4adf7f" stroke-width="1.5"/>
<text x="80" y="108" text-anchor="middle" fill="#4adf7f" font-family="monospace" font-size="11">token 1 → expert 1</text>
<!-- Token 2 → expert 0 (blue) -->
<rect x="20" y="125" width="120" height="28" rx="4" fill="#1a3a5c" stroke="#4a9eff" stroke-width="1.5"/>
<text x="80" y="143" text-anchor="middle" fill="#4a9eff" font-family="monospace" font-size="11">token 2 → expert 0</text>
<!-- Token 3 → expert 1 (green) -->
<rect x="20" y="160" width="120" height="28" rx="4" fill="#1a3a2a" stroke="#4adf7f" stroke-width="1.5"/>
<text x="80" y="178" text-anchor="middle" fill="#4adf7f" font-family="monospace" font-size="11">token 3 → expert 1</text>
<!-- Arrows for expert 0 tokens -->
<line x1="140" y1="69" x2="195" y2="90" stroke="#4a9eff" stroke-width="1.5" marker-end="url(#arrowblue)"/>
<line x1="140" y1="139" x2="195" y2="110" stroke="#4a9eff" stroke-width="1.5" marker-end="url(#arrowblue)"/>
<!-- Arrows for expert 1 tokens -->
<line x1="140" y1="104" x2="195" y2="155" stroke="#4adf7f" stroke-width="1.5" marker-end="url(#arrowgreen)"/>
<line x1="140" y1="174" x2="195" y2="170" stroke="#4adf7f" stroke-width="1.5" marker-end="url(#arrowgreen)"/>
<!-- Arrow markers -->
<defs>
<marker id="arrowblue" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
<path d="M0,0 L0,6 L8,3 z" fill="#4a9eff"/>
</marker>
<marker id="arrowgreen" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
<path d="M0,0 L0,6 L8,3 z" fill="#4adf7f"/>
</marker>
</defs>
<!-- Right: expert 0 buffer -->
<text x="370" y="48" text-anchor="middle" fill="#aaa" font-family="monospace" font-size="11">dispatched_x [E, capacity, D]</text>
<text x="280" y="78" fill="#4a9eff" font-family="monospace" font-size="10">expert 0</text>
<rect x="200" y="83" width="160" height="24" rx="3" fill="#1a3a5c" stroke="#4a9eff" stroke-width="1.2"/>
<text x="280" y="99" text-anchor="middle" fill="#4a9eff" font-family="monospace" font-size="10">slot 0: token 0 features</text>
<rect x="200" y="108" width="160" height="24" rx="3" fill="#1a3a5c" stroke="#4a9eff" stroke-width="1.2"/>
<text x="280" y="124" text-anchor="middle" fill="#4a9eff" font-family="monospace" font-size="10">slot 1: token 2 features</text>
<!-- Right: expert 1 buffer -->
<text x="280" y="148" fill="#4adf7f" font-family="monospace" font-size="10">expert 1</text>
<rect x="200" y="153" width="160" height="24" rx="3" fill="#1a3a2a" stroke="#4adf7f" stroke-width="1.2"/>
<text x="280" y="169" text-anchor="middle" fill="#4adf7f" font-family="monospace" font-size="10">slot 0: token 1 features</text>
<rect x="200" y="178" width="160" height="24" rx="3" fill="#1a3a2a" stroke="#4adf7f" stroke-width="1.2"/>
<text x="280" y="194" text-anchor="middle" fill="#4adf7f" font-family="monospace" font-size="10">slot 1: token 3 features</text>
<!-- token_counts label -->
<text x="280" y="218" text-anchor="middle" fill="#888" font-family="monospace" font-size="10">token_counts = [2, 2]</text>
</svg>
<h2>Implementation Requirements</h2>
<p>
Your <code>solve</code> function signature must not be changed. No external libraries beyond
those already imported. Write outputs to <code>dispatched_x</code> and
<code>token_counts</code> in place.
</p>
<p>
For each expert <code>e</code>, <code>dispatched_x[e, :token_counts[e], :]</code> must
contain exactly the tokens assigned to expert <code>e</code>, in the same relative order
they appear in the original token sequence (i.e., token index ascending). Entries at
positions <code>token_counts[e]</code> and beyond may be left as zero.
</p>
<h2>Example</h2>
<p>
Given <code>T</code> = 4 tokens of dimension <code>D</code> = 3 routed to
<code>E</code> = 2 experts (capacity = 4):
</p>
<p>Input tokens \(x\):</p>
<p>$$x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$</p>
<p>Expert assignments:</p>
<pre>expert_idx = [0, 1, 0, 1]</pre>
<p>Output token counts:</p>
<pre>token_counts = [2, 2]</pre>
<p>\(\texttt{dispatched\_x}[0]\) — expert 0 receives tokens 0 and 2 (in original order):</p>
<p>$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$</p>
<p>\(\texttt{dispatched\_x}[1]\) — expert 1 receives tokens 1 and 3 (in original order):</p>
<p>$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}$$</p>
<h2>Constraints</h2>
<ul>
<li>1 ≤ <code>T</code> ≤ 65,536</li>
<li>1 ≤ <code>D</code> ≤ 2,048</li>
<li>2 ≤ <code>E</code> ≤ 64</li>
<li><code>capacity</code> = <code>T</code> (always sufficient for any routing assignment)</li>
<li>All values in <code>expert_idx</code> are in the range [0, <code>E</code> − 1]</li>
<li><code>x</code> values are <code>float32</code>; <code>expert_idx</code> and <code>token_counts</code> are <code>int32</code></li>
<li>Performance is measured with <code>T</code> = 16,384, <code>D</code> = 512, <code>E</code> = 8</li>
</ul>