
Add challenge 91: MoE Token Dispatch (Medium)#242

Open
claude[bot] wants to merge 1 commit into main from add-challenge-91-moe-token-dispatch
Conversation


@claude claude bot commented Apr 7, 2026

Summary

  • Adds challenge 91: MoE Token Dispatch (Medium difficulty)
  • Teaches stable parallel scatter — given T tokens and their expert assignments, pack each token's feature vector into the correct per-expert buffer [E, capacity, D], preserving the original token ordering within each expert's group
  • Natural sequel to challenge 67 (MoE Top-K Gating): that challenge picks which expert each token goes to; this challenge dispatches the tokens there
  • Requires solvers to think about: parallel histogram (atomic counts), work distribution across heterogeneous token loads, and stable scatter without race conditions
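The stability requirement above is the interesting part: unordered atomics give each token *a* slot in its expert's buffer, but not necessarily the slot matching its original index order. One way to see why a prefix sum (segmented scan) is the natural tool is that a token's stable slot is exactly its rank among earlier tokens assigned to the same expert. A minimal NumPy sketch of that math (illustration only, not one of the six starter solutions; `stable_slots` is a hypothetical helper name):

```python
import numpy as np

def stable_slots(expert_idx, E):
    """Compute each token's within-expert slot without a serial loop.

    slot[t] = number of earlier tokens routed to the same expert,
    which is precisely the stable (original-order) position inside
    that expert's buffer. On a GPU this rank typically comes from a
    segmented/prefix scan rather than unordered atomicAdd, because
    atomics alone do not preserve token order.
    """
    T = expert_idx.shape[0]
    onehot = np.zeros((T, E), dtype=np.int64)
    onehot[np.arange(T), expert_idx] = 1
    # Inclusive cumsum down the token axis, minus self,
    # gives the count of earlier same-expert tokens.
    return onehot.cumsum(axis=0)[np.arange(T), expert_idx] - 1
```

With `expert_idx = [1, 0, 1, 0]`, tokens 1 and 3 land in expert 0's slots 0 and 1, and tokens 0 and 2 land in expert 1's slots 0 and 1, so the result is `[0, 0, 1, 1]`.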

Challenge design

  • Input: x [T, D] float32, expert_idx [T] int32 (0 … E-1), scalar dims T, D, E, capacity = T
  • Output: dispatched_x [E, capacity, D] float32, token_counts [E] int32
  • Ordering constraint: within each expert's buffer, tokens appear in ascending original-index order (stable dispatch)
  • Difficulty: Medium — empty solve stub for all 6 frameworks
  • Performance test: T = 16,384, D = 512, E = 8 (fits well within 16 GB T4 VRAM)
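To pin down the input/output contract described above, here is a sequential reference of the dispatch semantics (a sketch for illustration; `dispatch_reference` is a hypothetical name, and the actual challenge expects a parallel kernel, not this serial loop):

```python
import numpy as np

def dispatch_reference(x, expert_idx, E, capacity):
    """Serial reference semantics for stable MoE token dispatch.

    x:          [T, D] float32 token features
    expert_idx: [T]    int32 expert assignment per token (0..E-1)
    Returns dispatched_x [E, capacity, D] and token_counts [E].
    """
    T, D = x.shape
    dispatched_x = np.zeros((E, capacity, D), dtype=x.dtype)
    token_counts = np.zeros(E, dtype=np.int32)
    for t in range(T):  # ascending t => stable per-expert ordering
        e = int(expert_idx[t])
        slot = int(token_counts[e])
        if slot < capacity:  # with capacity = T this never overflows
            dispatched_x[e, slot] = x[t]
            token_counts[e] += 1
    return dispatched_x, token_counts
```

Because `capacity = T` in this challenge, no token is ever dropped; the capacity check is kept only to make the buffer contract explicit.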

Test plan

  • All 6 starter files present (.cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo)
  • 10 functional test cases: single token, empty experts, one-per-expert, skewed distribution, power-of-2 sizes, non-power-of-2 sizes, realistic size, zero inputs
  • Validated with run_challenge.py --action submit → ✓ All tests passed
  • pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
  • Checklist in CLAUDE.md fully reviewed

🤖 Generated with Claude Code

Teaches stable scatter / parallel dispatch — a key MoE inference
building block that follows naturally from challenge 67 (MoE Top-K Gating).
Solvers must pack T tokens into per-expert buffers [E, capacity, D],
preserving original token order within each expert group.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
