Add multi-GPU vector addition challenge + script support#236

Open
kunal-mansukhani wants to merge 2 commits into main from kunal/multi-gpu-challenges
Conversation

@kunal-mansukhani
Contributor

Summary

Adds a new Pro-only multi-GPU test challenge at challenges/easy/100_multi_gpu_vector_add (num_gpus=2), with a starter and solution for all 5 accelerated languages (PyTorch, CUDA, Triton, CuTe, JAX). It is used to validate the multi-GPU runner path end-to-end in the companion infra PR (AlphaGPU/leetgpu-infra#333).

It also includes two script updates that were needed to make multi-GPU submissions work via run_challenge.py.

Changes

New challenge: 100_multi_gpu_vector_add

  • challenge.py: num_gpus=2, access_tier="pro". Reference implementation is single-device (torch.add(A, B, out=C)) — the runner runs it independently on each rank and validates per-rank outputs.
  • challenge.html: spec explaining the fully-replicated input/output model and the env vars exposed to solve() (RANK, WORLD_SIZE, LOCAL_RANK, LEETGPU_NCCL_ID_FILE).
  • Starters + solutions for PyTorch (dist.all_reduce), CUDA (NCCL via LEETGPU_NCCL_ID_FILE), Triton (@triton.jit slice kernel + dist.all_reduce), CuTe (plain Python host + dist.all_reduce), JAX (trivial, validates jax.distributed.initialize).
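The PyTorch solution pattern named above (dist.all_reduce over the fully-replicated output) could be sketched roughly as follows. This is a hypothetical sketch, not the PR's actual starter: the solve(A, B, C, N) signature, the assumption of a pre-initialized process group, and the slicing strategy are all illustrative, with only the RANK/WORLD_SIZE env vars and the all-reduce coming from the description.

```python
import os

import torch
import torch.distributed as dist


def solve(A: torch.Tensor, B: torch.Tensor, C: torch.Tensor, N: int) -> None:
    """Fully-replicated multi-GPU vector add: every rank ends with the full C."""
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # Each rank computes one contiguous slice of the output...
    chunk = (N + world_size - 1) // world_size
    start = min(rank * chunk, N)
    end = min(start + chunk, N)

    partial = torch.zeros_like(C)
    partial[start:end] = A[start:end] + B[start:end]

    # ...then the slices are summed across ranks so every rank holds the
    # complete result, matching the per-rank validation model above.
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)
    C.copy_(partial)
```

With world_size ranks each contributing a disjoint slice, the SUM reduction reassembles the full vector on every device.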

scripts/update_challenges.py

  • Forward num_gpus from ChallengeBase to the backend (field was declared but never propagated).

scripts/run_challenge.py

  • Auto-detect num_gpus from challenge.py and send gpuCount in the submission payload; new --gpu-count override flag. (The initial regex-based detection is replaced with importlib-based module loading in the second commit.)
  • Prefer language-tagged solution filenames (solution.triton.py, solution.cute.py, solution.jax.py) so multiple Python-based languages can coexist in one solution/ dir. Falls back to solution.<ext>.
  • Print all stdout + stderr WebSocket frames (was silently dropping type=stderr output).
  • Terminate on test-case-failed / compilation-failed / tampering-detected / out-of-memory in addition to the previous terminal statuses.
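The language-tagged filename lookup from the second bullet can be sketched like this. The helper name find_solution and the LANG_EXT table are illustrative assumptions, not the script's actual identifiers; only the lookup order (tagged name first, then solution.&lt;ext&gt;) comes from the description.

```python
from pathlib import Path
from typing import Optional

# Illustrative extension table; the real script's mapping may differ.
LANG_EXT = {"pytorch": "py", "cuda": "cu", "triton": "py", "cute": "py", "jax": "py"}


def find_solution(solution_dir: Path, language: str) -> Optional[Path]:
    """Prefer solution.<language>.<ext>; fall back to solution.<ext>."""
    ext = LANG_EXT[language]
    tagged = solution_dir / f"solution.{language}.{ext}"
    if tagged.exists():
        return tagged
    fallback = solution_dir / f"solution.{ext}"
    return fallback if fallback.exists() else None
```

With this ordering, solution.triton.py, solution.cute.py, and a plain solution.py can coexist in one solution/ directory without colliding.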

Test plan

All validated end-to-end against the companion infra branch on real Modal hardware:

  • PyTorch — run + full submit on 2× T4 and 4× T4
  • CUDA — run + full submit on 2× T4
  • Triton — run + full submit on 2× T4
  • CuTe — run + full submit on 2× T4
  • JAX — run + full submit on 2× T4

Notes

  • The JAX solution is intentionally trivial (each rank computes the full result locally). It validates that jax.distributed.initialize succeeds and the output flows back correctly; a real distributed JAX pattern would use jax.experimental.multihost_utils / pjit.
  • This PR depends on AlphaGPU/leetgpu-infra#333 — without the infra changes, the script will try to submit with gpuCount=2 and the server will reject/coerce it.

🤖 Generated with Claude Code

New Pro-only challenge challenges/easy/100_multi_gpu_vector_add
(num_gpus=2) with starter + solution for all 5 accelerated languages
(PyTorch, CUDA, Triton, CuTe, JAX). Validates the multi-GPU runner
path end-to-end on 2x and 4x T4 via ncclAllReduce / dist.all_reduce.

scripts/update_challenges.py: forward num_gpus from ChallengeBase to
the backend (was previously unused).

scripts/run_challenge.py:
- Auto-detect num_gpus from challenge.py and send gpuCount in the
  submission payload (with a --gpu-count override flag).
- Prefer language-tagged solution filenames (solution.triton.py,
  solution.cute.py, etc.) so multiple Python-based languages can
  coexist in one solution/ directory.
- Print all stdout + stderr websocket frames, and terminate on
  test-case-failed / compilation-failed / out-of-memory.

Companion PR on leetgpu-infra: AlphaGPU/leetgpu-infra#333.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Addresses a code review finding: the previous `parse_num_gpus` helper
matched any `num_gpus=N` occurrence in the source, which breaks on
comments, docstrings, or non-literal assignments. Mirror the loading
dance from scripts/update_challenges.py: importlib.spec the module,
instantiate Challenge(), read num_gpus as an attribute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
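The importlib-based loading described in that commit might look roughly like this. It is a sketch: the helper name load_num_gpus and the Challenge class name are assumptions inferred from the commit message, not verified against the script.

```python
import importlib.util


def load_num_gpus(challenge_path: str) -> int:
    """Import challenge.py as a module, instantiate Challenge, read num_gpus.

    Unlike a regex over the source text, this cannot be fooled by comments,
    docstrings, or non-literal assignments.
    """
    spec = importlib.util.spec_from_file_location("challenge", challenge_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module.Challenge(), "num_gpus", 1)
```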