Add multi-GPU vector addition challenge + script support #236
Open
kunal-mansukhani wants to merge 2 commits into main from
Conversation
New Pro-only challenge `challenges/easy/100_multi_gpu_vector_add` (`num_gpus=2`) with starter + solution for all 5 accelerated languages (PyTorch, CUDA, Triton, CuTe, JAX). Validates the multi-GPU runner path end-to-end on 2x and 4x T4 via `ncclAllReduce` / `dist.all_reduce`.

- `scripts/update_challenges.py`: forward `num_gpus` from `ChallengeBase` to the backend (was previously unused).
- `scripts/run_challenge.py`:
  - Auto-detect `num_gpus` from `challenge.py` and send `gpuCount` in the submission payload (with a `--gpu-count` override flag).
  - Prefer language-tagged solution filenames (`solution.triton.py`, `solution.cute.py`, etc.) so multiple Python-based languages can coexist in one `solution/` directory.
  - Print all stdout + stderr websocket frames, and terminate on `test-case-failed` / `compilation-failed` / `out-of-memory`.

Companion PR on leetgpu-infra: AlphaGPU/leetgpu-infra#333.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
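The slice-plus-all-reduce pattern the solutions share (`ncclAllReduce` / `dist.all_reduce`) can be sketched in plain Python, with ranks simulated in-process. No real NCCL or `torch.distributed` is involved here; all function names are illustrative, not the PR's actual code:

```python
def solve_on_rank(A, B, rank, world_size):
    """Each rank computes its contiguous slice of C = A + B, leaving zeros elsewhere."""
    n = len(A)
    chunk = (n + world_size - 1) // world_size  # ceil-divide the work across ranks
    lo, hi = rank * chunk, min((rank + 1) * chunk, n)
    C = [0.0] * n
    for i in range(lo, hi):
        C[i] = A[i] + B[i]
    return C

def all_reduce_sum(partials):
    """Elementwise sum across ranks, like ncclAllReduce / dist.all_reduce with SUM."""
    return [sum(vals) for vals in zip(*partials)]

# Inputs are fully replicated on every rank; after the all-reduce, every
# rank holds the complete output, which is what the runner validates per rank.
A = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [10.0, 20.0, 30.0, 40.0, 50.0]
world_size = 2
partials = [solve_on_rank(A, B, r, world_size) for r in range(world_size)]
C = all_reduce_sum(partials)
```

Because each rank writes zeros outside its slice, summing the per-rank partials reconstructs the full vector without overlap.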
Addresses a code review finding: the previous `parse_num_gpus` helper matched any `num_gpus=N` occurrence in the source, which breaks on comments, docstrings, or non-literal assignments. Instead, mirror the loading dance from scripts/update_challenges.py: load the module via an importlib spec, instantiate `Challenge()`, and read `num_gpus` as an attribute. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
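A minimal sketch of that loader, assuming the challenge module defines a `Challenge` class with a `num_gpus` attribute (the helper name and default are hypothetical):

```python
import importlib.util

def load_num_gpus(challenge_path, default=1):
    """Import challenge.py as a module and read num_gpus off a Challenge() instance."""
    spec = importlib.util.spec_from_file_location("challenge_module", challenge_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the top-level code of challenge.py
    return getattr(module.Challenge(), "num_gpus", default)
```

Unlike the regex approach, this only sees the value the class actually exposes, so a `num_gpus=4` inside a docstring or comment cannot mislead it.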
Summary
New Pro-only multi-GPU test challenge at `challenges/easy/100_multi_gpu_vector_add` (`num_gpus=2`), with starter + solution for all 5 accelerated languages (PyTorch, CUDA, Triton, CuTe, JAX). Used to validate the multi-GPU runner path end-to-end in the companion infra PR (AlphaGPU/leetgpu-infra#333). Plus two script updates that were needed to make multi-GPU submissions work via `run_challenge.py`.

Changes
New challenge: `100_multi_gpu_vector_add`

- `challenge.py`: `num_gpus=2`, `access_tier="pro"`. Reference implementation is single-device (`torch.add(A, B, out=C)`); the runner runs it independently on each rank and validates per-rank outputs.
- `challenge.html`: spec explaining the fully-replicated input/output model and the env vars exposed to `solve()` (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`, `LEETGPU_NCCL_ID_FILE`).
- Solutions: PyTorch (`dist.all_reduce`), CUDA (NCCL via `LEETGPU_NCCL_ID_FILE`), Triton (`@triton.jit` slice kernel + `dist.all_reduce`), CuTe (plain Python host + `dist.all_reduce`), JAX (trivial, validates `jax.distributed.initialize`).

`scripts/update_challenges.py`

- Forward `num_gpus` from `ChallengeBase` to the backend (field was declared but never propagated).

`scripts/run_challenge.py`

- Auto-detect `num_gpus` from `challenge.py` via regex and send `gpuCount` in the submission payload; new `--gpu-count` override flag.
- Prefer language-tagged solution filenames (`solution.triton.py`, `solution.cute.py`, `solution.jax.py`) so multiple Python-based languages can coexist in one `solution/` dir. Falls back to `solution.<ext>`.
- Print all stdout + stderr websocket frames (previously only `type=stderr` output).
- Terminate on `test-case-failed` / `compilation-failed` / `tampering-detected` / `out-of-memory` in addition to the previous terminal statuses.

Test plan
All validated end-to-end against the companion infra branch on real Modal hardware:
- PyTorch: `run` + full `submit` on 2× T4 and 4× T4
- CUDA: `run` + full `submit` on 2× T4
- Triton: `run` + full `submit` on 2× T4
- CuTe: `run` + full `submit` on 2× T4
- JAX: `run` + full `submit` on 2× T4

Notes
- The JAX solution only validates that `jax.distributed.initialize` succeeds and the output flows back correctly; a real distributed JAX pattern would use `jax.experimental.multihost_utils` / `pjit`.
- A client can still send `gpuCount=2` and the server will reject/coerce it.

🤖 Generated with Claude Code
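The language-tagged solution filename preference described in the Changes above could look roughly like this (a sketch, not the actual `run_challenge.py` code; the `LANG_EXTS` mapping and function name are assumptions):

```python
from pathlib import Path

# Hypothetical mapping: language -> plain extension used for the fallback filename.
LANG_EXTS = {"pytorch": "py", "triton": "py", "cute": "py", "jax": "py", "cuda": "cu"}

def find_solution(solution_dir, language):
    """Prefer solution.<language>.<ext>; fall back to solution.<ext>."""
    ext = LANG_EXTS[language]
    tagged = Path(solution_dir) / f"solution.{language}.{ext}"
    if tagged.exists():
        return tagged
    fallback = Path(solution_dir) / f"solution.{ext}"
    if fallback.exists():
        return fallback
    raise FileNotFoundError(f"no solution file for {language} in {solution_dir}")
```

With this preference, `solution.triton.py`, `solution.cute.py`, and `solution.jax.py` can sit next to a plain `solution.py` in one `solution/` dir without ambiguity about which file belongs to which language.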