Add challenge 93: Llama Transformer Block (Hard) #244

Open
claude[bot] wants to merge 1 commit into main from add-challenge-93-llama-transformer-block

Conversation


claude[bot] (Contributor) commented Apr 11, 2026

Summary

  • Adds challenge 93: Llama Transformer Block at Hard difficulty
  • Implements a complete Llama-style transformer decoder block as a self-contained inference kernel challenge

What the challenge covers

The solver must implement a full Llama-style transformer block with:

  • RMSNorm (no mean subtraction, no bias) — distinguishes Llama from GPT-2
  • Grouped Query Attention (GQA) with 8 Q heads and 2 KV heads (4x compression) — standard in Llama 2/3, Mistral, Gemma
  • Rotary Position Embeddings (RoPE) applied to Q and K — precomputed cos/sin tables passed as inputs
  • Causal masking — upper-triangular -inf mask
  • SwiGLU FFN (gate × up with SiLU activation, then down projection) — distinguishes Llama from GPT-2's GELU FFN
  • Two residual connections
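The pieces above can be sketched as a compact NumPy reference (a hedged illustration only — the parameter-dict keys here are hypothetical, and the actual challenge fixes its own packed weight buffer and kernel signatures):

```python
import numpy as np

def rms_norm(x, w, eps=1e-5):
    # RMSNorm: divide by root-mean-square; no mean subtraction, no bias.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * w

def apply_rope(x, cos, sin):
    # Rotate interleaved (even, odd) pairs of each head dimension.
    # x: (seq, heads, head_dim); cos/sin: (seq, head_dim // 2), precomputed.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    c, s = cos[:, None, :], sin[:, None, :]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * c - x2 * s
    out[..., 1::2] = x1 * s + x2 * c
    return out

def llama_block(x, p, n_q_heads=8, n_kv_heads=2, head_dim=64):
    seq, _ = x.shape
    group = n_q_heads // n_kv_heads              # GQA: Q heads per KV head

    # Attention sub-block with pre-norm and residual.
    h = rms_norm(x, p["attn_norm"])
    q = (h @ p["wq"]).reshape(seq, n_q_heads, head_dim)
    k = (h @ p["wk"]).reshape(seq, n_kv_heads, head_dim)
    v = (h @ p["wv"]).reshape(seq, n_kv_heads, head_dim)
    q, k = apply_rope(q, p["cos"], p["sin"]), apply_rope(k, p["cos"], p["sin"])
    k, v = np.repeat(k, group, axis=1), np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    scores += np.triu(np.full((seq, seq), -np.inf), k=1)  # causal mask
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    ctx = np.einsum("hqk,khd->qhd", attn, v).reshape(seq, -1)
    x = x + ctx @ p["wo"]                                 # residual 1

    # SwiGLU FFN sub-block with pre-norm and residual.
    h = rms_norm(x, p["ffn_norm"])
    gate = h @ p["w_gate"]
    silu = gate / (1.0 + np.exp(-gate))                   # SiLU(x) = x * sigmoid(x)
    x = x + (silu * (h @ p["w_up"])) @ p["w_down"]        # residual 2
    return x
```

A GPU submission would fuse and tile these steps rather than materializing the full (heads, seq, seq) score tensor as this reference does.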

Why this is interesting

This is the spiritual successor to challenge 74 (GPT-2 Transformer Block), updated to reflect the 2023–2024 state-of-the-art architecture. It forces solvers to implement all the building blocks of modern open-weight LLMs (Llama 2/3, Mistral, Gemma) in a single kernel. Key GPU programming concepts exercised:

  • Memory layout design (packed weight buffer, GQA head broadcasting)
  • Parallel reduction (online softmax for causal attention)
  • Fused computation (avoid materializing intermediate tensors)
  • Warp-level primitives (optional, for the attention kernel)
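The online-softmax reduction mentioned above can be illustrated in scalar Python (a sketch of the rescaling trick only, not the challenge's actual kernel):

```python
import math

def online_softmax_dot(scores, values):
    # One-pass softmax-weighted sum: keep a running max m, running
    # normalizer s, and accumulator acc, rescaling s and acc by
    # exp(m_old - m_new) whenever a larger score arrives. This is how
    # an attention kernel can scan keys without storing the score row.
    m, s, acc = -math.inf, 0.0, 0.0
    for x, v in zip(scores, values):
        m_new = max(m, x)
        scale = math.exp(m - m_new) if m > -math.inf else 0.0
        w = math.exp(x - m_new)
        s = s * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / s
```

On the GPU this loop becomes a parallel reduction, with the same (max, sum, accumulator) triple combined across threads or warps.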

Architecture constants

Parameter              Value
d_model                512
n_q_heads              8
n_kv_heads             2
head_dim               64
ffn_hidden             1,408
seq_len (perf. test)   2,048
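A quick arithmetic check on these constants (variable names are illustrative, and the weight count assumes the bias-free Llama recipe):

```python
d_model, n_q_heads, n_kv_heads, head_dim, ffn_hidden = 512, 8, 2, 64, 1408

assert d_model == n_q_heads * head_dim         # Q/output projections are square
group = n_q_heads // n_kv_heads                # each KV head serves 4 Q heads
assert group == 4

# Per-block weight elements (no biases in the Llama recipe):
attn = d_model * (n_q_heads * head_dim)        # wq: 512 x 512
attn += 2 * d_model * (n_kv_heads * head_dim)  # wk, wv: 512 x 128 each
attn += (n_q_heads * head_dim) * d_model       # wo: 512 x 512
ffn = 3 * d_model * ffn_hidden                 # gate, up, down projections
norms = 2 * d_model                            # two RMSNorm scale vectors
total = attn + ffn + norms
print(total)  # 2819072 weights per block
```

The 4x KV reduction is visible directly in the buffer sizes: wk and wv are 512 x 128 where wq is 512 x 512.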

Test plan

  • All 6 starter files compile/run without errors
  • pre-commit run --all-files passes
  • Validated with run_challenge.py --action run (example test) ✓
  • Validated with run_challenge.py --action submit (all functional + performance tests) ✓
  • Checklist in CLAUDE.md verified (html starts with <p>, has <h2> sections, correct example, constraints bullet matches performance test, SVG with dark theme, no solution file committed)

🤖 Generated with Claude Code

Adds a complete Llama-style transformer decoder block challenge that
requires implementing RMSNorm, Grouped Query Attention with RoPE,
causal masking, and a SwiGLU FFN — mirroring modern LLM inference
kernels (Llama 2/3, Mistral, Gemma).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
