Add challenge 93: Llama Transformer Block (Hard) #244

Open
claude[bot] wants to merge 1 commit into main from add-challenge-93-llama-transformer-block

Conversation


claude[bot] (Contributor) commented Apr 11, 2026

Summary

  • Adds challenge 93: Llama Transformer Block at Hard difficulty
  • Implements a complete Llama-style transformer decoder block as a self-contained inference kernel challenge

What the challenge covers

The solver must implement a full Llama-style transformer block with:

  • RMSNorm (no mean subtraction, no bias) — distinguishes Llama from GPT-2
  • Grouped Query Attention (GQA) with 8 Q heads and 2 KV heads (4x compression) — standard in Llama 2/3, Mistral, Gemma
  • Rotary Position Embeddings (RoPE) applied to Q and K — precomputed cos/sin tables passed as inputs
  • Causal masking — upper-triangular -inf mask
  • SwiGLU FFN (gate × up with SiLU activation, then down projection) — distinguishes Llama from GPT-2's GELU FFN
  • Two residual connections
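The pieces above can be sketched as a compact NumPy reference (a hedged illustration only — the parameter-dict keys here are hypothetical, and the actual challenge fixes its own packed weight buffer and kernel signatures):

```python
import numpy as np

def rms_norm(x, w, eps=1e-5):
    # RMSNorm: divide by root-mean-square; no mean subtraction, no bias.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * w

def apply_rope(x, cos, sin):
    # Rotate interleaved (even, odd) pairs of each head dimension.
    # x: (seq, heads, head_dim); cos/sin: (seq, head_dim // 2), precomputed.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    c, s = cos[:, None, :], sin[:, None, :]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * c - x2 * s
    out[..., 1::2] = x1 * s + x2 * c
    return out

def llama_block(x, p, n_q_heads=8, n_kv_heads=2, head_dim=64):
    seq, _ = x.shape
    group = n_q_heads // n_kv_heads              # GQA: Q heads per KV head

    # Attention sub-block with pre-norm and residual.
    h = rms_norm(x, p["attn_norm"])
    q = (h @ p["wq"]).reshape(seq, n_q_heads, head_dim)
    k = (h @ p["wk"]).reshape(seq, n_kv_heads, head_dim)
    v = (h @ p["wv"]).reshape(seq, n_kv_heads, head_dim)
    q, k = apply_rope(q, p["cos"], p["sin"]), apply_rope(k, p["cos"], p["sin"])
    k, v = np.repeat(k, group, axis=1), np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    scores += np.triu(np.full((seq, seq), -np.inf), k=1)  # causal mask
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    ctx = np.einsum("hqk,khd->qhd", attn, v).reshape(seq, -1)
    x = x + ctx @ p["wo"]                                 # residual 1

    # SwiGLU FFN sub-block with pre-norm and residual.
    h = rms_norm(x, p["ffn_norm"])
    gate = h @ p["w_gate"]
    silu = gate / (1.0 + np.exp(-gate))                   # SiLU(x) = x * sigmoid(x)
    x = x + (silu * (h @ p["w_up"])) @ p["w_down"]        # residual 2
    return x
```

A GPU submission would fuse and tile these steps rather than materializing the full (heads, seq, seq) score tensor as this reference does.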

Why this is interesting

This is the spiritual successor to challenge 74 (GPT-2 Transformer Block), updated to reflect the 2023–2024 state-of-the-art architecture. It forces solvers to implement all the building blocks of modern open-weight LLMs (Llama 2/3, Mistral, Gemma) in a single kernel. Key GPU programming concepts exercised:

  • Memory layout design (packed weight buffer, GQA head broadcasting)
  • Parallel reduction (online softmax for causal attention)
  • Fused computation (avoid materializing intermediate tensors)
  • Warp-level primitives (optional, for the attention kernel)
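The online-softmax reduction mentioned above can be illustrated in scalar Python (a sketch of the rescaling trick only, not the challenge's actual kernel):

```python
import math

def online_softmax_dot(scores, values):
    # One-pass softmax-weighted sum: keep a running max m, running
    # normalizer s, and accumulator acc, rescaling s and acc by
    # exp(m_old - m_new) whenever a larger score arrives. This is how
    # an attention kernel can scan keys without storing the score row.
    m, s, acc = -math.inf, 0.0, 0.0
    for x, v in zip(scores, values):
        m_new = max(m, x)
        scale = math.exp(m - m_new) if m > -math.inf else 0.0
        w = math.exp(x - m_new)
        s = s * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / s
```

On the GPU this loop becomes a parallel reduction, with the same (max, sum, accumulator) triple combined across threads or warps.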

Architecture constants

Parameter              Value
d_model                512
n_q_heads              8
n_kv_heads             2
head_dim               64
ffn_hidden             1,408
seq_len (perf. test)   2,048
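A quick arithmetic check on these constants (variable names are illustrative, and the weight count assumes the bias-free Llama recipe):

```python
d_model, n_q_heads, n_kv_heads, head_dim, ffn_hidden = 512, 8, 2, 64, 1408

assert d_model == n_q_heads * head_dim         # Q/output projections are square
group = n_q_heads // n_kv_heads                # each KV head serves 4 Q heads
assert group == 4

# Per-block weight elements (no biases in the Llama recipe):
attn = d_model * (n_q_heads * head_dim)        # wq: 512 x 512
attn += 2 * d_model * (n_kv_heads * head_dim)  # wk, wv: 512 x 128 each
attn += (n_q_heads * head_dim) * d_model       # wo: 512 x 512
ffn = 3 * d_model * ffn_hidden                 # gate, up, down projections
norms = 2 * d_model                            # two RMSNorm scale vectors
total = attn + ffn + norms
print(total)  # 2819072 weights per block
```

The 4x KV reduction is visible directly in the buffer sizes: wk and wv are 512 x 128 where wq is 512 x 512.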

Test plan

  • All 6 starter files compile/run without errors
  • pre-commit run --all-files passes
  • Validated with run_challenge.py --action run (example test) ✓
  • Validated with run_challenge.py --action submit (all functional + performance tests) ✓
  • Checklist in CLAUDE.md verified (html starts with <p>, has <h2> sections, correct example, constraints bullet matches performance test, SVG with dark theme, no solution file committed)

🤖 Generated with Claude Code

Adds a complete Llama-style transformer decoder block challenge that
requires implementing RMSNorm, Grouped Query Attention with RoPE,
causal masking, and a SwiGLU FFN — mirroring modern LLM inference
kernels (Llama 2/3, Mistral, Gemma).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
