
Add challenge 92: Decaying Causal Attention (Medium) #243

Open
claude[bot] wants to merge 1 commit into main from add-challenge-92-decaying-causal-attention

Conversation


claude[bot] (Contributor) commented Apr 9, 2026

Summary

  • Adds challenge 92: Decaying Causal Attention (Medium difficulty)
  • Implements the parallel form of the Retention mechanism from RetNet: causal unnormalized attention in which each position attends to all past positions with geometrically decaying weights, output[n] = Σ_{m≤n} γ^(n-m) · (Q[n]·K[m]/√d) · V[m]
  • Unlike softmax attention, there is no normalization step, and the decay is a fixed geometric factor rather than a learned weight
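The formula above can be checked against a naive NumPy reference. This is a sketch for clarity only, not the challenge's official solution; the function name is hypothetical, and a real kernel should not materialize the full decay matrix the way this reference does:

```python
import numpy as np

def decaying_causal_attention(Q, K, V, gamma):
    """Naive reference sketch. Q, K, V: [seq_len, d_model]; gamma in (0, 1].
    Builds the full [seq_len, seq_len] decay matrix for clarity, which the
    actual kernel is expected to avoid."""
    seq_len, d_model = Q.shape
    scale = 1.0 / np.sqrt(d_model)
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]             # n - m
    decay = np.tril(gamma ** np.maximum(diff, 0))  # gamma^(n-m) for m <= n, else 0
    scores = (Q @ K.T) * scale * decay             # note: no softmax normalization
    return scores @ V
```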

What solvers learn

  • Triangular memory access patterns — causal computation with position-dependent weight decay
  • On-the-fly decay computation — computing γ^(n-m) efficiently without materializing the full decay matrix
  • Tiled accumulation strategies — processing the causal sum in tiles to exploit shared memory
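The tiled strategy in the last bullet can be sketched in NumPy for a single query row. The helper name and tile size are hypothetical; a real GPU kernel would map the tile loop onto shared-memory staging of K/V blocks:

```python
import numpy as np

def decaying_row_tiled(q_n, n, K, V, gamma, tile=32):
    """Sketch: output for one query position n, accumulated over key tiles.
    Decay factors gamma^(n-m) are computed per tile on the fly, never as a
    full [seq_len, seq_len] matrix."""
    d = q_n.shape[0]
    scale = 1.0 / np.sqrt(d)
    acc = np.zeros(d)
    for t0 in range(0, n + 1, tile):      # causal: only tiles with m <= n
        t1 = min(t0 + tile, n + 1)
        m = np.arange(t0, t1)
        w = gamma ** (n - m)              # per-tile decay weights
        s = (K[t0:t1] @ q_n) * scale      # Q[n]·K[m] / sqrt(d) for this tile
        acc += (w * s) @ V[t0:t1]
    return acc
```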

Challenge details

  • Number: 92
  • Difficulty: Medium
  • Inputs: Q, K, V each [seq_len, d_model] float32, scalar gamma ∈ (0, 1]
  • Performance test: seq_len=4,096, d_model=64 (typical LLM head dimension)
  • 10 functional test cases covering edge cases, power-of-2, non-power-of-2, zero inputs, negative values, gamma=1.0 (no decay), and a realistic 100-token sequence
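As a sanity check on the gamma=1.0 test case: with no decay the output reduces to plain unnormalized causal attention (lower-triangular mask, no softmax). A small self-contained illustration, with sizes chosen arbitrarily:

```python
import numpy as np

# With gamma = 1.0 every decay factor is 1, so decaying causal attention
# collapses to a lower-triangular masked Q·K^T/sqrt(d) times V.
rng = np.random.default_rng(0)
L, d = 8, 4
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))

mask = np.tril(np.ones((L, L)))
plain = ((Q @ K.T) / np.sqrt(d) * mask) @ V   # unnormalized causal attention

out = np.zeros((L, d))                        # decayed form, gamma = 1.0
for n in range(L):
    for m in range(n + 1):
        out[n] += 1.0 ** (n - m) * (Q[n] @ K[m] / np.sqrt(d)) * V[m]

assert np.allclose(out, plain)
```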

Relation to existing challenges

Distinct from all existing and pending attention challenges:

  • Challenge 6 (Softmax Attention): uses softmax normalization, no decay
  • Challenge 53 (Causal Attention): causal binary mask, softmax normalized
  • Challenge 82 (Linear Recurrence): 1D scalar state, not matrix attention
  • PR #89 (Flash Attention): standard scaled dot-product attention, prefill phase

Test plan

  • pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
  • run_challenge.py validation passes on an NVIDIA Tesla T4

🤖 Generated with Claude Code

Implements the core computation of the Retention mechanism (RetNet):
causal unnormalized attention with geometric decay weights. Each position
n attends to all past positions m <= n with weight gamma^(n-m), requiring
solvers to reason about triangular memory access patterns, on-the-fly
decay factor computation, and tiled accumulation strategies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>