
Add challenge 92: Decaying Causal Attention (Medium) #243

Open
claude[bot] wants to merge 1 commit into main from add-challenge-92-decaying-causal-attention

Conversation


claude[bot] (Contributor) commented Apr 9, 2026

Summary

  • Adds challenge 92: Decaying Causal Attention (Medium difficulty)
  • Implements the parallel form of the Retention mechanism from RetNet: causal unnormalized attention in which each position attends to all past positions with geometrically decaying weights, output[n] = Σ_{m≤n} γ^(n-m) · (Q[n]·K[m]/√d) · V[m]
  • Unlike softmax attention, there is no normalization step, and the decay is a fixed geometric factor rather than a learned weight
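The formula above can be checked against a naive NumPy reference. This is a sketch for clarity only, not the challenge's official solution; the function name is hypothetical, and a real kernel should not materialize the full decay matrix the way this reference does:

```python
import numpy as np

def decaying_causal_attention(Q, K, V, gamma):
    """Naive reference sketch. Q, K, V: [seq_len, d_model]; gamma in (0, 1].
    Builds the full [seq_len, seq_len] decay matrix for clarity, which the
    actual kernel is expected to avoid."""
    seq_len, d_model = Q.shape
    scale = 1.0 / np.sqrt(d_model)
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]             # n - m
    decay = np.tril(gamma ** np.maximum(diff, 0))  # gamma^(n-m) for m <= n, else 0
    scores = (Q @ K.T) * scale * decay             # note: no softmax normalization
    return scores @ V
```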

What solvers learn

  • Triangular memory access patterns — causal computation with position-dependent weight decay
  • On-the-fly decay computation — computing γ^(n-m) efficiently without materializing the full decay matrix
  • Tiled accumulation strategies — processing the causal sum in tiles to exploit shared memory
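The tiled strategy in the last bullet can be sketched in NumPy for a single query row. The helper name and tile size are hypothetical; a real GPU kernel would map the tile loop onto shared-memory staging of K/V blocks:

```python
import numpy as np

def decaying_row_tiled(q_n, n, K, V, gamma, tile=32):
    """Sketch: output for one query position n, accumulated over key tiles.
    Decay factors gamma^(n-m) are computed per tile on the fly, never as a
    full [seq_len, seq_len] matrix."""
    d = q_n.shape[0]
    scale = 1.0 / np.sqrt(d)
    acc = np.zeros(d)
    for t0 in range(0, n + 1, tile):      # causal: only tiles with m <= n
        t1 = min(t0 + tile, n + 1)
        m = np.arange(t0, t1)
        w = gamma ** (n - m)              # per-tile decay weights
        s = (K[t0:t1] @ q_n) * scale      # Q[n]·K[m] / sqrt(d) for this tile
        acc += (w * s) @ V[t0:t1]
    return acc
```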

Challenge details

  • Number: 92
  • Difficulty: Medium
  • Inputs: Q, K, V each [seq_len, d_model] float32, scalar gamma ∈ (0, 1]
  • Performance test: seq_len=4,096, d_model=64 (typical LLM head dimension)
  • 10 functional test cases covering edge cases, power-of-2, non-power-of-2, zero inputs, negative values, gamma=1.0 (no decay), and a realistic 100-token sequence
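As a sanity check on the gamma=1.0 test case: with no decay the output reduces to plain unnormalized causal attention (lower-triangular mask, no softmax). A small self-contained illustration, with sizes chosen arbitrarily:

```python
import numpy as np

# With gamma = 1.0 every decay factor is 1, so decaying causal attention
# collapses to a lower-triangular masked Q·K^T/sqrt(d) times V.
rng = np.random.default_rng(0)
L, d = 8, 4
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))

mask = np.tril(np.ones((L, L)))
plain = ((Q @ K.T) / np.sqrt(d) * mask) @ V   # unnormalized causal attention

out = np.zeros((L, d))                        # decayed form, gamma = 1.0
for n in range(L):
    for m in range(n + 1):
        out[n] += 1.0 ** (n - m) * (Q[n] @ K[m] / np.sqrt(d)) * V[m]

assert np.allclose(out, plain)
```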

Relation to existing challenges

Distinct from all existing and pending attention challenges:

  • Challenge 6 (Softmax Attention): uses softmax normalization, no decay
  • Challenge 53 (Causal Attention): causal binary mask, softmax normalized
  • Challenge 82 (Linear Recurrence): 1D scalar state, not matrix attention
  • PR #89 (Flash Attention): standard scaled dot-product attention, prefill phase

Test plan

  • pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
  • run_challenge.py validation passes on an NVIDIA Tesla T4

🤖 Generated with Claude Code

Implements the core computation of the Retention mechanism (RetNet):
causal unnormalized attention with geometric decay weights. Each position
n attends to all past positions m <= n with weight gamma^(n-m), requiring
solvers to reason about triangular memory access patterns, on-the-fly
decay factor computation, and tiled accumulation strategies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>