Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
Notes on LLMs, covering model inference, transformer model structure, and code analysis of LLM frameworks.
A lightweight LLaMA-like LLM inference framework built on Triton kernels.
Tiled Flash Linear Attention library for fast and efficient mLSTM kernels.
Production inference for encoder models (ColBERT, GLiNER, ColPali, embeddings, etc.) as vLLM plugins for online and in-process deployment.
A "standard library" of Triton kernels.
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
High-performance late-interaction retrieval engine for on-prem AI. ColBERT/ColPali multi-vector search with Rust fused MaxSim, Triton GPU kernels, ROQ quantization, LEMUR routing, WAL-backed CRUD, and a FastAPI server — single machine, CPU or GPU.
Universal AI Runtime — Execute any model on any hardware
Official Code for the paper ELMO : Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces (in ICML 2025)
KernelHeim – development ground of custom Triton and CUDA kernel functions designed to optimize and accelerate machine learning workloads on NVIDIA GPUs. Inspired by the mythical stronghold of the gods, KernelHeim is a forge where high-performance kernels are crafted to unlock the full potential of the hardware.
🚀 Learn Triton GPU programming by implementing FlashAttention from scratch.
A container of various PyTorch neural network modules written in Triton.
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
FlashAttention2 Analysis in Triton
Collection of Triton operators for transformer models.
Repository for learning Triton GPU programming
High-performance Triton kernel library for LLM training with 12 fused operators (AttnRes, RMSNorm, RoPE, CrossEntropy, GRPO, JSD, FusedLinear, etc.) — up to 24x faster than PyTorch with 78% memory savings, outperforming Liger-Kernel on RTX 5090
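Several of the libraries above ship fused RMSNorm kernels, the normalization used in LLaMA-style models. As a plain reference for what such a kernel computes, here is a minimal NumPy sketch (illustrative only, not code from any of the listed repositories):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference RMSNorm: scale each row by the inverse of its
    root-mean-square; unlike LayerNorm, no mean is subtracted."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Example: a batch of 2 rows with hidden size 4
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, -0.5, 0.5, -0.5]])
w = np.ones(4)  # learned per-channel scale, here identity
y = rms_norm(x, w)
```

A fused Triton version computes the same result in a single kernel launch, avoiding the intermediate reads and writes of the unfused PyTorch graph; that fusion is where the speedups and memory savings claimed by these libraries come from.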