A comprehensive visualization and analysis tool for Triton kernel compilation and launch — helping developers analyze, debug, and understand Triton kernel compilation processes.
- 🚀 Launch Difference Analysis - Detect and visualize kernel launch parameter variations
- 📊 IR Code View - Side-by-side IR viewing with synchronized highlighting and line mapping
- 🔄 File Diff View - Compare kernels across different trace files side-by-side
- 📝 Multi-format IR Support - View TTGIR, TTIR, LLIR, PTX, and AMDGCN
- 🎯 Interactive Code Views - Click-to-highlight corresponding lines across IR stages
- 🔄 Standalone Script Generation - Extract any kernel into a self-contained Python script
- 💾 Tensor Data Reconstruction - Preserve actual tensor data or use statistical approximation
- 🎯 Custom Templates - Flexible reproducer templates for different workflows
- 🐛 Bug Isolation - Share reproducible test cases for debugging and collaboration
- 📝 Compilation & Launch Tracing - Capture detailed events with source mapping
- 🔍 Stack Trace Integration - Full Python stack traces for debugging
- 📈 Metadata Extraction - Comprehensive kernel statistics
- 🌐 Browser-based Interface - No installation required, works in your browser
- 🔒 Privacy-first - All processing happens locally, no data uploaded
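To give a feel for the "statistical approximation" idea in the feature list above, here is a rough sketch of rebuilding a stand-in tensor from recorded statistics. The `stats` schema and `approximate_tensor` helper are hypothetical illustrations, not TritonParse's actual API or trace format:

```python
import random

def approximate_tensor(stats, shape, seed=0):
    """Rebuild a stand-in tensor from recorded statistics.

    NOTE: the stats keys (min/max/mean/std) and this helper are
    hypothetical, sketching the idea rather than TritonParse's API.
    """
    rng = random.Random(seed)
    n = 1
    for dim in shape:
        n *= dim
    # Draw from a normal distribution with the recorded mean/std,
    # then clamp each sample into the recorded [min, max] range.
    return [
        min(max(rng.gauss(stats["mean"], stats["std"]), stats["min"]), stats["max"])
        for _ in range(n)
    ]

values = approximate_tensor({"min": -1.0, "max": 1.0, "mean": 0.0, "std": 0.5}, (2, 3))
print(len(values))  # 6 values, all clamped into [-1.0, 1.0]
```

A real reproducer would reshape the flat list to `shape` and match the original dtype; this sketch only shows the sampling-and-clamping step.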
Four options to install:

```bash
# Install the nightly version
pip install -U --pre tritonparse

# Install the stable version
pip install tritonparse

# Install from source
git clone https://github.com/meta-pytorch/tritonparse.git
cd tritonparse
pip install -e .

# Install the latest version directly from GitHub
pip install git+https://github.com/meta-pytorch/tritonparse.git
```

Prerequisites: Python ≥ 3.10, Triton ≥ 3.4.0, GPU required (NVIDIA/AMD)
TritonParse relies on new features in Triton. If you're using nightly PyTorch, Triton is already included. Otherwise, install the latest Triton:

```bash
pip install triton
```

```python
import tritonparse.structured_logging
import tritonparse.parse.utils

# Initialize logging with full tracing options
tritonparse.structured_logging.init(
    "./logs/",
    enable_trace_launch=True,  # Capture kernel launch events (enables torch.compile tracing automatically)
    enable_more_tensor_information=True,  # Optional: collect tensor statistics (min/max/mean/std)
)

# Your Triton/PyTorch code here
# ... your kernels ...

# Parse and generate trace files
tritonparse.parse.utils.unified_parse("./logs/", out="./parsed_output")
```

💡 Note: `enable_trace_launch=True` automatically enables tracing for both native Triton kernels (`@triton.jit`) and `torch.compile` / TorchInductor kernels.
📝 Example output (click to expand)
```
================================================================================
📁 TRITONPARSE PARSING RESULTS
================================================================================
📂 Parsed files directory: /scratch/findhao/tritonparse/tests/parsed_output
📊 Total files generated: 2
📄 Generated files:
  1. 📝 dedicated_log_triton_trace_findhao__mapped.ndjson.gz (7.2KB)
  2. 📝 log_file_list.json (181B)
================================================================================
✅ Parsing completed successfully!
================================================================================
```

Visit https://meta-pytorch.org/tritonparse/ and open your local trace files (`.ndjson.gz` format).
🔒 Privacy Note: Your trace files are processed entirely in your browser - nothing is uploaded to any server!
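Because traces are gzipped NDJSON (one JSON object per line), you can also inspect them locally with nothing but the Python standard library. A self-contained sketch; the event fields below are illustrative stand-ins, not the exact trace schema:

```python
import gzip
import json
import os
import tempfile

# Build a tiny synthetic trace so the sketch is self-contained
# (real traces come from unified_parse; these field names are illustrative).
events = [
    {"event_type": "compilation", "name": "add_kernel"},
    {"event_type": "launch", "name": "add_kernel", "grid": [32]},
]
path = os.path.join(tempfile.mkdtemp(), "trace.ndjson.gz")
with gzip.open(path, "wt") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

# NDJSON means one JSON object per line, so large traces can be
# streamed line by line instead of loaded whole.
with gzip.open(path, "rt") as f:
    parsed = [json.loads(line) for line in f]

print([e["event_type"] for e in parsed])  # ['compilation', 'launch']
```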
Extract any kernel into a standalone, executable Python script for debugging or testing:
```bash
# Generate a reproducer for the first launch event
# (--line is 0-based: line 0 is the compilation event, line 1 is the first launch event)
tritonparseoss reproduce ./parsed_output/trace.ndjson.gz --line 1 --out-dir repro_output

# Run the generated reproducer
cd repro_output/<kernel_name>/
python repro_*.py
```

Python API:
```python
from tritonparse.reproducer.orchestrator import reproduce

result = reproduce(
    input_path="./parsed_output/trace.ndjson.gz",
    line_index=0,  # 0-based index (first event is 0)
    out_dir="repro_output",
)
```

🎯 Common Reproducer Use Cases (click to expand)
- 🐛 Bug Isolation: Extract a failing kernel into a minimal standalone script
- ⚡ Performance Testing: Benchmark specific kernels without running the full application
- 🤝 Team Collaboration: Share reproducible test cases with colleagues or in bug reports
- 📊 Regression Testing: Compare kernel behavior and performance across different versions
- 🔍 Deep Debugging: Modify and experiment with kernel parameters in isolation
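Since `--line` (and `line_index`) is a 0-based NDJSON line index, a small helper can locate the event you want before invoking the reproducer. A sketch using only the standard library; the `"event_type"` key is an assumed field name, so adapt it to your trace's actual schema:

```python
import gzip
import json
import os
import tempfile

def find_event_line(trace_path, event_type="launch"):
    """Return the 0-based line index of the first matching event, or None.

    NOTE: the "event_type" field name is an assumption about the trace
    schema, not a documented TritonParse guarantee.
    """
    opener = gzip.open if trace_path.endswith(".gz") else open
    with opener(trace_path, "rt") as f:
        for index, line in enumerate(f):
            if json.loads(line).get("event_type") == event_type:
                return index
    return None

# Demo on a synthetic two-line trace: compilation first, then launch.
demo = os.path.join(tempfile.mkdtemp(), "trace.ndjson.gz")
with gzip.open(demo, "wt") as f:
    f.write(json.dumps({"event_type": "compilation"}) + "\n")
    f.write(json.dumps({"event_type": "launch"}) + "\n")
print(find_event_line(demo))  # 1
```

The returned index can then be passed as `--line` on the CLI or `line_index=` in the Python API.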
| 📖 Guide | Description |
|---|---|
| 🏠 Wiki Home | Complete documentation and quick navigation |
| 📦 Installation | Setup guide for all scenarios |
| 📋 Usage Guide | Complete workflow, reproducer generation, and examples |
| 🌐 Web Interface | Master the visualization interface |
| 🔧 Developer Guide | Contributing and architecture overview |
| 📝 Code Formatting | Formatting standards and tools |
| ❓ FAQ | Quick answers and troubleshooting |
| ⚙️ Environment Variables | Complete environment variable reference |
| 📖 Python API Reference | Full API documentation |
| 🔄 Reproducer Guide | Comprehensive kernel reproducer guide |
TritonParse visualizes the complete Triton compilation pipeline:
Python Source → TTIR → TTGIR → LLIR → PTX/AMDGCN
Each stage can be inspected and compared to understand optimization transformations.
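Conceptually, the click-to-highlight behavior rests on a line mapping: each source line knows which lines it lowered to in each IR stage. A toy illustration of that lookup; the data and structure here are invented for explanation and are not the tool's internal format:

```python
# Hypothetical mapping: Python source line -> lines in each IR stage.
line_map = {
    12: {"ttir": [4, 5], "ttgir": [7], "ptx": [42, 43, 44]},
    13: {"ttir": [6], "ttgir": [8, 9], "ptx": [45]},
}

def highlight(source_line, stage):
    """Return the IR lines to highlight when a source line is clicked."""
    return line_map.get(source_line, {}).get(stage, [])

print(highlight(12, "ptx"))  # [42, 43, 44]
print(highlight(99, "ttir"))  # [] (no mapping recorded)
```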
We welcome contributions! Please see our Developer Guide for:
- Development setup and prerequisites
- Code formatting standards (Formatting Guide)
- Pull request and code review process
- Testing guidelines
- Architecture overview
- 🐛 Report Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📚 Documentation: TritonParse Wiki
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
✨ Ready to get started? Visit our Installation Guide or try the online tool directly!