〰️
A collective for compression innovation, benchmarks, and modern corpus engineering.
- Tokenization & Benchmarks — advancing LLM data preparation, efficiency, and comparability.
- Open tools — simple, auditable tokenizers and compression workflows.
- Corpus tech — scalable, transparent data processing for research and production.
Efficiency is our baseline. Clarity is our default. Openness is our method.
Curious?
Explore our code, open an issue, or suggest a benchmark.
Collaboration is welcome.
contact [sombrero.verde+gitr] at the protonic mail system. (quantum particle name) .me