
OKCompressor

〰️

OKC

A collective for compression innovation, benchmarks, and modern corpus engineering.


Projects

  • Tokenization & Benchmarks — pushing the edge on LLM prep, efficiency, and comparability.
  • Open tools — simple, auditable tokenizers and compression workflows.
  • Corpus tech — scalable, transparent data processing for research and production.

Efficiency is our baseline. Clarity is our default. Openness is our method.


Curious?
Explore our code, open an issue, or suggest a benchmark.
Collaboration welcomed.

contact [sombrero.verde+gitr] at the protonic mail system. (quantum particle name) .me

Pinned

  1. core Public

    Python

  2. dumb_pre Public

    Python 1

  3. redumb Public

    Rust port of OKC dumb_pre

    Rust 1

  4. tiktoken_benchmark Public

    Python

  5. bench-pre-v2 Public

    Python

Repositories

Showing 8 of 8 repositories
