CLI utility to transcribe audio or video to text using ASR (Automatic Speech Recognition).
A single self-contained script that auto-detects GPU availability and uses the appropriate backend.
| Backend | Hardware | Models |
|---|---|---|
| OpenAI Whisper | CPU/GPU | tiny, base, small, medium, large |
| Distil-Whisper | CPU/GPU | distil-large-v3, distil-medium.en |
| Cohere Transcribe 2B | CPU/GPU | cohere-transcribe (14 languages) |
| NVIDIA NeMo ASR | GPU | Canary-1B-v2, Canary-Qwen-2.5B, Parakeet-0.6B |
Input can be a local file (any format ffmpeg supports — mp3, wav, flac, mp4, mkv, webm, ...), a URL, or a YouTube video ID.
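A minimal sketch of how one positional argument could be sorted into those three input kinds. The helper names `classify_input` and `to_url` are hypothetical and do not come from typeout. The sketch also assumes that YouTube IDs are exactly 11 characters from `[A-Za-z0-9_-]` and that an existing local path should always take precedence:

```python
import os
import re
from urllib.parse import urlparse

# Assumption: YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
YOUTUBE_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def classify_input(arg: str) -> str:
    """Return 'file', 'url' or 'youtube_id' (hypothetical helper)."""
    # An existing path always wins, so a local file that happens to be
    # named like a YouTube ID is still treated as a file.
    if os.path.exists(arg):
        return "file"
    parsed = urlparse(arg)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if YOUTUBE_ID.fullmatch(arg):
        return "youtube_id"
    # Anything else is treated as a (possibly missing) local path,
    # so the later "file not found" error message stays meaningful.
    return "file"


def to_url(arg: str) -> str:
    """Expand a bare YouTube ID into a watch URL; pass other inputs through."""
    if classify_input(arg) == "youtube_id":
        return f"https://www.youtube.com/watch?v={arg}"
    return arg
```

Checking the filesystem first is the conservative choice. A nonexistent 11-character name without an extension (say `interview12`) would still be read as a YouTube ID.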
- Single command — auto-detects GPU and uses the right backend
- Accepts audio or video in any format ffmpeg can read
- Transcribe YouTube videos by URL or video ID
- Multiple models: Whisper, Distil-Whisper, Cohere Transcribe, NeMo ASR
- Multilingual transcription (14 languages with Cohere, 25 with NeMo Canary)
- Speech translation (NeMo Canary models only)
- Caches downloaded audio and transcripts for instant repeat lookups
- Transcript on stdout, diagnostics on stderr (pipe-friendly)
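Accepting "any format ffmpeg can read" usually means converting every input into one common audio format before it reaches a model. Below is a sketch of such a conversion command. It assumes 16 kHz mono 16-bit PCM WAV, which Whisper and NeMo models are commonly fed; typeout's actual settings are not documented here. `ffmpeg_normalize_cmd` is a hypothetical name. In keeping with the stdout/stderr split above, the example prints its diagnostic on stderr:

```python
import shlex
import sys


def ffmpeg_normalize_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv turning any audio/video input into mono PCM WAV.

    Assumption: 16 kHz mono 16-bit PCM; typeout may use other settings.
    """
    return [
        "ffmpeg",
        "-nostdin",                # never wait for keyboard input
        "-hide_banner",
        "-loglevel", "error",      # keep ffmpeg's own chatter minimal
        "-y",                      # overwrite dst if it already exists
        "-i", src,
        "-vn",                     # drop any video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dst,
    ]


if __name__ == "__main__":
    # Diagnostics go to stderr so stdout stays reserved for the transcript.
    print(shlex.join(ffmpeg_normalize_cmd("lecture.mp4", "lecture.wav")), file=sys.stderr)
```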
The only prerequisite is uv (plus ffmpeg, which is used for audio extraction). The script is self-contained and manages all Python dependencies automatically on first run. A shortcut to the script can be found at https://tinyurl.com/typeout.
```shell
# Install uv
cargo install --locked uv # or: pip install uv

# Download typeout
curl -O https://raw.githubusercontent.com/miku/typeout/refs/heads/main/typeout

# Make it executable and put it somewhere on your PATH
chmod +x typeout
mv typeout ~/.local/bin/
```

That's it. No virtualenv wrangling necessary.
```shell
# Transcribe a local file
typeout recording.mp3
typeout lecture.mp4

# Write to a file
typeout podcast.flac -o transcript.txt

# Use a different Whisper model (CPU)
typeout recording.mp3 --model small

# Use Cohere Transcribe (CPU/GPU, requires Hugging Face login)
typeout recording.mp3 --model cohere-transcribe --lang en
typeout lecture.mp4 --model cohere-transcribe --lang ja

# GPU models (auto-detected on systems with an NVIDIA GPU)
typeout recording.mp3 --model canary-qwen-2.5b
typeout lecture.mp4 --model parakeet-0.6b

# Use Whisper on GPU (fp16 acceleration)
typeout recording.mp3 --model large

# Multilingual: set the source language
typeout interview.wav --lang de

# Translation: German audio to English text (NeMo Canary only)
typeout interview.wav --lang de --target-lang en

# From a URL or YouTube ID
typeout "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
typeout dQw4w9WgXcQ

# Check external tools
typeout --check

# Clear the cache
typeout --clear-cache

# List models
typeout --list-models
```

| Model | Size | Languages | Notes |
|---|---|---|---|
| base (CPU default) | ~140MB | multilingual | Whisper, good balance |
| tiny | ~40MB | multilingual | Whisper, fastest |
| small | ~460MB | multilingual | Whisper |
| medium | ~1.5GB | multilingual | Whisper |
| large | ~2.9GB | multilingual | Whisper, highest accuracy |
| distil-large-v3 | ~750MB | multilingual | Distil-Whisper, 6x faster than large |
| distil-medium.en | ~400MB | English only | Distil-Whisper, fast |
| cohere-transcribe | ~4.1GB | 14 languages | Cohere, high accuracy, requires HF login |
| canary-1b-v2 (GPU default) | ~6.4GB | 25 languages | NVIDIA only, NeMo, multilingual, translation |
| canary-qwen-2.5b | ~5.1GB | multilingual | NVIDIA only, NeMo, highest quality, SLM |
| parakeet-0.6b | ~2.5GB | English only | NVIDIA only, NeMo, fast and lightweight |
Cohere Transcribe setup (gated model):

```shell
# 1. Accept the terms at: https://huggingface.co/CohereLabs/cohere-transcribe-03-2026
# 2. Log in to Hugging Face
huggingface-cli login
```

Note: the `cohere-transcribe` model is loaded with `trust_remote_code=True`, which executes custom Python code shipped in the model repository on your machine. Only use it if you trust that repository. The Whisper and NeMo models do not use it.
The `typeout` script is an amalgamation: it embeds both the CPU and the GPU Python scripts. On first run, it:

- Detects whether an NVIDIA GPU is available, via [nvidia-smi](https://docs.nvidia.com/deploy/nvidia-smi/)
- Extracts the appropriate Python script to `~/.cache/typeout/` (or your custom cache home)
- Runs it with `uv run`, which installs dependencies automatically
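Below is a minimal sketch of those three steps. Everything in it is an assumption, not typeout's code: the embedded script bodies, the file names `typeout_cpu.py` and `typeout_gpu.py`, and the dependency lists are hypothetical placeholders. The sketch further assumes that the scripts declare their dependencies with PEP 723 inline metadata, a header that `uv run` can read:

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical embedded payloads. The PEP 723 "# /// script" header lets
# `uv run` resolve dependencies without a virtualenv; the dependency
# lists are placeholders, not typeout's real ones.
CPU_SCRIPT = '''# /// script
# requires-python = ">=3.10"
# dependencies = ["openai-whisper"]
# ///
print("cpu backend")
'''

GPU_SCRIPT = '''# /// script
# requires-python = ">=3.10"
# dependencies = ["nemo_toolkit[asr]"]
# ///
print("gpu backend")
'''


def has_nvidia_gpu() -> bool:
    """True if nvidia-smi is on PATH and can list at least the driver's GPUs."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` lists GPUs; a non-zero exit means no usable driver/GPU.
        result = subprocess.run([exe, "-L"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def extract_script(cache_dir: Path, gpu: bool) -> Path:
    """Write the matching embedded script into cache_dir and return its path."""
    name, body = ("typeout_gpu.py", GPU_SCRIPT) if gpu else ("typeout_cpu.py", CPU_SCRIPT)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    # Only rewrite when the content changed, so repeat runs stay cheap.
    if not path.exists() or path.read_text() != body:
        path.write_text(body)
    return path


def uv_command(script: Path, args: list[str]) -> list[str]:
    """argv that lets uv install the script's dependencies, then run it."""
    return ["uv", "run", str(script), *args]
```

A caller would chain them as `uv_command(extract_script(cache, has_nvidia_gpu()), sys.argv[1:])` and hand the result to `subprocess.run`.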
Downloaded audio (for URLs) and transcripts are cached in ~/.cache/typeout/ (respects $XDG_CACHE_HOME).
- URLs: keyed by URL — same URL hits cache instantly
- Local files: keyed by path + modification time + size — cache invalidates on edit
- Transcripts: keyed by source + model + language — different models/languages get separate entries
Use --no-cache to bypass, --clear-cache to remove all cached data.
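The keying rules above can be sketched with content-addressed hashes. This is an illustration under assumptions: the use of SHA-256, the NUL separator, the `"auto"` placeholder for an unset language, and the helper names are all hypothetical choices, not typeout's documented format:

```python
import hashlib
import os
from typing import Optional


def _digest(*parts: str) -> str:
    # NUL separators avoid ambiguity, e.g. ("ab", "c") vs ("a", "bc").
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def source_key(source: str) -> str:
    """Key a local file by path + mtime + size, anything else by the URL string."""
    if os.path.isfile(source):
        st = os.stat(source)
        # Editing the file changes mtime and/or size, which invalidates the entry.
        return _digest("file", os.path.abspath(source), str(st.st_mtime_ns), str(st.st_size))
    return _digest("url", source)


def transcript_key(source: str, model: str, lang: Optional[str]) -> str:
    """One cache entry per (source, model, language) combination."""
    return _digest("transcript", source_key(source), model, lang or "auto")
```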
- `uv`: runs the script and manages Python dependencies
- `ffmpeg`: audio extraction and normalization (check with `--check`)
- `nvidia-smi`: GPU detection (auto-detected)
- `huggingface-cli login`: required for Cohere Transcribe (gated model)
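A `--check`-style lookup of these tools can be sketched with `shutil.which`. Two parts of this sketch are assumptions: the split into required and optional tools, and the report format. Neither reproduces typeout's actual `--check` output:

```python
import shutil
import sys
from typing import Dict, Iterable, Optional

# Assumption: uv and ffmpeg are hard requirements; the GPU and
# Hugging Face tools are only needed for specific models.
REQUIRED = ("uv", "ffmpeg")
OPTIONAL = ("nvidia-smi", "huggingface-cli")


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its absolute path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def check() -> bool:
    """Report tool status on stderr; True if every required tool was found."""
    ok = True
    for name, path in find_tools(REQUIRED + OPTIONAL).items():
        required = name in REQUIRED
        if path is None and required:
            ok = False
        status = path or ("MISSING" if required else "not found (optional)")
        print(f"{name:16} {status}", file=sys.stderr)
    return ok
```

A CLI wrapper would typically exit non-zero when `check()` returns `False`, so the check can gate shell scripts.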
Models are cached in ~/.local/share/typeout/ (respects $XDG_DATA_HOME).
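Both directory rules follow the XDG Base Directory convention, which a short sketch makes concrete. The helper names are hypothetical. The fallback behaviour follows the XDG specification: an unset, empty, or relative variable is ignored in favour of the default under `$HOME`:

```python
import os
from pathlib import Path


def xdg_dir(env_var: str, fallback: str, app: str = "typeout") -> Path:
    """Resolve an XDG base directory for app.

    Per the XDG spec, the variable is ignored when unset, empty, or relative.
    """
    value = os.environ.get(env_var, "")
    base = Path(value) if value and os.path.isabs(value) else Path.home() / fallback
    return base / app


def cache_dir() -> Path:
    # Downloaded audio, transcripts and extracted scripts.
    return xdg_dir("XDG_CACHE_HOME", ".cache")


def data_dir() -> Path:
    # Downloaded model weights.
    return xdg_dir("XDG_DATA_HOME", ".local/share")
```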
```shell
$ typeout --lang de \
    https://swr-pd.ard-mcdn.de/swr/swrkultur/hoerspiel/ard-hoerspiel-speicher/2303264.mp3
```

Transcribing a 7h+ audiobook [UBIK] takes about 14 minutes on a 70W RTX 4000 SFF with canary-1b-v2.

```shell
$ typeout https://www.youtube.com/watch?v=P1qMKFMrpro # UBIK audiobook
```
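As a rough worked figure from the timing above, the real-time factor (RTF) is processing time divided by audio duration. This calculation treats the audio as exactly 7 h, which slightly understates "7h+":

$$
\mathrm{RTF} = \frac{t_{\mathrm{proc}}}{t_{\mathrm{audio}}} \approx \frac{14\,\mathrm{min}}{7 \times 60\,\mathrm{min}} = \frac{14}{420} \approx 0.033
\quad\Longrightarrow\quad \frac{1}{\mathrm{RTF}} \approx 30
$$

So canary-1b-v2 on that card runs roughly 30x faster than real time, or slightly better, since the audio is a bit longer than 7 h.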