# TLDR

**Transcribe an audio file**

```faster-whisper [audio.mp3]```

**Transcribe with a specific model**

```faster-whisper [audio.mp3] --model [large-v3]```

**Transcribe with a language hint**

```faster-whisper [audio.mp3] --language [en]```

**Output as SRT subtitles**

```faster-whisper [audio.mp3] --output_format [srt]```

**Translate to English**

```faster-whisper [audio.mp3] --task [translate]```

**Save output to a directory**

```faster-whisper [audio.mp3] --output_dir [/path/to/output]```

**Transcribe with word-level timestamps**

```faster-whisper [audio.mp3] --word_timestamps [true]```

# SYNOPSIS

**faster-whisper** _audio_ [**--model** _size_] [**--language** _lang_] [**--task** _task_] [_options_]

# PARAMETERS

**--model** _SIZE_
> Model size: tiny, base, small, medium, large-v1, large-v2, large-v3 (default: small).

**--language** _LANG_
> Language code (en, de, fr, etc.); auto-detected if omitted.

**--task** _TASK_
> Task: transcribe or translate.

**--output_format** _FORMAT_
> Output format: txt, vtt, srt, tsv, json, all.

**--output_dir** _DIR_
> Output directory for results.

**--word_timestamps** _BOOL_
> Include word-level timestamps.

**--device** _DEVICE_
> Device: cpu, cuda, auto (default: auto).

**--compute_type** _TYPE_
> Compute type: int8, float16, float32 (default: int8 on CPU).

**--beam_size** _N_
> Beam search size (default: 5).

**--vad_filter** _BOOL_
> Enable the voice activity detection (VAD) filter to skip silence.

**--threads** _N_
> Number of CPU threads to use.

# DESCRIPTION

**faster-whisper** is a reimplementation of OpenAI's Whisper using **CTranslate2**, a fast inference engine for Transformer models. It transcribes up to 4x faster than the original Whisper while using less memory.

The tool supports all Whisper model sizes; larger models are more accurate but slower. The compute type controls precision: int8 is fastest and most memory-efficient, float16 balances speed and accuracy on GPU, and float32 offers the highest precision.

Voice activity detection (VAD) filtering skips silent sections, improving both speed and accuracy. Language detection is automatic, but specifying the language avoids the detection overhead.

Install via pip (`pip install faster-whisper`). CTranslate2 handles model conversion automatically. GPU acceleration requires the CUDA toolkit.

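The CLI is a thin wrapper over the faster-whisper Python library. A minimal sketch of that API, mirroring the flags above (the `transcribe_options` helper and the `audio.mp3` path are illustrative, not part of the library):

```python
# Sketch of the faster-whisper Python API that the CLI wraps.
# Assumes `pip install faster-whisper`; "audio.mp3" is a placeholder path.

def transcribe_options(language=None, task="transcribe", beam_size=5,
                       vad_filter=False, word_timestamps=False):
    """Collect keyword options mirroring the CLI flags above."""
    opts = {"task": task, "beam_size": beam_size,
            "vad_filter": vad_filter, "word_timestamps": word_timestamps}
    if language is not None:  # omit to let the model auto-detect
        opts["language"] = language
    return opts

try:
    from faster_whisper import WhisperModel

    # int8 on CPU trades a little precision for speed and memory savings.
    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        "audio.mp3", **transcribe_options(language="en", vad_filter=True))
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
except Exception:
    # Library not installed or no audio file in this environment;
    # the call shape above is what matters.
    pass
```

Note that `transcribe` returns a generator, so transcription only proceeds as segments are consumed.
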
# CAVEATS

Large models require significant memory. GPU use requires the CUDA toolkit. The first run downloads and converts the models. Accuracy varies with audio quality. Speaker diarization is not available from the CLI (only via the Python API).

# HISTORY

**faster-whisper** was created by **Guillaume Klein** (SYSTRAN) in **2023**, using CTranslate2 to optimize Whisper inference. Its speed and memory advantages made it a preferred Whisper implementation for production use, and it is widely adopted in transcription workflows.

# SEE ALSO

[whisper](/man/whisper)(1), [deepspeech](/man/deepspeech)(1), [ffmpeg](/man/ffmpeg)(1)