A tool to transcribe audio files with speaker diarization using Faster Whisper and Pyannote.
- Fast transcription with optimized Whisper models
- Speaker diarization to identify different speakers
- Multiple output formats (TXT, SRT)
- Jupyter interface for interactive use
- CLI tool for global compatibility
Interface Preview
The Jupyter-based interface provides an intuitive way to upload audio files, configure transcription settings, and download results in multiple formats.
This tool requires FFmpeg for audio processing:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows
choco install ffmpeg
Clone the repository and install the package in a virtual environment:
git clone https://github.com/Global-Health-Engineering/ghe_transcribe.git
cd ghe_transcribe
python -m venv venv
source venv/bin/activate
pip install -e .
Note: See the detailed installation guide.
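Since the tool depends on FFmpeg, it can help to confirm that the `ffmpeg` executable is visible on `PATH` before transcribing. The following is a minimal sketch using only the Python standard library; the helper name `ffmpeg_version` is hypothetical and not part of ghe_transcribe.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of ghe_transcribe.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version if version is not None else "FFmpeg not found on PATH")
```

If the helper returns `None`, revisit the FFmpeg installation step for your platform.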
This tool uses gated models from Hugging Face that require authentication. You need to:
- Join Hugging Face to access Pyannote
- Accept the User Conditions to use Pyannote
- Create an Access Token to use ghe_transcribe
See the detailed documentation for Renkulab.
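Once the access token exists, one common way to keep it out of notebooks and scripts is an environment variable. The sketch below reads `HF_TOKEN`, the variable name that the `huggingface_hub` library recognizes; whether ghe_transcribe itself reads this variable is an assumption, so check the detailed documentation. The helper `read_hf_token` is hypothetical.

```python
import os
from typing import Mapping, Optional


def read_hf_token(
    env: Mapping[str, str] = os.environ, var: str = "HF_TOKEN"
) -> Optional[str]:
    """Return the token stored in `var`, or None if it is unset or blank.

    Hypothetical helper; how ghe_transcribe consumes the token is not
    specified here and may differ.
    """
    token = env.get(var, "").strip()
    return token or None


if __name__ == "__main__":
    if read_hf_token() is None:
        print("HF_TOKEN is not set; create an access token on Hugging Face first.")
    else:
        print("Found a Hugging Face token in the environment.")
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.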
Open app.ipynb and run the cell:
from ghe_transcribe.app import execute
execute()
To call the transcriber directly from Python:
from ghe_transcribe.core import transcribe
result = transcribe("media/test01.mp3")
From the command line:
# Simplest call
transcribe media/test01.mp3
# Multiple files
transcribe media/test01.mp3 media/test02.m4a --trim 5
# See all options
transcribe --help
Suggested tools for working with the output files:
- For SRT files: subtitle-editor.org/, which runs locally in your browser
- For TXT files: note-taking apps, Word, MAXQDA, QualCoder, ...
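For further processing in Python rather than in an editor, an SRT file can be read into timed segments. The sketch below parses the standard SRT layout (a cue number, a `start --> end` line, then one or more text lines). The names `Cue` and `parse_srt` are hypothetical, and how ghe_transcribe marks speakers inside the cue text is not assumed here; the text is returned unchanged.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:05,000 to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp: {stamp!r}")
    hours, minutes, secs, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues: List[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        cues.append(
            Cue(int(lines[0].strip()), _seconds(start), _seconds(end), "\n".join(lines[2:]))
        )
    return cues
```

A typical use would be `parse_srt(open("output.srt", encoding="utf-8").read())`, where `output.srt` is a placeholder file name.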
We welcome contributions! Please use conventional commits. See our contribution guidelines.
MIT License - see LICENSE.
