ghe_transcribe

A tool to transcribe audio files with speaker diarization using Faster Whisper and Pyannote.

  • Fast transcription with optimized Whisper models
  • Speaker diarization to identify different speakers
  • Multiple output formats (TXT, SRT)
  • Jupyter interface for interactive use
  • CLI tool for use from any terminal
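
Faster Whisper returns timed text segments and Pyannote returns timed speaker turns, so the two timelines have to be matched. One common way to do this is to give each segment the speaker whose turns overlap it for the longest total time. The sketch below shows that approach. The `Segment` and `Turn` classes and `assign_speakers` are hypothetical names, and this is not necessarily how ghe_transcribe merges the two outputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """A transcribed span of speech; times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end, in seconds (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if they do not meet."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, Segment]]:
    """Label each segment with the speaker whose turns overlap it the longest in total."""
    labelled: List[Tuple[str, Segment]] = []
    for seg in segments:
        # Sum overlap per speaker, because one speaker can have several turns inside a segment.
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Segments that no turn overlaps get a placeholder label.
        speaker = max(totals, key=totals.__getitem__) if totals else "UNKNOWN"
        labelled.append((speaker, seg))
    return labelled


segments = [Segment(0.0, 4.0, "Hello."), Segment(4.0, 9.0, "Hi, how are you?")]
turns = [Turn(0.0, 4.5, "SPEAKER_00"), Turn(4.5, 9.0, "SPEAKER_01")]
for speaker, seg in assign_speakers(segments, turns):
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {speaker}: {seg.text}")
# prints:
# [0.0-4.0] SPEAKER_00: Hello.
# [4.0-9.0] SPEAKER_01: Hi, how are you?
```

Summing overlap per speaker, rather than picking the single longest turn, keeps a segment with the speaker who dominates it even when that speaker's turns are split by a short interjection.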

Interface Preview

The Jupyter-based interface provides an intuitive way to upload audio files, configure transcription settings, and download results in multiple formats.

[Screenshot: GHE Transcribe app interface]

Installation

System Dependencies

This tool requires FFmpeg for audio processing:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows
choco install ffmpeg
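
To confirm that FFmpeg is installed and on your PATH before transcribing, a small standard-library check can help. `ffmpeg_version` is a hypothetical helper and not part of ghe_transcribe. It only runs `ffmpeg -version` and reports the first line of the output.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    # The first line typically starts with "ffmpeg version ..."
    first_lines = result.stdout.splitlines()
    return first_lines[0] if first_lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found on PATH; install it with one of the commands above.")
```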

ghe_transcribe

git clone https://github.com/Global-Health-Engineering/ghe_transcribe.git
cd ghe_transcribe
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate
pip install -e .

Hugging Face Authentication

This tool uses gated Pyannote models hosted on Hugging Face, which require authentication. You need to:

  1. Create a Hugging Face account, so you can access Pyannote.
  2. Accept the user conditions of the Pyannote models.
  3. Create an access token for ghe_transcribe to use.
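
How ghe_transcribe picks up the token is not described here, so check the repository before relying on this. The sketch below makes one assumption: that the token is read from the `HF_TOKEN` environment variable, which the `huggingface_hub` library also checks by default. It fails early with a clear message if the variable is missing. `get_hf_token` is a hypothetical helper.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face access token from the environment, or raise a clear error.

    Assumption: the token is supplied via an environment variable. HF_TOKEN is the
    name huggingface_hub checks; whether ghe_transcribe reads it is not confirmed here.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Create an access token in your Hugging Face "
            f"settings, then run e.g.: export {env_var}=<your token>"
        )
    # Hugging Face user access tokens start with "hf_"; warn instead of failing.
    if not token.startswith("hf_"):
        print(f"Warning: the value of {env_var} does not look like a Hugging Face token.")
    return token
```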

Usage

Renkulab

See the detailed Renkulab documentation.

Jupyter Interface (Local)

Open app.ipynb and run the cell:

from ghe_transcribe.app import execute
execute()

Python API

from ghe_transcribe.core import transcribe
result = transcribe("media/test01.mp3")
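
This README does not describe what `transcribe` returns, so the sketch below simply collects whatever it returns for each file. `transcribe_folder` is a hypothetical helper, not part of ghe_transcribe, that processes every `.mp3` or `.m4a` file in a folder. The transcription function is passed in as an argument, so the loop can be tested without loading any models.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Union


def transcribe_folder(
    folder: Union[str, Path],
    transcribe_fn: Callable[[str], Any],
    extensions: Iterable[str] = (".mp3", ".m4a"),
) -> Dict[str, Any]:
    """Call transcribe_fn on each matching audio file in folder, in filename order.

    Returns {file name: whatever transcribe_fn returned}. The default extensions are
    only the ones used in this README's examples.
    """
    wanted = {ext.lower() for ext in extensions}
    results: Dict[str, Any] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            # Pass a plain string path, as in transcribe("media/test01.mp3").
            results[path.name] = transcribe_fn(str(path))
    return results


# Usage, assuming ghe_transcribe is installed in the active environment:
#   from ghe_transcribe.core import transcribe
#   results = transcribe_folder("media", transcribe)
```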

Command Line

# Simplest call
transcribe media/test01.mp3

# Multiple files
transcribe media/test01.mp3 media/test02.m4a --trim 5

# See all options
transcribe --help 
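
One way to drive the command-line tool from a Python script is `subprocess`. This sketch assumes the `transcribe` command is on the PATH of the active virtual environment and uses only the `--trim` option shown above. `build_transcribe_command` and `run_transcribe` are hypothetical helpers; refer to `transcribe --help` for what `--trim` controls.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_transcribe_command(files: Sequence[str], trim: Optional[float] = None) -> List[str]:
    """Assemble the argument list for the `transcribe` CLI from the options shown in this README."""
    cmd = ["transcribe", *files]
    if trim is not None:
        # What --trim controls is documented by `transcribe --help`.
        cmd += ["--trim", str(trim)]
    return cmd


def run_transcribe(files: Sequence[str], trim: Optional[float] = None) -> int:
    """Run the CLI and return its exit code; raise if the command is not on PATH."""
    if shutil.which("transcribe") is None:
        raise FileNotFoundError(
            "`transcribe` not found; activate the virtual environment where ghe_transcribe is installed."
        )
    # A list argument (no shell=True) keeps file names with spaces intact.
    return subprocess.run(build_transcribe_command(files, trim), check=False).returncode


# Usage:
#   exit_code = run_transcribe(["media/test01.mp3", "media/test02.m4a"], trim=5)
```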

Editors

  • SRT files: subtitle-editor.org, which runs locally in your browser
  • TXT files: note-taking apps, Word, MAXQDA, QualCoder, ...
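
If you correct a transcript in an SRT editor but later need plain text for a tool such as MAXQDA or QualCoder, the cue numbers and timing lines have to be removed. Below is a minimal standard-library sketch that assumes standard SRT layout: a cue number, a `start --> end` timing line, one or more text lines, and a blank line between cues. `srt_to_text` is hypothetical and not part of ghe_transcribe.

```python
import re
from typing import List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timings from SRT content, keeping one line of text per cue."""
    out: List[str] = []
    # Normalise Windows line endings, then split into cues on blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Skip a leading cue number, then the timing line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0].strip()):
            lines = lines[1:]
        # Join a cue's wrapped text lines into a single line.
        text = " ".join(line.strip() for line in lines if line.strip())
        if text:
            out.append(text)
    return "\n".join(out)
```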

Contributing

We welcome contributions! Please use Conventional Commits for your commit messages. See our contribution guidelines.

License

MIT License - see LICENSE.