A GUI for transcribing audio and video files using WhisperX, with CUDA acceleration for NVIDIA RTX 4070.
Updated to support the latest version of WhisperX as of 4/1/2026.
Transcripts are saved as both .srt and .txt files in a transcripts/ folder next to the script.
- Windows 11 (AMD64)
- NVIDIA RTX 4070 (or compatible GPU)
- Git Bash
- Python 3.12.0
- CUDA 12.8.0
- cuDNN 9.x (I tested with 9.20 and 9.15)
Download and run the CUDA installer:
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/network_installers/cuda_12.8.1_windows_network.exe
After installing, set the CUDA_PATH environment variable if it wasn't set automatically:
- Value: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
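To confirm the variable is visible to Python, a small check like the one below can help. This is a minimal sketch: `check_cuda_path` is a hypothetical helper, not part of this project, and it assumes the toolkit folder name ends in `v12.8` as in the path above.

```python
import os
from pathlib import PureWindowsPath


def check_cuda_path(env=None, expected_version="v12.8"):
    """Return (ok, message) describing the CUDA_PATH setting.

    Hypothetical helper: it only inspects the variable's value and does
    not verify that the toolkit is actually installed at that location.
    """
    if env is None:
        env = os.environ
    value = env.get("CUDA_PATH")
    if not value:
        return (False, "CUDA_PATH is not set")
    # PureWindowsPath parses backslash paths on any operating system.
    folder = PureWindowsPath(value).name
    if folder.lower() != expected_version.lower():
        return (False, f"CUDA_PATH points at {folder}, expected {expected_version}")
    return (True, f"CUDA_PATH is set to {value}")


if __name__ == "__main__":
    ok, message = check_cuda_path()
    print(("OK: " if ok else "Problem: ") + message)
```

Open a new terminal after changing environment variables, since a shell that is already running keeps the old values.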
Download and run the cuDNN installer:
wget https://developer.download.nvidia.com/compute/cudnn/9.20.0/local_installers/cudnn_9.20.0_windows_x86_64.exe
Add the cuDNN bin folder to your system PATH:
C:\Program Files\NVIDIA\CUDNN\v9.20\bin\12.9
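One way to check that the folder really is on PATH is sketched below. `is_on_path` is a hypothetical helper, and the sketch assumes a Windows-style PATH with `;` separators, which is what a native Windows Python sees.

```python
import os
from pathlib import PureWindowsPath

# The cuDNN bin folder named in this README.
CUDNN_BIN = r"C:\Program Files\NVIDIA\CUDNN\v9.20\bin\12.9"


def is_on_path(directory, path_value=None, sep=";"):
    """Return True if `directory` is one of the entries in PATH.

    PureWindowsPath comparison ignores letter case and trailing
    separators, which matches how Windows treats these paths.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = PureWindowsPath(directory)
    entries = (entry.strip().strip('"') for entry in path_value.split(sep))
    return any(PureWindowsPath(entry) == target for entry in entries if entry)


if __name__ == "__main__":
    print("cuDNN bin on PATH:", is_on_path(CUDNN_BIN))
```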
Download from python.org or use the Python Launcher:
py install 3.12.0
Open Git Bash in the project directory and run:
source install_4070_complete.bash
This script will:
- Create and activate a Python 3.12.0 virtual environment at .venv
- Install PyTorch and torchaudio 2.8.0+cu128
- Verify CUDA 12.8 and cuDNN 9.x are detected correctly — if not, it will download the installers for you and exit
- Install WhisperX with all its dependencies.
Note: If CUDA/cuDNN are not installed when you run the script, it will download the installers to the project directory and exit. Install them, then re-run the script.
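The same kind of check can be run by hand from inside the activated environment. The sketch below is not taken from install_4070_complete.bash; `environment_report` is a hypothetical helper that only reads standard PyTorch attributes, and it returns early if PyTorch cannot be imported.

```python
import importlib
import sys


def environment_report():
    """Collect the Python, PyTorch, CUDA and cuDNN versions (hypothetical helper)."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing, or one of its native libraries failed to load.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    # None on CPU-only builds; a string such as "12.8" on CUDA builds.
    report["cuda_version"] = torch.version.cuda
    # None when cuDNN is unavailable; otherwise a single integer.
    report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False after a successful install, re-check the driver, CUDA_PATH, and the cuDNN PATH entry before re-running the script.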
With the virtual environment activated:
source .venv/Scripts/activate
python whisperx-gui.py
- Click Add Files to add audio or video files (.mp4, .mkv, .mov, .wmv, .avi, .flv, .mp3, .wav, .aac, .flac, .ogg)
- Select a Model: large-v2 is the most accurate, tiny is the fastest
- Select the Language of the audio
- Select a Compute Type: float16 is recommended for CUDA
- Click Transcribe All
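For readers who want to script the same choices, the sketch below shows roughly how they map onto WhisperX's Python API. It is an illustration under assumptions, not the GUI's actual code: `is_supported` and `transcribe_file` are hypothetical names, the language code `"en"` and `batch_size=16` are example values, and the WhisperX calls follow its documented usage, which may differ between releases.

```python
from pathlib import Path

# Extensions the GUI accepts, as listed in this README.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mkv", ".mov", ".wmv", ".avi", ".flv",
    ".mp3", ".wav", ".aac", ".flac", ".ogg",
}


def is_supported(path):
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_file(path, model_name="large-v2", language="en",
                    compute_type="float16"):
    """Transcribe one file on the GPU (assumed usage of the WhisperX API)."""
    import whisperx  # imported lazily so is_supported works without it

    model = whisperx.load_model(
        model_name, "cuda", compute_type=compute_type, language=language
    )
    audio = whisperx.load_audio(str(path))
    # The result is expected to contain a "segments" list with start/end/text.
    return model.transcribe(audio, batch_size=16)
```

A batch run would filter the selected paths with `is_supported` first and then call `transcribe_file` on each remaining file.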
Output files are saved to transcripts/<filename>_<timestamp>/ and the folder opens automatically when transcription completes.
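The output layout can be sketched as follows. The README does not state the exact timestamp format or whether the folder name keeps the file extension, so the `%Y%m%d_%H%M%S` pattern and the use of the file stem are assumptions, and all three function names are hypothetical. The SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text) is the standard SubRip form.

```python
from datetime import datetime
from pathlib import Path


def output_dir(script_dir, media_path, now=None):
    """Build transcripts/<filename>_<timestamp> next to the script."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return Path(script_dir) / "transcripts" / f"{Path(media_path).stem}_{stamp}"


def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

The plain .txt transcript would then simply be the segment texts joined with newlines.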