Fine-tuning OWSM-CTC with Mixture-of-Experts for efficient speech recognition.
Clone the repository:

```bash
git clone <repo-url>
cd owsm-ctc-moe
```

From the root of the repository, run the setup script:

```bash
bash scripts/setup_env.sh
```

This will:
- Create a conda environment named `owsm-ctc-moe`
- Install PyTorch with CUDA support
- Install ESPnet and dependencies
- Install CUDA toolkit for compiling extensions
Activate the environment:

```bash
conda activate owsm-ctc-moe
```

Flash Attention provides a 2-4x training speedup. Skip this step if you are using V100 or older GPUs.
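The GPU cutoff above can be checked programmatically: FlashAttention 2 requires an Ampere-class GPU (compute capability sm_80) or newer, while a V100 reports sm_70. A minimal sketch (the helper name `flash_attn_supported` is illustrative, not part of this repo):

```python
def flash_attn_supported(major: int, minor: int) -> bool:
    """FlashAttention 2 needs compute capability sm_80 (Ampere) or newer;
    a V100 reports sm_70, so the flash-attn install should be skipped there."""
    return (major, minor) >= (8, 0)

# On a CUDA machine, the capability tuple comes from:
#   torch.cuda.get_device_capability()  # e.g. (7, 0) on V100, (8, 0) on A100
print(flash_attn_supported(8, 0))  # → True  (A100)
print(flash_attn_supported(7, 0))  # → False (V100)
```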
In your conda environment, run:

```bash
pip install --index-url https://download.pytorch.org/whl/cu121 \
    torch==2.4.0 \
    torchaudio==2.4.0
```

Then install FlashAttention from a prebuilt wheel:

```bash
pip install flash-attn==2.8.3 --no-build-isolation
```

Verify the setup:

```bash
python scripts/verify_setup.py
```

- Download LibriSpeech:

```bash
bash scripts/download_librispeech.sh
```
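After the download finishes, a quick way to sanity-check the corpus is to count the FLAC files on disk. This is a sketch only: the `data/LibriSpeech/train-clean-100` path is an assumption about where `scripts/download_librispeech.sh` places the data, so adjust it to the actual layout.

```python
from pathlib import Path

def count_flacs(root: str) -> int:
    """Count LibriSpeech FLAC files under a directory (0 if it is missing)."""
    p = Path(root)
    return sum(1 for _ in p.rglob("*.flac")) if p.exists() else 0

# Assumed location; adjust to wherever download_librispeech.sh puts the corpus.
print(count_flacs("data/LibriSpeech/train-clean-100"))
```

train-clean-100 contains roughly 28k utterances, so a count near zero indicates an incomplete download.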