naturalcandy/OWSM-CTC-MoE


OWSM-CTC MoE

Fine-tuning OWSM-CTC with Mixture-of-Experts for efficient speech recognition.

Requirements

Setup

1. Clone Repository

git clone <repo-url>
cd owsm-ctc-moe

2. Create Environment

From the root of the repository:

bash scripts/setup_env.sh

This will:

  • Create a conda environment named owsm-ctc-moe
  • Install PyTorch with CUDA support
  • Install ESPnet and dependencies
  • Install CUDA toolkit for compiling extensions

3. Activate Environment

conda activate owsm-ctc-moe

4. Install Flash Attention (Optional)

Flash Attention provides 2-4x training speedup. Skip this step if using V100 or older GPUs.

In your conda environment, run:

pip install --index-url https://download.pytorch.org/whl/cu121 \
  torch==2.4.0 \
  torchaudio==2.4.0

Then run:

# Installs FlashAttention from prebuilt wheel
pip install flash-attn==2.8.3 --no-build-isolation

5. Verify Installation

python scripts/verify_setup.py
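If the verify script reports problems, a quick manual check can narrow down which package is missing. A minimal sketch (the package list is assumed from the steps above; `scripts/verify_setup.py` may check more than this):

```python
import importlib.util

# Packages the setup steps above should have installed (assumed names).
# flash_attn is optional; see step 4.
REQUIRED = ["torch", "torchaudio", "espnet", "flash_attn"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

`find_spec` only checks that a package is discoverable; it does not import it, so it will not catch broken CUDA extensions.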

Additional Steps

  • Download LibriSpeech: bash scripts/download_librispeech.sh

About

CMU 11751 Final Project: Fine-tuning OWSM-CTC with Mixture-of-Experts for efficient speech recognition.
