romular21/jetson

jetson-ctranslate2:cuda

GPU-ready CTranslate2 base image for Jetson Orin NX / AGX Orin / Xavier (ARM64).

This image is intended as a base for production-grade, GPU-accelerated Faster-Whisper pipelines on NVIDIA Jetson (Orin/Xavier, JetPack 5.1.x).

Pre-built CTranslate2 v4.5.0 (CUDA/cuDNN) for ARM64, Python 3.8, tested on Orin NX 16GB (JetPack 5.1.1, CUDA 11.4, cuDNN 8.6).

Set up to run Faster-Whisper on CUDA via CTranslate2 (CT2) v4.5.0 without having to rebuild CT2 from source.

Based on dusty-nv/l4t-pytorch:2.2-r35.4.1 (the Jetson PyTorch base image from the jetson-containers project).


Versions

  • JetPack: 5.1.1 – 5.1.2 (L4T r35.3.1 and r35.4.1)
  • CUDA: 11.4
  • cuDNN: 8.6.x
  • CTranslate2: 4.5.0 (built from source with CUDA/cuDNN)
  • Python: 3.8
  • PyTorch: 2.2 (from dusty-nv/l4t-pytorch)
  • faster-whisper: not bundled in this base image; install it inside the container (see Usage)

Usage (base image)

Requirements:

  • Jetson ARM64 device (Orin NX / AGX Orin / Xavier)
  • NVIDIA Container Runtime installed and enabled

Extra Python dependencies typically needed by faster-whisper (CT2-only base):

  • requests==2.31.0
  • urllib3==1.26.18
  • charset-normalizer==3.3.2
  • av
  • tqdm
  • tokenizers==0.13.3
  • huggingface-hub==0.19.4

Pull and run the image:

docker pull ghcr.io/romular21/jetson-ctranslate2:cuda
docker run -it --rm \
    --runtime nvidia \
    -e LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 \
    ghcr.io/romular21/jetson-ctranslate2:cuda

Note:

  • If NVIDIA Container Runtime is the default on your Jetson, no GPU flags are needed. Just run the image.
  • If it is not default, use --runtime nvidia (most compatible on Jetson) or --gpus all on newer setups.
  • To check the default runtime, run docker info | grep -i 'Default Runtime' (it should show nvidia).
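
The runtime check in the note above can also be scripted. The following is a minimal sketch, assuming the plain-text output of docker info contains a "Default Runtime: <name>" line as described; the helper names parse_default_runtime, recommended_gpu_flags and read_docker_info are hypothetical and not part of this repository.

```python
import subprocess


def parse_default_runtime(docker_info_text):
    """Return the value of the 'Default Runtime' line, or None if it is absent."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "default runtime":
            return value.strip()
    return None


def recommended_gpu_flags(docker_info_text):
    """Suggest extra `docker run` flags based on the default runtime."""
    if parse_default_runtime(docker_info_text) == "nvidia":
        return []  # nvidia is already the default: no GPU flags needed
    return ["--runtime", "nvidia"]  # the flag this README calls most compatible on Jetson


def read_docker_info():
    """Run `docker info` on the host and return its text output."""
    result = subprocess.run(
        ["docker", "info"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the Jetson host, recommended_gpu_flags(read_docker_info()) returns an empty list when nvidia is already the default runtime.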

Optional: create a short local alias

docker tag ghcr.io/romular21/jetson-ctranslate2:cuda jetson-ctranslate2:cuda
# Then run using the short name (no extra layers created):
docker run -it --rm \
    --runtime nvidia \
    -e LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 \
    jetson-ctranslate2:cuda

Install Faster-Whisper inside the container (typical extras in one line):

pip3 install --no-deps --index-url https://pypi.org/simple \
  requests==2.31.0 urllib3==1.26.18 charset-normalizer==3.3.2 \
  av tqdm tokenizers==0.13.3 huggingface-hub==0.19.4 faster-whisper
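
Because --no-deps skips dependency resolution, pip will not report a missing or mismatched package. The sketch below is one possible way to confirm that the pinned versions from the command above are actually installed; it relies only on the standard-library importlib.metadata (available in Python 3.8), and the names PINS and find_mismatches are hypothetical.

```python
from importlib import metadata

# Pins copied from the install command; av, tqdm and faster-whisper are unpinned there.
PINS = {
    "requests": "2.31.0",
    "urllib3": "1.26.18",
    "charset-normalizer": "3.3.2",
    "tokenizers": "0.13.3",
    "huggingface-hub": "0.19.4",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Inside the container, find_mismatches(PINS) returning an empty dict means every pinned package is present at the expected version.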

Quick Check

python3 -c "import ctranslate2; print(ctranslate2.__version__)"
# Should print: 4.5.0

python3 -c "from faster_whisper import WhisperModel; m = WhisperModel('tiny', device='cuda', compute_type='float16'); print('faster-whisper CUDA OK')"
# Should print: faster-whisper CUDA OK
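
For a slightly more detailed check than the one-liners above, the following sketch gathers a few facts into a dict. It uses ctranslate2.get_cuda_device_count() and ctranslate2.get_supported_compute_types(), and also looks at LD_PRELOAD since the run command sets it; the function name gpu_report and the report keys are hypothetical.

```python
import os


def gpu_report():
    """Collect basic facts about the CTranslate2 install and its CUDA support."""
    report = {
        "ld_preload_has_libgomp": "libgomp" in os.environ.get("LD_PRELOAD", ""),
        "ctranslate2_version": None,
        "cuda_devices": 0,
        "cuda_compute_types": [],
        "error": None,
    }
    try:
        import ctranslate2  # an undefined-symbol failure also surfaces as ImportError
    except ImportError as exc:
        report["error"] = str(exc)
        return report

    report["ctranslate2_version"] = ctranslate2.__version__
    try:
        report["cuda_devices"] = ctranslate2.get_cuda_device_count()
        if report["cuda_devices"] > 0:
            report["cuda_compute_types"] = sorted(
                ctranslate2.get_supported_compute_types("cuda")
            )
    except Exception as exc:  # diagnostic helper: record the failure, do not crash
        report["error"] = str(exc)
    return report
```

In a working container one would expect cuda_devices to be at least 1 and float16 to appear among the compute types, though this sketch does not enforce either.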

Example: GPU inference in Python

from faster_whisper import WhisperModel

model = WhisperModel("tiny", device="cuda", compute_type="float16")
segments, info = model.transcribe("your_audio.wav")
for seg in segments:
    print(seg.text)

With device="cuda" and compute_type="float16", inference runs on the Jetson GPU in half precision; this is the same configuration used in the quick check above.
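
As a small extension of the example, the sketch below prints timestamps and the detected language. It assumes the segment attributes start, end and text and the info attributes language and language_probability that faster-whisper returns from transcribe(); the helper names format_timestamp, format_segment and transcribe_with_timestamps are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segment(start, end, text):
    """Format one segment as '[start -> end] text'."""
    return f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"


def transcribe_with_timestamps(model, path):
    """Transcribe `path` with an already-loaded model and return printable lines."""
    segments, info = model.transcribe(path)
    lines = [f"language: {info.language} (p={info.language_probability:.2f})"]
    # `segments` is a generator, so decoding happens while it is iterated here.
    lines.extend(format_segment(s.start, s.end, s.text) for s in segments)
    return lines
```

With the model from the example above, print("\n".join(transcribe_with_timestamps(model, "your_audio.wav"))) would list one line per segment.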


Troubleshooting

  • If you see: ImportError: undefined symbol: omp_get_thread_num

    • Make sure LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 is set (already set in this Docker image).
  • Do not add ctranslate2 to requirements.txt!

    • The custom CUDA build is already pre-installed; reinstalling it with pip can replace it with a build that lacks CUDA support.
  • Always instantiate the WhisperModel once per process (do not recreate for every inference).
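
One way to follow the "once per process" advice is to cache the model behind a loader function. This is a minimal sketch using functools.lru_cache, not the author's implementation; get_model and transcribe_text are hypothetical names, and the defaults mirror the example above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(model_size="tiny", device="cuda", compute_type="float16"):
    """Load the model on the first call; later calls with the same arguments reuse it."""
    # Imported lazily so that importing this module does not require faster-whisper.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device=device, compute_type=compute_type)


def transcribe_text(path):
    """Transcribe one file with the shared model and return the joined text."""
    model = get_model()  # no reload after the first call
    segments, _info = model.transcribe(path)
    return " ".join(seg.text.strip() for seg in segments)
```

Note that lru_cache keys on the call arguments, so calling get_model() with different settings loads an additional model rather than replacing the first one.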


About

  • Built and tested by romular21
  • Dockerfile and build notes are available in this repository
  • No private tokens, data, or configs inside the image

Image Variants and Naming

  • Base (this image): ghcr.io/romular21/jetson-ctranslate2:cuda

PRs and suggestions welcome!
