GPU-ready CTranslate2 base image for Jetson Orin NX / AGX Orin / Xavier (ARM64).
This image is intended as a base for production-grade, GPU-accelerated Faster Whisper pipelines on NVIDIA Jetson (Orin/Xavier, JetPack 5.1.x).
Pre-built CTranslate2 v4.5.0 (CUDA/cuDNN) for ARM64, Python 3.8, tested on Orin NX 16GB (JetPack 5.1.1, CUDA 11.4, cuDNN 8.6).
Optimized to run Faster-Whisper with CUDA via CT2 (v4.5.0) without having to rebuild CTranslate2 yourself.
Based on NVIDIA dusty-nv/l4t-pytorch:2.2-r35.4.1 (the official Jetson PyTorch base from the jetson-containers project).
- JetPack: 5.1.1 – 5.1.2 (L4T r35.3.1 and r35.4.1)
- CUDA: 11.4
- cuDNN: 8.6.x
- CTranslate2: 4.5.0 (built from source with CUDA/cuDNN)
- Python: 3.8
- PyTorch: 2.2 (from dusty-nv/l4t-pytorch)
- faster-whisper:
Requirements:
- Jetson ARM64 device (Orin NX / AGX Orin / Xavier)
- NVIDIA Container Runtime installed and enabled
- requests==2.31.0
- urllib3==1.26.18
- charset-normalizer==3.3.2
- av
- tqdm
- tokenizers==0.13.3
- huggingface-hub==0.19.4
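
Once these packages are installed inside the container, a small check along the following lines could confirm that the pinned versions are the ones actually present. This is only a sketch: `PINS`, `installed_version`, and `check_pins` are hypothetical names introduced here, and the pins are copied from the list above (`av` and `tqdm` are unpinned, so only their presence is checked).

```python
from importlib import metadata  # standard library since Python 3.8

# Pins copied from the list above; None means "any version, just installed".
PINS = {
    "requests": "2.31.0",
    "urllib3": "1.26.18",
    "charset-normalizer": "3.3.2",
    "av": None,
    "tqdm": None,
    "tokenizers": "0.13.3",
    "huggingface-hub": "0.19.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found is None or (expected is not None and found != expected):
            problems[name] = (expected, found)
    return problems
```

Calling `check_pins(PINS)` returns an empty dict when everything matches; any entry in the result names a package to reinstall at the pinned version.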
docker pull ghcr.io/romular21/jetson-ctranslate2:cuda
docker run -it --rm \
--runtime nvidia \
-e LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 \
ghcr.io/romular21/jetson-ctranslate2:cuda

Note:
- If NVIDIA Container Runtime is the default on your Jetson, no GPU flags are needed. Just run the image.
- If it is not the default, use `--runtime nvidia` (most compatible on Jetson) or `--gpus all` on newer setups.
- Check the default: `docker info | grep -i 'Default Runtime'` (should show `nvidia`).
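
The runtime check in the notes above can also be scripted. The sketch below assumes `docker info` prints a line of the form `Default Runtime: <name>`; the helper names `parse_default_runtime`, `gpu_flags`, and `read_docker_info` are hypothetical and not part of this image.

```python
import subprocess


def parse_default_runtime(docker_info_text):
    """Return the value of the 'Default Runtime' line from `docker info`, or None."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "default runtime":
            return value.strip()
    return None


def gpu_flags(default_runtime):
    """Extra `docker run` flags: none if nvidia is already the default runtime."""
    return [] if default_runtime == "nvidia" else ["--runtime", "nvidia"]


def read_docker_info():
    """Run `docker info` on the host and return its text output."""
    result = subprocess.run(
        ["docker", "info"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the Jetson host, `gpu_flags(parse_default_runtime(read_docker_info()))` returns an empty list when no GPU flag is needed and `["--runtime", "nvidia"]` otherwise.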
Optional: create a short local alias
docker tag ghcr.io/romular21/jetson-ctranslate2:cuda jetson-ctranslate2:cuda
# Then run using the short name (no extra layers created):
docker run -it --rm \
--runtime nvidia \
-e LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 \
jetson-ctranslate2:cuda

Install Faster-Whisper inside the container (typical extras in one line):
pip3 install --no-deps --index-url https://pypi.org/simple \
requests==2.31.0 urllib3==1.26.18 charset-normalizer==3.3.2 \
av tqdm tokenizers==0.13.3 huggingface-hub==0.19.4 faster-whisper

python3 -c "import ctranslate2; print(ctranslate2.__version__)"
# Should print: 4.5.0
python3 -c "from faster_whisper import WhisperModel; m = WhisperModel('tiny', device='cuda', compute_type='float16'); print('faster-whisper CUDA OK')"
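# Should print: faster-whisper CUDA OK

from faster_whisper import WhisperModel

# Load the model once and reuse it for every transcription.
model = WhisperModel("tiny", device="cuda", compute_type="float16")
segments, info = model.transcribe("your_audio.wav")
# segments is a generator: transcription runs as you iterate over it.
for seg in segments:
    print(seg.text)

This setup (device="cuda" with compute_type="float16") is the one recommended here for fast, accurate GPU inference on Jetson/ARM.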
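
To see what inference speed you actually get on your own device, the transcription can be timed. The sketch below is an illustration rather than part of the image: `transcribe_timed` is a hypothetical helper, and it relies on `transcribe()` returning a lazy generator plus an info object that carries the audio `duration` in seconds.

```python
import time


def transcribe_timed(model, audio_path):
    """Transcribe a file and return (text, elapsed_seconds, real_time_factor)."""
    start = time.perf_counter()
    segments, info = model.transcribe(audio_path)
    # Decoding only happens while the generator is consumed, so the timer
    # must cover the iteration, not just the transcribe() call.
    text = "".join(seg.text for seg in segments)
    elapsed = time.perf_counter() - start
    # Real-time factor: processing time divided by audio length (lower is faster).
    rtf = elapsed / info.duration if info.duration else float("nan")
    return text, elapsed, rtf


# Usage inside the container (model loaded once, as in the example above):
# from faster_whisper import WhisperModel
# model = WhisperModel("tiny", device="cuda", compute_type="float16")
# text, elapsed, rtf = transcribe_timed(model, "your_audio.wav")
# print(f"{elapsed:.2f}s, real-time factor {rtf:.2f}")
```

The first call typically includes one-time warm-up cost, so timing a second run gives a fairer number.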
- If you see `ImportError: undefined symbol: omp_get_thread_num`:
  - Make sure `LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1` is set (already set in this Docker image).
- Do not add `ctranslate2` to requirements.txt!
  - The custom CUDA version is already pre-installed.
- Always instantiate the WhisperModel once per process (do not recreate it for every inference).
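
Two of the notes above, the `LD_PRELOAD` requirement and the one-model-per-process rule, can be sketched together as follows. The names `LIBGOMP`, `libgomp_preloaded`, and `get_model` are hypothetical, and the cache is a plain module-level dict that is not thread-safe; treat it as a starting point rather than a finished implementation.

```python
import os
import warnings

# Path taken from the docker run command in this README.
LIBGOMP = "/usr/lib/aarch64-linux-gnu/libgomp.so.1"

_models = {}  # (name, device, compute_type) -> loaded model


def libgomp_preloaded(environ=None):
    """True if LD_PRELOAD lists the libgomp path used by this image."""
    env = os.environ if environ is None else environ
    # LD_PRELOAD entries may be separated by spaces or colons.
    entries = env.get("LD_PRELOAD", "").replace(":", " ").split()
    return LIBGOMP in entries


def get_model(name="tiny", device="cuda", compute_type="float16"):
    """Create the WhisperModel on first use, then return the same instance."""
    key = (name, device, compute_type)
    if key not in _models:
        if not libgomp_preloaded():
            # LD_PRELOAD must be set before Python starts; this only warns.
            warnings.warn(f"LD_PRELOAD does not include {LIBGOMP}")
        from faster_whisper import WhisperModel  # lazy import, after the check

        _models[key] = WhisperModel(name, device=device, compute_type=compute_type)
    return _models[key]
```

Request handlers or worker loops can then call `get_model()` freely; only the first call pays the model loading cost.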
- Built and tested by romular21
- Dockerfile and build notes are available in this repository
- No private tokens, data, or configs inside the image
- Base (this image): ghcr.io/romular21/jetson-ctranslate2:cuda
PRs and suggestions welcome!