End-to-end video dubbing pipeline. Download a video, transcribe it, translate the subtitles, clone a voice, and produce a fully dubbed audio or video file — in one command.
![]()
Mazinger chains ten stages into a single pipeline:
- Download — fetch a video from a URL or ingest a local file, extract the audio track
- Transcribe — convert speech to SRT subtitles (OpenAI Whisper API, faster-whisper, WhisperX, MLX Whisper, or Deepgram Nova 3)
- Thumbnails — use an LLM to pick key frames from the video for visual context
- Describe — analyze the transcript and thumbnails to produce a structured summary (title, key points, keywords)
- Review — optionally refine ASR output: fix typos, reshape punctuation, and convert technical terms to English
- Translate — translate the SRT into another language with duration-aware word budgets
- Re-segment — merge fragments and split oversized subtitles for readability
- Speak — synthesize voice-cloned speech for every subtitle entry (Qwen3-TTS, Chatterbox, or MLX), with 16 pre-defined voice themes or your own voice sample
- Assemble — place each audio segment on the original timeline with optional tempo adjustment, loudness matching, and background audio mixing
- Subtitle — burn styled subtitles into the video and/or mux the new audio track
Every stage can run independently or as part of the full pipeline. Interrupted runs resume automatically — completed stages and individual TTS segments are cached and skipped.
- Python 3.10 or later
- ffmpeg installed and on
PATH(apt install ffmpeg/brew install ffmpeg) - An OpenAI API key for LLM-powered stages (transcription, translation, thumbnails, description)
- A CUDA GPU for local transcription and TTS (not needed for cloud-only workflows)
- Apple Silicon (M1/M2/M3/M4/M5) for MLX-accelerated TTS and transcription (optional)
The base install covers download, transcription (cloud), thumbnails, description, translation, re-segmentation, and subtitle embedding. No GPU needed.
pip install mazingerAdd local transcription or TTS as optional extras:
# Local transcription
pip install "mazinger[transcribe-faster]" # faster-whisper (default, recommended)
pip install "mazinger[transcribe-whisperx]" # WhisperX (optional, word-level alignment)
# Cloud transcription (no GPU needed)
pip install "mazinger[transcribe-deepgram]" # Deepgram Nova 3 (cloud, free $200 credit)
# Voice synthesis
pip install "mazinger[tts]" # Qwen3-TTS (voice sample + transcript)
pip install "mazinger[tts-chatterbox]" # Chatterbox (voice sample only, emotion control)
pip install "mazinger[tts-mlx]" # MLX Qwen3-TTS (Apple Silicon)
# MLX transcription (Apple Silicon)
pip install "mazinger[transcribe-mlx]" # MLX Whisper (Apple Silicon)
# Full bundles
pip install "mazinger[all-qwen]" # faster-whisper + Qwen3-TTS
pip install "mazinger[all-chatterbox]" # faster-whisper + Chatterbox
pip install "mazinger[all-mlx]" # MLX Whisper + MLX Qwen3-TTSQwen and Chatterbox require different
transformersversions and cannot share an environment. WhisperX is available as an optional extra but is not installed by default due to complex dependencies.
See the Installation Guide for venv recipes, Colab setup, and uv overrides.
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--voice-sample speaker.m4a \
--voice-script speaker_transcript.txt \
--target-language Spanish \
--base-dir ./outputVoice profiles are hosted on HuggingFace and downloaded automatically. Several ready-made profiles are available out of the box:
abubakr · daheeh-v1 · 3b1b · italian-v1 · morgan-freeman · trump-v1
See the full list with descriptions in the Available Voice Profiles doc.
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--clone-profile abubakr \
--target-language ArabicChoose from 16 pre-defined voice themes — no voice sample or profile download required:
narrator-m/f · young-m/f · deep-m/f · warm-m/f · news-m/f · storyteller-m/f · kid-m/f · teen-m/f
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--voice-theme narrator-m \
--target-language SpanishList all themes with mazinger profile list. Generate a reusable profile with mazinger profile generate. See Voice Profiles for details.
When no voice option is provided, Mazinger automatically clones the speaker directly from the source audio. The pipeline picks the best 20–60 s segment from the transcription and uses it as the cloning reference — no files or configuration needed.
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--target-language Spanishproj = dubber.dub(
source="https://youtube.com/watch?v=VIDEO_ID",
target_language="Spanish",
)mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--clone-profile abubakr \
--output-type video \
--embed-subtitles \
--subtitle-google-font "Noto Sans Arabic" \
--subtitle-font-size 24Every stage has its own sub-command:
mazinger download "https://youtube.com/watch?v=VIDEO_ID" --base-dir ./output
mazinger slice "https://youtube.com/watch?v=VIDEO_ID" --start 00:01:00 --end 00:04:00
mazinger transcribe ./output/projects/my-video/source/audio.mp3 -o subs.srt
mazinger translate --srt subs.srt --target-language French -o translated.srt
mazinger subtitle video.mp4 --srt translated.srt -o output.mp4Deepgram Nova 3 offers strong multilingual quality (including Arabic) and gives you $200 in free credits on sign-up with no credit card required — enough to transcribe many hours of audio for free. Get your key at deepgram.com, then:
export DEEPGRAM_API_KEY=your_key_here
# Standalone transcription
mazinger transcribe "https://youtube.com/watch?v=VIDEO_ID" \
--method deepgram --language ar -o subs.srt
# Full dubbing pipeline with Deepgram for STT
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
--transcribe-method deepgram \
--voice-theme narrator-m \
--target-language Englishfrom mazinger import MazingerDubber
dubber = MazingerDubber(openai_api_key="sk-...", base_dir="./output")
# With a voice theme (simplest)
proj = dubber.dub(
source="https://youtube.com/watch?v=VIDEO_ID",
voice_theme="narrator-m",
target_language="Spanish",
output_type="video",
)
# Auto-clone the speaker's voice (no voice option needed)
proj = dubber.dub(
source="https://youtube.com/watch?v=VIDEO_ID",
target_language="Spanish",
)
# Or with explicit voice files
proj = dubber.dub(
source="https://youtube.com/watch?v=VIDEO_ID",
voice_sample="speaker.m4a",
voice_script="speaker_transcript.txt",
target_language="Spanish",
output_type="video",
embed_subtitles=True,
)
print(proj.final_video) # ./output/projects/<slug>/tts/dubbed.mp4Full documentation lives in the docs/ directory:
| Chapter | Contents |
|---|---|
| Installation | All install methods, extras, compatibility matrix, Colab and venv recipes |
| Quick Start | Common workflows with copy-paste examples |
| Pipeline Overview | How the nine stages connect, data flow, and resume behavior |
| CLI Reference | Every command, flag, and default value |
| Python API | Classes, functions, and parameters for programmatic use |
| Voice Profiles | Using, creating, and uploading voice profiles |
| Subtitle Styling | Fonts, colors, positioning, RTL support, Google Fonts |
| Configuration | Environment variables, caching, tempo control, LLM usage tracking |
| Project Structure | Output directory layout and file naming conventions |
| YouTube Cookies | How to export and pass cookies for age-restricted or region-locked videos |
MIT
