Mazinger Dubber

End-to-end video dubbing pipeline. Download a video, transcribe it, translate the subtitles, clone a voice, and produce a fully dubbed audio or video file — in one command.

▶️ Watch Demo Video (with audio)

What It Does

Mazinger chains ten stages into a single pipeline:

Download — fetch a video from a URL or ingest a local file, extract the audio track
Transcribe — convert speech to SRT subtitles (OpenAI Whisper API, faster-whisper, WhisperX, MLX Whisper, or Deepgram Nova 3)
Thumbnails — use an LLM to pick key frames from the video for visual context
Describe — analyze the transcript and thumbnails to produce a structured summary (title, key points, keywords)
Review — optionally refine ASR output: fix typos, reshape punctuation, and convert technical terms to English
Translate — translate the SRT into another language with duration-aware word budgets
Re-segment — merge fragments and split oversized subtitles for readability
Speak — synthesize voice-cloned speech for every subtitle entry (Qwen3-TTS, Chatterbox, or MLX), with 16 pre-defined voice themes or your own voice sample
Assemble — place each audio segment on the original timeline with optional tempo adjustment, loudness matching, and background audio mixing
Subtitle — burn styled subtitles into the video and/or mux the new audio track
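
The "duration-aware word budget" in the Translate stage can be pictured as a cap on how many words a translated line may contain, derived from how long the subtitle stays on screen. The sketch below is an illustration only: the names `parse_srt_time` and `word_budget` and the rate of 2.5 words per second are assumptions, not values taken from Mazinger.

```python
from datetime import timedelta


def parse_srt_time(stamp: str) -> timedelta:
    """Parse an SRT timestamp such as '00:01:02,500'."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), milliseconds=int(millis))


def word_budget(start: str, end: str, words_per_second: float = 2.5) -> int:
    """Return the largest word count that fits the subtitle's duration.

    words_per_second is an assumed speaking rate, not a Mazinger default.
    """
    duration = (parse_srt_time(end) - parse_srt_time(start)).total_seconds()
    if duration <= 0:
        raise ValueError("subtitle end must be after start")
    # Always allow at least one word, even for very short entries.
    return max(1, int(duration * words_per_second))
```

A translator prompt could then ask for at most `word_budget(start, end)` words per entry, which keeps the synthesized speech close to the original timing.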

Every stage can run independently or as part of the full pipeline. Interrupted runs resume automatically — completed stages and individual TTS segments are cached and skipped.
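
One common way to get this kind of resume behavior is to treat each stage's output file as its own cache marker. The sketch below shows that pattern under stated assumptions: `run_stage` and the `.part` suffix are hypothetical and do not describe Mazinger's actual cache layout.

```python
from pathlib import Path
from typing import Callable


def run_stage(output: Path, produce: Callable[[Path], None]) -> bool:
    """Run produce(...) unless a non-empty output already exists.

    Returns True if the stage ran, False if it was skipped.
    """
    if output.exists() and output.stat().st_size > 0:
        return False  # result left by an earlier run, nothing to do
    output.parent.mkdir(parents=True, exist_ok=True)
    tmp = output.with_name(output.name + ".part")
    produce(tmp)
    # Rename into place only after produce() finished, so an interrupted
    # run does not leave a partial file that looks complete.
    tmp.replace(output)
    return True
```

The same check applied per TTS segment file would give segment-level resumption.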

Prerequisites

Python 3.10 or later
ffmpeg installed and on PATH (apt install ffmpeg / brew install ffmpeg)
An OpenAI API key for LLM-powered stages (transcription, translation, thumbnails, description)
A CUDA GPU for local transcription and TTS (not needed for cloud-only workflows)
Apple Silicon (M1/M2/M3/M4/M5) for MLX-accelerated TTS and transcription (optional)
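
A quick pre-flight check for the first two requirements can be written with the standard library alone. This is a minimal sketch; the helper name `check_prerequisites` is made up for this example and is not part of Mazinger.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```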

Installation

The base install covers download, transcription (cloud), thumbnails, description, translation, re-segmentation, and subtitle embedding. No GPU needed.

pip install mazinger

Add local transcription or TTS as optional extras:

# Local transcription
pip install "mazinger[transcribe-faster]"      # faster-whisper (default, recommended)
pip install "mazinger[transcribe-whisperx]"    # WhisperX (optional, word-level alignment)

# Cloud transcription (no GPU needed)
pip install "mazinger[transcribe-deepgram]"    # Deepgram Nova 3 (cloud, free $200 credit)

# Voice synthesis
pip install "mazinger[tts]"                    # Qwen3-TTS (voice sample + transcript)
pip install "mazinger[tts-chatterbox]"         # Chatterbox (voice sample only, emotion control)
pip install "mazinger[tts-mlx]"                # MLX Qwen3-TTS (Apple Silicon)

# MLX transcription (Apple Silicon)
pip install "mazinger[transcribe-mlx]"         # MLX Whisper (Apple Silicon)

# Full bundles
pip install "mazinger[all-qwen]"              # faster-whisper + Qwen3-TTS
pip install "mazinger[all-chatterbox]"        # faster-whisper + Chatterbox
pip install "mazinger[all-mlx]"               # MLX Whisper + MLX Qwen3-TTS

Qwen and Chatterbox require different transformers versions and cannot share an environment. WhisperX is available as an optional extra but is not installed by default due to complex dependencies.

See the Installation Guide for venv recipes, Colab setup, and uv overrides.
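
To see which optional transcription backends ended up in the current environment, it is possible to probe for their import names without importing them. The sketch below is illustrative: `available_backends` is a hypothetical helper, and the mapping from Mazinger extras to these module names is an assumption.

```python
from importlib.util import find_spec

# Import names of some optional backends. Which extra installs which
# module is assumed here for illustration.
BACKEND_MODULES = {
    "faster-whisper": "faster_whisper",
    "WhisperX": "whisperx",
    "MLX Whisper": "mlx_whisper",
}


def available_backends(modules=BACKEND_MODULES) -> dict:
    """Map each backend label to True if its top-level module is installed."""
    return {label: find_spec(name) is not None
            for label, name in modules.items()}


if __name__ == "__main__":
    for label, present in available_backends().items():
        print(f"{label}: {'installed' if present else 'not installed'}")
```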

Quick Start

Dub a video in one command

mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --voice-sample speaker.m4a \
    --voice-script speaker_transcript.txt \
    --target-language Spanish \
    --base-dir ./output

Use a voice profile instead of local files

Voice profiles are hosted on HuggingFace and downloaded automatically. Several ready-made profiles are available out of the box:

abubakr · daheeh-v1 · 3b1b · italian-v1 · morgan-freeman · trump-v1

See the full list with descriptions in the Available Voice Profiles doc.

mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --clone-profile abubakr \
    --target-language Arabic

Use a voice theme (no files needed)

Choose from 16 pre-defined voice themes — no voice sample or profile download required:

narrator-m/f · young-m/f · deep-m/f · warm-m/f · news-m/f · storyteller-m/f · kid-m/f · teen-m/f

mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --voice-theme narrator-m \
    --target-language Spanish

List all themes with mazinger profile list. Generate a reusable profile with mazinger profile generate. See Voice Profiles for details.

Auto-clone the original speaker's voice

When no voice option is provided, Mazinger clones the speaker's voice directly from the source audio. The pipeline picks the best 20–60 s segment from the transcription and uses it as the cloning reference, so no files or configuration are needed.

mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --target-language Spanish

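# Python equivalent
from mazinger import MazingerDubber
dubber = MazingerDubber(openai_api_key="sk-...", base_dir="./output")
proj = dubber.dub(
    source="https://youtube.com/watch?v=VIDEO_ID",
    target_language="Spanish",
)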
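
How "best" is decided is not spelled out here. As one plausible reading, the sketch below scans the transcript for a contiguous run of subtitle entries spanning 20 to 60 seconds and keeps the run with the least silence. The function `pick_reference_window` and the density criterion are assumptions for illustration, not Mazinger's actual selection logic.

```python
from typing import Optional, Sequence, Tuple

Entry = Tuple[float, float]  # (start_seconds, end_seconds) of one subtitle


def pick_reference_window(entries: Sequence[Entry],
                          min_len: float = 20.0,
                          max_len: float = 60.0) -> Optional[Entry]:
    """Return (start, end) of the contiguous run with the densest speech.

    Density is speech time divided by window span (an assumed criterion).
    Returns None when no run fits between min_len and max_len.
    """
    best: Optional[Entry] = None
    best_density = -1.0
    for i in range(len(entries)):
        speech = 0.0
        for j in range(i, len(entries)):
            speech += entries[j][1] - entries[j][0]
            span = entries[j][1] - entries[i][0]
            if span > max_len:
                break  # longer runs from this start only get longer
            if span >= min_len and speech / span > best_density:
                best, best_density = (entries[i][0], entries[j][1]), speech / span
    return best
```

The returned time range could then be cut from the source audio and used as the voice sample, with the matching subtitle text as its transcript.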

Produce a video with burned-in subtitles

mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --clone-profile abubakr \
    --output-type video \
    --embed-subtitles \
    --subtitle-google-font "Noto Sans Arabic" \
    --subtitle-font-size 24

Run a single stage

Every stage has its own sub-command:

mazinger download   "https://youtube.com/watch?v=VIDEO_ID" --base-dir ./output
mazinger slice      "https://youtube.com/watch?v=VIDEO_ID" --start 00:01:00 --end 00:04:00
mazinger transcribe ./output/projects/my-video/source/audio.mp3 -o subs.srt
mazinger translate  --srt subs.srt --target-language French -o translated.srt
mazinger subtitle   video.mp4 --srt translated.srt -o output.mp4
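
Because each stage is a separate command, stages can also be chained from a script. The sketch below builds three of the invocations shown above and runs them in order with `subprocess`. The helper names `stage_commands` and `run_stages` are hypothetical, and the audio path is passed in by the caller because its location under `--base-dir` depends on the project.

```python
import subprocess
from typing import List


def stage_commands(url: str, audio: str, language: str) -> List[List[str]]:
    """Build the CLI invocations for download, transcribe, and translate.

    Only flags that appear in the examples above are used.
    """
    return [
        ["mazinger", "download", url, "--base-dir", "./output"],
        ["mazinger", "transcribe", audio, "-o", "subs.srt"],
        ["mazinger", "translate", "--srt", "subs.srt",
         "--target-language", language, "-o", "translated.srt"],
    ]


def run_stages(commands: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Passing argument lists instead of a single shell string avoids quoting problems with URLs that contain `&` or `?`.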

Cloud transcription with Deepgram (no GPU required)

Deepgram Nova 3 offers strong multilingual quality (including Arabic). New accounts receive $200 in credits with no credit card required, enough to transcribe many hours of audio at no cost. Get your key at deepgram.com, then:

export DEEPGRAM_API_KEY=your_key_here

# Standalone transcription
mazinger transcribe "https://youtube.com/watch?v=VIDEO_ID" \
    --method deepgram --language ar -o subs.srt

# Full dubbing pipeline with Deepgram for STT
mazinger dub "https://youtube.com/watch?v=VIDEO_ID" \
    --transcribe-method deepgram \
    --voice-theme narrator-m \
    --target-language English

Python API

from mazinger import MazingerDubber

dubber = MazingerDubber(openai_api_key="sk-...", base_dir="./output")

# With a voice theme (simplest)
proj = dubber.dub(
    source="https://youtube.com/watch?v=VIDEO_ID",
    voice_theme="narrator-m",
    target_language="Spanish",
    output_type="video",
)

# Auto-clone the speaker's voice (no voice option needed)
proj = dubber.dub(
    source="https://youtube.com/watch?v=VIDEO_ID",
    target_language="Spanish",
)

# Or with explicit voice files
proj = dubber.dub(
    source="https://youtube.com/watch?v=VIDEO_ID",
    voice_sample="speaker.m4a",
    voice_script="speaker_transcript.txt",
    target_language="Spanish",
    output_type="video",
    embed_subtitles=True,
)

print(proj.final_video)   # ./output/projects/<slug>/tts/dubbed.mp4
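
To dub one source into several languages, the `dub` call can be wrapped in a loop. The helper below is a sketch, not part of the Mazinger API: `dub_into_languages` is a made-up name, and it relies only on the `source` and `target_language` keyword arguments shown above. Whether outputs for different languages share or overwrite a project directory is not covered here, so check the Project Structure chapter before relying on it.

```python
from typing import Any, Dict, Iterable


def dub_into_languages(dubber: Any, source: str,
                       languages: Iterable[str],
                       **options: Any) -> Dict[str, Any]:
    """Call dubber.dub once per target language and collect the results.

    Extra keyword arguments (for example voice_theme) are forwarded as-is.
    """
    results: Dict[str, Any] = {}
    for language in languages:
        results[language] = dubber.dub(
            source=source, target_language=language, **options
        )
    return results
```

With the `dubber` object created above, a call might look like `dub_into_languages(dubber, "https://youtube.com/watch?v=VIDEO_ID", ["Spanish", "French"], voice_theme="narrator-m")`.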

Documentation

Full documentation lives in the docs/ directory:

Chapter	Contents
Installation	All install methods, extras, compatibility matrix, Colab and venv recipes
Quick Start	Common workflows with copy-paste examples
Pipeline Overview	How the ten stages connect, data flow, and resume behavior
CLI Reference	Every command, flag, and default value
Python API	Classes, functions, and parameters for programmatic use
Voice Profiles	Using, creating, and uploading voice profiles
Subtitle Styling	Fonts, colors, positioning, RTL support, Google Fonts
Configuration	Environment variables, caching, tempo control, LLM usage tracking
Project Structure	Output directory layout and file naming conventions
YouTube Cookies	How to export and pass cookies for age-restricted or region-locked videos

License

MIT
