Run the LTX‑2 19B model locally to generate videos from both an image and an audio track using Intel Arc (XPU) GPU or CPU. This project demonstrates local AI video generation without sending data to the cloud, leveraging extended shared GPU memory for high-parameter models.
While most local AI models are optimized for NVIDIA GPUs (CUDA), this project demonstrates how to run LTX-2 on Intel Arc (XPU) hardware. This is a significant milestone, as XPU support is currently rare, providing a functional pathway for Intel Arc users to leverage high-performance video generation.
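For context, recent PyTorch builds expose Intel GPUs through the `torch.xpu` backend, which is what makes this possible without CUDA. Here is a minimal device-detection sketch, assuming a PyTorch build with XPU support (which the repo's install script is expected to set up):

```python
import torch

# Minimal device-detection sketch, assuming a PyTorch build with the
# Intel XPU backend (torch.xpu). Falls back to CPU otherwise.
use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
device = torch.device("xpu" if use_xpu else "cpu")
print(f"Running on: {device}")
```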
The model itself isn’t just one giant block. It’s more like a carefully choreographed pipeline where each part has its own job. Gemma‑3‑12B handles text prompts, turning your words into embeddings that guide what the video should look like. The Video VAE compresses the video into a latent space, making it easier and faster for the transformer to process, and then decodes it back into frames. The Audio VAE does a similar thing for sound, capturing pitch, rhythm, and timbre while ignoring unnecessary details. Connectors act like translators between the different inputs, aligning text, audio, and images into a shared representation that the transformer can understand.
At the center is the 19B-parameter transformer. This is the “brain” that fuses all the inputs and generates coherent video and audio in one pass. The scheduler is part of this process too—it’s like a timing coordinator that tells the transformer how to gradually refine the output. Instead of trying to generate the video and audio perfectly in one shot, the scheduler loops through multiple denoising steps, slowly turning noisy latent representations into clean, synchronized video and sound. This ensures that everything stays aligned and natural over time. Finally, the vocoder takes the audio latents and converts them into actual high-quality sound, so it doesn’t end up robotic or synthetic.
All of these components together make it possible for LTX‑2 to take an image, an audio track, and a prompt, and generate a synchronized video with sound, all in one go. It’s like a team where everyone knows their role, the scheduler keeps the timing right, and the transformer is the conductor making everything come together seamlessly.
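To make that flow concrete, here is a self-contained toy sketch of the encode → denoise → decode loop described above. Every function in it is a stand-in written for illustration, not the actual LTX-2 API:

```python
import torch

# Toy stand-ins for the real components: Gemma-3 text encoder,
# video/audio VAEs, connectors, the 19B transformer, and the scheduler.

def encode_inputs(prompt, image, audio):
    # Stand-in for text embedding and VAE encoding of image and audio;
    # real shapes and values come from the actual encoders.
    text_emb  = torch.randn(1, 77, 64)
    video_lat = torch.randn(1, 16, 8, 8)
    audio_lat = torch.randn(1, 16, 32)
    return text_emb, video_lat, audio_lat

def transformer_step(video_lat, audio_lat, text_emb, t):
    # The real transformer denoises both modalities in one pass,
    # conditioned on the aligned inputs; this toy just decays the noise.
    return video_lat * 0.9, audio_lat * 0.9

text_emb, video_lat, audio_lat = encode_inputs("a singer", None, None)

num_steps = 8
for i in range(num_steps):
    t = 1.0 - i / num_steps  # scheduler timestep, from noisy to clean
    video_lat, audio_lat = transformer_step(video_lat, audio_lat, text_emb, t)

# In the real pipeline, the video VAE decodes video_lat into frames
# and the vocoder converts audio_lat into a waveform.
print(video_lat.abs().mean(), audio_lat.abs().mean())
```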
Place these models in the models/ folder (a quick layout check is sketched after the list). LTX-2 generates the video, Gemma encodes text prompts, and the LTX-2 components handle video/audio encoding and decoding.
- 📦 ltx-2-19b-distilled-fp8.safetensors — Download
- 📁 ltx2_components/
- 📁 gemma-3-12b-it-qat-q4_0-unquantized/ — Download
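A quick way to confirm the layout is a small Python check. This helper is ours, not part of the repo; adjust the paths if you placed the models elsewhere:

```python
from pathlib import Path

# Layout check (not part of the repo): confirm the expected model
# files and folders exist under models/ before running the worker.
expected = [
    "models/ltx-2-19b-distilled-fp8.safetensors",
    "models/ltx2_components",
    "models/gemma-3-12b-it-qat-q4_0-unquantized",
]
for p in expected:
    print(("OK     " if Path(p).exists() else "MISSING"), p)
```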
```bash
# Clone the repository
git clone https://github.com/Ashot72/LTX-2-Audio-to-Video-Local-XPU
cd LTX-2-Audio-to-Video-Local-XPU
```
Install (Windows, Intel XPU/Arc):

```bash
install_all.bat
```
Add inputs:

- `inputs/singer.png` (image)
- `inputs/track.mp3` (audio)
Run:

```bash
python worker.py
```
CPU-only:

```bash
python worker.py --cpu
```
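The `--cpu` flag forces CPU execution even when an XPU is present. A plausible sketch of how such a flag could route device selection follows; the actual worker.py may differ:

```python
import argparse
import torch

# Plausible sketch of --cpu handling; not the actual worker.py code.
parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true", help="force CPU execution")
args = parser.parse_args()

use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available() and not args.cpu
device = torch.device("xpu" if use_xpu else "cpu")
print(f"Selected device: {device}")
```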
Output: `final_music_video.mp4`

📺 Video: Watch on YouTube
