ComfyUI custom nodes for ID-LoRA-2.3 inference — audio+video generation with speaker identity transfer, built on top of LTX-2.3. Supports both one-stage (single resolution) and two-stage (2x spatial upsampling) pipelines.
Two-stage output (`max_resolution=1024`, HQ mode, 242 frames @ 25 fps): `ComfyUI_demo_.mp4`

Input voice sample: `reference.mp3`
ID-LoRA transfers a speaker's vocal identity from a reference audio clip into a generated talking-head video. This package wraps the inference pipelines as ComfyUI nodes:
One-stage — generates at a single resolution (lower quality):
```
IDLoraModelLoader --> IDLoraPromptEncoder --> IDLoraOneStageSampler --> SaveVideo
                                                    ^           ^
                                              first_frame  reference_audio
```
Two-stage — generates at target resolution then refines at 2x with spatial upsampling (higher quality):
```
IDLoraTwoStageModelLoader --> IDLoraPromptEncoder --> IDLoraTwoStageSampler --> SaveVideo
                                                            ^           ^
                                                      first_frame  reference_audio
```
The two-stage pipeline produces higher-resolution output (e.g. 512x512 -> 1024x1024) by:
- Stage 1: Generating video+audio at target resolution with ID-LoRA + full guidance (CFG, STG, identity guidance, A/V bimodal)
- Transition: Freeing stage-1 models, 2x spatial upsampling the video latent, re-encoding the first frame at 2x
- Stage 2: Refining at 2x resolution with a distilled LoRA only (no guidance), audio frozen from stage 1
An optional HQ mode uses the res2s second-order sampler instead of Euler for higher quality.
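The stage hand-off described above can be sketched at the shape level. This is a toy sketch with plain tuples, not the real pipeline; the 128-channel latent and 32x spatial VAE compression are assumptions about LTX's latent space, not taken from this README:

```python
def two_stage_shapes(h=512, w=512):
    """Shape-level sketch of the two-stage hand-off (illustrative only)."""
    # Stage 1: denoise video+audio latents at the target resolution with
    # full guidance (CFG, STG, identity, A/V bimodal).
    video_latent = (128, h // 32, w // 32)  # (channels, lat_h, lat_w) -- assumed layout
    audio_finalized = True                  # audio is finalized in stage 1

    # Transition: stage-1 models are freed, then the video latent is
    # spatially upsampled 2x and the first frame is re-encoded at 2x.
    c, lh, lw = video_latent
    video_latent = (c, lh * 2, lw * 2)

    # Stage 2: short distilled-LoRA refinement with no guidance;
    # the audio latent stays frozen from stage 1.
    return video_latent, audio_finalized

print(two_stage_shapes())  # decoding the 2x latent yields e.g. 1024x1024 video
```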
- GPU: NVIDIA GPU with >=24 GB VRAM (48 GB recommended for non-quantized; 80 GB recommended for two-stage non-quantized at high resolutions)
- Python: 3.10+
- ComfyUI: Recent version with `comfy_api.latest` support
- Disk: ~51 GB for one-stage models, ~60 GB for two-stage models (additional upsampler + distilled LoRA)
This package depends on the ID-LoRA repo for its Python packages and model download script. ID-LoRA-2.3 is a subdirectory inside it.

```bash
git clone https://github.com/ID-LoRA/ID-LoRA.git
```

Run the download script from the repository root so models end up where the nodes expect them:

```bash
bash ID-LoRA/ID-LoRA-2.3/scripts/download_models.sh models/
```

This downloads all required models from HuggingFace:
| Model | HuggingFace repo | Size |
|---|---|---|
| LTX-2.3 base checkpoint | Lightricks/LTX-2.3 | ~44 GB |
| Gemma 3 12B text encoder | google/gemma-3-12b-it-qat-q4_0-unquantized | ~6 GB |
| ID-LoRA CelebV-HQ weights | AviadDahan/LTX-2.3-ID-LoRA-CelebVHQ-3K | ~1.1 GB |
| ID-LoRA TalkVid weights | AviadDahan/LTX-2.3-ID-LoRA-TalkVid-3K | ~1.1 GB |
For two-stage, also download (included in the download script):
| Model | HuggingFace repo | Size |
|---|---|---|
| Spatial upsampler | Lightricks/LTX-2.3 | ~1 GB |
| Distilled LoRA | Lightricks/LTX-2.3 | ~7.6 GB |
Note: Gemma requires accepting the license at https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized and logging in with `huggingface-cli login` before downloading.
After downloading, your directory should contain:
```
models/
├── ltx-2.3-22b-dev.safetensors                   # 44 GB — base model
├── gemma-3-12b-it-qat-q4_0-unquantized/          # 6 GB — text encoder
│   ├── config.json
│   ├── model*.safetensors
│   ├── tokenizer.model
│   └── ...
├── id-lora-celebvhq-ltx2.3/                      # 1.1 GB — ID-LoRA
│   └── lora_weights.safetensors
├── ltx-2.3-spatial-upscaler-x2-1.1.safetensors   # 1 GB — spatial upsampler (two-stage)
└── ltx-2.3-22b-distilled-lora-384.safetensors    # 7.6 GB — distilled LoRA (two-stage)
```
```bash
pip install -e ID-LoRA/ID-LoRA-2.3/packages/ltx-core
pip install -e ID-LoRA/ID-LoRA-2.3/packages/ltx-pipelines
pip install -e ID-LoRA/ID-LoRA-2.3/packages/ltx-trainer
```

```bash
cd ComfyUI/custom_nodes
git clone <this-repo-url> comfyui-id-lora-ltx
```

```bash
python ComfyUI/main.py
```

Then open http://127.0.0.1:8188 in your browser. The five nodes will appear under the ID-LoRA category in the node menu.
Two ready-to-use workflows are included in the example_workflows/ directory:
- `id_lora_one_stage.json` — one-stage pipeline
- `id_lora_two_stage.json` — two-stage pipeline with 2x upsampling
In ComfyUI, go to Browse Templates and look for comfyui-id-lora-ltx to load them. They wire all nodes together with LoadImage, LoadAudio, and SaveVideo.
The example workflows use poster_image.png (first frame) and reference.mp3 (speaker reference audio), which are included in the example_inputs/ directory. Copy them into ComfyUI's input/ folder so the LoadImage and LoadAudio nodes can find them:
```bash
cp ComfyUI/custom_nodes/comfyui-id-lora-ltx/example_inputs/poster_image.png ComfyUI/input/
cp ComfyUI/custom_nodes/comfyui-id-lora-ltx/example_inputs/reference.mp3 ComfyUI/input/
```

Then load either workflow template from ComfyUI and click Queue Prompt to run.
Loads the base LTX-2.3 checkpoint, Gemma text encoder, and ID-LoRA weights into a one-stage pipeline object. This is the slow/expensive node — ComfyUI caches its output when inputs don't change.
| Input | Type | Default | Description |
|---|---|---|---|
| `checkpoint_path` | String | `models/ltx-2.3-22b-dev.safetensors` | Path to LTX-2.3 checkpoint |
| `text_encoder_path` | String | `models/gemma-3-12b-it-qat-q4_0-unquantized` | Path to Gemma text encoder directory |
| `lora_path` | String | (empty) | Path to ID-LoRA `.safetensors` |
| `lora_strength` | Float | 1.0 | LoRA strength (0.0-2.0) |
| `quantize` | Combo | `none` | `none`, `int8`, or `fp8` |
| `stg_scale` | Float | 1.0 | Spatio-Temporal Guidance scale (0 disables) |
| `identity_guidance_scale` | Float | 3.0 | Identity guidance strength |
| `av_bimodal_scale` | Float | 3.0 | Audio-video bimodal CFG scale |
Output: ID_LORA_PIPELINE
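For a rough sense of what the `quantize` option buys, here is back-of-envelope weight-memory arithmetic. The bytes-per-parameter figures are standard for each dtype, but actual savings depend on which layers get quantized and on activation memory:

```python
PARAMS = 22e9  # 22B-parameter transformer

def weight_gb(bytes_per_param):
    """Weight memory in GB for the transformer at a given precision."""
    return PARAMS * bytes_per_param / 1e9

print(f"bf16 weights: ~{weight_gb(2):.0f} GB")  # ~44 GB, matches the checkpoint size
print(f"int8 weights: ~{weight_gb(1):.0f} GB")  # ~22 GB
print(f"fp8 weights:  ~{weight_gb(1):.0f} GB")  # fp8 is also 1 byte per weight
```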
Loads the base checkpoint, text encoder, ID-LoRA, spatial upsampler, and distilled LoRA for two-stage generation. Like the one-stage loader, ComfyUI caches this node.
| Input | Type | Default | Description |
|---|---|---|---|
| `checkpoint_path` | String | `models/ltx-2.3-22b-dev.safetensors` | Path to LTX-2.3 checkpoint |
| `text_encoder_path` | String | `models/gemma-3-12b-it-qat-q4_0-unquantized` | Path to Gemma text encoder directory |
| `lora_path` | String | (empty) | Path to ID-LoRA `.safetensors` |
| `lora_strength` | Float | 1.0 | LoRA strength (0.0-2.0) |
| `upsampler_path` | String | `models/ltx-2.3-spatial-upscaler-x2-1.1.safetensors` | Path to spatial upsampler |
| `distilled_lora_path` | String | `models/ltx-2.3-22b-distilled-lora-384.safetensors` | Path to distilled LoRA for stage 2 |
| `quantize` | Combo | `none` | `none`, `int8`, or `fp8` |
| `stg_scale` | Float | 1.0 | Spatio-Temporal Guidance scale (0 disables) |
| `identity_guidance_scale` | Float | 3.0 | Identity guidance strength |
| `av_bimodal_scale` | Float | 3.0 | Audio-video bimodal CFG scale |
Output: ID_LORA_PIPELINE
Encodes positive and negative text prompts into conditioning tensors. Works with both one-stage and two-stage model loaders. Loads the text encoder temporarily, encodes, then frees it to save VRAM.
| Input | Type | Default | Description |
|---|---|---|---|
| `pipeline` | ID_LORA_PIPELINE | — | From either Model Loader |
| `prompt` | String (multiline) | (empty) | Positive prompt |
| `negative_prompt` | String (multiline) | (default negative) | Negative prompt |
Output: ID_LORA_CONDITIONING
Runs the one-stage generation pipeline: denoising loop with identity guidance, then decodes video and audio. Outputs a ComfyUI VIDEO that you wire directly to the built-in Save Video node.
| Input | Type | Default | Description |
|---|---|---|---|
| `pipeline` | ID_LORA_PIPELINE | — | From Model Loader |
| `conditioning` | ID_LORA_CONDITIONING | — | From Prompt Encoder |
| `first_frame` | IMAGE (optional) | None | First-frame image for face conditioning |
| `reference_audio` | AUDIO (optional) | None | Reference speaker audio for identity transfer |
| `seed` | Int | 42 | Random seed |
| `height` | Int | 512 | Output height (multiple of 32) |
| `width` | Int | 512 | Output width (multiple of 32) |
| `num_frames` | Int | 121 | Number of frames to generate |
| `num_inference_steps` | Int | 30 | Denoising steps |
| `frame_rate` | Float | 25.0 | Output frame rate |
| `video_guidance_scale` | Float | 3.0 | Video CFG scale |
| `audio_guidance_scale` | Float | 7.0 | Audio CFG scale |
| `auto_resolution` | Boolean | True | Auto-detect resolution from first frame, preserving aspect ratio |
| `max_resolution` | Int | 512 | Target long-side resolution when `auto_resolution` is on — increase for higher-res output |
Output: VIDEO (wire to Save Video)
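The `auto_resolution` behavior (cap the long side at `max_resolution`, preserve the aspect ratio, keep dimensions at multiples of 32) can be sketched as a small helper. This is a hypothetical reimplementation; the node's actual rounding rules may differ:

```python
def auto_resolution(frame_w, frame_h, max_resolution=512, multiple=32):
    """Scale so the long side is at most max_resolution, keeping the
    aspect ratio and snapping both sides to a multiple of 32."""
    scale = min(1.0, max_resolution / max(frame_w, frame_h))
    w = max(multiple, round(frame_w * scale / multiple) * multiple)
    h = max(multiple, round(frame_h * scale / multiple) * multiple)
    return w, h

print(auto_resolution(1920, 1080))  # 16:9 first frame capped at 512 -> (512, 288)
print(auto_resolution(512, 512))    # already within the cap -> (512, 512)
```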
Runs the two-stage generation pipeline. Stage 1 generates at the specified resolution with full guidance, then stage 2 refines at 2x resolution with the distilled LoRA. Output resolution is 2x the specified height/width.
| Input | Type | Default | Description |
|---|---|---|---|
| `pipeline` | ID_LORA_PIPELINE | — | From Two-Stage Model Loader |
| `conditioning` | ID_LORA_CONDITIONING | — | From Prompt Encoder |
| `first_frame` | IMAGE (optional) | None | First-frame image for face conditioning |
| `reference_audio` | AUDIO (optional) | None | Reference speaker audio for identity transfer |
| `seed` | Int | 42 | Random seed |
| `height` | Int | 512 | Stage 1 height (output = 2x this value) |
| `width` | Int | 512 | Stage 1 width (output = 2x this value) |
| `num_frames` | Int | 121 | Number of frames to generate |
| `num_inference_steps` | Int | 30 | Stage 1 denoising steps (stage 2 uses 3 fixed steps) |
| `frame_rate` | Float | 25.0 | Output frame rate |
| `video_guidance_scale` | Float | 3.0 | Video CFG scale |
| `audio_guidance_scale` | Float | 7.0 | Audio CFG scale |
| `auto_resolution` | Boolean | True | Auto-detect resolution from first frame, preserving aspect ratio |
| `max_resolution` | Int | 512 | Target long-side resolution when `auto_resolution` is on — increase for higher-res output (final output is 2x) |
| `hq_mode` | Boolean | True | Use res2s second-order sampler (higher quality, slower) |
Output: VIDEO (wire to Save Video)
Prompts use a structured `[VISUAL]` / `[SPEECH]` / `[SOUNDS]` format:

```
[VISUAL]: A medium shot of a young man with curly brown hair, sitting on a beige couch.
He is wearing a light blue shirt and speaking warmly.
[SPEECH]: We are proud to introduce ID-LoRA.
[SOUNDS]: The speaker has a moderate volume and conversational tone. Light instrumental
background music plays softly.
```
All three sections are optional but recommended for best results.
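A small helper for assembling the structured prompt. This is a hypothetical convenience function, not part of the package; it simply skips sections you leave empty, matching the note that all three are optional:

```python
def build_prompt(visual=None, speech=None, sounds=None):
    """Assemble a [VISUAL]/[SPEECH]/[SOUNDS] prompt, skipping empty sections."""
    sections = (("VISUAL", visual), ("SPEECH", speech), ("SOUNDS", sounds))
    return "\n".join(f"[{tag}]: {text.strip()}" for tag, text in sections if text)

print(build_prompt(
    visual="A medium shot of a young man with curly brown hair.",
    speech="We are proud to introduce ID-LoRA.",
))
```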
- Add ID-LoRA Model Loader — set paths to checkpoint, text encoder, and LoRA
- Add ID-LoRA Prompt Encoder — connect the pipeline, write your prompt
- Add Load Image + Load Audio — load face image and reference audio
- Add ID-LoRA One-Stage Sampler — connect everything
- Add Save Video — connect the video output
- Add ID-LoRA Two-Stage Model Loader — set paths (including upsampler and distilled LoRA)
- Add ID-LoRA Prompt Encoder — connect the pipeline, write your prompt
- Add Load Image + Load Audio — load face image and reference audio
- Add ID-LoRA Two-Stage Sampler — connect everything, optionally enable hq_mode
- Add Save Video — connect the video output
The output resolution will be 2x the stage 1 resolution (e.g. 512x512 stage 1 -> 1024x1024 output).
The full pipeline is memory-intensive — the 22B-parameter transformer alone requires significant VRAM. If you run into out-of-memory errors, here are several ways to reduce usage, roughly ordered from least to most impact on quality:

1. Enable quantization — set `quantize` to `int8` or `fp8` on the Model Loader node. This reduces the transformer's memory footprint substantially with a small quality tradeoff. `int8` is recommended as a good balance; `fp8` saves slightly more memory but may introduce more artifacts.
2. Lower `max_resolution` — reducing from 768 to 512 (or lower) significantly reduces VRAM during generation, since memory scales quadratically with resolution. For two-stage, this also reduces the stage-2 resolution proportionally (e.g. 512 → 1024 output instead of 768 → 1536).
3. Reduce `num_frames` — fewer frames means less latent state in memory. Try 121 instead of 242 if memory is tight.
4. Disable HQ mode (two-stage only) — the res2s sampler uses more memory than Euler due to its second-order midpoint evaluation. Turning off `hq_mode` saves memory at the cost of some quality.
5. Use one-stage instead of two-stage — the two-stage pipeline loads two separate transformers sequentially and upsamples the latent to 2x, which has a higher peak memory than one-stage.

These options can be combined. For example, `int8` quantization + `max_resolution=512` + one-stage should run on 24 GB GPUs.
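The scaling claims above in numbers. The constants here are illustrative and only the relative factors are meaningful: activation memory grows roughly quadratically with spatial resolution and linearly with frame count:

```python
def relative_memory(resolution, frames, base_res=512, base_frames=121):
    """Activation memory relative to the 512px / 121-frame baseline."""
    return (resolution / base_res) ** 2 * (frames / base_frames)

print(relative_memory(512, 121))  # 1.0  -- baseline
print(relative_memory(768, 121))  # 2.25 -- 1.5x resolution costs 2.25x memory
print(relative_memory(512, 242))  # 2.0  -- doubling frames doubles latent state
```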
- Model paths can be relative or absolute. Relative paths are resolved from the repository root (3 levels above the custom node directory).
- Auto-resolution (enabled by default) matches the first frame's aspect ratio while capping the long side at `max_resolution` (default 512 px). To generate at higher resolution, increase `max_resolution` on the sampler node — the aspect ratio is always preserved. For the two-stage pipeline, this controls stage 1; the final output will be 2x that (e.g. `max_resolution=768` produces up to 1536 px output).
- Two-stage VRAM: Stage-1 models (transformer, audio encoder) are fully freed before loading stage-2 models. The video encoder is shared for upsampling, then freed.
- HQ mode (two-stage only): Uses the res2s second-order sampler instead of Euler in both stages for higher quality at the cost of speed.
- Reusable pipeline (one-stage only): The video encoder is stashed to CPU (not deleted) after encoding, so re-running with different prompts/seeds doesn't require a full reload.
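The relative-path rule in the first note can be sketched as follows. This is a hypothetical helper; only the "3 levels above the custom node directory" rule is taken from this README:

```python
from pathlib import Path

def resolve_model_path(path_str, node_dir):
    """Absolute paths pass through; relative paths resolve against the
    repository root, 3 levels above the custom node directory."""
    p = Path(path_str)
    if p.is_absolute():
        return p
    # 3 levels up: custom_nodes -> ComfyUI -> repository root
    repo_root = Path(node_dir).resolve().parents[2]
    return repo_root / p

print(resolve_model_path(
    "models/ltx-2.3-22b-dev.safetensors",
    "/repo/ComfyUI/custom_nodes/comfyui-id-lora-ltx",
))
```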
```
comfyui-id-lora-ltx/
├── __init__.py              # Extension entry point (registers all 5 nodes)
├── nodes_model_loader.py    # IDLoraModelLoader + IDLoraTwoStageModelLoader nodes
├── nodes_prompt_encoder.py  # IDLoraPromptEncoder node (shared)
├── nodes_sampler.py         # IDLoraOneStageSampler + IDLoraTwoStageSampler nodes
├── pipeline_wrapper.py      # _IDLoraBase, IDLoraOneStagePipeline, IDLoraTwoStagePipeline
├── example_inputs/
│   ├── poster_image.png     # Example first-frame image
│   └── reference.mp3        # Example reference audio
├── example_outputs/
│   └── two_stage_demo.mp4   # Sample two-stage output
├── example_workflows/
│   ├── id_lora_one_stage.json  # One-stage workflow template
│   └── id_lora_two_stage.json  # Two-stage workflow template
├── pyproject.toml           # Package metadata
├── requirements.txt         # Dependencies
└── README.md
```
```bibtex
@misc{dahan2026idloraidentitydrivenaudiovideopersonalization,
  title         = {ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA},
  author        = {Aviad Dahan and Moran Yanuka and Noa Kraicer and Lior Wolf and Raja Giryes},
  year          = {2026},
  eprint        = {2603.10256},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2603.10256}
}
```