
Hunyuan Video Lora Convert (OneTrainer to ComfyUI)#1374

Open
Calamdor wants to merge 11 commits into Nerogar:master from Calamdor:HYV_Lora_Refactor

Conversation

@Calamdor
Contributor

This pull request adds an option to the convert tool in OneTrainer.

When Hunyuan Video is selected, it is now possible to select "ComfyUI Lora" for the output format. This option is hidden for other models.

This will only convert the blocks to the ComfyUI format and strip the remaining keys.
In testing, trying to map the other keys (which are conditioning keys) only degraded output generation quality. Maybe they have a use, but no other community LoRA that I have seen for Hunyuan Video includes them.
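The blocks-only stripping described above can be sketched as follows. This is a minimal illustration with hypothetical names; the actual OneTrainer key sets and prefixes may differ:

```python
# Hypothetical sketch: keep only double_blocks/single_blocks LoRA keys and
# drop conditioning keys such as guidance_in, time_in, txt_in, vector_in,
# and final_layer. BLOCK_PREFIXES and filter_to_blocks are illustrative names.
BLOCK_PREFIXES = ("transformer.double_blocks.", "transformer.single_blocks.")

def filter_to_blocks(state_dict: dict) -> dict:
    """Return only the transformer block keys; strip conditioning keys."""
    return {k: v for k, v in state_dict.items() if k.startswith(BLOCK_PREFIXES)}
```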

The analysis from Claude Code when working on this on the non-block keys:

 The 67 non-block keys fall into two categories:

  Likely harmless / correct (text processing, independent of inference settings):
  - txt_in.input_embedder — projects LLAMA tokens, portable
  - txt_in.individual_token_refiner.blocks.* — token refinement, text-side only
  - txt_in.c_embedder.*, txt_in.t_embedder.* — text conditioning embedders
  - vector_in.* — pooled CLIP-L vector embedding
  - final_layer.linear — output projection
  - final_layer.adaLN_modulation.1 — final adaptive norm (chunks correctly swapped)

  The problem child:
  - guidance_in.* — embeds the CFG guidance scale value (a scalar → 3072-dim). The LoRA learned how to respond to
  guidance during OT's training setup. But if OT trained with guidance=1.0 and ComfyUI infers with a
  different guidance scale, this LoRA actively distorts the guidance signal → quality regression.

Theoretically, DoRA and LoHa should work as well, and OFTv2 if you have the plugin.

This does not change normal OT HYV LoRA generation.
This will be a user-initiated action that has some pain points (mostly if you have many LoRAs to convert).
Theoretically, ComfyUI-format LoRAs will load into OT with these changes (LoRA tab, base model), but this is not a use case I work with.
The llama keys (a theoretical use case for TE training) are moved to lora_llama so that te1 can remain CLIP for ComfyUI.
This is not how OT handles this (te1=llama).

Feedback welcome.

Calamdor and others added 5 commits March 15, 2026 10:30
Implements ModelFormat.COMFY_LORA save support so OT-trained HYV LoRAs
can be loaded directly in ComfyUI without manual key remapping.

Changes:
- convert_hunyuan_video_lora.py: add convert_hunyuan_video_lora_comfyui_key_sets()
  with transformer root prefix "transformer" (dot notation, no lora_ prefix),
  CLIP-L mapped to lora_te1_* (ComfyUI convention), LLAMA to lora_llama_*
  (distinct prefix to avoid collision with CLIP's lora_te1_*)

- LoRASaverMixin: implement COMFY_LORA case in _save(); add _get_comfyui_convert_key_sets()
  override point (defaults to standard key sets) and __save_comfyui() which
  calls convert_to_diffusers() with the ComfyUI key sets

- HunyuanVideoLoRASaver: override _get_comfyui_convert_key_sets() to return
  the HYV-specific ComfyUI key sets

- LoRALoaderMixin: add _preprocess_state_dict() no-op hook called before key
  conversion in both safetensors and ckpt load paths

- HunyuanVideoLoRALoader: override _preprocess_state_dict() to remap ComfyUI
  format keys back to OT diffusers format for round-trip loading
  (transformer.* -> lora_transformer.*, lora_te1_* -> lora_te2_*,
  lora_llama_* -> lora_te1_*)

- ConvertModelUI: add "ComfyUI LoRA" to output format dropdown, shown only
  when Hunyuan Video model type is selected; resets to Safetensors if model
  type is switched away from HYV while ComfyUI LoRA is active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
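The round-trip remap listed in the loader bullets above can be sketched roughly like this (a hypothetical helper; the real HunyuanVideoLoRALoader._preprocess_state_dict may differ in detail). Because each key matches at most one branch, the lora_te1 -> lora_te2 and lora_llama -> lora_te1 renames cannot collide:

```python
# Hypothetical sketch of the ComfyUI -> OT diffusers prefix remap:
#   transformer.* -> lora_transformer.*
#   lora_te1_*    -> lora_te2_*   (CLIP-L back to OT's te2 slot)
#   lora_llama_*  -> lora_te1_*   (LLAMA back to OT's te1 slot)
def remap_comfyui_to_ot(state_dict: dict) -> dict:
    remapped = {}
    for k, v in state_dict.items():
        if k.startswith("transformer."):
            k = "lora_" + k
        elif k.startswith("lora_te1_"):
            k = "lora_te2_" + k[len("lora_te1_"):]
        elif k.startswith("lora_llama_"):
            k = "lora_te1_" + k[len("lora_llama_"):]
        remapped[k] = v
    return remapped
```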
__detect_source failed to identify the source format when using ComfyUI
key sets against OT's internal diffusers state dict, because the prefix
mappings differ (lora_te2 vs lora_te1, lora_transformer vs transformer,
etc). With source='', __convert used in_prefix='' which matched every key
against the first key set (bundle_emb), prepending 'bundle_emb' to all
output keys.

Fix: two-step conversion in __save_comfyui — normalize to OMI first using
the standard key sets (which correctly identify the source format), then
convert from OMI to ComfyUI diffusers format using the ComfyUI key sets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rging

Complete rewrite of the ComfyUI conversion based on inspecting a working
ComfyUI HunyuanVideo LoRA and ComfyUI source (comfy/lora.py, comfy/ldm/flux/layers.py):

Key format findings:
- ComfyUI uses native HYV attribute names (OMI paths), not HuggingFace diffusers names
- transformer.double_blocks/single_blocks with fc1/fc2 (not fc0) for MLP first linear
- lora_A.weight / lora_B.weight naming (diffusers-pipe convention), no alpha tensor
- CLIP-L: legacy underscore format lora_te1_* (lora_down/lora_up/alpha)
- QKV: combined single weight in ComfyUI vs separate Q/K/V in OT (HF diffusers)

QKV combination (lossless, exact):
- OT trains separate LoRA adapters for Q, K, V (and proj_mlp for single blocks)
  because HF diffusers splits the combined linear1/qkv weight into sub-layers
- ComfyUI applies one LoRA to the combined weight
- Fix: build block-diagonal lora_B and concatenated scaled lora_A so that
  lora_B @ lora_A reproduces the sum of original per-component deltas exactly
- Alpha scaling baked into lora_A per component (alpha_i/rank_i), no alpha tensor
  in output — ComfyUI applies at scale 1.0 which is correct

Other path renames:
- .mlp.0. -> .mlp.fc1., .mlp.2. -> .mlp.fc2. (time_in, guidance_in)
- .fc0. -> .fc1. (img_mlp, txt_mlp, token refiner mlp first linear)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
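The block-diagonal construction described under "QKV combination" can be illustrated with a small numpy sketch, assuming per-component (A, B, alpha) tensors. The names here are hypothetical, not OT's actual helpers:

```python
import numpy as np

# Sketch of the lossless fusion: build a block-diagonal lora_B and a
# row-concatenated lora_A so that lora_B @ lora_A stacks the per-component
# deltas (alpha_i / rank_i) * B_i @ A_i exactly, with alpha baked into A.
def fuse_qkv(components):
    """components: list of (A, B, alpha) with A: (rank, in), B: (out, rank)."""
    in_dim = components[0][0].shape[1]
    total_rank = sum(A.shape[0] for A, _, _ in components)
    total_out = sum(B.shape[0] for _, B, _ in components)
    fused_A = np.zeros((total_rank, in_dim))
    fused_B = np.zeros((total_out, total_rank))
    r_off = out_off = 0
    for A, B, alpha in components:
        rank, out = A.shape[0], B.shape[0]
        fused_A[r_off:r_off + rank] = (alpha / rank) * A   # bake alpha into A
        fused_B[out_off:out_off + out, r_off:r_off + rank] = B  # diagonal block
        r_off += rank
        out_off += out
    return fused_A, fused_B
```

Since the off-diagonal blocks of fused_B are zero, component i of the product only sees its own rows of fused_A, so the combined adapter reproduces the sum of the separate Q/K/V deltas exactly.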
MLPEmbedder layers (guidance_in, time_in, txt_in embedders) use
in_layer/out_layer in ComfyUI's native model, not mlp.fc1/mlp.fc2
or linear_1/linear_2. Adds specific renames for these layers in
the ComfyUI export path and their reversal in the round-trip loader.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Conditioning/embedding layers (guidance_in, time_in, txt_in, vector_in,
final_layer) are inference-setting-dependent and absent from all reference
ComfyUI HYV LoRAs. Including them caused quality degradation. This matches
the established community convention of blocks-only HYV LoRAs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dxqb
Collaborator

dxqb commented Mar 15, 2026

does Comfy need a Comfy-specific format for LoRA, or is this only a LoRA format derived from the non-diffusers model?
could you have a look at this please, and see if a conversion pattern would be sufficient? #1236

It's quite new and only used for Flux2 checkpoints currently. But it replaces all the model-specific conversion code.

@Calamdor
Contributor Author

Sorry, I totally forgot about that ongoing work and you have tagged me in it a few times. Let me review.

Calamdor and others added 2 commits March 15, 2026 12:56
Replace hand-rolled QKV grouping and _combine_qkv() with the
lora_qkv_fusion / lora_qkv_mlp_fusion helpers from convert_util
(PR Nerogar#1236). Key routing is now expressed as declarative
ConversionPattern tuples processed by convert_util.convert(),
making the structure consistent with the Flux2 conversion approach.

DoRA scale merging and text-encoder legacy remapping are kept
separate as convert_util doesn't handle those cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Calamdor
Contributor Author

Changed.
Note:

  - Kept dora_scale concatenation (framework doesn't handle it) and text encoder legacy remapping (framework doesn't support dot→underscore)

I cannot speak to the DoRA scale.
But the text encoder legacy remapping is just a theoretical use case in my opinion, included more for a complete "FULL" conversion, which I have already changed by removing non-block keys. I am thinking of removing them as well.

… attrs

- convert_util strict=False: avoids runtime crash when converting LoRAs
  trained on non-standard layer subsets (unknown keys pass through instead
  of raising).
- Remove "self_attn_qkv" from _COMFYUI_QKV_SPLIT_ATTRS: token refiner
  keys are filtered out in step 2 (blocks-only filter) and can never
  reach the DoRA merge step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@dxqb dxqb left a comment


incomplete review, just 1 comment

for k, v in state_dict.items():
    if k.startswith("transformer."):
        # Reverse ComfyUI conditioning MLPEmbedder renames → OT OMI paths
        k = k.replace(".guidance_in.in_layer.", ".guidance_in.mlp.0.")
Collaborator

this is still code that I'd like to avoid, given the new conversion tool.
I'm not 100% sure if this is possible with the current conversion tool because it's incomplete, but still I'd like to avoid merging new code that does manual string manipulation like this.

Contributor Author

I have attempted to refactor the PR to address this comment.
This is now just a OneTrainer -> Comfy/community LoRA convert workflow, although LoRAs going through this workflow can still be loaded back into OneTrainer with edits to HunyuanVideoLoRALoader._preprocess_state_dict.
convert_util has changes (including a fix for a bug when children is set to None) to allow unfuse (lora_unfuse_qkv / lora_unfuse_qkv_mlp).
Let me know of any more comments, and thanks!

@dxqb dxqb added merging last steps before merge and removed merging last steps before merge labels Mar 28, 2026
Calamdor and others added 2 commits March 28, 2026 14:56
…on patterns

- lora_unfuse_qkv: inverse of lora_fuse_qkv. Extracts the three diagonal
  blocks from OT's block-diagonal QKV adapter back into separate Q, K, V
  adapters. Exact round-trip with lora_fuse_qkv.

- lora_unfuse_qkv_mlp: inverse of lora_fuse_qkv_mlp. Recovers per-component
  dims from the block-diagonal structure via column-block non-zero detection,
  so Q/K/V and MLP dims need not be equal.

- _not_implemented: sentinel for the non-reversible fallback patterns in
  lora_qkv_mlp_fusion (qkv-only and mlp-only cases), so
  reverse_conversion_pattern() is constructable for all patterns without
  raising at construction time.

- Wire lora_unfuse_qkv / lora_unfuse_qkv_mlp as reverse_convert_fn in
  lora_qkv_fusion / lora_qkv_mlp_fusion.

- Fix reverse_conversion_pattern crash when input.children is None.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
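The diagonal-block extraction behind the unfuse step can be sketched like this (simplified: ranks are passed in explicitly and alpha baking is ignored; unfuse_qkv is a hypothetical stand-in for the convert_util helper, which also recovers dims via column-block non-zero detection):

```python
import numpy as np

# Sketch: recover per-component (A_i, B_i) pairs from the fused adapter by
# detecting, for each rank column block of the block-diagonal fused_B, which
# output rows are non-zero.
def unfuse_qkv(fused_A, fused_B, ranks):
    """ranks: per-component rank sizes; returns list of (A_i, B_i)."""
    components = []
    r_off = out_off = 0
    for rank in ranks:
        cols = fused_B[:, r_off:r_off + rank]
        nonzero_rows = np.flatnonzero(np.any(cols != 0, axis=1))
        out = nonzero_rows[-1] + 1 - out_off  # height of this diagonal block
        components.append((fused_A[r_off:r_off + rank],
                           fused_B[out_off:out_off + out, r_off:r_off + rank]))
        r_off += rank
        out_off += out
    return components
```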
@Calamdor Calamdor requested a review from dxqb March 28, 2026 16:02
