Hunyuan Video Lora Convert (OneTrainer to ComfyUI) #1374
Calamdor wants to merge 11 commits into Nerogar:master from
Conversation
Implements ModelFormat.COMFY_LORA save support so OT-trained HYV LoRAs can be loaded directly in ComfyUI without manual key remapping. Changes:

- convert_hunyuan_video_lora.py: add convert_hunyuan_video_lora_comfyui_key_sets() with transformer root prefix "transformer" (dot notation, no lora_ prefix); CLIP-L mapped to lora_te1_* (ComfyUI convention); LLAMA to lora_llama_* (a distinct prefix to avoid collision with CLIP's lora_te1_*)
- LoRASaverMixin: implement the COMFY_LORA case in _save(); add a _get_comfyui_convert_key_sets() override point (defaults to the standard key sets) and __save_comfyui(), which calls convert_to_diffusers() with the ComfyUI key sets
- HunyuanVideoLoRASaver: override _get_comfyui_convert_key_sets() to return the HYV-specific ComfyUI key sets
- LoRALoaderMixin: add a _preprocess_state_dict() no-op hook called before key conversion in both the safetensors and ckpt load paths
- HunyuanVideoLoRALoader: override _preprocess_state_dict() to remap ComfyUI-format keys back to OT diffusers format for round-trip loading (transformer.* -> lora_transformer.*, lora_te1_* -> lora_te2_*, lora_llama_* -> lora_te1_*)
- ConvertModelUI: add "ComfyUI LoRA" to the output format dropdown, shown only when the Hunyuan Video model type is selected; resets to Safetensors if the model type is switched away from HYV while ComfyUI LoRA is active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
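The loader-side remap in the last bullet can be sketched as a single pass over the state dict (a minimal illustration, not the actual _preprocess_state_dict implementation; tensor values pass through untouched):

```python
# Sketch of the ComfyUI -> OT key remap (illustrative only). Each key is
# matched exactly once, so the lora_llama_* -> lora_te1_* move cannot
# collide with CLIP's old lora_te1_* keys, which are routed to lora_te2_*
# in the same pass.
def remap_comfyui_to_ot(state_dict):
    out = {}
    for key, tensor in state_dict.items():
        if key.startswith("transformer."):
            out["lora_" + key] = tensor                          # -> lora_transformer.*
        elif key.startswith("lora_te1_"):
            out["lora_te2_" + key[len("lora_te1_"):]] = tensor   # CLIP-L
        elif key.startswith("lora_llama_"):
            out["lora_te1_" + key[len("lora_llama_"):]] = tensor # LLAMA
        else:
            out[key] = tensor
    return out
```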
__detect_source failed to identify the source format when using the ComfyUI key sets against OT's internal diffusers state dict, because the prefix mappings differ (lora_te2 vs lora_te1, lora_transformer vs transformer, etc.). With source='', __convert used in_prefix='', which matched every key against the first key set (bundle_emb), prepending 'bundle_emb' to all output keys.

Fix: two-step conversion in __save_comfyui — normalize to OMI first using the standard key sets (which correctly identify the source format), then convert from OMI to ComfyUI diffusers format using the ComfyUI key sets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rging

Complete rewrite of the ComfyUI conversion based on inspecting a working ComfyUI HunyuanVideo LoRA and ComfyUI source (comfy/lora.py, comfy/ldm/flux/layers.py).

Key format findings:
- ComfyUI uses native HYV attribute names (OMI paths), not HuggingFace diffusers names
- transformer.double_blocks/single_blocks with fc1/fc2 (not fc0) for the MLP first linear
- lora_A.weight / lora_B.weight naming (diffusers-pipe convention), no alpha tensor
- CLIP-L: legacy underscore format lora_te1_* (lora_down/lora_up/alpha)
- QKV: combined single weight in ComfyUI vs separate Q/K/V in OT (HF diffusers)

QKV combination (lossless, exact):
- OT trains separate LoRA adapters for Q, K, V (and proj_mlp for single blocks) because HF diffusers splits the combined linear1/qkv weight into sub-layers
- ComfyUI applies one LoRA to the combined weight
- Fix: build a block-diagonal lora_B and a concatenated scaled lora_A so that lora_B @ lora_A reproduces the sum of the original per-component deltas exactly
- Alpha scaling baked into lora_A per component (alpha_i/rank_i), no alpha tensor in output — ComfyUI applies at scale 1.0, which is correct

Other path renames:
- .mlp.0. -> .mlp.fc1., .mlp.2. -> .mlp.fc2. (time_in, guidance_in)
- .fc0. -> .fc1. (img_mlp, txt_mlp, token refiner mlp first linear)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
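The QKV combination can be checked with a small numpy sketch (illustrative, not OneTrainer's actual code): stacking the alpha-scaled lora_A matrices and placing each lora_B on the block diagonal reproduces the vertically stacked per-component deltas exactly.

```python
import numpy as np

def fuse_qkv(components):
    """Fuse per-component LoRA adapters (e.g. Q, K, V) into one adapter for
    the combined weight. components: list of (lora_A, lora_B, alpha) with
    lora_A of shape (rank_i, in_dim) and lora_B of shape (out_i, rank_i).
    alpha_i / rank_i is baked into lora_A, so no alpha tensor is needed and
    ComfyUI's scale-1.0 application is exact."""
    scaled_As = [(alpha / A.shape[0]) * A for A, B, alpha in components]
    Bs = [B for A, B, alpha in components]
    A_fused = np.vstack(scaled_As)  # (sum rank_i, in_dim)
    B_fused = np.zeros((sum(B.shape[0] for B in Bs),
                        sum(B.shape[1] for B in Bs)))
    row = col = 0
    for B in Bs:  # place each lora_B on the block diagonal
        B_fused[row:row + B.shape[0], col:col + B.shape[1]] = B
        row += B.shape[0]
        col += B.shape[1]
    return A_fused, B_fused
```

Because the off-diagonal blocks of B_fused are zero, B_fused @ A_fused equals the vertical stack of each (alpha_i/rank_i) * B_i @ A_i, which is exactly the delta of the combined QKV weight. Note the per-component ranks need not be equal.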
MLPEmbedder layers (guidance_in, time_in, txt_in embedders) use in_layer/out_layer in ComfyUI's native model, not mlp.fc1/mlp.fc2 or linear_1/linear_2. Adds specific renames for these layers in the ComfyUI export path and their reversal in the round-trip loader. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Conditioning/embedding layers (guidance_in, time_in, txt_in, vector_in, final_layer) are inference-setting-dependent and absent from all reference ComfyUI HYV LoRAs. Including them caused quality degradation. This matches the established community convention of blocks-only HYV LoRAs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
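The resulting blocks-only filter amounts to a one-line dict comprehension (a sketch; the substrings follow the transformer.double_blocks / transformer.single_blocks layout named earlier, and the sample key names in the comment are illustrative):

```python
# Blocks-only filter (sketch): keep only double/single block adapters and
# drop conditioning/embedding keys such as guidance_in, time_in, txt_in,
# vector_in, and final_layer.
def keep_blocks_only(state_dict):
    return {
        key: tensor
        for key, tensor in state_dict.items()
        if ".double_blocks." in key or ".single_blocks." in key
    }
```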
does Comfy need a Comfy-specific format for LoRA, or is this only a LoRA format derived from the non-diffusers model? It's quite new and only used for Flux2 checkpoints currently. But it replaces all the model-specific conversion code.
Sorry, I totally forgot about that ongoing work, even though you have tagged me in it a few times. Let me review.
Replace the hand-rolled QKV grouping and _combine_qkv() with the lora_qkv_fusion / lora_qkv_mlp_fusion helpers from convert_util (PR Nerogar#1236). Key routing is now expressed as declarative ConversionPattern tuples processed by convert_util.convert(), making the structure consistent with the Flux2 conversion approach. DoRA scale merging and text-encoder legacy remapping are kept separate, as convert_util doesn't handle those cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
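The declarative idea can be illustrated with a toy pattern table (names and shapes hypothetical — this is not convert_util's real ConversionPattern API), using the path renames listed in an earlier commit:

```python
# Hypothetical illustration of declarative key routing: each pattern is a
# (needle, rename) tuple, applied by one generic driver instead of
# per-model string manipulation scattered through the savers/loaders.
PATTERNS = [
    (".mlp.0.", lambda k: k.replace(".mlp.0.", ".mlp.fc1.")),
    (".mlp.2.", lambda k: k.replace(".mlp.2.", ".mlp.fc2.")),
    (".fc0.", lambda k: k.replace(".fc0.", ".fc1.")),
]

def apply_patterns(state_dict, patterns):
    out = {}
    for key, tensor in state_dict.items():
        for needle, rename in patterns:
            if needle in key:
                key = rename(key)
        out[key] = tensor
    return out
```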
Changed. I can't speak to the DoRA scale.
… attrs

- convert_util strict=False: avoids a runtime crash when converting LoRAs trained on non-standard layer subsets (unknown keys pass through instead of raising).
- Remove "self_attn_qkv" from _COMFYUI_QKV_SPLIT_ATTRS: token refiner keys are filtered out in step 2 (the blocks-only filter) and can never reach the DoRA merge step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dxqb
left a comment
incomplete review, just 1 comment
```python
for k, v in state_dict.items():
    if k.startswith("transformer."):
        # Reverse ComfyUI conditioning MLPEmbedder renames → OT OMI paths
        k = k.replace(".guidance_in.in_layer.", ".guidance_in.mlp.0.")
```
this is still code that I'd like to avoid, given the new conversion tool.
I'm not 100% sure if this is possible with the current conversion tool because it's incomplete, but still I'd like to avoid merging new code that does manual string manipulation like this.
I have attempted to refactor the PR to better handle this comment.
This is now just a OneTrainer -> Comfy/Community LoRA convert workflow. LoRAs that go through this workflow will still be loadable back into OneTrainer via the edits to HunyuanVideoLoRALoader._preprocess_state_dict.
convert_util has changes (including a fix for one bug when children is set to None) to allow unfusing (lora_unfuse_qkv / lora_unfuse_qkv_mlp).
Let me know of any more comments and thanks!
…on patterns

- lora_unfuse_qkv: inverse of lora_fuse_qkv. Extracts the three diagonal blocks from OT's block-diagonal QKV adapter back into separate Q, K, V adapters. Exact round-trip with lora_fuse_qkv.
- lora_unfuse_qkv_mlp: inverse of lora_fuse_qkv_mlp. Recovers per-component dims from the block-diagonal structure via column-block non-zero detection, so Q/K/V and MLP dims need not be equal.
- _not_implemented: sentinel for the non-reversible fallback patterns in lora_qkv_mlp_fusion (the qkv-only and mlp-only cases), so reverse_conversion_pattern() is constructable for all patterns without raising at construction time.
- Wire lora_unfuse_qkv / lora_unfuse_qkv_mlp as reverse_convert_fn in lora_qkv_fusion / lora_qkv_mlp_fusion.
- Fix a reverse_conversion_pattern crash when input.children is None.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
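The column-block non-zero detection can be sketched in numpy (illustrative, not the actual lora_unfuse_qkv code; it assumes no lora_B block has an all-zero column, which holds for trained adapters):

```python
import numpy as np

def unfuse_blocks(A_fused, B_fused, out_dims):
    """Split a fused block-diagonal adapter back into per-component
    adapters. out_dims lists the output rows of each component (Q, K, V,
    optionally MLP). The rank of each component is recovered by finding
    which columns of B_fused are non-zero within its row block, so the
    per-component ranks need not be equal or known in advance."""
    components = []
    row = 0
    for out_dim in out_dims:
        block = B_fused[row:row + out_dim]
        cols = np.flatnonzero(np.any(block != 0, axis=0))
        components.append((A_fused[cols], block[:, cols]))  # (scaled A_i, B_i)
        row += out_dim
    return components
```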
…ainer into HYV_Lora_Refactor
This pull request adds an option to the convert tool in OneTrainer.
When Hunyuan Video is selected, it is now possible to select "ComfyUI Lora" for the output format. This option is hidden for other models.

This will only convert the blocks to the ComfyUI format, and strip the rest of the keys.
In testing, trying to map the other keys (which are conditioning keys) only degraded output generation quality. Maybe they have a use, but no other community LoRA that I have seen for Hunyuan Video has them.
The analysis from Claude Code when working on this on the non-block keys:
Theoretically, DoRA and LoHa should work as well, and OFTv2 if you have the plugin.
This does not change normal OT HYV Lora generation.
This will be a user-initiated action that has some pain points (mostly if you have many LoRAs to convert).
Theoretically, ComfyUI-format LoRAs will load into OT with these changes, but this is not a use case I work with. (Lora tab, base model)
The llama keys (a theoretical use case for TE training) are moved to lora_llama to allow te1 to remain CLIP for ComfyUI.
This is not how OT handles this (te1=llama).
Feedback welcome.