Add HunYuan Dense V1 (hunyuan_v1_dense) model support #2144
Open
anilmartha wants to merge 22 commits into
Adds builder and runtime support for tencent/HY-MT series models (HunYuanDenseV1ForCausalLM). Key implementation details:
- New HunyuanDenseV1Model builder (src/python/py/models/builders/hunyuan.py) that overrides make_attention_qk_subgraph to apply QK norms AFTER RoPE, matching the model's architecture (the standard base class applies QK norm before RoPE).
- Dynamic NTK-alpha RoPE scaling is baked into a static rope_theta at export time: effective_theta = rope_theta * alpha^(head_dim/(head_dim-2))
- Forces disable_qkv_fusion and use_rope_in_attn=False to enable the separate Q/K paths required for post-RoPE QK norm insertion.
- Registers "hunyuan_v1_dense" in the LLM array in model_type.h (size 21 -> 22).
- Includes an example inference script that uses the HF tokenizer for correct special-token handling.
Supports the 1.8B and 7B variants of tencent/HY-MT1.5 (same architecture class). Requires transformers>=4.57 for native HunYuanDenseV1 support.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor, Author:
@microsoft-github-policy-service agree company="AMD"
Contributor:
In the older PR, I added several review comments. Let's fix them in this PR.
…oft#2045) Split the monolithic make_attention_qk_subgraph in base.py into three composable methods that subclasses can override individually:
- make_attention_qk_norm(layer_id, attention): makes the Q/K SimplifiedLayerNorm nodes when q_norm/k_norm are set.
- make_attention_qk_rope(layer_id, **kwargs) -> (cos_cache, sin_cache): makes the RotaryEmbedding nodes (or only the caches, for use_rope_in_attn).
- make_attention_qk_rope_and_norm(layer_id, attention, **kwargs): calls norm, then RoPE (the base order), and returns the cache names.
make_attention_qk_subgraph now delegates to make_attention_qk_rope_and_norm, so the rest of the method (repeat_kv, sinks, attention op) is unchanged.
HunyuanDenseV1Model no longer overrides the full make_attention_qk_subgraph. It overrides only make_attention_qk_rope_and_norm to reverse the order (RoPE first, then QK norm), which is the Hunyuan-specific requirement. It also explicitly sets attention_attrs["q_norm"] = True and attention_attrs["k_norm"] = True after super().__init__() so that make_attention_qk_norm emits the QK norm nodes. These flags default to False in base.py and are never auto-detected; every model that uses QK norm must set them explicitly.
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
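A minimal sketch of how the split lets a subclass reorder the ops without touching make_attention_qk_subgraph. Only the method names come from this commit message; HypotheticalBaseModel and HunyuanLikeModel are hypothetical stand-ins that record node names in a list instead of building ONNX nodes, and the "Attention" label is a placeholder for whatever attention op the real builder emits.

```python
class HypotheticalBaseModel:
    """Stand-in for the base builder; records node names instead of building ONNX nodes."""

    def __init__(self):
        # QK norm flags default to False and are never auto-detected.
        self.attention_attrs = {"q_norm": False, "k_norm": False}
        self.nodes = []

    def make_attention_qk_norm(self, layer_id, attention):
        if self.attention_attrs["q_norm"]:
            self.nodes.append(f"layers.{layer_id}.q_norm:SimplifiedLayerNorm")
        if self.attention_attrs["k_norm"]:
            self.nodes.append(f"layers.{layer_id}.k_norm:SimplifiedLayerNorm")

    def make_attention_qk_rope(self, layer_id, **kwargs):
        self.nodes.append(f"layers.{layer_id}.q_rope:RotaryEmbedding")
        self.nodes.append(f"layers.{layer_id}.k_rope:RotaryEmbedding")
        return "cos_cache", "sin_cache"

    def make_attention_qk_rope_and_norm(self, layer_id, attention, **kwargs):
        # Base order: QK norm first, then RoPE.
        self.make_attention_qk_norm(layer_id, attention)
        return self.make_attention_qk_rope(layer_id, **kwargs)

    def make_attention_qk_subgraph(self, layer_id, attention, **kwargs):
        cos_cache, sin_cache = self.make_attention_qk_rope_and_norm(layer_id, attention, **kwargs)
        # repeat_kv, sinks and the attention op would follow here, unchanged.
        self.nodes.append(f"layers.{layer_id}.attn:Attention")
        return cos_cache, sin_cache


class HunyuanLikeModel(HypotheticalBaseModel):
    def __init__(self):
        super().__init__()
        # Must be set explicitly, otherwise make_attention_qk_norm emits nothing.
        self.attention_attrs["q_norm"] = True
        self.attention_attrs["k_norm"] = True

    def make_attention_qk_rope_and_norm(self, layer_id, attention, **kwargs):
        # Hunyuan order: RoPE first, then QK norm.
        caches = self.make_attention_qk_rope(layer_id, **kwargs)
        self.make_attention_qk_norm(layer_id, attention)
        return caches


model = HunyuanLikeModel()
model.make_attention_qk_subgraph(0, attention=None)
print([name.split(":")[1] for name in model.nodes])
# ['RotaryEmbedding', 'RotaryEmbedding', 'SimplifiedLayerNorm', 'SimplifiedLayerNorm', 'Attention']
```

Overriding one small hook keeps the shared tail of the subgraph (repeat_kv, sinks, attention) in a single place in the base class.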
…ft#2045) Replace the explicit disable_qkv_fusion extra option with a make_attention_init override that sets q_norm and k_norm before the base class's packed-matmul check, matching the Qwen3Model pattern.
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
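Why the flags have to move into make_attention_init comes down to ordering. If the base constructor runs the packed-matmul check (an assumption here), flags set after super().__init__() arrive too late to influence it. The sketch below assumes, purely for illustration, that the check disables the packed Q/K/V MatMul whenever q_norm or k_norm is set. The class names and the use_packed_matmul key are hypothetical and do not reflect the real base.py logic.

```python
class HypotheticalBaseModel:
    def __init__(self):
        self.attention_attrs = {"q_norm": False, "k_norm": False}
        # Assumption: the constructor runs make_attention_init, including the packed-matmul check.
        self.make_attention_init()

    def make_attention_init(self):
        # Hypothetical check: QK norm needs separate Q and K outputs, so skip the packed QKV MatMul.
        needs_separate_qk = self.attention_attrs["q_norm"] or self.attention_attrs["k_norm"]
        self.attention_attrs["use_packed_matmul"] = not needs_separate_qk


class FlagsAfterInit(HypotheticalBaseModel):
    def __init__(self):
        super().__init__()
        # Too late: the packed-matmul decision was already made inside super().__init__().
        self.attention_attrs["q_norm"] = True
        self.attention_attrs["k_norm"] = True


class FlagsInMakeAttentionInit(HypotheticalBaseModel):
    def make_attention_init(self):
        # Set the flags first so the base check sees them.
        self.attention_attrs["q_norm"] = True
        self.attention_attrs["k_norm"] = True
        super().make_attention_init()


print(FlagsAfterInit().attention_attrs["use_packed_matmul"])            # True: Q/K would stay fused
print(FlagsInMakeAttentionInit().attention_attrs["use_packed_matmul"])  # False: separate Q/K paths
```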
Contributor, Author:
Hi @kunal-vaishnavi
Branch updated from 1ff8493 to bbe7f2e.
Extract the fused-RoPE support check into an overridable method on the base Model class, replacing the inline self.ep not in [dml] check in make_attention_init(). HunyuanDenseV1Model overrides it to return False because it needs explicit RotaryEmbedding nodes so that QK norms can be inserted between RoPE and attention. Addresses PR microsoft#2144 review feedback.
Co-authored-by: Cursor <cursoragent@cursor.com>
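A sketch of the extracted check. The commit message does not name the new method, so supports_fused_rope is a hypothetical name. The make_attention_init shown here is also simplified: it derives use_rope_in_attn from that one predicate, whereas the real base class may combine it with other conditions. The ep values are assumed to be plain strings such as "dml".

```python
class HypotheticalBaseModel:
    def __init__(self, ep):
        self.ep = ep
        self.attention_attrs = {}
        self.make_attention_init()

    def supports_fused_rope(self):
        # Hypothetical name for the extracted check; mirrors the former inline `self.ep not in [dml]`.
        return self.ep not in ["dml"]

    def make_attention_init(self):
        # Simplification: fuse RoPE into the attention op whenever the check allows it.
        self.attention_attrs["use_rope_in_attn"] = self.supports_fused_rope()


class HunyuanLikeModel(HypotheticalBaseModel):
    def supports_fused_rope(self):
        # Explicit RotaryEmbedding nodes are needed so QK norm can sit between RoPE and attention.
        return False


for cls, ep in [(HypotheticalBaseModel, "cuda"), (HypotheticalBaseModel, "dml"), (HunyuanLikeModel, "cuda")]:
    print(cls.__name__, ep, cls(ep).attention_attrs["use_rope_in_attn"])
```

Overriding a one-line predicate lets the subclass opt out of fused RoPE without copying the rest of make_attention_init.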
kunal-vaishnavi previously approved these changes on May 21, 2026
Auto-merge was automatically disabled on May 21, 2026 09:35 because the head branch was pushed to by a user without write access.
kunal-vaishnavi approved these changes on May 21, 2026
Summary
Adds model-builder and runtime support for Tencent's HY-MT series (architecture class: HunYuanDenseV1ForCausalLM), covering both the 1.8B and 7B parameter variants.
Changes
- src/python/py/models/builders/hunyuan.py: new HunyuanDenseV1Model builder subclassing Model (base). Key overrides:
  - make_attention_qk_subgraph: applies QK norms (query/key LayerNorm) after RoPE, the correct order for this architecture (the base class applies them before RoPE).
  - Bakes dynamic NTK-alpha RoPE scaling into a static rope_theta at export time: effective_theta = rope_theta × α^(head_dim / (head_dim − 2))
  - Clears rope_scaling so the standard RoPE codepath is used.
  - Forces disable_qkv_fusion=True and use_rope_in_attn=False to create the separate Q/K paths required for post-RoPE QK norm insertion.
- src/models/model_type.h: registers "hunyuan_v1_dense" in the LLM model-type array (21 → 22 entries).
- src/python/py/models/builder.py and src/python/py/models/builders/__init__.py: wire HunyuanDenseV1Model under the hunyuan_v1_dense model-type key.
- examples/python/test_hy_mt.py: example inference script using the HF tokenizer for correct special-token handling.
Architecture Notes
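HunYuan Dense V1 differs from Llama-style models in two ways:
- It applies query/key normalization (QK norm), and applies it after RoPE rather than before.
- Its RoPE uses dynamic NTK-alpha scaling, which the builder folds into a static rope_theta at export time.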
All weight names are standard (no custom mapping needed).
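The two HunYuan-specific behaviors described in this PR can be checked numerically with a small NumPy sketch: QK norm applied after RoPE, and NTK-alpha scaling folded into a static rope_theta. The sketch assumes HF-style RoPE, where element i is rotated together with element i + head_dim/2, and an RMS-style QK norm with a learned per-element weight. The rope_theta, alpha and head_dim values are placeholders, not HY-MT1.5's actual config. The first check shows that the static effective_theta leaves the highest RoPE frequency unchanged and divides the lowest one by exactly alpha. The second shows why the order matters: RoPE preserves the vector's RMS, so both orders agree for a constant norm weight but differ for a general learned one.

```python
import numpy as np


def effective_theta(rope_theta, alpha, head_dim):
    # Fold NTK-alpha scaling into one static RoPE base (formula from this PR).
    return rope_theta * alpha ** (head_dim / (head_dim - 2))


def inv_freq(theta, head_dim):
    return 1.0 / theta ** (np.arange(0, head_dim, 2) / head_dim)


def apply_rope(x, pos, theta):
    # HF-style RoPE on a 1-D vector: x * cos + rotate_half(x) * sin, pairing i with i + d/2.
    half = x.shape[-1] // 2
    angles = pos * inv_freq(theta, x.shape[-1])
    cos = np.concatenate([np.cos(angles)] * 2)
    sin = np.concatenate([np.sin(angles)] * 2)
    rotated_half = np.concatenate([-x[half:], x[:half]])
    return x * cos + rotated_half * sin


def rms_norm(x, weight, eps=1e-6):
    return x / np.sqrt(np.mean(x * x) + eps) * weight


# Check 1: placeholder values, not taken from any HY-MT config.
rope_theta, alpha, head_dim = 10000.0, 1000.0, 128
base = inv_freq(rope_theta, head_dim)
scaled = inv_freq(effective_theta(rope_theta, alpha, head_dim), head_dim)
print(np.isclose(base[0], scaled[0]), np.isclose(base[-1] / scaled[-1], alpha))  # True True

# Check 2: norm-then-RoPE vs RoPE-then-norm on one head vector.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
learned_w = rng.standard_normal(8)  # per-element norm weight
constant_w = np.full(8, 0.5)        # same weight in every rotated pair
pos, theta = 5, 10000.0


def both_orders_agree(w):
    norm_then_rope = apply_rope(rms_norm(x, w), pos, theta)
    rope_then_norm = rms_norm(apply_rope(x, pos, theta), w)
    return np.allclose(norm_then_rope, rope_then_norm)


print(both_orders_agree(learned_w), both_orders_agree(constant_w))  # False True
```

Because a trained QK norm weight generally differs across the paired dimensions, exporting with the base class's norm-then-RoPE order would change the model's outputs, which is why the builder needs the explicit RoPE-then-norm path.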
Requirement: transformers >= 4.57 for HunYuanDenseV1ForCausalLM.