Add Qwen3.6-27B contrib model with vLLM APC baseline #164
Open
m-deepankar-singh wants to merge 3 commits into
Working with our team to evaluate.
Summary
Relationship to PR #140
This PR keeps the same high-level Qwen3.6 architecture described in PR #140: Qwen3.6-27B is a post-training update of Qwen3.5-27B with the same `qwen3_5` architecture, 64 layers, and `[3 DeltaNet + 1 GQA]` layer pattern. The additional focus here is the production serving path.
Architecture
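The 64-layer hybrid stack described above can be expanded into a per-layer schedule. This is a minimal sketch, not code from the contrib: it assumes the GQA (full-attention) layer closes each group of four, so layers 3, 7, ..., 63 are GQA. `layer_types` and `cache_kind` are hypothetical helper names.

```python
# Hypothetical sketch of the [3 DeltaNet + 1 GQA] x 16 layer schedule.
# Assumption: the GQA (full-attention) layer closes each group of four.
from collections import Counter

NUM_LAYERS = 64
GROUP_SIZE = 4  # 3 DeltaNet layers + 1 GQA layer


def layer_types(num_layers: int = NUM_LAYERS, group_size: int = GROUP_SIZE) -> list[str]:
    """Return 'deltanet' or 'gqa' for each decoder layer index."""
    if num_layers % group_size != 0:
        raise ValueError("num_layers must be a multiple of group_size")
    return ["gqa" if (i + 1) % group_size == 0 else "deltanet" for i in range(num_layers)]


def cache_kind(layer_type: str) -> str:
    """GQA layers keep a KV cache that grows with context; DeltaNet layers keep fixed-size state."""
    return "kv_cache" if layer_type == "gqa" else "recurrent_state"


if __name__ == "__main__":
    types = layer_types()
    print(Counter(types))         # Counter({'deltanet': 48, 'gqa': 16})
    print(types[:4])              # ['deltanet', 'deltanet', 'deltanet', 'gqa']
    print(cache_kind(types[-1]))  # kv_cache
```

Under that assumption, only 16 of the 64 layers hold a KV cache that grows with context, while the other 48 carry fixed-size DeltaNet state. This is why the model needs a hybrid cache rather than a uniform KV cache.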
`[3 DeltaNet + 1 GQA] x 16` (64 layers)
Files
Test Results
Static Checks
- `git diff --check`: PASS
- `python3 -m py_compile` on the model, kernel, vLLM helper, and OpenAI server Python files: PASS
- `bash -n` on the vLLM shell helpers: PASS
Local unit-test execution on the Mac checkout is blocked because the Neuron runtime packages (`neuronx_distributed`) are not installed there. The hardware and unit validation below was run on Trn2 with the Neuron inference environment.
Unit Coverage
The contrib includes 57 CPU unit tests:
- `test_config.py`
- `test_weight_conversion.py`
- `test_hybrid_cache_manager.py`
- `test_deltanet_decay.py`
Coverage includes config parsing, Qwen3.6/Qwen3.5 architectural compatibility, weight conversion, q/gate split handling, RMSNorm +1 conversion, hybrid cache allocation, DeltaNet state cache shapes, and decay handling.
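The q/gate split and RMSNorm +1 conversion covered by these tests can be illustrated with a small sketch. It is not the contrib's conversion code. It assumes that the fused attention `q_proj` weight stores, for each head, `head_dim` query rows followed by `head_dim` gate rows, and that the checkpoint's RMSNorm uses a zero-centered weight applied as `x_norm * (1 + w)`. `split_q_gate` and `rmsnorm_plus_one` are hypothetical names, and the weights are plain nested lists so the sketch has no dependencies.

```python
# Hypothetical weight-conversion helpers (not the contrib's code).
# Assumptions: fused q_proj rows are laid out per head as
# [query rows (head_dim), gate rows (head_dim)], and RMSNorm weights
# are zero-centered, i.e. applied as x_norm * (1 + w).
Matrix = list[list[float]]


def split_q_gate(q_proj: Matrix, num_heads: int, head_dim: int) -> tuple[Matrix, Matrix]:
    """Split a fused [num_heads * 2 * head_dim, hidden] weight into query and gate."""
    expected_rows = num_heads * 2 * head_dim
    if len(q_proj) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(q_proj)}")
    query: Matrix = []
    gate: Matrix = []
    for h in range(num_heads):
        start = h * 2 * head_dim
        # First head_dim rows of each head's slice are the query projection...
        query.extend(q_proj[start : start + head_dim])
        # ...and the next head_dim rows are the output-gate projection.
        gate.extend(q_proj[start + head_dim : start + 2 * head_dim])
    return query, gate


def rmsnorm_plus_one(weight: list[float]) -> list[float]:
    """Fold the +1 offset into the weight so a standard RMSNorm (x_norm * w') matches."""
    return [1.0 + w for w in weight]


if __name__ == "__main__":
    # Two heads, head_dim=1, hidden=1: rows alternate query/gate per head.
    fused = [[0.0], [1.0], [2.0], [3.0]]
    q, g = split_q_gate(fused, num_heads=2, head_dim=1)
    print(q, g)                                 # [[0.0], [2.0]] [[1.0], [3.0]]
    print(rmsnorm_plus_one([0.0, -0.5, 0.25]))  # [1.0, 0.5, 1.25]
```

Folding the offset into the stored weight at conversion time lets a standard RMSNorm kernel, which computes `x_norm * w'`, be reused unchanged.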
Long-Context vLLM/APC Validation
Hardware: `trn2.3xlarge`, TP=4, LNC=2, SDK 2.29, vLLM Neuron plugin path, MLP-only FP8 artifact, CTE bucket 512.
APC / Prefix Cache Validation
Native vLLM Automatic Prefix Caching (APC) was validated with exact greedy output matches.
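As background for the APC results, the following conceptual sketch shows block-level prefix caching, which is the idea behind vLLM APC. In this approach, each full block of tokens is identified by a hash of its tokens chained with the hash of the prefix before it, so requests that share a prefix can reuse the same blocks. This is not vLLM's implementation. `PrefixCache` and `block_hashes` are hypothetical names, the block size of 16 is an assumption, and the sketch has no eviction and stores no KV tensors.

```python
# Conceptual sketch of block-level prefix caching, the idea behind vLLM APC.
# Not vLLM's implementation: no eviction, no KV tensors, and BLOCK_SIZE = 16
# is an assumed value.
import hashlib

BLOCK_SIZE = 16


def block_hashes(token_ids: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full block, chaining the parent hash so identity depends on the whole prefix."""
    hashes: list[str] = []
    parent = ""
    for b in range(len(token_ids) // block_size):
        block = token_ids[b * block_size : (b + 1) * block_size]
        digest = hashlib.sha256(f"{parent}|{block}".encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes


class PrefixCache:
    """Hypothetical cache that records which prefix blocks have been computed."""

    def __init__(self) -> None:
        self._cached: set[str] = set()

    def lookup_and_insert(self, token_ids: list[int]) -> int:
        """Return the number of leading tokens served from cache, then cache all full blocks."""
        hashes = block_hashes(token_ids)
        hit_blocks = 0
        for digest in hashes:
            # Because hashes are chained, the first miss ends the reusable prefix.
            if digest not in self._cached:
                break
            hit_blocks += 1
        self._cached.update(hashes)
        return hit_blocks * BLOCK_SIZE


if __name__ == "__main__":
    cache = PrefixCache()
    shared_prefix = list(range(40))  # 2 full blocks + 8 leftover tokens
    print(cache.lookup_and_insert(shared_prefix + [1000, 1001]))  # 0
    print(cache.lookup_and_insert(shared_prefix + [2000]))        # 32
```

Only full blocks are shared, so the 8 trailing prefix tokens are recomputed for the second request. For a hybrid model like this one, reusing a prefix would presumably also require the DeltaNet state at the block boundary, which this sketch does not model.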
Shared-prefix concurrency tests with 1, 2, and 4 requests returned all requested markers exactly. The current artifact is compiled for `max_num_seqs=1`, so concurrent requests are queued rather than processed with true multi-sequence batching.
Notes and Limitations
Checklist
`contrib/models/Qwen3.6-27B/`