Add InternVL3-8B-Instruct contrib model #163

Open
jimburtoft wants to merge 1 commit into aws-neuron:main from jimburtoft:contrib/internvl3-8b

@jimburtoft
Contributor

Summary

  • Add InternVL3-8B-Instruct VLM contrib (InternViT-300M vision encoder + Qwen2.5-7B text backbone)
  • Validated on trn2.3xlarge (TP=4) with Neuron SDK 2.29: 75.1 tok/s decode, 138 ms TTFT, logit cosine similarity 0.9984 vs CPU FP32
  • 1.85x the output throughput of an NVIDIA L40S GPU
  • Includes vision encoder compilation, text backbone with vision embedding injection, vLLM integration patches, accuracy tests, and GPU benchmark scripts
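
As a rough illustration of what "vision embedding injection" usually means in a VLM, the sketch below overwrites the text embeddings at image-placeholder positions with the projected vision embeddings, in order. It is not the code from this PR: the function name, the placeholder token ID (`99`), and the plain-list representation are all assumptions made for the example.

```python
from typing import List

Vector = List[float]


def inject_vision_embeddings(
    token_ids: List[int],
    text_embeds: List[Vector],
    vision_embeds: List[Vector],
    image_token_id: int,
) -> List[Vector]:
    """Return a copy of text_embeds where every image-placeholder position
    holds the next vision embedding (hypothetical helper, not the PR's API)."""
    if len(token_ids) != len(text_embeds):
        raise ValueError("token_ids and text_embeds must have the same length")

    # Positions in the prompt that were reserved for image tokens.
    positions = [i for i, tok in enumerate(token_ids) if tok == image_token_id]
    if len(positions) != len(vision_embeds):
        raise ValueError(
            f"{len(positions)} placeholders but {len(vision_embeds)} vision embeddings"
        )

    merged = [list(vec) for vec in text_embeds]  # do not mutate the input
    for pos, vis in zip(positions, vision_embeds):
        if len(vis) != len(merged[pos]):
            raise ValueError("vision and text embeddings must share a hidden size")
        merged[pos] = list(vis)
    return merged


# Toy example: hidden size 2, placeholder ID 99 is an assumption for illustration.
ids = [1, 99, 99, 2]
text = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
vision = [[5.0, 5.0], [6.0, 6.0]]
merged = inject_vision_embeddings(ids, text, vision, image_token_id=99)
```

In a compiled pipeline the same operation would typically be a scatter or masked select over tensors with static shapes; the list version only shows the indexing logic.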

Model Details

| Component | Details |
| --- | --- |
| Vision encoder | InternViT-300M-448px-V2.5 (24 layers, traced via torch_neuronx) |
| Projector | Pixel shuffle + 2-layer MLP |
| Text backbone | Qwen2.5-7B (NxDI NeuronBaseForCausalLM) |
| Framework | NeuronBaseForImageToText |
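
To make the "pixel shuffle" projector step concrete, here is a minimal sketch of the underlying idea: each 2x2 block of neighbouring patch embeddings is folded into a single token with four times as many channels, which cuts the number of vision tokens by 4 before the MLP. The patch size (14) and block size (2) are assumptions based on the public InternVL configuration, not values stated in this PR, and the channel ordering here may differ from the real implementation.

```python
from typing import List

Grid = List[List[List[float]]]  # indexed as [row][col][channel]


def space_to_depth(grid: Grid, block: int = 2) -> Grid:
    """Fold each block x block patch neighbourhood into one token by
    concatenating channels (illustrative stand-in for pixel shuffle)."""
    height, width = len(grid), len(grid[0])
    if height % block or width % block:
        raise ValueError("grid sides must be divisible by block")

    out: Grid = []
    for r in range(0, height, block):
        row = []
        for c in range(0, width, block):
            merged: List[float] = []
            for dr in range(block):
                for dc in range(block):
                    merged.extend(grid[r + dr][c + dc])
            row.append(merged)
        out.append(row)
    return out


def tokens_after_shuffle(image_px: int, patch_px: int, block: int = 2) -> int:
    """Number of vision tokens per image tile after the shuffle."""
    side = image_px // patch_px
    return (side // block) ** 2


# 4x4 grid of single-channel "patches" numbered 0..15.
toy = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
shuffled = space_to_depth(toy)

# Assumed values: 448 px tile, 14 px patches, 2x2 blocks.
n_tokens = tokens_after_shuffle(448, 14)
```

Under those assumptions a 448 px tile yields a 32x32 patch grid, which becomes 16x16 tokens after the shuffle.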

Files

  • src/ — 3-file VLM implementation (main, text, vision)
  • test/ — Integration tests
  • vllm/ — vLLM-neuron patches for serving
  • Benchmark/validation scripts (accuracy, scaling, NKI, GPU comparison)

Validation

  • CTE (context encoding) logit comparison vs CPU FP32: cosine similarity 0.9984, top-1 token match
  • TKG (token generation) text generation: correct output
  • Multimodal (text + image): end-to-end working
  • Sequence lengths: 2K-32K validated
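
The logit check above can be sketched as follows: compare the device logits for one position against a CPU reference using cosine similarity, and confirm that both pick the same top-1 token. This is a minimal stdlib-only illustration, not the PR's test script; the function names and the toy logit vectors are made up for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two logit vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top1_match(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when both logit vectors select the same argmax token."""
    argmax_a = max(range(len(a)), key=a.__getitem__)
    argmax_b = max(range(len(b)), key=b.__getitem__)
    return argmax_a == argmax_b


# Toy logits standing in for the CPU reference and the device output.
reference_logits = [0.1, 2.0, 0.3, -1.0]
device_logits = [0.2, 1.9, 0.1, -0.9]

similarity = cosine_similarity(reference_logits, device_logits)
same_token = top1_match(reference_logits, device_logits)
```

A pass/fail threshold on `similarity` is a judgment call that depends on the compute dtype; the 0.9984 reported above is the measured value, not a threshold.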

VLM (InternViT-300M + Qwen2.5-7B) on trn2.3xlarge TP=4.
75.1 tok/s decode, 138ms TTFT, cosine 0.9984 vs CPU.
Includes vision encoder, text backbone, vLLM patches,
accuracy tests, and GPU benchmark comparison (1.85x vs L40S).