JIT: Improve codegen for Vector128/256.NarrowWithSaturation#126226
Open
saucecontrol wants to merge 6 commits into dotnet:main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR refactors x86/x64 SIMD vector conversion intrinsic selection into a shared helper and adds missing fast paths for Vector128/256.NarrowWithSaturation in non-AVX512 environments, reducing instruction count and code size for several narrow-with-saturation cases.
Changes:
- Introduce `GenTreeHWIntrinsic::GetHWIntrinsicIdForVectorConvert(...)` to centralize lookup of conversion-related intrinsics (including optional saturating preference).
- Improve `Vector128/256.NarrowWithSaturation` codegen on pre-AVX512 machines by using pack-based sequences where applicable.
- Refactor existing conversion/widen/narrow construction to use the shared lookup helper instead of duplicated switch logic.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/coreclr/jit/hwintrinsicxarch.cpp | Uses the new conversion lookup helper and adds optimized pack-based paths for NarrowWithSaturation on non-AVX512. |
| src/coreclr/jit/gentree.h | Declares the new shared vector-convert intrinsic lookup helper. |
| src/coreclr/jit/gentree.cpp | Implements the helper and refactors several SIMD convert/narrow/widen paths to use it. |
Resolves #116526
This adds some missing optimized paths for `NarrowWithSaturation` intrinsics in pre-AVX-512 environments.

`Vector128.NarrowWithSaturation` was fully accelerated for signed types but not unsigned:

```diff
        vbroadcastss xmm0, dword ptr [reloc @RWD00]
        vpminuw  xmm1, xmm0, xmmword ptr [rdx]
-       vpand    xmm1, xmm1, xmm0
-       vpminuw  xmm2, xmm0, xmmword ptr [r8]
-       vpand    xmm0, xmm2, xmm0
+       vpminuw  xmm0, xmm0, xmmword ptr [r8]
        vpackuswb xmm0, xmm1, xmm0
        vmovups  xmmword ptr [rcx], xmm0
        mov      rax, rcx
        ret

 RWD00  dd 00FF00FFh ; 2.34184e-38

-; Total bytes of code 39
+; Total bytes of code 31
```

`Vector256.NarrowWithSaturation` was using the slow path for both signed and unsigned.

The bulk of the code changes here are from a refactoring of the intrinsic lookup for various vector convert ops. I'm filling in some of the optimization gaps in these intrinsics, and the tangle of logic required to select the right intrinsic is spread out in various places. Having it in a shared method will make it easier to complete the other changes I have planned.
The refactor is split into the first commit, which is zero-diff. The second commit includes the codegen improvements.
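
For readers who want to see why the `vpand` instructions in the `Vector128` unsigned sequence can be dropped, here is a minimal scalar sketch in C++. It models a single 16-bit lane rather than the JIT's actual implementation. The function names (`ClampLane`, `PackUnsignedSaturate`, `NarrowOld`, `NarrowNew`, `NarrowWithSaturationModel`) are hypothetical and exist only for this illustration. The sketch assumes two's-complement conversion from `uint16_t` to `int16_t`, which holds on all mainstream compilers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one 16-bit lane of vpminuw against the
// broadcast constant 0x00FF (the RWD00 value in the listing).
inline uint16_t ClampLane(uint16_t x)
{
    return x < 0x00FF ? x : static_cast<uint16_t>(0x00FF);
}

// Hypothetical scalar model of one lane of vpackuswb: the 16-bit input is
// read as SIGNED and saturated to the unsigned byte range [0, 255].
inline uint8_t PackUnsignedSaturate(uint16_t x)
{
    const int16_t s = static_cast<int16_t>(x); // assumes two's complement
    if (s < 0)   return 0;
    if (s > 255) return 255;
    return static_cast<uint8_t>(s);
}

// Old sequence: min, then and with 0x00FF, then pack.
inline uint8_t NarrowOld(uint16_t x)
{
    const uint16_t masked = static_cast<uint16_t>(ClampLane(x) & 0x00FF);
    return PackUnsignedSaturate(masked);
}

// New sequence: min, then pack. After the min the high byte is already zero,
// so the mask cannot change the value.
inline uint8_t NarrowNew(uint16_t x)
{
    return PackUnsignedSaturate(ClampLane(x));
}

// Models the whole operation: two 8-lane inputs narrowed into 16 bytes,
// lower input first, as in the Vector128 case.
inline std::array<uint8_t, 16> NarrowWithSaturationModel(
    const std::array<uint16_t, 8>& lower, const std::array<uint16_t, 8>& upper)
{
    std::array<uint8_t, 16> result{};
    for (std::size_t i = 0; i < 8; ++i)
    {
        result[i]     = NarrowNew(lower[i]);
        result[i + 8] = NarrowNew(upper[i]);
    }
    return result;
}
```

Note that the `vpminuw` step itself is still required: because `vpackuswb` interprets its input as signed, a lane such as `0xFFFF` would pack to 0 without the preceding clamp, whereas unsigned saturation should yield 255.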
Full diffs