@@ -11,7 +11,7 @@ toc: true
1111---
1212
1313<!--
14- SPDX-FileCopyrightText: Copyright 2011-2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
14+ SPDX-FileCopyrightText: Copyright 2011-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>
1515SPDX-FileCopyrightText: Copyright 2022 Google LLC.
1616CC-BY-SA-4.0 AND Apache-Patent-License
1717See LICENSE.md file for details
@@ -487,6 +487,9 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
487487* Removed all references to Transactional Memory Extension (TME).
488488* Added [**Alpha**](#current-status-and-anticipated-changes) support
489489 for Brain 16-bit floating-point vector multiplication intrinsics.
490+ * Added [**Alpha**](#current-status-and-anticipated-changes)
491+ support for SVE2.3 (FEAT_SVE2p3), SME2.3 (FEAT_SME2p3), FEAT_F16F32DOT,
492+ FEAT_F16F32MM, FEAT_F16MM and FEAT_SVE_B16MM intrinsics.
490493
491494### References
492495
@@ -2003,6 +2006,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
20032006 are available and if the associated [ACLE features]
20042007(#sme-language-extensions-and-intrinsics) are supported.
20052008
2009+ `__ARM_FEATURE_SVE2p3` is defined to 1 if the FEAT_SVE2p3 instructions
2010+ are available and if the associated [ACLE features]
2011+ (#sme-language-extensions-and-intrinsics) are supported.
2012+
20062013#### NEON-SVE Bridge macro
20072014
20082015`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
@@ -2026,6 +2033,7 @@ of SME has an associated preprocessor macro, given in the table below:
20262033| FEAT_SME2 | __ARM_FEATURE_SME2 |
20272034| FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
20282035| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |
2036+ | FEAT_SME2p3 | __ARM_FEATURE_SME2p3 |
20292037
20302038Each macro is defined if there is hardware support for the associated
20312039architecture feature and if all of the [ACLE
@@ -2162,6 +2170,20 @@ See [Half-precision brain
21622170floating-point](#half-precision-brain-floating-point) for details
21632171of half-precision brain floating-point types.
21642172
2173+ #### Brain 16-bit floating-point matrix multiplication support
2174+
2175+ This section is in
2176+ [**Alpha** state](#current-status-and-anticipated-changes) and might change or be
2177+ extended in the future.
2178+
2179+ `__ARM_FEATURE_SVE_B16MM` is defined to `1` if there is hardware
2180+ support for the SVE BF16 matrix multiply extension and if the
2181+ associated ACLE intrinsics are available.
2182+
2183+ See [Half-precision brain
2184+ floating-point](#half-precision-brain-floating-point) for details
2185+ of half-precision brain floating-point types.
2186+
21652187### Cryptographic extensions
21662188
21672189#### “Crypto” extension
@@ -2380,6 +2402,18 @@ this implies:
23802402
23812403 * `__ARM_NEON == 1`
23822404
2405+ #### Half-precision to single-precision dot product extension
2406+
2407+ This section is in
2408+ [**Alpha** state](#current-status-and-anticipated-changes) and might change or be
2409+ extended in the future.
2410+
2411+ `__ARM_FEATURE_F16F32DOT` is defined if the half-precision dot product
2412+ accumulating to single-precision instructions are supported and the vector
2413+ intrinsics are available. Note that this implies:
2414+
2415+ * `__ARM_NEON == 1`
2416+
23832417#### Complex number intrinsics
23842418
23852419`__ARM_FEATURE_COMPLEX` is defined if the complex addition and complex
@@ -2425,6 +2459,21 @@ instructions and if the associated ACLE intrinsics are available.
24252459for the SVE 16-bit floating-point to 32-bit floating-point matrix multiply and add
24262460(FEAT_SVE_F16F32MM) instructions and if the associated ACLE intrinsics are available.
24272461
2462+ ##### Multiplication of 16-bit floating-point matrices (AdvSIMD)
2463+
2464+ This section is in
2465+ [**Alpha** state](#current-status-and-anticipated-changes) and might change or be
2466+ extended in the future.
2467+
2468+ `__ARM_FEATURE_F16F32MM` is defined if the NEON half-precision matrix multiply
2469+ accumulating to single-precision instruction is supported. Note that this implies:
2470+
2471+ * `__ARM_NEON == 1`
2472+
2473+ `__ARM_FEATURE_F16MM` is defined if the NEON non-widening half-precision matrix multiply instruction instruction is supported. Note that this implies:
2474+
2475+ * `__ARM_NEON == 1`
2476+
24282477##### Multiplication of 32-bit floating-point matrices
24292478
24302479`__ARM_FEATURE_SVE_MATMUL_FP32` is defined to `1` if there is hardware support
@@ -2662,6 +2711,9 @@ be found in [[BA]](#BA).
26622711| [`__ARM_FEATURE_DIRECTED_ROUNDING`](#directed-rounding) | Directed Rounding | 1 |
26632712| [`__ARM_FEATURE_DOTPROD`](#availability-of-dot-product-intrinsics) | Dot product extension (ARM v8.2-A) | 1 |
26642713| [`__ARM_FEATURE_DSP`](#dsp-instructions) | DSP instructions (Arm v5E) (32-bit-only) | 1 |
2714+ | [`__ARM_FEATURE_F16F32DOT`](#half-precision-to-single-precision-dot-product-extension) | Half-precision to single-precision dot product extension (FEAT_F16F32DOT) | 1 |
2715+ | [`__ARM_FEATURE_F16F32MM`](#multiplication-of-16-bit-floating-point-matrices-advsimd) | Half-precision to single-precision matrix multiply accumulating extension (FEAT_F16F32MM). | 1 |
2716+ | [`__ARM_FEATURE_F16MM`](#multiplication-of-16-bit-floating-point-matrices-advsimd) | Half-precision matrix multiply accumulating extension (FEAT_F16MM) | 1 |
26652717| [`__ARM_FEATURE_FAMINMAX`](#floating-point-absolute-minimum-and-maximum-extension) | Floating-point absolute minimum and maximum extension | 1 |
26662718| [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 |
26672719| [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 |
@@ -2714,6 +2766,7 @@ be found in [[BA]](#BA).
27142766| [`__ARM_FEATURE_SSVE_FP8FMA`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 |
27152767| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
27162768| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
2769+ | [`__ARM_FEATURE_SVE_B16MM`](#brain-16-bit-floating-point-matrix-multiplication-support) | SVE brain 16-bit floating-point matrix multiply extension (FEAT_SVE_B16MM) | 1 |
27172770| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
27182771| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 |
27192772| [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
@@ -2737,6 +2790,7 @@ be found in [[BA]](#BA).
27372790| [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 |
27382791| [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1)
27392792| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2)
2793+ | [`__ARM_FEATURE_SVE2p3`](#sve2) | SVE version 2.3 (FEAT_SVE2p3)
27402794| [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 |
27412795| [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 |
27422796| [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 |
@@ -9594,6 +9648,15 @@ BFloat16 floating-point adjust exponent vectors.
95949648
95959649### SVE2 floating-point matrix multiply-accumulate instructions.
95969650
9651+ #### BFMMLA, FMMLA(non-widening)
9652+
9653+ 16-bit floating-point matrix multiply-accumulate.
9654+ ```c
9655+ // Only if __ARM_FEATURE_SVE_B16MM
9656+ // Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM)
9657+ svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);
9658+ ```
9659+
95979660#### FMMLA (widening, FP8 to FP16)
95989661
95999662Modal 8-bit floating-point matrix multiply-accumulate to half-precision.
@@ -9955,6 +10018,23 @@ Lookup table read with 4-bit indices.
995510018 svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
995610019```
995710020
10021+ ### SVE2.3 lookup table
10022+
10023+ The intrinsics in this section are defined by the header file
10024+ [`<arm_sve.h>`](#arm_sve.h) when `__ARM_FEATURE_SVE2p3` is defined to 1.
10025+
10026+ #### LUTI6
10027+
10028+ Lookup table read with 6-bit indices (8-bit).
10029+
10030+ Use of this intrinsic if `svcntb() * 8 < 256` results in undefined behaviour.
10031+
10032+ ```c
10033+ // Variant is also available for: _u8 _mf8
10034+ svint8_t svluti6[_s8](svint8x2_t table, svuint8_t indices);
10035+ ```
10036+
10037+
995810038### SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions
995910039
996010040The specification for SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions is in
@@ -13548,6 +13628,7 @@ Multi-vector saturating rounding shift right narrow and interleave
1354813628
1354913629``` c
1355013630 // Variants are also available for _u16[_u32_x2]
13631+ // and _s8[_s16_x2] _u8[_u16_x2] if __ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3
1355113632 svint16_t svqrshrn[_n]_s16[_s32_x2](svint32x2_t zn, uint64_t imm);
1355213633 ```
1355313634
@@ -13556,6 +13637,7 @@ Multi-vector saturating rounding shift right narrow and interleave
1355613637Multi-vector saturating rounding shift right unsigned narrow and interleave
1355713638
1355813639``` c
13640+ // Variant for _u8[_u16_x2] is available if __ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3
1355913641 svuint16_t svqrshrun[_n]_u16[_s32_x2](svint32x2_t zn, uint64_t imm);
1356013642 ```
1356113643
@@ -13857,6 +13939,128 @@ Scalar index of first/last true predicate element (predicated).
1385713939
1385813940 ```
1385913941
13942+ ### SVE2.3 and SME2.3 instruction intrinsics
13943+
13944+ The specification for SVE2.3 and SME2.3 are in
13945+ [**Alpha** state](#current-status-and-anticipated-changes) and might change or be
13946+ extended in the future.
13947+
13948+ The functions in this section are defined by either the header file
13949+ [`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
13950+ when `__ARM_FEATURE_SVE2p3` or `__ARM_FEATURE_SME2p3` is defined, respectively.
13951+
13952+ #### ADDQP
13953+
13954+ Add pairwise within quadword vector segments.
13955+
13956+ ``` c
13957+ // Variants are also available for _s16, _s32 and _s64
13958+ svint8_t svaddqp[_s8](svint8_t zn, svint8_t zm);
13959+ ```
13960+
13961+ #### ADDSUBP
13962+
13963+ Add subtract pairwise.
13964+
13965+ ``` c
13966+ // Variants are also available for _s16, _s32 and _s64
13967+ svint8_t svaddsubp[_s8](svint8_t zn, svint8_t zm);
13968+ ```
13969+
13970+ #### LUTI6
13971+
13972+ Lookup table read with 6-bit indices (16-bit).
13973+
13974+ Use of this intrinsic if `svcntb() * 8 < 512` results in undefined behaviour.
13975+
13976+ ``` c
13977+ // Variants are also available for _u16_x2 _f16_x2
13978+ svint16_t svluti6_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
13979+ ```
13980+
13981+ #### FCVTZSN, FCVTZUN
13982+
13983+ Floating-point narrowing convert to interleaved integer, rounding toward zero
13984+
13985+ ``` c
13986+ // Variants are also available for
13987+ // s16[_f32_x2], s32[_f64_x2],
13988+ // u8[_f16_x2], u16[_f32_x2], u32[_f64_x2]
13989+ svint8_t svcvt_s8[_f16_x2](svfloat16x2_t zn);
13990+ ```
13991+
13992+ #### SABAL, UABAL
13993+
13994+ Two-way absolute difference sum and accumulate long.
13995+
13996+ ``` c
13997+ // Variants are also available for
13998+ // s32[_s16], s64[_s32],
13999+ // u16[_u8], u32[_u16], u64[_u32]
14000+ svint16_t svaba_s16[_s8](svint16_t zda, svint8_t zn, svint8_t zm);
14001+ ```
14002+
14003+ #### SCVTF, SCVTFLT, UCVTF, UCVTFLT
14004+
14005+ Integer convert to floating-point (top and bottom).
14006+
14007+ ``` c
14008+ // Variants are also available for
14009+ // f32[_s16], f64[_s32],
14010+ // f16[_u8], f32[_u16], f64[_u32]
14011+
14012+ svfloat16_t svcvtt_f16[_s8](svint8_t zn);
14013+ svfloat16_t svcvtb_f16[_s8](svint8_t zn);
14014+ ```
14015+
14016+ #### SDOT, UDOT (vectors)
14017+
14018+ Integer dot-product (2-way, vectors).
14019+
14020+ ``` c
14021+ // Variants are also available for _u16_u8
14022+ svint16_t svdot[_s16_s8](svint16_t zda, svint8_t zn, svint8_t zm);
14023+ ```
14024+
14025+ #### SDOT, UDOT (indexed)
14026+
14027+ Integer dot product by indexed element (two-way).
14028+
14029+ ``` c
14030+ // Variants are also available for _u16_u8
14031+ svint16_t svdot_lane[_s16_s8](svint16_t zda, svint8_t zn, svint8_t zm,
14032+ uint64_t imm_idx);
14033+ ```
14034+
14035+ #### SQSHRN, UQSHRN
14036+
14037+ Multi-vector saturating shift right narrow and interleave.
14038+
14039+ ``` c
14040+ // Variants are also available for
14041+ // _u16[_u32_x2]
14042+ // _s8[_s16_x2] _u8[_u16_x2]
14043+ svint16_t svqshrn[_n]_s16[_s32_x2](svint32x2_t zn, uint64_t imm);
14044+ ```
14045+
14046+ #### SQSHRUN
14047+
14048+ Signed saturating shift right narrow by immediate to interleaved unsigned integer.
14049+
14050+ ``` c
14051+ // Variant for _u8[_s16_x2] is also available.
14052+ svuint16_t svqshrun[_n]_u16[_s32_x2](svint32x2_t zn, uint64_t imm);
14053+ ```
14054+
14055+ #### SUBP
14056+
14057+ Subtract pairwise.
14058+
14059+ ``` c
14060+ // Variants are also available for _s16, _s32 and _s64
14061+ svint8_t svsubp[_s8](svbool_t pg, svint8_t zn, svint8_t zm);
14062+ ```
14063+
1386014064### SME2 maximum and minimum absolute value
1386114065
1386214066The intrinsics in this section are defined by the header file
@@ -14581,6 +14785,36 @@ non-overloaded names to indicate which vector argument is a vector register pair
1458114785 __arm_streaming __arm_inout("za");
1458214786```
1458314787
14788+ ### SME2.3 lookup table
14789+
14790+ The intrinsics in this section are defined by the header file
14791+ [`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p3` is defined to 1.
14792+
14793+ #### LUTI6
14794+
14795+ Lookup table read with 6-bit indices (16-bit)
14796+
14797+ Use of this intrinsic if `svcntb() * 8 < 512` results in undefined behaviour.
14798+
14799+ ```c
14800+ // Variant are also available for: _u16, _f16 and _bf16
14801+ svint16x4_t svluti6_lane_s16_x4[_s16_x2](svint16x2_t table, svuint8x2_t indices, uint64_t imm_idx);
14802+ ```
14803+
14804+ Lookup table read with 6-bit indices (four registers, 8-bit)
14805+
14806+ ``` c
14807+ // Variants are also available for: _u8 _mf8
14808+ svint8x4_t svluti6_zt_s8_x4(uint64_t zt0, svuint8x3_t zn) __arm_streaming __arm_in("zt0");
14809+ ```
14810+
14811+ Lookup table read with 6-bit indices (table, single, 8-bit)
14812+
14813+ ``` c
14814+ // Variants are also available for: _u8 _mf8
14815+ svint8_t svluti6_zt_s8(uint64_t zt0, svuint8_t zn) __arm_streaming __arm_in("zt0");
14816+ ```
14817+
1458414818# M-profile Vector Extension (MVE) intrinsics
1458514819
1458614820The M-profile Vector Extension (MVE) [[MVE-spec]](#MVE-spec) instructions provide packed Single
0 commit comments