Commit 7425ff8

bashbaug and alycm authored
publish cl_khr_work_group_uniform_arithmetic (#786)
* publish cl_khr_work_group_uniform_arithmetic
* Apply suggestions from code review
  fix the names of the SPIR-V instructions used by this extension
  fixed typo

Co-authored-by: Alastair Murray <alastair.murray@codeplay.com>
1 parent 8803f20 commit 7425ff8

4 files changed

Lines changed: 264 additions & 0 deletions

OpenCL_Ext.txt

Lines changed: 1 addition & 0 deletions
@@ -104,6 +104,7 @@ include::ext/cl_khr_command_buffer.asciidoc[]
 include::ext/cl_khr_expect_assume.asciidoc[]

 include::ext/cl_khr_subgroup_rotate.asciidoc[]
+include::ext/cl_khr_work_group_uniform_arithmetic.asciidoc[]

 // NOTE: To keep meaningful section numbers, new
 // extension documents should be added above here!

env/extensions.asciidoc

Lines changed: 20 additions & 0 deletions
@@ -355,6 +355,26 @@ declare the following SPIR-V capabilities:

 * *GroupNonUniformRotateKHR*

+==== `cl_khr_work_group_uniform_arithmetic`
+
+If the OpenCL environment supports the extension `cl_khr_work_group_uniform_arithmetic`, then the environment must accept modules that declare use of the extension `SPV_KHR_uniform_group_instructions` via *OpExtension*.
+
+If the OpenCL environment supports the extension `cl_khr_work_group_uniform_arithmetic` and use of the SPIR-V extension `SPV_KHR_uniform_group_instructions` is declared in the module via *OpExtension*, then the environment must accept modules that declare the following SPIR-V capabilities:
+
+* *GroupUniformArithmeticKHR*
+
+For instructions requiring these capabilities, _Scope_ for _Execution_ may be:
+
+* *Workgroup*
+
+For the instructions *OpGroupLogicalAndKHR*, *OpGroupLogicalOrKHR*, and *OpGroupLogicalXorKHR*, the valid type for _X_ is *OpTypeBool*.
+
+Otherwise, for the *GroupUniformArithmeticKHR* scan and reduction instructions, valid types for _X_ are:
+
+* Scalars of supported types:
+** *OpTypeInt* with _Width_ equal to `32` or `64` (equivalent to `int`, `uint`, `long`, and `ulong`)
+** *OpTypeFloat* (equivalent to `half`, `float`, and `double`)
+
 === Embedded Profile Extensions

 ==== `cles_khr_int64`
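
The environment requirements in the hunk above apply only when the device reports `cl_khr_work_group_uniform_arithmetic`. As a minimal host-side sketch (not part of this commit), the check below looks for the name as a whole token in a space-separated extension string, such as the one `clGetDeviceInfo` returns for `CL_DEVICE_EXTENSIONS`. The helper name `has_extension` is hypothetical, and the sketch assumes the string has already been fetched from the OpenCL runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
 * as a complete space-separated token inside `ext_list`. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, name)) != NULL) {
        /* A real match starts at the beginning or right after a space... */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        /* ...and ends at the terminator or right before a space. */
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return true;
        }
        p++; /* keep searching after a partial (prefix or suffix) hit */
    }
    return false;
}
```

Matching whole tokens avoids a false positive when one extension name is a prefix of another.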

ext/cl_khr_work_group_uniform_arithmetic.asciidoc

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
+// Copyright 2022 The Khronos Group. This work is licensed under a
+// Creative Commons Attribution 4.0 International License; see
+// http://creativecommons.org/licenses/by/4.0/
+
+[[cl_khr_work_group_uniform_arithmetic]]
+== Work Group Uniform Arithmetic
+
+This extension adds additional work-group collective functions to OpenCL C.
+Specifically, this extension adds support for work-group scans and reductions for the following operators:
+
+* Logical operations (`and`, `or`, and `xor`).
+* Bitwise operations (`and`, `or`, and `xor`).
+* Integer multiplication (`mul`).
+* Floating-point multiplication (`mul`).
+
+=== General Information
+
+==== Name Strings
+
+`cl_khr_work_group_uniform_arithmetic`
+
+==== Version History
+
+[cols="1,1,3",options="header",]
+|====
+| *Date* | *Version* | *Description*
+| 2022-04-29 | 1.0.0 | Initial version.
+|====
+
+==== Dependencies
+
+This extension is written against the OpenCL Specification
+Version 3.0.10.
+
+This extension requires OpenCL 2.0.
+
+==== Contributors
+
+Kevin Petit, Arm Ltd. +
+Ben Ashbaugh, Intel +
+
+=== New OpenCL C Functions
+
+The following functions are added to OpenCL C.
+
+[source,opencl_c]
+----
+int work_group_reduce_logical_and(int predicate);
+int work_group_reduce_logical_or(int predicate);
+int work_group_reduce_logical_xor(int predicate);
+
+int work_group_scan_inclusive_logical_and(int predicate);
+int work_group_scan_inclusive_logical_or(int predicate);
+int work_group_scan_inclusive_logical_xor(int predicate);
+
+int work_group_scan_exclusive_logical_and(int predicate);
+int work_group_scan_exclusive_logical_or(int predicate);
+int work_group_scan_exclusive_logical_xor(int predicate);
+----
+
+For the following functions, the generic type name `gentype` may be one of the supported built-in scalar data types `int`, `uint`, `long`, or `ulong`.
+
+[source,opencl_c]
+----
+gentype work_group_reduce_and(gentype value);
+gentype work_group_reduce_or(gentype value);
+gentype work_group_reduce_xor(gentype value);
+
+gentype work_group_scan_inclusive_and(gentype value);
+gentype work_group_scan_inclusive_or(gentype value);
+gentype work_group_scan_inclusive_xor(gentype value);
+
+gentype work_group_scan_exclusive_and(gentype value);
+gentype work_group_scan_exclusive_or(gentype value);
+gentype work_group_scan_exclusive_xor(gentype value);
+----
+
+For the following functions, the generic type name `gentype` may be one of the supported built-in scalar data types `int`, `uint`, `long`, `ulong`, `float`, `double` (if double precision is supported), or `half` (if half precision is supported).
+
+[source,opencl_c]
+----
+gentype work_group_reduce_mul(gentype value);
+gentype work_group_scan_inclusive_mul(gentype value);
+gentype work_group_scan_exclusive_mul(gentype value);
+----
+
+=== Modifications to the OpenCL C Specification
+
+(Add to Section 6.15.16, *Work-group Collective Functions*) ::
++
+--
+The table below describes the OpenCL C programming language built-in functions that perform
+logical arithmetic operations across work items in a work-group. These functions must be
+encountered by all work items in a work-group executing the kernel, otherwise the behavior is
+undefined. For these functions, a non-zero _predicate_ argument or return value is logically
+`true` and a zero _predicate_ argument or return value is logically `false`.
+
+[cols="2a,1",options="header"]
+|====
+| Function
+| Description
+|[source,opencl_c]
+----
+int work_group_reduce_logical_and(int predicate);
+int work_group_reduce_logical_or(int predicate);
+int work_group_reduce_logical_xor(int predicate);
+----
+| Returns the logical *and*, *or*, or *xor* of _predicate_ for all work items in the work-group.
+
+|[source,opencl_c]
+----
+int work_group_scan_inclusive_logical_and(int predicate);
+int work_group_scan_inclusive_logical_or(int predicate);
+int work_group_scan_inclusive_logical_xor(int predicate);
+----
+| Returns the result of an inclusive scan operation, which is the logical
+*and*, *or*, or *xor* of _predicate_ for all work items in the work-group with
+a work-group linear local ID less than or equal to this work item’s work-group
+linear local ID.
+
+|[source,c]
+----
+int work_group_scan_exclusive_logical_and(int predicate);
+int work_group_scan_exclusive_logical_or(int predicate);
+int work_group_scan_exclusive_logical_xor(int predicate);
+----
+| Returns the result of an exclusive scan operation, which is the logical
+*and*, *or*, or *xor* of _predicate_ for all work items in the work-group with
+a work-group linear local ID less than this work item’s work-group linear
+local ID.
+
+If there is no work item in the work-group with a work-group linear local ID
+less than this work item’s work-group linear local ID then an identity value
+`I` is returned. For *and*, the identity value is `true` (non-zero). For *or*
+and *xor*, the identity value is `false` (zero).
+
+|====
+
+The table below describes the OpenCL C programming language built-in functions
+that perform bitwise integer operations across work items in a work-group. These
+functions must be encountered by all work items in a work-group executing the
+kernel, otherwise the behavior is undefined. For the functions below, the
+generic type name `gentype` may be one of the supported built-in scalar data
+types `int`, `uint`, `long`, and `ulong`.
+
+[cols="2a,1",options="header"]
+|====
+| Function
+| Description
+
+|[source,opencl_c]
+----
+gentype work_group_reduce_and(gentype value);
+gentype work_group_reduce_or(gentype value);
+gentype work_group_reduce_xor(gentype value);
+----
+| Returns the bitwise *and*, *or*, or *xor* of _value_ for all work items in the work-group.
+
+|[source,opencl_c]
+----
+gentype work_group_scan_inclusive_and(gentype value);
+gentype work_group_scan_inclusive_or(gentype value);
+gentype work_group_scan_inclusive_xor(gentype value);
+----
+| Returns the result of an inclusive scan operation, which is the bitwise *and*,
+*or*, or *xor* of _value_ for all work items in the work-group with a
+work-group linear local ID less than or equal to this work item’s work-group
+linear local ID.
+
+|[source,opencl_c]
+----
+gentype work_group_scan_exclusive_and(gentype value);
+gentype work_group_scan_exclusive_or(gentype value);
+gentype work_group_scan_exclusive_xor(gentype value);
+----
+| Returns the result of an exclusive scan operation, which is the bitwise *and*,
+*or*, or *xor* of _value_ for all work items in the work-group with a
+work-group linear local ID less than this work item’s work-group linear local
+ID.
+
+If there is no work item in the work-group with a work-group linear local ID less than
+this work item’s work-group linear local ID then an identity value `I` is returned.
+For *and*, the identity value is `~0` (all bits set). For *or* and *xor*, the identity
+value is `0`.
+
+|====
+
+The table below describes the OpenCL C programming language built-in functions
+that perform multiplicative operations across work items in a work-group. These
+functions must be encountered by all work items in a work-group executing the
+kernel, otherwise the behavior is undefined. For the functions below, the
+generic type name `gentype` may be one of the supported built-in scalar data
+types `int`, `uint`, `long`, `ulong`, `float`, `double` (if double precision is
+supported), or `half` (if half precision is supported).
+
+[cols="2a,1",options="header"]
+|====
+| Function
+| Description
+
+|[source,opencl_c]
+----
+gentype work_group_reduce_mul(gentype value);
+----
+| Returns the multiplication of _value_ for all work items in the work-group.
+
+|[source,opencl_c]
+----
+gentype work_group_scan_inclusive_mul(gentype value);
+----
+| Returns the result of an inclusive scan operation which is the multiplication
+of _value_ for all work items in the work-group with a work-group linear local
+ID less than or equal to this work item’s work-group linear local ID.
+
+|[source,opencl_c]
+----
+gentype work_group_scan_exclusive_mul(gentype value);
+----
+| Returns the result of an exclusive scan operation which is the multiplication
+of _value_ for all work items in the work-group with a work-group linear local
+ID less than this work item’s work-group linear local ID.
+
+If there is no work item in the work-group with a work-group linear local ID
+less than this work item’s work-group linear local ID then the identity value
+`1` is returned.
+
+|====
+--
+
+=== Issues
+
+. For these built-in functions, do we only want to support the types supported by the existing work-group collective functions, or do we want to support the types supported by the sub-group collective functions?
++
+--
+`RESOLVED`: The extension will require the same types as the existing work-group collective functions.
+
+The differences are the 8-bit and 16-bit types: `char`, `uchar`, `short`, and `ushort`. Note that `half` is already supported, if half-precision is supported.
+--
+
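
The reduction and scan descriptions in the extension text above can be illustrated with a sequential reference model. The sketch below is plain host C, not OpenCL C and not part of this commit: it treats an array index as the work-group linear local ID and applies the identity values the tables give (`~0` for *and*, `0` for *or* and *xor*, `1` for *mul*). All names (`binop_fn`, `op_*`, `ref_reduce`, `ref_scan_inclusive`, `ref_scan_exclusive`) are hypothetical, and only a 32-bit unsigned type is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Binary operator combined across the modelled work-group. */
typedef uint32_t (*binop_fn)(uint32_t, uint32_t);

static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }
static uint32_t op_xor(uint32_t a, uint32_t b) { return a ^ b; }
/* Unsigned multiplication wraps in C; that is a property of this model,
 * not a statement about what the extension requires on overflow. */
static uint32_t op_mul(uint32_t a, uint32_t b) { return a * b; }

/* Reduction: combines the values of all n work items. */
static uint32_t ref_reduce(binop_fn op, uint32_t identity,
                           const uint32_t *in, size_t n)
{
    uint32_t acc = identity;
    for (size_t i = 0; i < n; i++) {
        acc = op(acc, in[i]);
    }
    return acc;
}

/* Inclusive scan: out[i] combines in[0..i], including item i itself. */
static void ref_scan_inclusive(binop_fn op, uint32_t identity,
                               const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t acc = identity;
    for (size_t i = 0; i < n; i++) {
        acc = op(acc, in[i]);
        out[i] = acc;
    }
}

/* Exclusive scan: out[i] combines in[0..i-1]; item 0 gets the identity. */
static void ref_scan_exclusive(binop_fn op, uint32_t identity,
                               const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t acc = identity;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc;
        acc = op(acc, in[i]);
    }
}
```

For example, an exclusive `mul` scan over the values 2, 3, 4, 5 yields 1, 2, 6, 24 in this model, while the inclusive scan yields 2, 6, 24, 120. The model does not capture floating-point evaluation order.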

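
The logical variants added by this file treat any non-zero _predicate_ as `true`, which differs from the bitwise variants. The host C sketch below (again not OpenCL C, and not part of this commit) models that normalisation for the reductions and for the exclusive *xor* scan. The names `as_bool` and `ref_*` are hypothetical. Returning exactly `1` for `true` is a choice made by this model, since the text only says that a non-zero return value is logically `true`.

```c
#include <stddef.h>

/* Any non-zero predicate is logically true. */
static int as_bool(int p) { return p != 0; }

/* Logical and: identity is true. */
static int ref_reduce_logical_and(const int *pred, size_t n)
{
    int acc = 1;
    for (size_t i = 0; i < n; i++) {
        acc = acc && as_bool(pred[i]);
    }
    return acc;
}

/* Logical or: identity is false. */
static int ref_reduce_logical_or(const int *pred, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = acc || as_bool(pred[i]);
    }
    return acc;
}

/* Logical xor: identity is false; true when an odd number of
 * predicates are true. */
static int ref_reduce_logical_xor(const int *pred, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (acc != as_bool(pred[i]));
    }
    return acc;
}

/* Exclusive xor scan: out[i] covers items with index below i,
 * so out[0] is the identity (false). */
static void ref_scan_exclusive_logical_xor(const int *pred, int *out, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc;
        acc = (acc != as_bool(pred[i]));
    }
}
```

With the predicates 5, 0, -1, 2, a bitwise reading would give different answers, whereas this model sees three true values: the *and* reduction is false, and the *or* and *xor* reductions are true.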
ext/quick_reference.asciidoc

Lines changed: 4 additions & 0 deletions
@@ -273,4 +273,8 @@
 | Create Command Queues with Different Throttle Policies
 | Extension

+| <<cl_khr_work_group_uniform_arithmetic,cl_khr_work_group_uniform_arithmetic>>
+| Work Group Uniform Arithmetic
+| Extension
+
 |====
