Commit eb4d1fe

Update cl_arm_scheduling_controls to v0.5.0 (#823)
* Update cl_arm_scheduling_controls to v0.5.0

  - Add support for warp throttling
  - Add support to control the size of compute unit batch queues

  Change-Id: Ic246edeac5298856a348dda847c4e9f463b301b5
  Signed-off-by: Kevin Petit <kevin.petit@arm.com>

* Add missing CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM

  Signed-off-by: Kevin Petit <kevin.petit@arm.com>
  Change-Id: I1ae8b6f683f7444d97d3f02a54f80affb53a5b1a
  Signed-off-by: Kevin Petit <kevin.petit@arm.com>
1 parent 6538e75 commit eb4d1fe

2 files changed

Lines changed: 89 additions & 7 deletions

extensions/cl_arm_scheduling_controls.asciidoc

Lines changed: 73 additions & 4 deletions
@@ -2,6 +2,7 @@
 :data-uri:
 :icons: font
 include::../config/attribs.txt[]
+include::{generated}/api/api-dictionary-no-links.asciidoc[]
 :source-highlighter: coderay

 = cl_arm_scheduling_controls
@@ -16,7 +17,8 @@ Kevin Petit (kevin.petit 'at' arm.com)

 == Contributors

-Kevin Petit, Arm Ltd.
+Kevin Petit, Arm Ltd. +
+Radek Szymanski, Arm Ltd. +

 == Notice

@@ -29,7 +31,7 @@ Shipping
 == Version

 Built On: {docdate} +
-Version: 0.3.0
+Version: 0.5.0

 == Dependencies

@@ -55,16 +57,22 @@ CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM (1 << 1)
 CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
 CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM (1 << 3)
 CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM (1 << 4)
+CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM (1 << 5)
+CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM (1 << 6)

 CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM 0x41EB
+
+CL_DEVICE_MAX_WARP_COUNT_ARM 0x41EA
 ----

 Accepted value for the _param_name_ parameter to *clSetKernelExecInfo*:

 [source,c]
 ----
-CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
-CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
+CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
+CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
+CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM 0x41E8
+CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM 0x41F1
 ----

 Accepted value for the _properties_ parameter to *clCreateCommandQueueWithProperties*:
@@ -75,6 +83,13 @@ CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
 CL_QUEUE_DEFERRED_FLUSH_ARM 0x41EC
 ----

+Accepted value for the _param_name_ parameter to *clGetKernelInfo*:
+
+[source,c]
+----
+CL_KERNEL_MAX_WARP_COUNT_ARM 0x41E9
+----
+
 == New build options

 This extension adds a build option to control the number of registers allocated
@@ -121,11 +136,28 @@ NOTE: Missing before version 0.3.0
 - `CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM` is set when the device compiler
 supports the `-fregister-allocation` option.

+- {CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM} is set when the device supports
+{CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM}, {CL_KERNEL_MAX_WARP_COUNT_ARM}
+and {CL_DEVICE_MAX_WARP_COUNT_ARM}. +
+Missing before version 0.4.
+
+- {CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM} is set when the
+device supports {CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM}. +
+Missing before version 0.5.
+
 | `CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM`
 | `cl_int[]`
 | Returns an array of valid register allocations for this device. Each of the
 returned values can be passed to the `-fregister-allocation` build option. +
 Missing before version 0.3.
+
+| {CL_DEVICE_MAX_WARP_COUNT_ARM}
+| {cl_uint_TYPE}
+| Returns the maximum number of warps per compute unit a kernel may use. The
+value returned is an upper bound for any possible kernel. When
+{CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM} is not set, the call to
+{clGetDeviceInfo} returns {CL_INVALID_VALUE}. +
+Missing before version 0.4.
 |====

 --
@@ -225,9 +257,44 @@ or multiply (positive value) the batch size by 2. If the requested modification
 is not possible due to hardware constraints, the greatest possible modification
 will be used.

+| {CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM}
+| {cl_uint_TYPE}
+| Limit the number of warps allowed to run in each compute unit for this kernel. +
+The value passed must be greater than 0 and smaller than or equal to
+{CL_KERNEL_MAX_WARP_COUNT_ARM}, otherwise the call to {clSetKernelExecInfo}
+will fail and return {CL_INVALID_VALUE}. +
+Missing before version 0.4.
+
+| {CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM}
+| {cl_uint_TYPE}
+| Limit the number of workgroup batches each compute unit can have in its queue. +
+Acceptable values depend on the device. If a value not supported by the device
+is passed, the call to {clSetKernelExecInfo} will fail and return {CL_INVALID_VALUE}.
+All devices that support this functionality accept `0` to let the implementation
+select the max number of queued workgroup batches. +
+Missing before version 0.5.
 |====

 --
+
+(Add the following to Table 32, _List of supported param_names by *clGetKernelInfo*_) ::
++
+--
+
+[cols="1,1,4",options="header"]
+|====
+| Kernel Info
+| Return type
+| Description
+
+| {CL_KERNEL_MAX_WARP_COUNT_ARM}
+| {cl_uint_TYPE}
+| Returns the maximum number of warps this kernel can use per compute unit. +
+Missing before version 0.4.
+|====
+
+--
+
 --

 == Interactions with Other Extensions
@@ -251,6 +318,8 @@ None.
 [options="header"]
 |====
 | Version | Date | Author | Changes
+| 0.5.0 | 2022-06-29 | Kévin Petit | Add support for compute unit batch size queue control
+| 0.4.0 | 2022-02-28 | Kévin Petit | Add support for warp throttling
 | 0.3.0 | 2021-01-19 | Kévin Petit | Add support for register allocation control
 | 0.2.0 | 2020-09-14 | Kévin Petit | Add support for deferred queue flush control
 | 0.1.0 | 2020-08-28 | Kévin Petit | Initial version
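
Note on the warp count limit added above: the new text says the value passed with `CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM` must be greater than 0 and no larger than `CL_KERNEL_MAX_WARP_COUNT_ARM`, otherwise `clSetKernelExecInfo` returns `CL_INVALID_VALUE`. The sketch below only illustrates that range rule on the host side. The helper names `warp_limit_is_valid` and `clamp_warp_limit` and the `warp_count_t` typedef are hypothetical and not part of the extension; the typedef assumes `cl_uint` is a 32-bit unsigned integer so that the code compiles without OpenCL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for cl_uint (assumption: 32-bit unsigned), so that this sketch
 * builds without the OpenCL headers. */
typedef uint32_t warp_count_t;

/* Hypothetical helper: true when `limit` satisfies the documented rule
 * 0 < limit <= kernel_max_warps, where kernel_max_warps would be the value
 * queried with CL_KERNEL_MAX_WARP_COUNT_ARM. */
static bool warp_limit_is_valid(warp_count_t limit, warp_count_t kernel_max_warps)
{
    return limit > 0 && limit <= kernel_max_warps;
}

/* Hypothetical helper: clamp a requested limit into [1, kernel_max_warps].
 * Returns 0 when kernel_max_warps is 0, because no valid limit exists then. */
static warp_count_t clamp_warp_limit(warp_count_t requested, warp_count_t kernel_max_warps)
{
    if (kernel_max_warps == 0) {
        return 0;
    }
    if (requested == 0) {
        return 1;
    }
    return requested > kernel_max_warps ? kernel_max_warps : requested;
}
```

In real code the clamped value would presumably be passed to `clSetKernelExecInfo` after querying the kernel maximum with `clGetKernelInfo`; clamping up front is one possible design choice for avoiding the `CL_INVALID_VALUE` path, not something the extension requires.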

xml/cl.xml

Lines changed: 16 additions & 3 deletions
@@ -1232,7 +1232,9 @@ server's OpenCL/api-docs repository.
 <enum bitpos="2" name="CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM"/>
 <enum bitpos="3" name="CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM"/>
 <enum bitpos="4" name="CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM"/>
-<unused start="5" end="63"/>
+<enum bitpos="5" name="CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM"/>
+<enum bitpos="6" name="CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM"/>
+<unused start="7" end="63"/>
 </enums>

 <enums name="cl_command_termination_reason_arm" vendor="Arm">
@@ -2151,14 +2153,17 @@ server's OpenCL/api-docs repository.
 <enum value="0x41E5" name="CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM"/>
 <enum value="0x41E6" name="CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM"/>
 <enum value="0x41E7" name="CL_QUEUE_KERNEL_BATCHING_ARM"/>
-<unused start="0x41E8" end="0x41EA"/>
+<enum value="0x41E8" name="CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM"/>
+<enum value="0x41E9" name="CL_KERNEL_MAX_WARP_COUNT_ARM"/>
+<enum value="0x41EA" name="CL_DEVICE_MAX_WARP_COUNT_ARM"/>
 <enum value="0x41EB" name="CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM"/>
 <enum value="0x41EC" name="CL_QUEUE_DEFERRED_FLUSH_ARM"/>
 <enum value="0x41ED" name="CL_EVENT_COMMAND_TERMINATION_REASON_ARM"/>
 <enum value="0x41EE" name="CL_DEVICE_CONTROLLED_TERMINATION_CAPABILITIES_ARM"/>
 <enum value="0x41EF" name="CL_IMPORT_ANDROID_HARDWARE_BUFFER_PLANE_INDEX_ARM"/>
 <enum value="0x41F0" name="CL_IMPORT_ANDROID_HARDWARE_BUFFER_LAYER_INDEX_ARM"/>
-<unused start="0x41F1" end="0x41FF"/>
+<enum value="0x41F1" name="CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM"/>
+<unused start="0x41F2" end="0x41FF"/>
 </enums>

 <enums start="0x4200" end="0x420F" name="enums.4200" vendor="Intel">
@@ -6568,14 +6573,22 @@ server's OpenCL/api-docs repository.
 <enum name="CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM"/>
 <enum name="CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM"/>
 <enum name="CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM"/>
+<enum name="CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM"/>
+<enum name="CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM"/>
 </require>
 <require comment="cl_device_info">
 <enum name="CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM"/>
 <enum name="CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM"/>
+<enum name="CL_DEVICE_MAX_WARP_COUNT_ARM"/>
 </require>
 <require comment="cl_kernel_exec_info">
 <enum name="CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM"/>
 <enum name="CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM"/>
+<enum name="CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM"/>
+<enum name="CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM"/>
+</require>
+<require comment="cl_kernel_info">
+<enum name="CL_KERNEL_MAX_WARP_COUNT_ARM"/>
 </require>
 <require comment="cl_queue_properties">
 <enum name="CL_QUEUE_KERNEL_BATCHING_ARM"/>
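
Note on the capability bits registered above: bit positions 5 and 6 now name the warp throttling and compute unit batch queue size capabilities. As a rough illustration of how a host program might turn a capabilities bitfield into readable names, here is a minimal sketch. The `cap_name` table and `decode_caps` function are hypothetical. The bitfield is modelled as a plain `uint64_t`, on the assumption that the XML's `end="63"` implies a 64-bit mask. Only bits 1 to 6 are listed because bit 0 does not appear in this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup entry: bit position and token name, as they appear
 * in this diff. */
struct cap_name {
    unsigned bitpos;
    const char *name;
};

static const struct cap_name k_caps[] = {
    {1, "CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM"},
    {2, "CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM"},
    {3, "CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM"},
    {4, "CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM"},
    {5, "CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM"},
    {6, "CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM"},
};

/* Hypothetical helper: writes the names of the known bits set in `caps`
 * into `out`, in ascending bit order, stopping after `max_out` entries.
 * Returns the number of names written. */
static size_t decode_caps(uint64_t caps, const char **out, size_t max_out)
{
    size_t written = 0;
    for (size_t i = 0; i < sizeof k_caps / sizeof k_caps[0]; ++i) {
        if (written == max_out) {
            break;
        }
        if ((caps >> k_caps[i].bitpos) & 1u) {
            out[written++] = k_caps[i].name;
        }
    }
    return written;
}
```

The mask itself would presumably come from `clGetDeviceInfo` with `CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM`; a table keeps the mapping in one place, so a later revision that claims bit 7 would need only one new row.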
