 :data-uri:
 :icons: font
 include::../config/attribs.txt[]
+include::{generated}/api/api-dictionary-no-links.asciidoc[]
 :source-highlighter: coderay
 
 = cl_arm_scheduling_controls
@@ -16,7 +17,8 @@ Kevin Petit (kevin.petit 'at' arm.com)
 
 == Contributors
 
-Kevin Petit, Arm Ltd.
+Kevin Petit, Arm Ltd. +
+Radek Szymanski, Arm Ltd. +
 
 == Notice
 
@@ -29,7 +31,7 @@ Shipping
 == Version
 
 Built On: {docdate} +
-Version: 0.3.0
+Version: 0.5.0
 
 == Dependencies
 
@@ -55,16 +57,22 @@ CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM (1 << 1)
 CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
 CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM (1 << 3)
 CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM (1 << 4)
+CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM (1 << 5)
+CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM (1 << 6)
 
 CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM 0x41EB
+
+CL_DEVICE_MAX_WARP_COUNT_ARM 0x41EA
 ----
 
 Accepted value for the _param_name_ parameter to *clSetKernelExecInfo*:
 
 [source,c]
 ----
-CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
-CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
+CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
+CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
+CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM 0x41E8
+CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM 0x41F1
 ----
 
 Accepted value for the _properties_ parameter to *clCreateCommandQueueWithProperties*:
@@ -75,6 +83,13 @@ CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
 CL_QUEUE_DEFERRED_FLUSH_ARM 0x41EC
 ----
 
+Accepted value for the _param_name_ parameter to *clGetKernelInfo*:
+
+[source,c]
+----
+CL_KERNEL_MAX_WARP_COUNT_ARM 0x41E9
+----
+
 == New build options
 
 This extension adds a build option to control the number of registers allocated
@@ -121,11 +136,28 @@ NOTE: Missing before version 0.3.0
 - `CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM` is set when the device compiler
  supports the `-fregister-allocation` option.
 
+- {CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM} is set when the device supports
+{CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM}, {CL_KERNEL_MAX_WARP_COUNT_ARM}
+and {CL_DEVICE_MAX_WARP_COUNT_ARM}. +
+Missing before version 0.4.
+
+- {CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM} is set when the
+device supports {CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM}. +
+Missing before version 0.5.
+
 | `CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM`
 | `cl_int[]`
 | Returns an array of valid register allocations for this device. Each of the
  returned values can be passed to the `-fregister-allocation` build option. +
  Missing before version 0.3.
+
+| {CL_DEVICE_MAX_WARP_COUNT_ARM}
+| {cl_uint_TYPE}
+| Returns the maximum number of warps per compute unit a kernel may use. The
+value returned is an upper bound for any possible kernel. When
+{CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM} is not set in {CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM},
+the call to {clGetDeviceInfo} returns {CL_INVALID_VALUE}. +
+Missing before version 0.4.
 |====
 
 --
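The capability bits described in this hunk gate the new queries, so a client would test them before calling `clGetDeviceInfo` for `CL_DEVICE_MAX_WARP_COUNT_ARM`. The following is a minimal, library-free sketch under stated assumptions: the bit values are copied from this extension (real code would take them from the Khronos `CL/cl_ext.h` header), the `cl_bitfield` capability value is modelled as `uint64_t` and `cl_uint` as `uint32_t`, and all helper names are hypothetical. It also uses the documented fact that the device-level maximum is an upper bound for every kernel to reject impossible warp limits early.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bits as listed by this extension. When CL/cl_ext.h is
 * included first, it defines these names and the fallbacks are skipped. */
#ifndef CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM
#define CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM (UINT64_C(1) << 5)
#endif
#ifndef CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM
#define CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM (UINT64_C(1) << 6)
#endif

/* Hypothetical helper: true when every bit of 'required' is set in 'caps',
 * the bitfield returned for CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM. */
static bool has_scheduling_caps(uint64_t caps, uint64_t required)
{
    return (caps & required) == required;
}

/* Hypothetical helper: the warp-throttling bit is what advertises support
 * for CL_DEVICE_MAX_WARP_COUNT_ARM, so only query it when the bit is set. */
static bool can_query_device_max_warps(uint64_t caps)
{
    return has_scheduling_caps(caps, CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM);
}

/* Hypothetical pre-check: the device value is an upper bound over all
 * kernels, so a limit above it cannot be valid for any kernel. Passing
 * this check is necessary but not sufficient; the per-kernel
 * CL_KERNEL_MAX_WARP_COUNT_ARM value still applies. */
static bool warp_limit_possible_on_device(uint32_t limit, uint32_t device_max_warps)
{
    return limit > 0 && limit <= device_max_warps;
}
```

Checking the bit first avoids relying on a `CL_INVALID_VALUE` error from the query as a feature probe.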
@@ -225,9 +257,44 @@ or multiply (positive value) the batch size by 2. If the requested modification
 is not possible due to hardware constraints, the greatest possible modification
 will be used.
 
+| {CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM}
+| {cl_uint_TYPE}
+| Limits the number of warps of this kernel allowed to run on each compute unit. +
+The value passed must be greater than 0 and less than or equal to the value
+of {CL_KERNEL_MAX_WARP_COUNT_ARM} for this kernel; otherwise the call to
+{clSetKernelExecInfo} fails and returns {CL_INVALID_VALUE}. +
+Missing before version 0.4.
+
+| {CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM}
+| {cl_uint_TYPE}
+| Limits the number of workgroup batches each compute unit can have queued. +
+Accepted values are device-specific. If a value not supported by the device
+is passed, the call to {clSetKernelExecInfo} fails and returns {CL_INVALID_VALUE}.
+All devices that support this functionality accept `0`, which lets the implementation
+select the maximum number of queued workgroup batches. +
+Missing before version 0.5.
 |====
 
 --
+
+(Add the following to Table 32, _List of supported param_names by *clGetKernelInfo*_) ::
++
+--
+
+[cols="1,1,4",options="header"]
+|====
+| Kernel Info
+| Return type
+| Description
+
+| {CL_KERNEL_MAX_WARP_COUNT_ARM}
+| {cl_uint_TYPE}
+| Returns the maximum number of warps this kernel can use per compute unit. +
+Missing before version 0.4.
+|====
+
+--
+
 --
 
 == Interactions with Other Extensions
@@ -251,6 +318,8 @@ None.
 [options="header"]
 |====
 | Version | Date | Author | Changes
+| 0.5.0 | 2022-06-29 | Kévin Petit | Add support for compute unit batch queue size control
+| 0.4.0 | 2022-02-28 | Kévin Petit | Add support for warp throttling
 | 0.3.0 | 2021-01-19 | Kévin Petit | Add support for register allocation control
 | 0.2.0 | 2020-09-14 | Kévin Petit | Add support for deferred queue flush control
 | 0.1.0 | 2020-08-28 | Kévin Petit | Initial version
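The two kernel execution controls added in this change have simple validity rules: the `CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM` value must lie between 1 and the kernel's `CL_KERNEL_MAX_WARP_COUNT_ARM`, and the `CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM` value must be `0` or another device-specific value. The following is a minimal, library-free sketch of client-side checks; `clamp_warp_count_limit` and `queued_batches_value_ok` are hypothetical helpers, `cl_uint` is modelled as `uint32_t`, and because the extension does not say how the non-zero accepted batch values are discovered, that list is an assumption supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a requested per-compute-unit warp limit into
 * 1 .. kernel_max_warps, where kernel_max_warps is the kernel's
 * CL_KERNEL_MAX_WARP_COUNT_ARM. Returns 0 when kernel_max_warps is 0,
 * meaning no value in the valid range exists. */
static uint32_t clamp_warp_count_limit(uint32_t requested, uint32_t kernel_max_warps)
{
    if (kernel_max_warps == 0)
        return 0;
    if (requested < 1)
        return 1;
    if (requested > kernel_max_warps)
        return kernel_max_warps;
    return requested;
}

/* Hypothetical helper: 0 is always accepted (implementation chooses the
 * maximum); any other value must appear in the caller-provided,
 * device-specific 'supported' list. */
static bool queued_batches_value_ok(uint32_t value, const uint32_t *supported, size_t count)
{
    if (value == 0)
        return true;
    for (size_t i = 0; i < count; i++) {
        if (supported[i] == value)
            return true;
    }
    return false;
}
```

In a real program the kernel maximum would be read with `clGetKernelInfo(kernel, CL_KERNEL_MAX_WARP_COUNT_ARM, sizeof(cl_uint), &max_warps, NULL)` and the clamped value passed with `clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM, sizeof(cl_uint), &limit)`, still checking each returned `cl_int` for `CL_INVALID_VALUE`. Clamping is one design choice; a caller that must honour an exact limit would reject out-of-range values instead.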