 type conversion is awkward (e.g., `__nv_cvt_halfraw2_to_fp8x2` converts float16 to float8),
 and some functionality is missing (e.g., one cannot convert a `__half` to `__nv_bfloat16`).
 
-_Kernel Float_ resolves this by offering a single data type `kernel_float::vec<T, N>`
-that stores `N` elements of type `T`.
-Internally, the data is stored using the most optimal type available, for example, `vec<half, 2>` stores a `__half2` and `vec<fp8_e5m2, 4>` uses a `__nv_fp8x4_e5m2`.
+_Kernel Float_ resolves this by offering a single data type `kernel_float::vec<T, N>` that stores `N` elements of type `T`.
+Internally, the data is stored as a fixed-sized array of elements.
 Operator overloading (like `+`, `*`, `&&`) has been implemented such that the most optimal intrinsic for the available types is selected automatically.
 Many mathematical functions (like `log`, `exp`, `sin`) and common operations (such as `sum`, `range`, `for_each`) are also available.
 
@@ -36,7 +35,8 @@ In a nutshell, _Kernel Float_ offers the following features:
 
 * Single type `vec<T, N>` that unifies all vector types.
 * Operator overloading to simplify programming.
-* Support for half (16 bit) and quarter (8 bit) floating-point precision.
+* Support for half (16 bit) floating-point arithmetic, with a fallback to single precision for unsupported operations.
+* Support for quarter (8 bit) floating-point types.
 * Easy integration as a single header file.
 * Written for C++17.
 * Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
+Kernel Float is a header-only library that makes it easy to work with vector types and low-precision floating-point types, mainly focusing on CUDA kernel code.
+
+Installation
+------------
+
+The easiest way to use the library is to get the single header file from GitHub:
+It is convenient to define a namespace alias `kf` to shorten the full name `kernel_float`.
+
+
+```C++
+#include "kernel_float.h"
+namespace kf = kernel_float;
+```
+
+
+Example C++ code
+----------------
+
+Kernel Float essentially offers a single data type `kernel_float::vec<T, N>` that stores `N` elements of type `T`.
+This type can be initialized using list-initialization (e.g., `{a, b, c}`), and elements can be accessed using the `[]` operator.
+Operator overloading is available for binary operations (such as `+`, `*`, and `&`), where the optimal intrinsic for the available types is selected automatically.
+
+Many mathematical functions (like `log`, `sin`, `cos`) are also available; see the [API reference](../api) for the full list of functions.
+In some cases, certain operations might not be natively supported by the platform for some floating-point types.
+In these cases, Kernel Float falls back to performing the operations in 32-bit precision.
+
+The code below shows a very simple example of how to use Kernel Float:
+
+```C++
+#include <cstdio>   // printf
+#include <cstdlib>  // EXIT_SUCCESS
+
+#include "kernel_float.h"
+namespace kf = kernel_float;
+
+int main() {
+    using Type = float;
+    const int N = 8;
+
+    kf::vec<int, N> i = kf::range<int, N>();  // integer indices
+    kf::vec<Type, N> x = kf::cast<Type>(i);   // convert each element to Type
+    kf::vec<Type, N> y = x * kf::sin(x);      // element-wise x * sin(x)
+    Type result = kf::sum(y);                 // reduce to a single scalar
+    printf("result=%f", double(result));
+
+    return EXIT_SUCCESS;
+}
+```
+
+Notice how easy it would be to change the floating-point type `Type` or the vector length `N` without affecting the rest of the code.
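To see why `Type` and `N` can be changed so freely, it may help to look at a minimal, self-contained sketch of the idea. Everything below lives in a hypothetical `toy` namespace and is only an illustration of a fixed-size array wrapper with overloaded operators; it is not Kernel Float's implementation and selects no intrinsics.

```cpp
#include <array>
#include <cstddef>

namespace toy {  // hypothetical stand-in; NOT the kernel_float namespace

// Fixed-size vector of N elements of type T, stored as a plain array.
template <typename T, std::size_t N>
struct vec {
    std::array<T, N> data{};

    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
};

// Element-wise multiplication of two vectors of the same type and length.
template <typename T, std::size_t N>
vec<T, N> operator*(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out;
    for (std::size_t i = 0; i < N; i++) out[i] = a[i] * b[i];
    return out;
}

// Vector holding 0, 1, ..., N-1 converted to T.
template <typename T, std::size_t N>
vec<T, N> range() {
    vec<T, N> out;
    for (std::size_t i = 0; i < N; i++) out[i] = static_cast<T>(i);
    return out;
}

// Sum of all elements.
template <typename T, std::size_t N>
T sum(const vec<T, N>& v) {
    T total = T(0);
    for (std::size_t i = 0; i < N; i++) total += v[i];
    return total;
}

// Small usage example: sum of squares of 0..3.
inline float demo() {
    constexpr std::size_t N = 4;
    vec<float, N> x = range<float, N>();  // 0, 1, 2, 3
    vec<float, N> y = x * x;              // 0, 1, 4, 9
    return sum(y);                        // 14
}

}  // namespace toy
```

Because every function is a template over `T` and `N`, changing either one in `demo` requires no other edits, which is the property the example above relies on.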
+When working with Kernel Float, you'll find that you need to prefix every function and type with the `kernel_float::...` prefix.
+This can be a bit cumbersome.
+It's strongly discouraged to dump the entire `kernel_float` namespace into the global namespace (with `using namespace kernel_float`) since
+many symbols in Kernel Float may clash with global symbols, causing conflicts and issues.
+
+To work around this, the library provides a handy `kernel_float::prelude` namespace. This namespace contains a variety of useful type and function aliases that won't conflict with global symbols.
+
+To make use of it, use the following code:
+
+
+```C++
+#include "kernel_float.h"
+using namespace kernel_float::prelude;
+
+// You can now use aliases like `kf`, `kvec`, `kint`, etc.
+```
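The following sketch illustrates why a prelude namespace avoids name clashes. It uses a hypothetical library namespace `toy_lib` rather than Kernel Float itself, and the alias definitions shown (`kf`, `kvec`, `kint`) are assumptions made for the example; the real definitions in `kernel_float::prelude` may differ.

```cpp
#include <array>
#include <cstddef>

// A user-defined global type that happens to be called `vec`.
struct vec {
    float x, y;
};

namespace toy_lib {  // hypothetical library, standing in for kernel_float

template <typename T, std::size_t N>
struct vec {
    std::array<T, N> data{};
};

template <typename T, std::size_t N>
T sum(const vec<T, N>& v) {
    T total = T(0);
    for (const T& e : v.data) total += e;
    return total;
}

namespace prelude {
// Short alias for the whole library namespace.
namespace kf = ::toy_lib;

// Prefixed type aliases that are unlikely to collide with user code.
template <typename T, std::size_t N>
using kvec = ::toy_lib::vec<T, N>;

template <std::size_t N>
using kint = ::toy_lib::vec<int, N>;
}  // namespace prelude

}  // namespace toy_lib

// `using namespace toy_lib;` at this point would make the unqualified name
// `vec` ambiguous between ::vec and toy_lib::vec. The prelude only brings
// in the prefixed names, so no ambiguity arises.
using namespace toy_lib::prelude;

inline int prelude_demo() {
    kint<3> v = {1, 2, 3};  // toy_lib::vec<int, 3>
    vec p = {1.0f, 2.0f};   // still the user's global ::vec
    (void)p;
    return kf::sum(v);      // library function reached through the alias
}
```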
+
+The prelude defines many aliases, including the following:
+For operations that involve two input arguments (or more), ``kernel_float`` will first convert the inputs into a common type before applying the operation.
+For example, when adding ``vec<int, N>`` to a ``vec<float, N>``, both arguments must first be converted into a ``vec<float, N>``.
+
+This procedure is called "type promotion" and is implemented as follows.
+First, all arguments are converted into a vector by calling ``into_vec``.
+Next, all arguments must have length ``N`` or length ``1`` and vectors of length ``1`` are resized to become length ``N``.
+Finally, the vector element types are promoted into a common type.
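The three steps above can be sketched in plain C++17. This is only an illustration under stated assumptions: the namespace `bcast_sketch` and the helpers `broadcast` and `add` are hypothetical, `into_vec` here handles scalars only, and `std::common_type_t` is used as a stand-in for the element promotion step even though the library's own rules differ from regular C++.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

namespace bcast_sketch {  // hypothetical helpers for illustration only

template <typename T, std::size_t N>
using vec = std::array<T, N>;

// Step 1: turn a scalar into a vector of length 1.
template <typename T>
vec<T, 1> into_vec(T value) {
    return vec<T, 1>{{value}};
}

// Step 2: resize a vector of length 1 to length N; length N passes through.
template <std::size_t N, typename T, std::size_t M>
vec<T, N> broadcast(const vec<T, M>& v) {
    static_assert(M == N || M == 1, "each argument must have length N or 1");
    vec<T, N> out{};
    for (std::size_t i = 0; i < N; i++) out[i] = v[M == 1 ? 0 : i];
    return out;
}

// Step 3: promote the element types, then apply the operation element-wise.
template <typename T, std::size_t N, typename U, std::size_t M>
auto add(const vec<T, N>& a, const vec<U, M>& b) {
    constexpr std::size_t L = (N > M) ? N : M;
    using R = std::common_type_t<T, U>;  // stand-in for the promotion rules
    const vec<T, L> x = broadcast<L>(a);
    const vec<U, L> y = broadcast<L>(b);
    vec<R, L> out{};
    for (std::size_t i = 0; i < L; i++) {
        out[i] = static_cast<R>(x[i]) + static_cast<R>(y[i]);
    }
    return out;
}

}  // namespace bcast_sketch
```

With these helpers, adding a `vec<int, 3>` to `into_vec(0.5f)` yields a `vec<float, 3>`, while adding vectors of length 3 and 2 is rejected at compile time by the `static_assert`.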
+
+The rules for element type promotion in ``kernel_float`` are slightly different from those in regular C++.
+In short, for two element types ``T`` and ``U``, the promotion rules can be summarized as follows:
+
+* If one of the types is ``bool``, the result is the other type.
+* If one type is a floating-point type and the other is a signed or unsigned integer, the result is the floating-point type.
+* If both types are floating-point types, the result is the larger of the two types. An exception is combining ``half`` and ``bfloat16``, which results in ``float``.
+* If both types are integer types of the same signedness, the result is the larger of the two types.
+* Combining a signed integer and an unsigned integer type is not allowed.
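The rules in the list above can be written down as a small compile-time trait. The sketch below is an assumption-laden illustration, not the library's implementation: the namespace `promo_sketch`, the trait `promote_t`, and the empty tag types `half_t` and `bfloat16_t` (placeholders for the real 16-bit types, which need CUDA headers) are all hypothetical. When two distinct types have the same width, the sketch simply keeps the first one.

```cpp
#include <type_traits>

namespace promo_sketch {  // hypothetical; not the library's implementation

struct half_t {};      // placeholder tag standing in for a 16-bit `half`
struct bfloat16_t {};  // placeholder tag standing in for `bfloat16`

template <typename T>
constexpr bool is_low_precision_v =
    std::is_same_v<T, half_t> || std::is_same_v<T, bfloat16_t>;

template <typename T>
constexpr bool is_float_v =
    std::is_floating_point_v<T> || is_low_precision_v<T>;

// Width in bits; the placeholder tags are treated as 16-bit types.
template <typename T>
constexpr int bits_v =
    is_low_precision_v<T> ? 16 : static_cast<int>(sizeof(T) * 8);

template <typename T>
struct tag {
    using type = T;
};

template <typename T, typename U>
constexpr auto promote_impl() {
    if constexpr (std::is_same_v<T, U>) {
        return tag<T>{};
    } else if constexpr (std::is_same_v<T, bool>) {
        return tag<U>{};  // bool + X -> X
    } else if constexpr (std::is_same_v<U, bool>) {
        return tag<T>{};  // X + bool -> X
    } else if constexpr (is_float_v<T> && is_float_v<U>) {
        if constexpr (is_low_precision_v<T> && is_low_precision_v<U>) {
            return tag<float>{};  // half + bfloat16 -> float
        } else if constexpr ((bits_v<T>) >= (bits_v<U>)) {
            return tag<T>{};  // larger floating-point type wins
        } else {
            return tag<U>{};
        }
    } else if constexpr (is_float_v<T>) {
        return tag<T>{};  // float + integer -> float
    } else if constexpr (is_float_v<U>) {
        return tag<U>{};  // integer + float -> float
    } else {
        static_assert(std::is_signed_v<T> == std::is_signed_v<U>,
                      "combining signed and unsigned integers is not allowed");
        if constexpr (sizeof(T) >= sizeof(U)) {
            return tag<T>{};  // larger integer type wins
        } else {
            return tag<U>{};
        }
    }
}

template <typename T, typename U>
using promote_t = typename decltype(promote_impl<T, U>())::type;

}  // namespace promo_sketch
```

Note the contrast with regular C++: there, `int + unsigned` compiles and yields `unsigned`, whereas in this sketch `promote_t<int, unsigned>` fails to compile.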
+
+Overview
+--------
+
+The type promotion rules are shown in the table below.
+The labels are as follows:
+
+* ``b``: boolean
+* ``iN``: signed integer of ``N`` bits (e.g., ``int``, ``long``)