Commit c07b5db

Merge pull request #207 from omlins/ka2doc
Update README.md with new constants and example data allocation for S…
2 parents: 2c5882f + 0c1ab71

1 file changed: README.md (11 additions, 9 deletions)
````diff
@@ -265,8 +265,8 @@ The KernelAbstractions backend keeps the familiar parse-time `@init_parallel_ste
 import CUDA # 1 Import backends to be used by the KernelAbstractions backend
 using ParallelStencil
 @init_parallel_stencil(package=KernelAbstractions, numbertype=Float32) # 2 Initialize KernelAbstractions backend at parse time
-const N = 1024
-const α = 2.5
+const N = 2;
+const α = 1.5;
 
 # --- Kernel definition -------------------------------------------------
 @parallel_indices (i) function saxpy!(Y, α, X) # 3 Define a hardware-agnostic SAXPY kernel a single time
@@ -276,16 +276,18 @@ end
 
 # --- First run on default runtime hardware (CPU) -----------------------
 println("Current runtime hardware target: ", @current_hardware()) # 4 Query current (default) runtime hardware target
-X = @rand(N) # 5 Allocate data on the current target
-Y = @rand(N) # 5 Allocate data on the current target
+X = @fill(3, N) # 5 Allocate data on the current target
+Y = @ones(N)    # ...
 @parallel saxpy!(Y, α, X) # 6 Launch kernel on the current target
+Y # 7 Observe correct results
 
 # --- Reselect runtime hardware to CUDA-capable GPU and run again -------
-@select_hardware(:gpu_cuda) # 7 Switch runtime hardware target to CUDA-capable GPU
-println("Current runtime hardware target: ", @current_hardware()) # 8 Confirm the CUDA-capable GPU runtime hardware target
-X = @rand(N) # 9 Allocate data on the new target
-Y = @rand(N) # 9 Allocate data on the new target
-@parallel saxpy!(Y, α, X) # 10 Launch kernel on the new target without redefining anything
+@select_hardware(:gpu_cuda) # 8 Switch runtime hardware target to CUDA-capable GPU
+println("Current runtime hardware target: ", @current_hardware()) # 9 Confirm the CUDA-capable GPU runtime hardware target
+X = @fill(3, N) # 10 Allocate data on the new target
+Y = @ones(N)    # ...
+@parallel saxpy!(Y, α, X) # 11 Launch kernel on the new target without redefining anything
+Y # 12 Observe correct results
 ```
````
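For context, below is a runnable sketch of the CPU half of the updated example as it plausibly reads after this commit. The `saxpy!` body is not visible in the diff (the README lines between the two hunks are elided), so the standard SAXPY update is assumed here; all other lines are taken from the diff above.

```julia
import CUDA # 1 Import backends to be used by the KernelAbstractions backend
using ParallelStencil
@init_parallel_stencil(package=KernelAbstractions, numbertype=Float32) # 2 Initialize the backend at parse time
const N = 2;
const α = 1.5;

# --- Kernel definition -------------------------------------------------
@parallel_indices (i) function saxpy!(Y, α, X) # 3 Hardware-agnostic SAXPY kernel (body assumed; not shown in the diff)
    Y[i] = α * X[i] + Y[i]
    return
end

# --- First run on default runtime hardware (CPU) -----------------------
println("Current runtime hardware target: ", @current_hardware()) # 4 Query the current (default) target
X = @fill(3, N)  # 5 Allocate on the current target: X == [3.0f0, 3.0f0]
Y = @ones(N)     # ...                               Y == [1.0f0, 1.0f0]
@parallel saxpy!(Y, α, X) # 6 Launch the kernel on the current target
Y # 7 Observe correct results: 1.5 * 3 + 1 == 5.5, so Y == [5.5f0, 5.5f0]
```

This is presumably why the commit shrinks `N` to `2` and swaps `@rand` for `@fill`/`@ones`: with fixed inputs, step 7 (and step 12 after `@select_hardware(:gpu_cuda)`) can be verified by eye on both hardware targets.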
Type `?@select_hardware` and `?@current_hardware` in the [Julia REPL] to see which runtime hardware targets are supported and which symbols to use to select them.
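For instance, the help can be pulled up like this (a sketch; the help text itself is omitted):

```julia-repl
julia> using ParallelStencil

help?> @select_hardware    # enter help mode by typing `?` at the `julia>` prompt

help?> @current_hardware
```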
Note that the KernelAbstractions backend comes with a trade-off: the convenience `Data`/`TData` modules for fixed data types and single-architecture backends are not available, nor are the warp-level primitives in `@parallel_indices` kernels (see [Support for architecture-agnostic low level kernel programming](#support-for-architecture-agnostic-low-level-kernel-programming)); and the hide communication feature, described in the next section, is implemented to have no effect for KernelAbstractions (but it nevertheless executes correctly).
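In practice, the missing `Data`/`TData` modules mainly affect how signatures are written: there is no `Data.Array` alias to annotate with, so functions simply stay generic over whatever array type the current hardware target's allocation macros return. A minimal sketch, assuming the `Float32` initialization and the allocation macros from the example above:

```julia
# No `A::Data.Array` annotation is available with the KernelAbstractions backend;
# a generic signature works for Base arrays and GPU arrays alike.
@parallel_indices (i) function scale!(A, c)
    A[i] = c * A[i]
    return
end

A = @zeros(8)              # Float32 zeros on the current runtime hardware target
@parallel scale!(A, 2.0f0) # same launch syntax as in the README example
```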
