Draft
23 commits
548a9f3
Merge remote-tracking branch 'origin/develop' into HEAD
ChipKerchner Mar 11, 2026
376d3a1
Fast performing edges for FP32 GEMM of RVV.
ChipKerchner Mar 12, 2026
6d6af1d
Add bool types for C.
ChipKerchner Mar 12, 2026
9c16449
Add K-unrolling to M = 8. Other small changes.
ChipKerchner Mar 13, 2026
fda433f
Unroll K for N less than or equal to 4.
ChipKerchner Mar 13, 2026
eb9bbcc
Common unroll code.
ChipKerchner Mar 14, 2026
b0ee407
Preserve K.
ChipKerchner Mar 14, 2026
010f24f
Better K.
ChipKerchner Mar 16, 2026
f927b94
Global optimizations.
ChipKerchner Mar 16, 2026
79d9fe3
Use mf2 instead of m1.
ChipKerchner Mar 17, 2026
477dd40
Simpler loops.
ChipKerchner Mar 17, 2026
d832ee5
More global optimization and cleanup.
ChipKerchner Mar 18, 2026
1e48686
Merge remote-tracking branch 'origin/develop' into fasterRVVEdges
ChipKerchner Mar 19, 2026
a8a00bb
Avoid greater than 4 segment load and store penalties by using 2. Fi…
ChipKerchner Mar 19, 2026
1bb72b2
Only initialize unused variables to prevent GCC warnings.
ChipKerchner Mar 20, 2026
ebf4cd1
Fix typo.
ChipKerchner Mar 22, 2026
8fc0004
Fix another typo.
ChipKerchner Mar 24, 2026
d69be17
Convert 2X LMUL1 instructions to 1X LMUL2. Improved FP64 GEMM edges …
ChipKerchner Mar 30, 2026
daa3215
Remove shadow variable.
ChipKerchner Mar 31, 2026
3b1aef1
Use LMUL2 loads in main block.
ChipKerchner Apr 2, 2026
22b7950
Use LMUL2 for calculations in main block - just break them apart befo…
ChipKerchner Apr 2, 2026
cc1b579
Reduce number of vectors in use from 32 to 24 for last stage of main …
ChipKerchner Apr 2, 2026
0a4d6b2
Forgot files from previous check-in.
ChipKerchner Apr 2, 2026