Lightweight C++ implementations of SIMD-optimized microkernels for ML primitives, with Python bindings, benchmark automation, and optional OpenMP support.
Topics: python, machine-learning, cpp, simd, high-performance-computing, performance-optimization, microkernels, ml-primitives
Updated Jun 9, 2026 - Makefile