Olajide Badejo (Olajide-Badejo)

M.Sc. Computational Engineering @ RUB
CUDA • C++ • GPU Performance • HPC • Python

Pinned

GPU-Based-Matrix-Operations (Public)

CUDA/C++ matrix-vector and matrix-matrix kernels with naive, shared-memory tiled, and warp-coalesced variants, plus GFLOPS benchmarking vs CPU.

Cuda 1
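
The GFLOPS figures mentioned for GPU-Based-Matrix-Operations can be illustrated with a minimal sketch. It assumes the common convention of counting one multiply and one add per inner-product term (2·m·n·k for matrix-matrix, 2·m·n for matrix-vector); the repository's own counting convention and timing method are not stated here, and the function names are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOP count for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Assumes one multiply and one add per inner-product term.
    """
    return 2 * m * n * k


def gemv_flops(m: int, n: int) -> int:
    """FLOP count for y = A @ x with A of shape (m, n), same convention."""
    return 2 * m * n


def gflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and a measured wall time into GFLOPS."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: a 1024^3 GEMM measured at 1 ms.
    print(gflops(gemm_flops(1024, 1024, 1024), 1e-3))
```

Dividing the same FLOP count by the CPU and GPU timings gives directly comparable throughput numbers, which is what a "GFLOPS vs CPU" comparison relies on.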
GPU-Physics-Simulation (Public)

Real-time CUDA physics engine for N-body gravity, SPH fluids, and rigid-body collisions. Uses shared-memory tiling, kernel fusion, and spatial hashing on RTX 4080/4090.

Cuda
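
The spatial hashing named in the GPU-Physics-Simulation description can be sketched on the CPU as a uniform grid lookup. This is an illustrative Python sketch, not the repository's CUDA implementation: the function names are hypothetical, and it assumes the search radius is no larger than the cell size, so only the 27 surrounding cells need to be scanned.

```python
from collections import defaultdict
from itertools import product


def cell_of(pos, cell_size):
    """Map a 3D position to its integer grid cell (floor division)."""
    return tuple(int(c // cell_size) for c in pos)


def build_grid(positions, cell_size):
    """Bucket particle indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[cell_of(p, cell_size)].append(i)
    return grid


def neighbors(i, positions, grid, cell_size, radius):
    """Return sorted indices j != i with |p_j - p_i| <= radius.

    Assumes radius <= cell_size, so scanning the 3x3x3 block of cells
    around particle i cannot miss a neighbor.
    """
    cx, cy, cz = cell_of(positions[i], cell_size)
    r2 = radius * radius
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 <= r2:
                found.append(j)
    return sorted(found)
```

The point of the grid is that each particle inspects only nearby buckets instead of all N particles, which is what makes neighbor search for SPH and collision detection scale.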
ARM-Neon-Conv3x3 (Public)

ARMv8-A NEON 3×3 convolution in C++17. Scalar, NEON naive, and cache/register-blocked variants with runtime dispatch and perf/PMU analysis.

C++
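
For readers unfamiliar with the operation behind ARM-Neon-Conv3x3, a scalar 3×3 convolution can be sketched as below. This is a Python illustration of the kind of baseline that a "scalar" variant computes, not code from the repository: the function name is hypothetical, border handling is assumed to be "valid" (no padding), and the kernel is applied without flipping, as is common in image-processing and deep-learning code.

```python
def conv3x3_valid(image, kernel):
    """Apply a 3x3 kernel to a 2D list-of-lists image with no padding.

    The output has shape (h - 2, w - 2). The kernel is not flipped, so
    strictly speaking this is cross-correlation.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            acc = 0.0
            # Nine multiply-accumulates per output pixel.
            for ky in range(3):
                for kx in range(3):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

A vectorized or register-blocked variant computes the same nine multiply-accumulates per pixel, so a scalar routine like this is the usual correctness reference for the faster versions.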
GPU-Training-Bench (Public)

PyTorch training throughput benchmark for NVIDIA GPUs. Sweeps batch size, precision, and DataLoader workers, profiles step phases, collects NVML telemetry, and generates JSON + HTML reports.

Python
CUDA-Matrix-Library (Public)

CUDA matrix library for GEMM, GEMV, TRSM with naive, tiled, register-blocked, and tensor-core kernels. Includes FP16/BF16 mixed precision, sparse ops, cuSOLVER wrappers, and Python bindings.

C++
Metal-msl-microbenchmark (Public)

CUDA bandwidth tests ported to Metal MSL. Measures M1 GPU DRAM, threadgroup SRAM, and texture-cache bandwidth with Apple Instruments traces.

Objective-C++