CUDA Matmul Experiments

This repository collects CUDA matrix multiplication experiments, from CUDA-core SGEMM kernels to WMMA / TF32 tensor-core kernels.

The main goal is to keep each optimization step measurable: every kernel variant should have a clear hypothesis, a benchmark result, and, where useful, a profiling note.

Directory Structure

Path	Description
`matmul/`	CUDA-core SGEMM optimization experiments. Includes naive, tiled, register-tiled, vectorized, double-buffered, and `cp.async` versions.
`matmul/README.md`	Detailed CUDA-core benchmark notes, optimization log, summary table, and shape experiments.
`matmul_tensor/`	Tensor-core matmul experiment using WMMA with TF32 compute.
`matmul_tensor/README.md`	Tensor-core benchmark notes and current profiling summary.

Projects

CUDA-Core SGEMM

See `matmul/`.

This project tracks a step-by-step CUDA-core SGEMM optimization path:

global-memory baseline
shared-memory tiling
per-thread micro tiling
shared-memory layout experiments
column-major layout
warp tiling and lane mapping
vectorized memory access
double buffering
`cp.async` prefetching
tall/wide shape experiments
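The first two steps of this path can be sketched as follows. This is a minimal illustration, not the repository's code: the kernel names, the helper `tiled_vs_naive_max_diff`, the 16x16 tile size, and the row-major layout are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 16;  // hypothetical tile size, not taken from the repository

// Global-memory baseline: one thread per C element, row-major storage (assumed).
__global__ void sgemm_naive(int M, int N, int K, const float* A, const float* B, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) acc += A[row * K + k] * B[k * N + col];
    C[row * N + col] = acc;
}

// Shared-memory tiling: each block stages a TILE x TILE slice of A and B,
// so every global value is loaded once per block instead of once per thread.
__global__ void sgemm_tiled(int M, int N, int K, const float* A, const float* B, float* C) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int k0 = 0; k0 < K; k0 += TILE) {
        int a_col = k0 + threadIdx.x;
        int b_row = k0 + threadIdx.y;
        // Out-of-range elements are padded with zeros so edge tiles stay correct.
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k) acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Runs both kernels on the same inputs and returns the largest absolute
// difference between their outputs, or -1 if a CUDA call failed.
float tiled_vs_naive_max_diff(int M, int N, int K) {
    std::vector<float> hA((size_t)M * K), hB((size_t)K * N), h0((size_t)M * N), h1((size_t)M * N);
    for (size_t i = 0; i < hA.size(); ++i) hA[i] = float(i % 7) - 3.0f;
    for (size_t i = 0; i < hB.size(); ++i) hB[i] = float(i % 5) - 2.0f;

    float *dA = nullptr, *dB = nullptr, *dC0 = nullptr, *dC1 = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC0, h0.size() * sizeof(float));
    cudaMalloc(&dC1, h1.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    sgemm_naive<<<grid, block>>>(M, N, K, dA, dB, dC0);
    sgemm_tiled<<<grid, block>>>(M, N, K, dA, dB, dC1);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h0.data(), dC0, h0.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h1.data(), dC1, h1.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC0); cudaFree(dC1);
    if (err != cudaSuccess) return -1.0f;

    float max_diff = 0.0f;
    for (size_t i = 0; i < h0.size(); ++i) max_diff = std::fmax(max_diff, std::fabs(h0[i] - h1[i]));
    return max_diff;
}
```

Calling `tiled_vs_naive_max_diff(70, 50, 33)` from a host program exercises the edge-tile padding, since none of the dimensions is a multiple of the tile size. The later steps (register tiling, vectorized loads, double buffering, `cp.async`) build on the same staging loop.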

Tensor-Core SGEMM

See `matmul_tensor/`.

This project currently has a WMMA kernel with the following configuration:

float input matrices
TF32 tensor core computation
FP32 accumulation
column-major matrix layout
128x128x32 CTA tile
16x16x8 WMMA fragments
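To make the fragment configuration concrete, here is a minimal sketch of a TF32 WMMA kernel with 16x16x8 fragments, FP32 accumulation, and column-major operands. It is deliberately simpler than the configuration listed above: each block is a single warp that computes one 16x16 output tile straight from global memory, with no 128x128x32 CTA tiling or shared-memory staging. The names `wmma_tf32_gemm` and `wmma_tf32_max_error` are hypothetical, and the sketch assumes M and N are multiples of 16, K is a multiple of 8, and the code is built for compute capability 8.0 or newer (for example with `nvcc -arch=sm_80`).

```cuda
#include <cuda_runtime.h>
#include <mma.h>
#include <cmath>
#include <vector>
using namespace nvcuda;

constexpr int WM = 16, WN = 16, WK = 8;  // TF32 WMMA fragment shape

// One warp per 16x16 tile of C. A is MxK, B is KxN, C is MxN, all column-major.
__global__ void wmma_tf32_gemm(int M, int N, int K, const float* A, const float* B, float* C) {
    int tile_m = blockIdx.x;
    int tile_n = blockIdx.y;
    wmma::fragment<wmma::matrix_a, WM, WN, WK, wmma::precision::tf32, wmma::col_major> a_frag;
    wmma::fragment<wmma::matrix_b, WM, WN, WK, wmma::precision::tf32, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, WM, WN, WK, float> c_frag;  // FP32 accumulator
    wmma::fill_fragment(c_frag, 0.0f);

    for (int k = 0; k < K; k += WK) {
        // Column-major addressing: element (r, c) of A lives at A[r + c * M].
        wmma::load_matrix_sync(a_frag, A + tile_m * WM + k * M, M);
        wmma::load_matrix_sync(b_frag, B + k + tile_n * WN * K, K);
        // Float inputs must be rounded to TF32 before the multiply.
        for (int t = 0; t < a_frag.num_elements; ++t) a_frag.x[t] = wmma::__float_to_tf32(a_frag.x[t]);
        for (int t = 0; t < b_frag.num_elements; ++t) b_frag.x[t] = wmma::__float_to_tf32(b_frag.x[t]);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C + tile_m * WM + tile_n * WN * M, c_frag, M, wmma::mem_col_major);
}

// Runs the kernel and returns the largest absolute error against a CPU
// reference, or -1 if a CUDA call failed (for example, on a pre-Ampere GPU).
float wmma_tf32_max_error(int M, int N, int K) {
    std::vector<float> hA((size_t)M * K), hB((size_t)K * N), hC((size_t)M * N);
    // Small integers are exactly representable in TF32, so the check is tight.
    for (size_t i = 0; i < hA.size(); ++i) hA[i] = float(i % 3);
    for (size_t i = 0; i < hB.size(); ++i) hB[i] = float(i % 5) - 2.0f;

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(M / WM, N / WN);
    wmma_tf32_gemm<<<grid, 32>>>(M, N, K, dA, dB, dC);  // 32 threads = one warp
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < M; ++i) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k) ref += hA[i + (size_t)k * M] * hB[k + (size_t)j * K];
            max_err = std::fmax(max_err, std::fabs(ref - hC[i + (size_t)j * M]));
        }
    return max_err;
}
```

With general floating-point inputs, TF32 rounding keeps only 10 mantissa bits per operand, so a comparison against an FP32 reference needs a looser tolerance than a CUDA-core SGEMM check would.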

Typical Workflow

For each experiment:

Implement or adjust a kernel variant.
Run correctness checks.
Compare against cuBLAS when relevant.
Profile with Nsight Compute or Nsight Systems.
Record the result and takeaway in the project README.
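The correctness and benchmarking steps of this workflow could look roughly like the sketch below. It times a kernel with CUDA events and compares its output against `cublasSgemm`. The kernel `sgemm_candidate` and the helper `check_against_cublas` are hypothetical stand-ins for whichever variant is under test, column-major storage is assumed, and the program needs to be linked with `-lcublas`.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical kernel under test: column-major C = A * B, one thread per element.
__global__ void sgemm_candidate(int M, int N, int K, const float* A, const float* B, float* C) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= M || col >= N) return;
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) acc += A[row + k * M] * B[k + col * K];
    C[row + col * M] = acc;
}

// Prints the average kernel time and returns the largest absolute difference
// against cuBLAS, or -1 if a CUDA or cuBLAS call failed.
float check_against_cublas(int M, int N, int K, int iters) {
    std::vector<float> hA((size_t)M * K), hB((size_t)K * N), hC((size_t)M * N), hRef((size_t)M * N);
    for (size_t i = 0; i < hA.size(); ++i) hA[i] = float(i % 7) - 3.0f;
    for (size_t i = 0; i < hB.size(); ++i) hB[i] = float(i % 5) - 2.0f;

    float *dA = nullptr, *dB = nullptr, *dC = nullptr, *dRef = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMalloc(&dRef, hRef.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Reference result from cuBLAS (column-major, no transposes).
    cublasHandle_t handle = nullptr;
    bool ok = cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;
    const float alpha = 1.0f, beta = 0.0f;
    if (ok) ok = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K,
                             &alpha, dA, M, dB, K, &beta, dRef, M) == CUBLAS_STATUS_SUCCESS;

    // Time the candidate with CUDA events after one warm-up launch.
    dim3 block(16, 16), grid((M + 15) / 16, (N + 15) / 16);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    sgemm_candidate<<<grid, block>>>(M, N, K, dA, dB, dC);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) sgemm_candidate<<<grid, block>>>(M, N, K, dA, dB, dC);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;
    if (ok) {
        double ms_per_run = ms / iters;
        std::printf("avg %.3f ms, %.1f GFLOP/s\n", ms_per_run, 2.0 * M * N * K / (ms_per_run * 1e6));
    }

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hRef.data(), dRef, hRef.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    if (handle) cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dRef);
    if (!ok) return -1.0f;

    float max_diff = 0.0f;
    for (size_t i = 0; i < hC.size(); ++i) max_diff = std::fmax(max_diff, std::fabs(hC[i] - hRef[i]));
    return max_diff;
}
```

A harness like this covers the correctness check and the cuBLAS comparison. For the profiling step, the same binary can be run under Nsight Compute (`ncu`) or Nsight Systems (`nsys profile`), and the numbers that matter recorded in the project README.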
