Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.
performance-engineering deep-learning async cuda pytorch gpu-optimization nvtx ml-systems nsight-systems automomous-driving
Updated Feb 12, 2026 - Python