@@ -7,6 +7,61 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.3.2] - 2026-01-20
+
+### Added
+
+#### GPU Profiling Infrastructure
+
+- **CUDA Profiling Module** (`ringkernel-cuda/src/profiling/`) - **NEW MODULE**
+  - Feature-gated via `profiling` feature flag
+  - Comprehensive GPU profiling capabilities for performance analysis
+
+- **CUDA Event Wrappers** (`profiling/events.rs`)
+  - `CudaEvent` - RAII wrapper for CUDA events with timing support
+  - `CudaEventFlags` - Event configuration (blocking sync, disable timing, interprocess)
+  - `GpuTimer` - Start/stop timer using CUDA events with microsecond precision
+  - `GpuTimerPool` - Pool of reusable timers with interior mutability for concurrent access
+
+- **NVTX Integration** (`profiling/nvtx.rs`)
+  - `CudaNvtxProfiler` - Real NVTX profiler using cudarc's nvtx module
+  - Timeline visualization in Nsight Systems and Nsight Compute
+  - `NvtxCategory` - Predefined categories (Kernel, Transfer, Memory, Sync, Queue, User)
+  - `NvtxRange` - RAII wrapper for automatic range end on drop
+  - `NvtxPayload` - Typed payloads for markers (I32, I64, U32, U64, F32, F64)
+  - Implements `GpuProfiler` trait for integration with ringkernel-core
+
+- **Kernel Metrics** (`profiling/metrics.rs`)
+  - `KernelMetrics` - Execution metadata (grid/block dims, GPU time, occupancy, registers)
+  - `TransferMetrics` - Memory transfer stats with bandwidth calculation
+  - `TransferDirection` - HostToDevice, DeviceToHost, DeviceToDevice
+  - `ProfilingSession` - Collects kernel and transfer events with timestamps
+  - `KernelAttributes` - Query kernel attributes via cuFuncGetAttribute
+
+- **Memory Tracking** (`profiling/memory_tracker.rs`)
+  - `CudaMemoryTracker` - Track GPU memory allocations with timing
+  - `TrackedAllocation` - Allocation metadata (ptr, size, kind, label, timestamp)
+  - `CudaMemoryKind` - Device, Pinned, Mapped, Managed memory types
+  - Peak usage tracking and allocation statistics
+  - Integration with `GpuMemoryDashboard` from ringkernel-core
+
+- **Chrome Trace Export** (`profiling/chrome_trace.rs`)
+  - `GpuTraceEvent` - Chrome trace format event structure
+  - `GpuEventArgs` - Rich event metadata (grid/block dims, occupancy, bandwidth)
+  - `GpuChromeTraceBuilder` - Build Chrome trace JSON from profiling sessions
+  - Support for kernel events, transfer events, NVTX ranges, memory allocations
+  - Process/thread naming for multi-GPU and multi-stream visualization
+  - Compatible with chrome://tracing, Perfetto UI, and Nsight Systems
+
+### Changed
+
+- **Dependencies** - Added `nvtx` feature to cudarc dependency
+- **ringkernel-cuda/Cargo.toml** - Added optional `serde` and `serde_json` for Chrome trace export
+
+### Fixed
+
+- Added `ProfilerRange::stub()` public constructor in ringkernel-core for external profiler implementations
+
 ## [0.3.1] - 2026-01-19
 
 ### Added
@@ -771,7 +826,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - CLAUDE.md with build commands and architecture overview
 - Code examples for all major features
 
-[Unreleased]: https://github.com/mivertowski/RustCompute/compare/v0.3.1...HEAD
+[Unreleased]: https://github.com/mivertowski/RustCompute/compare/v0.3.2...HEAD
+[0.3.2]: https://github.com/mivertowski/RustCompute/compare/v0.3.1...v0.3.2
 [0.3.1]: https://github.com/mivertowski/RustCompute/compare/v0.3.0...v0.3.1
 [0.3.0]: https://github.com/mivertowski/RustCompute/compare/v0.2.0...v0.3.0
 [0.2.0]: https://github.com/mivertowski/RustCompute/compare/v0.1.3...v0.2.0
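
The 0.3.2 entry above mentions Chrome trace export and a bandwidth calculation for transfers. As a rough illustration of what such an export involves, here is a minimal, std-only sketch. It is not the ringkernel-cuda API: `TraceEvent`, `bandwidth_gbps`, and `build_trace` are hypothetical names invented for this example. The JSON keys (`name`, `cat`, `ph`, `ts`, `dur`, `pid`, `tid`, `args`, `traceEvents`) follow the Chrome Trace Event Format; mapping `pid` to a GPU and `tid` to a stream is an assumption based on the changelog's "multi-GPU and multi-stream" wording.

```rust
// Hypothetical stand-in type for illustration; NOT the ringkernel-cuda API.

/// One "complete" event (`"ph":"X"`) in the Chrome Trace Event Format.
struct TraceEvent {
    name: String, // assumed to contain no characters that need JSON escaping
    cat: String,  // same assumption as `name`
    ts_us: f64,   // start timestamp in microseconds
    dur_us: f64,  // duration in microseconds
    pid: u32,     // assumption: one "process" per GPU
    tid: u32,     // assumption: one "thread" per stream
    bytes: u64,   // bytes moved, used to derive bandwidth
}

impl TraceEvent {
    /// Effective bandwidth in GB/s, taking 1 GB = 1e9 bytes.
    /// Returns 0.0 for non-positive durations to avoid dividing by zero.
    fn bandwidth_gbps(&self) -> f64 {
        if self.dur_us <= 0.0 {
            return 0.0;
        }
        (self.bytes as f64 / 1e9) / (self.dur_us / 1e6)
    }

    /// Serialize this event as a single JSON object.
    fn to_json(&self) -> String {
        format!(
            r#"{{"name":"{}","cat":"{}","ph":"X","ts":{:.3},"dur":{:.3},"pid":{},"tid":{},"args":{{"bytes":{},"bandwidth_gbps":{:.3}}}}}"#,
            self.name,
            self.cat,
            self.ts_us,
            self.dur_us,
            self.pid,
            self.tid,
            self.bytes,
            self.bandwidth_gbps()
        )
    }
}

/// Wrap events in the `{"traceEvents":[...]}` object form of the format.
fn build_trace(events: &[TraceEvent]) -> String {
    let body: Vec<String> = events.iter().map(TraceEvent::to_json).collect();
    format!(r#"{{"traceEvents":[{}]}}"#, body.join(","))
}
```

Writing the string returned by `build_trace` to a `.json` file should give something that chrome://tracing or the Perfetto UI can open. A real implementation would more likely serialize with `serde_json`, which the entry lists as an optional dependency, rather than formatting strings by hand.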