This repository was archived by the owner on Apr 29, 2026. It is now read-only.
Commit 5ecf1a0
Refactor CUDA evaluation functions and improve multi-GPU handling in
- Removed redundant thread creation in CUDA evaluation functions to reduce CPU overhead.
- Introduced a work enqueueing mechanism for non-blocking stream execution, allowing concurrent GPU operations.
- Enhanced the `bitset_matrix_device` structure with a host staging buffer to optimize memory management.
- Updated resource cleanup logic to ensure proper handling of CUDA device states and buffers.

pe_synth_cuda_u64_cones.cu
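
The pattern these changes describe (one host thread enqueueing work on a non-blocking stream per GPU, with a pinned host staging buffer kept alongside each device buffer) might look roughly like the sketch below. This is not the code from this commit: `device_ctx`, `invert_words`, `ctx_init`, `ctx_enqueue`, `ctx_destroy`, and `run_all` are hypothetical names, and the real `bitset_matrix_device` layout is not shown here. Only standard CUDA runtime calls are used.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// On error: report and return false. Cleanup on failure is omitted for brevity.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t e_ = (call);                                          \
        if (e_ != cudaSuccess) {                                          \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_));   \
            return false;                                                 \
        }                                                                 \
    } while (0)

// Hypothetical per-device state (illustrative stand-in, not the commit's struct).
struct device_ctx {
    int device = 0;
    cudaStream_t stream = nullptr;
    uint64_t* d_buf = nullptr;    // device-side words
    uint64_t* h_stage = nullptr;  // pinned host staging buffer
    size_t n = 0;
};

// Placeholder kernel: flips every bit of each 64-bit word.
__global__ void invert_words(uint64_t* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = ~data[i];
}

static bool ctx_init(device_ctx& c, int device, size_t n) {
    c.device = device;
    c.n = n;
    CUDA_CHECK(cudaSetDevice(device));
    CUDA_CHECK(cudaStreamCreateWithFlags(&c.stream, cudaStreamNonBlocking));
    CUDA_CHECK(cudaMalloc((void**)&c.d_buf, n * sizeof(uint64_t)));
    // Pinned memory lets cudaMemcpyAsync overlap with other work.
    CUDA_CHECK(cudaMallocHost((void**)&c.h_stage, n * sizeof(uint64_t)));
    return true;
}

// Enqueue copy-in, kernel, copy-out on the device's stream; does not wait.
static bool ctx_enqueue(device_ctx& c) {
    const size_t bytes = c.n * sizeof(uint64_t);
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((c.n + threads - 1) / threads);
    CUDA_CHECK(cudaSetDevice(c.device));
    CUDA_CHECK(cudaMemcpyAsync(c.d_buf, c.h_stage, bytes,
                               cudaMemcpyHostToDevice, c.stream));
    invert_words<<<blocks, threads, 0, c.stream>>>(c.d_buf, c.n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpyAsync(c.h_stage, c.d_buf, bytes,
                               cudaMemcpyDeviceToHost, c.stream));
    return true;
}

static void ctx_destroy(device_ctx& c) {
    cudaSetDevice(c.device);
    if (c.stream) cudaStreamDestroy(c.stream);
    if (c.d_buf) cudaFree(c.d_buf);
    if (c.h_stage) cudaFreeHost(c.h_stage);
    c = device_ctx{};
}

// Drives every visible GPU from a single host thread. Requires n > 0.
bool run_all(size_t n) {
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));
    std::vector<device_ctx> ctxs(count);
    for (int d = 0; d < count; ++d) {
        if (!ctx_init(ctxs[d], d, n)) return false;
        for (size_t i = 0; i < n; ++i) ctxs[d].h_stage[i] = (uint64_t)(i + d);
    }
    // Phase 1: enqueue on all devices so they run concurrently.
    for (auto& c : ctxs) if (!ctx_enqueue(c)) return false;
    // Phase 2: wait per device, then check the staged results.
    bool ok = true;
    for (auto& c : ctxs) {
        CUDA_CHECK(cudaSetDevice(c.device));
        CUDA_CHECK(cudaStreamSynchronize(c.stream));
        for (size_t i = 0; i < c.n; ++i)
            ok &= (c.h_stage[i] == ~(uint64_t)(i + c.device));
    }
    for (auto& c : ctxs) ctx_destroy(c);
    return ok;
}
```

Splitting the work into an enqueue phase and a wait phase is what lets a single host thread keep several GPUs busy: each `ctx_enqueue` call returns as soon as the work is queued, so no per-device worker thread is needed. Calling `run_all(1 << 16)` from a `main` on a machine with at least one CUDA device exercises the whole path.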
1 parent a823a81 commit 5ecf1a0
1 file changed
Lines changed: 226 additions & 103 deletions