This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Commit 5ecf1a0

Refactor CUDA evaluation functions and improve multi-GPU handling in pe_synth_cuda_u64_cones.cu
- Removed redundant thread creation in CUDA evaluation functions to reduce CPU overhead.
- Introduced a work enqueueing mechanism for non-blocking stream execution, allowing concurrent GPU operations.
- Enhanced the `bitset_matrix_device` structure with a host staging buffer to optimize memory management.
- Updated resource cleanup logic to ensure proper handling of CUDA device states and buffers.
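
The enqueue-then-synchronize pattern described above can be sketched roughly as follows. This is not the code from `pe_synth_cuda_u64_cones.cu`, which is not shown here. The struct `DeviceSlot`, the kernel `invert_words`, and the helpers `slot_init`, `slot_enqueue`, `slot_destroy`, and `run_on_all_devices` are hypothetical names chosen for illustration, and the actual layout of `bitset_matrix_device` is an assumption. Only standard CUDA runtime calls are used. Error paths return early without releasing resources, to keep the sketch short.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Report a CUDA error and return -1 from the enclosing function.
#define CUDA_OK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); return -1; } } while (0)

// Hypothetical per-device state: one stream, one device buffer,
// and one pinned host staging buffer (assumed, not the real struct).
struct DeviceSlot {
    int device = 0;
    cudaStream_t stream = nullptr;
    uint64_t* d_words = nullptr;    // device memory
    uint64_t* h_staging = nullptr;  // pinned host memory for async copies
    size_t n = 0;
};

// Placeholder workload: flip every bit of each 64-bit word.
__global__ void invert_words(uint64_t* words, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) words[i] = ~words[i];
}

int slot_init(DeviceSlot& s, int device, size_t n) {
    s.device = device;
    s.n = n;
    CUDA_OK(cudaSetDevice(device));
    CUDA_OK(cudaStreamCreateWithFlags(&s.stream, cudaStreamNonBlocking));
    CUDA_OK(cudaMalloc((void**)&s.d_words, n * sizeof(uint64_t)));
    CUDA_OK(cudaMallocHost((void**)&s.h_staging, n * sizeof(uint64_t)));
    return 0;
}

// Enqueue copy-in, kernel, and copy-out on the slot's stream, then return
// immediately. No host thread is created and nothing blocks here.
int slot_enqueue(DeviceSlot& s) {
    const size_t bytes = s.n * sizeof(uint64_t);
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((s.n + threads - 1) / threads);
    CUDA_OK(cudaSetDevice(s.device));
    CUDA_OK(cudaMemcpyAsync(s.d_words, s.h_staging, bytes, cudaMemcpyHostToDevice, s.stream));
    invert_words<<<blocks, threads, 0, s.stream>>>(s.d_words, s.n);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpyAsync(s.h_staging, s.d_words, bytes, cudaMemcpyDeviceToHost, s.stream));
    return 0;
}

// Drain pending work before freeing buffers and destroying the stream.
int slot_destroy(DeviceSlot& s) {
    CUDA_OK(cudaSetDevice(s.device));
    CUDA_OK(cudaStreamSynchronize(s.stream));
    CUDA_OK(cudaFreeHost(s.h_staging));
    CUDA_OK(cudaFree(s.d_words));
    CUDA_OK(cudaStreamDestroy(s.stream));
    s = DeviceSlot{};
    return 0;
}

// Returns the number of mismatched words, 0 if no GPU is present, -1 on error.
int run_on_all_devices(size_t n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return 0;
    std::vector<DeviceSlot> slots(count);
    for (int d = 0; d < count; ++d) {
        if (slot_init(slots[d], d, n) != 0) return -1;
        for (size_t i = 0; i < n; ++i) slots[d].h_staging[i] = (uint64_t)i + (uint64_t)d;
    }
    // Enqueue on every device first so the GPUs can run concurrently.
    for (auto& s : slots) if (slot_enqueue(s) != 0) return -1;
    int mismatches = 0;
    for (auto& s : slots) {
        CUDA_OK(cudaSetDevice(s.device));
        CUDA_OK(cudaStreamSynchronize(s.stream));
        for (size_t i = 0; i < n; ++i)
            if (s.h_staging[i] != ~((uint64_t)i + (uint64_t)s.device)) ++mismatches;
    }
    for (auto& s : slots) if (slot_destroy(s) != 0) return -1;
    return mismatches;
}
```

Calling `run_on_all_devices(1024)` from a host `main` and checking for a zero return is one way to exercise the sketch. The design point is that all devices receive their work before any synchronization happens, so a single host thread can keep several GPUs busy. The pinned staging buffer matters because `cudaMemcpyAsync` only overlaps with other work when the host side is page-locked.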
1 parent a823a81 commit 5ecf1a0

1 file changed

Lines changed: 226 additions & 103 deletions

0 commit comments

Comments
 (0)