|
1 | | -# Loch: CUDA accelerated Grand Canonical Monte Carlo (GCMC) water sampling |
| 1 | +# Loch: GPU-accelerated Grand Canonical Monte Carlo (GCMC) water sampling |
2 | 2 |
|
3 | 3 | ## Introduction |
4 | 4 |
|
5 | | -We present `loch`, a high-performance CUDA-accelerated Python package designed |
| 5 | +We present `loch`, a high-performance GPU-accelerated Python package designed |
6 | 6 | for Grand Canonical Monte Carlo (GCMC) water sampling in molecular simulations |
7 | 7 | via [OpenMM](https://openmm.org/). To enable parallelisation of insertion and |
8 | | -deletion attempts, `loch` leverages GPU capabilities using a custom CUDA kernel |
9 | | -for nonbonded interactions. This allows thousands of GCMC trials to be attempted |
10 | | -in parallel, significantly enhancing sampling efficiency compared to traditional |
11 | | -CPU-based implementations that perform sequential attempts via the OpenMM Python |
12 | | -API. Additionally, electrostatics for GCMC attempts are computed using the |
13 | | -reaction field (RF) method, with accepted candidates being re-evaluated with a |
14 | | -correction step based on the difference between reaction field and Particle Mesh |
15 | | -Ewald (PME) potential energies. The use of an approximate potential for trial |
16 | | -moves leads to a substantial speed-up in GCMC move evaluation. `loch` has been |
17 | | -designed to be modular, allowing standalone GCMC sampling, or integration with |
18 | | -OpenMM-based molecular dynamics simulation code, e.g. as has been done in the |
19 | | -[SOMD2](https://github.com/openbiosim/somd2) free-energy perturbation engine. |
| 8 | +deletion attempts, `loch` leverages GPU capabilities using a custom CUDA/OpenCL |
| 9 | +kernel for nonbonded interactions. This allows thousands of GCMC trials to be |
| 10 | +attempted in parallel, significantly enhancing sampling efficiency compared to |
| 11 | +traditional CPU-based implementations that perform sequential attempts via the |
| 12 | +OpenMM Python API. Additionally, electrostatics for GCMC attempts are computed |
| 13 | +using the reaction field (RF) method, with accepted candidates being |
| 14 | +re-evaluated with a correction step based on the difference between reaction |
| 15 | +field and Particle Mesh Ewald (PME) potential energies. The use of an |
| 16 | +approximate potential for trial moves leads to a substantial speed-up in GCMC |
| 17 | +move evaluation. `loch` has been designed to be modular, allowing standalone |
| 18 | +GCMC sampling, or integration with OpenMM-based molecular dynamics simulation |
| 19 | +code, e.g. as has been done in the [SOMD2](https://github.com/openbiosim/somd2) |
| 20 | +free-energy perturbation engine. |
20 | 21 |
|
21 | 22 | ## Parallelisation strategy |
22 | 23 |
|
@@ -52,6 +53,14 @@ each iteration, as more trials need to be evaluated in parallel, and more data |
52 | 53 | needs to be transferred to and from the GPU, in which case it might be more |
53 | 54 | efficient to simply perform more iterations with a smaller batch size. |
54 | 55 |
|
| 56 | +To enable reproducibility across GPU platforms, we choose to generate random |
| 57 | +numbers on the host using NumPy's random number generator, then transfer these |
| 58 | +to the GPU kernels where required. This avoids differences in random number |
| 59 | +generation across different GPU architectures and drivers, making testing |
| 60 | +and validation of the implementation significantly easier. In benchmarks we |
| 61 | +have found the NumPy approach to be as performant as using GPU-based random |
| 62 | +numbers for the typical batch sizes employed in `loch`. |
| 63 | + |
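The host-side random number strategy described in the added paragraph can be sketched roughly as follows. This is a minimal illustration, not the code from `loch`: the function name `draw_trial_randoms`, the array layout, and the choice of `float32` are assumptions made for the example, and the transfer to the device is left as a comment because it depends on the GPU backend in use.

```python
import numpy as np


def draw_trial_randoms(rng, batch_size):
    """Draw the random numbers for one batch of trials on the host.

    Hypothetical helper: the name and the array layout are illustrative
    assumptions, not part of the loch API.
    """
    # Three uniform numbers per trial, e.g. to place a trial water.
    positions = rng.random((batch_size, 3), dtype=np.float32)
    # One uniform number per trial for the acceptance test.
    acceptance = rng.random(batch_size, dtype=np.float32)
    return positions, acceptance


# Seeding the generator on the host makes the stream independent of the
# GPU architecture and driver, so two runs draw identical numbers.
first = draw_trial_randoms(np.random.default_rng(42), batch_size=1000)
second = draw_trial_randoms(np.random.default_rng(42), batch_size=1000)
same_stream = all(np.array_equal(a, b) for a, b in zip(first, second))

# The arrays would then be copied to the device and read by the
# CUDA/OpenCL kernels (backend-specific, omitted here).
```

Because the generator state lives entirely on the host, a fixed seed gives the same trial sequence whichever GPU platform evaluates the kernels, which is what makes cross-platform comparison straightforward.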
55 | 64 | ## Sampling from an approximate potential |
56 | 65 |
|
57 | 66 | In order to further accelerate the evaluation of GCMC insertion and deletion |
@@ -91,7 +100,7 @@ Other than the cost of evaluating GCMC trials using PME, performance is also
91 | 100 | impacted by the cost of updating nonbonded parameters and atomic positions |
92 | 101 | in the OpenMM context after each accepted insertion or deletion. (No updates |
93 | 102 | are required for trial moves, since these are all evaluated via the custom |
94 | | -CUDA kernel.) [Recent updates](https://github.com/openmm/openmm/pull/4610) |
| 103 | +CUDA/OpenCL kernel.) [Recent updates](https://github.com/openmm/openmm/pull/4610) |
95 | 104 | to OpenMM have helped mitigate the cost of modifying force field parameters, |
96 | 105 | allowing updates for only the subset of parameters that have changed within |
97 | 106 | a particular force. However, updating atomic positions still requires |
|