# Lanczos solver

Computes the matrix-vector product sqrt(M)·v using a recursive algorithm.
For that, it requires a functor whose () operator takes an input real* array and an output real* array (both in device memory if compiled in CUDA mode, or in host memory otherwise):
```c++
inline void operator()(real* a_v, real* out_Mv);
```
This function must fill "out_Mv" with the result of the matrix-vector product: out_Mv = M·a_v.
If M has size NxN and the cost of one matrix-vector product is O(M), the total cost of the algorithm is O(m·M), where m << N.
If M·v performs a dense matrix-vector product, the cost of the algorithm is therefore O(m·N^2).

This is a header-only library, although a shared library can also be compiled (see the Compilation section below).

## Usage:

See example.cu for a usage example that can be compiled to work in GPU or CPU mode interchangeably.
See example.cpp for a CPU-only example.

Let us go through the remaining case here: a GPU-only example.

Create the module:
```c++
  real tolerance = 1e-6;
  lanczos::Solver lanczos(tolerance);
```
Write a functor that computes the product between the original matrix and a given vector, "v":
```c++
//A functor that returns the result of multiplying a certain matrix by a given vector
struct MatrixDot{
  int size;
  MatrixDot(int size): size(size){}

  void operator()(real* v, real* Mv){
    //An example diagonal matrix (twice the identity)
    for(int i = 0; i < size; i++){
      Mv[i] = 2*v[i];
    }
  }
};
```

Provide the solver with an instance of the functor and the target vector:

```c++
  int size = 10;
  //A vector filled with 1.
  //Lanczos defines a device container type for convenience: a thrust::device_vector if CUDA_ENABLED is defined and an std::vector otherwise.
  thrust::device_vector<real> v(size);
  thrust::fill(v.begin(), v.end(), 1);
  //A vector to store the result of sqrt(M)*v
  thrust::device_vector<real> result(size);
  //A functor that multiplies by the identity matrix times two
  MatrixDot dot(size);
  //Call the solver
  real* d_result = thrust::raw_pointer_cast(result.data());
  real* d_v = thrust::raw_pointer_cast(v.data());
  int numberIterations = lanczos.solve(dot, d_result, d_v, size);
```
The solve function returns the number of iterations that were needed to achieve the requested accuracy.

## Other functions:

If convergence has not been achieved after a certain number of iterations, the module gives up and throws an exception.
You can increase this limit with:
```c++
lanczos::Solver::setIterationHardLimit(int newlimit);
```
## Compilation:
This library requires lapacke and cblas (which can be replaced by MKL). In GPU mode, CUDA is also needed.
Note, however, that the heavy lifting in this solver is the matrix-vector multiplication, which must be provided by the user. The main benefit of CUDA mode is not increased performance of the internal library code, but the fact that the input/output arrays live on the GPU (saving potential memory copies).
## Optional macros:

- **CUDA_ENABLED**: Compiles a GPU-enabled shared library; the solver expects input/output arrays to be on the GPU and most of the computations are carried out on the GPU. Requires a working CUDA environment.
- **DOUBLE_PRECISION**: The library is compiled in single precision by default. This macro switches to double precision, making `lanczos::real` a typedef for double.
- **USE_MKL**: Includes mkl.h instead of lapacke and cblas. You will have to modify the compilation flags accordingly.
- **SHARED_LIBRARY_COMPILATION**: The Makefile uses this macro to compile a shared library. By default, this library is header-only.

See the Makefile for further instructions.

## References:

[1] Krylov subspace methods for computing hydrodynamic interactions in Brownian dynamics simulations. J. Chem. Phys. 137, 064106 (2012); doi: 10.1063/1.4742347

## Some notes:

From what I have seen, this algorithm converges to an error of ~1e-3 in a few steps (<5); from that point on, many more iterations are needed to lower the error further.
It usually achieves machine precision in under 50 iterations.

If the matrix does not have a square root (e.g. it is not positive definite or not symmetric), this usually shows up as a nan in the current error estimate. In that case an exception will be thrown.