Skip to content

Commit 20a63df

Browse files
committed
First commit
0 parents  commit 20a63df

14 files changed

Lines changed: 891 additions & 0 deletions

.gitignore

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
*~
2+
a.out
3+
example
4+
*so
5+
*o
6+
compile_flags.txt

Makefile

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
2+
3+
NVCC=nvcc
4+
CXX=g++
5+
6+
#Uncomment for a GPU enabled library
7+
#CUDA_ENABLED=-DCUDA_ENABLED
8+
9+
#Uncomment to compile in double precision mode
10+
#DOUBLE_PRECISION=-DDOUBLE_PRECISION
11+
12+
13+
INCLUDEFLAGS=-Iinclude/
14+
NVCCLDFLAGS= -lcublas
15+
LDFLAGS= -llapacke -lcblas
16+
17+
LIBNAME=liblanczos.so
18+
19+
20+
CXXFLAGS=-fPIC -w -O3 -g -std=c++14 $(INCLUDEFLAGS) $(DOUBLE_PRECISION) $(CUDA_ENABLED)
21+
NVCCFLAGS=-ccbin=$(CXX) -Xcompiler "$(CXXFLAGS)" -std=c++14 -O3 $(INCLUDEFLAGS) $(DOUBLE_PRECISION) $(CUDA_ENABLED)
22+
23+
ifndef CUDA_ENABLED
24+
COMPILER=$(CXX)
25+
CXXFLAGS:=$(CXXFLAGS) -xc++
26+
else
27+
COMPILER=$(NVCC)
28+
LDFLAGS:=$(LDFLAGS) $(NVCCLDFLAGS)
29+
CXXFLAGS:=$(NVCCFLAGS) -I$(CUDA_ROOT)/include
30+
endif
31+
32+
all: shared $(patsubst %.cu, %, $(wildcard *.cu)) $(patsubst %.cpp, %, $(wildcard *.cpp))
33+
34+
$(LIBNAME): $(wildcard include/*.cu)
35+
$(COMPILER) -DSHARED_LIBRARY_COMPILATION -shared $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
36+
37+
38+
shared: $(LIBNAME)
39+
40+
41+
%: %.cu Makefile
42+
$(COMPILER) $(CXXFLAGS) $< -o $@ $(LDFLAGS)
43+
44+
45+
46+
#clean: $(patsubst include/%.cu, %.clean, $(wildcard include/*.cu))
47+
48+
#%.clean:
49+
#rm -f $(@:.clean=.so)
50+
clean:
51+
rm -rf include/*.o $(LIBNAME) example

README.md

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# Lanczos solver,
2+
Computes the matrix-vector product sqrt(M)·v using a recursive algorithm.
3+
For that, it requires a functor in which the () operator takes an output real* array and an input real* (both in device memory if compiled in CUDA mode or host memory otherwise) as:
4+
```c++
5+
inline operator()(real* out_Mv, real * a_v);
6+
```
7+
This function must fill "out" with the result of performing the M·v dot product- > out = M·a_v.
8+
If M has size NxN and the cost of the dot product is O(M). The total cost of the algorithm is O(m·M). Where m << N.
9+
If M·v performs a dense M-V product, the cost of the algorithm would be O(m·N^2).
10+
11+
This is a header-only library, although a shared library can be
12+
13+
## Usage:
14+
15+
See example.cu for an usage example that can be compiled to work in GPU or CPU mode instinctively.
16+
See example.cpp for a CPU only example.
17+
18+
Let us go through the remaining one, a GPU-only example.
19+
20+
Create the module:
21+
```c++
22+
real tolerance = 1e-6;
23+
lanczos::Solver lanczos(tolerance);
24+
```
25+
Write a functor that computes the product between the original matrix and a given vector, "v":
26+
```c++
27+
//A functor that will return the result of multiplying a certain matrix times a given vector
28+
struct MatrixDot{
29+
int size;
30+
MatrixDot(int size): size(size){}
31+
32+
void operator()(real* v, real* Mv){
33+
//An example diagonal matrix
34+
for(int i=0; i<size; i++){
35+
Mv[i] = 2*v[i];
36+
}
37+
}
38+
39+
};
40+
41+
```
42+
43+
Provide the solver with an instance of the functor and the target vector:
44+
45+
```c++
46+
int size = 10;
47+
//A vector filled with 1.
48+
//Lanczos defines this type for convenience. It will be a thrust::device_vector if CUDA_ENABLED is defined and an std::vector otherwise
49+
thrust::device_vector<real> v(size);
50+
thrust::fill(v.begin(), v.end(), 1);
51+
//A vector to store the result of sqrt(M)*v
52+
thrust::device_vector<real> result(size);
53+
//A functor that multiplies by the identity matrix times two
54+
MatrixDot dot(size);
55+
//Call the solver
56+
real* d_result = thrust::raw_pointer_cast(result.data());
57+
real* d_v = thrust::raw_pointer_cast(v.data());
58+
int numberIterations = lanczos.solve(dot, d_result, d_v, size);
59+
```
60+
The solve function returns the number of iterations that were needed to achieve the requested accuracy.
61+
62+
## Other functions:
63+
64+
After a certain number of iterations, if convergence was not achieved, the module will give up and throw an error.
65+
To increase this threshold you can use this function:
66+
```c++
67+
lanczos::Solver::setIterationHardLimit(int newlimit);
68+
```
69+
## Compilation:
70+
This library requires lapacke and cblas (can be replaced by MKL). In GPU mode CUDA is also needed.
71+
Note, however, that the heavy-weight of this solver comes from the Matrix-vector multiplication that must provided by the user. The main benefit of the CUDA mode is not an increased performance of the internal library code, but the fact that the input/output arrays will live in the GPU (saving potential memory copies).
72+
## Optional macros:
73+
74+
**CUDA_ENABLED**: Will compile a GPU enabled shared library, the solver expects input/output arrays to be in the GPU and most of the computations will be carried out in the GPU. Requires a working CUDA environment.
75+
**DOUBLE_PRECISION**: The library is compiled in single precision by default. This macro switches to double precision, making ```lanczos::real``` be a typedef to double.
76+
**USE_MKL**: Will include mkl.h instead of lapacke and cblas. You will have to modify the compilation flags accordingly.
77+
**SHARED_LIBRARY_COMPILATION**: The Makefile uses this macro to compile a shared library. By default, this library is header only.
78+
79+
See the Makefile for further instructions.
80+
81+
## References:
82+
83+
[1] Krylov subspace methods for computing hydrodynamic interactions in Brownian dynamics simulations J. Chem. Phys. 137, 064106 (2012); doi: 10.1063/1.4742347
84+
85+
## Some notes:
86+
87+
From what I have seen, this algorithm converges to an error of ~1e-3 in a few steps (<5) and from that point a lot of iterations are needed to lower the error.
88+
It usually achieves machine precision in under 50 iterations.
89+
90+
If the matrix does not have a sqrt (not positive definite, not symmetric...) it will usually be reflected as a nan in the current error estimation. In this case an exception will be thrown.

example.cpp

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
/*Raul P. Pelaez 2022. Lanczos solver CPU example
2+
This code will compute and print sqrt(M)*v using the iterative Krylov solver in [1].
3+
4+
In this usage example, the matrix M is a diagonal matrix and v is filled with ones.
5+
6+
This code is equivalent to example.cu, but can only be compiled in CPU mode.
7+
8+
You can compile it with:
9+
g++ -std=c++14 -Iinclude example.cpp -llapacke -lcblas
10+
11+
References:
12+
[1] Krylov subspace methods for computing hydrodynamic interactions in Brownian dynamics simulations
13+
J. Chem. Phys. 137, 064106 (2012); doi: 10.1063/1.4742347
14+
*/
15+
#include<iostream>
16+
#include <LanczosAlgorithm.h>
17+
18+
//Using this floating point type (either float or double) will make the code be compiled with the same
19+
// precision as the library.
20+
using real = lanczos::real;
21+
//A functor that will return the result of multiplying a certain matrix times a given vector
22+
struct MatrixDot{
23+
int size;
24+
MatrixDot(int size): size(size){}
25+
26+
void operator()(real* v, real* Mv){
27+
//an example diagonal matrix
28+
for(int i=0; i<size; i++){
29+
Mv[i] = (2+i/10.0)*v[i];
30+
}
31+
}
32+
33+
};
34+
35+
36+
int main(){
37+
{
38+
//Initialize the solver
39+
real tolerance = 1e-6;
40+
lanczos::Solver lanczos(tolerance);
41+
int size = 10;
42+
//A vector filled with 1.
43+
//Lanczos defines this type for convenience. It will be a thrust::device_vector if CUDA_ENABLED is defined and an std::vector otherwise
44+
std::vector<real> v(size);
45+
std::fill(v.begin(), v.end(), 1);
46+
//A vector to store the result of sqrt(M)*v
47+
std::vector<real> result(size);
48+
//A functor that multiplies by the identity matrix times two
49+
MatrixDot dot(size);
50+
//Call the solver
51+
int numberIterations = lanczos.solve(dot, result.data(), v.data(), size);
52+
std::cout<<"Solved after "<<numberIterations<< " iterations"<<std::endl;
53+
//Now result is filled with sqrt(M)*v = sqrt(2)*[1,1,1...1]
54+
std::cout<<"Result: ";for(int i = 0; i<10; i++) std::cout<<result[i]<<" "; std::cout<<std::endl;
55+
//Compute error
56+
std::cout<<"Error: ";
57+
for(int i = 0; i<10; i++){
58+
real truth = sqrt(2+i/10.0);
59+
std::cout<<abs(result[i]-truth)/truth<<" ";
60+
}
61+
std::cout<<std::endl;
62+
}
63+
return 0;
64+
}

example.cu

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
/*Raul P. Pelaez 2022. Lanczos solver CPU/GPU example
2+
This code will compute and print sqrt(M)*v using the iterative Krylov solver in [1].
3+
4+
In this usage example, the matrix M is a diagonal matrix and v is filled with ones.
5+
6+
The code below uses a series of utilities available in the library to make it work in either the CPU or GPU.
7+
8+
If this code is compiled with CUDA_ENABLED defined, the code will run in a GPU. Otherwise it will be CPU only.
9+
10+
References:
11+
[1] Krylov subspace methods for computing hydrodynamic interactions in Brownian dynamics simulations
12+
J. Chem. Phys. 137, 064106 (2012); doi: 10.1063/1.4742347
13+
*/
14+
#include<iostream>
15+
#include <LanczosAlgorithm.h>
16+
17+
//Using this floating point type (either float or double) will make the code be compiled with the same
18+
// precision as the library.
19+
using real = lanczos::real;
20+
21+
//A functor that will return the result of multiplying a certain matrix times a given vector
22+
struct MatrixDot{
23+
int size;
24+
MatrixDot(int size): size(size){}
25+
26+
void operator()(real* v, real* Mv){
27+
//An example diagonal matrix
28+
for(int i=0; i<size; i++){
29+
Mv[i] = (2+i/10.0)*v[i];
30+
}
31+
}
32+
33+
};
34+
35+
36+
int main(){
37+
{
38+
//Initialize the solver
39+
real tolerance = 1e-6;
40+
lanczos::Solver lanczos(tolerance);
41+
int size = 10;
42+
//A vector filled with 1.
43+
//Lanczos defines this type for convenience. It will be a thrust::device_vector if CUDA_ENABLED is defined and an std::vector otherwise
44+
lanczos::device_container<real> v(size);
45+
lanczos::detail::device_fill(v.begin(), v.end(), 1);
46+
//A vector to store the result of sqrt(M)*v
47+
lanczos::device_container<real> result(size);
48+
//A functor that multiplies by the identity matrix times two
49+
MatrixDot dot(size);
50+
//Call the solver
51+
real* d_result = lanczos::detail::getRawPointer(result);
52+
real* d_v = lanczos::detail::getRawPointer(v);
53+
int numberIterations = lanczos.solve(dot, d_result, d_v, size);
54+
std::cout<<"Solved after "<<numberIterations<< " iterations"<<std::endl;
55+
//Now result is filled with sqrt(M)*v = sqrt(2)*[1,1,1...1]
56+
std::cout<<"Result: ";for(int i = 0; i<10; i++) std::cout<<result[i]<<" "; std::cout<<std::endl;
57+
//Compute error
58+
std::cout<<"Error: ";
59+
for(int i = 0; i<10; i++){
60+
real truth = sqrt(2+i/10.0);
61+
std::cout<<abs(result[i]-truth)/truth<<" ";
62+
}
63+
std::cout<<std::endl;
64+
65+
}
66+
return 0;
67+
}

0 commit comments

Comments
 (0)