Just noting here that the trace operator doesn't work on GPUs, as cupy's einsum doesn't take an out argument. A fix similar to the one for the dot product (commit fb9b3d6) resolves this.
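For reference, a minimal sketch of the kind of workaround described above: evaluate einsum without `out=` and copy the result into the preallocated buffer. The helper name `einsum_into` and the `xp` array-module argument are hypothetical and not taken from the commits mentioned; NumPy stands in for cupy so the snippet runs without a GPU.

```python
import numpy as np

def einsum_into(xp, subscripts, *operands, out):
    """Evaluate einsum and store the result in `out` without passing out= to einsum.

    `xp` is the array module (numpy here; cupy is the assumed GPU counterpart).
    """
    result = xp.einsum(subscripts, *operands)  # allocates a temporary array
    out[...] = result                          # copy into the caller's buffer
    return out

# Trace of a 3x3 matrix written into a preallocated 0-d buffer.
a = np.arange(9.0).reshape(3, 3)
trace_out = np.zeros(())
einsum_into(np, "ii->", a, out=trace_out)
```

The cost of this approach is one temporary allocation per call, which may or may not matter depending on how often the operator is evaluated.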
Interpolate doesn't currently work either, with error
Trace is fixed as of [6e32312]. If arg0 is a Python int and arg1 is np.float32, this function returns np.float64, leading to the error. This case is unavoidable, since subtract invokes a multiplication by the int (-1); for example, f-g is evaluated as f + ((-1)*g). I am not sure what the best fix is. I've tried a change that fixes things, i.e. using arg0 rather than type(arg0). This gives np.float32, which is the correct behaviour here. I wanted to check that this is a good fix before pushing.
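To illustrate the promotion difference, here is a small sketch using `np.result_type`. It assumes the function in question resolves dtypes in a similar way, which is not confirmed by the thread; the variable names are only for illustration.

```python
import numpy as np

arg0 = -1          # Python int, as produced by f - g -> f + ((-1) * g)
arg1 = np.float32  # dtype of the field data

# Passing the *type* treats the int as a full-width integer dtype,
# so the result is promoted to double precision.
from_type = np.result_type(type(arg0), arg1)

# Passing the *value* lets NumPy treat the Python scalar weakly,
# so the float32 operand decides the result.
from_value = np.result_type(arg0, arg1)

print(from_type, from_value)
```

With current NumPy releases this should print `float64 float32`, which matches the behaviour described above.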
…ke a new buff and recompute the analysis.
We may want to use cupy's jit-rawkernel approach to handle type generality: https://docs.cupy.dev/en/stable/user_guide/kernel.html#jit-kernel-definition
This seems to work here
I've changed the Chebyshev transform default from the cupy DCT to matrix transforms for now, since matrix transforms seem much faster even up to sizes of 1024. The cupy transform seems quite slow for some reason in my tests. Note there is no CUDA-native DCT, so cupy implements the DCT as an extended FFT.
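As a rough illustration of the two approaches being compared, the sketch below computes an unnormalized DCT-II both as a dense matrix product and via a mirrored, length-2N FFT. This is a NumPy stand-in under my own conventions (function names and normalization are assumptions), not the code in this PR or cupy's implementation.

```python
import numpy as np

def dct2_matrix(N):
    """Dense unnormalized DCT-II matrix: M[k, n] = 2 cos(pi k (2n + 1) / (2N))."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    return 2.0 * np.cos(np.pi * k * (2 * n + 1) / (2 * N))

def dct2_via_fft(x):
    """Same transform computed from an FFT of the mirrored, length-2N signal."""
    N = x.shape[0]
    y = np.concatenate([x, x[::-1]])          # even extension of the input
    Y = np.fft.fft(y)[:N]                     # keep the first N modes
    phase = np.exp(-1j * np.pi * np.arange(N) / (2 * N))
    return (phase * Y).real                   # half-sample phase shift

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
via_matrix = dct2_matrix(64) @ x              # one matrix-vector product
via_fft = dct2_via_fft(x)
```

The matrix route costs O(N^2) per transform but maps onto a single GEMM call when many transforms are batched, which is one plausible reason it could win on a GPU at moderate sizes.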
This PR adds GPU support for one-dimensional bases and Cartesian problems. Remaining rough edges include choosing good defaults for subproblem coupling, and raising or warning for unsupported features (GPU+MPI, GPU+curvilinear).