CUDA cublasGetMatrix / cublasSetMatrix fails | Explanation of arguments

I've attempted to copy the matrix [1 2 3 4 ; 5 6 7 8 ; 9 10 11 12 ], stored in column-major format as x, by first copying it to a matrix d_x on an NVIDIA GPU using cublasSetMatrix(), and then copying d_x to y using cublasGetMatrix(). #include<stdio.h>...
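The excerpt is cut off before the code, so the actual failure cannot be diagnosed here. As general background on the arguments, cublasSetMatrix takes (rows, cols, elemSize, A, lda, B, ldb), where lda and ldb are the leading dimensions: the number of elements between the starts of consecutive columns. The host-only C sketch below illustrates that layout for the 3x4 matrix in the question. The helper names idx_colmajor and copy_matrix_colmajor are hypothetical and not part of cuBLAS; the copy runs between two host buffers rather than host to device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Column-major offset of element (row, col) when consecutive columns
   start ld elements apart in memory (ld is the leading dimension). */
size_t idx_colmajor(int row, int col, int ld)
{
    return (size_t)col * (size_t)ld + (size_t)row;
}

/* Hypothetical host-only stand-in that mirrors the argument order of
   cublasSetMatrix(rows, cols, elemSize, A, lda, B, ldb). It copies a
   rows x cols column-major matrix between two host buffers.
   Returns 0 on success, -1 if the arguments are inconsistent. */
int copy_matrix_colmajor(int rows, int cols, int elem_size,
                         const void *A, int lda, void *B, int ldb)
{
    assert(A != NULL && B != NULL);

    /* A leading dimension smaller than the row count cannot hold a column. */
    if (rows < 0 || cols < 0 || elem_size <= 0 || lda < rows || ldb < rows)
        return -1;

    for (int j = 0; j < cols; ++j) {
        const char *src = (const char *)A
                        + idx_colmajor(0, j, lda) * (size_t)elem_size;
        char *dst = (char *)B
                  + idx_colmajor(0, j, ldb) * (size_t)elem_size;
        /* Each column is contiguous, so one memcpy per column suffices. */
        memcpy(dst, src, (size_t)rows * (size_t)elem_size);
    }
    return 0;
}

/* The matrix [1 2 3 4 ; 5 6 7 8 ; 9 10 11 12] stored column by column. */
const float x_colmajor[12] = {1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12};
```

For this matrix, rows = 3, cols = 4, and both leading dimensions are 3 when the buffers hold exactly the matrix. Calling copy_matrix_colmajor(3, 4, sizeof(float), x_colmajor, 3, y, 3) fills a 12-element array y, and y[idx_colmajor(1, 2, 3)] is the entry in the second row and third column.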

Set each gpu for each thread

For example, I have 2 GPUs and 2 host threads. I can't check it because the multi-GPU PC is far away from me. I want to make the first host thread work with the first GPU and the second host thread work with the second GPU. All host threads consist of...

Sharing roots and weights for many Gauss-Legendre quadratures on GPUs

I intend to compute many numerical quadratures in parallel, all of which use a common set of data (fairly large arrays of roots and weights occupying about 25 KB of memory). The Gauss-Legendre quadrature method is...
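To make the sharing pattern concrete, here is a minimal host-only C sketch in which one read-only table of roots and weights serves many independent quadratures. It uses a 3-point Gauss-Legendre rule purely for illustration, not the asker's roughly 25 KB table, and every name in it (gl_nodes, gl_weights, gl_integrate, gl_integrate_many, cube) is hypothetical. On a GPU, each loop iteration in gl_integrate_many would typically map to one thread. A table of that size is below the 64 KB of `__constant__` memory on CUDA devices, which may be one place to keep it, though the excerpt does not say which approach was chosen.

```c
#include <stddef.h>

#define GL_POINTS 3

/* 3-point Gauss-Legendre rule on [-1, 1]: nodes 0 and +/- sqrt(3/5),
   weights 8/9 and 5/9. Exact for polynomials up to degree 5.
   This table is read-only and shared by every quadrature below. */
const double gl_nodes[GL_POINTS]   = {-0.7745966692414834, 0.0,
                                       0.7745966692414834};
const double gl_weights[GL_POINTS] = {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0};

typedef double (*integrand_fn)(double);

/* Approximate the integral of f over [a, b] by mapping the reference
   nodes from [-1, 1] onto [a, b]. */
double gl_integrate(integrand_fn f, double a, double b)
{
    double half = 0.5 * (b - a);
    double mid  = 0.5 * (a + b);
    double sum  = 0.0;
    for (int i = 0; i < GL_POINTS; ++i)
        sum += gl_weights[i] * f(mid + half * gl_nodes[i]);
    return half * sum;
}

/* Many independent quadratures that all read the same table.
   Each iteration is independent of the others. */
void gl_integrate_many(integrand_fn f, const double *a, const double *b,
                       double *out, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        out[k] = gl_integrate(f, a[k], b[k]);
}

/* Sample integrand used only to exercise the sketch. */
double cube(double t)
{
    return t * t * t;
}
```

Because every quadrature reads the same nodes and weights in the same order, the table only needs to exist once, whatever the number of integrals.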