Theano: cublasSgemm failed (14) an internal operation failed

cuda,theano,cublas
Sometimes, after running fine for a while, I get this error with Theano / CUDA: RuntimeError: cublasSgemm failed (14) an internal operation failed unit=0 N=0, c.dims=[512 2048], a.dim=[512 493], alpha=%f, beta=%f, a=%p, b=%p, c=%p sa_0=%d, sa_1=%d, sb_0=%d, sb_1=%d, sc_0=%d, sc_1=%d Apply node that caused the error: GpuDot22(GpuReshape{2}.0, GpuReshape{2}.0) Inputs...

cublasSgetriBatched compilation error with CUDA 7.0 Release Candidate

cuda,cublas
Consider the code posted by sgarizvi at "CUBLAS: Incorrect inversion for matrix with zero pivot". I'm using that code as an off-the-shelf reproducer of my problem. If I compile it with CUDA 6.0, everything works fine. In contrast, if I compile it with CUDA 6.5 or CUDA 7.0 Release...

Why is cuSPARSE much slower than cuBLAS for sparse matrix multiplication?

matrix,cuda,multiplication,sparse,cublas
Recently, when I used cuSPARSE and cuBLAS in CUDA Toolkit 6.5 to do sparse matrix multiplication, I found that cuSPARSE was much slower than cuBLAS in all cases. In all my experiments, I used "cusparseScsrmm" from cuSPARSE and "cublasSgemm" from cuBLAS. In the sparse matrix, half of the total elements are...

Crash in Theano/CUDA exit in cuStreamDestroy

cuda,theano,cublas
I have an application that is linked against CPython and calls Theano+CUDA code from there. The application itself also uses CUDA and cuBLAS. Since each of them creates its own handle, I think they should not interfere with each other. The GPU is in exclusive mode, i.e. only used by that...

Is there a function in cuBLAS that can apply the sigmoid function to a vector?

cuda,gpu,cublas
As the title says, I want to apply a function element-wise to a vector. Is there any function in the cuBLAS library that does that?

CUDA cublasGetMatrix / cublasSetMatrix fails | Explanation of arguments

cuda,gpgpu,gpu-programming,cublas
I've attempted to copy the matrix [1 2 3 4 ; 5 6 7 8 ; 9 10 11 12 ], stored in column-major format as x, by first copying it to a matrix d_x on an NVIDIA GPU using cublasSetMatrix, and then copying d_x to y using cublasGetMatrix(). #include<stdio.h>...
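
For readers unsure what "stored in column-major format" implies for the row, column, and leading-dimension arguments, here is a minimal host-only sketch in plain C. It does not call cuBLAS and is not the asker's code. The macro IDX2C and the helper copy_matrix_colmajor are hypothetical names introduced only for illustration; the helper's argument order (rows, cols, source, lda, destination, ldb) is chosen to resemble that of cublasSetMatrix and cublasGetMatrix, minus the element-size argument.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index: element (row i, column j) of a matrix whose
   consecutive columns start ld elements apart (ld = leading dimension). */
#define IDX2C(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Hypothetical host-side helper: copy a rows x cols column-major matrix
   from src (leading dimension lda) to dst (leading dimension ldb). */
static void copy_matrix_colmajor(int rows, int cols,
                                 const float *src, int lda,
                                 float *dst, int ldb)
{
    /* A leading dimension smaller than the row count would make
       adjacent columns overlap, so reject it. */
    assert(lda >= rows && ldb >= rows);
    for (int j = 0; j < cols; ++j)
        for (int i = 0; i < rows; ++i)
            dst[IDX2C(i, j, ldb)] = src[IDX2C(i, j, lda)];
}

/* The 3 x 4 matrix [1 2 3 4; 5 6 7 8; 9 10 11 12], laid out column by
   column: each line below is one column of the matrix. */
static const float x_colmajor[12] = {
    1.0f, 5.0f,  9.0f,
    2.0f, 6.0f, 10.0f,
    3.0f, 7.0f, 11.0f,
    4.0f, 8.0f, 12.0f
};
```

With this layout the matrix has 3 rows and 4 columns, and a tightly packed buffer has a leading dimension of 3 (the number of rows), so a call such as copy_matrix_colmajor(3, 4, x_colmajor, 3, y, 3) reproduces the array in y.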