FAQ Database Discussion Community


purposely causing bank conflicts for shared memory on CUDA device

cuda,gpu,shared-memory,bank-conflict
It is a mystery to me how shared memory on CUDA devices works. I was curious to count the threads that access the same shared memory, so I wrote a simple program:
#include <cuda_runtime.h>
#include <stdio.h>
#define nblc 13
#define nthr 1024
__device__ int inwarpD[nblc];
__global__ void kernel(){...

CUDA shared memory bank conflicts report higher

cuda,gpu,shared-memory,bank-conflict
I've been working on optimizing some code and ran into an issue with the shared memory bank conflict report from the CUDA Nsight performance analysis. I was able to reduce it to a very simple piece of code that Nsight reports as having a bank conflict, when it doesn't seem...

Mobile OpenCL local memory bank conflict: why is using local memory slower than using global memory in a kernel?

android,opencl,viola-jones,bank-conflict
I'm developing a face detection app on the Android platform using OpenCL. The face detection algorithm is based on the Viola-Jones algorithm. I tried to write the kernel code for the cascade classification step, and I put the classifier data of cascade stage 1 into local memory (__local) because the classifier data are used by all...