CUDA thrust::device_vector of class | error

c++,cuda,thrust
I am new to CUDA and the thrust library. I've been looking through a lot of examples and questions regarding my problem, however, I was not able to transfer a solution. I have a class Cell which should contain a vector of Tree (another class). This is my Cell.h #pragma...

Using CUDA Thrust algorithms sequentially on the host

cuda,thrust
I wish to compare a Thrust algorithm's runtime when executed sequentially on a single CPU core versus a parallel execution on a GPU. Thrust specifies the thrust::seq execution policy, but how can I explicitly target the host backend system? I wish to avoid executing the algorithm sequentially on the GPU....

thrust::exclusive_scan_by_key unexpected behavior

c++,cuda,thrust
int data[ 10 ] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }; int keys[ 10 ] = { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; thrust::exclusive_scan_by_key( keys, keys + 10, data, data ); By the examples at Thrust Site I expected...
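A host-side sketch in standard C++ (not Thrust's implementation) may help show what a segmented exclusive scan does with these inputs. It assumes the semantics documented for thrust::exclusive_scan_by_key: a segment is a run of consecutive equal keys, and the scan restarts at 0 at each segment boundary. The function name exclusive_scan_by_key_ref is hypothetical, and since the excerpt is truncated, what the asker actually expected is not known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference for a segmented exclusive scan: a run of consecutive equal keys
// forms one segment, and the running sum restarts at 0 when the key changes.
std::vector<int> exclusive_scan_by_key_ref(const std::vector<int>& keys,
                                           const std::vector<int>& data) {
    assert(keys.size() == data.size());
    std::vector<int> out(data.size());
    int running = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // A new segment starts at index 0 and wherever the key differs
        // from the previous key.
        if (i == 0 || keys[i] != keys[i - 1]) running = 0;
        out[i] = running;    // exclusive: store the sum before adding data[i]
        running += data[i];
    }
    return out;
}
```

With the alternating keys from the question, every segment has length one, so each output element is the initial value 0. Grouping equal keys together first (for example by sorting on the key) would produce longer segments.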

In CUDA / Thrust, how can I access a vector element's neighbor during a for-each operation?

c++,cuda,thrust
I am trying to do some scientific simulation using the Thrust library in CUDA, but I got stuck on the following operation, which is basically a for-each loop: device_vector<float> In(N); for-each In(x) in In Out(x) = some_calculation(In(x-1),In(x),In(x+1)); end I have already searched stackoverflow.com and found some similar questions: Similar questions...
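The loop in the question can be read as a three-point stencil driven by the element index rather than the element value. Below is a minimal host-side sketch in standard C++, not Thrust code. The body of some_calculation is a placeholder (a plain sum), and copying the two boundary elements unchanged is an assumption, because the excerpt specifies neither.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the question's some_calculation; the real formula is not
// given in the excerpt, so a plain sum is assumed here.
float some_calculation(float left, float center, float right) {
    return left + center + right;
}

// Index-based stencil: each interior out[i] depends on in[i-1], in[i], in[i+1].
// Assumption: the two boundary elements are copied through unchanged.
void apply_stencil(const std::vector<float>& in, std::vector<float>& out) {
    assert(out.size() == in.size());
    const std::size_t n = in.size();
    if (n == 0) return;
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (std::size_t i = 1; i + 1 < n; ++i) {
        out[i] = some_calculation(in[i - 1], in[i], in[i + 1]);
    }
}
```

The point of the sketch is that the loop variable is an index, which is what gives access to neighbors. On the device, the same pattern is commonly written as a transform over a thrust::counting_iterator with a functor that holds a raw pointer to the input.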

thrust remove copy unique by key

cuda,thrust,stream-compaction
I'm a bit confused on the best way to do the following: Say I have the following sorted key value pairs (K:V) (0 : .5)(0 : .7)(0 : .9) (1 : .2) (1 : .6) (1 : .8) and so on.. I want to remove copy the minimum value of...

Determining the 2 largest elements and their positions in each matrix row with CUDA Thrust

algorithm,sorting,cuda,thrust
I have a matrix and I need to compute the 2 largest numbers AND their position in each row of this matrix. My initial attempt was to try and sort each row of the matrix and then look at the last two values. While I could sort each row, I...
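As an alternative to sorting each row, the two largest values and their columns can be found in a single pass per row. The following host-side sketch in standard C++ shows the idea under stated assumptions: the matrix is stored row-major in one flat array, each row has at least two columns, and the names Top2 and top2_per_row are hypothetical. It is not taken from the question or from Thrust.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest and second-largest value of one row, with their column indices.
struct Top2 {
    float first_val;
    std::size_t first_col;
    float second_val;
    std::size_t second_col;
};

// Single pass over each row of a row-major matrix (assumed layout).
std::vector<Top2> top2_per_row(const std::vector<float>& m,
                               std::size_t rows, std::size_t cols) {
    assert(cols >= 2 && m.size() == rows * cols);
    std::vector<Top2> result(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = m.data() + r * cols;
        // Seed with the first two elements, ordered.
        Top2 t = (row[0] >= row[1]) ? Top2{row[0], 0, row[1], 1}
                                    : Top2{row[1], 1, row[0], 0};
        for (std::size_t j = 2; j < cols; ++j) {
            if (row[j] > t.first_val) {
                // New maximum: the old maximum becomes the runner-up.
                t.second_val = t.first_val;
                t.second_col = t.first_col;
                t.first_val = row[j];
                t.first_col = j;
            } else if (row[j] > t.second_val) {
                t.second_val = row[j];
                t.second_col = j;
            }
        }
        result[r] = t;
    }
    return result;
}
```

Each row is independent, so the outer loop is the natural unit of parallel work when the same logic is moved to the GPU.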

How to use Thrust to sort the rows of a matrix?

sorting,cuda,thrust
I have a 5000x500 matrix and I want to sort each row separately with cuda. I can use arrayfire but this is just a for loop over the thrust::sort, which should not be efficient. https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cuda/kernel/sort.hpp for(dim_type w = 0; w < val.dims[3]; w++) { dim_type valW = w * val.strides[3];...

Thrust : reduce_by_key is slower than expected

performance,cuda,parallel-processing,gpgpu,thrust
I have the following code : thrust::device_vector<int> unique_idxs(N); thrust::device_vector<int> sizes(N); thrust::pair<thrust::device_vector<int>::iterator, thrust::device_vector<int>::iterator> new_end = reduce_by_key(idxs.begin(), idxs.end(),thrust::make_constant_iterator(1),unique_idxs.begin(),sizes.begin()); int unique_elems=new_end.first-unique_idxs.begin(); sizes.erase(new_end.second, sizes.end()); where idxs is a sorted device vector of indices, unique_idxs are the unique indices and sizes...

How do you build the example CUDA Thrust device sort?

c++,visual-studio-2010,sorting,cuda,thrust
I am trying to build and run the Thrust example code in Visual Studio 2010 with the latest version (7.0) of CUDA and the Thrust install that comes with it. I cannot get the example code to build and run. By eliminating parts of the code, I found the problem...

from thrust to arrayfire - gfor usage?

cuda,thrust,arrayfire
I am trying to replace some Thrust calls with ArrayFire to compare performance. I am not sure whether I am using ArrayFire properly, because the results I am getting do not match at all. The Thrust code I am using is, for example: cudaMalloc( (void**) &devRow, N...

Compilation error using FindCUDA.cmake and Thrust with THRUST_DEVICE_SYSTEM_OMP

cuda,cmake,openmp,thrust
I recently discovered that Thrust can handle automatic OMP and TBB parallelisation in addition to its classic CUDA capability. Although I was able to use this extremely versatile feature on a simple example, my cmake configuration generated a compilation error. Maybe I am using FindCUDA.cmake the wrong way, or...

Accelerating __device__ function in Thrust comparison operator

cuda,parallel-processing,gpgpu,thrust
I'm running a Thrust parallelized binary search-type routine on an array: // array and array2 are raw pointers to device memory thrust::device_ptr<int> array_ptr(array); // Search for first position where 0 could be inserted in array // without violating the ordering thrust::device_vector<int>::iterator iter; iter = thrust::lower_bound(array_ptr, array_ptr+length, 0, cmp(array2)); A custom...

Is sort_by_key in thrust a blocking call?

cuda,gpgpu,thrust
I repeatedly enqueue a sequence of kernels: for 1..100: for 1..10000: // Enqueue GPU kernels Kernel 1 - update each element of array Kernel 2 - sort array Kernel 3 - operate on array end // run some CPU code output "Waiting for GPU to finish" // copy from device...

cuda thrust: selective copying and resizing results

cuda,thrust
I am copying items selectively between two thrust device arrays using copy_if as follows: thrust::device_vector<float4> collated = thrust::device_vector<float4>(original_vec.size()); thrust::copy_if(original_vec.begin(), original_vec.end(), collated.begin(), is_valid_pt()); collated.shrink_to_fit(); The is_valid_pt is implemented as: struct is_valid_kpt { __host__ __device__ bool operator()(const float4 x) { return x.w >= 0; } }; Now after running this code,...
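The excerpt is cut off, so the exact symptom is unknown. One likely reading is that collated keeps its original element count, because shrink_to_fit only affects capacity. Like std::copy_if, thrust::copy_if returns an iterator to the end of the written output, and that iterator can be used to resize the destination. The host-side sketch below uses standard C++ to show the pattern. Float4 is a hypothetical stand-in for CUDA's float4, and collate is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float4 so the sketch builds on the host.
struct Float4 { float x, y, z, w; };

// Predicate from the question: a point is valid when w is non-negative.
struct is_valid_pt {
    bool operator()(const Float4& p) const { return p.w >= 0; }
};

std::vector<Float4> collate(const std::vector<Float4>& original) {
    // Allocate for the worst case, as the question does.
    std::vector<Float4> collated(original.size());
    // copy_if returns the end of the range that was actually written.
    auto new_end = std::copy_if(original.begin(), original.end(),
                                collated.begin(), is_valid_pt());
    // Resize to the number of copied elements; shrink_to_fit alone would
    // leave the element count unchanged.
    collated.resize(static_cast<std::size_t>(new_end - collated.begin()));
    assert(collated.size() <= original.size());
    return collated;
}
```

With a thrust::device_vector the same three steps apply: allocate, keep the iterator returned by copy_if, then resize to the distance from begin().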

Do I need to free device_ptr returned by thrust?

c++,pointers,cuda,thrust
I have a function to get the minimum value of an array and it's executed within a loop. thrust::device_ptr<float> min_ptr = thrust::min_element(populationFitness, populationFitness + POPULATION); Do I have to free the returned device_ptr? I tried with thrust::device_free(min_ptr) but an exception is thrown....
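Like std::min_element, thrust::min_element returns an iterator that points into the range it was given. It does not allocate anything, so the returned pointer does not own memory. The host-side sketch below uses standard C++ to show this. The helper name index_of_min is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position of the smallest element. The iterator produced by
// std::min_element refers to an element of the input itself; no new storage
// is created, so nothing needs to be released afterwards.
std::size_t index_of_min(const std::vector<float>& population_fitness) {
    assert(!population_fitness.empty());
    auto min_it = std::min_element(population_fitness.begin(),
                                   population_fitness.end());
    return static_cast<std::size_t>(min_it - population_fitness.begin());
}
```

Only the underlying array needs to be released, once, by whatever code allocated it.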

Reduce by key on device array

cuda,parallel-processing,thrust
I am using reduce_by_key to find the number of elements in an array of type int2 that have the same first value. For example, Array: <1,2> <1,3> <1,4> <2,5> <2,7>, so the number of elements with 1 as the first element is 3 and with 2 is 2. CODE: struct compare_int2 : public...
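The count being asked for amounts to a run-length count over the first component. Here is a host-side sketch in standard C++, not the asker's code, which is truncated. It assumes the array is already sorted by the first component, since a by-key reduction only merges consecutive equal keys. Int2 is a hypothetical stand-in for CUDA's int2, and count_by_first is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for CUDA's int2 so the sketch builds on the host.
struct Int2 { int x; int y; };

// Returns (first component, count) for each run of equal first components.
std::vector<std::pair<int, int>> count_by_first(const std::vector<Int2>& a) {
    // Assumption: input is sorted by x, so equal keys are adjacent.
    assert(std::is_sorted(a.begin(), a.end(),
                          [](const Int2& l, const Int2& r) { return l.x < r.x; }));
    std::vector<std::pair<int, int>> counts;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (counts.empty() || counts.back().first != a[i].x) {
            counts.emplace_back(a[i].x, 1);   // start a new run
        } else {
            ++counts.back().second;           // extend the current run
        }
    }
    return counts;
}
```

In Thrust terms, this corresponds to reducing a stream of ones by key, with the key being the first component of each int2.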

Stream compaction with Thrust; best practices and fastest way?

c++,cuda,gpgpu,thrust,sparse-array
I am interested in porting some existing code to use thrust to see if I can speed it up on the GPU with relative ease. What I'm looking to accomplish is a stream compaction operation, where only nonzero elements will be kept. I have this mostly working, per the example...