OpenMP over Summation

parallel-processing,fortran,openmp,fortran90,gfortran
I have been trying to apply OpenMP to a simple summation operation inside two nested loops, but it has produced incorrect results so far. I have been looking around here and here, also here. All suggest using the reduction clause, but it does not work for my case by...
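For reference, the pattern those answers point to looks like the following; a minimal sketch in C++ (the question itself is Fortran, where the !$omp sentinel syntax differs, but the reduction clause behaves the same):

#include <cstdio>

int main() {
    const int n = 100, m = 100;
    double sum = 0.0;
    // Each thread accumulates into a private copy of 'sum';
    // the private copies are combined once after the loop.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            sum += i * 0.5 + j;
    printf("sum = %f\n", sum);
    return 0;
}

Compile with g++ -fopenmp; without the reduction clause, concurrent updates to sum would race and give exactly the kind of incorrect results described above.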

OpenMP Matrix-Vector Multiplication Executes on Only One Thread

c++,multithreading,parallel-processing,openmp,mex
I have this code (outlined below) for parallelizing matrix-vector multiplication. But whenever I run it, I discover that it is executing on just one thread (even though I specified 4). How can I separate parts of the parallel code to run on separate threads? Any help will be highly appreciated....

OpenMP - Parallel code give different result from sequential one

openmp
I have a problem with OpenMP. I've written some computational code and parallelized it using OpenMP, but the sequential and parallel versions give me different results. Here is the code: for(i=0; i<grid_number; i++) { double norm = 0; const double alpha = gsl_vector_get(valpha, i); for(j=0; j<n_sim; j++) { gsl_matrix_complex *sub_data =...

Segmentation fault in openMP program with SSE instructions with threads > 4

c++,multithreading,segmentation-fault,openmp,sse
I wrote a simple C++ OpenMP program that uses SSE instructions, and I am facing a segmentation fault when the number of threads is bigger than 4. I am using g++ on Linux. #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/time.h> #include <emmintrin.h> #include <assert.h> #include <stdint.h> #include <omp.h> unsigned...

Disabling OpenMP when Profiling Enabled

c,macros,profiling,openmp
When profiling my C code, I would like to disable/reduce the number of OMP threads to 1. After a brief search, I found this question. I therefore decided to do something like #ifdef foo #define omp_get_thread_num() 0 #endif where foo is a macro that is true if the -pg...
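A minimal sketch of that approach, assuming a hypothetical PROFILING macro that the makefile defines alongside -pg (the compiler does not define such a macro for you):

// build for profiling with: g++ -pg -DPROFILING -fopenmp prof.cpp
#include <cstdio>
#include <omp.h>

int main() {
#ifdef PROFILING
    omp_set_num_threads(1);   // serialize all parallel regions under gprof
#endif
    #pragma omp parallel
    printf("thread %d\n", omp_get_thread_num());
    return 0;
}

Forcing the team size to 1 this way keeps the OpenMP code paths intact while making the profile effectively sequential.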

Reason to use declare target pragma in OpenMP

openmp,offloading
I wonder what the reason is to use the declare target directive. I can simply use target {, data} map (to/from/tofrom ...) in order to specify which variables should be used by the device. As for functions, is it compulsory for a function called from a target region to...
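The function case is in fact the main motivation: code called from inside a target region needs a device-compiled version, and declare target is what requests one. A minimal sketch (whether it truly offloads depends on the compiler and the available device):

#include <cstdio>

#pragma omp declare target
// declare target asks the compiler to also emit a device version
// of square(); without it, calling the function from a target
// region is not guaranteed to work.
int square(int x) { return x * x; }
#pragma omp end declare target

int main() {
    int r = 0;
    #pragma omp target map(from:r)
    r = square(7);
    printf("%d\n", r);
    return 0;
}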

Is dynamic scheduling or static scheduling better (Parallel Programming)?

multithreading,parallel-processing,openmp,scheduling
I understand my question title is rather broad; I am new to parallel programming and OpenMP. I tried to parallelize a C++ solution for the N-body problem and study it for different schedule types and granularities. I collected data by running the program for different cases and plotted the data, this...

OpenMP specify thread number of a for loop iteration

c++,multithreading,parallel-processing,openmp
I'm using the following directive to parallelize a single loop over the available threads of the program: #pragma omp parallel for num_threads(threads) for(long i = 0; i < threads; i++) { array[i] = calculateStuff(i,...); } For technical reasons, I would like to guarantee that thread number 0 executes i=0, and...
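One common answer, under the assumption that the iteration count equals the thread count as in the snippet above: schedule(static) assigns chunks to threads in thread-number order, so each thread t runs exactly iteration i = t. A minimal sketch:

#include <cstdio>
#include <omp.h>

int main() {
    const int threads = 4;
    double array[threads] = {0};
    // With as many iterations as threads, schedule(static) hands
    // iteration i to thread i, so thread 0 always executes i = 0.
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < threads; i++) {
        array[i] = (double) i;   // stand-in for calculateStuff(i, ...)
        printf("iteration %ld ran on thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}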

Padding array manually

c,performance,openmp,xeon-phi
I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but the calculation of the WIDTHP macro is what I am unable to understand. Here is the brief code (the original code is more than 300 lines long!): #define PAD64 0 #define...

OpenMP SIMD on Power8

openmp,vectorization,simd,powerpc
I'm wondering whether there is any compiler (gcc, xlc, etc.) on Power8 that supports OpenMP SIMD constructs. I tried with XL (13.1) but I couldn't compile successfully. Probably it doesn't support the simd construct yet. I could compile with gcc 4.9.1 (with the flags -fopenmp -fopenmp-simd and -O1). I...

installing Rcpp on R compiled with intel composer on OSX Yosemite

r,clang,openmp,rcpp,intel-composer
Despite succeeding in compiling R-3.1.2 with the Intel suite of compilers ver. 2015.0.077, including MKL, on my late-2014 MacBook Pro running Yosemite (outlined here), I am unable to install the excellent Rcpp package that I have been thoroughly enjoying thus far via the prepackaged binary R for...

Labeling data for Bag Of Words

c++,opencv,openmp,pragma,labeling
I've been looking at this tutorial and the labeling part confuses me. Not the act of labeling itself, but the way the process is shown in the tutorial. More specifically the #pragma omp sections: #pragma omp parallel for schedule(dynamic,3) for(..loop a directory?..) { ... #pragma omp critical { if(classes_training_data.count(class_) ==...

Compilation error using FindCUDA.cmake and Thrust with THRUST_DEVICE_SYSTEM_OMP

cuda,cmake,openmp,thrust
I recently discovered that Thrust is able to handle automatic OMP and TBB parallelisation in addition to its classic CUDA capability. Although I was able to use this extremely versatile feature on a simple example, my cmake configuration generated a compilation error; maybe I am using FindCUDA.cmake the wrong way, or...

Understanding the collapse clause in openmp

openmp
I came across an OpenMP code that had the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped its implications. One definition that I found is: COLLAPSE: Specifies how many loops in a nested loop should be...
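A small example of what the clause changes: without collapse only the outer loop's 4 iterations are divided among threads; with collapse(2) the merged 4x100 iteration space (400 units) is what gets scheduled. A sketch:

#include <cstdio>

int main() {
    static double a[4][100];
    // collapse(2) flattens both loops into one 400-iteration space,
    // giving the scheduler far more units of work than the 4 outer
    // iterations alone.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 100; j++)
            a[i][j] = i + j;
    printf("a[3][99] = %f\n", a[3][99]);
    return 0;
}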

C++ OpenMP object counter incorrect counts with std::vector of objects

c++,multithreading,openmp
I need a thread-safe counter for the number of current objects of type Apple. I have tried to make a simple one with OpenMP, but I don't understand why the counting is incorrect. Here is a simplification of the class, with actual test code and actual output: Class class Apple...

Does nesting of OMP pragmas have significance?

openmp,pragma
I'm looking at some code like that below (in a reviewer/auditor capacity). The nesting shown below was created with tabs in the source code. #pragma omp parallel #pragma omp sections { #pragma omp section p2 = ModularExponentiation((a % p), dp, p); #pragma omp section q2 = ModularExponentiation((a % q), dq, q);...

Compile OpenMP programs with gcc compiler on OS X Yosemite

c++,c,xcode,gcc,openmp
$ gcc 12.c -fopenmp 12.c:9:9: fatal error: 'omp.h' file not found #include<omp.h> ^ 1 error generated. While compiling OpenMP programs I get the above error. I am using OS X Yosemite. I first tried the native gcc compiler by typing gcc in the terminal, and later downloaded Xcode too; still...

Simple speed up of C++ OpenMP kernel

c++,opencv,openmp
I have never worked with OpenMP or optimization of C++, so all help is welcome. I'm probably doing some very stupid things that slow down the process drastically. It doesn't need to be the fastest, but I think some easy tricks will significantly speed it up. Anyone? Thanks a lot!...

What preprocessor define does -fopenmp provide?

c,openmp,c-preprocessor
I've got some code that can run with (or without) OpenMP - it depends on how the user sets up the makefile. If they want to run with OpenMP, then they just add -fopenmp to CFLAGS and CXXFLAGS. I'm trying to determine what preprocessor macro I can use to tell...
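The macro is _OPENMP: the specification requires conforming compilers to define it (gcc does so under -fopenmp), with a value encoding the supported spec date as yyyymm. A minimal check:

#include <cstdio>

int main() {
#ifdef _OPENMP
    printf("compiled with OpenMP, spec date %d\n", _OPENMP);
#else
    printf("compiled without OpenMP\n");
#endif
    return 0;
}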

Performance issue of OpenMP code called from a pthread

c++,openmp
I try to perform some computation asynchronously from an I/O-bound operation. To do that I have used a pthread in which a loop is parallelized using OpenMP. However, this results in performance degradation compared to the case where I perform the I/O-bound operation in a pthread or...

How to run two sets of code in parallel using OpenMP in C++

c++,multithreading,parallel-processing,openmp
I have two functions which are not related to each other, for example: int add(int num) { int sum=0; for(int i=0;i<num;++i) sum+=i; return sum; } int mul(int num) { int mul=1; for(int i=1;i<num;++i) mul *= i; return mul; } and I am using them as follows: auto x=add(100); auto m=mul(200); cout<<x<<...
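The usual tool for two unrelated calls is sections, which hands each block to a different thread. A sketch using the add/mul pair from the question, with the loop bugs fixed and a smaller mul argument to avoid integer overflow:

#include <iostream>

int add(int num) {
    int sum = 0;
    for (int i = 0; i < num; ++i) sum += i;
    return sum;
}

int mul(int num) {
    int product = 1;
    for (int i = 1; i < num; ++i) product *= i;
    return product;
}

int main() {
    int x = 0, m = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        x = add(100);   // runs on one thread
        #pragma omp section
        m = mul(12);    // runs on another thread, if one is available
    }
    std::cout << x << " " << m << std::endl;
    return 0;
}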

Why do my runtime images taken with eztrace not show the idleness of threads?

c,multithreading,openmp,trace
I was doing college work parallelizing a code in C with OpenMP and then getting a runtime image with eztrace, converting it and displaying it in ViTE. But it's not showing the idle time on threads. My code obviously has idle time thanks to the use of the static clause int prime_v2(int...

OpenMP slows down when going from 2 to 4 threads doing binary searches in a custom container

c++,multithreading,openmp,sparse-matrix,slowdown
I'm currently having a problem parallelizing a program in C++ using OpenMP. I am implementing a recommendation system with a user-based collaborative filtering method. To do that, I implemented a sparse_matrix class as a dictionary of dictionaries (where I mean a sort of Python dictionary). In my case, since insertion...

How to calculate how many times each thread executed a critical section in OpenMP?

c,multithreading,openmp
I have an OpenMP code where I need to calculate how many times each thread executes the critical section. Any idea how to do it? Code samples are highly welcomed.
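Since code samples are welcomed, a minimal sketch: give every thread its own slot in a shared counter array, indexed by omp_get_thread_num(), and increment it inside the critical section:

#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    std::vector<long> counts;
    #pragma omp parallel
    {
        #pragma omp single
        counts.assign(omp_get_num_threads(), 0);  // one slot per thread
        // (single has an implied barrier, so counts is ready below)

        #pragma omp for
        for (int i = 0; i < 1000; i++) {
            #pragma omp critical
            {
                // ... the actual protected work would go here ...
                counts[omp_get_thread_num()]++;   // tally this visit
            }
        }
    }
    for (std::size_t t = 0; t < counts.size(); t++)
        printf("thread %zu entered the critical section %ld times\n",
               t, counts[t]);
    return 0;
}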

windows - visual studio 2013 : OpenMP: omp_set_num_threads() not working

c++,openmp
I want to run this program : #include <iostream> #include <omp.h> using namespace std; int main() { int numThread, myId; cout << "num_procs=" << omp_get_num_procs(); omp_set_num_threads(omp_get_num_procs()); #pragma omp parallel { cout << "\nid=" << omp_get_thread_num(); numThread = omp_get_num_threads(); cout << "\nmax-thread=" << omp_get_max_threads(); } getchar(); } The result is: num_procs=4...

c++ & OpenMP : undefined reference to GOMP_loop_dynamic_start

c++,openmp
I'm stuck on the following problem: first I compile the following file cancme.cpp: void funct() { int i,j,k,N; double s; #pragma omp parallel for default(none) schedule(dynamic,10) private(i,k,s) shared(j,N) for(i=j+1;i<N;i++) {} } by: mingw32-g++.exe -O3 -std=c++11 -mavx -fopenmp -c C:\pathtofile\cancme.cpp -o C:\pathtofile\cancme.o Next I build a second file,...

ANT doesn't terminate OpenMP executable (C++)

c++,linux,ant,openmp,icc
When I start an executable (OpenMP, C++, icc) in an ANT exec task, the task does not terminate. Looking at the processes, I discovered that my process had died (defunct). The executable writes output, and seemingly quite properly. There is no problem without using OpenMP. There is also no...

core dumped using lock in openMP

parallel-processing,locking,openmp
I want to parallelize function S and lock every node, but I keep getting a core dump. I'm trying to use a lock in every node of the graph. It works if I use a single lock on my nodes. for (l = 0; l < n; l++) omp_init_lock(&(lock[l])); #pragma...
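For reference, the per-node lock pattern with the full init/set/unset/destroy lifecycle. A frequent cause of the crash described is a lock array smaller than the largest node index touched, so this sketch (with a made-up neighbour rule standing in for the real graph) sizes the array to n explicitly:

#include <vector>
#include <omp.h>

int main() {
    const int n = 1000;
    std::vector<omp_lock_t> lock(n);     // exactly one lock per node
    for (int l = 0; l < n; l++)
        omp_init_lock(&lock[l]);

    std::vector<int> degree(n, 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int neighbour = (i + 1) % n;     // stand-in for real graph edges
        omp_set_lock(&lock[neighbour]);
        degree[neighbour]++;             // per-node update under its lock
        omp_unset_lock(&lock[neighbour]);
    }

    for (int l = 0; l < n; l++)
        omp_destroy_lock(&lock[l]);
    return 0;
}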

OMP For parallel thread ID hello world

c,multithreading,for-loop,openmp,parallel-for
I'm trying to get started with basic OpenMP functionality in C. My basic understanding of 'omp parallel for' leads me to believe the following should distribute the iterations of the loop between threads and execute them concurrently. The output I am getting is as follows. Code below. Is...

Use of if clause in OpenMP

synchronization,task,openmp
Can't figure out the use of the if (0) clause in the following code, as there also exists the #pragma omp single clause. Any ideas? ...
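For context, the if clause decides at run time whether the parallel region actually forks: with if(0) the region executes with a team of one thread. A minimal sketch of the effect:

#include <cstdio>
#include <omp.h>

int main() {
    // if(0): the region runs serially (team size 1); the idiom is often
    // paired with tasking so that a single thread creates the tasks.
    #pragma omp parallel if(0)
    printf("if(0): team of %d thread(s)\n", omp_get_num_threads());

    #pragma omp parallel
    #pragma omp single
    printf("normal: team of %d thread(s)\n", omp_get_num_threads());
    return 0;
}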

Incomprehensible performance improvement with OpenMP even when num_threads(1)

c++,openmp
The following lines of code int nrows = 4096; int ncols = 4096; size_t numel = nrows * ncols; unsigned char *buff = (unsigned char *) malloc( numel ); unsigned char *pbuff = buff; #pragma omp parallel for schedule(static), firstprivate(pbuff, nrows, ncols), num_threads(1) for (int i=0; i<nrows; i++) { for...

Intersection of sorted vectors

c++,openmp,simd
I know that the intersection of two sorted vectors or sets can be performed using std::set_intersection(). Is it possible to perform the same set intersection using OpenMP 4.0 SIMD? I need to perform set intersection between two sorted vectors many times in my code, so C++ set_intersection() turns out to be...

Parallel for loop for addition of local matrices in OpenMP

matrix,parallel-processing,openmp
I have n local copies of matrices, say 'local', in n threads. I want to update a global shared matrix 's' with its elements being the sum of the corresponding elements of all local matrices. E.g. s[0][0] = local_1[0][0] + local_2[0][0]+...+local_n[0][0]. I wrote the following loop to achieve it - #pragma omp...
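One portable pattern, sketched below: each thread fills its private copy, then folds it into the shared matrix inside a critical section (from OpenMP 4.5 onwards an array reduction can do this directly, but critical works everywhere). The 4x4 size is just for illustration:

#include <cstdio>
#include <omp.h>

#define R 4
#define C 4

int main() {
    double s[R][C] = {{0}};
    #pragma omp parallel
    {
        double local[R][C];              // private per-thread copy
        for (int i = 0; i < R; i++)
            for (int j = 0; j < C; j++)
                local[i][j] = omp_get_thread_num() + 1.0;  // stand-in work

        // one thread at a time adds its copy into the shared matrix
        #pragma omp critical
        for (int i = 0; i < R; i++)
            for (int j = 0; j < C; j++)
                s[i][j] += local[i][j];
    }
    printf("s[0][0] = %f\n", s[0][0]);
    return 0;
}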

Visual Studio 2013 OMP release mode

c++,visual-studio-2013,openmp
I'm trying to use OpenMP in Visual Studio 2013. It works very well in Debug mode and there is a huge performance boost; however, when I switch to Release mode I get worse results with OpenMP activated. Printing the thread number always gives 0 in Release mode. printf("%d\n", omp_get_thread_num()); So...

f2py with OMP: can't import module, undefined symbol GOMP_*

python,numpy,fortran,openmp,f2py
I was hoping to use OpenMP to speed up my Fortran code that I run through f2py. However, after compiling successfully, I can't import the module in Python. For a Fortran95 module like this: module test implicit none contains subroutine readygo() real(kind = 8), dimension(10000) :: q !$OMP WORKSHARE q...

Parallel for loop with reduction and manipulating arrays

c,for-loop,openmp,pragma
I'm new to OpenMP and I'm trying to optimize a for loop. The result is not as expected; the for loops are not working correctly (due to a dependency). I don't understand how to get a perfectly parallel loop with the examples below: #pragma omp parallel for default(shared) reduction(+...) for(i =...

How to disable omp in Torch nn package?

lua,openmp,torch
Specifically, I would like nn.LogSoftMax to not use OMP when the size of the input tensor is small. I have a small script to test the run time. require 'nn' my_lsm = function(t) o = torch.zeros((#t)[1]) sum = 0.0 for i = 1,(#t)[1] do o[i] = torch.exp(t[i]) sum = sum...

The time of execution doesn't change whether I increase the number of threads or not

openmp,execution-time
I am executing the following code snippet as explained in the OpenMP tutorial. But what I see is that the time of execution doesn't change with NUM_THREADS; in fact, the time of execution just keeps changing a lot. I am wondering if the way I am trying to measure the time is wrong....

Performance problems using OpenMP in nested loops

c++,multithreading,openmp
I'm using the following code, which contains an OpenMP parallel for loop nested in another for loop. Somehow the performance of this code is 4 times slower than the sequential version (omitting #pragma omp parallel for). Is it possible that OpenMP has to create threads every time the method is called?...

OpenMP: is there a timeout for a parallel section?

c++,parallel-processing,timeout,scheduled-tasks,openmp
I'm having a problem here with OpenMP. There are two functions that shall be executed in parallel. In foo() there's a loop that shall be interrupted with stop. And as you can see, it is assigned in the other OMP section. The code is: char stop; #pragma omp parallel {...

Different OpenMP output on different machines

openmp
When I try to run the following code on my CentOS system running virtually, I get the right output, but when I try to run the same code on the compact supercomputer "Param Shavak", I get incorrect output... :( #include<stdio.h> #include<omp.h> int main() { int p=1,s=1,ti #pragma omp...

Why does OpenMP 'simd' have better performance than 'parallel for simd'?

c++,performance,concurrency,openmp
I'm working on an Intel E5 (6 cores, 12 threads) with the Intel compiler and OpenMP 4.0. Why is this piece of code quicker when SIMD-ed than when parallel-SIMD-ed? for (int suppv = 0; suppv < sSize; suppv++) { Value *gptr = &grid[gind]; const Value * cptr = &C[cind]; #pragma omp simd //...

openMP reduction and thread number control

openmp
I use OpenMP as: #pragma omp parallel for reduction(+:average_stroke_width) for(int i = 0; i < GB_name.size(); ++i) {...} I know I can use: #pragma omp parallel for num_threads(thread) for(int index = 0; index < GB_name.size(); ++index){...} How can I control the number of threads when I use reduction? ...
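The two clauses are independent and combine on one directive; a minimal sketch:

#include <cstdio>

int main() {
    const int thread = 4;
    double average_stroke_width = 0.0;
    // num_threads fixes the team size; reduction still gives each of
    // those threads a private copy that is summed at the end.
    #pragma omp parallel for num_threads(thread) reduction(+:average_stroke_width)
    for (int i = 0; i < 100; i++)
        average_stroke_width += i;
    printf("%f\n", average_stroke_width / 100.0);
    return 0;
}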

Hybrid OpenMP+MPI: I need an explanation of this example

c,mpi,openmp,hybrid
I found this example on the internet, but I can't understand what exactly is sent from the master node. If it's A[5], for example, what will be sent to the other slaves? The 5th row, all elements up to the 5th row, or all elements from the 5th row onwards? #include #include...

Using atomic operation in OpenMP for struct (x,y,z) variable

c++,struct,openmp,atomic
I am developing an OpenMP code in C++ (the compiler is g++ 4.8.2). In a part of my code I need to perform an addition atomically on struct data. The struct is defined as: struct real3 { float x; float y; float z; }; and I defined an addition operator...
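omp atomic only covers a single scalar update, so the usual workaround is one atomic per member. A sketch using the real3 struct from the question:

#include <cstdio>

struct real3 { float x; float y; float z; };

int main() {
    real3 sum = {0.0f, 0.0f, 0.0f};
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        real3 v = {1.0f, 2.0f, 3.0f};   // stand-in for per-iteration data
        // the struct addition is split into three member-wise atomics,
        // since atomic cannot protect a whole struct at once
        #pragma omp atomic
        sum.x += v.x;
        #pragma omp atomic
        sum.y += v.y;
        #pragma omp atomic
        sum.z += v.z;
    }
    printf("%f %f %f\n", sum.x, sum.y, sum.z);
    return 0;
}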

OpenMP Dot Product and Pointers

c,pointers,for-loop,openmp,reduction
I'm trying to implement a dot product in OpenMP with large arrays allocated with malloc. However, when I use reduction(+:result) it produces different results for each program run. Why do I get different results? How can I remedy that? And how can this example be optimized? Here's my code: #include <stdlib.h> #include...
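Worth noting before the code: run-to-run differences with reduction(+:result) on floating-point data are usually not a race but non-associativity, since the per-thread partial sums combine in a different order each run. A minimal correct dot product for comparison:

#include <cstdio>
#include <cstdlib>

int main() {
    const long n = 1000000;
    double *a = (double *) malloc(n * sizeof(double));
    double *b = (double *) malloc(n * sizeof(double));
    for (long i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

    double result = 0.0;
    // each thread sums a private partial result; the partials are
    // combined at the end, in an unspecified order
    #pragma omp parallel for reduction(+:result)
    for (long i = 0; i < n; i++)
        result += a[i] * b[i];

    printf("dot = %f\n", result);
    free(a);
    free(b);
    return 0;
}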

OpenMP: parallelize std::map iteration

c++,openmp,stdmap
There are some posts about this issue but none of them satisfies me. I don't have OpenMP 3.0 support and I need to parallelize an iteration over a map. I want to know if this solution would work or not: auto element = myMap.begin(); #pragma omp parallel for shared(element)...

Why might the “fatal error C1001” error occur intermittently when using openmp?

c++,visual-studio-2010,boost,openmp
My code works well without OpenMP, but I get this error when I enable OpenMP in the compiler: 1>c:\users\hdd amd ali\documents\v studio 10 projects\visual studio 2010\projects\escaledesvols2 - copy\escaledesvols2\djikstra.cpp(116): fatal error C1001: An internal error occurred in the compiler. 1> (compiler file 'f:\dd\vctools\compiler\utc\src\p2\wvm\mdmiscw.c', line 1098) Note: I use many different libraries (like Boost) #include...

I need help parallelizing this code using OpenMP

c,parallel-processing,openmp
I wrote a C code that I would like to parallelize using OpenMP (I am a beginner and I have just a few days to solve this task). Let's start from main: first of all, I initialized 6 vectors (Vx,Vy,Vz,thetap,phip,theta); then there is a for loop that cycles...

17653 Segmentation fault (core dumped)

pointers,malloc,openmp,double-pointer
I am trying to implement matrix multiplication with dynamic memory allocation in OpenMP. I managed to get my program to compile fine, but when I try to execute it I get ./ line 14: 17653 Segmentation fault (core dumped) ./matrix.exe $matrix_size int main(int argc, char *argv[]){ if(argc <...

What happens if one OpenMP thread crashes?

multithreading,parallel-processing,openmp
Consider the following case of a parallel for/do-loop:

PARALLEL DO
   thread 1      thread 2
   line 1        line 1
   line k        line k
-> line l     -> line l
   line n        line n

Now, thread 1 encounters an exception or an error (segmentation fault) on line l and terminates. What will...

What does gcc without multilib mean?

osx,gcc,g++,openmp
I was trying to use the omp.h header file and I realized it was missing. I tried reinstalling gcc on my Mac using brew. This is the message I got at the end of the installation: .. GCC has been built with multilib support. Notably, OpenMP may not work: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60670 If...

ADI program with OpenMP

c,openmp
This is my first post here, so sorry if I ask an easy/silly question. I have an assignment for my parallel programming class; I need some programs to be parallelized. My problem is the following: I can't parallelize all sections of the program. If I parallelize 2 blocks of for,...

OMP parallel for reduction

c++,cluster-analysis,openmp
I'm trying to write a k-means clustering class. I want to make my function parallel. void kMeans::findNearestCluster() { short closest; int moves = 0; #pragma omp parallel for reduction(+:moves) for(int i = 0; i < n; i++) { float min_dist=FLT_MAX; for(int k=0; k < clusters; k++) { float dist_sum =...

Idea for beginner's OpenMP project [closed]

c++,parallel-processing,openmp
I have a parallel programming project that I have to do in C++ and OpenMP that's due in a week, and I was wondering if someone could give me an idea of something a beginner in both C++ and OpenMP could accomplish in this time. I've got pretty extensive experience...

Cannot compile with openmp

c++,compilation,openmp
omp.cpp #include <iostream> #include <omp.h> int main() { std::cout << "Start" << std::endl; #pragma omp parallel { std::cout << "Hello "; std::cout << "World! " << std::endl; } std::cout << "End" << std::endl; } I've tried to compile the above code with g++ omp.cpp -fopenmp but I get the error:...

OpenMP “for” in realtime audio processing

openmp
I'm trying to use OpenMP to get some performance for realtime audio processing. I took an algorithm that looks like this: preparation for (int I=0; I<1024; I++) something quite demanding finalization When not parallelized, it took about 3% of CPU according to the system meter. Now, if I parallelize the main...

Reduction(op:var) has the same effect as shared(var)

c++,openmp,shared-memory,shared,reduction
I've tried this code snippet as a reduction(op:var) proof of concept; it worked fine and gave a result = 656700: int i, n, chunk; float a[100], b[100], result; /* Some initializations */ n = 100; chunk = 10; result = 0.0; for (i=0; i < n; i++) { a[i] = i...

Boost.python and OMP

parallel-processing,openmp,boost-python
I can't figure out why the following code (chi2 distance) takes longer when compiled with OMP. Following this question I released the GIL, but still no improvement whatsoever. np::ndarray additive_chi2_kernel(const np::ndarray& _h0, const np::ndarray& _h1) { auto dtype = np::dtype::get_builtin<float>(); auto h0 = _h0.astype(dtype); auto h1 = _h1.astype(dtype);...

cython.parallel: variable assignment without thread-locality

python,multithreading,parallel-processing,openmp,cython
Using cython.parallel I am looking to assign a value to a shared-memory variable from the prange threads without the implicit thread-locality. Formulated differently: how can I define a variable as OpenMP shared rather than private with cython.parallel? How can different threads or a prange block communicate? Some very simple (and useless)...

OpenMP shared variable seems to be private

c,parallel-processing,openmp
I don't understand why in this code only thread 0 has n = 1 while the others have n = 0, with shared n: int main() { int n, tid; #pragma omp parallel shared(n) private(tid) { tid = omp_get_thread_num(); n = 0; if (tid == 0) { n++;...