

17653 Segmentation fault (core dumped)

pointers,malloc,openmp,double-pointer
I am trying to implement matrix multiplication with dynamic memory allocation using OpenMP. I manage to get my program to compile fine, but when I try to execute it I get ./ line 14: 17653 Segmentation fault (core dumped) ./matrix.exe $matrix_size int main(int argc, char *argv[]){ if(argc <...

Reason to use declare target pragma in OpenMP

openmp,offloading
I wonder what the reason is to use the declare target directive. I can simply use target {, data} map (to/from/tofrom ...) in order to specify which variables should be used by the device. As for functions, is it compulsory for a function called from a target region to...
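A minimal sketch of the distinction the question is asking about: map() clauses move *data* to the device, while declare target asks the compiler to also generate *device code* for a function so it can be called inside a target region. The function and variable names below are hypothetical; on a machine without an offload device the region falls back to host execution.

```cpp
// Hedged sketch: 'declare target' makes a device-callable version of
// square(); map() alone cannot do that, it only transfers data.
#pragma omp declare target
int square(int x) { return x * x; }
#pragma omp end declare target

int sum_of_squares(const int *v, int n) {
    int total = 0;
    // Runs on the device if one is available, otherwise on the host.
    #pragma omp target map(to: v[0:n]) map(tofrom: total)
    for (int i = 0; i < n; ++i)
        total += square(v[i]);   // legal on the device thanks to declare target
    return total;
}
```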

Why might the “fatal error C1001” error occur intermittently when using OpenMP?

c++,visual-studio-2010,boost,openmp
My code works well without OpenMP, but I get this error when I add the OpenMP compiler flag: 1>c:\users\hdd amd ali\documents\v studio 10 projects\visual studio 2010\projects\escaledesvols2 - copy\escaledesvols2\djikstra.cpp(116): fatal error C1001: An internal error occurred in the compiler. 1> (compiler file 'f:\dd\vctools\compiler\utc\src\p2\wvm\mdmiscw.c', line 1098) Note: I use many different libraries (like Boost) #include...

Cannot compile with openmp

c++,compilation,openmp
omp.cpp #include <iostream> #include <omp.h> int main() { std::cout << "Start" << std::endl; #pragma omp parallel { std::cout << "Hello "; std::cout << "World! " << std::endl; } std::cout << "End" << std::endl; } I've tried to compile the above code with g++ omp.cpp -fopenmp but I get the error:...

Simple speed up of C++ OpenMP kernel

c++,opencv,openmp
I have never worked with OpenMP or optimization of C++, so all help is welcome. I'm probably doing some very stupid things that slow down the process drastically. It doesn't need to be the fastest, but I think some easy tricks will significantly speed it up. Anyone? Thanks a lot!...

installing Rcpp on R compiled with intel composer on OSX Yosemite

r,clang,openmp,rcpp,intel-composer
Despite succeeding in compiling R-3.1.2 with the Intel suite of compilers ver. 2015.0.077, including MKL, on my late-2014 MacBook Pro running Yosemite (outlined here), I am unable to install the excellent Rcpp package that I have been thoroughly enjoying thus far via the prepackaged binary R for...

Intersection of sorted vectors

c++,openmp,simd
I know that the intersection of two sorted vectors or sets can be performed using std::set_intersection(). Is it possible to perform the same set intersection using OpenMP 4.0 SIMD? I need to perform set intersection between two sorted vectors many times in my code, so C++ set_intersection() turns out to be...
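For reference, a correct scalar baseline before any SIMD attempt. Worth noting: the classic two-pointer merge inside std::set_intersection is branch-heavy, which is why slapping #pragma omp simd on it rarely vectorizes; SIMD set intersection generally needs a restructured algorithm, which is beyond this sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Scalar baseline: intersect two sorted vectors. This is the semantics any
// SIMD variant must reproduce; use it to validate a vectorized version.
std::vector<int> intersect_sorted(const std::vector<int>& a,
                                  const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(std::min(a.size(), b.size()));
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```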

Idea for beginner's OpenMP project [closed]

c++,parallel-processing,openmp
I have a parallel programming project that I have to do in C++ and openMP that's due in a week, and I was wondering if someone could give me an idea on something a beginner in both C++ and OpenMP can accomplish in this time. I've got pretty extensive experience...

OpenMP specify thread number of a for loop iteration

c++,multithreading,parallel-processing,openmp
I'm using the following command to parallelize a single loop over the available threads of the program: #pragma omp parallel for num_threads(threads) for(long i = 0; i < threads; i++) { array[i] = calculateStuff(i,...); } For technical reasons, I would like to guarantee that thread number 0 executes i=0, and...
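A sketch of the usual answer, as I understand the OpenMP spec: with schedule(static) and exactly one iteration per thread, the static distribution hands chunk 0 (which contains i=0) to thread 0, giving the requested guarantee. The helper below is hypothetical; the stub lets the file compile without -fopenmp.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }   // serial fallback
#endif

// With schedule(static) and one iteration per thread, iteration i is
// executed by thread i; 'who' records which thread ran each iteration.
void fill_by_owner(int *who, int threads) {
    #pragma omp parallel for schedule(static) num_threads(threads)
    for (int i = 0; i < threads; ++i)
        who[i] = omp_get_thread_num();
}
```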

Boost.python and OMP

parallel-processing,openmp,boost-python
I can't figure out why the following code (chi2 distance) takes longer when compiled with OMP. Following this question I released the GIL, but still no improvement whatsoever. np::ndarray additive_chi2_kernel(const np::ndarray& _h0, const np::ndarray& _h1) { auto dtype = np::dtype::get_builtin<float>(); auto h0 = _h0.astype(dtype); auto h1 = _h1.astype(dtype); ...

openMP slows down when passing from 2 to 4 threads doing binary searches in a custom container

c++,multithreading,openmp,sparse-matrix,slowdown
I'm currently having a problem parallelizing a program in c++ using openMP. I am implementing a recommendation system with a user-based collaborative filtering method. To do that, I implemented a sparse_matrix class as a dictionary of dictionaries (where I mean a sort of python dictionary). In my case, since insertion...

OpenMP - Parallel code gives different results from the sequential one

openmp
I have a problem with OpenMP. I've written some computational code and parallelized it using OpenMP, but the sequential and parallel versions give me different results. Here is the code: for(i=0; i<grid_number; i++) { double norm = 0; const double alpha = gsl_vector_get(valpha, i); for(j=0; j<n_sim; j++) { gsl_matrix_complex *sub_data =...

Labeling data for Bag Of Words

c++,opencv,openmp,pragma,labeling
I've been looking at this tutorial and the labeling part confuses me. Not the act of labeling itself, but the way the process is shown in the tutorial. More specifically the #pragma omp sections: #pragma omp parallel for schedule(dynamic,3) for(..loop a directory?..) { ... #pragma omp critical { if(classes_training_data.count(class_) ==...

ADI program with OpenMP

c,openmp
This is my first post here, so sorry if I ask an easy/silly question. I have an assignment for my parallel programming class: I need some programs to be parallelized. My problem is the following: I can't parallelize all sections of the program. If I parallelize two blocks of for loops,...

Use of if clause in OpenMP

synchronization,task,openmp
Can't figure out the use of the if (0) clause in the following code as there also exists the #pragma omp single clause. Any ideas? ...
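A sketch of the likely intent, assuming the question refers to if(0) on a task construct: a false if clause makes the task *undeferred*, so the encountering thread executes the body immediately instead of queueing it, while still getting a task's data environment (e.g. firstprivate captures). The function below is a hypothetical illustration, not the question's actual code.

```cpp
// 'task if(0)' = run the task body right now on the encountering thread,
// but with task semantics (firstprivate copy of x at creation time).
int run_undeferred(int x) {
    int result = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task if(0) shared(result) firstprivate(x)
        { result = x * 2; }
        // Undeferred: 'result' is already set when we reach this point.
    }
    return result;
}
```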

How to calculate how many times each thread executed a critical section in OpenMP?

c,multithreading,openmp
I have an OpenMP code where I need to calculate how many times each thread executes the critical section. Any idea how to do it? Code samples are highly welcome.
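One common pattern, sketched under the assumption that a per-thread tally is all that's needed: keep one counter slot per thread, indexed by omp_get_thread_num(). Each slot is written by exactly one thread, so the counters themselves need no extra synchronization. Stubs keep the file compilable without -fopenmp.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }    // serial fallbacks
static int omp_get_max_threads() { return 1; }
#endif

// Returns counts[t] = number of times thread t entered the critical section.
std::vector<long> count_critical_entries(int iterations) {
    std::vector<long> counts(omp_get_max_threads(), 0);
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < iterations; ++i) {
            #pragma omp critical
            {
                // ... the protected work would go here ...
                counts[tid]++;   // tally for *this* thread
            }
        }
    }
    return counts;
}
```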

I need help to parallelize this code using OpenMP

c,parallel-processing,openmp
I wrote a C code that I would like to parallelize using OpenMP (I am a beginner and I have just a few days to solve this task). Let's start from the main: first of all I initialized 6 vectors (Vx,Vy,Vz,thetap,phip,theta); then there is a for loop that cycles...

Reduction(op:var) has the same effect as shared(var)

c++,openmp,shared-memory,shared,reduction
I tried this code snippet as a proof of concept for reduction(op:var); it worked fine and gave a result = 656700: int i, n, chunk; float a[100], b[100], result; /* Some initializations */ n = 100; chunk = 10; result = 0.0; for (i=0; i < n; i++) { a[i] = i...
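A runnable completion of the truncated snippet, assuming it is the classic tutorial dot product with a[i] = i and b[i] = i * 2, which reproduces the quoted 656700. The point of contrast with shared(var): reduction gives each thread a private accumulator and combines them at the end, whereas a plain shared result races.

```cpp
// Hedged reconstruction: initializations are assumed, not quoted verbatim.
float dot_reduction() {
    const int n = 100, chunk = 10;
    float a[100], b[100], result = 0.0f;
    for (int i = 0; i < n; ++i) { a[i] = i * 1.0f; b[i] = i * 2.0f; }
    // Each thread sums into a private copy of 'result'; the private copies
    // are added together when the loop finishes.
    #pragma omp parallel for schedule(static, chunk) reduction(+:result)
    for (int i = 0; i < n; ++i)
        result += a[i] * b[i];
    return result;   // sum of 2*i*i for i in [0,100) = 656700
}
```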

Performance problems using OpenMP in nested loops

c++,multithreading,openmp
I'm using the following code, which contains an OpenMP parallel for loop nested in another for loop. Somehow the performance of this code is 4 times slower than the sequential version (omitting #pragma omp parallel for). Is it possible that OpenMP has to create threads every time the method is called?...

windows - visual studio 2013 : OpenMP: omp_set_num_threads() not working

c++,openmp
I want to run this program : #include <iostream> #include <omp.h> using namespace std; int main() { int numThread, myId; cout << "num_procs=" << omp_get_num_procs(); omp_set_num_threads(omp_get_num_procs()); #pragma omp parallel { cout << "\nid=" << omp_get_thread_num(); numThread = omp_get_num_threads(); cout << "\nmax-thread=" << omp_get_max_threads(); } getchar(); } The result is: num_procs=4...

OpenMP Matrix-Vector Multiplication Executes on Only One Thread

c++,multithreading,parallel-processing,openmp,mex
I have this code (outlined below) for parallelizing matrix-vector multiplication. But whenever I run it, I discover that it is executing on just one thread (even though I specified 4). How can I separate parts of the parallel code to run on separate threads. Any help will be highly appreciated....

Different OpenMP output on different machines

openmp
When I try to run the following code on my CentOS system running virtually, I get the right output, but when I try to run the same code on the compact supercomputer "Param Shavak", I get incorrect output... :( #include<stdio.h> #include<omp.h> int main() { int p=1,s=1,ti #pragma omp...

Compile OpenMP programs with gcc compiler on OS X Yosemite

c++,c,xcode,gcc,openmp
$ gcc 12.c -fopenmp 12.c:9:9: fatal error: 'omp.h' file not found #include<omp.h> ^ 1 error generated. While compiling OpenMP programs I get the above error. I am using OS X Yosemite. I first tried the native gcc compiler by typing gcc in the terminal, and later downloaded Xcode too; still...

OMP parallel for reduction

c++,cluster-analysis,openmp
I'm trying to write a k-means clustering class. I want to make my function parallel. void kMeans::findNearestCluster() { short closest; int moves = 0; #pragma omp parallel for reduction(+:moves) for(int i = 0; i < n; i++) { float min_dist=FLT_MAX; for(int k=0; k < clusters; k++) { float dist_sum =...

Using atomic operation in OpenMP for struct (x,y,z) variable

c++,struct,openmp,atomic
I am developing an OpenMP code in C++ (the compiler is g++ 4.8.2). In part of my code I need to perform an add atomically on struct data. The struct is defined as: struct real3 { float x; float y; float z; }; and I defined the addition operator...
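A sketch of the standard workaround: #pragma omp atomic only covers a scalar update, so a struct add becomes three independent atomics, one per field. Note the caveat in the comments: this does not make the whole (x,y,z) update one transaction.

```cpp
struct real3 { float x, y, z; };

// Three independent atomic adds. Per-component sums stay correct under
// concurrent calls, but another thread may observe x already updated
// while z is not -- the struct update is NOT a single transaction.
void atomic_add(real3 &acc, const real3 &v) {
    #pragma omp atomic
    acc.x += v.x;
    #pragma omp atomic
    acc.y += v.y;
    #pragma omp atomic
    acc.z += v.z;
}
```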

incomprehensible performance improvement with openmp even when num_threads(1)

c++,openmp
The following lines of code int nrows = 4096; int ncols = 4096; size_t numel = nrows * ncols; unsigned char *buff = (unsigned char *) malloc( numel ); unsigned char *pbuff = buff; #pragma omp parallel for schedule(static), firstprivate(pbuff, nrows, ncols), num_threads(1) for (int i=0; i<nrows; i++) { for...

Visual Studio 2013 OMP release mode

c++,visual-studio-2013,openmp
I'm trying to use OpenMP in Visual Studio 2013. It's working very well in Debug mode and there is a huge performance boost, but when I switch to Release mode I get worse results with OpenMP activated. Printing the thread number always gives 0 in Release mode: printf("%d\n", omp_get_thread_num()); So...

OpenMP SIMD on Power8

openmp,vectorization,simd,powerpc
I'm wondering whether there is any compiler (gcc, xlc, etc.) on Power8 that supports OpenMP SIMD constructs on Power8? I tried with XL (13.1) but I couldn't compile successfully. Probably it doesn't support simd construct yet. I could compile with gcc 4.9.1 (with these flags -fopenmp -fopenmp-simd and -O1). I...

How to disable omp in Torch nn package?

lua,openmp,torch
Specifically I would like nn.LogSoftMax to not use omp when the size of the input tensor is small. I have a small script to test the run time. require 'nn' my_lsm = function(t) o = torch.zeros((#t)[1]) sum = 0.0 for i = 1,(#t)[1] do o[i] = torch.exp(t[i]) sum = sum...

openMP reduction and thread number control

openmp
I use OpenMP as: #pragma omp parallel for reduction(+:average_stroke_width) for(int i = 0; i < GB_name.size(); ++i) {...} I know I can use : #pragma omp parallel for num_threads(thread) for(int index = 0; index < GB_name.size(); ++index){...} How can I control the thread number when I use reduction? ...
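The short answer, sketched below: num_threads() and reduction() are independent clauses and combine on the same directive, so the thread count is controlled exactly as in the non-reduction loop. The function and data are hypothetical stand-ins for the stroke-width average in the question.

```cpp
#include <vector>

// Both clauses on one directive: 'threads' workers, each with a private
// partial sum, combined at the end of the loop.
double average_width(const std::vector<double>& widths, int threads) {
    double sum = 0.0;
    #pragma omp parallel for num_threads(threads) reduction(+:sum)
    for (long i = 0; i < (long)widths.size(); ++i)
        sum += widths[i];
    return widths.empty() ? 0.0 : sum / widths.size();
}
```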

openMp : parallelize std::map iteration

c++,openmp,stdmap
There are some posts about this issue but none of them satisfies me. I don't have OpenMP 3.0 support and I need to parallelize an iteration over a map. I want to know whether this solution would work or not: auto element = myMap.begin(); #pragma omp parallel for shared(element)...
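The shared-iterator idea above races (every thread would advance the same iterator). A pre-OpenMP-3.0-friendly alternative, sketched here with hypothetical names: snapshot the element addresses into a random-access vector once, then parallelize an ordinary signed-index loop over that vector.

```cpp
#include <map>
#include <string>
#include <vector>

// Sequential pass collects pointers to the mapped values; the parallel
// loop then has random access and a plain integer induction variable,
// which even OpenMP 2.x 'for' can handle.
long sum_values(std::map<std::string, int>& m) {
    std::vector<int*> slots;
    slots.reserve(m.size());
    for (std::map<std::string, int>::iterator it = m.begin(); it != m.end(); ++it)
        slots.push_back(&it->second);

    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)slots.size(); ++i)
        total += *slots[i];
    return total;
}
```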

Why do my runtime images taken with eztrace not show the idleness of threads?

c,multithreading,openmp,trace
I was doing college work parallelizing C code with OpenMP and then capturing a runtime trace with eztrace, converting it and displaying it in ViTE. But it's not showing the idle time on the threads. My code obviously has idle time thanks to the use of the static clause: int prime_v2(int...

OpenMP “for” in realtime audio processing

openmp
I'm trying to use OpenMP to get some performance for realtime audio processing. I took an algorithm looking like this: preparation for (int I=0; I<1024; I++) something quite demanding finalization When not parallelized, it took about 3% of CPU according to the system meter. Now, if I parallelized the main...

What does gcc without multilib mean?

osx,gcc,g++,openmp
I was trying to use the omp.h header file and I realized it was missing. I tried reinstalling gcc on my Mac using brew. This is the message I got at the end of the installation: .. GCC has been built with multilib support. Notably, OpenMP may not work: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60670 If...

Does OMP Pragmas nesting have significance?

openmp,pragma
I'm looking at some code like below (in a reviewer/auditor capacity). The nesting shown below was created with TABS in the source code. #pragma omp parallel #pragma omp sections { #pragma omp section p2 = ModularExponentiation((a % p), dp, p); #pragma omp section q2 = ModularExponentiation((a % q), dq, q);...

What happens if one OpenMP thread crashes?

multithreading,parallel-processing,openmp
Consider the following case of a parallel for/do-loop: PARALLEL DO thread 1 thread 2 line 1 line 1 line k -> line k -> line l line l line n line n Now, thread 1 encounters an exception or an error (segmentation fault) on line l and terminates. What will...

Is dynamic scheduling better or static scheduling (Parallel Programming)?

multithreading,parallel-processing,openmp,scheduling
I understand my question title is rather broad; I am new to parallel programming and OpenMP. I tried to parallelize a C++ solution for the N-body problem and study it for different schedule types and granularities. I collected data by running the program for different cases and plotted the data; this...

OpenMP shared variable seems to be private

c,parallel-processing,openmp
I don't understand why in this code only the thread 0 has n = 1 while the other ones have n = 0 with shared n: int main() { int n, tid; #pragma omp parallel shared(n) private(tid) { tid = omp_get_thread_num(); n = 0; if (tid == 0) { n++;...

The time of execution doesn't change whether I increase the number of threads or not

openmp,execution-time
I am executing the following code snippet as explained in the OpenMP tutorial. But what I see is that the time of execution doesn't change with NUM_THREADS; in fact, the time of execution just keeps changing a lot. I am wondering if the way I am trying to measure the time is wrong....

Segmentation fault in openMP program with SSE instructions with threads > 4

c++,multithreading,segmentation-fault,openmp,sse
I wrote a simple C++ openMP program that uses SSE instructions, and I am facing a segmentation fault when the number of threads is bigger than 4. I am using g++ on Linux. #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/time.h> #include <emmintrin.h> #include <assert.h> #include <stdint.h> #include <omp.h> unsigned...

Parallel for loop for addition of local matrices in OpenMP

matrix,parallel-processing,openmp
I have n local copies of matrices, say 'local', in n threads. I want to update a global shared matrix 's' whose elements are the sums of the corresponding elements of all the local matrices. E.g. s[0][0] = local_1[0][0] + local_2[0][0] + ... + local_n[0][0]. I wrote the following loop to achieve it: #pragma omp...
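One common shape for this merge, sketched with hypothetical names and a flattened matrix: each thread fills its private 'local' copy, then adds it into the shared result under a critical section. Doing one merge per thread (rather than per element update) keeps the serialized part cheap.

```cpp
#include <vector>

// Each thread contributes 'per_thread_value' to every cell via its private
// local matrix; the critical section serializes only the final merge.
std::vector<long> sum_local_copies(int rows, int cols, int per_thread_value) {
    std::vector<long> s(rows * cols, 0);
    #pragma omp parallel
    {
        // Thread-private working copy (stands in for the real local data).
        std::vector<long> local(rows * cols, per_thread_value);
        #pragma omp critical
        for (int i = 0; i < rows * cols; ++i)
            s[i] += local[i];
    }
    return s;
}
```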

How to run two set of code in parallel using openmp in c++

c++,multithreading,parallel-processing,openmp
I have two functions which are not related to each other, for example: int add(int num) { int sum=0; for(i=0;i<num;++i) sum+=i; return sum; } int mul(int num) { int mul=1; for(int i=1;i<num;++i) mul *= i; return mul; } and I am using them as follows: auto x=add(100); auto m=mul(200); cout<<x<<...
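The usual answer is sections: each section body is handed to a different thread of the team, which is the simplest way to run two unrelated calls concurrently. A sketch (note the small argument to mul(), since a factorial-style product overflows int quickly):

```cpp
int add(int num) {                       // sum of 0..num-1
    int sum = 0;
    for (int i = 0; i < num; ++i) sum += i;
    return sum;
}
int mul(int num) {                       // product of 1..num-1
    int m = 1;
    for (int i = 1; i < num; ++i) m *= i;
    return m;
}

// Each 'section' may run on a different thread; the construct ends with
// an implicit barrier, so both results are ready afterwards.
void run_both(int &x, int &m) {
    #pragma omp parallel sections
    {
        #pragma omp section
        x = add(100);
        #pragma omp section
        m = mul(10);   // 9! = 362880 still fits in int
    }
}
```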

OpenMP over Summation

parallel-processing,fortran,openmp,fortran90,gfortran
I have been trying to apply OpenMP to a simple summation operation inside two nested loops, but it has produced incorrect results so far. I have been looking around here and here, also here. All suggest using the reduction clause, but it does not work for my case by...

Understanding the collapse clause in openmp

openmp
I came across an OpenMP code that had the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped its implications. One definition that I found is: COLLAPSE: Specifies how many loops in a nested loop should be...
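A small illustration of what collapse does: collapse(2) fuses the i and j loops into one iteration space of n*m iterations, so the threads are balanced over n*m units of work instead of only n. The loops must be perfectly nested and the inner bounds must not depend on the outer index. The function here is a hypothetical example.

```cpp
#include <vector>

// Without collapse, only the n outer iterations are distributed; with
// collapse(2), all n*m (i,j) pairs form one schedulable iteration space.
std::vector<int> tag_cells(int n, int m) {
    std::vector<int> cell(n * m, 0);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            cell[i * m + j] = i * m + j;
    return cell;
}
```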

Compilation error using FindCUDA.cmake and Thrust with THRUST_DEVICE_SYSTEM_OMP

cuda,cmake,openmp,thrust
I recently discovered that Thrust is able to handle automatic OMP and TBB parallelisation in addition to its classic CUDA capability. Although I was able to use this extremely versatile feature on a simple example, my cmake configuration generated compilation errors; maybe I am using FindCUDA.cmake the wrong way, or...

cython.parallel: variable assignment without thread-locality

python,multithreading,parallel-processing,openmp,cython
Using cython.parallel I am looking to assign a shared-memory variable value from the prange-threads without the implicit thread-locality. Or formulated more differently: how can I define a variable as openmp shared rather than private with cython.parallel? how can different threads or a prange block communicate? Some very simple (and useless)...

OpenMP: is there a timeout for a parallel section?

c++,parallel-processing,timeout,scheduled-tasks,openmp
I'm having a problem here with OpenMP. There are two functions that shall be executed in parallel. In foo() there's a loop that shall be interrupted with stop. And as you can see, it is assigned in the other OMP section. The code is: char stop; #pragma omp parallel {...

OMP For parallel thread ID hello world

c,multithreading,for-loop,openmp,parallel-for
I'm trying to get started with basic OpenMP functionality in C. My basic understanding of 'omp parallel for' leads me to believe it should distribute the iterations of the loop between threads and they should execute concurrently. The output I am getting is as follows. Code below. Is...

C++ OpenMP object counter incorrect counts with std::vector of objects

c++,multithreading,openmp
I need a threadsafe counter for the number of current objects of type Apple. I have tried to make a simple one with OpenMP, but I don't understand why the counting is incorrect. Here is a simplification of the class, with actual test code and actual output: Class class Apple...

c++ & OpenMP : undefined reference to GOMP_loop_dynamic_start

c++,openmp
I'm stuck in the following problem : at first I compile the following file cancme.cpp : void funct() { int i,j,k,N; double s; #pragma omp parallel for default(none) schedule(dynamic,10) private(i,k,s) shared(j,N) for(i=j+1;i<N;i++) {} } by: mingw32-g++.exe -O3 -std=c++11 -mavx -fopenmp -c C:\pathtofile\cancme.cpp -o C:\pathtofile\cancme.o Next I build a second file,...

Performance issue of OpenMP code called from a pthread

c++,openmp
I try to perform some computation asynchronously from an I/O bound operation. To do that I have used a pthread in which a loop is parallelized using OpenMP. However, this results in a performance degradation compared to the case where I perform the I/O bound operation in a pthread or...

ANT doesn't terminate OpenMP executable (C++)

c++,linux,ant,openmp,icc
When I start an executable (OpenMP, C++, icc) in an ANT exec task, the task does not terminate. Looking at the processes, I discovered that my process had died (defunct). The executable writes output and seems to work quite properly. There is no problem without using OpenMP. There is also no...

Disabling OpenMP when Profiling Enabled

c,macros,profiling,openmp
When profiling my C code, I would like to disable/reduce the number of OMP threads to 1. After a brief search, I found this question. I therefore decided to do something like #ifdef foo #define omp_get_thread_num() 0 #endif where foo is a macro that is true if the -pg...

Padding array manually

c,performance,openmp,xeon-phi
I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but the calculation of the WIDTHP macro is what I am unable to understand. Here is the brief code (the original code is more than 300 lines long!!): #define PAD64 0 #define...

Why OpenMP 'simd' has better performance than 'parallel for simd'?

c++,performance,concurrency,openmp
I'm working on an Intel E5 (6 cores, 12 threads) with the Intel compiler and OpenMP 4.0. Why is this piece of code quicker when SIMD-ed than when parallel-SIMD-ed? for (int suppv = 0; suppv < sSize; suppv++) { Value *gptr = &grid[gind]; const Value * cptr = &C[cind]; #pragma omp simd //...

core dumped using lock in openMP

parallel-processing,locking,openmp
I want to parallelize function S and lock every node, but I keep getting a core dump. I'm trying to use a lock on every node of the graph. It works if I use a single lock on my nodes. for (l = 0; l < n; l++) omp_init_lock(&(lock[l])); #pragma...

OpenMP Dot Product and Pointers

c,pointers,for-loop,openmp,reduction
I'm trying to implement a dot product in OpenMP with large arrays allocated with malloc. However, when I use reduction(+:result) it produces different results for each program run. Why do I get different results? How can I remedy that? And how can this example be optimized? Here's my code: #include <stdlib.h> #include...
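A sketch of the usual explanation: run-to-run differences with a floating-point reduction are expected, because each thread sums a private partial and the order in which partials are combined varies between runs, so rounding differs slightly. The remedy is tolerance-based comparison (or wider accumulation), not removing the reduction. With integer-valued inputs, as in the test below, every ordering is exact and the result is reproducible.

```cpp
// Parallel dot product. The reduction is correct; only the last-bit
// rounding of the result can vary between runs for general float data.
double dot(const double *a, const double *b, long n) {
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (long i = 0; i < n; ++i)
        result += a[i] * b[i];
    return result;
}
```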

What preprocessor define does -fopenmp provide?

c,openmp,c-preprocessor
I've got some code that can run with (or without) OpenMP - it depends on how the user sets up the makefile. If they want to run with OpenMP, then they just add -fopenmp to CFLAGS and CXXFLAGS. I'm trying to determine what preprocessor macro I can use to tell...
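The macro in question is _OPENMP, which every conforming compiler defines when OpenMP is enabled (its value encodes the supported spec as yyyymm). A common pattern is to guard the omp.h include and stub the runtime calls so one source compiles both ways; the helper names below are illustrative.

```cpp
#include <string>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks when built without -fopenmp.
static int omp_get_max_threads() { return 1; }
#endif

// Reports which way this translation unit was built.
std::string build_mode() {
#ifdef _OPENMP
    return "openmp";
#else
    return "serial";
#endif
}

int worker_count() { return omp_get_max_threads(); }
```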

Parallel for loop with reduction and manipulating arrays

c,for-loop,openmp,pragma
I'm new to OpenMP and I'm trying to optimize a for loop. The result is not as expected; the for loops are not working correctly (due to a dependency). I don't understand how to get a perfect parallel loop with the examples below: #pragma omp parallel for default(shared) reduction(+...) for(i =...

Hybrid OpenMP+MPI : I need an explanation from this example

c,mpi,openmp,hybrid
I found this example on the internet, but I can't understand what exactly is sent from the master node. If it's A[5], for example, what will be sent to the other slaves? The 5th row, or all elements up to the 5th row, or all elements from the 5th row, and so on??? #include #include...

f2py with OMP: can't import module, undefined symbol GOMP_*

python,numpy,fortran,openmp,f2py
I was hoping to use OpenMP to speed up my Fortran code that I run through f2py. However, after compiling successfully, I can't import the module in Python. For a Fortran 95 module like this: module test implicit none contains subroutine readygo() real(kind = 8), dimension(10000) :: q !$OMP WORKSHARE q...