Folks, I'm calling `LAPACKE_dptsv`

If all arguments are 1d arrays, does it matter if I call the data `LAPACK_ROW_MAJOR`

or `LAPACK_COL_MAJOR`

?

Answer:

The function `LAPACKE_dptsv()`

corresponds to the lapack function `dptsv()`

, which does not feature the switch between `LAPACK_ROW_MAJOR`

and `LAPACK_COL_MAJOR`

. `dptsv()`

is implemented for column-major ordering, corresponding to matrices in Fortran, while most of C matrices are row-major. So `LAPACKE_dptsv(LAPACK_ROW_MAJOR,...)`

performs the following steps :

- transpose the right-end side
`b`

- call
`dptsv()`

of Lapack - transpose the response (that is
`b`

again)

You can check this in the source of Lapacke, in `/lapacke/src/lapacke_dptsv_work.c`

.

There is one question left : ***does it largely affect the wall-clock time ? *** Looking at dpttrs_8f_source, **it is possible** : the decomposition `L*D*L**T`

is performed by a single for loop (+loop unrolling). A piece of code is therefore necessary to answer the question. The following code is compiled by `gcc main.c -o main -llapacke -llapack -lblas`

```
#include <stdio.h>
#include "lapacke.h"
#include <malloc.h>
#include <time.h>
int main ()
{
//double a[3][2] = {{1,0},{1,1},{1,2}};
double **outputArray;
int designs=3;
int i,j;
lapack_int info,n,ldb,nrhs;
n = 420000;
nrhs = 42;
//double outputArray[3][1] = {{6},{0},{0}};
double* ad=malloc(n*sizeof(double));
if(ad==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n;i++){
ad[i]=3;
}
double* ae=malloc((n-1)*sizeof(double));
if(ae==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n-1;i++){
ae[i]=-1;
}
double* b=malloc(n*nrhs* sizeof(double));
if(b==NULL){printf("malloc failed\n");exit(1);}
for(j=0;j<nrhs;j++){
for(i=0;i<n;i++){
b[i*nrhs+j]=i+2*j;
}
}
ldb=nrhs;
clock_t t;
t = clock();
info = LAPACKE_dptsv(LAPACK_ROW_MAJOR,n,nrhs,ad,ae,b,ldb);
if(info!=0){printf("failed, info %d\n",info);}
t = clock() - t;
printf ("LAPACK_ROW_MAJOR : %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);
for(i=0;i<n;i++){
ad[i]=3;
}
if(ae==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n-1;i++){
ae[i]=-1;
}
double* b2=malloc(n*nrhs* sizeof(double));
if(b2==NULL){printf("malloc failed\n");exit(1);}
for(j=0;j<nrhs;j++){
for(i=0;i<n;i++){
b2[j*n+i]=i+2*j;
}
}
t = clock();
ldb=n;
info = LAPACKE_dptsv(LAPACK_COL_MAJOR,n,nrhs,ad,ae,b2,ldb);
if(info!=0){printf("failed, info %d\n",info);}
t = clock() - t;
printf ("LAPACK_COL_MAJOR : %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);
double delta=0,temp,deltal=0;
for(i=0;i<n;i++){
deltal=0;
for(j=0;j<nrhs;j++){
temp=(b[i*nrhs+j]-b2[j*n+i]);
deltal+=temp*temp;
}
delta+=deltal;
}
printf("delta %g\n",delta);
free(ad);
free(ae);
free(b);
free(b2);
return (info);
}
```

My output is :

LAPACK_ROW_MAJOR : 770000 clicks (0.770000 seconds).

LAPACK_COL_MAJOR : 310000 clicks (0.310000 seconds).

**So LAPACKE_dptsv() runs almost twice faster with LAPACK_COL_MAJOR with `nbrhs=42**

**If the number of rhs is reduced to one (and n larger)** :

LAPACK_ROW_MAJOR : 250000 clicks (0.250000 seconds).

LAPACK_COL_MAJOR : 180000 clicks (0.180000 seconds).

`LAPACK_COL_MAJOR`

and `LAPACK_ROW_MAJOR`

lead to approximately the same wall-clock time with a single RHS. And the output is the same.

I have not installed intel mkl on my pc and i am curious about the way it changes the conclusion of this test...

linux,cloud,netlogo,hpc

I submit jobs using headless NetLogo to a HPC server by the following code: #!/bin/bash #$ -N r20p #$ -q all.q #$ -pe mpi 24 /home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \ --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \ --experiment test \ --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv Below is the snapshot of a cluster queue using: qstat -g c I wish to...

c++,vector,parallel-processing,mpi,hpc

I am trying to send std:vector using MPI send and recv functions but I have reached no where. I get errors like Fatal error in MPI_Recv: Invalid buffer pointer, error stack: MPI_Recv(186): MPI_Recv(buf=(nil), count=2, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x7fff9e5e0c80) failed MPI_Recv(124): Null buffer pointer I tried multiple combinations A) like...

c,lapack,eigenvalue,eigenvector

I am trying for the first time to use LAPACK from C to diagonalize a matrix and I am stuck. I have been trying to modify this example http://rcabreral.blogspot.co.uk/2010/05/eigenvalues-clapack.html from zgeev to dgeev. I have looked at the DGEEV input parameters, http://www.netlib.org/lapack/explore-html/d9/d28/dgeev_8f.html but it seems I don't understand the well...

bash,batch-processing,hpc,torque

I have several (idle) jobs scheduled on a cluster that I want to move to another queue. I can move a single job like this (where 1234 is the job id): qmove newQueue 1234 But now I have hundreds of jobs that I want to move to newQueue. Is it...

python,python-2.7,hpc,sungridengine,drmaa

When I try to submit a job using the python drmaa wrapper, I get a DeniedByDrmException: code 17: job rejected: positive submission priority requires operator privileges. How do I change the priority of jobs that I submit using the Python DRMAA wrapper?...

c++,lapack,equation-solving

At the beginning I would like to apologise for my English. Now, let's go to my problem. I try to write a simple code which will find a solution of a system of linear equations: Ax = b where A is a square matrix nxn. In this program I use...

netlogo,hpc

$ wget https://ccl.northwestern.edu/netlogo/5.1.0/netlogo-5.1.0.tar.gz $ tar -xzf netlogo-5.1.0.tar.gz $ ~/netlogo-5.1.0/netlogo-headless.sh \ --model ~/myproject/MyModel.nlogo \ --experiment MyExperiment \ --table ~/myproject/MyNewOutputData.csv Using the above commands to run a netlogo headless on HPC machine. The problem is how to I specify the number of cores to be used or does by default take...

mpi,openmpi,hpc

I'm having issues with OpenMPI where different MPI ranks are being repeatably bound to the same CPU cores. I'm using a server with 32 hardware cores (no hyper-threading), Ubuntu 14.04.2 LTS and OpenMPI 1.8.4, compiled with Intel compiler 15.0.1. For instance, I can run my executable with 8 MPI ranks,...

lapack

I installed lapack 3.4.2 and 3.5.0 by compiling them with cmake. In my installation dir of lapack 3.5.0 I find a file ./pkgconfig/lapack.pc saying Version: 3.4.2. So I'm not sure I really installed lapack 3.5.0 there. Is there a way to obtain the version of lapack directly form the lib...

lapack,hpc,scientific-computing,intel-mkl

Folks, I'm calling LAPACKE_dptsv If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?...

matlab,mex,lapack

Summary I am trying to write some MEX files that use LAPACK, and I thought I'd start by looking through the examples provided by MathWorks. However, I am having some trouble compiling them. After copying the matrixDivide.c example from [MATLAB root]/extern/examples/refbook into a working directory, I tried to build a...

c++,ifstream,lapack

I have a very simple piece of C++ code justread.cc reading numbers from a file. After the command [email protected]:~/Desktop/tests$ g++ -Wall -pedantic -o justread.x justread.cc && ./justread.x it compiles without any errors or warnings. This is the code: #include <fstream> #include <vector> int read_covariance () { std::vector<double> data; double tmp;...

c++,matrix,vector,lapack,blas

Using Lapack with C++ is giving me a small headache. I found the functions defined for fortran a bit eccentric, so I tried to make a few functions on C++ to make it easier for me to read what's going on. Anyway, I'm not getting th matrix-vector product working as...

fortran,lapack

Hey I am trying to get my LAPACK libraries to work and I have searched and searched but I can't seem to figure out what I am doing wrong. I try running my code, and I get the following error Program received signal SIGSEGV: Segmentation fault - invalid memory reference....

performance,parallel-processing,julia-lang,hpc

I have a large vector of vectors of strings: There are around 50,000 vectors of strings, each of which contains 2-15 strings of length 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes...

c,gcc,lapack

I have two programmes using the LAPACK routine dgeev in C. One appears to be working, the other does not compile claiming undefined reference to dgeev. I seek to understand why. The first code below - called mamapack.c - produces sensible results when compiled and run like this: [email protected]:~/Desktop/tests$ gcc...

parallel-processing,fortran,mpi,hpc

I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the WTIME() function but I am getting some strange results. Part of the code is like this: program heat_transfer_1D_parallel implicit none include 'mpif.h' integer myid,np,rc,ierror,status(MPI_STATUS_SIZE) integer :: N,N_loc,i,k,e...

c,linear-algebra,lapack,lapacke

I'm trying to solve some linear equation (which is symmetrical, tridiagonal and positive). I have to use LAPACKE. My code is as follows: #include <lapacke.h> #include <stdio.h> void print_mtrx(double * mtrx, int n, int m) { int i, j; for(i = 0; i < n; i++) { for(j = 0;...

cmake,lapack,blas

I use blas/lapack in my C++ code built with CMake 2.8.9. I want to find BLAS and LAPACK libraries with the CMake commands : find_package(BLAS REQUIRED) find_package(LAPACK REQUIRED) But it can found it because the libraries are in a specific directory. The error is the following : CMake Error at...

c++,parallel-processing,mpi,cluster-computing,hpc

I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish. Version 1 with MPI_Request bookkeeping -- this works: MPI_Request requests[]; MPI_Iallreduce(..., requests[0]); ... MPI_Iallreduce(...,...

multithreading,fortran,lapack,blas

I am a novice using the LAPACK routines, so I don't deeply know them, and I want to use them in parallelized loops (openmp). I use Ubuntu 14.04LTS and have LAPACK installed using my package manager. The version installed is: liblapack3 3.5.0-2ubuntu1 Library of linear algebra routines 3 - shared...

c++,lapack,blas,absolute-value,argmax

Why is there a blas subroutine ISAMAX for argmax abs but not for argmax ? In C++ using std::max_element with compiler optimisation flag -O3 I am getting speeds comparable to blas_isamax (16 ms vs 9 ms), so at the moment my question is more out of interest than out of...

c,lapack,lapacke

I am trying to solve a linear system using the following code: #include <stdio.h> #include <lapacke.h> int main () { lapack_complex_double mat[4]; lapack_complex_double vec[2]; lapack_int p[2]; mat[0] = lapack_make_complex_double(1,0); mat[1] = lapack_make_complex_double(1,0); mat[2] = lapack_make_complex_double(1,0); mat[3] = lapack_make_complex_double(-1,0); vec[0] = lapack_make_complex_double(1,0); vec[1] = lapack_make_complex_double(1,0); LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);...

c++,boost,lapack

I've done my best recently to set up boost's numeric bindings to allow me to use LAPACK from C++, but I've run into some roadblocks. First off, I have confirmed that boost is working fine, so it's something to do with my LAPACK libraries or the boost numeric bindings. Here's...

opencl,mpi,cluster-computing,hpc

In the lab we have an heterogeneous cluster setup, with many Intel CPUs, a few AMD CPUs and a couple of Nvidia GPUs. For HPC development, the one thing I know that I could write once and run everywhere on this setup is OpenCL (not even Java ;) ). But...

javascript,matlab,linear-algebra,lapack

According to MATLAB's documentation: [V,D] = eig(A,B) returns diagonal matrix D of generalized eigenvalues and full matrix V whose columns are the corresponding right eigenvectors, so that A*V = B*V*D. As I read available source code (It seems all implementations I've looked at Octave, R, Scipy) lead to LAPACK's DGGEV...

fortran,lapack

I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A': (input/output) COMPLEX array, dimension (LDA,N) On entry, the...

hpc,torque

I want to change the attributes (walltime and queue) of several (idle) jobs scheduled on a cluster. When I do (where 1234 is the job id): qalter -l walltime=24:00:00 -q newQueue 1234 I get the following error: qalter: illegally formed job identifier: newQueue What can I do?...