Folks, I'm calling `LAPACKE_dptsv`

If all arguments are 1d arrays, does it matter if I call the data `LAPACK_ROW_MAJOR`

or `LAPACK_COL_MAJOR`

?

Answer:

The function `LAPACKE_dptsv()`

corresponds to the lapack function `dptsv()`

, which does not feature the switch between `LAPACK_ROW_MAJOR`

and `LAPACK_COL_MAJOR`

. `dptsv()`

is implemented for column-major ordering, corresponding to matrices in Fortran, while most of C matrices are row-major. So `LAPACKE_dptsv(LAPACK_ROW_MAJOR,...)`

performs the following steps :

- transpose the right-end side
`b`

- call
`dptsv()`

of Lapack - transpose the response (that is
`b`

again)

You can check this in the source of Lapacke, in `/lapacke/src/lapacke_dptsv_work.c`

.

There is one question left : ***does it largely affect the wall-clock time ? *** Looking at dpttrs_8f_source, **it is possible** : the decomposition `L*D*L**T`

is performed by a single for loop (+loop unrolling). A piece of code is therefore necessary to answer the question. The following code is compiled by `gcc main.c -o main -llapacke -llapack -lblas`

```
#include <stdio.h>
#include "lapacke.h"
#include <malloc.h>
#include <time.h>
int main ()
{
//double a[3][2] = {{1,0},{1,1},{1,2}};
double **outputArray;
int designs=3;
int i,j;
lapack_int info,n,ldb,nrhs;
n = 420000;
nrhs = 42;
//double outputArray[3][1] = {{6},{0},{0}};
double* ad=malloc(n*sizeof(double));
if(ad==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n;i++){
ad[i]=3;
}
double* ae=malloc((n-1)*sizeof(double));
if(ae==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n-1;i++){
ae[i]=-1;
}
double* b=malloc(n*nrhs* sizeof(double));
if(b==NULL){printf("malloc failed\n");exit(1);}
for(j=0;j<nrhs;j++){
for(i=0;i<n;i++){
b[i*nrhs+j]=i+2*j;
}
}
ldb=nrhs;
clock_t t;
t = clock();
info = LAPACKE_dptsv(LAPACK_ROW_MAJOR,n,nrhs,ad,ae,b,ldb);
if(info!=0){printf("failed, info %d\n",info);}
t = clock() - t;
printf ("LAPACK_ROW_MAJOR : %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);
for(i=0;i<n;i++){
ad[i]=3;
}
if(ae==NULL){printf("malloc failed\n");exit(1);}
for(i=0;i<n-1;i++){
ae[i]=-1;
}
double* b2=malloc(n*nrhs* sizeof(double));
if(b2==NULL){printf("malloc failed\n");exit(1);}
for(j=0;j<nrhs;j++){
for(i=0;i<n;i++){
b2[j*n+i]=i+2*j;
}
}
t = clock();
ldb=n;
info = LAPACKE_dptsv(LAPACK_COL_MAJOR,n,nrhs,ad,ae,b2,ldb);
if(info!=0){printf("failed, info %d\n",info);}
t = clock() - t;
printf ("LAPACK_COL_MAJOR : %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);
double delta=0,temp,deltal=0;
for(i=0;i<n;i++){
deltal=0;
for(j=0;j<nrhs;j++){
temp=(b[i*nrhs+j]-b2[j*n+i]);
deltal+=temp*temp;
}
delta+=deltal;
}
printf("delta %g\n",delta);
free(ad);
free(ae);
free(b);
free(b2);
return (info);
}
```

My output is :

LAPACK_ROW_MAJOR : 770000 clicks (0.770000 seconds).

LAPACK_COL_MAJOR : 310000 clicks (0.310000 seconds).

**So LAPACKE_dptsv() runs almost twice faster with LAPACK_COL_MAJOR with `nbrhs=42**

**If the number of rhs is reduced to one (and n larger)** :

LAPACK_ROW_MAJOR : 250000 clicks (0.250000 seconds).

LAPACK_COL_MAJOR : 180000 clicks (0.180000 seconds).

`LAPACK_COL_MAJOR`

and `LAPACK_ROW_MAJOR`

lead to approximately the same wall-clock time with a single RHS. And the output is the same.

I have not installed intel mkl on my pc and i am curious about the way it changes the conclusion of this test...

c++,lapack,equation-solving

At the beginning I would like to apologise for my English. Now, let's go to my problem. I try to write a simple code which will find a solution of a system of linear equations: Ax = b where A is a square matrix nxn. In this program I use...

matlab,mex,lapack

Summary I am trying to write some MEX files that use LAPACK, and I thought I'd start by looking through the examples provided by MathWorks. However, I am having some trouble compiling them. After copying the matrixDivide.c example from [MATLAB root]/extern/examples/refbook into a working directory, I tried to build a...

cmake,lapack,blas

I use blas/lapack in my C++ code built with CMake 2.8.9. I want to find BLAS and LAPACK libraries with the CMake commands : find_package(BLAS REQUIRED) find_package(LAPACK REQUIRED) But it can found it because the libraries are in a specific directory. The error is the following : CMake Error at...

c++,ifstream,lapack

I have a very simple piece of C++ code justread.cc reading numbers from a file. After the command [email protected]:~/Desktop/tests$ g++ -Wall -pedantic -o justread.x justread.cc && ./justread.x it compiles without any errors or warnings. This is the code: #include <fstream> #include <vector> int read_covariance () { std::vector<double> data; double tmp;...

linux,cloud,netlogo,hpc

I submit jobs using headless NetLogo to a HPC server by the following code: #!/bin/bash #$ -N r20p #$ -q all.q #$ -pe mpi 24 /home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \ --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \ --experiment test \ --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv Below is the snapshot of a cluster queue using: qstat -g c I wish to...

c,linear-algebra,lapack,lapacke

I'm trying to solve some linear equation (which is symmetrical, tridiagonal and positive). I have to use LAPACKE. My code is as follows: #include <lapacke.h> #include <stdio.h> void print_mtrx(double * mtrx, int n, int m) { int i, j; for(i = 0; i < n; i++) { for(j = 0;...

performance,parallel-processing,julia-lang,hpc

I have a large vector of vectors of strings: There are around 50,000 vectors of strings, each of which contains 2-15 strings of length 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes...

opencl,mpi,cluster-computing,hpc

In the lab we have an heterogeneous cluster setup, with many Intel CPUs, a few AMD CPUs and a couple of Nvidia GPUs. For HPC development, the one thing I know that I could write once and run everywhere on this setup is OpenCL (not even Java ;) ). But...

parallel-processing,fortran,mpi,hpc

I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the WTIME() function but I am getting some strange results. Part of the code is like this: program heat_transfer_1D_parallel implicit none include 'mpif.h' integer myid,np,rc,ierror,status(MPI_STATUS_SIZE) integer :: N,N_loc,i,k,e...

lapack,hpc,scientific-computing,intel-mkl

Folks, I'm calling LAPACKE_dptsv If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?...

javascript,matlab,linear-algebra,lapack

According to MATLAB's documentation: [V,D] = eig(A,B) returns diagonal matrix D of generalized eigenvalues and full matrix V whose columns are the corresponding right eigenvectors, so that A*V = B*V*D. As I read available source code (It seems all implementations I've looked at Octave, R, Scipy) lead to LAPACK's DGGEV...

lapack

I installed lapack 3.4.2 and 3.5.0 by compiling them with cmake. In my installation dir of lapack 3.5.0 I find a file ./pkgconfig/lapack.pc saying Version: 3.4.2. So I'm not sure I really installed lapack 3.5.0 there. Is there a way to obtain the version of lapack directly form the lib...

c,lapack,lapacke

I am trying to solve a linear system using the following code: #include <stdio.h> #include <lapacke.h> int main () { lapack_complex_double mat[4]; lapack_complex_double vec[2]; lapack_int p[2]; mat[0] = lapack_make_complex_double(1,0); mat[1] = lapack_make_complex_double(1,0); mat[2] = lapack_make_complex_double(1,0); mat[3] = lapack_make_complex_double(-1,0); vec[0] = lapack_make_complex_double(1,0); vec[1] = lapack_make_complex_double(1,0); LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);...

mpi,openmpi,hpc

I'm having issues with OpenMPI where different MPI ranks are being repeatably bound to the same CPU cores. I'm using a server with 32 hardware cores (no hyper-threading), Ubuntu 14.04.2 LTS and OpenMPI 1.8.4, compiled with Intel compiler 15.0.1. For instance, I can run my executable with 8 MPI ranks,...

fortran,lapack

Hey I am trying to get my LAPACK libraries to work and I have searched and searched but I can't seem to figure out what I am doing wrong. I try running my code, and I get the following error Program received signal SIGSEGV: Segmentation fault - invalid memory reference....

fortran,lapack

I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A': (input/output) COMPLEX array, dimension (LDA,N) On entry, the...

c++,parallel-processing,mpi,cluster-computing,hpc

I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish. Version 1 with MPI_Request bookkeeping -- this works: MPI_Request requests[]; MPI_Iallreduce(..., requests[0]); ... MPI_Iallreduce(...,...

c++,lapack,blas,absolute-value,argmax

Why is there a blas subroutine ISAMAX for argmax abs but not for argmax ? In C++ using std::max_element with compiler optimisation flag -O3 I am getting speeds comparable to blas_isamax (16 ms vs 9 ms), so at the moment my question is more out of interest than out of...

c,gcc,lapack

I have two programmes using the LAPACK routine dgeev in C. One appears to be working, the other does not compile claiming undefined reference to dgeev. I seek to understand why. The first code below - called mamapack.c - produces sensible results when compiled and run like this: [email protected]:~/Desktop/tests$ gcc...

c++,vector,parallel-processing,mpi,hpc

I am trying to send std:vector using MPI send and recv functions but I have reached no where. I get errors like Fatal error in MPI_Recv: Invalid buffer pointer, error stack: MPI_Recv(186): MPI_Recv(buf=(nil), count=2, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x7fff9e5e0c80) failed MPI_Recv(124): Null buffer pointer I tried multiple combinations A) like...

netlogo,hpc

$ wget https://ccl.northwestern.edu/netlogo/5.1.0/netlogo-5.1.0.tar.gz $ tar -xzf netlogo-5.1.0.tar.gz $ ~/netlogo-5.1.0/netlogo-headless.sh \ --model ~/myproject/MyModel.nlogo \ --experiment MyExperiment \ --table ~/myproject/MyNewOutputData.csv Using the above commands to run a netlogo headless on HPC machine. The problem is how to I specify the number of cores to be used or does by default take...

c++,boost,lapack

I've done my best recently to set up boost's numeric bindings to allow me to use LAPACK from C++, but I've run into some roadblocks. First off, I have confirmed that boost is working fine, so it's something to do with my LAPACK libraries or the boost numeric bindings. Here's...

c,lapack,eigenvalue,eigenvector

I am trying for the first time to use LAPACK from C to diagonalize a matrix and I am stuck. I have been trying to modify this example http://rcabreral.blogspot.co.uk/2010/05/eigenvalues-clapack.html from zgeev to dgeev. I have looked at the DGEEV input parameters, http://www.netlib.org/lapack/explore-html/d9/d28/dgeev_8f.html but it seems I don't understand the well...

python,python-2.7,hpc,sungridengine,drmaa

When I try to submit a job using the python drmaa wrapper, I get a DeniedByDrmException: code 17: job rejected: positive submission priority requires operator privileges. How do I change the priority of jobs that I submit using the Python DRMAA wrapper?...

multithreading,fortran,lapack,blas

I am a novice using the LAPACK routines, so I don't deeply know them, and I want to use them in parallelized loops (openmp). I use Ubuntu 14.04LTS and have LAPACK installed using my package manager. The version installed is: liblapack3 3.5.0-2ubuntu1 Library of linear algebra routines 3 - shared...

c++,matrix,vector,lapack,blas

Using Lapack with C++ is giving me a small headache. I found the functions defined for fortran a bit eccentric, so I tried to make a few functions on C++ to make it easier for me to read what's going on. Anyway, I'm not getting th matrix-vector product working as...

hpc,torque

I want to change the attributes (walltime and queue) of several (idle) jobs scheduled on a cluster. When I do (where 1234 is the job id): qalter -l walltime=24:00:00 -q newQueue 1234 I get the following error: qalter: illegally formed job identifier: newQueue What can I do?...

bash,batch-processing,hpc,torque

I have several (idle) jobs scheduled on a cluster that I want to move to another queue. I can move a single job like this (where 1234 is the job id): qmove newQueue 1234 But now I have hundreds of jobs that I want to move to newQueue. Is it...