lapack,hpc,scientific-computing,intel-mkl , LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors


LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors

Question:

Tag: lapack,hpc,scientific-computing,intel-mkl

Folks, I'm calling LAPACKE_dptsv

If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?


Answer:

The function LAPACKE_dptsv() corresponds to the lapack function dptsv(), which does not feature the switch between LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR. dptsv() is implemented for column-major ordering, corresponding to matrices in Fortran, while most of C matrices are row-major. So LAPACKE_dptsv(LAPACK_ROW_MAJOR,...) performs the following steps :

You can check this in the source of Lapacke, in /lapacke/src/lapacke_dptsv_work.c.

There is one question left : *does it largely affect the wall-clock time ? * Looking at dpttrs_8f_source, it is possible : the decomposition L*D*L**T is performed by a single for loop (+loop unrolling). A piece of code is therefore necessary to answer the question. The following code is compiled by gcc main.c -o main -llapacke -llapack -lblas

#include <stdio.h>
#include "lapacke.h"
#include <malloc.h>
#include <time.h>

int main ()
{
    //double a[3][2] = {{1,0},{1,1},{1,2}};

    double **outputArray;
    int designs=3;
    int i,j;
    lapack_int info,n,ldb,nrhs;

    n = 420000;
    nrhs = 42;

    //double outputArray[3][1] = {{6},{0},{0}};
    double* ad=malloc(n*sizeof(double));
    if(ad==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n;i++){
        ad[i]=3;
    }
    double* ae=malloc((n-1)*sizeof(double));
    if(ae==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n-1;i++){
        ae[i]=-1;
    }

    double* b=malloc(n*nrhs* sizeof(double));
    if(b==NULL){printf("malloc failed\n");exit(1);}

    for(j=0;j<nrhs;j++){
        for(i=0;i<n;i++){
            b[i*nrhs+j]=i+2*j;
        }
    }

    ldb=nrhs;

    clock_t t;
    t = clock();
    info = LAPACKE_dptsv(LAPACK_ROW_MAJOR,n,nrhs,ad,ae,b,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_ROW_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    for(i=0;i<n;i++){
        ad[i]=3;
    }
    if(ae==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n-1;i++){
        ae[i]=-1;
    }

    double* b2=malloc(n*nrhs* sizeof(double));
    if(b2==NULL){printf("malloc failed\n");exit(1);}
    for(j=0;j<nrhs;j++){
        for(i=0;i<n;i++){
            b2[j*n+i]=i+2*j;
        }
    }

    t = clock();
    ldb=n;
    info = LAPACKE_dptsv(LAPACK_COL_MAJOR,n,nrhs,ad,ae,b2,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_COL_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    double delta=0,temp,deltal=0;
    for(i=0;i<n;i++){
        deltal=0;
        for(j=0;j<nrhs;j++){
            temp=(b[i*nrhs+j]-b2[j*n+i]);
            deltal+=temp*temp;
        }
        delta+=deltal;
    }
    printf("delta %g\n",delta);
    free(ad);
    free(ae);
    free(b);
    free(b2);
    return (info);
} 

My output is :

LAPACK_ROW_MAJOR : 770000 clicks (0.770000 seconds).

LAPACK_COL_MAJOR : 310000 clicks (0.310000 seconds).

So LAPACKE_dptsv() runs almost twice faster with LAPACK_COL_MAJOR with `nbrhs=42

If the number of rhs is reduced to one (and n larger) :

LAPACK_ROW_MAJOR : 250000 clicks (0.250000 seconds).

LAPACK_COL_MAJOR : 180000 clicks (0.180000 seconds).

LAPACK_COL_MAJOR and LAPACK_ROW_MAJOR lead to approximately the same wall-clock time with a single RHS. And the output is the same.

I have not installed intel mkl on my pc and i am curious about the way it changes the conclusion of this test...


Related:


Solving equation using LU factorization without pivoting (Lapack library)


c++,lapack,equation-solving
At the beginning I would like to apologise for my English. Now, let's go to my problem. I try to write a simple code which will find a solution of a system of linear equations: Ax = b where A is a square matrix nxn. In this program I use...

Cannot compile MEX LAPACK example


matlab,mex,lapack
Summary I am trying to write some MEX files that use LAPACK, and I thought I'd start by looking through the examples provided by MathWorks. However, I am having some trouble compiling them. After copying the matrixDivide.c example from [MATLAB root]/extern/examples/refbook into a working directory, I tried to build a...

Find BLAS giving the path to the lib


cmake,lapack,blas
I use blas/lapack in my C++ code built with CMake 2.8.9. I want to find BLAS and LAPACK libraries with the CMake commands : find_package(BLAS REQUIRED) find_package(LAPACK REQUIRED) But it can found it because the libraries are in a specific directory. The error is the following : CMake Error at...

How to integrate c++ code into working LAPACK code


c++,ifstream,lapack
I have a very simple piece of C++ code justread.cc reading numbers from a file. After the command [email protected]:~/Desktop/tests$ g++ -Wall -pedantic -o justread.x justread.cc && ./justread.x it compiles without any errors or warnings. This is the code: #include <fstream> #include <vector> int read_covariance () { std::vector<double> data; double tmp;...

Netlogo HPC CPU Percentage Use Increase


linux,cloud,netlogo,hpc
I submit jobs using headless NetLogo to a HPC server by the following code: #!/bin/bash #$ -N r20p #$ -q all.q #$ -pe mpi 24 /home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \ --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \ --experiment test \ --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv Below is the snapshot of a cluster queue using: qstat -g c I wish to...

Solving linear equation with LAPACKE


c,linear-algebra,lapack,lapacke
I'm trying to solve some linear equation (which is symmetrical, tridiagonal and positive). I have to use LAPACKE. My code is as follows: #include <lapacke.h> #include <stdio.h> void print_mtrx(double * mtrx, int n, int m) { int i, j; for(i = 0; i < n; i++) { for(j = 0;...

Operating in parallel on a large constant datastructure in Julia


performance,parallel-processing,julia-lang,hpc
I have a large vector of vectors of strings: There are around 50,000 vectors of strings, each of which contains 2-15 strings of length 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes...

Is OpenCL ready for use on CPU?


opencl,mpi,cluster-computing,hpc
In the lab we have an heterogeneous cluster setup, with many Intel CPUs, a few AMD CPUs and a couple of Nvidia GPUs. For HPC development, the one thing I know that I could write once and run everywhere on this setup is OpenCL (not even Java ;) ). But...

MPI Fortran WTIME not working well


parallel-processing,fortran,mpi,hpc
I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the WTIME() function but I am getting some strange results. Part of the code is like this: program heat_transfer_1D_parallel implicit none include 'mpif.h' integer myid,np,rc,ierror,status(MPI_STATUS_SIZE) integer :: N,N_loc,i,k,e...

LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors


lapack,hpc,scientific-computing,intel-mkl
Folks, I'm calling LAPACKE_dptsv If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?...

How would I implement MATLAB's “eig(A, B)” function in JavaScript


javascript,matlab,linear-algebra,lapack
According to MATLAB's documentation: [V,D] = eig(A,B) returns diagonal matrix D of generalized eigenvalues and full matrix V whose columns are the corresponding right eigenvectors, so that A*V = B*V*D. As I read available source code (It seems all implementations I've looked at Octave, R, Scipy) lead to LAPACK's DGGEV...

Is there a way to obtain the version of liblapack.a?


lapack
I installed lapack 3.4.2 and 3.5.0 by compiling them with cmake. In my installation dir of lapack 3.5.0 I find a file ./pkgconfig/lapack.pc saying Version: 3.4.2. So I'm not sure I really installed lapack 3.5.0 there. Is there a way to obtain the version of lapack directly form the lib...

Using LAPACKE_zgetrs with LAPACK_ROW_MAJOR causes illegal memory access


c,lapack,lapacke
I am trying to solve a linear system using the following code: #include <stdio.h> #include <lapacke.h> int main () { lapack_complex_double mat[4]; lapack_complex_double vec[2]; lapack_int p[2]; mat[0] = lapack_make_complex_double(1,0); mat[1] = lapack_make_complex_double(1,0); mat[2] = lapack_make_complex_double(1,0); mat[3] = lapack_make_complex_double(-1,0); vec[0] = lapack_make_complex_double(1,0); vec[1] = lapack_make_complex_double(1,0); LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);...

OpenMPI and rank/core bindings


mpi,openmpi,hpc
I'm having issues with OpenMPI where different MPI ranks are being repeatably bound to the same CPU cores. I'm using a server with 32 hardware cores (no hyper-threading), Ubuntu 14.04.2 LTS and OpenMPI 1.8.4, compiled with Intel compiler 15.0.1. For instance, I can run my executable with 8 MPI ranks,...

Segmentation fault - invalid memory reference


fortran,lapack
Hey I am trying to get my LAPACK libraries to work and I have searched and searched but I can't seem to figure out what I am doing wrong. I try running my code, and I get the following error Program received signal SIGSEGV: Segmentation fault - invalid memory reference....

Fortran - Passing a variable into CGESV


fortran,lapack
I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A': (input/output) COMPLEX array, dimension (LDA,N) On entry, the...

Barrier after MPI non-blocking call, without bookkeeping?


c++,parallel-processing,mpi,cluster-computing,hpc
I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish. Version 1 with MPI_Request bookkeeping -- this works: MPI_Request requests[]; MPI_Iallreduce(..., requests[0]); ... MPI_Iallreduce(...,...

Why is there a blas subroutine (ISAMAX) for argmax abs but none for argmax?


c++,lapack,blas,absolute-value,argmax
Why is there a blas subroutine ISAMAX for argmax abs but not for argmax ? In C++ using std::max_element with compiler optimisation flag -O3 I am getting speeds comparable to blas_isamax (16 ms vs 9 ms), so at the moment my question is more out of interest than out of...

What distinguishes these LAPACK programmes? One compiles, the other does not


c,gcc,lapack
I have two programmes using the LAPACK routine dgeev in C. One appears to be working, the other does not compile claiming undefined reference to dgeev. I seek to understand why. The first code below - called mamapack.c - produces sensible results when compiled and run like this: [email protected]:~/Desktop/tests$ gcc...

cannot send std::vector using MPI_Send and MPI_Recv


c++,vector,parallel-processing,mpi,hpc
I am trying to send std:vector using MPI send and recv functions but I have reached no where. I get errors like Fatal error in MPI_Recv: Invalid buffer pointer, error stack: MPI_Recv(186): MPI_Recv(buf=(nil), count=2, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x7fff9e5e0c80) failed MPI_Recv(124): Null buffer pointer I tried multiple combinations A) like...

Running NetLogo on HPC machine: how to specify the number of cores to be used?


netlogo,hpc
$ wget https://ccl.northwestern.edu/netlogo/5.1.0/netlogo-5.1.0.tar.gz $ tar -xzf netlogo-5.1.0.tar.gz $ ~/netlogo-5.1.0/netlogo-headless.sh \ --model ~/myproject/MyModel.nlogo \ --experiment MyExperiment \ --table ~/myproject/MyNewOutputData.csv Using the above commands to run a netlogo headless on HPC machine. The problem is how to I specify the number of cores to be used or does by default take...

Errors using boost numeric bindings and lapack call to gesvd


c++,boost,lapack
I've done my best recently to set up boost's numeric bindings to allow me to use LAPACK from C++, but I've run into some roadblocks. First off, I have confirmed that boost is working fine, so it's something to do with my LAPACK libraries or the boost numeric bindings. Here's...

On entry to DGEEV parameter number 9 had an illegal value


c,lapack,eigenvalue,eigenvector
I am trying for the first time to use LAPACK from C to diagonalize a matrix and I am stuck. I have been trying to modify this example http://rcabreral.blogspot.co.uk/2010/05/eigenvalues-clapack.html from zgeev to dgeev. I have looked at the DGEEV input parameters, http://www.netlib.org/lapack/explore-html/d9/d28/dgeev_8f.html but it seems I don't understand the well...

Changing priority of job in SGE using python drmaa wrapper


python,python-2.7,hpc,sungridengine,drmaa
When I try to submit a job using the python drmaa wrapper, I get a DeniedByDrmException: code 17: job rejected: positive submission priority requires operator privileges. How do I change the priority of jobs that I submit using the Python DRMAA wrapper?...

Are the LAPACK routines thread safe?


multithreading,fortran,lapack,blas
I am a novice using the LAPACK routines, so I don't deeply know them, and I want to use them in parallelized loops (openmp). I use Ubuntu 14.04LTS and have LAPACK installed using my package manager. The version installed is: liblapack3 3.5.0-2ubuntu1 Library of linear algebra routines 3 - shared...

Matrix-vector product with dgemm/dgemv


c++,matrix,vector,lapack,blas
Using Lapack with C++ is giving me a small headache. I found the functions defined for fortran a bit eccentric, so I tried to make a few functions on C++ to make it easier for me to read what's going on. Anyway, I'm not getting th matrix-vector product working as...

Modify attributes (including queue) of a pbs jobs


hpc,torque
I want to change the attributes (walltime and queue) of several (idle) jobs scheduled on a cluster. When I do (where 1234 is the job id): qalter -l walltime=24:00:00 -q newQueue 1234 I get the following error: qalter: illegally formed job identifier: newQueue What can I do?...

Move a range of jobs to another queue with qmove


bash,batch-processing,hpc,torque
I have several (idle) jobs scheduled on a cluster that I want to move to another queue. I can move a single job like this (where 1234 is the job id): qmove newQueue 1234 But now I have hundreds of jobs that I want to move to newQueue. Is it...