lapack,hpc,scientific-computing,intel-mkl , LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors


LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors

Question:

Tag: lapack,hpc,scientific-computing,intel-mkl

Folks, I'm calling LAPACKE_dptsv

If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?


Answer:

The function LAPACKE_dptsv() corresponds to the lapack function dptsv(), which does not feature the switch between LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR. dptsv() is implemented for column-major ordering, corresponding to matrices in Fortran, while most of C matrices are row-major. So LAPACKE_dptsv(LAPACK_ROW_MAJOR,...) performs the following steps :

You can check this in the source of Lapacke, in /lapacke/src/lapacke_dptsv_work.c.

There is one question left : *does it largely affect the wall-clock time ? * Looking at dpttrs_8f_source, it is possible : the decomposition L*D*L**T is performed by a single for loop (+loop unrolling). A piece of code is therefore necessary to answer the question. The following code is compiled by gcc main.c -o main -llapacke -llapack -lblas

#include <stdio.h>
#include "lapacke.h"
#include <malloc.h>
#include <time.h>

int main ()
{
    //double a[3][2] = {{1,0},{1,1},{1,2}};

    double **outputArray;
    int designs=3;
    int i,j;
    lapack_int info,n,ldb,nrhs;

    n = 420000;
    nrhs = 42;

    //double outputArray[3][1] = {{6},{0},{0}};
    double* ad=malloc(n*sizeof(double));
    if(ad==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n;i++){
        ad[i]=3;
    }
    double* ae=malloc((n-1)*sizeof(double));
    if(ae==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n-1;i++){
        ae[i]=-1;
    }

    double* b=malloc(n*nrhs* sizeof(double));
    if(b==NULL){printf("malloc failed\n");exit(1);}

    for(j=0;j<nrhs;j++){
        for(i=0;i<n;i++){
            b[i*nrhs+j]=i+2*j;
        }
    }

    ldb=nrhs;

    clock_t t;
    t = clock();
    info = LAPACKE_dptsv(LAPACK_ROW_MAJOR,n,nrhs,ad,ae,b,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_ROW_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    for(i=0;i<n;i++){
        ad[i]=3;
    }
    if(ae==NULL){printf("malloc failed\n");exit(1);}
    for(i=0;i<n-1;i++){
        ae[i]=-1;
    }

    double* b2=malloc(n*nrhs* sizeof(double));
    if(b2==NULL){printf("malloc failed\n");exit(1);}
    for(j=0;j<nrhs;j++){
        for(i=0;i<n;i++){
            b2[j*n+i]=i+2*j;
        }
    }

    t = clock();
    ldb=n;
    info = LAPACKE_dptsv(LAPACK_COL_MAJOR,n,nrhs,ad,ae,b2,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_COL_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    double delta=0,temp,deltal=0;
    for(i=0;i<n;i++){
        deltal=0;
        for(j=0;j<nrhs;j++){
            temp=(b[i*nrhs+j]-b2[j*n+i]);
            deltal+=temp*temp;
        }
        delta+=deltal;
    }
    printf("delta %g\n",delta);
    free(ad);
    free(ae);
    free(b);
    free(b2);
    return (info);
} 

My output is :

LAPACK_ROW_MAJOR : 770000 clicks (0.770000 seconds).

LAPACK_COL_MAJOR : 310000 clicks (0.310000 seconds).

So LAPACKE_dptsv() runs almost twice faster with LAPACK_COL_MAJOR with `nbrhs=42

If the number of rhs is reduced to one (and n larger) :

LAPACK_ROW_MAJOR : 250000 clicks (0.250000 seconds).

LAPACK_COL_MAJOR : 180000 clicks (0.180000 seconds).

LAPACK_COL_MAJOR and LAPACK_ROW_MAJOR lead to approximately the same wall-clock time with a single RHS. And the output is the same.

I have not installed intel mkl on my pc and i am curious about the way it changes the conclusion of this test...


Related:


Cannot compile MEX LAPACK example


matlab,mex,lapack
Summary I am trying to write some MEX files that use LAPACK, and I thought I'd start by looking through the examples provided by MathWorks. However, I am having some trouble compiling them. After copying the matrixDivide.c example from [MATLAB root]/extern/examples/refbook into a working directory, I tried to build a...

Move a range of jobs to another queue with qmove


bash,batch-processing,hpc,torque
I have several (idle) jobs scheduled on a cluster that I want to move to another queue. I can move a single job like this (where 1234 is the job id): qmove newQueue 1234 But now I have hundreds of jobs that I want to move to newQueue. Is it...

Fortran - Passing a variable into CGESV


fortran,lapack
I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A': (input/output) COMPLEX array, dimension (LDA,N) On entry, the...

Running NetLogo on HPC machine: how to specify the number of cores to be used?


netlogo,hpc
$ wget https://ccl.northwestern.edu/netlogo/5.1.0/netlogo-5.1.0.tar.gz $ tar -xzf netlogo-5.1.0.tar.gz $ ~/netlogo-5.1.0/netlogo-headless.sh \ --model ~/myproject/MyModel.nlogo \ --experiment MyExperiment \ --table ~/myproject/MyNewOutputData.csv Using the above commands to run a netlogo headless on HPC machine. The problem is how to I specify the number of cores to be used or does by default take...

Errors using boost numeric bindings and lapack call to gesvd


c++,boost,lapack
I've done my best recently to set up boost's numeric bindings to allow me to use LAPACK from C++, but I've run into some roadblocks. First off, I have confirmed that boost is working fine, so it's something to do with my LAPACK libraries or the boost numeric bindings. Here's...

Solving equation using LU factorization without pivoting (Lapack library)


c++,lapack,equation-solving
At the beginning I would like to apologise for my English. Now, let's go to my problem. I try to write a simple code which will find a solution of a system of linear equations: Ax = b where A is a square matrix nxn. In this program I use...

How would I implement MATLAB's “eig(A, B)” function in JavaScript


javascript,matlab,linear-algebra,lapack
According to MATLAB's documentation: [V,D] = eig(A,B) returns diagonal matrix D of generalized eigenvalues and full matrix V whose columns are the corresponding right eigenvectors, so that A*V = B*V*D. As I read available source code (It seems all implementations I've looked at Octave, R, Scipy) lead to LAPACK's DGGEV...

cannot send std::vector using MPI_Send and MPI_Recv


c++,vector,parallel-processing,mpi,hpc
I am trying to send std:vector using MPI send and recv functions but I have reached no where. I get errors like Fatal error in MPI_Recv: Invalid buffer pointer, error stack: MPI_Recv(186): MPI_Recv(buf=(nil), count=2, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x7fff9e5e0c80) failed MPI_Recv(124): Null buffer pointer I tried multiple combinations A) like...

Using LAPACKE_zgetrs with LAPACK_ROW_MAJOR causes illegal memory access


c,lapack,lapacke
I am trying to solve a linear system using the following code: #include <stdio.h> #include <lapacke.h> int main () { lapack_complex_double mat[4]; lapack_complex_double vec[2]; lapack_int p[2]; mat[0] = lapack_make_complex_double(1,0); mat[1] = lapack_make_complex_double(1,0); mat[2] = lapack_make_complex_double(1,0); mat[3] = lapack_make_complex_double(-1,0); vec[0] = lapack_make_complex_double(1,0); vec[1] = lapack_make_complex_double(1,0); LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);...

What distinguishes these LAPACK programmes? One compiles, the other does not


c,gcc,lapack
I have two programmes using the LAPACK routine dgeev in C. One appears to be working, the other does not compile claiming undefined reference to dgeev. I seek to understand why. The first code below - called mamapack.c - produces sensible results when compiled and run like this: [email protected]:~/Desktop/tests$ gcc...

Netlogo HPC CPU Percentage Use Increase


linux,cloud,netlogo,hpc
I submit jobs using headless NetLogo to a HPC server by the following code: #!/bin/bash #$ -N r20p #$ -q all.q #$ -pe mpi 24 /home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \ --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \ --experiment test \ --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv Below is the snapshot of a cluster queue using: qstat -g c I wish to...

Modify attributes (including queue) of a pbs jobs


hpc,torque
I want to change the attributes (walltime and queue) of several (idle) jobs scheduled on a cluster. When I do (where 1234 is the job id): qalter -l walltime=24:00:00 -q newQueue 1234 I get the following error: qalter: illegally formed job identifier: newQueue What can I do?...

On entry to DGEEV parameter number 9 had an illegal value


c,lapack,eigenvalue,eigenvector
I am trying for the first time to use LAPACK from C to diagonalize a matrix and I am stuck. I have been trying to modify this example http://rcabreral.blogspot.co.uk/2010/05/eigenvalues-clapack.html from zgeev to dgeev. I have looked at the DGEEV input parameters, http://www.netlib.org/lapack/explore-html/d9/d28/dgeev_8f.html but it seems I don't understand the well...

Matrix-vector product with dgemm/dgemv


c++,matrix,vector,lapack,blas
Using Lapack with C++ is giving me a small headache. I found the functions defined for fortran a bit eccentric, so I tried to make a few functions on C++ to make it easier for me to read what's going on. Anyway, I'm not getting th matrix-vector product working as...

LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors


lapack,hpc,scientific-computing,intel-mkl
Folks, I'm calling LAPACKE_dptsv If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?...

How to integrate c++ code into working LAPACK code


c++,ifstream,lapack
I have a very simple piece of C++ code justread.cc reading numbers from a file. After the command [email protected]:~/Desktop/tests$ g++ -Wall -pedantic -o justread.x justread.cc && ./justread.x it compiles without any errors or warnings. This is the code: #include <fstream> #include <vector> int read_covariance () { std::vector<double> data; double tmp;...

Is OpenCL ready for use on CPU?


opencl,mpi,cluster-computing,hpc
In the lab we have an heterogeneous cluster setup, with many Intel CPUs, a few AMD CPUs and a couple of Nvidia GPUs. For HPC development, the one thing I know that I could write once and run everywhere on this setup is OpenCL (not even Java ;) ). But...

Barrier after MPI non-blocking call, without bookkeeping?


c++,parallel-processing,mpi,cluster-computing,hpc
I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish. Version 1 with MPI_Request bookkeeping -- this works: MPI_Request requests[]; MPI_Iallreduce(..., requests[0]); ... MPI_Iallreduce(...,...

Changing priority of job in SGE using python drmaa wrapper


python,python-2.7,hpc,sungridengine,drmaa
When I try to submit a job using the python drmaa wrapper, I get a DeniedByDrmException: code 17: job rejected: positive submission priority requires operator privileges. How do I change the priority of jobs that I submit using the Python DRMAA wrapper?...

Why is there a blas subroutine (ISAMAX) for argmax abs but none for argmax?


c++,lapack,blas,absolute-value,argmax
Why is there a blas subroutine ISAMAX for argmax abs but not for argmax ? In C++ using std::max_element with compiler optimisation flag -O3 I am getting speeds comparable to blas_isamax (16 ms vs 9 ms), so at the moment my question is more out of interest than out of...

Find BLAS giving the path to the lib


cmake,lapack,blas
I use blas/lapack in my C++ code built with CMake 2.8.9. I want to find BLAS and LAPACK libraries with the CMake commands : find_package(BLAS REQUIRED) find_package(LAPACK REQUIRED) But it can found it because the libraries are in a specific directory. The error is the following : CMake Error at...

Are the LAPACK routines thread safe?


multithreading,fortran,lapack,blas
I am a novice using the LAPACK routines, so I don't deeply know them, and I want to use them in parallelized loops (openmp). I use Ubuntu 14.04LTS and have LAPACK installed using my package manager. The version installed is: liblapack3 3.5.0-2ubuntu1 Library of linear algebra routines 3 - shared...

Is there a way to obtain the version of liblapack.a?


lapack
I installed lapack 3.4.2 and 3.5.0 by compiling them with cmake. In my installation dir of lapack 3.5.0 I find a file ./pkgconfig/lapack.pc saying Version: 3.4.2. So I'm not sure I really installed lapack 3.5.0 there. Is there a way to obtain the version of lapack directly form the lib...

OpenMPI and rank/core bindings


mpi,openmpi,hpc
I'm having issues with OpenMPI where different MPI ranks are being repeatably bound to the same CPU cores. I'm using a server with 32 hardware cores (no hyper-threading), Ubuntu 14.04.2 LTS and OpenMPI 1.8.4, compiled with Intel compiler 15.0.1. For instance, I can run my executable with 8 MPI ranks,...

MPI Fortran WTIME not working well


parallel-processing,fortran,mpi,hpc
I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the WTIME() function but I am getting some strange results. Part of the code is like this: program heat_transfer_1D_parallel implicit none include 'mpif.h' integer myid,np,rc,ierror,status(MPI_STATUS_SIZE) integer :: N,N_loc,i,k,e...

Operating in parallel on a large constant datastructure in Julia


performance,parallel-processing,julia-lang,hpc
I have a large vector of vectors of strings: There are around 50,000 vectors of strings, each of which contains 2-15 strings of length 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes...

Segmentation fault - invalid memory reference


fortran,lapack
Hey I am trying to get my LAPACK libraries to work and I have searched and searched but I can't seem to figure out what I am doing wrong. I try running my code, and I get the following error Program received signal SIGSEGV: Segmentation fault - invalid memory reference....

Solving linear equation with LAPACKE


c,linear-algebra,lapack,lapacke
I'm trying to solve some linear equation (which is symmetrical, tridiagonal and positive). I have to use LAPACKE. My code is as follows: #include <lapacke.h> #include <stdio.h> void print_mtrx(double * mtrx, int n, int m) { int i, j; for(i = 0; i < n; i++) { for(j = 0;...