lapack,hpc,scientific-computing,intel-mkl , LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors

LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors


Tag: lapack,hpc,scientific-computing,intel-mkl

Folks, I'm calling LAPACKE_dptsv

If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?


The function LAPACKE_dptsv() corresponds to the lapack function dptsv(), which does not feature the switch between LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR. dptsv() is implemented for column-major ordering, corresponding to matrices in Fortran, while most of C matrices are row-major. So LAPACKE_dptsv(LAPACK_ROW_MAJOR,...) performs the following steps :

You can check this in the source of Lapacke, in /lapacke/src/lapacke_dptsv_work.c.

There is one question left : *does it largely affect the wall-clock time ? * Looking at dpttrs_8f_source, it is possible : the decomposition L*D*L**T is performed by a single for loop (+loop unrolling). A piece of code is therefore necessary to answer the question. The following code is compiled by gcc main.c -o main -llapacke -llapack -lblas

#include <stdio.h>
#include "lapacke.h"
#include <malloc.h>
#include <time.h>

int main ()
    //double a[3][2] = {{1,0},{1,1},{1,2}};

    double **outputArray;
    int designs=3;
    int i,j;
    lapack_int info,n,ldb,nrhs;

    n = 420000;
    nrhs = 42;

    //double outputArray[3][1] = {{6},{0},{0}};
    double* ad=malloc(n*sizeof(double));
    if(ad==NULL){printf("malloc failed\n");exit(1);}
    double* ae=malloc((n-1)*sizeof(double));
    if(ae==NULL){printf("malloc failed\n");exit(1);}

    double* b=malloc(n*nrhs* sizeof(double));
    if(b==NULL){printf("malloc failed\n");exit(1);}



    clock_t t;
    t = clock();
    info = LAPACKE_dptsv(LAPACK_ROW_MAJOR,n,nrhs,ad,ae,b,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_ROW_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    if(ae==NULL){printf("malloc failed\n");exit(1);}

    double* b2=malloc(n*nrhs* sizeof(double));
    if(b2==NULL){printf("malloc failed\n");exit(1);}

    t = clock();
    info = LAPACKE_dptsv(LAPACK_COL_MAJOR,n,nrhs,ad,ae,b2,ldb);
    if(info!=0){printf("failed, info %d\n",info);}
    t = clock() - t;
    printf ("LAPACK_COL_MAJOR :  %d clicks (%f seconds).\n",t,((float)t)/CLOCKS_PER_SEC);

    double delta=0,temp,deltal=0;
    printf("delta %g\n",delta);
    return (info);

My output is :

LAPACK_ROW_MAJOR : 770000 clicks (0.770000 seconds).

LAPACK_COL_MAJOR : 310000 clicks (0.310000 seconds).

So LAPACKE_dptsv() runs almost twice faster with LAPACK_COL_MAJOR with `nbrhs=42

If the number of rhs is reduced to one (and n larger) :

LAPACK_ROW_MAJOR : 250000 clicks (0.250000 seconds).

LAPACK_COL_MAJOR : 180000 clicks (0.180000 seconds).

LAPACK_COL_MAJOR and LAPACK_ROW_MAJOR lead to approximately the same wall-clock time with a single RHS. And the output is the same.

I have not installed intel mkl on my pc and i am curious about the way it changes the conclusion of this test...


On entry to DGEEV parameter number 9 had an illegal value

I am trying for the first time to use LAPACK from C to diagonalize a matrix and I am stuck. I have been trying to modify this example from zgeev to dgeev. I have looked at the DGEEV input parameters, but it seems I don't understand the well...

Cannot compile MEX LAPACK example

Summary I am trying to write some MEX files that use LAPACK, and I thought I'd start by looking through the examples provided by MathWorks. However, I am having some trouble compiling them. After copying the matrixDivide.c example from [MATLAB root]/extern/examples/refbook into a working directory, I tried to build a...

Errors using boost numeric bindings and lapack call to gesvd

I've done my best recently to set up boost's numeric bindings to allow me to use LAPACK from C++, but I've run into some roadblocks. First off, I have confirmed that boost is working fine, so it's something to do with my LAPACK libraries or the boost numeric bindings. Here's...

Fortran - Passing a variable into CGESV

I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A': (input/output) COMPLEX array, dimension (LDA,N) On entry, the...

Solving linear equation with LAPACKE

I'm trying to solve some linear equation (which is symmetrical, tridiagonal and positive). I have to use LAPACKE. My code is as follows: #include <lapacke.h> #include <stdio.h> void print_mtrx(double * mtrx, int n, int m) { int i, j; for(i = 0; i < n; i++) { for(j = 0;...

Barrier after MPI non-blocking call, without bookkeeping?

I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish. Version 1 with MPI_Request bookkeeping -- this works: MPI_Request requests[]; MPI_Iallreduce(..., requests[0]); ... MPI_Iallreduce(...,...

LAPACK C (mkl) dptsv row major/column major: Does it make a different for vectors

Folks, I'm calling LAPACKE_dptsv If all arguments are 1d arrays, does it matter if I call the data LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR ?...

Are the LAPACK routines thread safe?

I am a novice using the LAPACK routines, so I don't deeply know them, and I want to use them in parallelized loops (openmp). I use Ubuntu 14.04LTS and have LAPACK installed using my package manager. The version installed is: liblapack3 3.5.0-2ubuntu1 Library of linear algebra routines 3 - shared...

Why is there a blas subroutine (ISAMAX) for argmax abs but none for argmax?

Why is there a blas subroutine ISAMAX for argmax abs but not for argmax ? In C++ using std::max_element with compiler optimisation flag -O3 I am getting speeds comparable to blas_isamax (16 ms vs 9 ms), so at the moment my question is more out of interest than out of...

Changing priority of job in SGE using python drmaa wrapper

When I try to submit a job using the python drmaa wrapper, I get a DeniedByDrmException: code 17: job rejected: positive submission priority requires operator privileges. How do I change the priority of jobs that I submit using the Python DRMAA wrapper?...

cannot send std::vector using MPI_Send and MPI_Recv

I am trying to send std:vector using MPI send and recv functions but I have reached no where. I get errors like Fatal error in MPI_Recv: Invalid buffer pointer, error stack: MPI_Recv(186): MPI_Recv(buf=(nil), count=2, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x7fff9e5e0c80) failed MPI_Recv(124): Null buffer pointer I tried multiple combinations A) like...

Is OpenCL ready for use on CPU?

In the lab we have an heterogeneous cluster setup, with many Intel CPUs, a few AMD CPUs and a couple of Nvidia GPUs. For HPC development, the one thing I know that I could write once and run everywhere on this setup is OpenCL (not even Java ;) ). But...

Find BLAS giving the path to the lib

I use blas/lapack in my C++ code built with CMake 2.8.9. I want to find BLAS and LAPACK libraries with the CMake commands : find_package(BLAS REQUIRED) find_package(LAPACK REQUIRED) But it can found it because the libraries are in a specific directory. The error is the following : CMake Error at...

How would I implement MATLAB's “eig(A, B)” function in JavaScript

According to MATLAB's documentation: [V,D] = eig(A,B) returns diagonal matrix D of generalized eigenvalues and full matrix V whose columns are the corresponding right eigenvectors, so that A*V = B*V*D. As I read available source code (It seems all implementations I've looked at Octave, R, Scipy) lead to LAPACK's DGGEV...

What distinguishes these LAPACK programmes? One compiles, the other does not

I have two programmes using the LAPACK routine dgeev in C. One appears to be working, the other does not compile claiming undefined reference to dgeev. I seek to understand why. The first code below - called mamapack.c - produces sensible results when compiled and run like this: [email protected]:~/Desktop/tests$ gcc...

Modify attributes (including queue) of a pbs jobs

I want to change the attributes (walltime and queue) of several (idle) jobs scheduled on a cluster. When I do (where 1234 is the job id): qalter -l walltime=24:00:00 -q newQueue 1234 I get the following error: qalter: illegally formed job identifier: newQueue What can I do?...

Segmentation fault - invalid memory reference

Hey I am trying to get my LAPACK libraries to work and I have searched and searched but I can't seem to figure out what I am doing wrong. I try running my code, and I get the following error Program received signal SIGSEGV: Segmentation fault - invalid memory reference....

How to integrate c++ code into working LAPACK code

I have a very simple piece of C++ code reading numbers from a file. After the command [email protected]:~/Desktop/tests$ g++ -Wall -pedantic -o justread.x && ./justread.x it compiles without any errors or warnings. This is the code: #include <fstream> #include <vector> int read_covariance () { std::vector<double> data; double tmp;...

Using LAPACKE_zgetrs with LAPACK_ROW_MAJOR causes illegal memory access

I am trying to solve a linear system using the following code: #include <stdio.h> #include <lapacke.h> int main () { lapack_complex_double mat[4]; lapack_complex_double vec[2]; lapack_int p[2]; mat[0] = lapack_make_complex_double(1,0); mat[1] = lapack_make_complex_double(1,0); mat[2] = lapack_make_complex_double(1,0); mat[3] = lapack_make_complex_double(-1,0); vec[0] = lapack_make_complex_double(1,0); vec[1] = lapack_make_complex_double(1,0); LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);...

Running NetLogo on HPC machine: how to specify the number of cores to be used?

$ wget $ tar -xzf netlogo-5.1.0.tar.gz $ ~/netlogo-5.1.0/ \ --model ~/myproject/MyModel.nlogo \ --experiment MyExperiment \ --table ~/myproject/MyNewOutputData.csv Using the above commands to run a netlogo headless on HPC machine. The problem is how to I specify the number of cores to be used or does by default take...

MPI Fortran WTIME not working well

I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the WTIME() function but I am getting some strange results. Part of the code is like this: program heat_transfer_1D_parallel implicit none include 'mpif.h' integer myid,np,rc,ierror,status(MPI_STATUS_SIZE) integer :: N,N_loc,i,k,e...

Move a range of jobs to another queue with qmove

I have several (idle) jobs scheduled on a cluster that I want to move to another queue. I can move a single job like this (where 1234 is the job id): qmove newQueue 1234 But now I have hundreds of jobs that I want to move to newQueue. Is it...

Solving equation using LU factorization without pivoting (Lapack library)

At the beginning I would like to apologise for my English. Now, let's go to my problem. I try to write a simple code which will find a solution of a system of linear equations: Ax = b where A is a square matrix nxn. In this program I use...

Matrix-vector product with dgemm/dgemv

Using Lapack with C++ is giving me a small headache. I found the functions defined for fortran a bit eccentric, so I tried to make a few functions on C++ to make it easier for me to read what's going on. Anyway, I'm not getting th matrix-vector product working as...

Operating in parallel on a large constant datastructure in Julia

I have a large vector of vectors of strings: There are around 50,000 vectors of strings, each of which contains 2-15 strings of length 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes...

Netlogo HPC CPU Percentage Use Increase

I submit jobs using headless NetLogo to a HPC server by the following code: #!/bin/bash #$ -N r20p #$ -q all.q #$ -pe mpi 24 /home/abhishekb/netlogo/netlogo-5.1.0/ \ --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \ --experiment test \ --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv Below is the snapshot of a cluster queue using: qstat -g c I wish to...

Is there a way to obtain the version of liblapack.a?

I installed lapack 3.4.2 and 3.5.0 by compiling them with cmake. In my installation dir of lapack 3.5.0 I find a file ./pkgconfig/lapack.pc saying Version: 3.4.2. So I'm not sure I really installed lapack 3.5.0 there. Is there a way to obtain the version of lapack directly form the lib...

OpenMPI and rank/core bindings

I'm having issues with OpenMPI where different MPI ranks are being repeatably bound to the same CPU cores. I'm using a server with 32 hardware cores (no hyper-threading), Ubuntu 14.04.2 LTS and OpenMPI 1.8.4, compiled with Intel compiler 15.0.1. For instance, I can run my executable with 8 MPI ranks,...