caffe Debug build: stray '"' character in nvcc command

Question:

Tag: compilation,cuda,nvcc,caffe

I am trying to build my C++ application that uses Caffe in Debug mode (Visual Studio 2013 Community, x64). To be able to build a version that does not need CUDA to run, I wrapped the contents of each .cu file as shown below:

#ifndef CPU_ONLY
// .cu file contents
#endif

The project built and ran fine in CPU_ONLY mode. With the CPU_ONLY flag undefined, the project builds and runs fine in Release mode, but in Debug mode I get the following error when compiling the *.cu files:

Compiling CUDA source file ..\..\src\caffe\layers\base_data_layer.cu...
>  
>  >COMMAND
>  nvcc fatal   : Stray '"' character in command line
COMMAND  exited with code 1.

Here COMMAND is the nvcc command line below, split across lines for readability:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe"         
-gencode=arch=compute_30,code=\" sm_30,compute_30\" 
--use-local-env 
--cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  
-I"C:\Users\username\Downloads\liblinear-1.8\liblinear-1.8" 
-I"C:\Users\username\Downloads\poco-1.6.0\Foundation\include" 
-I"C:\Users\username\Downloads\poco-1.6.0\Net\include" 
-IC:\opencv_gpu\include -I"C:\Users\username\Downloads\caffe-master\src" 
-I"C:\Users\username\Downloads\caffe-master\include" 
-IC:\local\boost_1_56_0 -I"C:\Users\username\Downloads\caffe-master\3rdparty\include\openblas" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\lmdb" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\leveldb" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\hdf5" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\google" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\glog" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include\gflags" 
-I"C:\Users\username\Downloads\caffe-master\3rdparty\include" 
-I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" 
-I"C:\Users\username\Downloads\cudnn-6.5-win-R1" 
-I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"  
-G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile 
-cudart static  -g   -D_SCL_SECURE_NO_WARNINGS -D_CRT_SECURE_NO_WARNINGS -DWIN32 -D_DEBUG -D_CONSOLE -D_LIB -D_UNICODE -DUNICODE 
-Xcompiler "/EHsc /W0 /nologo /Od /Zi /RTC1 /MDd  " 
-o x64\Debug\base_data_layer.cu.obj "C:\Users\username\Downloads\caffe-master\src\caffe\layers\base_data_layer.cu"

The project built successfully in Debug mode before I added the CPU_ONLY flags. Any ideas?


Answer:

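It turned out to be a typo. In project properties->Debug->CUDA C/C++->Device, instead of

`compute_30,sm_30`

I had

`compute_30, sm_30`

that is, with a space after the comma. That space ends up inside the quoted code value of the -gencode option, which is why the generated command above contains code=\" sm_30,compute_30\" and nvcc reports a stray '"' character. Because the typo was only in the Debug configuration's settings, the Release build was unaffected.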
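A minimal sketch, in C++, of a check that would catch this kind of typo before nvcc does. The function names are hypothetical and are not part of Visual Studio, CUDA, or Caffe; the sketch only assumes the code-generation value is a comma-separated string such as compute_30,sm_30. It splits the value on commas, trims whitespace around each entry, and reports whether anything had to be trimmed.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated value into its raw entries,
// keeping any surrounding whitespace so it can be detected later.
std::vector<std::string> split_commas(const std::string& field) {
    std::vector<std::string> entries;
    std::string current;
    for (char c : field) {
        if (c == ',') {
            entries.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    entries.push_back(current);
    return entries;
}

// Remove leading and trailing whitespace from a single entry.
std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Rebuild the value with every entry trimmed,
// e.g. "compute_30, sm_30" becomes "compute_30,sm_30".
std::string normalize_code_generation(const std::string& field) {
    std::string out;
    std::vector<std::string> entries = split_commas(field);
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i > 0) out += ',';
        out += trim(entries[i]);
    }
    return out;
}

// True when any entry has whitespace around it, i.e. the value would change
// after normalization (the kind of typo described in the answer).
bool has_stray_whitespace(const std::string& field) {
    return normalize_code_generation(field) != field;
}
```

For example, has_stray_whitespace("compute_30, sm_30") returns true, and normalize_code_generation turns that value into "compute_30,sm_30". How you obtain the project setting to feed into such a check is left open here.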