OutOfMemoryError creating fat jar with sbt assembly

Question:

Tag: jar,cassandra,apache-spark,sbt

We are trying to build a fat jar containing one small Scala source file and a ton of dependencies (a simple MapReduce example using Spark and Cassandra):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.SparkConf

object VMProcessProject {

    def main(args: Array[String]): Unit = {
        // Point the connector at the local Cassandra node and put the
        // connector jar on the executors' classpath.
        val conf = new SparkConf()
            .set("spark.cassandra.connection.host", "127.0.0.1")
            .set("spark.executor.extraClassPath", "C:\\Users\\SNCUser\\dataquest\\ScalaProjects\\lib\\spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar")
        println("got config")
        val sc = new SparkContext("spark://US-L15-0027:7077", "test", conf)
        println("Got spark context")

        // Read the Cassandra table as an RDD of rows.
        val rdd = sc.cassandraTable("test_ks", "test_col")

        println("Got RDDs")

        println(rdd.count())

        // Count the rows a second time: map each row to 1 and sum.
        val newRDD = rdd.map(x => 1)
        val count1 = newRDD.reduce((x, y) => x + y)

    }
}

We do not have a build.sbt file; instead we put the jars into a lib folder and the source files into the src/main/scala directory, and run the program with sbt run. Our assembly.sbt file looks as follows:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

When we run sbt assembly we get the following error message:

...
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent...

We're not sure how to change the JVM settings to increase the memory, since we are using sbt assembly to make the jar. Also, if there is something egregiously wrong with how we are writing the code or building our project, pointing it out would help us a lot too; there have been so many headaches trying to set up a basic Spark program!


Answer:

I was including Spark as an unmanaged dependency (putting the jar file in the lib folder), which used a lot of memory because it is a huge jar. Instead, I created a build.sbt file that declares Spark as a managed dependency with the "provided" scope, so sbt assembly no longer packs it into the fat jar. Secondly, I created the environment variable JAVA_OPTS with the value "-Xms256m -Xmx4g", which sets the initial (minimum) heap size to 256 megabytes while allowing the heap to grow to a maximum of 4 gigabytes. Together, these two changes allowed me to create a jar file with 'sbt assembly'.
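
For illustration, a build.sbt along these lines might look like the sketch below. The project name and every version number are assumptions, not taken from the original setup, so pick the ones that match your cluster. It also assumes the Spark and connector jars have been removed from the lib folder; otherwise sbt still treats them as unmanaged dependencies and bundles them.

```scala
// build.sbt: a minimal sketch; name and all version numbers are assumptions
name := "vm-process-project"

version := "0.1"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // "provided": available at compile time, but left out of the assembled jar
  // because the Spark cluster supplies these classes at run time.
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  // Not marked provided, so the connector is bundled into the fat jar.
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0-M1"
)
```

Keeping Spark itself out of the assembly is what shrinks the jar; the connector stays in because the cluster does not ship it.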

More info on provided dependencies:

https://github.com/sbt/sbt-assembly

