How to change the flush queue size of Cassandra

Question:

Tag: cassandra,datastax-enterprise,datastax-java-driver

How can I assign more memory to the flush queue between the memtable and the SSTables in Cassandra? I am getting timeout errors, yet heap and young-generation usage seem to be within limits. Nothing else runs on the machine except Cassandra. Also, how can I find out whether requests are being dropped at the Ethernet card rather than by Cassandra? I am using DataStax Enterprise 4.7 and Java 1.8.


Answer:

How to change the flush queue size of cassandra

You can increase the number of flush writers by editing memtable_flush_writers in your cassandra.yaml file. See the related docs.
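
The setting sits at the top level of cassandra.yaml, and in stock 2.1-era files it may be present only as a commented-out line. As a minimal sketch (the helper name set_flush_writers is hypothetical, and it assumes the key is on its own `memtable_flush_writers: N` line, optionally prefixed with `#`), the edit can be scripted in Python without a YAML library, so the rest of the file, comments included, is left untouched:

```python
import re

# One "memtable_flush_writers: N" line, optionally commented out with "#".
# [ \t]* (not \s*) keeps the match on a single line.
_FLUSH_WRITERS = re.compile(
    r"^#?[ \t]*memtable_flush_writers:[ \t]*\d+[ \t]*$", re.MULTILINE
)


def set_flush_writers(yaml_text: str, writers: int) -> str:
    """Return yaml_text with memtable_flush_writers set to `writers`.

    Hypothetical helper: edits the text instead of parsing YAML, so comments
    and layout elsewhere in cassandra.yaml are preserved.
    """
    if writers < 1:
        raise ValueError("memtable_flush_writers must be at least 1")
    new_line = f"memtable_flush_writers: {writers}"
    if _FLUSH_WRITERS.search(yaml_text):
        # Replace (and uncomment, if needed) the first matching line only.
        return _FLUSH_WRITERS.sub(new_line, yaml_text, count=1)
    # Key absent: append it, starting on a fresh line.
    separator = "" if not yaml_text or yaml_text.endswith("\n") else "\n"
    return yaml_text + separator + new_line + "\n"
```

A possible usage, with `path` being a pathlib.Path to the node's cassandra.yaml: `path.write_text(set_flush_writers(path.read_text(), 4))`. The file is read at startup, so restart each node after editing it for the new value to take effect.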

How to assign more memory for the flush queue between memtable and sstable in Cassandra.

You don't assign memory to a queue; you can only increase the number of threads that service it. What you can increase is the memory for the JVM in its entirety. The recommended heap size for Cassandra is usually 8 GB, though some people go higher with specific JVM tuning.
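
In Cassandra 2.x the JVM heap is set in cassandra-env.sh through MAX_HEAP_SIZE and HEAP_NEWSIZE, which must be set together or both left unset. When both are unset, the script derives them from the machine's RAM and core count. The Python sketch below reproduces that heuristic as it appears in 2.x-era cassandra-env.sh files (verify it against the copy shipped with your DSE install; the function name is hypothetical). It shows why the default tops out at the 8 GB mentioned above:

```python
from typing import Tuple


def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB, mirroring the 2.x-era
    cassandra-env.sh heuristic used when neither variable is set."""
    # MAX_HEAP_SIZE = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    half_ram_capped = min(system_memory_mb // 2, 1024)
    quarter_ram_capped = min(system_memory_mb // 4, 8192)
    max_heap_mb = max(half_ram_capped, quarter_ram_capped)

    # HEAP_NEWSIZE = min(100 MB per CPU core, 1/4 of MAX_HEAP_SIZE)
    max_sensible_young_mb = 100 * cpu_cores
    desired_young_mb = max_heap_mb // 4
    heap_newsize_mb = min(max_sensible_young_mb, desired_young_mb)
    return max_heap_mb, heap_newsize_mb


if __name__ == "__main__":
    # A 32 GB, 8-core machine hits the 8 GB cap: prints (8192, 800)
    print(default_heap_sizes(32 * 1024, 8))
```

Going above these values means setting both variables explicitly, which is where the extra JVM tuning mentioned above comes in.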

Also how to find if any requests are dropped at the ethernet card and not by Cassandra.

Depending on your consistency level, your reads or writes may fail if not enough replicas acknowledge the request.
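
How many acknowledgements count as enough is fixed by the consistency level and the replication factor (RF). A small sketch of that arithmetic for a single data center (the helper names are hypothetical; multi-DC levels such as LOCAL_QUORUM and EACH_QUORUM, as well as ANY, are left out): QUORUM needs floor(RF/2) + 1 replicas, so with RF = 3 one slow or unreachable replica is tolerated but two are not. If fewer replicas answer within the configured timeout, the client sees a timeout error.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements a coordinator needs (single data center)."""
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {consistency_level}")
    if needed > replication_factor:
        # The level can never be satisfied with this many replicas.
        raise ValueError("consistency level needs more replicas than RF provides")
    return needed


def request_succeeds(consistency_level: str, replication_factor: int, acks: int) -> bool:
    """True if `acks` replicas answering in time satisfy the level."""
    return acks >= required_acks(consistency_level, replication_factor)
```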

If you're interested in dropped network packets specifically, take a look at the output of:

# ifconfig
eth0      Link encap:Ethernet  HWaddr XX:34:XX:XX:10:03
          inet addr:xxx.xxx.xxx.xxx  Bcast:xxx.xxx.xxx.xxx  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:89549235 errors:0 dropped:0 overruns:0 frame:0
          TX packets:100307025 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:757464549909 (757.4 GB)  TX bytes:27216611814 (27.2 GB)

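ifconfig keeps counts, at the NIC level, of packets that have errored out, been dropped, overrun, and so on. In the sample above the errors, dropped and overruns counters are all 0 for both RX and TX, so that interface has not lost any packets.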
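
To check these counters programmatically, a minimal Python sketch can pull the RX/TX fields out of a single interface's output (for example `ifconfig eth0`). It assumes the older net-tools layout shown above (`RX packets:N errors:N ...`); newer ifconfig versions print fields such as `RX errors 0  dropped 0` without colons and would need a different pattern. The function names are hypothetical. The counters are cumulative, so compare two snapshots taken some time apart: dropped or overruns values that keep growing point to loss at the NIC, before the traffic ever reaches Cassandra.

```python
import re
from typing import Dict

# Classic net-tools counter lines, e.g.
# "RX packets:89549235 errors:0 dropped:0 overruns:0 frame:0"
_COUNTER_LINE = re.compile(r"^[ \t]*(RX|TX) packets:(\d+)(.*)$", re.MULTILINE)
_FIELD = re.compile(r"(\w+):(\d+)")
_LOSS_FIELDS = ("errors", "dropped", "overruns")


def nic_counters(ifconfig_output: str) -> Dict[str, Dict[str, int]]:
    """Map "RX"/"TX" to their counters for ONE interface's ifconfig output."""
    counters: Dict[str, Dict[str, int]] = {}
    for direction, packets, rest in _COUNTER_LINE.findall(ifconfig_output):
        fields = {"packets": int(packets)}
        for name, value in _FIELD.findall(rest):
            fields[name] = int(value)
        counters[direction] = fields
    return counters


def nic_loss_detected(counters: Dict[str, Dict[str, int]]) -> bool:
    """True if any errors/dropped/overruns counter is non-zero."""
    return any(
        fields.get(name, 0) > 0
        for fields in counters.values()
        for name in _LOSS_FIELDS
    )
```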

