FAQ Database Discussion Community


How to export data from Cassandra to mongodb?

java,mongodb,cassandra,export,storm
I am using Apache Kafka, Storm and Cassandra for real-time processing. The problem I am facing is that I can't run aggregation queries on Cassandra directly (DataStax can be used, but it is a paid service). I also considered using MongoDB, but it does not handle frequent, heavy writes well. So, I am...

Out of memory error in Cassandra when querying big rows containing a collection (set)

java,cassandra,out-of-memory,datastax,cql3
I am using Cassandra 2.0.8 and I have got a cql3 table defined like this: CREATE TABLE search_scf_tdr ( fieldname text, fieldvalue text, scalability int, timestamptdr bigint, tdrkeys set<blob>, PRIMARY KEY ((fieldname, fieldvalue, scalability), timestamptdr) ) I use a replication factor of 2 per DC for this keyspace. I am...

Cassandra SSL with own Certificate Authority

java,ssl,cassandra
I want to set up my own CA for use with a Cassandra cluster so that I do not have to copy all of the certificates around every time I add a new node. I have read a few tutorials on Cassandra and SSL, but they all work by copying certificates...

storm - handling exceptions in bolt.execute

cassandra,storm
I'm using Storm to process a stream in which one of the bolts writes to Cassandra. The Cassandra session.execute() call can throw an exception, and I'm wondering about catching it to 'fail' the tuple so that it gets retried. The docs for IRichBolt don't show it throwing anything, so I'm wondering...

How to update a field which is indexed?

scala,cassandra,phantom-dsl
I want to update an indexed field in Cassandra using the phantom Scala SDK, like this: this.update.where(_.id eqs folderId) .and(_.owner eqs owner) .modify(_.parent setTo parentId) The parent field is an indexed field in the table, but the operation is not allowed: compiling the code fails with an exception like:...

Commenting Cassandra's keyspace, table, column

cassandra,documentation
In Oracle it is possible to add a comment about a table, view, materialized view, or column into the data dictionary, e.g. COMMENT ON COLUMN employees.job_id IS 'abbreviated job title'; I found this particularly useful as a tester when trying to understand the ideas behind names which are not necessarily...

Modeling account for rest communication cassandra

cassandra,data-modeling
I need to model an account (first name, last name, email as username etc.) in Cassandra along with the currently active token. My initial idea was to create account_by_email, which would have a skinny row partitioned by email with static columns and clustering by access_token (and maybe a TTL), and then you can...

Select first N rows of Cassandra table

cassandra,cql
As stated in this doc, to select a range of rows I have to write this: select first 100 col1..colN from table; but when I launch this in the CQL shell I get this error: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:13 no viable alternative at input '100' (select...
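The `select first 100 ...` form comes from the older CQL 2 syntax, where, as far as I recall, FIRST capped the number of columns returned per row; CQL 3 limits the number of rows with a trailing LIMIT clause instead. A minimal Python sketch that builds such a query string, assuming the table and column names are trusted, unquoted identifiers (the helper `first_n_query` is hypothetical):

```python
import re
from typing import List

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def first_n_query(table: str, columns: List[str], n: int) -> str:
    """Build a CQL 3 query returning at most n rows (hypothetical helper)."""
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if n <= 0:
        raise ValueError("n must be positive")
    # CQL 3 puts the row limit at the end of the statement, not after SELECT.
    return f"SELECT {', '.join(columns)} FROM {table} LIMIT {n}"


print(first_n_query("mytable", ["col1", "col2"], 100))
# SELECT col1, col2 FROM mytable LIMIT 100
```

Without a WHERE clause the rows come back in partition-token order, so "first 100" means the first 100 by token, not by insertion time.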

Executing a cassandra insert query through Python multiprocessing queue

python,cassandra,queue,multiprocessing
I have a Cassandra keyspace sujata. I am connecting to Cassandra using the Python driver (cassandra.cluster). The column family in sujata is hello. Following is my code: from multiprocessing import Process,Queue from cassandra.cluster import Cluster import os queue=Queue() cluster = Cluster(['127.0.0.1']) metadata = cluster.metadata session = cluster.connect("sujata") def hi(): global session global queue...

How do I model multiple “many to many” relationships in Cassandra?

cassandra,nosql,schema
I've been reading up on Cassandra, I've done some tutorials and played around with CQL but now that it is time for me to design a schema I'm having some difficulty. I'm trying to create a schema that will handle the following use case. I need to keep track of...

Cassandra - querying on clustering keys

cassandra,primary-key,cql,clustering-key
I am just getting started with Cassandra, and I was trying to create tables with different partition and clustering keys to see how they can be queried differently. I created a table with a primary key of the form (a),b,c where a is the partition key and b,c are clustering...

how to run asynchronous queries with Spring

spring,cassandra,spring-batch
I need to run asynchronous queries using the Spring framework. I use Cassandra and the Java driver from DataStax. How can I call the executeAsync method and get the results?

Partially indexing Cassandra table with SOLR

solr,cassandra,datastax-enterprise
One of the tables inside our Cassandra (DSE 4.7) Cluster contains south of 15 billion records. With the number of servers we have - it would be impossible to index them all with Solr. So, is it possible to somehow index the data partially/sample and/or start indexing and then "pause"...

Store countByKey result into Cassandra

cassandra,apache-spark
I want to count the number of IndicatePresence messages for each user for any given day (out of a Cassandra table), and then store this in a separate Cassandra table to drive some dashboard pages. I managed to get the 'countByKey' working, but now cannot figure out how to use...

cqlengine model default values and uuid

python,cassandra,cqlengine
Working with cqlengine models, I have found unexpected behaviour with default values and uuid. I am using Python 3.4 and cqlengine from cassandra-driver 2.5.0, with the following code: from cassandra.cqlengine import columns, connection, management from cassandra.cqlengine.models import Model import uuid class Person(Model): id = columns.UUID(primary_key=True, default=uuid.uuid4()) first_name = columns.Text() last_name =...
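In the quoted model, `default=uuid.uuid4()` calls `uuid4` once, when the class body is executed, so every Person saved without an explicit id receives the same UUID. Passing the function itself (`default=uuid.uuid4`) lets cqlengine call it for each new instance, since cqlengine column defaults may be callables. The stdlib-only sketch below reproduces the pitfall with a hypothetical `Column` stand-in rather than cqlengine itself:

```python
import uuid
from typing import Any


class Column:
    """Hypothetical stand-in for a model column with a default value."""

    def __init__(self, default: Any = None):
        self.default = default

    def get_default(self) -> Any:
        # A callable default is invoked on every call; a plain value is reused.
        return self.default() if callable(self.default) else self.default


shared = Column(default=uuid.uuid4())  # evaluated once, at definition time
fresh = Column(default=uuid.uuid4)     # the function itself, called per row

print(shared.get_default() == shared.get_default())  # True: same UUID every time
print(fresh.get_default() == fresh.get_default())    # False: a new UUID each call
```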

Apache Cassandra - cqlsh operation timeout

cassandra,cqlsh
I am trying to start cqlsh and this is what I get: /bin$ ./cqlsh Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=None, last_host=None',)}) I tried removing ~/.cassandra, did not work. I also compared cassandra.yaml with a version that worked. Any ideas?...

Order By any field in Cassandra

sorting,cassandra
I am researching Cassandra as a possible solution for my upcoming project. The more I research, the more I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created. Is it possible to sort on any...

Spark Cassandra SQL can't perform DataFrame methods on query results

scala,cassandra,apache-spark-sql,spark-cassandra-connector
So I have a Spark-Cassandra cluster that I am trying to execute sql queries on. I build a jar with sbt assembly then I submit it with spark-submit. This works fine when I am not using spark-sql. When I am using spark sql I get an error, below is the...

Cassandra node almost out of space, but nodetool cleanup is increasing disk use?

cassandra
One of our nodes was at 95% disk use and we added another node to the cluster to hopefully rebalance but the disk space didn't drop on the node. I tried doing nodetool cleanup assuming that excess keys were on the node, but the disk space is increasing! Will cleanup...

Cassandra: Insert with older timestamp

cassandra,cql3
(Cassandra 2.0.9, using CQL) I've accidentally updated a row in a table which was managing its own timestamp (100 * a specific sequence number). Now, because my timestamp is the current time, none of the updates are working. I understand why this is, but I'm trying to recover from it....

Cassandra: Selecting a Range of TimeUUIDs using the DataStax Java Driver

java,cassandra,datastax,datastax-java-driver
The use case that we are working to solve with Cassandra is this: We need to retrieve a list of entity UUIDs that have been updated within a certain time range within the last 90 days. Imagine that we're building a document tracking system, so our relevant entity is a...

How to generate UUID(Long) using cassandra timestamp in cluster environment?

cassandra,uuid
I have a requirement to generate a UUID as a Long value in Java, based on the timestamp of a Cassandra cluster. Can anyone help with how to generate it using a combination of Java and the Cassandra cluster timestamp?

Coordinator node timed out waiting for replica nodes in Cassandra Datastax while insert data

cassandra,cql,datastax-enterprise
When I try to Insert data in Cassandra using the below query I am getting the below mentioned error cqlsh:assign> insert into tblFiles1(rec_no,clientid,contenttype,datafiles,filename) values(1,2,'gd','dgfsdg','aww'); WriteTimeout: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} My...

No viable alternative at input 'mytable1' Cassandra CQL Python session.execute error

python,session,cassandra,cql,execute
I'm trying to execute a simple cql query in python and I keep getting an error. table1 = "mytable1" table2 = "mytable2" query1 = "SELECT * FROM %s" table1Rows = session.execute(query1, (table1,)) table2Rows = session.execute(query1, (table2,)) The table variables are actually passed in as arguments but I just made my...
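The error arises because bound parameters are for values only: the Python driver renders a string argument as a quoted literal, producing `SELECT * FROM 'mytable1'`, which CQL cannot parse as a table name. Identifiers have to be placed into the query text before execution. A minimal sketch, assuming the table names come from a fixed allow-list (the set and the helper below are hypothetical):

```python
from typing import Set

ALLOWED_TABLES: Set[str] = {"mytable1", "mytable2"}  # hypothetical allow-list


def select_all_query(table: str) -> str:
    """Interpolate a vetted table name; values should still be bound separately."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table}"

# With a live session one would then run, e.g.:
# table1_rows = session.execute(select_all_query("mytable1"))
```

Checking against an allow-list keeps string formatting from becoming an injection path when the names arrive as function arguments.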

to alter or create a new table in cassandra to add new columns

database-design,cassandra,datastax,datastax-enterprise
I am using DSE Cassandra. I want to add new attributes to an existing table, and I want to know the best practice for achieving this. Should I add new columns to the existing table or create a new table? What are the pros and cons of each approach?...

java.sql.SQLNonTransientConnectionException:Keyspace names must be composed of alphanumerics and underscores (parsed: '')

java,database,eclipse,hadoop,cassandra
I'm trying connect to cassandra db and verify users to login and sign up I'm getting this error: Keyspace names must be composed of alphanumerics and underscores (parsed: '') at org.apache.cassandra.cql.jdbc.Utils.parseURL(Utils.java:195) at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:85) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:215) at com.rest.inndata.services.ConnectCassandra.createConnection(ConnectCassandra.java:56)...

Does Cassandra work with the IBM JVM?

cassandra,j9
Can I install and start Cassandra on an x-linux OS with the IBM SDK for Java? Will that work, and is a specific version (2.1 or 2.0) required? Thanks in advance.

Cassandra cleanup on several servers at once

cassandra,cassandra-2.0,nodetool
We have a big Cassandra cluster of 18 servers (nearly 5 TB of data per server). We added new nodes following this documentation: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html After adding the new servers, we began cleaning up data (nodetool cleanup). The documentation advises: After all new nodes...

Cassandra add TTL to existing entries

cassandra,cassandra-2.0,cqlsh,ttl
How can I update an entire table and set a TTL for every entry? Current scenario (Cassandra 2.0.11): table: CREATE TABLE external_users ( external_id text, type int, user_id text, PRIMARY KEY (external_id, type) ) There are currently ~40 million entries in this table and I want to add a TTL for...

Count Number of Users in Cassandra column family?

java,cassandra,datastax-java-driver,cqlsh
I have a table like this in Cassandra: CREATE TABLE DATA_HOLDER (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME)); I want to count the distinct USER_ID values in this table. Is there any way I can do that? My Cassandra version is: [cqlsh 4.1.1 | Cassandra 2.0.10.71 | DSE...

Too slow to connect to Cassandra cluster with datastax java driver 2.1.4

cassandra,datastax-java-driver
I recently set up a Cassandra cluster with only two nodes, on two servers with identical hardware configurations in the same internal network. It works well with cqlsh; everything seems perfect. Then I followed the code example on DataStax's website to write Java code to work with the cluster...

Load connecting tables from Cassandra in QlikView with DataStax ODBC

cassandra,datastax,qlikview
I am new to both Cassandra (2.0) and QlikView (11). I have two keyspaces (tables) with large amount of data in Cassandra and I want to load them to QlikView. Since I can not load the entire set, filtering is necessary. // In QlikView's edit script ODBC CONNECT TO [DataStax...

Cassandra/Parquet union RDD

cassandra,apache-spark
I am just getting started with spark-cassandra connector and am running into the following issue: I have a dataset that is partially in cassandra, partially in HDFS(same exact schema). I would like to create a single UnionRDD of the two sets and proceed from there. The code I have so...

Cassandra data model to store embedded documents

mongodb,database-design,cassandra
In MongoDB we can store embedded documents in a collection. How do we store embedded documents in Cassandra, for example this sample JSON representation? UserProfile = { name: "user profile", Dave Jones: { email: {name: "email", value: "[email protected]", timestamp: 125555555}, userName: {name: "userName", value: "Dave", timestamp: 125555555} }, Paul...

Can Cassandra LWTs have an IF part in terms of a PRIMARY KEY

java,transactions,cassandra,datastax-java-driver
Do Cassandra lightweight transactions use an implied SET, and so can not have an IF part in terms of a primary key? I ask because the Datastax Cassandra Java driver (version 2.1.5) throws an InvalidQueryException while I'm preparing a statement. The exception message is clear enough: PRIMARY KEY part name...

Where are the API docs for org.apache.spark.sql.cassandra for Spark 1.3.x?

cassandra,apache-spark,apache-spark-sql
I'm writing a Spark job that uses Spark-Cassandra connector to connect to Cassandra from spark and then runs queries on Spark/Cassandra using Spark SQL. I was wondering where I could find the API docs for this? Looking at the api here https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.package It would seem like the package doesn't even...

Cassandra with uneven hardware, how to configure?

cassandra,cassandra-2.0
We are building a Cassandra (2.1.5) cluster for storing a large amount of time-series data, and we are planning to use existing hardware. The problem is that the available hardware varies widely: 2 machines with: 4 core, 8 GB, SSD 2 machines with: 8 core, 16 GB, SSD 2 machines with: 32...

Cassandra WordCount Hadoop

hadoop,cassandra
Can anyone explain to me the following lines from Cassandra 2.1.15 WordCount example? CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "3"); CqlConfigHelper.setInputCql(job.getConfiguration(), "select * from " + COLUMN_FAMILY + " where token(id) > ? and token(id) <= ? allow filtering"); How do I define concrete values which will be used to replace "?" in the query?...

ImportError: No module named cassandra

django,cassandra,install
I'm trying to get Cassandra to work with Django using Windows 7 for the first time. I've installed Django, Cassandra and django_cassandra_engine. However when I open a new Django project I get the following errors. C:\Python27\python.exe -u C:\Program Files (x86)\JetBrains\PyCharm 4.5.1\helpers\pydev\pydevconsole.py 54467 54468 PyDev console: starting. import sys; print('Python %s...

How to get good performance on reading cassandra partitions in spark?

scala,cassandra,apache-spark,datastax-enterprise
I am reading data from a Cassandra partition into Spark using the Cassandra connector. I tried the solutions below for reading partitions, parallelizing the task by creating as many RDDs as possible, but solution ONE and solution TWO had the same performance. In solution ONE, I could see the stages in...

Unbalanced Cassandra replicas storage

java,cassandra,datastax
In our setup we have 2 DCs, 21 Cassandra nodes in each DC, and a total of 4 replicas per record (in one of the keyspaces) - two replicas per site. Every Cassandra node is setup with 16 VNodes. We did not manually set the initial_token for each node in...

How to un-nest a spark rdd that has the following type ((String, scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Int]]))

scala,cassandra,apache-spark
It's a nested map whose contents look like this when I print it to the screen: (5, Map ( "ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3) (1, Map ( "DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3) I need to get something like this Case...
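The target shape is cut off in the excerpt, so the sketch below simply assumes one flat record per innermost entry, `(key, outer_key, inner_key, count)`. It is written in Python with plain dicts standing in for the Scala `Map`s; in Spark the same per-record function would typically be passed to `flatMap` on the RDD.

```python
from typing import Dict, Iterator, Tuple

Nested = Tuple[str, Dict[str, Dict[str, int]]]
Flat = Tuple[str, str, str, int]


def unnest(record: Nested) -> Iterator[Flat]:
    """Yield one flat tuple per innermost key/count pair (assumed target shape)."""
    key, outer = record
    for outer_key, inner in outer.items():
        for inner_key, count in inner.items():
            yield (key, outer_key, inner_key, count)


rows = [
    ("5", {"ABCD": {"3200": 3, "3350.800": 4, "200.300": 3}}),
    ("1", {"DEF": {"1200": 32, "1320.800": 4, "2100": 3}}),
]
flat = [t for r in rows for t in unnest(r)]
```

Keeping the function per-record (rather than collecting the whole map first) mirrors how `flatMap` processes each element independently on the executors.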

Error running spark app using spark-cassandra connector

cassandra,apache-spark,spark-cassandra-connector
I have written a basic spark app that reads and writes to Cassandra following this guide (https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md) This is what the .sbt for this app looks like: name := "test Project" version := "1.0" scalaVersion := "2.10.5" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.2.1", "com.google.guava" % "guava" % "14.0.1",...

Cassandra Error: “Unable to complete request: one or more nodes were unavailable.”

cassandra,theory,distributed-database
I am a complete newbie at Cassandra and am just setting it up and playing around with it and testing different scenarios using cqlsh. I currently have 4 nodes in 2 datacenters something like this (with proper IPs of course): a.b.c.d=DC1:RACK1 a.b.c.d=DC1:RACK1 a.b.c.d=DC2:RACK1 a.b.c.d=DC2:RACK1 default=DCX:RACKX Everything seems to make sense...

Using partition key along with secondary index

cassandra,nosql,bigdata,cassandra-2.0
Following are the two queries that I need to perform. select * from where dept = 100 and emp_id = 1; select * from where dept = 100 and name = 'One'; Which of the below options is better ? Option 1: Use secondary index along with a partition key....

Preparing Cassandra SELECT Statements in Python

python,cassandra
I'm trying to run prepared select queries against a Cassandra table. The table is defined as such: class EmailAddressLookup(Model, ModelOperations, JSONSerializer): __table_name__ = 'email_address_lookup' email_address = columns.Text(primary_key=True) user_id = columns.Integer(primary_key=True) My INSERT works great. It looks like this: i_email_lookup = session.prepare("""INSERT INTO email_address_lookup (user_id, email_address) VALUES (?, ?)""") session.execute(i_email_lookup, (user_id,...

Location of datastax opscenter dashboards

cassandra,datastax,opscenter
Just a simple question: where does DataStax OpsCenter store its dashboards? Are they stored in the OpsCenter keyspace or as files on the filesystem? I could not find anything...

update cassandra field using string concatenation

cassandra,datastax,cqlsh
I am trying to update an existing string column in a Cassandra table. For example, I want to prepend the domain ID to the username. Following is my table: id, username 1, agaikwad 2, xyz I want to write CQL to update the above table to reflect the following: id, username 1, homeoffice\\agaikwad...
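As far as I know, CQL in these Cassandra versions has no string concatenation operator for UPDATE (`SET col = col + ...` applies only to counters and collections), so the usual approach is read-modify-write from the client: select the rows, compute the new value, and write it back. A sketch of the compute step, assuming rows arrive as `(id, username)` pairs and are written back through a prepared statement such as `UPDATE users SET username = ? WHERE id = ?` (the table name `users` and the helper are hypothetical):

```python
from typing import Iterable, List, Tuple


def prefixed_updates(rows: Iterable[Tuple[int, str]], prefix: str) -> List[Tuple[str, int]]:
    """Return (new_username, id) parameter tuples, skipping rows already prefixed."""
    params = []
    for row_id, username in rows:
        if not username.startswith(prefix):
            params.append((prefix + username, row_id))
    return params


updates = prefixed_updates([(1, "agaikwad"), (2, "xyz")], "homeoffice\\")
# Each tuple would then be bound to the prepared UPDATE, e.g.
# for p in updates: session.execute(update_stmt, p)
```

Skipping rows that already carry the prefix makes a re-run after a partial failure safe. The read and the write are not atomic, so a concurrent change to a username between them would be overwritten.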

Cassandra-Hadoop integration

hadoop,cassandra
I'm interested in integrating Cassandra and Hadoop, more precisely, using Cassandra as input for Hadoop jobs. Each Cassandra node is also a Hadoop node. I found these tutorials 1 and 2 that somewhat explain the integration. I'm new to Cassandra so I'm still figuring out some things. My question is...

Handling All Nodes Down in Cassandra when using DataStax C# Driver

c#,azure,cassandra,datastax-enterprise
I have a simple two node cluster setup in Azure and I'm using the DataStax C# driver to connect to it. I am able to take a single node down without any issues. As long as I have one node running all is well. However if I take both nodes...

How to model data in Cassandra for last 100 events for a customer

cassandra,data-modeling,cql
We have multiple customers, each running multiple sensors. Each sensor logs data frequently (every 20s). How do I create a data model in Cassandra to answer this query? We have thought of a data model like this for other queries: Create Table Data{ CustomerId, SensorId, Date, DataTime SensorData1,...

Repartitioning of data in Cassandra when adding new servers

cassandra,cassandra-2.0
Let's imagine I have a Cassandra cluster with 3 nodes, each having 100GB of available hard disk space. Replication Factor for this cluster is set to 3 and R/W CLs are set to 2, meaning I can tolerate one of my nodes going down without sacrificing consistency or availability. Now...
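The arithmetic behind "tolerate one of my nodes going down" is the quorum formula: with replication factor RF, a QUORUM needs floor(RF/2) + 1 replicas, and reads see the latest write whenever R + W > RF. A small Python check of the numbers in the question (RF = 3, R = W = 2):

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM at replication factor rf."""
    return rf // 2 + 1


def tolerated_failures(rf: int, cl: int) -> int:
    """Replicas of one token range that can be down while cl is still met."""
    return rf - cl


def overlapping(rf: int, read_cl: int, write_cl: int) -> bool:
    """True when every read replica set intersects every write replica set."""
    return read_cl + write_cl > rf


rf = 3
print(quorum(rf))                 # 2
print(tolerated_failures(rf, 2))  # 1
print(overlapping(rf, 2, 2))      # True
```

These counts apply to each replica set (the RF nodes owning a given token range), not to the cluster as a whole, so adding servers does not by itself change how many failures a single range can tolerate.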

OpsCenter is having trouble connecting with the agents

azure,cassandra,ubuntu-14.04,datastax-enterprise,opscenter
I am trying to setup a two node cluster in Cassandra. I am able to get my nodes to connect fine as far as I can tell. When I run nodetool status it shows both my nodes in the same data center and same rack. I can also run cqlsh...

spring-cassandra UDT data insert

java,annotations,cassandra,datastax-java-driver,spring-data-cassandra
Can someone help me find out how to insert Cassandra UDT data using a Spring POJO class? I created one POJO class to map the Cassandra table and another class for the Cassandra UDT, but when I insert the main POJO class that maps the table, it does not recognize the other...

Error when running job that queries against Cassandra via Spark SQL through Spark Jobserver

cassandra,apache-spark,apache-spark-sql,spark-jobserver,spark-cassandra-connector
So I'm trying to run a job that simply runs a query against Cassandra using Spark SQL; the job is submitted fine and starts fine. This code works when it is not run through Spark Jobserver (when simply using spark-submit). Could someone tell me what is wrong with...

How to change the flush queue size of cassandra

cassandra,datastax-enterprise,datastax-java-driver
How can I assign more memory to the flush queue between the memtable and the SSTable in Cassandra? I am getting timeout errors, and the heap and young-region usage seem to be within limits. No other processing happens on the machine except Cassandra. Also, how can I find out if any requests are...

Columns and rows in Cassandra DataStax OpsCenter

windows,cassandra,nosql,opscenter
I have just started learning Cassandra. I am using DataStax OpsCenter. I am able to create a column family, but I could not find how to add columns and rows to the column family through OpsCenter, though I am able to do so in cqlsh. Please advise. Thanks...

What is the difference between broadcast_address and broadcast_rpc_address in cassandra.yaml?

cassandra,bigdata
GOAL: I am trying to understand the best way to configure my Cassandra cluster so that several different drivers across several different networking scenarios can communicate with it properly. PROBLEM/QUESTION: It is not entirely clear to me, after reading the documentation what the difference is between these two settings: broadcast_address...

cassandra search a row by secondary index returns null

cassandra,secondary-indexes
I have created a table and an index as follows: CREATE TABLE refresh_token ( user_id bigint, refresh_token text, access_token text, device_desc text, device_type text, expire_time timestamp, org_id bigint, PRIMARY KEY (user_id, refresh_token) ) WITH CLUSTERING ORDER BY (refresh_token ASC) CREATE INDEX i_access_token ON demodb.refresh_token (access_token); After I insert or delete data...

Dataframe is not saved into Cassandra

java,cassandra,apache-spark,apache-spark-sql,spark-cassandra-connector
I have an application using Spark (version 1.4.0) and the Spark-Cassandra connector (version 1.3.0-M1), in which I am trying to store a DataFrame into a Cassandra table with two columns (total, message). I have already created the table in Cassandra with these two columns. Here is my code: scoredTweet.foreachRDD(new Function2<JavaRDD<Message>,Time,Void>(){ @Override public Void...

Slicing over partition rows using tuple operation in CQL

cassandra,cql,datastax
I am trying to understand the behavior of tuple operator with clustering keys. Here is what I was trying to do: create table sampletable (a int,b int,c int, d int, e int, primary key((a,b,c),d,e)); insert into sampletable(a,b,c,d,e) values(1,1,1,1,1); insert into sampletable(a,b,c,d,e) values(1,1,1,1,1); insert into sampletable(a,b,c,d,e) values(1,1,1,1,2); insert into sampletable(a,b,c,d,e) values(1,1,2,1,1);...

cassandra IndexSummaryManagerTest unit test case timeout error

unit-testing,build,cassandra
I have been getting an error while compiling the Cassandra unit test cases: one of the tests sometimes times out. Although https://issues.apache.org/jira/browse/CASSANDRA-8981 states that this issue has been resolved in version 2.1.5, I am still getting it. I am building Cassandra 2.1.5 from source using jdk1.8. Below are the details: [junit] Testsuite:...

Titan start fails: management.properties not found

cassandra,titan
I downloaded and unziped the titan.zip and used the command ./titan.sh -v start. Now I get the output: ./titan.sh -v start Forking Cassandra... OpenJDK 64-Bit Server VM warning: The UseParNewGC flag is deprecated and will likely be removed in a future release Running nodetool statusthrift.Error: Config file not found: /usr/lib64/jvm/java-1.9.0-openjdk-1.9.0/jre/conf/management/management.properties...

Does Cassandra write to a node(which is up) even if Consistency cannot be met?

cassandra,cassandra-2.0
The below statement from Cassandra documentation is the reason for my doubt. For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails...

Exporting Data from Cassandra to CSV file

apache,csv,cassandra,export,export-to-csv
Table Name : Product uid | productcount | term | timestamp 304ad5ac-4b6d-4025-b4ea-8b7991a3fe72 | 26 | dress | 1433110980000 6097e226-35b5-4f71-b158-a1fe39a430c1 | 0 | #751104 | 1433861040000 Command : COPY product (uid, productcount, term, timestamp) TO 'temp.csv'; Error: Improper COPY command. Am I missing something? ...

Cassandra CQL driver implementation

java,cassandra,nosql,cql,drivers
I have used the Cassandra CQL driver to implement a module. I know the CQL driver works on port 9042. My module works fine on port 9042 against Cassandra servers (I tried both local and remote). However, due to some constraints in the data center, port 9042 is not open for Cassandra. I need...

How do I run a repair only within a certain datacenter?

cassandra,cassandra-2.0,datastax,datastax-enterprise
I want to run a repair for a specific Cassandra datacenter within a larger cluster. How can I do that? nodetool repair -local -pr does not seem to work: $ nodetool repair -local -pr Exception in thread "main" java.lang.RuntimeException: Primary range repair should be performed on all nodes in the cluster....

NoSQL DB for searching in vector space

vector,redis,cassandra,nosql,distance
I am completely new to NoSQL DBS such as Cassandra, Mongo, Redis, etc. and I want to create this type of a structure : { "item_id": "ABC1", "x1": 0.55, "x2": -0.29, ... "x100": 0.17 } Basically, I have millions of items and 100 floats associated with each of them. My...

DSE Cassandra Solr doesn't return _uniqueKey in response

solr,cassandra,datastax,datastax-enterprise
I'm using DataStax 4.6. My Solr client queries data using _uniqueKey. Since version 4.6, the restriction to simple primary keys has been removed. How can I configure Solr, or create the table in Cassandra, so that the Solr response includes the synthetic key _uniqueKey? There is no problem...

Optimal JVM settings for Cassandra

cassandra,jvm,database-tuning,cassandra-2.1
I have a 4 node cluster with 16 core CPU and 100 GB RAM on each box (2 nodes on each rack). As of now, all are running with default JVM settings of Cassandra (v2.1.4). With this setting, each node uses 13GB RAM and 30% CPU. It is a write...

Cassandra queries performance, ranges

cassandra,cassandra-2.0,cqlsh
I'm quite new to Cassandra, and I was wondering whether there would be any performance impact if a query is written with "date = '2015-01-01'" versus "date >= '2015-01-01' AND date <= '2015-01-01'". The only reason I want to use ranges like that is because I need to...

Search for more than one element in a list in Cassandra

search,cassandra,cql3
I'm learning how the data model works in Cassandra, what things you can do and what not, etc. I've seen you can have collections and I'm wondering if you can search for the elements inside the collection. I've seen that you can look for one element with contains, but if...

Cassandra read latencies

cassandra,metrics,latency
What is the difference between these read latency metrics? org.apache.cassandra.metrics.ClientRequest.Read.Latency org.apache.cassandra.metrics.ColumnFamily.system.batchlog.ReadLatency org.apache.cassandra.metrics.ColumnFamily.system.batchlog.CoordinatorReadLatency org.apache.cassandra.metrics.ColumnFamily.system.batchlog.CoordinatorScanLatency ...

New Datastax driver for Tableau is not working

cassandra,odbc,tableau,datastax
I'm trying to run Tableau on top of DSE 4.7, and it fails: I can't do anything in a worksheet or preview the data. I get this error: "Missing EOF at 'tablename_i_try_to_query' " What is the right way to fix it?...

Python Django: Minimal Django + Cassandra local application

python,django,python-3.x,cassandra
I am trying to put together a minimum Django application that uses Cassandra as the database. Here is what I tried: Started a brand new Django project in PyCharm. Checked that python manage.py runserver works as expected. Installed Cassandra using the instructions here. I had to change the rpc_port and...

File Processing with Spark and Cassandra

cassandra,apache-spark
Right now I'm working on loading a table from a Cassandra cluster into a Spark cluster with the DataStax Cassandra Spark Connector. Currently, the Spark program performs a simple MapReduce job that counts the number of rows in the Cassandra table. Everything is set up and run locally. The...

OutOfMemoryError creating a fat jar with sbt assembly

jar,cassandra,apache-spark,sbt
We are trying to make a fat jar file containing one small scala source file and a ton of dependencies (simple mapreduce example using spark and cassandra): import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import com.datastax.spark.connector._ import org.apache.spark.SparkConf object VMProcessProject { def main(args: Array[String]) { val conf = new SparkConf() .set("spark.cassandra.connection.host", "127.0.0.1") .set("spark.executor.extraClassPath",...

How to restart a seed node after its process crashes?

cassandra
Are there any differences between replacing a dead node and restarting a dead node, especially for seed nodes? Actually, I'm a little confused about how to restart a dead seed node. When the process of a seed node crashes, should I restart it without making any changes to...

How to delete a record in Cassandra?

cassandra,cassandra-2.0,cql3
I have a table like this: CREATE TABLE mytable ( user_id int, device_id ascii, record_time timestamp, timestamp timeuuid, info_1 text, info_2 int, PRIMARY KEY (user_id, device_id, record_time, timestamp) ); When I ask Cassandra to delete a record (an entry in the columnfamily) like this: DELETE from my_table where user_id =...

Cassandra performance for partial select of rows

cassandra,cql
If I have a Cassandra table like the following CREATE TABLE importantdata ( a1 int, a2 int, a3 int, data1 blob, data2 blob, PRIMARY KEY ((a1), a2, a3) ) and then do a partial select SELECT a1, a2, a3, data1 FROM importantdata WHERE a1=0 and a2=1 and a3<0 does the...

Using secondary indexes to update rows in Cassandra 2.1

cassandra,cassandra-2.1
I'm using Cassandra 2.1 and have a model that roughly looks as follows: CREATE TABLE events ( client_id bigint, bucket int, timestamp timeuuid, ... ticket_id bigint, PRIMARY KEY ((client_id, bucket), timestamp) ); CREATE INDEX events_ticket ON events(ticket_id); As you can see, I've created a secondary index on ticket_id. This index...

Failover not working with Cassandra when using DataStax C# Driver

c#,azure,cassandra,datastax-enterprise
I have a two-node setup in Azure and I am trying to get failover working when connecting with the C# driver. My nodes seem to be communicating fine when working with cqlsh and within OpsCenter. var contact = "publicipforfirstnode"; _cluster = Cassandra.Cluster.Builder().AddContactPoint(contact).Build(); _session = _cluster.Connect("demo"); I initially connect with...

Limiting columns per record in CQL

cassandra,cql,datastax
I have a problem that has been bothering me for quite a while now. I'm scaling it down for simplification. I have a column family in Cassandra defined as: CREATE TABLE "Test" ( key text, column1 text, value text, PRIMARY KEY (key, column1) ) If I run a query in CQL as: select...

Can't allow a new user to run Cassandra nodetool command

cassandra,datastax-enterprise
So I would like to allow an additional user account to be able to run Cassandra commands, such as nodetool status, etc. This account is not the account under which Cassandra runs. I have a four-node cluster, and the installation was done via tarball. I have the path set...

How do I store nested data in Cassandra

database,cassandra,denormalization
Consider the following "documents" and how these two documents would be stored in a collection: // collection posts: { id: 1, name: "kingsbounty", fields: { "title": { "title": "Game Title", "value": "Kings Bounty" } }, { "body": { "title": "Game Description", "value": "Kings Bounty is a turn-based fantasy..." } } }...
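One way to hold this in Cassandra is to denormalize the nested "fields" object into one row per field, with the post id as the partition key and the field name as a clustering column. The Java sketch below only shows that flattening step; it assumes a hypothetical table posts_by_field (post_id int, field_name text, title text, value text, PRIMARY KEY (post_id, field_name)). The table and class names are illustrative, not taken from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenPost {

    // One row of the hypothetical posts_by_field table.
    public static final class FieldRow {
        public final int postId;       // partition key
        public final String fieldName; // clustering column
        public final String title;
        public final String value;

        FieldRow(int postId, String fieldName, String title, String value) {
            this.postId = postId;
            this.fieldName = fieldName;
            this.title = title;
            this.value = value;
        }

        @Override
        public String toString() {
            return postId + "|" + fieldName + "|" + title + "|" + value;
        }
    }

    // Turns the nested "fields" object (field name -> {title, value}) into
    // one row per field. All rows share the same partition key (post id).
    public static List<FieldRow> flatten(int postId, Map<String, String[]> fields) {
        List<FieldRow> rows = new ArrayList<>();
        for (Map.Entry<String, String[]> e : fields.entrySet()) {
            String[] titleAndValue = e.getValue();
            rows.add(new FieldRow(postId, e.getKey(), titleAndValue[0], titleAndValue[1]));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String[]> fields = new LinkedHashMap<>();
        fields.put("title", new String[] {"Game Title", "Kings Bounty"});
        fields.put("body", new String[] {"Game Description", "Kings Bounty is a turn-based fantasy..."});
        for (FieldRow row : flatten(1, fields)) {
            System.out.println(row);
        }
    }
}
```

With this layout, reading a whole post is a single-partition query (WHERE post_id = ?), and a single field can be fetched by also restricting field_name.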

bash: jstat: command not found

java,cassandra,jstat
I want to use the gc utility to analyse the garbage collection for my Cassandra database. But when I run the jstat command, the output is bash: jstat: command not found. I searched and found that jstat is located in $JAVA_HOME/bin but I am not able to understand where is...

Does Spark from DSE load all data into an RDD before running a SQL query?

cassandra,apache-spark,datastax
Running DSE 4.7. Say I have a 4-node DSE Cassandra/Spark cluster... I have a Cassandra table with, say, 4,000,000 records in it. On Spark, I run the following Spark SQL: "select * from table where email = ? or mobile = ?" Will Spark load all the data into...

Is it possible to use a timestamp in ms since epoch in select statement for Cassandra?

cassandra,timestamp,cql
I know that using the formats listed here (http://docs.datastax.com/en/cql/3.0/cql/cql_reference/timestamp_type_r.html) works to query Cassandra. However, I'm having a hard time determining whether it is even possible to use ms since epoch in the select statement. I feel like it should be, since data can be sent to Cassandra in the ms...
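For reference, the CQL timestamp type accepts input either as an integer number of milliseconds since the epoch or as an ISO 8601 style string, so both query forms below should restrict the same instant. This is a minimal Java sketch that builds both literals from one epoch-millisecond value. The table events and the columns id and event_time are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EpochMillisQuery {

    // ISO 8601 style with an RFC 822 offset, e.g. 2015-06-17 16:00:00+0000.
    // Note: this pattern drops sub-second precision.
    private static final DateTimeFormatter CQL_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ").withZone(ZoneOffset.UTC);

    // Formats epoch milliseconds as a quoted CQL string literal.
    public static String isoLiteral(long epochMillis) {
        return "'" + CQL_TIMESTAMP.format(Instant.ofEpochMilli(epochMillis)) + "'";
    }

    // Query on a hypothetical table using the raw integer (ms since epoch) form.
    public static String queryWithMillis(long epochMillis) {
        return "SELECT * FROM events WHERE id = ? AND event_time >= " + epochMillis;
    }

    // The same query using the string form.
    public static String queryWithIso(long epochMillis) {
        return "SELECT * FROM events WHERE id = ? AND event_time >= " + isoLiteral(epochMillis);
    }

    public static void main(String[] args) {
        long millis = 1434556800000L; // 2015-06-17 16:00:00 UTC
        System.out.println(queryWithMillis(millis));
        System.out.println(queryWithIso(millis));
    }
}
```

In a real application, a bound parameter (a java.util.Date with the 2.x Java driver) avoids building literals by hand. The string builders above only make the two accepted forms visible.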

Cassandra query flexibility

hadoop,cassandra,apache-spark,bigdata,cql
I'm pretty new to the field of big data and currently stuck on a fundamental decision. For a research project I need to store millions of log entries per minute in my Cassandra-based data center, which works pretty well (single data center, 4 nodes). Log Entry ------------------------------------------------------------------ | Timestamp...

compiling cassandra test cases failing

ant,build,cassandra
I am trying to build Cassandra binaries from source, and when I try to compile the unit test cases, one of them fails. I am building cassandra-2.1.4 with Java 8. These are the commands I run: ant -Dfile.encoding="UTF-8" ant test ant artifacts Failure details: [junit] Testsuite: org.apache.cassandra.tools.SSTableExportTest [junit] Tests run: 8,...

Gremlin hangs with single-node HBase, titan-all 0.4.4

cassandra,hbase,analytics,zookeeper,titan
I have set up single-node Hadoop with HBase on top of it. I also set up Titan on it. But as soon as I start Gremlin and call TitanFactory.open(conf), it hangs and nothing happens. My titan-hbase.properties is as follows: storage.backend=hbase storage.hostname=127.0.0.1 storage.port=2181 cache.db-cache = true cache.db-cache-clean-wait = 20 cache.db-cache-time =...

Where can I observe writes to a Cassandra database, i.e. where are they logged?

cassandra,datastax-enterprise
I'm trying to track down a problem with a program that one of our developers wrote. It modifies existing entries (adds some flags to them) in the various tables in our Cassandra keyspace. The issue is that it seems to work just fine for many of the tables, but at least...

cassandra result pagination

pagination,cassandra,cql,cql3
Suppose I have an application that reads data from Cassandra and displays it to the user in pages of 10 or 20 rows. Is there a way to do this efficiently in Cassandra? Suppose I have a table 'ks1.cf1' with partition key 'pk' and clustering column...
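One common approach within a single partition is keyset ("seek") pagination. You remember the last clustering value shown, then ask for the next page with a > condition and a LIMIT, so earlier rows are never re-read. Below is a minimal in-memory Java sketch of that idea, in which a TreeMap stands in for one partition sorted by its clustering column. The clustering column name ck is hypothetical, because the question is cut off before naming it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class KeysetPager {

    // Returns the clustering keys of the next page; lastSeenKey == null means
    // "first page". Mirrors CQL of this shape (hypothetical column ck):
    //   SELECT * FROM ks1.cf1 WHERE pk = ? LIMIT 10
    //   SELECT * FROM ks1.cf1 WHERE pk = ? AND ck > ? LIMIT 10
    public static List<Integer> nextPage(NavigableMap<Integer, String> partition,
                                         Integer lastSeenKey, int pageSize) {
        NavigableMap<Integer, String> remaining = (lastSeenKey == null)
                ? partition
                : partition.tailMap(lastSeenKey, false); // strictly after the last key
        List<Integer> page = new ArrayList<>();
        for (Integer key : remaining.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        // Stand-in for one partition: rows sorted by clustering key 1..25.
        NavigableMap<Integer, String> partition = new TreeMap<>();
        for (int ck = 1; ck <= 25; ck++) {
            partition.put(ck, "row-" + ck);
        }
        Integer last = null;
        int pageNo = 1;
        while (true) {
            List<Integer> page = nextPage(partition, last, 10);
            if (page.isEmpty()) {
                break;
            }
            System.out.println("page " + pageNo++ + ": " + page);
            last = page.get(page.size() - 1); // remember where this page ended
        }
    }
}
```

This only works inside one partition, where rows are ordered by the clustering column. For streaming through large result sets, the DataStax Java driver (2.0 and later) also pages automatically based on the statement's fetch size (Statement.setFetchSize).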

Delete-Upsert-Read Access Pattern in Cassandra

cassandra
I use Cassandra to store trading information. Based on the queries I need, I designed my CF as below: CREATE TABLE trades (trading_book text, trading_date timestamp, OTHER TRADING INFO ..., PRIMARY KEY (trading_book, trading_date)); I want to delete all the data on a given date in the following way: collect all the...

Cassandra version differences

cassandra,cql
I started reading Cassandra: The Definitive Guide, which is based on Cassandra 0.7. Now I'm trying to experiment with Cassandra 2.1.5, and it seems that there are a lot of differences, which is really confusing. For example, I see that CQL did not exist in version 0.7. On the other hand,...

Lucene: how to index in a database (Cassandra)

java,indexing,lucene,cassandra
I am just experimenting with Lucene and want to index objects stored in a database (Cassandra) as a table. But I haven't figured out how the indexing works with Cassandra, especially searching... When I take a simple example of indexing in Lucene: Document doc = new Document(); doc.add(new TextField("id", "Hotel-1345", Field.Store.YES)); doc.add(new TextField("description",...

Timeout using sstableloader for a Cassandra AWS instance

amazon-ec2,cassandra
I'm trying to use sstableloader to load SSTable (.db) files into a Cassandra cluster running on an AWS EC2 instance. This error occurs: Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of C:\Users\SNCUser\dataquest\CassandraLoader\WrDir\beed5b97-0b52-45d7-be5d-fbbac00ac607\device_data\blob\device_data-blob-ka-1-Data.db to [/172.*.*.*] ERROR 16:08:36 [Stream #1114a0d0-1054-11e5-9ccc-65ee5fdd8902] Streaming error occurred java.net.ConnectException:...

Apache Cassandra. Specific case for an Index or a table with composite primary key

indexing,cassandra,cassandra-2.0,composite-primary-key
I have this scenario in a Cassandra POC. A table: CREATE TABLE B (B-UID UUID, A-UID UUID, CREATED_AT timestamp , JSON text, PARENT_B-UID UUID, POSTALCODE text, CUSTOMER_TYPE text, START_DATE timestamp, END_DATE timestamp, SOME_PRICE int, PRIMARY KEY (B-UID)); 24k rpm of writes / 2k rpm of reads. For...

What node does Cassandra store data on?

cassandra,distributed-database
Is there a command or any way at all to know what data is stored on which nodes of Cassandra? I'm pretty new to Cassandra and haven't had much luck googling this question. Thanks!...
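For a specific partition key, nodetool getendpoints <keyspace> <table> <key> reports which nodes hold its replicas. The placement it reflects can be pictured as a token ring. The partition key is hashed to a token, the primary replica is the first node whose token is greater than or equal to it (wrapping around), and further replicas are the next distinct nodes clockwise, similar in spirit to SimpleStrategy. The Java sketch below is a deliberately simplified model: node names and tokens are made up, tokens are passed in directly, and it does not implement Cassandra's Murmur3 partitioner, vnodes, or NetworkTopologyStrategy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified token-ring model; NOT Cassandra's actual partitioner.
public class TokenRing {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long token, String node) {
        ring.put(token, node);
    }

    // Primary replica: first node token >= keyToken (wrapping to the smallest
    // token). Remaining replicas: next distinct nodes clockwise.
    public List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        int distinctNodes = new HashSet<>(ring.values()).size();
        int wanted = Math.min(replicationFactor, distinctNodes);
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        while (replicas.size() < wanted) {
            if (!replicas.contains(entry.getValue())) {
                replicas.add(entry.getValue());
            }
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(0L, "node-A");
        ring.addNode(100L, "node-B");
        ring.addNode(200L, "node-C");
        System.out.println(ring.replicasFor(50L, 2));  // [node-B, node-C]
        System.out.println(ring.replicasFor(250L, 2)); // [node-A, node-B]
    }
}
```

On a real cluster, nodetool ring (or nodetool status) shows each node's tokens and ownership, and getendpoints does the hashing for you.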

How to create a keyspace in Cassandra?

java,eclipse,cassandra,apache-spark
I'm using a snippet to understand Cassandra and its syntax: import com.datastax.driver.core.Cluster; import com.datastax.driver.core.ResultSet; import com.datastax.driver.core.Row; import com.datastax.driver.core.Session; public class App { public static void main(String[] args) { Cluster cluster; Session session; // Connect to the cluster and key space "demo" cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); session = cluster.connect("demo"); // Insert one record...
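The snippet calls cluster.connect("demo"), which assumes the keyspace demo already exists. One way to create it is to connect without a keyspace (cluster.connect()) and run a CREATE KEYSPACE statement through session.execute(...). The sketch below only builds the CQL text, so it runs without a cluster. SimpleStrategy with replication factor 1 is an assumption suited to a single local node, not a production setting. CREATE KEYSPACE ... IF NOT EXISTS requires Cassandra 2.0 or later.

```java
public class KeyspaceCql {

    // Builds a CREATE KEYSPACE statement for a single-datacenter test setup.
    // SimpleStrategy and the replication factor are assumptions for a local node.
    public static String createKeyspace(String keyspace, int replicationFactor) {
        // Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
        if (!keyspace.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid unquoted keyspace name: " + keyspace);
        }
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace
                + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': "
                + replicationFactor + "}";
    }

    public static void main(String[] args) {
        String cql = createKeyspace("demo", 1);
        System.out.println(cql);
        // With the DataStax Java driver this would then be run roughly as:
        //   Session session = cluster.connect();   // no keyspace yet
        //   session.execute(cql);
        //   session.execute("USE demo");
    }
}
```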