Cassandra schema, query


Tag: cassandra-2.0

I'm designing a new application that will use Cassandra (I'm new to Cassandra). The database will contain only 2-4 column families. The problem is that I have to support filtering on almost every column. Could you give me some suggestions to keep in mind during planning? What about data redundancy?


Cassandra isn't optimized for this use-case. The preferred way to query data is using the primary key.

Filtering by arbitrary columns is possible, but all of the options for doing so have their limitations.
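On the data redundancy part of the question: one commonly used approach (not spelled out in the answer above, so treat it as an assumption about what fits your case) is to keep one denormalized table per filter column, so that each filter becomes a primary-key lookup. The sketch below only builds the CQL strings in plain Python and does not talk to a cluster. The `users` table, its columns, and the helper names `query_table_ddl` and `insert_statements` are all hypothetical.

```python
# Hypothetical base table: every name below is illustrative only.
COLUMNS = {"user_id": "uuid", "country": "text", "status": "text", "email": "text"}
BASE_KEY = "user_id"


def query_table_ddl(table, filter_column, columns=COLUMNS, base_key=BASE_KEY):
    """Build CREATE TABLE CQL for a lookup table partitioned by one filter column."""
    if filter_column not in columns:
        raise KeyError(filter_column)
    if filter_column == base_key:
        raise ValueError("the base table already serves lookups by its own key")
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # The filter column is the partition key; the base key is a clustering
    # column so that rows sharing the same filter value stay distinct.
    return (
        f"CREATE TABLE {table}_by_{filter_column} ({cols}, "
        f"PRIMARY KEY (({filter_column}), {base_key}))"
    )


def insert_statements(table, filter_columns, columns=COLUMNS):
    """Return one parameterised INSERT per query table (the redundant writes)."""
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return [
        f"INSERT INTO {table}_by_{col} ({names}) VALUES ({marks})"
        for col in filter_columns
    ]


if __name__ == "__main__":
    for col in ("country", "status"):
        print(query_table_ddl("users", col))
    for stmt in insert_statements("users", ["country", "status"]):
        print(stmt)
```

The cost is the redundancy itself: every write has to go to each query table, and a filter column with few distinct values produces very large partitions, so this option has its limitations as well.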


Cassandra cleanup on several servers at once

We have a big Cassandra cluster of 18 servers (about 5 TB of data on one server). We added new nodes following this documentation. After adding the new servers, we began the process of cleaning up data (nodetool cleanup). The documentation advises: After all new nodes...

Running test cluster cassandra/dse on minimal hardware?

I recently came across some of these rugged computers that one of my predecessors had left in our office. Unfortunately they appear to be maxed out at 4GB RAM, and single core 2.67 GHz processors. There are four of them, and my first thought was to create a test environment...

Restoring cassandra from snapshot

So I did something of a test run/disaster recovery practice deleting a table and restoring in Cassandra via snapshot on a test cluster I have built. This test cluster has four nodes, and I used the node restart method so after truncating the table in question, all nodes were shutdown,...

Cassandra: Data loss after adding new node

We had a two-node Cassandra cluster which we wanted to expand to four. We followed the procedure described there: But after adding two more nodes (at the same time, with a 2-minute interval as recommended in the documentation), we experienced some data loss. In some column families, there...

cassandra data redistribution when new nodes join

I'm a beginner in Cassandra. I want to understand how the data gets (re)distributed when a new node joins an existing cluster. Let us suppose, there were 100 row keys in a cluster of 10 nodes. Also, let us assume for simplicity that using a hash function the rows were...

Cassandra Cluster in AWS Multi Region VPC

I am trying to achieve the following schema for my Cassandra Cluster : 1 VPC in 4 different AWS Regions linked together with IPSec instances. 1 Cassandra Cluster made of 4 nodes, 1 in each VPC Nodes communicating together with private IP in VPC ( Cassandra data accessible through REST...

Does DateTieredCompactionStrategy work with composite keys?

Does DateTieredCompactionStrategy in Apache Cassandra 2.1.2. work with a compound clustering key? More specifically, like with this table where (timestamp, hash) makes up a compound clustering key: CREATE TABLE sensordata ( timeblock int, timestamp timestamp, hash int, data blob, PRIMARY KEY (timeblock, timestamp, hash) ) I believe, that the DateTieredCompactionStrategy...

Repartitioning of data in Cassandra when adding new servers

Let's imagine I have a Cassandra cluster with 3 nodes, each having 100GB of available hard disk space. Replication Factor for this cluster is set to 3 and R/W CLs are set to 2, meaning I can tolerate one of my nodes going down without sacrificing consistency or availability. Now...

Cassandra add TTL to existing entries

How can I update an entire table and set a TTL for every entry? Current Scenario (Cassandra 2.0.11): table: CREATE TABLE external_users ( external_id text, type int, user_id text, PRIMARY KEY (external_id, type) ) currently there are ~40mio entries in this table and i want to add a TTL for...

Need some clarification on running Cassandra nodetool repairs

So we've been having trouble balancing our workload on our current cluster, due mainly to budgetary constraints and the inability to add more nodes at this time. Up until recently, having a node go down overnight was happening frequently, so I was frequently running nodetool repair. Recently the cluster has...

Hadoop Cassandra job not reading all input rows

I have a quite simple Hadoop job using Cassandra as input and output. Here is the job configuration code (nothing special): Job job = new Job(getConf(), JOB_NAME); job.setJarByClass(getClass()); job.setMapperClass(CassandraHadoopCounterMapper.class); job.setReducerClass(CassandraHadoopCounterReducer.class); job.setCombinerClass(CassandraHadoopCounterCombiner.class); job.setInputFormatClass(CqlInputFormat.class); job.setOutputFormatClass(CqlOutputFormat.class); job.setMapOutputKeyClass(IntWritable.class);...

Inserting only a few columns into a Cassandra table

I am new to Cassandra. Is it possible to insert into only a few columns of a table and leave the other columns to be filled in later? Thanks in advance...

How to create a Cassandra copy to a test machine?

We have a staging environment which runs a one node cluster completely separate from our production environment. What I'd like to do is copy this one node cluster over to a test machine that I have for the sole purpose of testing. What is the correct way to do this?...

I am getting an InvalidTypeException whenever I am using the row.getToken(“fieldname”)?

for the following piece of code I am getting an InvalidTypeException whenever I am using the row.getToken("fieldname"). Record RowToRecord(Row rw) { ColumnDefinitions cd = rw.getColumnDefinitions(); Record rec = new Record(); int i; for(i = 0; i < cd.size(); i++) { rec.fields.add(cd.getName(i)); System.out.println(cd.getName(i)); //System.out.println((rw.getToken(cd.getName(i))).getValue()); Token tk = rw.getToken(cd.getName(i)); //// InvalidTypeException on...

What is meant by a node in Cassandra

I am new to Cassandra and I want to install it; before that, I'm reading a short article on it. But there is one thing I do not understand, and that is the node. Can anyone tell me what a node is, what it is for and how...

Multiple node cluster is really slow

I had a single node cassandra cluster on EC2. I was running my tests on it and it worked great. But then, I had to move this cluster to a VPC, so rather than moving the data, I created a new cluster with two nodes (both seeds), and imported the...

Query all rows with first item of compound partition key only

I have the following column family: CREATE TABLE test."Data" ( "ItemID" uuid, "DataID" uuid, PRIMARY KEY (("ItemID", "DataID")) ) I want to get all the rows having "ItemSourceID" = someuuid. Before, I had the following schema, and it was working great obviously: CREATE TABLE test."Data" ( "ItemID" uuid, "DataID" uuid,...

cassandra operations queue is full

I'm running datastax enterprise 4.5.1, with opscenter 5.1.1. These were installed from the standalone linux installers on Ubuntu 14.04 LTS. $ cqlsh Connected to Test Cluster at localhost:9160. [cqlsh 4.1.1 | Cassandra | CQL spec 3.1.1 | Thrift protocol 19.39.0] In the datastax-agent log, I have been seeing tons...

DataStax Cassandra binding with Apache Cassandra

I am trying to use DataStax Cassandra (community edition), but I am not able to find the DataStax git repo for it. Can someone please help me figure out which release of Apache Cassandra is used by DataStax Cassandra (community edition)? or does...

how to update data in cassandra using IN operator

I have a table with the following schema. CREATE TABLE IF NOT EXISTS group_friends( groupId timeuuid, friendId bigint, time bigint, PRIMARY KEY(groupId,friendId)); I need to keep a track of time if any changes happen in a group (such changing the group name or adding a new friend in table etc.)....

How do I run a repair only within a certain datacenter?

I want to run a repair for a specific Cassandra datacenter within a larger cluster. How can I do that? nodetool repair -local -pr does not seem to work: $ nodetool repair -local -pr Exception in thread "main" java.lang.RuntimeException: Primary range repair should be performed on all nodes in the cluster....

What implications does consistency have on async writes?

Both the DataStax Python and Java Cassandra drivers support async writes. Both of them also allow setting the consistency level. Does the consistency level have any implication whatsoever for async writes?

cassandra running multiple apps

I have been playing with Cassandra and I really like the dynamics in how it handles nodes. The question: I have more than one app, where I would like to use Cassandra as db. That is many keyspaces. Is it advisable to have multiple apps using the same cassandra cluster?...

Spring-Cassandra Unit with Embedded Cassandra Dependency Injection Issue

I am having issue with Spring unit testing with Embedded Cassandra. The issue is both Embedded Cassandra and My Cassandra Server are starting at the same time. How to make sure that during unit testing only Embedded Cassandra Starts. I am using spring-data for Cassandra. I have the following Spring...

Using partition key along with secondary index

Following are the two queries that I need to perform. select * from where dept = 100 and emp_id = 1; select * from where dept = 100 and name = 'One'; Which of the below options is better ? Option 1: Use secondary index along with a partition key....

Apache Cassandra. Specific case for an Index or a table with composite primary key

I have this scenario in a POC of Cassandra. A table CREATE TABLE B (B-UID UUID, A-UID UUID, CREATED_AT timestamp , JSON text, PARENT_B-UID UUID, POSTALCODE text, CUSTOMER_TYPE text, START_DATE timestamp, END_DATE timestamp, SOME_PRICE int, PRIMARY KEY (B-UID)); 24 k rpm of write / 2 k rpm of read. For...

Unable to create Solr Core, key type mismatch

Running DataStax Enterprise Server 4.6.0 with 6 node cluster, fresh, only 1 record inside this table: CREATE TABLE tweets.tweets (uid bigint, tweet_id bigint, tweet text,created timestamp,PRIMARY KEY (uid , created) ) WITH CLUSTERING ORDER BY (created DESC); schema.xml looks like this: <?xml version="1.0" encoding="UTF-8" ?> <schema name="tweets" version="1.1"> <types> <fieldType...

cassandra 2.0.11 - column count for partition key

Lets consider following table taken from CREATE TABLE temperature ( weatherstation_id text, event_time timestamp, temperature text, PRIMARY KEY (weatherstation_id,event_time) ); So weatherstation_id is the partition key and event_time is the clustering column. Data is loaded to that table and then we run query: SELECT COUNT(1) FROM temperature WHERE weatherstation_id...

How to resolve schema mismatch following ALTER TABLE?

I ran this command in cqlsh (in 10-node production cluster running Cassandra 2.1.2): cqlsh:aisdata> alter table packets_area_cell10 with compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 1024 }; errors={}, last_host= The 'alter table' commands seem to have run despite the timeout. But subsequently cqlsh ran into ( - and was...

Recovering Cassandra Data From Files

I have a development machine that had a single-node Cassandra 2.1.2 setup. The main drive failed, but the secondary drive, mounted at /var, is good. I was able to connect this drive to another system with a working OS and Cassandra install and mount it. I can see the files...

Cassandra queries performance, ranges

I'm quite new with Cassandra, and I was wondering if there would be any impact in performance if a query is asked with "date = '2015-01-01'" or "date >= '2015-01-01' AND date <= '2015-01-01'". The only reason I want to use the ranges like that is because I need to...

How to delete a record in Cassandra?

I have a table like this: CREATE TABLE mytable ( user_id int, device_id ascii, record_time timestamp, timestamp timeuuid, info_1 text, info_2 int, PRIMARY KEY (user_id, device_id, record_time, timestamp) ); When I ask Cassandra to delete a record (an entry in the columnfamily) like this: DELETE from my_table where user_id =...

Hadoop Cassandra wide rows in CqlInputFormat

I am writing a hadoop job that uses Cassandra (v2.0.11) as its input and output. In my hadoop job I define input column family: ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, INPUT_COLUMN_FAMILY, WIDE_ROWS); where WIDE_ROWS=true. I also set CqlInputFormat as a reading class: job.setInputFormatClass(CqlInputFormat.class); CqlInputFormat uses CqlRecordReader where it's written (link): // Because the old...

writetime of cassandra row in spark

I'm using Spark with Cassandra, and I want to select the writeTime of my row from my Cassandra table. This is my request: val lines = sc.cassandraTable[(String, String, String, Long)](CASSANDRA_SCHEMA, table).select("a", "b", "c", "writeTime(d)").count() but it displays this error: Column channal not found in table test.mytable I've...

Does Spark incur the same amount of overhead as Hadoop for vnodes?

I just read Does Spark (specifically Datastax's Cassandra Spark connector) incur the same amount of overhead as Hadoop when reading from a Cassandra cluster? I know Spark uses threads more heavily than Hadoop does.

Whats the difference between CQL data type VARCHAR vs TEXT?

Difference between CQL data type TEXT vs Varchar ? Any functional limitations ? TIA...

Cassandra python driver event_time

I'm having some trouble with converting the event_time that the python driver returns to a timestamp. Basically, I need to store timestamps with some associated data and then query by time range. At the beginning, I didn't know you can put timestamps in the query, so I would convert them...

How do I represent a cassandra user defined type in a python model in cqlengine

I have the following table schema defined in my cassandra cluster CREATE TABLE users ( username text PRIMARY KEY, creationdate bigint, email text, firstlogin boolean, firstname text, lastloggedin bigint, lastname text, lastprofileupdate bigint, name text, olduserid int, profile frozen<profile_type>, user_id uuid and the user defined type, profile_type as the below......

Does Cassandra write to a node(which is up) even if Consistency cannot be met?

The below statement from Cassandra documentation is the reason for my doubt. For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails...

Cassandra with uneven hardware, how to configure?

We are building a Cassandra (2.1.5) cluster for storing large amount of timeseries data, and we are planning to utilize existing hardware, problem is the hardware available is really different. 2 machines with: 4 core, 8 GB, SSD 2 machines with: 8 core, 16 GB, SSD 2 machines with: 32...

cassandra query on map in select clause

i am new to cassandra and i am trying to read a row from database which contains values siteId | country | someMap 1 | US | {a:b, x:z} 2 | PR | {a:b, x:z} I have also created an index on table using create index on columnfamily(keys(someMap)); but still...

SEVERE error writing to S3 backup

I'm running OpsCenter 5.1.1 with Datastax Enterprise 4.5.1. It's a 3-node cluster on AWS and I'm backing up to S3 (still...) I've started seeing a new error. I think this is a different error than any I've posted b4. $ cqlsh Connected to Test Cluster at localhost:9160. [cqlsh 4.1.1 |...

Cassandra datastax driver ResultSet sharing in multiple threads for fast reading

I have huge tables in Cassandra, with more than 2 billion rows and increasing. The rows have a date field and follow the date bucket pattern so as to limit each row. Even then, I have more than a million entries for a particular date. I want to read and process rows...

Low TTL with Leveled Compaction, should I reduce gc_grace_seconds to improve read performance without impacting delete replication?

Low TTL with Leveled Compaction, should I reduce gc_grace_seconds to improve read performance? Scenario: Cassandra Table to cache an external db values - read performance needs to be good (less than 100ms) TTL = 4 hrs at row level Functional full table refresh (delete and then lazy load) every 6...

Memtable understanding

I have some questions about the Cassandra memtable. I'll be grateful for the help. Facts about memtables: 1) placed in RAM; 2) per-ColumnFamily structure; 3) multiple memtables may exist for a single column family; Questions: 1) When are additional memtables created for a column family? What condition is needed? I assume that...

How to populate related table in Cassandra using CQL?

I am trying to practice Cassandra using this example (under Composite Columns paragraph): So, I have created table tweets and it looks like following: cqlsh:twitter> SELECT * from tweets; tweet_id | author | body --------------------------------------+-------------+-------------- 73954b90-baf7-11e4-a7d0-27983e9e7f51 | gwashington | I chopped... (1 rows) Now I am trying to populate timeline,...