Hive output larger than dfs blocksize limit

hadoop,hive,hdfs,partitioning
I have a table, test, which was created in Hive. It is partitioned by idate, and partitions often need to be inserted into. This can leave files on HDFS that have only a few rows. hadoop fs -ls /db/test/idate=1989-04-01 Found 3 items -rwxrwxrwx 3 deployer supergroup 710 2015-04-26 11:33 /db/test/idate=1989-04-01/000000_0...

Splitting a ruby file at a pattern?

ruby,regex,file,split,partitioning
I have a large ruby file that contains product data. I'm trying to split the file into sections based on a regular expression. I have product headers denoted by the word Product followed by a space and then a number. After that, I have a bunch of lines containing product...

Minimally partitioning a vector of objects (C++)

c++,partitioning
I have a std::vector of objects where each element in the vector more or less looks like: struct Obj { int group; }; The entries in the vector are in no particular order. Ordinarily, when partitioning, one might want to group elements in the same partition that have something...

What is the error in my partition code?

c,quicksort,partitioning
The code below works for the array { 4 5 3 7 2 }, but not for the other test cases given on the HackerRank site. What is the error in my code? Am I doing anything wrong while merging the two arrays a1[] and a2[] into ar[]? https://www.hackerrank.com/challenges/quicksort1 #include <stdio.h> #include...

lvm: create snapshot between volume groups

linux,virtualization,partitioning,snapshot,lvm
Is there a way to create a snapshot of a logical volume (lv1) that resides in volume group vgA inside a different volume group (say vgB)? I have my root logical volume in volume group vgA on the SSD and I want to take a snapshot of the volume on...

Partition data into two separate groups s.t. residual sum of squares with one continuous predictor is minimized

r,algorithm,statistics,grouping,partitioning
What's the basic algorithm to partition a set of data into two groups s.t. the sum of the two separate residual sum of squares is minimized? For example, consider the code below. Basically, how do you compute the value stored in best.cutpoint$RSS without iteratively testing each possible value? set.seed(1) ind.var...

Using PostgreSQL, is it normal for the master partition table to have rows inserted into it along with the child table?

postgresql,partitioning
Using the example from the Postgres 9.3 partitioning docs, should the master table "measurement" get rows inserted when performing inserts after creating the trigger functions and the trigger? Using the example given in the docs, upon performing an insert both the master and the child table have rows inserted. I though...

DROP a one-year-old partition of a table in Oracle

sql,oracle,partitioning
I had to drop a partition of a table which is one year old. Now, in all_tab_partitions, the HIGH_VALUE column is of LONG datatype and my table is partitioned by RANGE (date column). Hence, I had to figure out a way to read this column and then...

Bash - Meaning of partition-creation code

bash,partitioning,fdisk
I'm reading a .sh file that contains code for creating a partition, but I don't understand these lines: cat <<EOF > fdisk.input x h #heads 16 s #sectors 63 c #cylinders EOF echo $kbytes >> fdisk.input cat <<EOF >> fdisk.input r n p 1 a 1 w EOF fdisk hd.img...

How to partition Azure tables used for storing logs

azure,logging,partitioning,azure-table-storage
We have recently updated our logging to use Azure table storage, which, owing to its low cost and high performance when querying by row and partition, is highly suited to this purpose. We are trying to follow the guidelines given in the document Designing a Scalable Partitioning Strategy for Azure...

More than one Partitioner in a Spring batch job

spring-batch,partitioning
I have two different files (with different layouts), which I am splitting into multiple files to make use of local step partitioning. So far I am handling one file, and I have created one custom partitioner class to make use of step partitioning. Now I want to include...

PostgreSQL table partitioning in a Django project

django,postgresql,partitioning
I have a Django 1.7 project that uses Postgres 9.3. I have a table that will have rather high volume: anywhere from 13 million to 40 million new rows a month. What is the best way to incorporate Postgres table partitioning with Django?...

Kafka partitions meaning

configuration,partitioning,partition,kafka
When we decide about partitions, should we do that on a per-topic basis, or is it a topic-wide decision? If T1 is partitioned into 3 partitions and T2 into 2 partitions, can they both be consumed by 1 consumer? Or is it better to make the number of partitions equal if the topics must be...

PostgreSQL: update record on master table and move record in child partitions

postgresql,triggers,partitioning
How do I define an update trigger function for updating records in a master table that has partitions defined on it? I have a table with partitions defined on it and code for the insert trigger function, which will insert data into the child tables whenever there is an insert in...

Maximizing the minimum difference between elements in a set

algorithm,partitioning,greedy
I recently thought of this problem, and I thought of an "instinctive" greedy solution but I can't prove its optimality. You are given N integers, V1, V2, ..., VN and K sets (K < N). You need to find a way of partitioning the integers into the sets, so that...

mysql like button table id auto_increment optimization

mysql,auto-increment,partitioning
I have this table: CREATE TABLE IF NOT EXISTS `likes` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user` varchar(40) NOT NULL, `post_id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ; It is for a big store; it allows customers to like products. It records who liked (user)...

PySpark repartitioning RDD elements

hadoop,apache-spark,partitioning,rdd,pyspark
I have a spark job that reads from a Kafka stream and performs an action for each RDD in the stream. If the RDD is not empty, I want to save the RDD to HDFS, but I want to create a file for each element in the RDD. I've found...

Using partition key in function Oracle

sql,oracle,partitioning
We have a partitioned table in our Oracle database using this syntax: ... PARTITION BY RANGE(saledate) (PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')), PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')), ... We usually use the partition key in a select statement like this: Select * from table where saledate >= trunc(sysdate-3) and...

In Apache Spark, why does RDD.union not preserve the partitioner?

apache-spark,partitioning,hadoop-partitioning
As everyone knows, the Spark partitioner has a huge performance impact on any "wide" operation, so it's usually customized in operations. When I test the partitioner with the following code: val rdd = sc.parallelize(1 to 50).keyBy(_ % 10).partitionBy(new HashPartitioner(10)) val rdd2 = sc.parallelize(200 to 230).keyBy(_ % 13) val cogrouped = rdd.cogroup(rdd2) println("cogrouped:"...

Solutions to resize root partition on live mounted system

linux,chef,partitioning,mount
I'm writing a Chef recipe to automate setting up software RAID 1 on an existing system. The basic procedure is: Clear partition table on new disk (/dev/sdb); Add new partitions, and set them to RAID using parted (sdb1 for /boot and sdb2 with LVM for /); Create a degraded...

How many spark elements per partition

apache-spark,partitioning
Is there any way to get the number of elements in a Spark RDD partition, given the partition ID, without scanning the entire partition? Something like this: Rdd.partitions().get(index).size() Except I don't see such an API for Spark. Any ideas or workarounds? Thanks...