Is it possible to use HWI (Hive Web Interface) in single node installation?

hadoop,hive,hiveql
Is it possible to use HWI (Hive Web Interface) in single node installation?

How to get current time for a given timezone in hive

java,hadoop,hive
I have 2 tables in Hive. table1 contains (time, location): 2015-03-04 15:00 Chicago; 2015-03-04 15:00 Denver; 2015-03-04 15:00 Honolulu. table2 contains (ID, Description): America/Chicago CENTRAL STANDARD TIME; America/Denver MOUNTAIN STANDARD TIME; Pacific/Honolulu HAWAII-ALEUTIAN STANDARD TIME. For a record present in table1, say "2015-03-04 15:00 Chicago", I need to do a...
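A hedged sketch of one possible approach, assuming table1 has columns (time, location), table2 has (id, description), and the zone ID can be matched to the city name; to_utc_timestamp interprets a timestamp as being in the given zone:

  SELECT t1.`time`, t1.location,
         to_utc_timestamp(concat(t1.`time`, ':00'), t2.id) AS utc_time  -- seconds appended so the string casts to a timestamp
  FROM table1 t1
  JOIN table2 t2
    ON t2.id LIKE concat('%/', t1.location);  -- assumed join condition, e.g. 'America/Chicago' vs 'Chicago'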

Access denied when “LOAD DATA” in hive

hadoop,hive,impala
I have read access to an HDFS path 'XXX'. When I execute a statement like "LOAD DATA INPATH 'XXX' into table YYY ...", an error occurs: ERROR: AuthorizationException: User 'ZZZ' does not have privileges to access: 'XXX' Why does a "LOAD DATA" statement need more privileges than "READ"?...

Is the GROUP BY clause applied after the WHERE clause in Hive?

hive,hiveql
Suppose I have the following SQL: select user_group, count(*) from table where user_group is not null group by user_group Suppose further that 99% of the data has null user_group. Will this discard the rows with null before the GROUP BY, or will one poor reducer end up with 99% of...

How to extract characters to the right of a delimited value in a SELECT statement in Hive

hadoop,hive
I need to extract all the characters to the right of a "/" as part of a select statement. city America/Chicago Europe/London Expected output is cityname Chicago London This is my query which is not working select substring(city, charindex('/',city) +1, length(city)) as cityname from table; ...
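charindex is SQL Server syntax and is not available in Hive. A minimal sketch of two Hive-native alternatives (the table name my_table is a placeholder):

  SELECT regexp_extract(city, '[^/]+$', 0) AS cityname FROM my_table;
  -- or, since the zone IDs contain exactly one '/':
  SELECT split(city, '/')[1] AS cityname FROM my_table;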

Hive UDF returning an array called twice - performance?

hive,hiveql
I have created a GenericUDF in hive that takes one string argument and returns an array of two strings, something like: > select normalise("ABC-123"); ... > [ "abc-123", "abc123" ] The UDF makes a call out via JNI to a C++ program for each row to calculate the return data...

How to find number of unique connection using hive/pig

hadoop,hive,apache-pig
I have a sample table like below: caller receiver 100 200 100 300 400 100 100 200 I need to find the number of unique connection for each number. For ex: 100 will have connections like 200,300 and 400. My output should be like: 100 3 200 1 300 1...
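A hedged HiveQL sketch, assuming the table is named calls with columns caller and receiver; the symmetric UNION ALL counts a connection regardless of direction:

  SELECT node, COUNT(DISTINCT other) AS connections
  FROM (
    SELECT caller AS node, receiver AS other FROM calls
    UNION ALL
    SELECT receiver AS node, caller AS other FROM calls
  ) t
  GROUP BY node;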

Why does YARN take a lot of memory for a simple count operation?

hadoop,mapreduce,hive,yarn,hortonworks-data-platform
I have a standard configured HDP 2.2 environment with Hive, HBase and YARN. I've used Hive (with HBase) to perform a simple count operation on a table that has about 10 million rows, and it resulted in about 10 GB of memory consumption from YARN. How can I reduce this memory...

hive - Regex in Split function not giving output

hadoop,hive,hiveql
Input : [a,b], [c,d], [e,f] select split(col,'\\,') from table_1; With the above query, I am able to split on every comma (inside and outside the braces). I need to split only on the commas outside the braces, so I changed the query as below: select split(col,',(?=\[)') from table_1; The regex which I...
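A possible variant, assuming the goal is to keep each bracketed pair intact: split on a comma preceded by a closing bracket (a Java regex lookbehind, which Hive's split accepts since its delimiter is a regex):

  SELECT split(col, '(?<=\\]),\\s*') FROM table_1;
  -- yields ["[a,b]", "[c,d]", "[e,f]"] for the sample input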

Hadoop Hive unable to move source to destination

java,hadoop,hive,execution
I am trying to use Hive 1.2.0 over Hadoop 2.6.0. I have created an employee table. However, when I run the following query: hive> load data local inpath '/home/abc/employeedetails' into table employee; I get the following error: Failed with exception Unable to move source file:/home/abc/employeedetails to destination hdfs://localhost:9000/user/hive/warehouse/employee/employeedetails_copy_1 FAILED: Execution...

handling oracle's ROWID in apache hive

oracle,hadoop,hive,hiveql,hortonworks-data-platform
I'm converting Oracle SQL queries to HiveQL; how do I convert queries that use ROWID in Oracle to Hive? Example: select ROWID, name, country from table1 where date = to_date('10/11/2015','mm/dd/yyyy') ...
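Hive has no physical ROWID, so queries that only use ROWID as a row identifier are often rewritten with a generated surrogate; a hedged sketch using a window function (Hive 0.11+), assuming the date column is stored as a 'yyyy-MM-dd' string:

  SELECT ROW_NUMBER() OVER (ORDER BY name) AS row_id,  -- ordering column is arbitrary, just a stable surrogate
         name, country
  FROM table1
  WHERE `date` = '2015-10-11';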

Query is not returning any values in hive

hadoop,hive,hiveql
I'm a newbie here. Running the following select statement is not returning any values. Hive queries: select name from patient where name = '[a-g]%'; select name from patient where name like '[a-g]%'; What am I doing wrong? Thanks in advance!...
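In Hive, LIKE does not understand character classes such as [a-g]; that is regex territory. A minimal sketch using RLIKE instead:

  SELECT name FROM patient WHERE lower(name) RLIKE '^[a-g]';  -- lower() added for case-insensitive matching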

Hive(Bigdata)- difference between bucketing and indexing

hadoop,mapreduce,hive,bigdata
What is the main difference between bucketing and indexing of a table in Hive?
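For contrast, a hedged sketch of the two DDL forms (table and column names are illustrative): bucketing physically clusters the data into a fixed number of files by hashing a column, while an index builds a separate lookup structure over the base table that must be rebuilt.

  CREATE TABLE users_bucketed (id INT, name STRING)
  CLUSTERED BY (id) INTO 16 BUCKETS;

  CREATE INDEX idx_users_name ON TABLE users_bucketed (name)
  AS 'COMPACT' WITH DEFERRED REBUILD;
  ALTER INDEX idx_users_name ON users_bucketed REBUILD;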

How to avoid bad records in Hadoop PIG and Hive?

hadoop,hive,apache-pig
Hi, I am new to Hadoop. I found that bad records for any input format can be skipped in Java MapReduce using the SkipBadRecords class, so I just want to know how this is possible in both Pig and Hive?

How to import MySql table into a targeted database in hive?

hadoop,hive,sqoop
I am using hadoop version 2.6.0 & sqoop version 1.4.5. I have successfully imported a SQL table- tblSystem into hive using the following sqoop command: sqoop import --connect jdbc:mysql://ip_Address:port_no/MySQL_database_name --username user --password passwd --table tblSystem -m 1 --hive-import However, I noticed that this command imports the SQL table into the...

Adding a default value to a column while creating table in hive

hive,hiveql
I'm able to create a hive table from data in an external file. Now I wish to create another table from the data in the previous table, with additional columns that have a default value. I understand that CREATE TABLE AS SELECT can be used, but how do I add additional columns with a default value?...
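A hedged sketch, assuming the extra columns simply need constant defaults; a CTAS can select literals alongside the existing columns (all names are placeholders):

  CREATE TABLE new_table AS
  SELECT t.*,
         'unknown' AS extra_str_col,  -- constant string default
         0         AS extra_int_col   -- constant numeric default
  FROM old_table t;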

Create external table with select from other table

hive,hdinsight
I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into table2. First I wanted a simple...
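One common pattern, sketched under assumptions (the column list, storage format and wasb path are placeholders): declare table2 as EXTERNAL with an explicit LOCATION in blob storage, then populate it. The data then outlives the cluster because dropping an external table leaves its files in place.

  CREATE EXTERNAL TABLE table2 (col_a STRING, computed_col DOUBLE)
  STORED AS TEXTFILE
  LOCATION 'wasb://mycontainer@myaccount.blob.core.windows.net/tables/table2';

  INSERT OVERWRITE TABLE table2
  SELECT col_a, col_b * 1.0 AS computed_col FROM table1;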

Converting null value to dummy ip address

hadoop,hive
I have a table where one of the columns is an IP address. I need to convert/select any NULL value in that column to a dummy IP address, e.g. 0.0.0.0. Below is the table schema, where the ip column is the IP address: hive> desc ad_data; OK ad_id string id_segment bigint...
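A minimal sketch, assuming the column is called ip and the replacement happens at query time (the same expression could feed an INSERT OVERWRITE back into the table):

  SELECT ad_id,
         COALESCE(ip, '0.0.0.0') AS ip  -- substitute the dummy address for NULLs
  FROM ad_data;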

permissions to hive tables based on user roles

hive
How can we specify permissions for a table in Hive so that only specific columns are visible to users when they query, according to their roles? (I could use views, but there are 150 different roles.)

Hive derby/mysql installation

hadoop,hive
I have 2 questions about the logs below: how do I get rid of the WARN message on the first line, and why is Hive complaining about the XML when it is correctly formatted, or am I missing something? 15/04/23 18:26:58 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist Logging initialized using configuration in...

Replace the empty or NULL value with specific value in HIVE query result

hadoop,replace,null,hive
I'm trying to show a default value, "Others", when the query does not return any result for one of the selected columns. I'll show you the example. This query returns an empty value for os(agent) SO (in the first row): select country, os(agent) SO, count(*) from clicks_data where country is...
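A hedged sketch, reusing the question's os(agent) expression and treating both NULL and the empty string as missing:

  SELECT country,
         CASE WHEN os(agent) IS NULL OR os(agent) = '' THEN 'Others'
              ELSE os(agent) END AS so,
         COUNT(*)
  FROM clicks_data
  WHERE country IS NOT NULL
  GROUP BY country,
           CASE WHEN os(agent) IS NULL OR os(agent) = '' THEN 'Others'
                ELSE os(agent) END;  -- the expression is repeated because older Hive does not allow aliases in GROUP BY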

Filter out duplicate rows based on a subset of columns

hadoop,hive,hiveql
I have some data that looks like this: ID,DateTime,Category,SubCategory X01,2014-02-13T12:36:14,Clothes,Tshirts X01,2014-02-13T12:37:16,Clothes,Tshirts X01,2014-02-13T12:38:33,Shoes,Running X02,2014-02-13T12:39:23,Shoes,Running X02,2014-02-13T12:40:42,Books,Fiction X02,2014-02-13T12:41:04,Books,Fiction what I would like to do is to only keep one instance of each datapoint in time like this (I don't care which instance in time): ID,DateTime,Category,SubCategory X01,2014-02-13T12:36:14,Clothes,Tshirts X02,2014-02-13T12:39:23,Shoes,Running...
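A minimal sketch, assuming any one timestamp per (ID, Category, SubCategory) group is acceptable (the table name events is a placeholder):

  SELECT id, MIN(`datetime`) AS `datetime`, category, subcategory
  FROM events
  GROUP BY id, category, subcategory;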

How to read a nested collection in Spark

hadoop,hive,apache-spark,parquet
I have a parquet table with one of the columns being , array<struct<col1,col2,..colN>> Can run queries against this table in Hive using LATERAL VIEW syntax. How to read this table into an RDD, and more importantly how to filter, map etc this nested collection in Spark? Could not find any...

How to load nested collections in hive with more than 3 levels

hadoop,hive,bigdata
I'm struggling to load data into Hive, defined like this: CREATE TABLE complexstructure ( id STRING, date DATE, day_data ARRAY<STRUCT<offset:INT,data:MAP<STRING,FLOAT>>> ) row format delimited fields terminated by ',' collection items terminated by '|' map keys terminated by ':'; The day_data field contains a complex structure difficult to load with load...

Performance improvement for GZ to ORC File

hive,hdinsight
Please let me know: is there any faster way to move *.gz files directly into an ORC table? 1) Another thought: loading from a *.gz file into a non-partitioned table, rather than creating an external table and dumping the gz file data into the external table. Is there any other approach for quicker loading from Gz to External...

SQL QUALIFY equivalent HIVE query

sql,hive,oracle-sqldeveloper,hiveql,row-number
I'm trying to create a HIVE query from an Oracle SQL query. Essentially I want to select the first record, sorted descending by UPDATED_TM, DATETIME, ID_NUM. SELECT tbl1.NUM AS ID, tbl1.UNIT AS UNIT, tbl2.VALUE AS VALUE, tbl1.CONTACT AS CONTACT_NAME, 'FILE' AS SOURCE, CURDATE() AS DATE FROM DB1.TBL1 tbl1 LEFT JOIN...
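Hive has no QUALIFY clause, but the usual rewrite wraps ROW_NUMBER() in a subquery and filters on it; a hedged sketch (the PARTITION BY key is an assumption, adjust it to whatever defines "first record per group"):

  SELECT * FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.id_num
                              ORDER BY t.updated_tm DESC, t.`datetime` DESC, t.id_num DESC) AS rn
    FROM db1.tbl1 t
  ) x
  WHERE x.rn = 1;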

GROUP BY statement HiveQL

hadoop,hive,hiveql
I'm a newbie to Hive. My question is: why do we need to use collect_set(col) when performing a GROUP BY? select singer, collect_set(song) from songlist GROUP BY singer; Would really appreciate any help. Thanks in advance!...

Hue Beeswax / HCat no longer working (kerberos default user) after migration to HDP2.2

hive,kerberos,hortonworks-data-platform,hue
I've almost done the migration of my secure HDP2.1 to HDP2.2 hadoop cluster. Everything seems to work (including hive in command line), but hue. If the file browser, job browser, pig interface and oozie interface are working, this is not the case of the beeswax & webhcat interface. (NB :...

How to insert and Update simultaneously to PostgreSQL with sqoop command

postgresql,hadoop,hive,sqoop
I am trying to insert into a PostgreSQL DB with a sqoop command. sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema It is working fine if there is no primary key constraint. I want to insert new records and update old records...

Convert AM/PM times in pig to 24 hour format

hive,apache-pig
My source is sending data like 3/13/15 7:54:47 AM. I am loading it as chararray and my final requirement is to use the output of pig in Partitioned HIVE tables (partitioned on Year/Month/Day). I want to separate h:m:s out of this and need to convert from AM/PM to 24 hour...

Merging two columns into a single column and formatting the content to form an accurate date-time format in Hive?

sql,regex,hadoop,hive,datetime-format
these are the 2 columns(month,year). I want to create a single column out of them having an accurate date-time format('YYYY-MM-DD HH:MM:SS') and add as new column in the table. Month year 12/ 3 2013 at 8:40pm 12/ 3 2013 at 8:39pm 12/ 3 2013 at 8:39pm 12/ 3 2013 at...

what is the main difference between dynamic and static partitioning in hive

hive
What is the main difference between static and dynamic partitioning in Hive? Using individual inserts means static, and a single insert into the partitioned table means dynamic. Is there any other advantage?
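A short illustrative sketch of the two insert styles (table and column names are placeholders): with static partitioning the partition value is spelled out in the statement, with dynamic partitioning Hive derives it from the trailing column(s) of the SELECT.

  -- static: partition value fixed in the statement
  INSERT OVERWRITE TABLE sales PARTITION (dt = '2015-05-01')
  SELECT id, amount FROM staging WHERE dt = '2015-05-01';

  -- dynamic: partition value taken from the data
  SET hive.exec.dynamic.partition = true;
  SET hive.exec.dynamic.partition.mode = nonstrict;
  INSERT OVERWRITE TABLE sales PARTITION (dt)
  SELECT id, amount, dt FROM staging;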

Writing column header for hive output with union all

sql,hadoop,hive
I am looking at ways to write a column header to hive output (actually trailer as well) and have only been able to get close using union all. Example select "age" as age, "name" as name, "address" as address union all select age, name, address from customers the problem is...
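One hedged workaround: UNION ALL gives no ordering guarantee, so an explicit sort key is usually added and the data columns are cast to STRING so both branches have matching types. A sketch, not the only option (writing the header from the invoking script is another); on older Hive versions the header branch may need to select its literals from an existing one-row source, since FROM-less SELECTs were not always supported.

  SELECT x.ord, x.age, x.name, x.address FROM (
    SELECT 0 AS ord, 'age' AS age, 'name' AS name, 'address' AS address
    UNION ALL
    SELECT 1 AS ord, CAST(age AS STRING) AS age, name, address FROM customers
  ) x
  ORDER BY x.ord;
  -- the ord column keeps the header row first; it can be dropped afterwards with one more wrapping SELECT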

How to Append new data to already existing hive table

hadoop,hive
How do I append records to an existing partitioned Hive table? For example, I have an existing external table called "ip_country" and the dataset is testdata1. If the dataset grows, say the next day my dataset is testdata1 and testdata2, then how do I append the new data, i.e. "testdata2", to the "ip_country" hive table?
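A hedged sketch of the two usual ways to append (the partition column, values and paths are assumptions): either load the new files into the target partition, or insert-select from a staging table with INSERT INTO, which appends rather than overwrites.

  LOAD DATA INPATH '/data/testdata2'
  INTO TABLE ip_country PARTITION (load_date = '2015-05-02');

  -- or
  INSERT INTO TABLE ip_country PARTITION (load_date = '2015-05-02')
  SELECT ip, country FROM staging_testdata2;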

Partition swapping in Hive

hive
What is the impact on running queries in Hive if I swap the partition using ALTER TABLE user_data PARTITION (name = 'ABC') SET LOCATION = 'db/partitions/new'; Does this command wait until running queries have finished executing?...

Import data from oracle into hive using sqoop - cannot use --hive-partition-key

oracle,hadoop,hive,sqoop
I have a simple table: create table osoba(id number, imie varchar2(100), nazwisko varchar2(100), wiek integer); insert into osoba values(1, 'pawel','kowalski',36); insert into osoba values(2, 'john','smith',55); insert into osoba values(3, 'paul','psmithski',44); insert into osoba values(4, 'jakub','kowalski',70); insert into osoba values(5, 'scott','tiger',70); commit; that i want to import into Hive using sqoop....

Create an external Hive table from an existing external table

csv,hadoop,hive
I have a set of CSV files in a HDFS path and I created an external Hive table, let's say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to...
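A hedged sketch: CREATE TABLE ... LIKE copies table_A's schema, an explicit LOCATION keeps table_B's data at a known path (the path is a placeholder), and the DISTINCT insert then removes the redundant rows.

  CREATE EXTERNAL TABLE table_B LIKE table_A
  LOCATION '/user/hive/external/table_B';

  INSERT OVERWRITE TABLE table_B
  SELECT DISTINCT * FROM table_A;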

Hive: can't fill index

hadoop,indexing,hive,hiveql
I'm using Hive 14.0 and have a challenge indexing tables. If I build an index without DEFERRED REBUILD, Hive does not create an index table for me. If I use it with DEFERRED REBUILD, an index table is built, but after REBUILD nothing happens. My test table has myKey as...

Save flume output to hive table with Hive Sink

hadoop,hive,flume
I am trying to configure flume with Hive to save flume output to hive table with Hive Sink type. I have single node cluster. I use mapr hadoop distribution. Here is my flume.conf agent1.sources = source1 agent1.channels = channel1 agent1.sinks = sink1 agent1.sources.source1.type = exec agent1.sources.source1.command = cat /home/andrey/flume_test.data agent1.sinks.sink1.type...

Retrieve 3rd MAX salary in Hive

hive,hiveql
I'm a novice. I have the following Employee table. ID Name Country Salary ManagerID I retrieved the 3rd max salary using the following. select name , salary From ( select name, salary from employee sort by salary desc limit 3) result sort by salary limit 1; How to do the...
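A hedged alternative using an analytic function (Hive 0.11+); DENSE_RANK handles ties, which the LIMIT-based approach cannot:

  SELECT name, salary FROM (
    SELECT name, salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employee
  ) t
  WHERE t.rnk = 3;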

Add minutes to datetime in Hive

hadoop,hive,hiveql
Is there a function in Hive one could use to add minutes (as an int) to a datetime, similar to DATEADD(datepart,number,date) in SQL Server, where datepart can be minutes: DATEADD(minute,2,'2014-07-06 01:28:02') returns 2014-07-06 01:30:02. On the other hand, Hive's date_add(string startdate, int days) works in days. Is there any such function for minutes or hours?
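Hive of that era has no minute-level DATEADD, but the usual workaround goes through epoch seconds; a minimal sketch:

  SELECT from_unixtime(unix_timestamp('2014-07-06 01:28:02') + 2 * 60);    -- add 2 minutes
  SELECT from_unixtime(unix_timestamp('2014-07-06 01:28:02') + 3 * 3600);  -- add 3 hours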

How to make CREATE TABLE…AS SELECT in HIVE not populate data?

hadoop,hive,hsql
When I run CTAS in HIVE, the data is also populated at the same time. But I just want to create the table, not populate the data. What should I do? Thanks.
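Two hedged options: a CTAS whose predicate can never match copies only the schema of the SELECT (though it still launches a job), and CREATE TABLE ... LIKE copies the schema of an existing table without running any query (names are placeholders).

  CREATE TABLE new_table AS
  SELECT * FROM old_table WHERE 1 = 0;  -- no rows ever satisfy the predicate

  -- or, when the shape of an existing table is what you want:
  CREATE TABLE new_table LIKE old_table;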

Weird dse hive integration in DSE 4.7

hadoop,hive,datastax,datastax-enterprise
I'm trying to run Hive query over existing C* table. Here is my C* table definition: drop table IF EXISTS mydata.site_users; CREATE TABLE IF NOT EXISTS appdata.site_users ( user_id text, user_test_uuid uuid, --for testing purposes, if we can use it in queries, there could be some serde problems? user_name text,...

How to calculate Date difference in Hive

hadoop,hive,hiveql
I'm a novice. I have an employee table with a column specifying the joining date, and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do I calculate the datediff? Is there...
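A hedged sketch, assuming the joining date is stored as a 'yyyy-MM-dd' string and that "last 3 months" can be approximated as 90 days; datediff(enddate, startdate) returns the difference in days:

  SELECT name
  FROM employee
  WHERE datediff(to_date(from_unixtime(unix_timestamp())), joining_date) <= 90;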

Hive output larger than dfs blocksize limit

hadoop,hive,hdfs,partitioning
I have a table test which was created in hive. It is partitioned by idate and often partitions need to be inserted into. This can leave files on hdfs which have only a few rows. hadoop fs -ls /db/test/idate=1989-04-01 Found 3 items -rwxrwxrwx 3 deployer supergroup 710 2015-04-26 11:33 /db/test/idate=1989-04-01/000000_0...

Create External Hive Table Pointing to HBase Table

sql,hadoop,hive,hbase,impala
I have a table named "HISTORY" in HBase having column family "VDS" and the column names ROWKEY, ID, START_TIME, END_TIME, VALUE. I am using Cloudera Hadoop Distribution. I want to provide SQL interface to HBase table using Impala. In order to do this we have to create respective External Table...

Hadoop:Cascading FlowException

hadoop,hive,cascading
I installed hadoop 1.0.4 and hive 0.12. When I run the Cascading Pattern on this, it gives a Cascading flow exception. When I run it with the following hadoop command: hadoop jar bulid/libs/pattern-example*.jar I get the above-mentioned exception; for reference I include the Cascading code. Tap inputTap = new Hfs(new TextDelimited(true, "\t"), "hdfs://hdmaster:54310/user/hive/warehouse/temp/Dataformated/finalformated"); String classifyPath=Output...

How do I access HBase table in Hive & vice-versa?

hive,hbase,sqoop,apache-sqoop,apache-hive
As a developer, I've created an HBase table for our project by importing data from an existing MySQL table using a sqoop job. The problem is that our data analyst team are familiar with MySQL syntax, which implies they can query a HIVE table easily. For them, I need to expose the HBase table in HIVE. I...

Apache Kylin - wrong output at the first step of cube building

apache,hadoop,hive,olap,kylin
I'am trying to build my first cube using Apache Kylin, everything goes fine until last step where I'm getting error: java.lang.IllegalStateException: Can't get cube source record count. at com.google.common.base.Preconditions.checkState(Preconditions.java:149) at org.apache.kylin.job.cube.UpdateCubeInfoAfterBuildStep.doWork(UpdateCubeInfoAfterBuildStep.java:104) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) at...

HIVE: apply delimiter until a specified column

hadoop,datatable,hive,delimiter
I am trying to move data from a file into a hive table. The data in the file looks something like this: StringA StringB StringC StringD StringE where each string is separated by a space. The problem is that I want separate columns for StringA, StringB and StringC and one...

Hive: Kryo Exception

java,hadoop,hive,hiveql
I'm executing one of my HQL queries, which has a few joins, a union and an insert overwrite operation; it works fine if I run it just once. If I execute the same job a second time, I'm facing this issue. Can someone help me identify in which scenario we get this...

Selecting YYYYMM of the previous month in HIVE

sql,hive,hiveql
I am using Hive, so the SQL syntax might be slightly different. How do I get the data from the previous month? For example, if today is 2015-04-30, I need the data from March in this format 201503? Thanks! select employee_id, hours, previous_month_date--YYYYMM, from employees where previous_month_date = cast(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') as...
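A hedged sketch that avoids the "subtract 30 days" pitfall: subtracting the current day-of-month from today's date lands on the last day of the previous month, which is then reformatted to yyyyMM. Built only from long-standing Hive date functions; newer versions could use add_months instead.

  SELECT concat(substr(prev_month_day, 1, 4), substr(prev_month_day, 6, 2)) AS previous_month_yyyymm
  FROM (
    SELECT date_sub(to_date(from_unixtime(unix_timestamp())),
                    day(from_unixtime(unix_timestamp()))) AS prev_month_day  -- last day of previous month
    FROM employees LIMIT 1
  ) t;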

Hive query executing differently in Hive client and JDBC

hadoop,jdbc,hive,cloudera
The following query I executed via Hive client, Java program JDBC and beeline. SELECT * FROM TABLE_ONE AS t1 JOIN TABLE_TWO t2 ON t2.p_id = t1.p_id AND t2.p_n_id = t1.p_n_id AND t2.d_id = t1.d_id JOIN TABLE_THREE t3 ON t3.d_m_id = t1.d_m_id AND t3.d_p_id = t1.d_p_id JOIN TABLE_FOUR t4 ON t4.c_id...

Grant Permission in Hive

hadoop,hive,metastore,hortonworks
I'm trying to set Grant Permissions on Hive 14.0 using the Hortonworks Sandbox HDP2.2 as a one-node system. This question is similar to an unanswered question from March: grant permissions in hive does not work on hdp2.2. I followed the configuration steps of the Hive wiki. After logging into the hive shell (ssh as root)...

NoSuchMethodError when hive.execution.engine value its tez

java,apache,hadoop,hive,bigdata
I am using hive 1.0.0 and apache tez 0.4.1 When I configure hive to use tez I get an exception. In hive-site.xml when the hive.execution.engine value is mr its works fine. But if I set it to tez I get this error: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:169)...

Reading many small files from S3 very slow

amazon-web-services,amazon-s3,hive,apache-pig,elastic-map-reduce
Loading many small files (>200000, 4kbyte) from a S3 Bucket into HDFS via Hive or Pig on AWS EMR is extremely slow. It seems that only one mapper is used to get the data, though I cannot exactly figure out where the bottleneck is. Pig Code Sample data = load...

Import .sql file in Hive

sql,bash,hadoop,hive
I have the file named Hive_Tables.sql having the following code: CREATE EXTERNAL TABLE IF NOT EXISTS HISTORY ( ROWKEY STRING, ID STRING, START_TIME STRING, END_TIME STRING, VALUE DOUBLE ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,VDS:ID,VDS:START_TIME,VDS:END_TIME,VDS:VALUE" ) TBLPROPERTIES("hbase.table.name" = "HISTORY"); How...
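Two common ways to run such a script, sketched: from the OS shell with hive -f, or from inside an already-open Hive CLI session with SOURCE.

  -- from the operating-system shell (one line):  hive -f Hive_Tables.sql
  -- from within the hive CLI prompt:
  source /path/to/Hive_Tables.sql;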

Select top 2 rows in Hive

hadoop,hive,hiveql
I'm a newbie here. I'm trying to retrieve the top 2 rows from my employee table based on salary in hive (version 0.11). Since it doesn't support a TOP function, are there any alternatives? Or do we have to define a UDF?
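A minimal sketch; Hive 0.11 supports ORDER BY with LIMIT, so no UDF is needed (the global sort runs on a single final reducer):

  SELECT name, salary
  FROM employee
  ORDER BY salary DESC
  LIMIT 2;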

trying out derby and following instructions produces java.lang.ExceptionInInitializerError error

java,jdbc,hive,derby
I am following the derby tutorial from: http://db.apache.org/derby/papers/DerbyTut/ I successfully installed the derby on the system: [email protected]:/usr/local/derby$ java org.apache.derby.tools.sysinfo ------------------ Java Information ------------------ Java Version: 1.7.0_79 Java Vendor: Oracle Corporation Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre Java classpath:...

How can I parse a Json column of a Hive table using a Json serde?

json,hadoop,hive
I am trying to load de-serialized json events into different tables, based on the name of the event. Right now I have all the events in the same table, the table has only two columns EventName and Payload (the payload stores the json representation of the event): CREATE TABLE event(...
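Besides a SerDe on the whole table, individual fields can be pulled out of the Payload column at query time; a hedged sketch with get_json_object (the JSON paths and the event name are illustrative):

  SELECT get_json_object(payload, '$.user.id')   AS user_id,
         get_json_object(payload, '$.timestamp') AS event_ts
  FROM event
  WHERE eventname = 'purchase';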

Is it possible to concat a string field after group by in Hive

hive,cloudera-cdh
I am evaluating Hive and need to do some string field concatenation after group by. I found a function named "concat_ws" but it looks like I have to explicitly list all the values to be concatenated. I am wondering if I can do something like this with concat_ws in Hive....
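A minimal sketch (names are placeholders): collect_set gathers the group's values into an array, and concat_ws joins that array with a separator, so the values never have to be listed explicitly.

  SELECT key_col,
         concat_ws(',', collect_set(str_col)) AS concatenated_values
  FROM my_table
  GROUP BY key_col;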

can't get hive to accept json file

json,hive
I am following a simple hive json serde tutorial, but can't get hive to accept a json file that looks totally correct. { "id": 596344698102419456, "created_at": "MonApr0101: 32: 06+00002013", "source": "<ahref="http: //google.com"rel="nofollow">RihannaQuotes</a>", "favorited": False } CREATE EXTERNAL TABLE tweets ( id BIGINT, created_at STRING, source STRING, favorited BOOLEAN ) ROW...

ERROR jdbc.HiveConnection: Error opening session Hive

java,hadoop,jdbc,hive
I am trying to run JDBC code for Hive2 and get an error. I have hive version 1.2.0 and hadoop version 1.2.1. On the command line, hive and beeline work fine without any problem, but with JDBC I am getting an error. import java.sql.SQLException; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.sql.DriverManager; public class HiveJdbcClient { private static...

get avg of count in hive

hive,hiveql
I am trying to get the avg of the result of a count query. In the Hive documentation I read that it is impossible, so for this reason I tried it this way: 1) CREATE VIEW clicks_pais_totalView AS SELECT p.pais as pais, count(1) as numeroClicks FROM clicks_data_mat p WHERE p.pais is not NULL...
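The average of a count can also be computed in a single statement by nesting the aggregation, without materializing a view; a hedged sketch reusing the question's table and columns:

  SELECT avg(numeroClicks) AS avg_clicks
  FROM (
    SELECT p.pais, count(1) AS numeroClicks
    FROM clicks_data_mat p
    WHERE p.pais IS NOT NULL
    GROUP BY p.pais
  ) t;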

Add PARTITION after creating TABLE in hive

hadoop,hive,partition
I have created a non-partitioned table and loaded data into it. Now I want to add a PARTITION based on department to that table; can I do this? If I do: ALTER TABLE Student ADD PARTITION (dept='CSE') location '/test'; it gives me the error: FAILED: SemanticException table is not...

How to load DataFrame directly to Hive in Spark

hive,apache-spark,apache-spark-sql
Is it possible to save a DataFrame in Spark directly to Hive? I have tried converting the DataFrame to an RDD, then saving it as a text file and then loading it into hive. But I am wondering if I can directly save the dataframe to hive...

Not able to connect to hive on AWS EMR using java

hadoop,amazon-web-services,hive,emr
I have setup AWS EMR cluster with hive. I want to connect to hive thrift server from my local machine using java. I tried following code- Class.forName("com.amazon.hive.jdbc3.HS2Driver"); con = DriverManager.getConnection("jdbc:hive2://ec2XXXX.compute-1.amazonaws.com:10000/default","hadoop", ""); http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html.As mentioned in the developer guide, added jars related with hive jdbc driver to class path. But I am...

Custom Record Delimiter for HIVE

hadoop,hive
For Hive version 0.14: can we provide a custom record delimiter "\r\r\n" instead of the defaults ["\r", "\n", "\r\n"]? As a result, in my case 2 lines become 4 lines in HIVE because of the default line separators, whereas I needed "\r\r\n" to be the line separator....

schedule and automate sqoop import/export tasks

shell,hadoop,automation,hive,sqoop
I have a sqoop job which requires to import data from oracle to hdfs. The sqoop query i'm using is sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '1' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test1...

Selecting the first day of the month in the HIVE

sql,hive,hiveql
I am using Hive (which is similar to SQL, but the syntax can be a little different for SQL users). I have looked at other stackoverflow questions, but they seem to be in SQL with different syntax. I am trying to get the first day of the month...
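A hedged sketch: subtracting (day-of-month - 1) days from the current date yields the first day of the month using only older built-ins; Hive 1.2+ also has trunc(date, 'MM') for the same thing (the table name is a placeholder one-row source).

  SELECT date_sub(to_date(from_unixtime(unix_timestamp())),
                  day(from_unixtime(unix_timestamp())) - 1) AS first_day_of_month
  FROM some_table LIMIT 1;

  -- Hive 1.2 and later:
  -- SELECT trunc(current_date, 'MM');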

calling hive -e from a python script

python,hive,cloudera-cdh
I would like to run a very simple hive command from within my python script. I am trying to use hive -e, but I am getting an error def hive(): cmd = "hive -e \"msck repair table dashboard_report\"" print(cmd) check_call(cmd) This is the error I am getting hive -e "msck...

How to get Substring in Hadoop Hive?

mysql,hadoop,hive
My question is how to get a substring in Hive by an indication in the string. My column values have this format: /Country/State/City/Suburb/Street Here I only need to get Country. I have found that SPLIT returns an array of strings delimited by '/'. And also SUBSTR(string a, int...
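Since the value always starts with '/', a hedged one-liner using SPLIT is usually enough: index 0 is the empty string before the first slash, so the country sits at index 1 (column and table names are placeholders).

  SELECT split(path_col, '/')[1] AS country
  FROM my_table;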

HIVE or SQL query to compare pre and post sales for same sample size

mysql,sql,hive
I have a table which has columns as employeeIDs (string), performance rating(int), date(string), along with flag account (string) if subscribed( Account = 'yes' after subscription and 'no' before subscription) DIFFERENT EMPLOYEES subscribe on different dates, pre = before subscription post = after subscription Need to calculate their sum performance...

what does this notation in a hive script (hivequery.hql) file mean: "use ${word:word}"?

linux,bash,shell,hadoop,hive
The script (hivequery.hql) looks like this: Use ${platformType:platformName}; select * from hivetablename; And this script is being called in a bash script as #!/usr/bin/env bash hive -f hivequery.hql ...

Losing decimal places in hive tables

hive,hdinsight
I created float columns for a hive table. I then uploaded some lat/lng data: -74.223827433599993,40.698842158200002 -117.57601661229999,34.044852414099999 -81.055917609600002,29.239413820500001 -80.369884327199998,25.789129988199999 When I query the data out or into another table, the rounding is significant: -74.22383,40.69884 -117.57602,34.044853 -81.055916,29.239414 -80.36988,25.78913 ...

Hive external table not reading entirety of string from CSV source

csv,hadoop,hive,hiveql
Relatively new to the Hadoop world so apologies if this is a no-brainer but I haven't found anything on this on SO or elsewhere. In short, I have an external table created in Hive that reads data from a folder of CSV files in HDFS. The issue is that while...

how do I retrieve a specific row in Hive?

hadoop,hive,hiveql
I have a dataset that looks like this (cust | cost | cat | name): 1 | 2.5 | apple | pkLady; 1 | 3.5 | apple | greenGr; 1 | 1.2 | pear | yelloPear; 1 | 4.5 | pear | greenPear. my hive...

HDinsight hive output to blob

azure,hadoop,hive,hdinsight
I am using Hive on HDinsight, and I want to store the output of the job in Azure storage (blob). I tried INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/' SELECT name, COUNT(*) as count FROM test GROUP BY name ORDER BY count DESC But this returned error "Error: java.lang.RuntimeException: Error in configuring object"....

Deleting rows from Hive external table

csv,hive
I have a Hive external table created from a list of CSV files. If I perform a Hive delete on a table row (or a number of rows), will the corresponding CSV record/s will be deleted as well?...

Table join in hive fails

hive
I have the following data for the 2 tables in hive. table1 contains (locn, zone): NY AMERICA/CHICAGO; KC AMERICA/DENVER; LA AMERICA/CHICAGO. table2 contains (stdtime, locn): 2015-03-04 15:00:00 NY; 2015-03-04 16:00:00 KC. This is my join query: select s.zone,t.stdtime,to_utc_timestamp(t.stdtime,s.zone) as newtime from table1 s inner join table2 t on s.locn=t.locn; When...

Hive: filter a table using another table

sql,filter,hive
I am fairly new to hive and sql and I have a question about how I would go about the following: I have table A: Name id Amy 1 Bob 4 Josh 9 Sam 6 And I want to filter it using values from another table (table B): Value id...
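A minimal sketch using LEFT SEMI JOIN, Hive's idiom for "keep the rows of A whose id appears in B" (equivalent to an IN/EXISTS subquery, which older Hive versions did not support directly):

  SELECT a.name, a.id
  FROM table_a a
  LEFT SEMI JOIN table_b b ON (a.id = b.id);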

Run spark SQL on CDH 5.4.1 NoClassDefFoundError

hive,apache-spark,apache-spark-sql,pyspark
I set up my CDH 5.4.1 to run some test Spark SQL on Spark. Spark works well, but Spark SQL has some issues. I start pyspark as below: /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/spark/bin/pyspark --master yarn-client I want to select a table in Hive with Spark SQL: results = sqlCtx.sql("SELECT * FROM my_table").collect() It prints error logs:...

Hadoop append data to hdfs file and ignore duplicate entries

java,hadoop,mapreduce,hive,hdfs
How can I append data to HDFS files and ignore duplicate values? I have a huge HDFS file (MainFile) and I have 2 other new files from different sources, and I want to append data from these files to the MainFile. The MainFile and the other files have the same structure....

Explode function returning single row

hadoop,hive,hql
I used the field type as Array. "Select col as sample_table" returns the below output. ["[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]"] When I used select explode(col) from sample_table. I get the output as below which is a single row. [-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442] I want the output in 3 rows as below. [-80.86598534884655,35.53423185253291] [-80.86598789514547,35.53423048990488]...

SQL to Hive query syntax

hive
How can I turn the below SQL to Hive query? I get an error FAILED: ParseException line 2:31 cannot recognize input near 'select' 't2' '.' in expression specification select ip, sum(case when score <> (select t2.score from score t2 where t2.timestamp = (select max(timestamp) from score where ip = t2.ip...

Reducers for Hive data

mapreduce,hive
I'm a novice. I'm curious to know how reducers are set to different hive data sets. Is it based on the size of the data processed? Or a default set of reducers for all? For example, 5GB of data requires how many reducers? will the same number of reducers set...
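Hive sizes the reducer count from the input data unless it is overridden; a hedged sketch of the settings involved (the values are illustrative, not recommendations): roughly one reducer per bytes-per-reducer of input, capped by the maximum, unless an exact count is forced.

  SET hive.exec.reducers.bytes.per.reducer=256000000;  -- input bytes handled by each reducer
  SET hive.exec.reducers.max=128;                      -- upper bound on the number of reducers
  SET mapred.reduce.tasks=10;                          -- force an exact reducer count for the job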

load struct or any other complex data type in hive

hadoop,hive,hiveql
I have a .xlsx file which contains data something like the image below. I am trying to create the table using the below create query: create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>, male_15_19 STRUCT<num:double, total:double, perc:double>, male_20_24 STRUCT<num:double, total:double, perc:double>,...

Why is hive Metatool updatelocation called upon Ambari migration from 1.6.0 to 2.0.0 to move locations to unwanted places?

hadoop,hive,hortonworks-data-platform,ambari
I am migrating my HDP2.1 hadoop cluster to HDP2.2.4. The first step is to migrate ambari from 1.6.0 to 2.0.0. After completing this step, I restarted my services. Starting "HiveServer2" through Ambari 2.0 fails whereas sudo service hive-server2 start, subsequent hive requests, and Ambari Hive Service check works. It fails...

passing argument from shell script to hive script

bash,hadoop,hive
I have a concern which can be categorized in 2 ways: my requirement is passing an argument from a shell script to a hive script, OR, within one shell script, I should include a variable's value in a hive statement. I'll explain with an example for both: 1) Passing an argument from a shell script to HiveQL ->
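A hedged sketch of the hivevar mechanism (names are placeholders): the shell side passes the value with --hivevar, and the HiveQL side references it with ${hivevar:...}.

  -- shell side (one line):  hive --hivevar run_date='2015-05-01' -f myscript.hql
  -- myscript.hql:
  SELECT *
  FROM sales
  WHERE dt = '${hivevar:run_date}';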

Automating Hive with python

python,python-2.7,hadoop,hive
I am running hive 0.12, and I'd like to run several queries and get the result back as a python array. for example: result=[] for col in columns: sql='select {c} as cat,count(*) as cnt from {t} group by {c} having cnt > 100;'.format(t=table,c=col) result.append(hive.query(sql)) result=dict(result) What I'm missing, is the...

Adding a storage container to Azure with different account?

azure,hive,windows-azure-storage,azure-storage-blobs,hdinsight
We are trying to use common data for more than one Outlook account. Let's say the data is stored in a container which belongs to [email protected] and I want to read it as [email protected], and my friend wants to read from [email protected]. I have the common account's storage account name, container name...

Insert Overwrite: Cannot move … to the trash, as it contains the trash

hive,hdinsight
I am attempting to insert into a table by selecting from another: INSERT OVERWRITE TABLE testtable1 select * from testtable0 The error: Moving data to: wasb://{container}@{storage}.blob.core.windows.net/hive/scratch/hive_2015-06-01_15-05-14_062_6478651325775395196-1/-ext-10000 Loading data to table default.testtable1 rmr: DEPRECATED: Please use 'rm -r' instead. rmr: Cannot move "wasb://{container}@{storage}.blob.core.windows.net/" to the trash, as it contains the trash....

Impala mathematical operation containing avg fails with AnalysisException

hive,impala
I am attempting to subtract a value in a column (column_18) from the average of another column (avg(column_19)) and obtain this result as a third column (result) for each row of the table: cur.execute("Select avg(column_19) - column_18 as result FROM test1") This doesn't seem to be working well, and I...
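Mixing an aggregate with a non-grouped column in one select list is what typically triggers the AnalysisException; a hedged rewrite computes the average once in a derived table and joins it back (should be valid in both Impala and Hive):

  SELECT t.column_18,
         a.avg_19 - t.column_18 AS result
  FROM test1 t
  CROSS JOIN (SELECT avg(column_19) AS avg_19 FROM test1) a;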

Hive shell throws Filenotfound exception while executing queries, inspite of adding jar files using “ADD JAR”

java,hadoop,hive,hdfs,hiveql
1) I have added serde jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;" 2) Create table 3) The table is creates successfully 4) But when I execute any select query it throws file not found exception hive> select count(*) from tab_tweets; Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806 Total jobs = 1 Launching Job 1...

Should 'show table extended' list the files under partitions?

hive,hdfs,hdinsight
I have separated files by year for an external table. I added partitions for each year. I ran the show table extended command and it shows: partitioned:true partitionColumns:struct partition_columns { i32 year} totalNumberFiles:0 totalFileSize:0 Should it list the files in each partition's location? DROP TABLE IF EXISTS myTable; CREATE EXTERNAL...

Error while creating external table in Hive using EsStorageHandler

hadoop,elasticsearch,hive
I am facing an error while creating an External Table to push the data from Hive to ElasticSearch. What I have done so far: 1) Successfully set up ElasticSearch-1.4.4 and is running. 2) Successfully set up Hadoop1.2.1, all the daemons are up and running. 3) Successfully set up Hive-0.10.0. 4)...

sqoop-export is failing when I have \N as data

hive,sqoop
I am getting the below error when I run my sqoop export command. This is my content to be exported by the sqoop command: 00001|Content|1|Content-article|\N|2015-02-1815:16:04|2015-02-1815:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N 00002|Content|1|Content-article|\N|2015-02-1815:16:04|2015-02-1815:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N sqoop command sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema...

Data in HDFS files not seen under hive table

hadoop,hive,sqoop,hadoop-partitioning
I have to create a hive table from data present in oracle tables. I'm doing a sqoop, thereby converting the oracle data into HDFS files. Then I'm creating a hive table on the HDFS files. The sqoop completes successfully and the files also get generated in the HDFS target directory....

Spark SQL HiveContext - saveAsTable creates wrong schema

hive,apache-spark,apache-spark-sql
I try to store a Dataframe to a persistent Hive table in Spark 1.3.0 (PySpark). This is my code: sc = SparkContext(appName="HiveTest") hc = HiveContext(sc) peopleRDD = sc.parallelize(['{"name":"Yin","age":30}']) peopleDF = hc.jsonRDD(peopleRDD) peopleDF.printSchema() #root # |-- age: long (nullable = true) # |-- name: string (nullable = true) peopleDF.saveAsTable("peopleHive") The Hive...

Minus query in HIVE

hive
The MINUS query does not seem to work in HIVE. Example tried: select x from abc minus select x from bcd; Am I doing this wrong, or is the MINUS query not defined for HIVE? If so, is there any other way to get the same result?...
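MINUS/EXCEPT is not available in Hive of that vintage; the usual rewrite is an anti-join, sketched below (add DISTINCT if MINUS's duplicate elimination is also needed):

  SELECT a.x
  FROM abc a
  LEFT OUTER JOIN bcd b ON (a.x = b.x)
  WHERE b.x IS NULL;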