FAQ Database Discussion Community

Jdbc settings for connecting to Impala

What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)? I have tried a few including: jdbc:hive2://myserver:21050/;auth=noSasl And with the following driver: org.apache.hive.jdbc.HiveDriver I have added /opt/cloudera/parcels/CDH/lib/impala/lib/*:/opt/cloudera/parcels/CDH/lib/hive/lib/* to the classpath (but still no success) The result is: java.sql.SQLException: No suitable driver found for...

Impala mathematical operation containing avg fails with AnalysisException

I am attempting to subtract a value in a column (column_18) from the average of another column (avg(column_19)) and obtain this result as a third column (result) for each row of the table: cur.execute("Select avg(column_19) - column_18 as result FROM test1") This doesn't seem to be working well, and I...

RImpala: Query Failed When Larger Data

check1<-rimpala.query("select * from sum2") Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.sql.SQLException: Method not supported dim(sum2) is 49501 rows and 18 columns. check1<-rimpala.query("select *from sum3") dim(sum3) is 102 rows and 6 columns. It worked with smaller sample size. sorry that I cant reproduce example to this. Is anyone encounter the...

Impala SQL - How to Truncate Timestamp to Day?

Using Cloudera's Impala SQL, is there a way to truncate a timestamp by day? i.e. go from: 2015-05-01 01:23:45 -> 2015-05-01 00:00:00 2015-05-01 12:34:56 -> 2015-05-01 00:00:00 2015-05-01 23:45:59 -> 2015-05-01 00:00:00 The default TRUNC options only seem to enable Week or Hour, not Day......

What happens if impala Query runs out of memory?

What happens if impala Query runs out of memory, 1.) Does the Impala Daemon Crash 2.)Or Writes to disk(Like Spills onto disk and becomes slower!!!) A detailed explanation would help! Thanks in advance!...

How to select unique columns from X columns but show X + Y columns from the results in SQL?

This table is on our impala cluster, although, the impala shell seems very similar to SQL. I have very little SQL or Impala experience, outside of doing extremely simple queries so this problem might be way above my experience and pay grade. I need to get some data from our...

Where is impala data storage allocated ?

Is there any designated directory for impala to store data in case the table is entirely managed by impala? Or it is up to datanode to allocate the data to anywhere in HDFS configured by datanode directory such as /dfs/dn? In case my app is running a node that is...

Access denied when “LOAD DATA” in hive

I have read access to an HDFS path 'XXX', when I execute some statements like "LOAD DATA INPATH 'XXX' into table YYY ..." An error occurs: ERROR: AuthorizationException: User 'ZZZ' does not have privileges to access: 'XXX' Why a "LOAD DATA" statement need more privileges than "READ"?...

Faster technologies compared to Impala in Hadoop stack?

Have been using Impala for over 4 months and we have found it to be very fast compared to hive in most cases. Since we are in a process of assessing fast frameworks in Hadoop for data processing wanted to know if their are any more frame works in the...

Is there any way to run impala shell with sql script with parameters?

Is there any way to run impala shell with sql script with parameters? For example impala-shell –f /home/john/sql/load.sql /dir1/dir2/dir3/data_file I got errors Error, could not parse arguments "–f /home/john/sql/load.sql /dir1/dir2/dir3/data_file”...

Hive Query Language return only values where NOT LIKE a value in another table

I'm trying find all the values in my hosts table, which do not contain partial match to values in my maildomains table. hosts +-------------------+-------+ | host | score | +-------------------+-------+ | www.gmail.com | 489 | | www.hotmail.com | 653 | | www.google.com | 411 | | w3.hotmail.ca | 223 |...

Cloudera Hadoop quick Start VM Impala Error

I am trying to run impala on cloudera quick start vm. I installed impala / impala-server / impala-state-store / impala-catalog. Then I did impala-shell and got following message : `Starting Impala Shell without Kerberos authentication Error connecting: TTransportException, Could not connect to localhost.localdomain:21000 Welcome to the Impala shell. Press TAB...

Impala SQL - How to count Months between timestamps?

Using Impala SQL, is there a concise way to count the # of months between two timestamps? The datediff function only returns the # of days between two timestamps, but I'm hoping there is an elegant way to perform the same calculation on # of months. ...

Is Impala nested data available ?

according to this http://impala.io/overview.html, impala 2.1 should support nested data. I am using impala 2.1.1 but didn't find any documentation about nested data. Does anyone know when nested data will be supported by Impala?

Create External Hive Table Pointing to HBase Table

I have a table named "HISTORY" in HBase having column family "VDS" and the column names ROWKEY, ID, START_TIME, END_TIME, VALUE. I am using Cloudera Hadoop Distribution. I want to provide SQL interface to HBase table using Impala. In order to do this we have to create respective External Table...

Impyla error when port is specified

I am using impyla 0.9.0, if I specify port in the connect conn = impala.dbapi.connect(host='n1', port=21000) I will get the following error Traceback (most recent call last): File "./myquery.py", line 78, in <module> main(len(sys.argv), sys.argv) File "./myquery.py", line 58, in main cur = conn.cursor() File "/usr/lib/python2.6/site-packages/impala/dbapi/hiveserver2.py", line 55, in cursor...

Divide columns values by total rows in impala

SELECT COUNT(DISTINCT cgi.sample_idSince Impala does not allow the SET operation, or subqueries in a select statement, I'm having a hard time figuring out how to divide column values by the total number of rows returned. My ultimate goal is to calculate minor allele frequency at each chr:start position. My data...

Impala Query: Combine multiple COUNT DISTINCT WHERE clauses

On impala, I'm trying to count the number of distinct sample id's that contain "101", "102" or "103". Here is an example of my data: | sample_id | ___________ | 101-345-5 | | 101-345-6 | | 101-345-6 | | 102-345-5 | | 103-345-5 | | 103-345-8 | | 103-345-8 |...

Is it possible to load parquet table directly from file?

If I have a binary data file(it can be converted to csv format), Is there any way to load parquet table directly from it? Many tutorials show loading csv file to text table, and then from text table to parquet table. From efficiency point of view, is it possible to...

Fetch HBase Column in Bash Array Using Impala

I have following data in HBase Table named HISTORY. ID VALUES 51 101 52 102 QUERY="SELECT VALUES FROM HISTORY"; How to apply the above query on HBase table to fetch data in bash array using Impala?...

SQL find records from table 1 that are not in table 2 OR in table 2 with condition

I see many examples on how to find records that are not in another table, but I'm having a lot of trouble finding records that are either not in table 2, or are in table two, but the freq column value is less than 10%. I'm first joining a list...

Why Impala generates multiple files for one insert statement

It is supposed that only one file should be generated for a single "insert...select" statement, while 20 files are generated in my case. How can I reduce the result into one single file?