Cloudera Hadoop QuickStart VM Impala error

virtual-machine,cloudera,impala
I am trying to run Impala on the Cloudera QuickStart VM. I installed impala, impala-server, impala-state-store, and impala-catalog. Then I ran impala-shell and got the following message: `Starting Impala Shell without Kerberos authentication Error connecting: TTransportException, Could not connect to localhost.localdomain:21000 Welcome to the Impala shell. Press TAB...

Cloudera installation failed to detect root privileges on CentOS

linux,ssh,centos,cloudera,cloudera-manager
I tried adding a new host to the cluster on CentOS. It fails on install and gives "Installation failed. Failed to detect root privileges" in the status. I know that Cloudera needs the user to have passwordless privileges ("Root access to your hosts is required to install the Cloudera packages. This installer...

Convert JSON data into a specific table format using Pig

json,hadoop,apache-pig,bigdata,cloudera
I have a JSON file with the following format: "Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}] "Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}] I want to extract the data in table format from the above JSON using Pig: Expected Format: Note: in the first record the C column should be blank or null because in the first record there is no...
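To make the target shape concrete, here is a minimal Python sketch (not Pig) of the pivot the question seems to describe: each "K" becomes a column and each "V" a cell, with missing keys left blank. The column order A, B, C, D and the function name `pivot_properties` are assumptions for illustration, and the sample values are trimmed versions of those in the excerpt.

```python
import json

def pivot_properties(records, columns):
    """Turn each list of {"K", "T", "V"} dicts into one row, ordered by `columns`."""
    rows = []
    for props in records:
        by_key = {p["K"]: p["V"] for p in props}
        # A key that is absent from a record becomes None (a blank cell).
        rows.append([by_key.get(col) for col in columns])
    return rows

# Sample records modeled on the excerpt; the first one has no "C" entry.
raw = [
    '[{"K":"A","T":"String","V":"M"},{"K":"B","T":"String","V":"N"},'
    '{"K":"D","T":"String","V":"O"}]',
    '[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},'
    '{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]',
]
records = [json.loads(line) for line in raw]
columns = ["A", "B", "C", "D"]  # assumed column order
rows = pivot_properties(records, columns)

# Print a tab-separated table, rendering None as an empty cell.
print("\t".join(columns))
for row in rows:
    print("\t".join("" if v is None else v for v in row))
```

A Pig script would presumably need the same two steps: build a key-to-value map per record, then project a fixed list of keys.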

Cloudera Twitter Hive query failure

twitter,cloudera,hadoop-streaming
Team, I'm curious to know whether anyone has succeeded in executing the query for the Cloudera Twitter example. I added the mentioned SerDe JAR to the Beeswax file resources as a JAR, but I still get an error for any query. Query: SELECT t.retweeted_screen_name, sum(retweets) AS total_retweets, count(*) AS tweet_count FROM (SELECT retweeted_status.user.screen_name as retweeted_screen_name, retweeted_status.text, max(retweet_count)...

Oozie null pointer exception when submitting jobs

cloudera,oozie,oozie-coordinator
I'm just trying to run a very simple word count example, but I get the following null pointer exception when submitting the job: oozie job -oozie=http://localhost:11000/oozie/ -config job.properties -run [[email protected] Oozie_Example]$ oozie job -oozie=http://localhost:11000/oozie/ -config job.properties -run java.lang.RuntimeException: java.lang.NullPointerException at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1242) at...

Get country from tweet with certain keywords

java,twitter4j,cloudera,flume
I am using TwitterSource for Flume from Cloudera. I want to get tweets by country with certain keywords. I'm not sure what to compare against when I want to get tweets from the Netherlands. I have the following, which results in nothing being processed: public void onStatus(Status status) { if(status.getPlace().getCountry().equalsIgnoreCase("netherlands"))...

Why does automatic failover break when running both HA HDFS and MR1?

hadoop,cloudera,cloudera-cdh
I am re-configuring a Hadoop cluster to use the High Availability (HA) features for both the shared filesystem and the MR1 jobtracker. It seems I can't get the automatic failover features for both to work at the same time. Instead one of the services is stuck with both (all) daemons...

Oozie date time start

cloudera,hue,restfb,flume-ng,oozie-coordinator
I have a custom source of my own running in my flume.config that is responsible for extracting data from a Facebook page every hour. I'm wondering if there is any way to align the extraction period with the start time of my coordinator. For example, I set my coordinator to...

Can Apache Drill work with Cloudera Hadoop?

cloudera,apache-drill
I am trying to set up Apache Drill in distributed mode. I already have a Cloudera Hadoop cluster with a master and 2 slaves. From the Apache Drill documentation, it's not very clear whether it can be set up with a typical Cloudera cluster. I could not find any relevant articles. Any...

What does “Encountered: after : ”“ ” mean using Pig

hadoop,apache-pig,cloudera
I am a beginner with Hadoop and Pig. I examined the example provided in the Cloudera virtual image and modified it to count the top 5 most frequent words: Lines = LOAD '/user/hue/pig/examples/data/midsummer.txt' as (line:CHARARRAY); Words = FOREACH Lines GENERATE FLATTEN(TOKENIZE(line)) AS word; Groups = GROUP Words BY word; Counts = FOREACH Groups...
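For comparison, the tokenize, group, count, and take-the-top-N pipeline that the script is aiming for can be sketched in a few lines of Python. This is only an illustration of the intended result, not a fix for the Pig error. The function name `top_words` and the sample lines are made up, and whitespace splitting is a simplification of Pig's TOKENIZE, which also splits on some punctuation.

```python
from collections import Counter

def top_words(lines, n=5):
    """Count whitespace-separated words across lines and return the n most frequent."""
    counts = Counter()
    for line in lines:
        # Simplified tokenizer: split on whitespace only.
        counts.update(line.split())
    return counts.most_common(n)

# Hypothetical sample input standing in for the lines of midsummer.txt.
sample = [
    "the course of true love",
    "never did run smooth",
    "the the love",
]

for word, count in top_words(sample, n=5):
    print(word, count)
```

In Pig terms, this corresponds roughly to ordering the counts in descending order and applying LIMIT 5.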

NameError: uninitialized constant SingleColumnValueFilter

hbase,cloudera
I am trying to use an HBase filter with this code: hbase(main):001:0> scan 'students', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('account'),Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'),BinaryComparator.new(Bytes.toBytes('emp1')))} and it gives this error: NameError: uninitialized constant SingleColumnValueFilter Please let me know what I am doing wrong or what I need to do to get the filter result....

Broken packages error while installing zookeeper-server

hadoop,cloudera,zookeeper,apt,cloudera-cdh
I am running Ubuntu 14.04 LTS (Trusty) and was trying to install Cloudera Hadoop with YARN by following this tutorial. Under the options I chose "To add the CDH 5 repository:" and customized the URL for the Trusty OS. Then I installed ZooKeeper, but while installing zookeeper-server it gives me the following...

Flume-ng HDFS sink .tmp file refresh rate control property

cloudera,flume,hortonworks-data-platform,flume-ng,flume-twitter
I am trying to refresh the .tmp file with additional events every 5 minutes. My source is slow, and it takes 30 minutes to get a 128 MB file in my HDFS sink. Is there any property in the Flume HDFS sink with which I can control the refresh rate of the .tmp file...

Copy files from remote Unix and Windows servers into HDFS without intermediate staging

hadoop,hdfs,cloudera,biginsights,hortonworks
I am trying to find out whether there is any way to copy files from remote Unix and Windows servers into HDFS from the command line without intermediate staging. Thanks for the help...

Switch a disk containing cloudera hadoop / hdfs / hbase data

hadoop,hbase,database-migration,cloudera,disk-partitioning
We have a Cloudera 5 installation running as a single node on a single server. Before adding 2 additional nodes to the cluster, we want to increase the size of the partition using a fresh new disk. We have the following services installed: yarn with 1 NodeManager 1 JobHistory and...

What is the benefit of using CDH (Cloudera)? [closed]

hadoop,bigdata,apache-spark,cloudera,cloudera-cdh
Why do we use CDH (Cloudera) instead of using Apache Hadoop or Apache Spark etc. on their own? What are its advantages? If I want to use Apache Spark for data analysis, is it better to use CDH or the Apache Spark framework on its own? Thanks...

YARN UNHEALTHY nodes

hadoop,distributed-computing,cloudera,yarn,cloudera-cdh
In our YARN cluster, which is 80% full, we are seeing that some of the YARN NodeManagers are marked as UNHEALTHY. After digging into the logs, I found it's because disk space is 90% full for the data dir, with the following error: 2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4...

CDH autodeployment via API does not set the CDH version for the hosts

python,api,hadoop,cloudera
I am trying to automatically deploy version 4.7.1 of Cloudera CDH using the Python API of Cloudera Manager 5.3.1. I am following the example here: https://github.com/cloudera/cm_api/blob/master/python/examples/auto-deploy/deploycloudera.py Once I init the cluster, create all the services that I need (Zookeeper, HDFS, MapReduce, and HBase) and start the cluster,...

What happens if an Impala query runs out of memory?

hadoop,cloudera,impala,mpp
What happens if an Impala query runs out of memory? 1) Does the Impala daemon crash? 2) Or does it write to disk (spilling to disk and becoming slower)? A detailed explanation would help. Thanks in advance!...

Hive Query Language return only values where NOT LIKE a value in another table

hadoop,hive,cloudera,hiveql,impala
I'm trying to find all the values in my hosts table that do not contain a partial match to values in my maildomains table. hosts +-------------------+-------+ | host | score | +-------------------+-------+ | www.gmail.com | 489 | | www.hotmail.com | 653 | | www.google.com | 411 | | w3.hotmail.ca | 223 |...
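The filtering logic being asked for is an anti-join on a substring condition. A small Python sketch of it follows, using the four host rows visible in the excerpt. The maildomains values ("gmail", "hotmail") are hypothetical because that table is cut off in the excerpt, and `hosts_without_match` is an illustrative name.

```python
def hosts_without_match(hosts, maildomains):
    """Keep (host, score) rows whose host contains none of the maildomains as a substring."""
    return [
        (host, score)
        for host, score in hosts
        if not any(domain in host for domain in maildomains)
    ]

# Rows taken from the visible part of the hosts table.
hosts = [
    ("www.gmail.com", 489),
    ("www.hotmail.com", 653),
    ("www.google.com", 411),
    ("w3.hotmail.ca", 223),
]
# Hypothetical contents; the real maildomains table is not shown in the excerpt.
maildomains = ["gmail", "hotmail"]

for host, score in hosts_without_match(hosts, maildomains):
    print(host, score)
```

In SQL terms, this is "keep a host only if no maildomains row matches it", which is the shape a HiveQL or Impala solution would also need to express.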

Hive query executing differently in Hive client and JDBC

hadoop,jdbc,hive,cloudera
I executed the following query via the Hive client, a Java program using JDBC, and Beeline. SELECT * FROM TABLE_ONE AS t1 JOIN TABLE_TWO t2 ON t2.p_id = t1.p_id AND t2.p_n_id = t1.p_n_id AND t2.d_id = t1.d_id JOIN TABLE_THREE t3 ON t3.d_m_id = t1.d_m_id AND t3.d_p_id = t1.d_p_id JOIN TABLE_FOUR t4 ON t4.c_id...

Change IP address of a Hadoop HDFS data node server and avoid Block pool errors

hadoop,hdfs,cloudera,cloudera-manager
I'm using the Cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, on one of the nodes (old IP: 10.88.76.223, new IP: 10.88.69.31) the following error comes up when I try to start the DataNode service. Initialization...