Expected timestamp in the Flume event headers, but it was null

flume,flume-ng,flume-twitter
I am using the configuration below to push Twitter feeds into HDFS using Flume, but I am getting the error "Expected timestamp in the Flume event headers, but it was null". twitter.conf: TwitterAgent.sources = Twitter TwitterAgent.channels = MemChannel TwitterAgent.sinks = HDFS TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource TwitterAgent.sources.Twitter.channels = MemChannel TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx...
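This error typically appears when the HDFS sink path uses time escape sequences (such as %Y or %d) and the incoming events carry no "timestamp" header. The excerpt above is truncated before the sink settings, so the following is only a sketch of the two usual remedies. The agent, source, and sink names are taken from the excerpt; the interceptor name ts is a hypothetical label.

```properties
# Option 1: let the HDFS sink use the agent's local clock
# instead of a "timestamp" event header.
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

# Option 2: attach a timestamp interceptor to the source so that
# every event gets a "timestamp" header before reaching the sink.
# "ts" is an arbitrary (hypothetical) interceptor name.
TwitterAgent.sources.Twitter.interceptors = ts
TwitterAgent.sources.Twitter.interceptors.ts.type = timestamp
```

Either option alone should be enough. Option 1 stamps events with the time they are written, while option 2 stamps them with the time they enter the agent.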

Flume for collecting syslog data

hadoop,bigdata,router,syslog,flume
I am trying to collect syslog data from 10 devices (routers). I learned that I can use the syslog source, but I need clarification about the host and port properties: do they refer to the local host and port on the machine where the Flume agent is running? Also, how do I redirect syslogs to...
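As a rough illustration of the host and port question: for Flume's syslog sources, host and port are the local address and port that the agent binds to and listens on. A minimal sketch follows, in which the agent name, component names, and port 5140 are all hypothetical.

```properties
# Hypothetical agent "agent" with one UDP syslog source.
agent.sources = syslogSrc
agent.channels = memCh

# host/port are the LOCAL bind address and listening port of the agent.
# 0.0.0.0 listens on all interfaces; 5140 is an arbitrary unprivileged port.
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.host = 0.0.0.0
agent.sources.syslogSrc.port = 5140
agent.sources.syslogSrc.channels = memCh

agent.channels.memCh.type = memory
```

Each router would then be configured, on the router side, to send its syslog output to the agent machine's IP address and this port. A sink still has to be attached to the channel for the agent to do anything useful.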

Flume + ElasticSearch Sink TTL

java,elasticsearch,flume
I have a question about the TTL in the Elasticsearch sink of Apache Flume. I'm working on Elasticsearch + Flume integration, using Elasticsearch version 1.4.1 and Flume version 1.5.2. Both are running locally on my machine. In Flume, my ElasticSearch sink is configured as follows: agent.sinks.elasticSearchSink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink agent.sinks.elasticSearchSink.channel...

Using local file system as Flume source

java,flume
I've just started learning Big Data, and I'm currently working with Flume. The common example I've encountered is the processing of tweets (the example from Cloudera) using some Java. Just for testing and simulation purposes, can I use my local file system as a Flume source? Particularly, some...

Sending text message using Log4j2 with Flume

hadoop,log4j,bigdata,log4j2,flume
I have Log4j2 configuration: <?xml version="1.0" encoding="UTF-8"?> <configuration> <appenders> <Console name="console" target="SYSTEM_OUT"> <PatternLayout pattern="%d %-5p - %m%n"/> </Console> <Flume name="flume" > <MarkerFilter marker="FLUME" onMatch="ACCEPT" onMismatch="DENY"/> <Agent host="IP_HERE" port="6999"/> </Flume> <File name="file" fileName="flume.log"> <MarkerFilter marker="FLUME" onMatch="ACCEPT" onMismatch="DENY"/> </File> </appenders>...

Flume-ng hdfs sink .tmp file refresh rate control property

cloudera,flume,hortonworks-data-platform,flume-ng,flume-twitter
I am trying to refresh the .tmp file with additional events every 5 minutes. My source is slow, and it takes 30 minutes to get a 128 MB file in my HDFS sink. Is there any property in the Flume HDFS sink with which I can control the refresh rate of the .tmp file...

Save flume output to hive table with Hive Sink

hadoop,hive,flume
I am trying to configure Flume with Hive to save Flume output to a Hive table using the Hive Sink type. I have a single-node cluster and use the MapR Hadoop distribution. Here is my flume.conf: agent1.sources = source1 agent1.channels = channel1 agent1.sinks = sink1 agent1.sources.source1.type = exec agent1.sources.source1.command = cat /home/andrey/flume_test.data agent1.sinks.sink1.type...

Loading csv file into HDFS using Flume (spool directory as source)

hadoop,hadoop-streaming,flume,hortonworks-data-platform,flume-ng
I am trying to load a CSV file (6 MB) into HDFS using Flume, with spooldir as the source and HDFS as the sink. Here is my configuration file: # Initialize agent's source, channel and sink agent.sources = TwitterExampleDir agent.channels = memoryChannel agent.sinks = flumeHDFS # Setting the source to spool directory where the...

Apache Flume (twitter)

twitter,flume
I am a beginner, so kindly bear with me. I need to download Twitter logs and would like to use Flume. However, I am not familiar with Java. Can Python be used with the Flume agent? Any links that I could refer to would be very helpful. Thanks!...

How to use flume for uploading zip files to hdfs sink

flume,flume-ng
I am new to Flume. My Flume agent has an HTTP server as its source, from which it receives zip files (compressed XML files) at regular intervals. These zip files are very small (less than 10 MB), and I want to put the extracted zip files into the HDFS sink. Please share some ideas on how to do...

Unicode character with flume

csv,hadoop,unicode,flume
I'm trying to put a CSV file into HDFS using Flume. The file also contains some Unicode characters. Once the file was in HDFS, I tried to view the content, but the records do not display properly. File content: Name age sal msg Abc 21 1200 Lukè éxample àpple Xyz 23 1400...

How do you enable server-side encryption when streaming data using Apache Flume to Amazon S3

encryption,amazon-s3,flume
I am streaming some sensitive log data to Amazon S3 using flume. I can't figure out how to set the flag/configuration in flume so that S3 uses server-side encryption.

I need a Cassandra Flume Sink

cassandra,nosql,flume,flume-ng
I am trying to find a template/sample of a Cassandra Flume sink. I have looked online, and the two projects I found on GitHub have outdated dependencies (JARs), and I can't find those artifacts anywhere. Thanks! Looking forward to any references. ...

Flume HDFS Source

hdfs,flume
I want to use Flume to transfer data from one HDFS directory into another HDFS directory, and during this transfer I want to apply morphline processing. For example: my source is "hdfs://localhost:8020/user/flume/data" and my sink is "hdfs://localhost:8020/user/morphline/" Is it possible with Flume? If yes, what is the type for the source...

Flume to HDFS split a file to lots of files

hadoop,hdfs,flume,flume-ng
I'm trying to transfer a 700 MB log file from flume to HDFS. I have configured the flume agent as follows: ... tier1.channels.memory-channel.type = memory ... tier1.sinks.hdfs-sink.channel = memory-channel tier1.sinks.hdfs-sink.type = hdfs tier1.sinks.hdfs-sink.path = hdfs://*** tier1.sinks.hdfs-sink.fileType = DataStream tier1.sinks.hdfs-sink.rollSize = 0 The source is a spooldir, channel is memory and...
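One common cause of many small output files is that the HDFS sink rolls on three independent triggers, and setting only the size trigger to 0 leaves the other two at their defaults (30 seconds and 10 events). The Flume user guide also spells the sink properties with an hdfs. prefix, which the excerpt above omits. A hedged sketch that reuses the agent and sink names from the excerpt follows; the idle timeout value of 60 is an assumption for illustration.

```properties
# Disable all three roll triggers (0 = never roll on this criterion).
tier1.sinks.hdfs-sink.hdfs.rollInterval = 0
tier1.sinks.hdfs-sink.hdfs.rollSize = 0
tier1.sinks.hdfs-sink.hdfs.rollCount = 0

# With rolling disabled, close the file after it has received no
# events for 60 seconds (hypothetical value) so it does not stay .tmp.
tier1.sinks.hdfs-sink.hdfs.idleTimeout = 60
```

Whether this matches the asker's actual problem cannot be confirmed from the truncated excerpt, so it is best treated as a starting point for experimentation.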

Using flume to read IBM MQ data

hadoop,streaming,message-queue,flume
I want to read data from IBM MQ and put it into HDFS. I looked into the JMS source of Flume, and it seems it can connect to IBM MQ, but I don't understand what “destinationType” and “destinationName” mean in the list of required properties. Can someone please explain? Also, how should I...
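For orientation, in Flume's JMS source destinationType says whether the source consumes from a queue or a topic, and destinationName is the name of that queue or topic on the broker. The fragment below is only a sketch: the agent name, source name, and queue name are hypothetical, and the provider-specific connection settings are left as comments because the values depend on the IBM MQ installation.

```properties
# Hypothetical agent "agent" and source "jmsSrc".
agent.sources.jmsSrc.type = jms

# QUEUE for point-to-point, TOPIC for publish/subscribe.
agent.sources.jmsSrc.destinationType = QUEUE

# Name of the queue (or topic) to read from; MY.QUEUE is a placeholder.
agent.sources.jmsSrc.destinationName = MY.QUEUE

# Also required by the JMS source, with installation-specific values:
#   initialContextFactory, connectionFactory, providerURL, channels
```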

Get country from tweet with certain keywords

java,twitter4j,cloudera,flume
I am using TwitterSource for Flume from Cloudera. I want to get tweets by country with certain keywords. I'm not sure what to compare to when I want to get tweets from The Netherlands. I have the following which results in nothing being processed: public void onStatus(Status status) { if(status.getPlace().getCountry().equalsIgnoreCase("netherlands"))...

flume : find ip/hostname of event sender?

flume,flume-ng
I am trying to set up a data pipeline where application servers send log events (using log4j logging) to Flume (using the Flume log4j appender) over the network, to an Avro source that the Flume agent is using. I tried the configuration below, but it only appends the IP of the host on which the agent is running...

Hive External table not showing anything

hadoop,hive,flume
I am trying to learn Hive by following the Twitter data tutorial from the link below. https://github.com/cloudera/cdh-twitter-example/ I have successfully installed and configured Hadoop and Hive and tested a simple text file load into a Hive table. All is working well so far. However, even though the files exist in HDFS, the external table is showing...

Flume agent does not stop retrying for unrecoverable solr error

solr,flume,avro,flume-ng
I am using the Morphline Solr Sink to store information in Solr. The problem I am facing is that the Flume agent never stops retrying the failed requests, whose number can increase over time. This results in the Flume warning of MaxIO Workers being used, and the system suffers with performance...

Data ingestion with Apache Storm

apache,storm,apache-kafka,flume
I have been reading a lot of articles where implementations of Apache Storm are explained for ingesting data from either Apache Flume or Apache Kafka. My main question remains unanswered after reading several articles. What is the main benefit of using Apache Kafka or Apache Flume? Why not collect data...

Write CSV files to HDFS using Flume

hdfs,flume
I'm writing a number of CSV files from my local file system to HDFS using Flume. I want to know what would be the best configuration for Flume HDFS sink such that each file on local system will be copied exactly in HDFS as CSV. I want each CSV file...