FAQ Database Discussion Community


Reuse results of first computation in second computation

java,flink
I'm trying to write a computation in Flink which requires two phases. In the first phase I start from a text file, and perform some parameter estimation, obtaining as a result a Java object representing a statistical model of the data. In the second phase, I'd like to use this...

Iterator behaviour in flink reduceGroup

scala,hadoop,flink
I am creating a system that should handle huge amount of data and I need to understand how the reduce group operator works I have a dataset where I apply a groupby and subsequently a reduceGroup How does the iterator that is passed to the reduceGroup function behave? is it...

What is/are the main difference(s) between Flink and Storm?

storm,flink
Flink has been compared to Spark, which, as I see it, is the wrong comparison; Similarly, it does not make that much sense to me to compare Flink to Samza. In both cases it compares a real-time vs. a batched event processing strategy, even if at a smaller "scale" in...

HDFS Path for Spark Submit and Flink on YARN

java,hadoop,apache-spark,hdfs,flink
i work with cloudera live vm, there i have a hadoop and spral standalone cluster. now i want submit my jobs with spark submit and flink run scripts. this works, too. but my apps can find the path to input and outputs files in the hdfs. i set the path...

Apache Flink Channel received an event before completing the current partial record

java,maven,flink
i read a csv file (http://data.gdeltproject.org/events/index.html) from disk with flink (java, maven version 8.1) and get the following exception: ERROR operators.DataSinkTask: Error in user code: Channel received an event before completing the current partial record.: DataSink(Print to System.out) (4/4) java.lang.IllegalStateException: Channel received an event before completing the current partial record....

zipWithIndex on Apache Flink

flink
I'd like to assign each row of my input an id - which should be a number from 0 to N - 1, where N is the number of rows in the input. Roughly, I'd like to be able to do something like the following : val data = sc.textFile(textFilePath,...

In Flink, stream windowing does not seem to work?

flink
I tried to enhance the Flink example displaying the usage of streams. My goal is to use the windowing features (see the window function call). I assume that the code below outputs the sum of last 3 numbers of the stream. (the stream is opened thanks to nc -lk 9999...

NoSuchMethod exception in Flink when using dataset with custom object array

scala,maven,flink
I have a problem with Flink java.lang.NoSuchMethodError: org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(Lorg/apache/flink/api/common/typeinfo/TypeInformation;)Lorg/apache/flink/api/java/typeutils/ObjectArrayTypeInfo; at LowLevel.FlinkImplementation.FlinkImplementation$$anon$6.<init>(FlinkImplementation.scala:28) at LowLevel.FlinkImplementation.FlinkImplementation.<init>(FlinkImplementation.scala:28) at IRLogic.GmqlServer.<init>(GmqlServer.scala:15) at...

How to flatMap a function on GroupedDataSet in Apache Flink

scala,hadoop,flink
I want to apply a function via flatMap to each group produced by DataSet.groupBy. Trying to call flatMap I get the compiler error: error: value flatMap is not a member of org.apache.flink.api.scala.GroupedDataSet My code: var mapped = env.fromCollection(Array[(Int, Int)]()) var groups = mapped.groupBy("myGroupField") groups.flatMap( myFunction: (Int, Array[Int]) => Array[(Int, Array[(Int,...

OutOfBoundsException with ALS - Flink MLlib

indexoutofboundsexception,recommendation-engine,flink
I'm doing a recommandation system for movies, using the MovieLens datasets available here : http://grouplens.org/datasets/movielens/ To compute this recommandation system, I use the ML library of Flink in scala, and particulalrly the ALS algorithm (org.apache.flink.ml.recommendation.ALS). I first map the ratings of the movie into a DataSet[(Int, Int, Double)] and then...

Linkage failure when running Apache Flink jobs

maven,flink
I have a job developed in Flink 0.9 that is using the graph module (Gelly). The job is running successfully within the IDE (Eclipse) but after exporting it to a JAR using maven (mvn clean install) it fails to execute on the local flink instance with the following error "The...

Create objects from input files in Apache Flink

scala,flink
I have a data set which is structured by folders and files. The folder / file structure itself is important for the data analysis. The Structure of the data set: folder1 +-----file11 +-----column1 +-----column2 Every file contains data which describe one object. The format of the files is consistent. Its...

Flink error - org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

maven,hadoop,flink
I am trying to run a flink job using a file from HDFS. I have created a dataset as following - DataSource<Tuple2<LongWritable, Text>> visits = env.readHadoopFile(new TextInputFormat(), LongWritable.class,Text.class, Config.pathToVisits()); I am using flink's latest version - 0.9.0-milestone-1-hadoop1 (I have also tried with 0.9.0-milestone-1) whereas my Hadoop version is 2.6.0 But,...

Output of Join in Apache Flink

scala,flink
In Apache Flink, if I join two data sets on one primary key I get a tuple 2 containing the corresponding data set entry out each of the data sets. The problem is, when applying a the map() method to the outcoming tuple 2 data set it does not really...