More than one Partitioner in a Spring batch job


Tag: spring-batch,partitioning

I have two different files (each with a different layout) that I am splitting into multiple files so that I can use local step partitioning.

So far I have been handling only one file, and I have created one custom partitioner class to use step partitioning.

Now I want to include another file, so I am planning to create another partitioner class for this new file (the idea is to create another step for the new file). Or can we use the same Partitioner for both steps?

Will this work, or do we need to create a separate job for each file?

This is my current configuration:

<batch:step id="step9">
    <batch:partition step="loadFlatFiles" partitioner="multiFileResourcePartitioner">
        <batch:handler grid-size="15" task-executor="loadCustomerTaskExecutor" />
    </batch:partition>
</batch:step>

<bean id="multiFileResourcePartitioner" class="com.cdi.batch.partitioner.MultiFileResourcePartitioner">
    <property name="keyName" value="fileResource" />
    <property name="fileName" value="fileName" />
    <property name="directory" value="file:${input.files.location}" />
</bean>
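The MultiFileResourcePartitioner class itself isn't shown, so the following is only a guess at what a one-partition-per-file partitioner with these properties typically does. It is plain Java so it runs without Spring on the classpath. FilePartitionerSketch is a hypothetical name, a Map<String, String> stands in for Spring Batch's ExecutionContext, the file list is passed in rather than read from the directory, and treating the fileName property as "the context key for the bare file name" is an assumption. For comparison, Spring Batch also ships a MultiResourcePartitioner that creates one partition per resource.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Spring-free sketch of a one-partition-per-file partitioner.
// Spring Batch's contract is Map<String, ExecutionContext> partition(int gridSize);
// here a Map<String, String> stands in for each partition's ExecutionContext, and
// the directory listing is passed in instead of being resolved from a resource pattern.
class FilePartitionerSketch {
    private final String keyName;     // context key for the file location, e.g. "fileResource"
    private final String fileNameKey; // context key for the bare file name (assumed meaning of "fileName")
    private final String directory;   // e.g. "file:/data/input"

    FilePartitionerSketch(String keyName, String fileNameKey, String directory) {
        this.keyName = keyName;
        this.fileNameKey = fileNameKey;
        this.directory = directory;
    }

    // Creates one partition per input file. gridSize is ignored in this sketch:
    // the number of partitions follows the number of files, not the grid size.
    Map<String, Map<String, String>> partition(int gridSize, List<String> fileNames) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        int index = 0;
        for (String name : fileNames) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put(keyName, directory + "/" + name);
            context.put(fileNameKey, name);
            partitions.put("partition" + index, context);
            index++;
        }
        return partitions;
    }
}
```

Because everything file-specific comes from properties, a second step for the new file layout could reuse the same class through a second bean definition pointing at a different directory. Each worker step would then read its own file from the step execution context, for example through #{stepExecutionContext['fileResource']} late binding on a step-scoped reader.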

Please let me know whether this approach is correct and whether any problems could arise with it.

Regards, Shankar.


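What you have there should work fine, and there is no need for a separate job per file: both partitioned steps can live in the same job. Since the partitioner is step scoped, each step should get its own instance, so the same Partitioner class can serve both steps without sharing state.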
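To make "each step gets its own instance" concrete, here is a simplified plain-Java model of step scope. It is not Spring's implementation: real step scope binds an instance to a step execution, while this sketch keys instances by step name for brevity. StepScopeModel and PartitionerInstance are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of step scope, not Spring's implementation: a step-scoped bean
// definition is instantiated lazily, once per step, instead of once per application
// context. Real step scope is keyed by step execution; the step name is used here.
class StepScopeModel<T> {
    private final Function<String, T> beanDefinition; // builds the bean for a given step
    private final Map<String, T> instancesByStep = new HashMap<>();

    StepScopeModel(Function<String, T> beanDefinition) {
        this.beanDefinition = beanDefinition;
    }

    // Returns the instance bound to this step, creating it on first access.
    T get(String stepName) {
        return instancesByStep.computeIfAbsent(stepName, beanDefinition);
    }
}

// Hypothetical partitioner state: the directory this instance scans.
class PartitionerInstance {
    final String directory;

    PartitionerInstance(String directory) {
        this.directory = directory;
    }
}
```

In this model, looking up the partitioner from step9 and from a hypothetical second step such as step10 yields two distinct objects, while repeated lookups within one step return the same object. So the two partitioned steps do not share partitioner state even if they reference the same bean definition. If the two files live in different directories, the simplest option is two bean definitions of the same class with different directory values.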


What is the error in my partition code?

The code below works for the array { 4 5 3 7 2 }, but not for the other test cases on the HackerRank site. What is the error in my code? Am I doing something wrong while merging the two arrays a1[] and a2[] into ar[]? https://www.hackerrank.com/challenges/quicksort1 #include <stdio.h> #include...

Spring Batch Add Custom Fields

I've never used Spring Batch before but it seems like a viable option for what I am attempting to accomplish. I have about 15 CSV files for 10 institutions that I need to process nightly. I am stashing the CSV into staging tables in an Oracle database. The CSV File...

How does Spring Batch manage transactions (with possibly multiple datasources)?

I would like some information about the data flow in a Spring Batch processing but fail to find what I am looking for on the Internet (despite some useful questions on this site). I am trying to establish standards to use Spring Batch in our company and we are wondering...

How to run asynchronous queries with Spring

I need to run asynchronous queries using the Spring Framework. I use Cassandra and the Java driver from DataStax. How can I call the executeAsync method and get the results?

Exclusive batch jobs with javax.batch/jsr352

We have an application which does a lot of imports and exports - basically between CSV files and database tables. Some of the imports and exports are conflicting (you can't execute them simultaneously) for various reasons (like "legacy code"). We were looking into javax.batch. Conceptually it suits very well. But...

PySpark repartitioning RDD elements

I have a Spark job that reads from a Kafka stream and performs an action for each RDD in the stream. If the RDD is not empty, I want to save the RDD to HDFS, but I want to create a file for each element in the RDD. I've found...

Spring batch FileItemWriter not creating file at correct path

I have a Spring Batch service containing a FileItemReader, FileItemProcessor and FileItemWriter. When creating the FileItemWriter I have to set the Resource that will be my output file. I am running the batch service on WebSphere on a Linux machine. The problem is if I set the resource as new FileSystemResource(new File("opt\temp1\myFile.txt")), the...

Hive output larger than dfs blocksize limit

I have a table test which was created in hive. It is partitioned by idate and often partitions need to be inserted into. This can leave files on hdfs which have only a few rows. hadoop fs -ls /db/test/idate=1989-04-01 Found 3 items -rwxrwxrwx 3 deployer supergroup 710 2015-04-26 11:33 /db/test/idate=1989-04-01/000000_0...

Spring Batch | MongoItemReader | How to pass JobParameters to mongo query?

How can I pass JobParameters to MongoItemReader query ? My ItemReader looks like :- @Bean public ItemReader<Person> PersonTenantBasedItemReader() { MongoItemReader<Person> reader = new MongoItemReader<Person>(); reader.setTemplate(mongoTemplate); reader.setTargetType((Class<? extends Person>) Person.class); reader.setQuery("{status:'XYZ',nextCheckpointDate:{$gte:?fromDate,$lte:?toDate}"); // !!!!I want to pass fromDate and toDate as job parameters. !!!! Map<String, Direction> sorts = new HashMap<String,...

How to set property using “tasklet ref” tag

I have a tasklet ValidarSituacaoTasklet that has a property situacao. This tasklet is used in two steps with distinct values for situacao. I declared the steps like this: and the bean: <bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step"> </bean> I have to pass 'situacao' to the tasklet. I tried: <step id="validaSituacaoStep"> <tasklet ref="validarSituacaoTasklet ">...

log4j2.xml loaded but not applied [JVM argument]

I am trying to create a batch using Spring Batch, Spring MVC and Spring Boot, and I am using log4j2 for logging. My goal is to load an external log4j2.xml configuration file through a JVM argument like this: -Dlog4j.configurationFile=file://C:\{path}\Workspace\demo-indexeur\config\log4j2.xml Spring Boot detects my file (I have no error in the...

Spring Batch Item Reader is executing only once

I am trying to implement Spring Batch, but I am facing a strange problem: our ItemReader class is executing only once. Here are the details. If we have 1000 rows in the DB, our ItemReader fetches 1000 rows from the DB and passes the list to the ItemWriter, and the ItemWriter successfully deletes all items. Now the ItemReader again tries to fetch...

Overwriting spring-boot autoconfiguration

I'm a little bit confused by the behaviour of Spring Boot when overwriting specific autoconfigurations. I'd like to partly overwrite the BatchAutoConfiguration, but I guess my question is not specific to BatchAutoConfiguration. Actually, I just want to "overwrite" two methods of this class: public BatchDatabaseInitializer batchDatabaseInitializer() and public ExitCodeGenerator jobExecutionExitCodeGenerator(). Therefore,...

Splitting a ruby file at a pattern?

I have a large ruby file that contains product data. I'm trying to split the file into sections based on a regular expression. I have product headers denoted by the word Product followed by a space and then a number. After that, I have a bunch of lines containing product...

DROP a one-year-old partition of a table in Oracle

I had to drop a partition of a table which is one year old. Now, in the all_tab_partitions , the HIGH_VALUE column is of LONG datatype and my table is partitioned on RANGE (date column) . Hence, I had to figure out a way to read this column and then...

Spring batch generating reports

I would like to generate a summary report at the end of my batch execution. For example, I have an ItemProcessor which receives accountId. For every accountId: get MarketplaceIds; for every marketplaceId: call real-time availability. At the end of the batch execution I need to provide a nice summary in...

Spring Batch Execution Status Backed by Database

From the Spring Guides: For starters, the @EnableBatchProcessing annotation adds many critical beans that support jobs and saves you a lot of leg work. This example uses a memory-based database (provided by @EnableBatchProcessing), meaning that when it’s done, the data is gone. How can I make the execution state backed...

How to initialize custom ItemReader?

I have created my custom ItemReader: @Component("pricereader") public class MyItemReader implements ItemReader<Price>{ @Override public Price read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException { // TODO Auto-generated method stub return null; } } Calling it in a job thus defined: <batch:job id="job1"> <batch:step id="step1"> <batch:tasklet> <batch:chunk reader="pricereader" processor="priceprocessor" writer="pricewriter" commit-interval="1"/> </batch:tasklet>...

Change Spring Boot project to inherit custom dependency management

I have a small Spring Boot application which I must adapt to use a custom parent. Google finds lots of examples of how to migrate to Spring Boot, but I am at a loss as to how to migrate from Spring Boot (which I barely know to begin with). The reasons To...

How do I use JdbcPagingItemReader in Spring Batch when my primary key is a string?

I've got some SQL data that I need to access with an ItemReader in Spring Batch. I thought that JdbcPagingItemReader / PagingQueryProvider would be the perfect fit. The table I'm selecting from has a primary key that is a composite of three columns: INTEGER, VARCHAR, and VARCHAR. And actually, for...

Spring Batch : custom ItemReader

I have a Spring Batch project with a simple custom reader and writer. When I run the code I end up with an endless loop printing the first item "item 1". What am I doing wrong? Here is my code: Reader.java public class Reader implements ItemReader<SimpleItem> { public SimpleItem read()...

Spring GS - Creating a Batch Service missing output from db query

I have run the complete source for Getting Started - Creating a Batch Service Knowing that the sample uses the memory-based database provided by the @EnableBatchProcessing, is the db query result expected or it will only be available if data will be persisted permanently? After adding some debug lines, it...

Can we use Spring Batch to read Excel?

I want to know if it is possible to use Spring Batch to read from an Excel file and save it in a database. Remark: the content of the Excel file changes every 2 hours. And if it is not possible with Spring Batch, what other solution can I...

More than one tasklet in a step?

I have a tasklet setting some information into my JobContext, and another one checking some stuff to know if I can execute the next steps in my batch or not. Both tasklets work well if I use two different steps in my job flow, but I'd like to use these...

In Apache Spark, why does RDD.union not preserve the partitioner?

As everyone knows, the Spark partitioner has a huge performance impact on "wide" operations, so it's usually customized in operations. When I test the partitioner with the following code: val rdd = sc.parallelize(1 to 50).keyBy(_ % 10).partitionBy(new HashPartitioner(10)) val rdd2 = sc.parallelize(200 to 230).keyBy(_ % 13) val cogrouped = rdd.cogroup(rdd2) println("cogrouped:"...

Using Postgresql is it normal for the master partition table have rows inserted into it along with the child table?

Using the example in the Postgres 9.3 partitioning docs, should the master table "measurement" get rows inserted when performing inserts after creating the trigger functions and the trigger? Using the example given in the docs, upon performing an insert both the master and the child table have rows inserted. I though...

Batch job initialization fails on error while creating bean with name 'batchPropertyPostProcessor'

I'm trying to implement sample batch application using JSR-352 API and Spring Batch 3.0.4 as implementation. Batch job execution fails during initialization phase on error while creating bean with name 'batchPropertyPostProcessor': Exception in thread "main" javax.batch.operations.JobStartException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'batchPropertyPostProcessor': Injection of autowired dependencies failed; nested exception...

Spring batch for rules

I am new to Spring Batch. I need to run a nightly batch process that: reads records from table A, for each record in table A, run about 10 business rules with logic involving reading data from the database (in each rule) and write into table B in each rule,...

Spring Batch: difference between multithreading and partitioning

I cannot understand the difference between multithreading and partitioning in Spring Batch. The implementation is of course different: in partitioning you need to prepare the partitions and then process them. I want to know what the difference is and which one is the more efficient way to process when the bottleneck...

What is the best approach for Spring Batch: annotations or XML files?

First, thanks for your attention. In my Spring Batch project I have defined many jobs, for example: <batch:job id="helloWorldJob1" job-repository="jobRepository"> <batch:step id="step1" > <batch:tasklet> <batch:chunk reader="itemReader1" writer="itemWriter1" processor="itemProcessor1"> </batch:chunk> </batch:tasklet> </batch:step> </batch:job> <batch:job id="helloWorldJob2" job-repository="jobRepository"> <batch:step id="step1" > <batch:tasklet> <batch:chunk...

Spring Batch - Write multiple files based on records count

In Spring Batch, I have a requirement to read from the database and write to a file. The number of rows allowed in a file is N, so if N+10 records are fetched then two files should be created containing N rows and 10 rows respectively. Can someone please...

Execute database operations inside a chunk-oriented step

I have a chunk-oriented processor in the form "reader / processor / writer" called Job1. I have to execute database EJB operations after this job ends, if possible, in the same transaction. I have other jobs (implemented by Tasklets) where I could do this in a simple manner. I...

Setting EXIT_MESSAGE in batch_job_execution

One of the steps in my job throws an exception, and hence the job fails with the EXIT_CODE "FAILED". Now I want to set the EXIT_MESSAGE as well. I did the following, but the message is not getting set. Any ideas? chunkContext.getStepContext().getStepExecution().getJobExecution().setExitStatus(ExitStatus.FAILED); ExitStatus es = jobExecution.getExitStatus(); es =...

Spring Batch reader file by file

I'm developing a Spring webapp using the Spring Boot and Spring Batch frameworks. We have a set of complex and different JSON files, and we need to: read each file, slightly modify its content, and finally store them in MongoDB. The question: does it make sense to use Spring Batch for this task?...

Spring batch - ItemReader within another itemreader or Itemprocessor

Here is my requirement: Create a batch job that 1. Fetches discount programs from the Discount table for specific search criteria 2. For each discount program fetched in Step1, gets sales records for sales that fit the discount program dates and gets additional details for sales from some other tables 3....

PostgreSQL: update record on master table and move record in child partitions

How do I define an update trigger function for updating records in the master table which has partitions defined on it? I have a table which has partitions defined on it, and code for the insert trigger function which will insert data in the child tables whenever there is an insert in...

Postgresql Table Partitioning Django Project

I have a Django 1.7 project that uses Postgres 9.3. I have a table that will have rather high volume: anywhere from 13 million to 40 million new rows a month. I would like to know what the best way is to incorporate Postgres table partitioning with Django?...

Using partition key in function Oracle

We have a partitioned table in our Oracle database using this syntax: ... PARTITION BY RANGE(saledate) (PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')), PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')), ... We usually use the partition key in a select statement like this: Select * from table where saledate >= trunc(sysdate-3) and...

spring integration vs spring batch [on hold]

We have an application where we receive a file every day, and it needs to be parsed and persisted in the DB. The file has 5000 records. Should we use Spring Batch or Spring Integration? And why? We do need to skip bad records and audit them....

How to create Master Job to process multiple spring batch job?

We have multiple Spring Batch jobs, but each of them needs to be started individually. Is there any way to create a master job or some controller in Spring which will be responsible for executing all other batch jobs? That way we would only have to execute the master job, and all...

BASH - Meaning of making of partition code

I'm reading a .sh file that contains code of making of partition. But I don't understand these lines: cat <<EOF > fdisk.input x h #heads 16 s #sectors 63 c #cylinders EOF echo $kbytes >> fdisk.input cat <<EOF >> fdisk.input r n p 1 a 1 w EOF fdisk hd.img...

Removing duplicate code from Spring job configuration

Below is my step configuration - <beans:bean id="myInputFileReader" class="com.rbos.fm.risk.batch.spring.reader.InputFileReader" scope="step"> <beans:property name="delegate"> <beans:bean class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"> <beans:property name="resource" ref="inputFileSystemResource" /> <beans:property name="linesToSkip" value="1" /> <beans:property name="lineMapper"> <beans:bean...

Running a specific Spring Batch job amongst several jobs contained within a Spring Boot fat jar

I am trying to run a spring batch job from a spring boot fat jar and I am having issues referencing the nested jars. Here is the command I use: java -cp bignibou-batch-core/build/libs/bignibou-batch-core.jar:lib/spring-batch-core-3.0.3.RELEASE.jar org.springframework.batch.core.launch.support.CommandLineJobRunner com.bignibou.batch.configuration.BatchConfiguration mailingJob Notice how I reference the nested spring batch jar using the colon. Why is...

Kafka partitions meaning

When we decide about partitions, should we do that on a per-topic basis, or is it a topic-wide decision? If T1 is partitioned into 3 partitions and T2 into 2 partitions, can they both be consumed by 1 consumer? Or is it better to make the number of partitions equal if topics must be...

What is the best approach for loading data from DB by multiple threads

I have some data in database in parent child relation, where my table is actually representing a forest of tree data structure. And the table structure is like: row1 parent: null row2 parent:row1 row3 parent:row2 row4 parent:row1 Now when I am loading this data from DB to my data structure...

Could not autowire object in ItemStreamReader open method

I use Spring Batch with Spring Boot, and here is my main class. @SpringBootApplication public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } } Here are my configuration classes @Configuration public class AppConfig { @Bean public MyObject getObject() { return new MyObject(); } } @Configuration...

How to run an async batch job with batch-int:job-launching-gateway?

First, thanks for your attention. I have combined Spring Integration and Spring Batch in my project, and I want to launch jobs in asynchronous mode from batch-int:job-launching-gateway. What I mean is that each message in the input channel should launch a job asynchronously, without waiting until the job completes. My code is: <batch-int:job-launching-gateway request-channel="outboundJobRequestChannel" reply-channel="jobLaunchReplyChannel"/>...

How to use ExecutorChannel in Spring Integration?

First, thanks for your attention. I defined an ExecutorChannel and a task executor in my Spring Integration project for asynchronous processing of messages with Spring Batch, as below: <bean id="ftpSessionFactory" class="org.springframework.integration.ftp.session.DefaultFtpSessionFactory"> <property name="host" value="${ftp.server.ip}"/> <property name="port" value="${ftp.port}"/> <property name="username" value="${ftp.username}"/> <property name="password" value="${ftp.password}"/> <property name="clientMode"...

@BeforeStep annotated method not being called

I am writing a Spring Batch job, but when this Archive class, which implements the Tasklet interface, is loaded, the method annotated with @BeforeStep is not called. Can anyone help me with this? Thank you. import java.io.File; import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.nio.file.Files; import java.nio.file.StandardCopyOption;...

How to call a specific method of a tasklet

In a job context there is a 'method' parameter, so I could call a tasklet method directly, as said in the documentation: "If the tasklet is specified as a bean definition, then a method can be specified and a POJO will be adapted to the Tasklet interface. The method suggested...