What are the prerequisites for batching queries to the database? #4765

Sax388 · 2025-02-21T08:40:47Z

My batch application spends too much time reading its data. I suspect it contacts the database for every single item instead of fetching a whole chunk with one query per chunk (and closing the connection if the chunk size is greater than the result set).

This is my current ItemReader implementation:

@Bean
@StepScope
public JpaCursorItemReader<RevenueMessageEntity> revenueMessageTableReader(
    @Value("#{jobParameters['tenant.id']}") String tenantId,
    @Value("#{jobParameters['period.start']}") LocalDateTime periodStart,
    @Value("#{jobParameters['period.end']}") LocalDateTime periodEnd) {

  String jqlString =
      "SELECT vm FROM RevenueMessageEntity vm "
          + "LEFT JOIN FETCH vm.travelData WHERE "
          + "vm.tenantIdFkvp = :tenantIdFkvp AND "
          + ":startDate <= vm.dateTimeSale AND vm.dateTimeSale < :endDate AND "
          + "vm.id IN ("
          + "    SELECT MAX(vm2.id)"
          + "    FROM RevenueMessageEntity vm2"
          + "    GROUP BY vm2.messageId"
          + ") ORDER BY vm.dateTimeSale, vm.messageId ASC";

  return new JpaCursorItemReaderBuilder<RevenueMessageEntity>()
      .name("billingDataTableReader")
      .queryString(jqlString)
      .parameterValues(
          Map.of(
              "tenantIdFkvp",
              UUID.fromString(tenantId),
              "startDate",
              periodStart,
              "endDate",
              periodEnd))
      .entityManagerFactory(revenueMessageEntityManagerFactory)
      .build();
}

In my application.yml I have:

spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 1000
        max_fetch_depth: 3
        jdbc:
          batch_size: 1000

Please advise 🙂. If you need more context, I should be able to provide it.

EDIT: I just had a small epiphany: of course Spring Batch is talking to a database for every item, namely the metadata tables (https://docs.spring.io/spring-batch/reference/schema-appendix.html)! In our case this is a remote database, so some time is lost on the way. What's the advice in these circumstances? Communicating the state only after each chunk would be fine.

EDIT2: Unfortunately, after changing the @BatchDataSource to an embedded H2 database, it's still as slow as before 🙈, so it has to be something else.

Answered by Sax388

Feb 23, 2025

Sax388 · 2025-02-23T10:51:10Z

Sax388
Feb 23, 2025
Author

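It was the well-known N+1 select problem: associations that the query did not fetch were loaded with extra queries for each entity. Adding a few more LEFT JOIN FETCH clauses to the query did the trick (after trying lots of other things).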
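
For anyone landing here with the same symptom, here is a minimal sketch of why the number of round trips explodes. It is a toy in-memory model, not JPA or Spring Batch code: `FakeDatabase`, `selectParents`, `selectAssociationFor` and `selectParentsJoinFetch` are hypothetical names invented for the illustration, and the only thing it measures is how many simulated queries are issued.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the N+1 select problem. No JPA involved, only a query counter. */
public class NPlusOneDemo {

    /** Hypothetical stand-in for the database: counts every simulated round trip. */
    static class FakeDatabase {
        int queryCount = 0;

        /** One query that returns the ids of all parent rows. */
        List<Integer> selectParents(int rows) {
            queryCount++;
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                ids.add(i);
            }
            return ids;
        }

        /** One extra query per parent, as an unfetched association would cause. */
        String selectAssociationFor(int parentId) {
            queryCount++;
            return "association-" + parentId;
        }

        /** One query that returns parents together with their association. */
        List<String> selectParentsJoinFetch(int rows) {
            queryCount++;
            List<String> joined = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                joined.add(i + ":association-" + i);
            }
            return joined;
        }
    }

    /** Reads parents first, then resolves the association one parent at a time. */
    public static int queriesWithoutFetchJoin(int rows) {
        FakeDatabase db = new FakeDatabase();
        for (int id : db.selectParents(rows)) {
            db.selectAssociationFor(id);
        }
        return db.queryCount;
    }

    /** Reads parents and association in a single joined query. */
    public static int queriesWithFetchJoin(int rows) {
        FakeDatabase db = new FakeDatabase();
        db.selectParentsJoinFetch(rows);
        return db.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("without fetch join: " + queriesWithoutFetchJoin(1000));
        System.out.println("with fetch join:    " + queriesWithFetchJoin(1000));
    }
}
```

With 1000 rows the first variant issues 1001 simulated queries and the second issues 1. In the real reader, the equivalent of the second variant is adding a LEFT JOIN FETCH clause for each association that is accessed while processing an item, next to the existing `LEFT JOIN FETCH vm.travelData`; which associations those are depends on the entity mapping, which is not shown in this thread.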

0 replies
