Replies: 2 comments 4 replies
-
What do you need the parallel processing for? Did you read https://docs.spring.io/spring-batch/reference/scalability.html? I don't think this is a Spring Batch-specific issue.
What do you think?
-
Spring Batch effectively opens and closes a database connection every time it interacts with the job repository to store or load the metadata it needs to function properly, but it does not hold any connection open longer than that for metadata-related operations. What can hold a database connection open for a long time is the job's own processing, i.e. your readers, processors, and writers. As for the statement "it was verified that one Job establishes 11 persistent connections to the database": yes, that could be the case (you did not share the code), but if a single job uses that many connections then you should take that into account in the design of your deployment patterns and configuration (i.e. adapt the connection pool size accordingly). Does that make sense? Let me know if you need more details on that.
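To illustrate that last point, here is a minimal sketch of sizing the pool explicitly instead of relying on the defaults. The URL, credentials, and pool sizes below are assumptions for the example, not recommended values:

```java
import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Illustrative sketch only: an explicitly sized Hikari pool for the batch DataSource.
@Configuration
public class BatchDataSourceConfig {

    @Bean
    public DataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/batch"); // assumed URL
        config.setUsername("batch");                                  // assumed credentials
        config.setPassword("secret");
        // Spring Batch only borrows a connection briefly for each metadata read/write,
        // so the pool can stay small; size it for your own readers/writers as well.
        config.setMaximumPoolSize(4);
        config.setMinimumIdle(1);
        return new HikariDataSource(config);
    }
}
```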
-
Recently, I've been considering developing a batch processing system using Spring Batch. However, while looking into the Spring Batch architecture, I found a potential issue.
Each batch job depends on database records for task information. Through our use cases, it was verified that one Job establishes 11 persistent connections to the database (this comes from the default spring.datasource configuration and can be adjusted with spring.datasource.hikari.maximum-pool-size and spring.datasource.hikari.minimum-idle). So, if I'm designing a Spring Batch system where all Jobs connect to the same database, does this limit the number of Jobs that can run in parallel? For example, if 1,000 Jobs are running at the same time, would they establish 11,000 persistent connections to the database? This could quickly exhaust the database's available connections, leading to systemic failures.
I hope my understanding is incorrect.
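As a rough sketch of the arithmetic behind this concern, assuming each job runs in its own JVM with its own connection pool; the database limit used here is purely illustrative:

```java
// Back-of-the-envelope estimate of total connections under concurrent jobs.
public class ConnectionBudgetSketch {
    public static void main(String[] args) {
        int concurrentJobs = 1000;        // jobs running at the same time
        int poolSizePerJob = 11;          // observed connections per job (pool size, per the question)
        int databaseMaxConnections = 500; // assumed database limit, illustrative only

        int worstCase = concurrentJobs * poolSizePerJob;
        System.out.println("Worst-case connections: " + worstCase); // 11,000

        // Lowering spring.datasource.hikari.maximum-pool-size (and minimum-idle)
        // per job shrinks this product; the pool, not Spring Batch itself,
        // determines how many connections stay open.
        if (worstCase > databaseMaxConnections) {
            System.out.println("Reduce pool size per job or limit job concurrency.");
        }
    }
}
```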