Skip to content

DATAJDBC-356 - Support for large datasets with stream #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

detinho
Copy link
Contributor

@detinho detinho commented Oct 19, 2019

This is a work in progress

I decided to open the pull request so it is easier to keep track of changes and get feedback.

I stared working on a solution for DATAJDBC-356. I did not started mybatis support yet and will start working on tests and better documentation / examples, but would like to discuss the solution API and implementation wise.

For this case, I created some support classes to return an SqlRowSet backed by a connected ResultSet and it's DataSource. I did it this way so I can keep track of all resources and close them when appropriated.
On the API side, I created a new @QueryStream annotation, to specify the ResultSet fetchSize and to not break current uses of the API (client code that don't close the resulting Stream).

I would much appraciate some review on this work in progress, as it is my first Spring contribution.

Copy link
Contributor

@schauder schauder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look and noticed a couple of things.

The big question right now is where the main implementation should go.

There is spring-projects/spring-framework#18474 in Spring Framework which asks for Stream support.
And I think the right way to do this would be to add the Stream support there and then use it when the return type is Stream in the repository.

I asked the Spring team if they would appreciate a PR on that issue or if there are other reasons then lack of time why that issue is dormant.

/**
* The number of rows fetched from the database when more rows are needed
*/
int fetchSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a separate annotation.

It is Spring Data standard behaviour that you have to close Stream results.

And the fetchSize already can be configured globally via the JdbcTemplate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schauder thanks for the feedback.
If the standard behaviour is to close then I'm ok on not having a new annotation and all streams have from now on will be backed by a connected ResultSet/RowSet.
But in my view it is important to have a per query configurable fetch size, as this size depends on the use case.
Maybe then, move fetchSize as an optional parameter on @query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave the per query configuration of fetchSize to a separate issue. This way we can gather some information if it is actually needed.

* {@link java.sql.ResultSet} that will fetch rows as needed. It is responsibility of
* the client code to close all resources via {@link java.util.stream.Stream#close()}.
*
* @author detinho
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a full name not just a github handle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


JdbcOpenSqlRowSet queryForOpenCursorRowSet(String sql, SqlParameterSource paramSource, Integer fetchSize) {
Assert.state(this.getJdbcTemplate().getDataSource() != null, "No DataSource set");
Assert.state(fetchSize != null, "No fetchSize set");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably no longer relevant due to previous comments, but if fetchSize must be not null it should simply be an int.

super(classicJdbcTemplate);
}

JdbcOpenSqlRowSet queryForOpenCursorRowSet(String sql, SqlParameterSource paramSource, Integer fetchSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this builds on RowSet and not on ResultSet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrapped a reference to the ResultSet and the corresponding DataSource so that it can be released later, when the stream is closed. I used the RowSet interface as this code can be used as a base for further work just as in the issue you linked above.

@detinho
Copy link
Contributor Author

detinho commented Nov 20, 2019

@schauder sorry for the delay.
I've made some changes based on you feedback.

@denniseffing
Copy link

@schauder @detinho Is this still being worked on? We just realized that Spring Data JDBC is not capable of streaming large result sets and we are encountering the first memory issues because of this.

@detinho
Copy link
Contributor Author

detinho commented May 7, 2020

@denniseffing I guess the blocker issue here is if we must add stream support on JdbcTemplate first or not.
As @schauder pointed out, there is a issue on spring-jdbc and I'd be glad to work on it if there aren't any blockers or even continue on this implementation if it still makes sense to do it here.

@denniseffing
Copy link

@detinho Thanks for the fast response. I commented on the linked issue as well. It would be awesome if Stream support would be added in the near future.

@schauder
Copy link
Contributor

The framework issue is fixed now. A PR with an implementation based on that would be highly appreciated.

@denniseffing
Copy link

@detinho Would you mind working on a PR for this? Take a look at the new queryForStream methods in JdbcTemplate. See spring-projects/spring-framework#18474 for context.

@detinho
Copy link
Contributor Author

detinho commented Sep 23, 2020

@denniseffing I'll take a look at it through the week.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 4, 2021
denniseffing pushed a commit to denniseffing/spring-data-jdbc that referenced this pull request Jan 7, 2021
Use queryForStream for streamed query results. ResultSetExtractor is ignored because it cannot be used together with streams.

Original pull request spring-projects#176
Closes spring-projects#356
denniseffing added a commit to denniseffing/spring-data-jdbc that referenced this pull request Jan 7, 2021
Use queryForStream for streamed query results. ResultSetExtractor is ignored because it cannot be used together with streams.

Original pull request spring-projects#176
Closes spring-projects#356
denniseffing added a commit to denniseffing/spring-data-jdbc that referenced this pull request Jan 7, 2021
Use queryForStream for streamed query results. ResultSetExtractor is ignored because it cannot be used together with streams.

Original pull request spring-projects#176
Closes spring-projects#356
@denniseffing
Copy link

I created a new PR (#903) because we couldn't wait for the change.

@schauder
Copy link
Contributor

This PR got superseded by #903 which very much looks like we might be able to merge it soon.

Thanks everybody, although id didn't got merged it helped move this feature forward.

@schauder schauder closed this Jan 15, 2021
denniseffing added a commit to denniseffing/spring-data-jdbc that referenced this pull request Jun 28, 2021
Use queryForStream for streamed query results. ResultSetExtractor is ignored because it cannot be used together with streams.

Original pull request spring-projects#176
Closes spring-projects#356
mp911de added a commit that referenced this pull request Feb 21, 2022
mp911de added a commit that referenced this pull request Feb 21, 2022
Remove all-dbs profile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage An issue we've not yet triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants