[Backport 2.x] Refactor multipart download to a more async model #10373

Merged

@andrross andrross commented Oct 4, 2023

Backports 28f185b from #10349

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…#10349)

* Refactor read context streams to async streams

Signed-off-by: Kunal Kotwani <[email protected]>

* Refactor multipart download to a more async model

The previous approach of kicking off the stream requests for all parts
of a file did not work well for very large files. For example, a 20GiB
file uploaded in 16MiB parts will consist of 1200+ parts. When we
attempted to initiate streaming for all parts concurrently, some parts
would hit a client timeout after 2 minutes without being able to get a
connection due to the other parts not having been completed in that time
frame. This refactoring adds yet another layer of indirection in order
to allow the code that is actually writing the destination file to
control the rate at which streams are started. This should allow for
downloading files consisting of arbitrarily many parts at any connection
speed.
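
As a rough illustration of the idea described above (not the code in this change), the sketch below has the writing side decide when each part's stream is started, so only a bounded number of parts ever hold a connection. `LazyPartDownloader`, `downloadAll`, `partCount`, and the `maxInFlight` parameter are hypothetical names chosen for this example, and each part is modelled as a `Supplier` that opens its stream only when invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical sketch; these names are illustrative and are not taken from the OpenSearch code.
final class LazyPartDownloader {

    // Number of parts needed for fileSize bytes split into partSize-byte parts (rounded up).
    static long partCount(long fileSize, long partSize) {
        return (fileSize + partSize - 1) / partSize;
    }

    /**
     * Downloads all parts while keeping at most maxInFlight streams open.
     * Each supplier starts its request only when it is invoked, so a part
     * cannot time out while waiting for a connection it has not asked for yet.
     */
    static CompletableFuture<Void> downloadAll(
        List<Supplier<CompletableFuture<byte[]>>> parts,
        int maxInFlight,
        BiConsumer<Integer, byte[]> writer
    ) {
        AtomicInteger nextPart = new AtomicInteger();
        List<CompletableFuture<Void>> chains = new ArrayList<>();
        // Start a fixed number of chains; each chain handles one part at a time.
        for (int i = 0; i < Math.min(maxInFlight, parts.size()); i++) {
            chains.add(processNext(parts, nextPart, writer));
        }
        return CompletableFuture.allOf(chains.toArray(new CompletableFuture<?>[0]));
    }

    private static CompletableFuture<Void> processNext(
        List<Supplier<CompletableFuture<byte[]>>> parts,
        AtomicInteger nextPart,
        BiConsumer<Integer, byte[]> writer
    ) {
        int index = nextPart.getAndIncrement();
        if (index >= parts.size()) {
            return CompletableFuture.completedFuture(null); // nothing left for this chain
        }
        return parts.get(index).get()                           // the stream is requested here, not earlier
            .thenAccept(bytes -> writer.accept(index, bytes))   // write the part to the destination
            // The async hop avoids deep recursion when futures are already complete.
            .thenComposeAsync(done -> processNext(parts, nextPart, writer));
    }

    public static void main(String[] args) {
        // The example from the commit message: a 20GiB file in 16MiB parts.
        System.out.println(partCount(20L << 30, 16L << 20)); // prints 1280
    }
}
```

In this sketch the writer may be called from different threads and parts may complete out of order, so a real writer would need to place each part at its own offset in the destination file.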

This commit also wires in the download rate limiter so that the
`indices.recovery.max_bytes_per_sec` is properly honored.
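
For readers unfamiliar with byte-rate limiting, here is a minimal standalone sketch of how a download stream could be throttled to a configured bytes-per-second value. It is not the rate limiter wired in by this commit; `ThrottledInputStream`, its constructor parameters, and the `realTime` factory are assumptions made for the example. The clock and the pause action are injectable so the behaviour can be checked without sleeping.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch; not the rate limiter used by OpenSearch.
final class ThrottledInputStream extends FilterInputStream {
    private final long maxBytesPerSec;
    private final LongSupplier nanoClock; // returns the current time in nanoseconds
    private final LongConsumer pauser;    // pauses for the given number of nanoseconds
    private final long startNanos;
    private long totalBytes;

    ThrottledInputStream(InputStream in, long maxBytesPerSec, LongSupplier nanoClock, LongConsumer pauser) {
        super(in);
        if (maxBytesPerSec <= 0) {
            throw new IllegalArgumentException("maxBytesPerSec must be positive");
        }
        this.maxBytesPerSec = maxBytesPerSec;
        this.nanoClock = nanoClock;
        this.pauser = pauser;
        this.startNanos = nanoClock.getAsLong();
    }

    // Convenience factory using the system clock; parkNanos may return early, which is acceptable for a sketch.
    static ThrottledInputStream realTime(InputStream in, long maxBytesPerSec) {
        return new ThrottledInputStream(in, maxBytesPerSec, System::nanoTime, LockSupport::parkNanos);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            throttle(n);
        }
        return n;
    }

    // Pause until the elapsed time matches what the byte budget allows.
    private void throttle(int bytes) {
        totalBytes += bytes;
        // Computed in double to avoid long overflow for multi-gigabyte files.
        long targetNanos = (long) (totalBytes * 1e9 / maxBytesPerSec);
        long elapsedNanos = nanoClock.getAsLong() - startNanos;
        if (targetNanos > elapsedNanos) {
            pauser.accept(targetNanos - elapsedNanos);
        }
    }
}
```

A caller would wrap each part's stream, for example `ThrottledInputStream.realTime(partStream, limitInBytesPerSec)`, where `partStream` and `limitInBytesPerSec` stand in for whatever stream and limit the caller has. Sharing one budget across concurrent streams would need a shared limiter rather than this per-stream counter.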

Signed-off-by: Andrew Ross <[email protected]>

---------

Signed-off-by: Kunal Kotwani <[email protected]>
Signed-off-by: Andrew Ross <[email protected]>
Co-authored-by: Kunal Kotwani <[email protected]>
(cherry picked from commit 28f185b)

github-actions bot commented Oct 4, 2023

Compatibility status:

Checks if related components are compatible with change ff38621

Incompatible components

Skipped components

Compatible components:

* https://github.com/opensearch-project/security.git
* https://github.com/opensearch-project/security-analytics.git
* https://github.com/opensearch-project/custom-codecs.git
* https://github.com/opensearch-project/geospatial.git
* https://github.com/opensearch-project/index-management.git
* https://github.com/opensearch-project/notifications.git
* https://github.com/opensearch-project/sql.git
* https://github.com/opensearch-project/job-scheduler.git
* https://github.com/opensearch-project/neural-search.git
* https://github.com/opensearch-project/observability.git
* https://github.com/opensearch-project/k-nn.git
* https://github.com/opensearch-project/cross-cluster-replication.git
* https://github.com/opensearch-project/alerting.git
* https://github.com/opensearch-project/anomaly-detection.git
* https://github.com/opensearch-project/asynchronous-search.git
* https://github.com/opensearch-project/performance-analyzer.git
* https://github.com/opensearch-project/performance-analyzer-rca.git
* https://github.com/opensearch-project/common-utils.git
* https://github.com/opensearch-project/ml-commons.git
* https://github.com/opensearch-project/reporting.git

github-actions bot commented Oct 5, 2023

Gradle Check (Jenkins) Run Completed with:

codecov bot commented Oct 5, 2023

Codecov Report

Merging #10373 (ff38621) into 2.x (9c678ab) will increase coverage by 0.09%.
Report is 5 commits behind head on 2.x.
The diff coverage is 83.76%.

@@             Coverage Diff              @@
##                2.x   #10373      +/-   ##
============================================
+ Coverage     70.78%   70.87%   +0.09%     
- Complexity    58370    58392      +22     
============================================
  Files          4818     4816       -2     
  Lines        275947   275950       +3     
  Branches      40554    40559       +5     
============================================
+ Hits         195318   195580     +262     
+ Misses        63971    63732     -239     
+ Partials      16658    16638      -20     

| Files | Coverage Δ |
| --- | --- |
| ...rg/opensearch/repositories/s3/S3BlobContainer.java | 80.17% <100.00%> (+1.01%) ⬆️ |
| ...bstore/AsyncMultiStreamEncryptedBlobContainer.java | 59.18% <100.00%> (+1.73%) ⬆️ |
| ...arch/common/blobstore/stream/read/ReadContext.java | 100.00% <100.00%> (ø) |
| ...rg/opensearch/common/settings/ClusterSettings.java | 92.85% <ø> (ø) |
| ...c/main/java/org/opensearch/index/IndexService.java | 75.65% <100.00%> (+0.21%) ⬆️ |
| ...arch/index/remote/RemoteStorePressureSettings.java | 80.64% <ø> (-6.46%) ⬇️ |
| ...java/org/opensearch/index/shard/StoreRecovery.java | 57.25% <ø> (-0.23%) ⬇️ |
| ...earch/index/store/RemoteSegmentStoreDirectory.java | 92.46% <100.00%> (+2.53%) ⬆️ |
| ...ndex/store/RemoteSegmentStoreDirectoryFactory.java | 96.15% <100.00%> (+3.84%) ⬆️ |
| ...ices/replication/RemoteStoreReplicationSource.java | 90.47% <100.00%> (ø) |

... and 9 more

... and 476 files with indirect coverage changes

@andrross andrross merged commit fc7dc20 into opensearch-project:2.x Oct 5, 2023
@andrross andrross deleted the backport/backport-10349-to-2.x branch October 5, 2023 03:39