[Tiered Cache] Using a single cache manager for all ehcache disk caches #17513
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar <[email protected]>
❌ Gradle check result for 69bbc69: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
...ins/cache-ehcache/src/main/java/org/opensearch/cache/store/disk/EhcacheDiskCacheManager.java
plugins/cache-ehcache/src/main/java/org/opensearch/cache/EhcacheDiskCacheSettings.java
I am really curious whether we have observed any cases via hot_threads / flamegraph that confirm disk write threads being responsible for CPU spikes. These threads should be I/O bound, and I wouldn't really expect them to cause an observable CPU spike.
The default of 2 looks really low to me. Assuming
Not yet. We don't have a performance test that can reproduce this scenario. We ran our OSB benchmark with and without the changes, and the results were very similar in terms of performance (latency p50, p90, etc.).
We can discuss the default and increase it further. But the main objective of this change is to provide a way to increase or decrease the number of disk write threads when needed, irrespective of the number N of partitions we create within the tiered cache. Right now, each disk cache object has its own write thread pool, and when we create N (CPU * 1.5) segments/disk cache objects, we are essentially creating (N * 4) disk write threads, which seems unnecessary and may cause unknown problems. It is also not possible to configure this to <= (CPU * 1.5).
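To make the arithmetic in the reply above concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and not part of OpenSearch. It assumes that N is CPU * 1.5 truncated to an integer, that each per-cache pool holds 4 write threads, and that the shared pool size is a configured value clamped to the range [2, CPU * 1.5]; the actual rounding and setting semantics in the plugin may differ.

```java
// Hypothetical helper (not OpenSearch code): compares disk write thread counts
// for per-cache pools versus one shared pool.
final class DiskWriteThreadMath {

    // Assumption: the tiered cache creates N = CPU * 1.5 segments, truncated to an int.
    static int segments(int cpuCores) {
        return (int) (cpuCores * 1.5);
    }

    // Per-cache pools: every segment's disk cache owns its own write thread pool.
    static int threadsWithPerCachePools(int cpuCores, int writeThreadsPerCache) {
        return segments(cpuCores) * writeThreadsPerCache;
    }

    // Shared pool: one pool for all caches; the configured size is clamped
    // to [2, CPU * 1.5] (assumed interpretation of the bounds in the discussion).
    static int threadsWithSharedPool(int cpuCores, int configuredThreads) {
        int upperBound = Math.max(2, segments(cpuCores));
        return Math.min(Math.max(configuredThreads, 2), upperBound);
    }

    public static void main(String[] args) {
        int cores = 8;
        // 8 cores -> 12 segments -> 12 * 4 = 48 write threads with per-cache pools.
        System.out.println("per-cache pools: " + threadsWithPerCachePools(cores, 4));
        // The shared pool stays at the configured value of 2, independent of N.
        System.out.println("shared pool:     " + threadsWithSharedPool(cores, 2));
    }
}
```

With these assumptions the per-cache thread count grows linearly with the number of segments, while the shared pool size is decoupled from N and bounded above by CPU * 1.5.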
❌ Gradle check result for 112a96b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for cc0fea8: Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #17513 +/- ##
============================================
+ Coverage 72.39% 72.41% +0.01%
- Complexity 66066 66235 +169
============================================
Files 5358 5385 +27
Lines 306500 307200 +700
Branches 44409 44560 +151
============================================
+ Hits 221888 222447 +559
- Misses 66474 66543 +69
- Partials 18138 18210 +72
❕ Gradle check result for 13b2e85: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17513-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 58eb44e7ece913aca6de34d32f6b837a512541ae
# Push it to GitHub
git push --set-upstream origin backport/backport-17513-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…es (opensearch-project#17513)
* Using a single cache manager for all ehcache disk caches
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Added changelog
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Fixing cache manager UT
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Addressing comments
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Removing commented out code
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Adding changelog
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Changes to perform mutable changes for cache manager under a lock
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Changes to fix UT
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Addressing minor comments
Signed-off-by: Sagar Upadhyaya <[email protected]>
---------
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar <[email protected]>
Signed-off-by: Sriram Ganesh <[email protected]>
…es (opensearch-project#17513)
* Using a single cache manager for all ehcache disk caches
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Added changelog
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Fixing cache manager UT
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Addressing comments
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Removing commented out code
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Adding changelog
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Changes to perform mutable changes for cache manager under a lock
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Changes to fix UT
Signed-off-by: Sagar Upadhyaya <[email protected]>
* Addressing minor comments
Signed-off-by: Sagar Upadhyaya <[email protected]>
---------
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar <[email protected]>
Signed-off-by: Harsh Kothari <[email protected]>
Description
Previously, when creating N ehcache disk caches, we created them via N cache managers, each with its own disk write thread pool (N pools in total). The number of disk caches, N, comes from the tiered cache setting and is derived from the number of CPU cores. So essentially we were creating (CPU_CORE * 4) disk write threads, which is a lot and can cause CPU spikes with tiered cache enabled.
This change creates a single cache manager, and all subsequent caches are created via this single manager. As a result, there is only one disk write thread pool, and it is configured to have between 2 and CPU * 1.5 threads.
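The design described above can be sketched with the Java standard library alone. This is an illustration under stated assumptions, not the code in EhcacheDiskCacheManager: the class name, the method names, and the use of in-memory maps as stand-ins for disk caches are all hypothetical, and a fixed-size pool is swapped in for simplicity. The two points it shows are that every cache shares one write pool, and that changes to the manager's cache registry happen under a lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a single shared cache manager (not the Ehcache or OpenSearch API).
final class SharedDiskCacheManagerSketch {
    private final Object lock = new Object();
    // Registry of caches; an in-memory map stands in for each disk cache.
    private final Map<String, Map<String, String>> caches = new HashMap<>();
    // The one write pool shared by every cache created through this manager.
    private final ExecutorService diskWritePool;

    SharedDiskCacheManagerSketch(int writeThreads) {
        this.diskWritePool = Executors.newFixedThreadPool(writeThreads);
    }

    // Registers a new cache; registry mutations are serialized by the lock.
    Map<String, String> createCache(String name) {
        synchronized (lock) {
            if (caches.containsKey(name)) {
                throw new IllegalArgumentException("Cache already exists: " + name);
            }
            Map<String, String> cache = new ConcurrentHashMap<>();
            caches.put(name, cache);
            return cache;
        }
    }

    // Unregisters a cache under the same lock.
    void removeCache(String name) {
        synchronized (lock) {
            caches.remove(name);
        }
    }

    int cacheCount() {
        synchronized (lock) {
            return caches.size();
        }
    }

    private Map<String, String> lookup(String name) {
        synchronized (lock) {
            return caches.get(name);
        }
    }

    // Queues a write on the shared pool, regardless of which cache it targets.
    Future<?> submitWrite(String cacheName, String key, String value) {
        final Map<String, String> cache = lookup(cacheName);
        if (cache == null) {
            throw new IllegalArgumentException("Unknown cache: " + cacheName);
        }
        return diskWritePool.submit(() -> {
            cache.put(key, value);
        });
    }

    // Stops the shared pool; already queued writes still run to completion.
    void close() {
        diskWritePool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SharedDiskCacheManagerSketch manager = new SharedDiskCacheManagerSketch(2);
        Map<String, String> first = manager.createCache("segment-0");
        manager.createCache("segment-1");
        manager.submitWrite("segment-0", "key", "value").get();
        System.out.println(first.get("key") + " stored; caches registered: " + manager.cacheCount());
        manager.close();
    }
}
```

In this sketch the number of write threads is fixed when the manager is constructed, so adding more caches does not add more threads.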
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.