
Add task cancellation check in aggregation code paths #18426


Merged

Conversation

@kaushalmahi12 (Contributor) commented Jun 3, 2025

Description

This change adds cancellation checks to nested and bucket aggregations. It resolves the problem of long-running cancelled queries remaining stuck in aggregation code paths at both the shard and coordinator level.

The change handles request cancellation during the query phase, where the InternalAggregations are built; it does not cover the fetch phase, where those aggregations are transformed into the final results.

We will introduce cancellation checks in the fetch phase in a separate follow-up PR.
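
For context, here is a minimal, self-contained sketch of the pattern in plain Java (not the actual OpenSearch classes); CancellationToken, checkCancelled(), and BucketAggregatorSketch are illustrative stand-ins for the search task's cancellation flag and an aggregator's result-building loop, not the exact API touched by this PR.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Stand-in for the cancellation signal carried by the search task; illustrative only. */
final class CancellationToken {
    private final BooleanSupplier cancelled;

    CancellationToken(BooleanSupplier cancelled) {
        this.cancelled = cancelled;
    }

    /** Throws if the task has been cancelled; aggregation build loops call this periodically. */
    void checkCancelled() {
        if (cancelled.getAsBoolean()) {
            // The real code throws a dedicated exception type; RuntimeException keeps the sketch self-contained.
            throw new RuntimeException("The query has been cancelled");
        }
    }
}

/** Sketch of a bucket aggregator that checks for cancellation while building its results. */
final class BucketAggregatorSketch {
    private final CancellationToken token;

    BucketAggregatorSketch(CancellationToken token) {
        this.token = token;
    }

    List<String> buildAggregations(long[] bucketOrds) {
        List<String> results = new ArrayList<>();
        for (long ord : bucketOrds) {
            // Without this check a cancelled query keeps grinding through buckets
            // (and recursing into sub-aggregations) until it finishes or runs out of memory.
            token.checkCancelled();
            results.add("bucket-" + ord); // placeholder for building the real per-bucket aggregation
        }
        return results;
    }
}

public class CancellationCheckDemo {
    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        BucketAggregatorSketch agg = new BucketAggregatorSketch(new CancellationToken(cancelled::get));
        System.out.println(agg.buildAggregations(new long[] { 0, 1, 2 })); // builds normally
        cancelled.set(true);
        try {
            agg.buildAggregations(new long[] { 0, 1, 2 });
        } catch (RuntimeException e) {
            System.out.println("aborted: " + e.getMessage()); // aborted on the first per-bucket check
        }
    }
}

The essential point is that the check sits inside the per-bucket build loop, so deeply nested aggregations notice cancellation promptly instead of only between phases.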

Related Issues

Resolves #15413

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kaushal Kumar <[email protected]>
@kaushalmahi12 (Contributor, Author)

{"run-benchmark-test": "id_1"}


github-actions bot commented Jun 3, 2025

The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/3327/. Final results will be published once the job is completed.


github-actions bot commented Jun 3, 2025

✅ Gradle check result for a5e1364: SUCCESS


codecov bot commented Jun 3, 2025

Codecov Report

Attention: Patch coverage is 80.76923% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.66%. Comparing base (d52cefa) to head (f072cdc).
Report is 43 commits behind head on main.

Files with missing lines Patch % Lines
...opensearch/search/aggregations/AggregatorBase.java 33.33% 1 Missing and 1 partial ⚠️
...arch/search/aggregations/support/ValuesSource.java 0.00% 2 Missing ⚠️
...ns/bucket/adjacency/AdjacencyMatrixAggregator.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18426      +/-   ##
============================================
- Coverage     72.74%   72.66%   -0.08%     
+ Complexity    67767    67686      -81     
============================================
  Files          5497     5499       +2     
  Lines        311815   311890      +75     
  Branches      45261    45273      +12     
============================================
- Hits         226822   226630     -192     
- Misses        66504    66727     +223     
- Partials      18489    18533      +44     


@opensearch-ci-bot (Collaborator)

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3327/

Metric Task Value Unit
Cumulative indexing time of primary shards 208.777 min
Min cumulative indexing time across primary shards 208.777 min
Median cumulative indexing time across primary shards 208.777 min
Max cumulative indexing time across primary shards 208.777 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 118.391 min
Cumulative merge count of primary shards 66
Min cumulative merge time across primary shards 118.391 min
Median cumulative merge time across primary shards 118.391 min
Max cumulative merge time across primary shards 118.391 min
Cumulative merge throttle time of primary shards 30.4328 min
Min cumulative merge throttle time across primary shards 30.4328 min
Median cumulative merge throttle time across primary shards 30.4328 min
Max cumulative merge throttle time across primary shards 30.4328 min
Cumulative refresh time of primary shards 13.388 min
Cumulative refresh count of primary shards 133
Min cumulative refresh time across primary shards 13.388 min
Median cumulative refresh time across primary shards 13.388 min
Max cumulative refresh time across primary shards 13.388 min
Cumulative flush time of primary shards 4.66532 min
Cumulative flush count of primary shards 34
Min cumulative flush time across primary shards 4.66532 min
Median cumulative flush time across primary shards 4.66532 min
Max cumulative flush time across primary shards 4.66532 min
Total Young Gen GC time 12.298 s
Total Young Gen GC count 303
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 23.5298 GB
Translog size 5.12227e-08 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 17
Min Throughput index 48440 docs/s
Mean Throughput index 50532.8 docs/s
Median Throughput index 50063.9 docs/s
Max Throughput index 53598.7 docs/s
50th percentile latency index 1434.17 ms
90th percentile latency index 2022.74 ms
99th percentile latency index 6240.98 ms
99.9th percentile latency index 12558 ms
99.99th percentile latency index 15359.1 ms
100th percentile latency index 16546 ms
50th percentile service time index 1433.91 ms
90th percentile service time index 2022.98 ms
99th percentile service time index 6228.7 ms
99.9th percentile service time index 12558 ms
99.99th percentile service time index 15359.1 ms
100th percentile service time index 16546 ms
error rate index 0.01 %
Min Throughput wait-until-merges-finish 0 ops/s
Mean Throughput wait-until-merges-finish 0 ops/s
Median Throughput wait-until-merges-finish 0 ops/s
Max Throughput wait-until-merges-finish 0 ops/s
100th percentile latency wait-until-merges-finish 442219 ms
100th percentile service time wait-until-merges-finish 442219 ms
error rate wait-until-merges-finish 0 %

@opensearch-ci-bot (Collaborator)

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/107/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 217.087 208.777 -8.30955 min
Min cumulative indexing time across primary shard 217.087 208.777 -8.30955 min
Median cumulative indexing time across primary shard 217.087 208.777 -8.30955 min
Max cumulative indexing time across primary shard 217.087 208.777 -8.30955 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 111.769 118.391 6.6224 min
Cumulative merge count of primary shards 67 66 -1
Min cumulative merge time across primary shard 111.769 118.391 6.6224 min
Median cumulative merge time across primary shard 111.769 118.391 6.6224 min
Max cumulative merge time across primary shard 111.769 118.391 6.6224 min
Cumulative merge throttle time of primary shards 30.4407 30.4328 -0.00783 min
Min cumulative merge throttle time across primary shard 30.4407 30.4328 -0.00783 min
Median cumulative merge throttle time across primary shard 30.4407 30.4328 -0.00783 min
Max cumulative merge throttle time across primary shard 30.4407 30.4328 -0.00783 min
Cumulative refresh time of primary shards 13.5861 13.388 -0.19807 min
Cumulative refresh count of primary shards 133 133 0
Min cumulative refresh time across primary shard 13.5861 13.388 -0.19807 min
Median cumulative refresh time across primary shard 13.5861 13.388 -0.19807 min
Max cumulative refresh time across primary shard 13.5861 13.388 -0.19807 min
Cumulative flush time of primary shards 4.6486 4.66532 0.01672 min
Cumulative flush count of primary shards 34 34 0
Min cumulative flush time across primary shard 4.6486 4.66532 0.01672 min
Median cumulative flush time across primary shard 4.6486 4.66532 0.01672 min
Max cumulative flush time across primary shard 4.6486 4.66532 0.01672 min
Total Young Gen GC time 12.114 12.298 0.184 s
Total Young Gen GC count 295 303 8
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 28.4441 23.5298 -4.91429 GB
Translog size 5.12227e-08 5.12227e-08 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 35 17 -18
Min Throughput index 46562 48440 1878 docs/s
Mean Throughput index 48698.7 50532.8 1834.05 docs/s
Median Throughput index 48342.2 50063.9 1721.65 docs/s
Max Throughput index 51620.9 53598.7 1977.8 docs/s
50th percentile latency index 1491.02 1434.17 -56.8479 ms
90th percentile latency index 2125.17 2022.74 -102.435 ms
99th percentile latency index 6685.69 6240.98 -444.709 ms
99.9th percentile latency index 12291.4 12558 266.537 ms
99.99th percentile latency index 15154 15359.1 205.091 ms
100th percentile latency index 16396.4 16546 149.639 ms
50th percentile service time index 1491.17 1433.91 -57.2631 ms
90th percentile service time index 2124.53 2022.98 -101.555 ms
99th percentile service time index 6694.71 6228.7 -466.006 ms
99.9th percentile service time index 12291.4 12558 266.537 ms
99.99th percentile service time index 15154 15359.1 205.091 ms
100th percentile service time index 16396.4 16546 149.639 ms
error rate index 0.00653339 0.00655093 2e-05 %
Min Throughput wait-until-merges-finish 0.00314923 0.00226132 -0.00089 ops/s
Mean Throughput wait-until-merges-finish 0.00314923 0.00226132 -0.00089 ops/s
Median Throughput wait-until-merges-finish 0.00314923 0.00226132 -0.00089 ops/s
Max Throughput wait-until-merges-finish 0.00314923 0.00226132 -0.00089 ops/s
100th percentile latency wait-until-merges-finish 317538 442219 124682 ms
100th percentile service time wait-until-merges-finish 317538 442219 124682 ms
error rate wait-until-merges-finish 0 0 0 %

@kaushalmahi12 (Contributor, Author)

{"run-benchmark-test": "id_3"}


github-actions bot commented Jun 4, 2025

The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/3328/. Final results will be published once the job is completed.

@opensearch-ci-bot (Collaborator)

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3328/

Metric Task Value Unit
Cumulative indexing time of primary shards 0 min
Min cumulative indexing time across primary shards 0 min
Median cumulative indexing time across primary shards 0 min
Max cumulative indexing time across primary shards 0 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 0 min
Cumulative merge count of primary shards 0
Min cumulative merge time across primary shards 0 min
Median cumulative merge time across primary shards 0 min
Max cumulative merge time across primary shards 0 min
Cumulative merge throttle time of primary shards 0 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0 min
Max cumulative merge throttle time across primary shards 0 min
Cumulative refresh time of primary shards 0 min
Cumulative refresh count of primary shards 2
Min cumulative refresh time across primary shards 0 min
Median cumulative refresh time across primary shards 0 min
Max cumulative refresh time across primary shards 0 min
Cumulative flush time of primary shards 0 min
Cumulative flush count of primary shards 1
Min cumulative flush time across primary shards 0 min
Median cumulative flush time across primary shards 0 min
Max cumulative flush time across primary shards 0 min
Total Young Gen GC time 0.22 s
Total Young Gen GC count 6
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 21.4085 GB
Translog size 5.12227e-08 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 23
Min Throughput wait-for-snapshot-recovery 4.18302e+07 byte/s
Mean Throughput wait-for-snapshot-recovery 4.18302e+07 byte/s
Median Throughput wait-for-snapshot-recovery 4.18302e+07 byte/s
Max Throughput wait-for-snapshot-recovery 4.18302e+07 byte/s
100th percentile latency wait-for-snapshot-recovery 544184 ms
100th percentile service time wait-for-snapshot-recovery 544184 ms
error rate wait-for-snapshot-recovery 0 %
Min Throughput default 3.01 ops/s
Mean Throughput default 3.02 ops/s
Median Throughput default 3.02 ops/s
Max Throughput default 3.03 ops/s
50th percentile latency default 6.76413 ms
90th percentile latency default 7.48825 ms
99th percentile latency default 8.1333 ms
100th percentile latency default 8.36666 ms
50th percentile service time default 5.69877 ms
90th percentile service time default 6.2942 ms
99th percentile service time default 7.10432 ms
100th percentile service time default 7.16483 ms
error rate default 0 %
Min Throughput range 0.7 ops/s
Mean Throughput range 0.71 ops/s
Median Throughput range 0.71 ops/s
Max Throughput range 0.71 ops/s
50th percentile latency range 9.64608 ms
90th percentile latency range 10.5386 ms
99th percentile latency range 11.9888 ms
100th percentile latency range 12.1524 ms
50th percentile service time range 7.41071 ms
90th percentile service time range 8.21645 ms
99th percentile service time range 9.90116 ms
100th percentile service time range 10.1236 ms
error rate range 0 %
Min Throughput distance_amount_agg 0.15 ops/s
Mean Throughput distance_amount_agg 0.15 ops/s
Median Throughput distance_amount_agg 0.15 ops/s
Max Throughput distance_amount_agg 0.16 ops/s
50th percentile latency distance_amount_agg 599603 ms
90th percentile latency distance_amount_agg 844548 ms
99th percentile latency distance_amount_agg 900211 ms
100th percentile latency distance_amount_agg 903253 ms
50th percentile service time distance_amount_agg 6566.06 ms
90th percentile service time distance_amount_agg 6728.37 ms
99th percentile service time distance_amount_agg 7003.14 ms
100th percentile service time distance_amount_agg 7081.7 ms
error rate distance_amount_agg 0 %
Min Throughput autohisto_agg 1.51 ops/s
Mean Throughput autohisto_agg 1.51 ops/s
Median Throughput autohisto_agg 1.51 ops/s
Max Throughput autohisto_agg 1.53 ops/s
50th percentile latency autohisto_agg 7.94265 ms
90th percentile latency autohisto_agg 8.81307 ms
99th percentile latency autohisto_agg 10.3047 ms
100th percentile latency autohisto_agg 10.9396 ms
50th percentile service time autohisto_agg 6.55163 ms
90th percentile service time autohisto_agg 7.26998 ms
99th percentile service time autohisto_agg 9.27742 ms
100th percentile service time autohisto_agg 9.86933 ms
error rate autohisto_agg 0 %
Min Throughput date_histogram_agg 1.51 ops/s
Mean Throughput date_histogram_agg 1.52 ops/s
Median Throughput date_histogram_agg 1.51 ops/s
Max Throughput date_histogram_agg 1.53 ops/s
50th percentile latency date_histogram_agg 7.16127 ms
90th percentile latency date_histogram_agg 7.92216 ms
99th percentile latency date_histogram_agg 9.09706 ms
100th percentile latency date_histogram_agg 9.73304 ms
50th percentile service time date_histogram_agg 5.57716 ms
90th percentile service time date_histogram_agg 6.3307 ms
99th percentile service time date_histogram_agg 7.72926 ms
100th percentile service time date_histogram_agg 8.52015 ms
error rate date_histogram_agg 0 %
Min Throughput desc_sort_tip_amount 0.5 ops/s
Mean Throughput desc_sort_tip_amount 0.5 ops/s
Median Throughput desc_sort_tip_amount 0.5 ops/s
Max Throughput desc_sort_tip_amount 0.51 ops/s
50th percentile latency desc_sort_tip_amount 26.8988 ms
90th percentile latency desc_sort_tip_amount 28.1579 ms
99th percentile latency desc_sort_tip_amount 31.1042 ms
100th percentile latency desc_sort_tip_amount 32.4823 ms
50th percentile service time desc_sort_tip_amount 24.0065 ms
90th percentile service time desc_sort_tip_amount 25.1313 ms
99th percentile service time desc_sort_tip_amount 29.3283 ms
100th percentile service time desc_sort_tip_amount 31.7227 ms
error rate desc_sort_tip_amount 0 %
Min Throughput asc_sort_tip_amount 0.5 ops/s
Mean Throughput asc_sort_tip_amount 0.5 ops/s
Median Throughput asc_sort_tip_amount 0.5 ops/s
Max Throughput asc_sort_tip_amount 0.51 ops/s
50th percentile latency asc_sort_tip_amount 8.7278 ms
90th percentile latency asc_sort_tip_amount 9.2651 ms
99th percentile latency asc_sort_tip_amount 10.3334 ms
100th percentile latency asc_sort_tip_amount 10.5945 ms
50th percentile service time asc_sort_tip_amount 6.00683 ms
90th percentile service time asc_sort_tip_amount 6.24363 ms
99th percentile service time asc_sort_tip_amount 7.56655 ms
100th percentile service time asc_sort_tip_amount 7.63475 ms
error rate asc_sort_tip_amount 0 %

@opensearch-ci-bot (Collaborator)

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/108/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 0 0 0 min
Min cumulative indexing time across primary shard 0 0 0 min
Median cumulative indexing time across primary shard 0 0 0 min
Max cumulative indexing time across primary shard 0 0 0 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 0 0 0 min
Cumulative merge count of primary shards 0 0 0
Min cumulative merge time across primary shard 0 0 0 min
Median cumulative merge time across primary shard 0 0 0 min
Max cumulative merge time across primary shard 0 0 0 min
Cumulative merge throttle time of primary shards 0 0 0 min
Min cumulative merge throttle time across primary shard 0 0 0 min
Median cumulative merge throttle time across primary shard 0 0 0 min
Max cumulative merge throttle time across primary shard 0 0 0 min
Cumulative refresh time of primary shards 0 0 0 min
Cumulative refresh count of primary shards 2 2 0
Min cumulative refresh time across primary shard 0 0 0 min
Median cumulative refresh time across primary shard 0 0 0 min
Max cumulative refresh time across primary shard 0 0 0 min
Cumulative flush time of primary shards 0 0 0 min
Cumulative flush count of primary shards 1 1 0
Min cumulative flush time across primary shard 0 0 0 min
Median cumulative flush time across primary shard 0 0 0 min
Max cumulative flush time across primary shard 0 0 0 min
Total Young Gen GC time 0.214 0.22 0.006 s
Total Young Gen GC count 6 6 0
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 21.4085 21.4085 0 GB
Translog size 5.12227e-08 5.12227e-08 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 23 23 0
Min Throughput wait-for-snapshot-recovery 4.18729e+07 4.18302e+07 -42672 byte/s
Mean Throughput wait-for-snapshot-recovery 4.18729e+07 4.18302e+07 -42672 byte/s
Median Throughput wait-for-snapshot-recovery 4.18729e+07 4.18302e+07 -42672 byte/s
Max Throughput wait-for-snapshot-recovery 4.18729e+07 4.18302e+07 -42672 byte/s
100th percentile latency wait-for-snapshot-recovery 543367 544184 817.438 ms
100th percentile service time wait-for-snapshot-recovery 543367 544184 817.438 ms
error rate wait-for-snapshot-recovery 0 0 0 %
Min Throughput default 3.01226 3.01179 -0.00046 ops/s
Mean Throughput default 3.01996 3.01923 -0.00074 ops/s
Median Throughput default 3.01821 3.01752 -0.00068 ops/s
Max Throughput default 3.03521 3.03393 -0.00129 ops/s
50th percentile latency default 7.17748 6.76413 -0.41335 ms
90th percentile latency default 7.70853 7.48825 -0.22028 ms
99th percentile latency default 9.08683 8.1333 -0.95353 ms
100th percentile latency default 9.68735 8.36666 -1.32069 ms
50th percentile service time default 6.15225 5.69877 -0.45348 ms
90th percentile service time default 6.67898 6.2942 -0.38478 ms
99th percentile service time default 7.90528 7.10432 -0.80095 ms
100th percentile service time default 8.4661 7.16483 -1.30127 ms
error rate default 0 0 0 %
Min Throughput range 0.704381 0.704372 -1e-05 ops/s
Mean Throughput range 0.707211 0.707196 -2e-05 ops/s
Median Throughput range 0.706559 0.706545 -1e-05 ops/s
Max Throughput range 0.71304 0.713013 -3e-05 ops/s
50th percentile latency range 9.91947 9.64608 -0.27338 ms
90th percentile latency range 10.6624 10.5386 -0.12377 ms
99th percentile latency range 11.7983 11.9888 0.19052 ms
100th percentile latency range 11.8295 12.1524 0.32282 ms
50th percentile service time range 7.72585 7.41071 -0.31514 ms
90th percentile service time range 8.45635 8.21645 -0.2399 ms
99th percentile service time range 10.0886 9.90116 -0.18747 ms
100th percentile service time range 10.2619 10.1236 -0.13826 ms
error rate range 0 0 0 %
Min Throughput distance_amount_agg 0.148655 0.153394 0.00474 ops/s
Mean Throughput distance_amount_agg 0.149028 0.154687 0.00566 ops/s
Median Throughput distance_amount_agg 0.149093 0.154754 0.00566 ops/s
Max Throughput distance_amount_agg 0.1492 0.155605 0.0064 ops/s
50th percentile latency distance_amount_agg 624150 599603 -24547 ms
90th percentile latency distance_amount_agg 871986 844548 -27437.7 ms
99th percentile latency distance_amount_agg 927631 900211 -27419.3 ms
100th percentile latency distance_amount_agg 930689 903253 -27435.8 ms
50th percentile service time distance_amount_agg 6683.27 6566.06 -117.207 ms
90th percentile service time distance_amount_agg 6771.3 6728.37 -42.9346 ms
99th percentile service time distance_amount_agg 6857.12 7003.14 146.018 ms
100th percentile service time distance_amount_agg 6858.45 7081.7 223.248 ms
error rate distance_amount_agg 0 0 0 %
Min Throughput autohisto_agg 1.50906 1.50884 -0.00022 ops/s
Mean Throughput autohisto_agg 1.51496 1.51461 -0.00035 ops/s
Median Throughput autohisto_agg 1.51362 1.51331 -0.00031 ops/s
Max Throughput autohisto_agg 1.52696 1.52628 -0.00067 ops/s
50th percentile latency autohisto_agg 7.3429 7.94265 0.59975 ms
90th percentile latency autohisto_agg 8.0327 8.81307 0.78037 ms
99th percentile latency autohisto_agg 9.97927 10.3047 0.32543 ms
100th percentile latency autohisto_agg 10.3378 10.9396 0.60175 ms
50th percentile service time autohisto_agg 6.09449 6.55163 0.45714 ms
90th percentile service time autohisto_agg 6.61933 7.26998 0.65065 ms
99th percentile service time autohisto_agg 8.87414 9.27742 0.40327 ms
100th percentile service time autohisto_agg 9.25852 9.86933 0.61081 ms
error rate autohisto_agg 0 0 0 %
Min Throughput date_histogram_agg 1.50974 1.50975 1e-05 ops/s
Mean Throughput date_histogram_agg 1.51612 1.51612 -0 ops/s
Median Throughput date_histogram_agg 1.51466 1.51467 1e-05 ops/s
Max Throughput date_histogram_agg 1.52905 1.52902 -3e-05 ops/s
50th percentile latency date_histogram_agg 7.38085 7.16127 -0.21958 ms
90th percentile latency date_histogram_agg 8.19219 7.92216 -0.27003 ms
99th percentile latency date_histogram_agg 8.60965 9.09706 0.48741 ms
100th percentile latency date_histogram_agg 8.61518 9.73304 1.11787 ms
50th percentile service time date_histogram_agg 5.92396 5.57716 -0.3468 ms
90th percentile service time date_histogram_agg 6.67439 6.3307 -0.34369 ms
99th percentile service time date_histogram_agg 7.00048 7.72926 0.72878 ms
100th percentile service time date_histogram_agg 7.04021 8.52015 1.47994 ms
error rate date_histogram_agg 0 0 0 %
Min Throughput desc_sort_tip_amount 0.50257 0.502555 -1e-05 ops/s
Mean Throughput desc_sort_tip_amount 0.504229 0.504204 -3e-05 ops/s
Median Throughput desc_sort_tip_amount 0.503847 0.503825 -2e-05 ops/s
Max Throughput desc_sort_tip_amount 0.507639 0.507591 -5e-05 ops/s
50th percentile latency desc_sort_tip_amount 26.9325 26.8988 -0.03373 ms
90th percentile latency desc_sort_tip_amount 27.6999 28.1579 0.45802 ms
99th percentile latency desc_sort_tip_amount 29.0442 31.1042 2.06004 ms
100th percentile latency desc_sort_tip_amount 29.3819 32.4823 3.10039 ms
50th percentile service time desc_sort_tip_amount 24.3774 24.0065 -0.37089 ms
90th percentile service time desc_sort_tip_amount 24.817 25.1313 0.31432 ms
99th percentile service time desc_sort_tip_amount 27.2457 29.3283 2.08258 ms
100th percentile service time desc_sort_tip_amount 27.4886 31.7227 4.23415 ms
error rate desc_sort_tip_amount 0 0 0 %
Min Throughput asc_sort_tip_amount 0.503056 0.503019 -4e-05 ops/s
Mean Throughput asc_sort_tip_amount 0.505032 0.504971 -6e-05 ops/s
Median Throughput asc_sort_tip_amount 0.504577 0.504521 -6e-05 ops/s
Max Throughput asc_sort_tip_amount 0.509096 0.508988 -0.00011 ops/s
50th percentile latency asc_sort_tip_amount 8.53637 8.7278 0.19143 ms
90th percentile latency asc_sort_tip_amount 9.02361 9.2651 0.2415 ms
99th percentile latency asc_sort_tip_amount 9.77252 10.3334 0.56086 ms
100th percentile latency asc_sort_tip_amount 10.2152 10.5945 0.37929 ms
50th percentile service time asc_sort_tip_amount 5.74805 6.00683 0.25878 ms
90th percentile service time asc_sort_tip_amount 5.97087 6.24363 0.27276 ms
99th percentile service time asc_sort_tip_amount 7.20754 7.56655 0.359 ms
100th percentile service time asc_sort_tip_amount 7.81044 7.63475 -0.17569 ms
error rate asc_sort_tip_amount 0 0 0 %

@jed326 (Contributor) left a comment

Thanks @kaushalmahi12, had a few small comments. For completeness, could you also share an example of what a search request looks like if it gets cancelled in the Aggregation phase? Just want to make sure there's no difference from the user perspective if the search gets cancelled in the query phase or the aggregation phase.

@kaushalmahi12 (Contributor, Author) commented Jun 11, 2025

Thanks @jed326 for reviewing this change

could you also share an example of what a search request looks like if it gets cancelled in the Aggregation phase?

I think you mean the response here.

With the current change, it looks like the following:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "rejected_execution_exception",
        "reason" : "The query has been cancelled"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "nyc_taxis",
        "node" : "PileqDQFSWSPuJmgPQ1chQ",
        "reason" : {
          "type" : "rejected_execution_exception",
          "reason" : "The query has been cancelled",
          "suppressed" : [
            {
              "type" : "rejected_execution_exception",
              "reason" : "The query has been cancelled"
            },
            {
              "type" : "rejected_execution_exception",
              "reason" : "The query has been cancelled"
            }
          ]
        }
      }
    ],
    "caused_by" : {
      "type" : "rejected_execution_exception",
      "reason" : "The query has been cancelled",
      "caused_by" : {
        "type" : "rejected_execution_exception",
        "reason" : "The query has been cancelled",
        "suppressed" : [
          {
            "type" : "rejected_execution_exception",
            "reason" : "The query has been cancelled"
          },
          {
            "type" : "rejected_execution_exception",
            "reason" : "The query has been cancelled"
          }
        ]
      }
    }
  },
  "status" : 429
}

But if we change the exception type to a task cancellation exception, the response might change slightly.

Note that this request was run in a single-node environment against a single-shard index.

@jed326 (Contributor) left a comment

Thanks for making the changes! The only other thing I can think of is whether we want to track the aggregations that don't support cancellation today, and how we want to inform plugin developers that they can add cancellation support in the same way.
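
As a rough illustration of how a plugin aggregator could adopt the same pattern — assuming plugins get access to the same kind of cancellation hook this PR wires into the core aggregators (the injected checkCancelled runnable below and the class name are hypothetical stand-ins, not the actual extension point) — the aggregator checks at coarse intervals while building its results rather than on every document:

import java.util.Locale;

// Hypothetical plugin-side aggregator; names are illustrative only.
final class PluginHistogramAggregator {
    private final Runnable checkCancelled;
    private final long[] counts = new long[4096];

    PluginHistogramAggregator(Runnable checkCancelled) {
        this.checkCancelled = checkCancelled;
    }

    // Per-document work stays cheap: no cancellation check in the hot collect path.
    void collect(int bucketOrd) {
        counts[bucketOrd]++;
    }

    // Result building can be expensive for high-cardinality aggregations, so check
    // periodically; a cancelled task then stops instead of running to completion.
    String buildAggregation() {
        StringBuilder out = new StringBuilder();
        for (int ord = 0; ord < counts.length; ord++) {
            if (ord % 1024 == 0) {
                checkCancelled.run();
            }
            if (counts[ord] > 0) {
                out.append(String.format(Locale.ROOT, "%d:%d%n", ord, counts[ord]));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        PluginHistogramAggregator agg = new PluginHistogramAggregator(() -> {
            // A real hook would consult the task's cancellation flag and throw if it is set.
        });
        agg.collect(7);
        agg.collect(7);
        System.out.print(agg.buildAggregation()); // prints "7:2"
    }
}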


❌ Gradle check result for 4e0f814: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kaushalmahi12 force-pushed the task_cancellation_check_2 branch from 4e0f814 to f072cdc on June 11, 2025 at 20:04

✅ Gradle check result for f072cdc: SUCCESS

@jainankitk merged commit 3078649 into opensearch-project:main on Jun 11, 2025
30 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 12, 2025
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 17, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025

Successfully merging this pull request may close these issues.

[BUG] Deeply nested aggregations are not terminable by any mechanism and cause Out of Memory errors in data nodes.
4 participants