Search Replica Allocation and Recovery #17457

Merged on Mar 21, 2025 (21 commits into opensearch-project:main)

@vinaykpud commented Feb 25, 2025

Description

Related Issues

Resolves #17422
Resolves #17421
Resolves #17334
Related to #15306

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions bot added the bug and Search:Performance labels Feb 25, 2025

❌ Gradle check result for f9c54c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❕ Gradle check result for 2d5b977: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Feb 26, 2025

Codecov Report

Attention: Patch coverage is 94.02985% with 4 lines in your changes missing coverage. Please review.

Project coverage is 72.34%. Comparing base (dcad6b8) to head (79a8da2).
Report is 20 commits behind head on main.

Files with missing lines | Patch % | Lines
...java/org/opensearch/index/shard/StoreRecovery.java | 92.30% | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17457      +/-   ##
============================================
- Coverage     72.46%   72.34%   -0.12%     
+ Complexity    65757    65717      -40     
============================================
  Files          5311     5311              
  Lines        305001   305052      +51     
  Branches      44230    44237       +7     
============================================
- Hits         221022   220696     -326     
- Misses        65932    66280     +348     
- Partials      18047    18076      +29     

@vinaykpud reopened this Mar 19, 2025

❌ Gradle check result for 1ecdbc3: FAILURE

@vinaykpud closed this Mar 19, 2025
@vinaykpud reopened this Mar 19, 2025

❌ Gradle check result for 1ecdbc3: FAILURE

@vinaykpud closed this Mar 19, 2025
@vinaykpud reopened this Mar 19, 2025

❌ Gradle check result for 1ecdbc3: FAILURE

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

❌ Gradle check result for d197bd0: FAILURE

❌ Gradle check result for d197bd0: FAILURE

@vinaykpud closed this Mar 19, 2025
@vinaykpud reopened this Mar 19, 2025

❕ Gradle check result for d197bd0: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

❕ Gradle check result for 79a8da2: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

@mch2 merged commit 1acba95 into opensearch-project:main Mar 21, 2025
34 of 35 checks passed
bzhangam pushed a commit to bzhangam/OpenSearch that referenced this pull request Mar 25, 2025
* Restrict Search Replicas to Allocate only to Search dedicated node

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed the javadoc

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed tests

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Treat Regular and Search Replicas Separately to Prevent Allocation Blocking

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated tests and some refactor

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fixed SearchReplica recovery scenario for same node and new node

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated the logic for SearchReplica recovery scenario for new node

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fixed nits after self review

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Modified the search replica allocation based on node attribute

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed PR comments

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Revert "Fixed SearchReplica recovery scenario for same node and new node"

This reverts commit de1e719.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Separated the recovery flow method for search replica

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Revert "fixed PR comments"

This reverts commit 8fe8dcf.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added unit tests in IndexShardTests

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* updated method name and minor refactor

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Removed search replica recovery logic from internalRecoverFromStore method

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added integ test to cover search node restart scenario

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Applied search node role in tests and removed searchonly attribute

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fixed failing test

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Removed unwanted comment

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Address PR comments

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

---------

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
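
For readers skimming the commit list, the first item ("Restrict Search Replicas to Allocate only to Search dedicated node") can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the code merged in this PR: the class `SearchReplicaAllocationSketch`, the `Decision` enum, the `canAllocate` method, and the `"search"` role string are all hypothetical names chosen for the example, and none of them are OpenSearch classes. The commit list does not say whether other shard types are kept off search nodes, so the sketch leaves them unrestricted.

```java
import java.util.Set;

/**
 * Hypothetical, simplified model of the allocation rule named in the commit list.
 * None of these types are OpenSearch classes.
 */
public class SearchReplicaAllocationSketch {

    /** Assumed role name for a search-dedicated node (illustrative only). */
    public static final String SEARCH_ROLE = "search";

    /** Simplified allocation outcome; real deciders may have more states. */
    public enum Decision { YES, NO }

    /**
     * Returns NO when a search replica is offered a node without the search role.
     * Other shard types are left unrestricted by this sketch (an assumption).
     */
    public static Decision canAllocate(boolean isSearchReplica, Set<String> nodeRoles) {
        if (isSearchReplica && !nodeRoles.contains(SEARCH_ROLE)) {
            return Decision.NO;
        }
        return Decision.YES;
    }

    public static void main(String[] args) {
        Set<String> searchNode = Set.of("search");
        Set<String> dataNode = Set.of("data", "ingest");

        // Print the decision for each shard-type and node-type combination.
        System.out.println("search replica on search node: " + canAllocate(true, searchNode));
        System.out.println("search replica on data node:   " + canAllocate(true, dataNode));
        System.out.println("regular shard on data node:    " + canAllocate(false, dataNode));
    }
}
```

Keeping the check as a pure function of the shard type and the node's roles is only a convenience for the example; it makes the rule easy to unit test in isolation.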
Labels
bug (Something isn't working), enhancement (Enhancement or improvement to existing feature or request), Search:Performance, skip-changelog
Projects
None yet
3 participants