Skip to content

STAR-385 Retry cluster stop after exception stopping 'gently' #36

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 2, 2021

Conversation

djatnieks
Copy link

STAR-385 Some tests leave nodes in a state where they cannot be shutdown using "gently=True", which is requested when running with Jacoco so that the jvm exits normally and the jacoco agent can write coverage results. However, if stopping the node fails, not only are jacoco results not produced, but the test is marked as Failed (when it otherwise passed).

This change will retry a cluster stop request made with "gently=True" using "gently=False", as would have been requested without jacoco. This will allow the test to complete as passing. Jacoco coverage will not be available, but then, it wasn't going to be anyway.

Copy link

@tlasica tlasica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change if for any reason some other tests will stop closing gently we will not notice it as test failures, potentially missing a bug.

I would rather explicitly stop by kill in this particular test case at the end of the test.
Assuming that C* devs confirm that this is expected behavior after STOP policy is executed.

Ideal approach would be to add some decorator of cluster flag so that ccm itself knows that in this case it should not try to stop gently (and save some test duration).

def cleanup_cluster(self, request=None):
with log_filter('cassandra'): # quiet noise from driver when nodes start going down
test_failed = request and hasattr(request.node, 'rep_call') and request.node.rep_call.failed
if self.dtest_config.keep_test_dir or (self.dtest_config.keep_failed_test_dir and test_failed):
self.cluster.stop(gently=self.dtest_config.enable_jacoco_code_coverage)
self.stop_cluster(gently=self.dtest_config.enable_jacoco_code_coverage)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: looking at this logic - unless I am not fully awake - I think it can be simplified:

self.stop_cluster(gently=self.dtest_config.enable_jacoco_code_coverage)
preserve_logs = self.dtest_config.keep_test_dir or (self.dtest_config.keep_failed_test_dir and test_failed)
if not preserve_logs:
    # Cleanup everything

but I understand we do not want to introduce changes vs apache

Copy link
Author

@djatnieks djatnieks Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change if for any reason some other tests will stop closing gently we will not notice it as test failures, potentially missing a bug.

Actually, I don't think that would happen. The only time that stop(gently) is ever used, is for jacoco code coverage; in all other cases it is stop(kill). If stop(gently) was the normal case, this might be true, but stop(kill) is the norm.

I would rather explicitly stop by kill in this particular test case at the end of the test.
Assuming that C* devs confirm that this is expected behavior after STOP policy is executed.

Ideal approach would be to add some decorator of cluster flag so that ccm itself knows that in this case it should not try to stop gently (and save some test duration).

To save some test duration by avoiding a stop(gently) that is expected to fail, maybe we can annotate this test (and others) so that cleanup_cluster can use that to override the gentle request for jacoco.

I suspected there are other tests besides test_stop_failure_policy and test_stop_commit_failure_policy failing for the same reason with jacoco and it seemed the current approach has the advantage of working for those as well. But maybe a mechanism to tag such tests is better in making the expectation clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: looking at this logic - unless I am not fully awake - I think it can be simplified:

I noticed it too, and agree it can be simplified to be more easily read/understood.

but I understand we do not want to introduce changes vs apache

I am always debating this question with myself; in this case, I am wondering if this issue is affecting OSS also and therefore the fix should be ported back? In that case, I would be in favor of making this logic simpler.

Copy link

@tlasica tlasica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If default was non-gently without the jacoco-cov I am ok with current fix.

@djatnieks djatnieks merged commit f6fc3d3 into ds-trunk Jul 2, 2021
@djatnieks djatnieks deleted the STAR-385-retry-cluster-stop branch July 2, 2021 20:49
jacek-lewandowski pushed a commit that referenced this pull request Aug 25, 2021
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
jacek-lewandowski pushed a commit that referenced this pull request Feb 23, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
jacek-lewandowski pushed a commit that referenced this pull request Mar 9, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
jacek-lewandowski pushed a commit that referenced this pull request May 24, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
jacek-lewandowski pushed a commit that referenced this pull request May 27, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
jacek-lewandowski pushed a commit that referenced this pull request Oct 11, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
jacek-lewandowski pushed a commit that referenced this pull request Oct 18, 2022
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
djatnieks added a commit that referenced this pull request Aug 21, 2023
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
djatnieks added a commit that referenced this pull request Feb 21, 2024
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
jacek-lewandowski pushed a commit that referenced this pull request Jul 24, 2024
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
djatnieks added a commit that referenced this pull request Jan 29, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
michaelsembwever pushed a commit that referenced this pull request Apr 30, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
michaelsembwever pushed a commit that referenced this pull request May 6, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
michaelsembwever pushed a commit that referenced this pull request May 6, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 6, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 7, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 9, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 11, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 12, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 15, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)
michaelsembwever pushed a commit that referenced this pull request May 18, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
michaelsembwever pushed a commit that referenced this pull request May 20, 2025
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status.

This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing.

(cherry picked from commit f6fc3d3)
(cherry picked from commit aff9d6e)
(cherry picked from commit fe27230)
(cherry picked from commit f931d43)
(cherry picked from commit 88c022c)
(cherry picked from commit 56b4680)

rebase notes:
will be upstreamed in CASSANDRA-20627
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants