-
Notifications
You must be signed in to change notification settings - Fork 3
STAR-385 Retry cluster stop after exception stopping 'gently' #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change if for any reason some other tests will stop closing gently we will not notice it as test failures, potentially missing a bug.
I would rather explicitly stop by kill in this particular test case at the end of the test.
Assuming that C* devs confirm that this is expected behavior after STOP policy is executed.
Ideal approach would be to add some decorator of cluster
flag so that ccm
itself knows that in this case it should not try to stop gently (and save some test duration).
def cleanup_cluster(self, request=None): | ||
with log_filter('cassandra'): # quiet noise from driver when nodes start going down | ||
test_failed = request and hasattr(request.node, 'rep_call') and request.node.rep_call.failed | ||
if self.dtest_config.keep_test_dir or (self.dtest_config.keep_failed_test_dir and test_failed): | ||
self.cluster.stop(gently=self.dtest_config.enable_jacoco_code_coverage) | ||
self.stop_cluster(gently=self.dtest_config.enable_jacoco_code_coverage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: looking at this logic - unless I am not fully awake - I think it can be simplified:
self.stop_cluster(gently=self.dtest_config.enable_jacoco_code_coverage)
preserve_logs = self.dtest_config.keep_test_dir or (self.dtest_config.keep_failed_test_dir and test_failed)
if not preserve_logs:
# Cleanup everything
but I understand we do not want to introduce changes vs apache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change if for any reason some other tests will stop closing gently we will not notice it as test failures, potentially missing a bug.
Actually, I don't think that would happen. The only time that stop(gently) is ever used, is for jacoco code coverage; in all other cases it is stop(kill). If stop(gently) was the normal case, this might be true, but stop(kill) is the norm.
I would rather explicitly stop by kill in this particular test case at the end of the test.
Assuming that C* devs confirm that this is expected behavior after STOP policy is executed.Ideal approach would be to add some decorator of
cluster
flag so thatccm
itself knows that in this case it should not try to stop gently (and save some test duration).
To save some test duration by avoiding a stop(gently) that is expected to fail, maybe we can annotate this test (and others) so that cleanup_cluster can use that to override the gentle request for jacoco.
I suspected there are other tests besides test_stop_failure_policy
and test_stop_commit_failure_policy
failing for the same reason with jacoco and it seemed the current approach has the advantage of working for those as well. But maybe a mechanism to tag such tests is better in making the expectation clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: looking at this logic - unless I am not fully awake - I think it can be simplified:
I noticed it too, and agree it can be simplified to be more easily read/understood.
but I understand we do not want to introduce changes vs apache
I am always debating this question with myself; in this case, I am wondering if this issue is affecting OSS also and therefore the fix should be ported back? In that case, I would be in favor of making this logic simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If default was non-gently without the jacoco-cov I am ok with current fix.
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680)
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
Cluster stop requests in cleanup_cluster are made with "gently=True" when Jacoco code coverage is enabled to allow the jacoco agent to record results; however, some tests leave nodes in a state where this type of shutdown does not succeed, resulting in the test being marked failed regardless of it's true completion status. This change will retry these stop requests with "gently=False" so that the test completion status will not be altered due to shutdown not completing. (cherry picked from commit f6fc3d3) (cherry picked from commit aff9d6e) (cherry picked from commit fe27230) (cherry picked from commit f931d43) (cherry picked from commit 88c022c) (cherry picked from commit 56b4680) rebase notes: will be upstreamed in CASSANDRA-20627
STAR-385 Some tests leave nodes in a state where they cannot be shutdown using "gently=True", which is requested when running with Jacoco so that the jvm exits normally and the jacoco agent can write coverage results. However, if stopping the node fails, not only are jacoco results not produced, but the test is marked as Failed (when it otherwise passed).
This change will retry a cluster stop request made with "gently=True" using "gently=False", as would have been requested without jacoco. This will allow the test to complete as passing. Jacoco coverage will not be available, but then, it wasn't going to be anyway.