Description
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: 1.21.3
What k8s version are you using (kubectl version)?:
kubectl version Output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-03-01T02:23:41Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.17-eks-48e63af", GitCommit:"47b89ea2caa1f7958bc6539d6865820c86b4bf60", GitTreeState:"clean", BuildDate:"2023-01-24T09:34:06Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?:
aws, EKS
What did you expect to happen?:
The unneeded-since duration for a node would keep increasing until the node was either removed or determined to be needed again.
What happened instead?:
The unneeded-since duration dropped to 0 for many nodes at once (but not all), even though the logs never show them being determined to be needed, causing the --scale-down-unneeded-time timer to be reset.
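For context, here is a minimal Go sketch of why a reset matters; this is my own illustration, not the actual cluster-autoscaler source, and the helper name and values are made up (10 minutes is, as far as I know, the default --scale-down-unneeded-time). A node only becomes a scale-down candidate once it has been continuously unneeded for at least that duration, so every reset pushes removal back by the full window.

package main

import (
	"fmt"
	"time"
)

// eligibleForScaleDown is a hypothetical helper illustrating the check: a node
// is only a scale-down candidate after it has been continuously unneeded for
// at least --scale-down-unneeded-time.
func eligibleForScaleDown(unneededSince, now time.Time, scaleDownUnneededTime time.Duration) bool {
	return now.Sub(unneededSince) >= scaleDownUnneededTime
}

func main() {
	unneededTime := 10 * time.Minute // assumed default of --scale-down-unneeded-time
	now := time.Now()
	firstSeen := now.Add(-9 * time.Minute)
	// One minute short of eligibility; if unneeded-since is reset to now, the
	// node has to wait another full 10 minutes before it can be removed.
	fmt.Println(eligibleForScaleDown(firstSeen, now, unneededTime)) // false
}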
How to reproduce it (as minimally and precisely as possible):
I'm not sure what triggers this. The cluster has a fairly high churn rate, runs around 300 nodes, and uses mostly default CA settings. I also noticed a Watch on replicasets closing in the loop iteration where this happened, if that matters.
Anything else we need to know?:
I'm happy to answer more questions, but I'm not sure what else to include here. The logs are far too verbose to copy in their entirety, but this is the piece of code I've been looking at, whose comment I suspect may not hold in practice:
// Update stores nodes along with a time at which they were found to be
// unneeded. Previously existing timestamps are preserved.
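To illustrate what I think is happening, here is a rough Go sketch of that contract; it is my own simplified stand-in, not the real implementation, and the type and field names are invented. Timestamps are preserved only for nodes present in every consecutive unneeded list, so a node that drops out for a single iteration comes back with a fresh timestamp and its unneeded-since clock restarts.

package main

import (
	"fmt"
	"time"
)

// unneededNodes is a hypothetical, simplified stand-in for the tracker the
// comment describes: a map from node name to the time it was first found
// unneeded.
type unneededNodes struct {
	since map[string]time.Time
}

// Update keeps the existing timestamp for nodes still in the new unneeded
// list and stamps newly unneeded nodes with now. Nodes absent from the new
// list are dropped entirely, so a single iteration that omits a node resets
// its unneeded-since time when it reappears.
func (u *unneededNodes) Update(nodes []string, now time.Time) {
	updated := make(map[string]time.Time, len(nodes))
	for _, name := range nodes {
		if ts, ok := u.since[name]; ok {
			updated[name] = ts // preserve previously existing timestamp
		} else {
			updated[name] = now // first time this node is seen unneeded
		}
	}
	u.since = updated
}

func main() {
	u := &unneededNodes{since: map[string]time.Time{}}
	t0 := time.Now()
	u.Update([]string{"node-a", "node-b"}, t0)
	// node-a momentarily disappears from the unneeded list...
	u.Update([]string{"node-b"}, t0.Add(time.Minute))
	// ...and comes back with a fresh timestamp, resetting its unneeded time.
	u.Update([]string{"node-a", "node-b"}, t0.Add(2*time.Minute))
	fmt.Println(u.since["node-a"].Equal(t0)) // false: the clock restarted
}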