
Include short unregistered nodes in calculation of incorrect node group #5894


Merged
merged 1 commit into kubernetes:master on Jun 29, 2023

Conversation

@BigDarkClown (Contributor) commented Jun 27, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

There are two mechanisms used to fix an incorrect node group size:

  • Unregistered nodes are removed after the default time of 15 minutes.
  • The node group is resized to the number of registered nodes if this number remains smaller than the target size after the default time of 15 minutes.

Both mechanisms use the same time threshold, so race conditions can occur between them, for example in the following situation:

Node group with:

  • 1 registered node
  • 1 unregistered node
  • target size of 2

In this instance it is better to remove the unregistered node than to resize the node group. The issue with the second approach is that there is no guarantee which node will be removed - it can be either the unregistered one (desired) or the registered one (undesired).

To prevent this, the second mechanism takes nodes unregistered for > 15 minutes into account when deciding whether it should resize the node group. It does not, however, take into account nodes unregistered for < 15 minutes, which can result in the second mechanism being triggered unnecessarily.

This change takes all unregistered nodes into account when deciding on the trigger condition for the second mechanism. This should prevent the race condition described above.
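
To make the trigger condition concrete, here is a minimal Go sketch, not the actual ClusterStateRegistry code: the struct and helper names are simplified stand-ins, and it only contrasts the old behaviour, which credited long-unregistered nodes alone, with the behaviour after this change, which credits all unregistered nodes.

```go
package main

import "fmt"

// readiness is a simplified stand-in for the per-node-group readiness
// data tracked by the cluster state registry; only the counts relevant
// to this PR are modelled.
type readiness struct {
	Registered       int
	Unregistered     int // unregistered for less than 15 minutes
	LongUnregistered int // unregistered for more than 15 minutes
}

// incorrectSizeBefore mirrors the old check: only long-unregistered
// nodes offset the target size, so a short-unregistered node makes the
// group look undersized.
func incorrectSizeBefore(target int, r readiness) bool {
	return r.Registered < target-r.LongUnregistered
}

// incorrectSizeAfter mirrors the check after this change: every
// unregistered node is credited before the group is declared undersized.
func incorrectSizeAfter(target int, r readiness) bool {
	return r.Registered < target-r.Unregistered-r.LongUnregistered
}

func main() {
	// The scenario from the description: 1 registered node,
	// 1 short-unregistered node, target size of 2.
	r := readiness{Registered: 1, Unregistered: 1, LongUnregistered: 0}
	fmt.Println(incorrectSizeBefore(2, r)) // true: fixNodeGroupSize could race with unregistered-node removal
	fmt.Println(incorrectSizeAfter(2, r))  // false: the unregistered node is accounted for
}
```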

Does this PR introduce a user-facing change?

Because of this change, some cloud providers may trigger the "fixNodeGroupSize"
logic more frequently in their clusters than before. This can happen if
scale-ups take a long time to create nodes, and it can result in the CA treating the
node groups as incorrectly sized and scaling them down. This can be prevented
by increasing Cluster Autoscaler's max node provisioning time parameter.

You can check if your cluster is experiencing this issue by looking for the following log line:
"Decreasing size of <node group name>, expected=X current=Y delta=Z"

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/cluster-autoscaler labels Jun 27, 2023
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and x13n June 27, 2023 14:36
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 27, 2023
@krzysied (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 27, 2023
@BigDarkClown (Contributor, Author)

/assign @MaciekPytel

@@ -512,7 +512,7 @@ func (csr *ClusterStateRegistry) updateAcceptableRanges(targetSize map[string]in
 	size := targetSize[nodeGroup.Id()]
 	readiness := csr.perNodeGroupReadiness[nodeGroup.Id()]
 	result[nodeGroup.Id()] = AcceptableRange{
-		MinNodes: size - len(readiness.LongUnregistered),
+		MinNodes: size - len(readiness.Unregistered) - len(readiness.LongUnregistered),
Contributor

It makes sense to subtract Unregistered nodes here, given that we have a separate (and arguably better) mechanism for tracking them. I'm wondering though - doesn't that make us double-count scale-ups?
We subtract scaleUpRequest.Increase below and I'd imagine in most cases the unregistered nodes we're subtracting here come from scale-up requests.
Wouldn't it make sense to remove the scaleUpRequest part below? Sure, we will get out of acceptableRange for the window of time between resizing nodeGroup and instances being created, but having a timeout on instances being created seems like the purpose of this entire mechanism.

WDYT?
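
To illustrate the double-counting concern with concrete numbers: scaleUpRequest.Increase is the field referenced in the comment above, and everything else in this small sketch is made up for the example.

```go
package main

import "fmt"

func main() {
	// Hypothetical numbers: a node group of 5 scales up by 3, and the 3
	// new instances have not registered yet.
	target := 5 + 3      // target size after the scale-up
	unregistered := 3    // the freshly created, not-yet-registered nodes
	scaleUpIncrease := 3 // scaleUpRequest.Increase for the same scale-up

	// Subtracting both terms counts the same 3 nodes twice, pushing
	// MinNodes well below what the group could legitimately shrink to.
	minNodes := target - unregistered - scaleUpIncrease
	fmt.Println(minNodes) // 2, even though 5 nodes were already registered
}
```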

Contributor (Author)

I guess it makes sense. The mechanism triggers only if the diff between the target size and the current size is both outside of the acceptable range and constant for 15 minutes, so we should not really be worried about scale-up requests, as the unregistered nodes should eventually register.

The only worry I have is that this will trigger the mechanism more often than it does today. I'm not 100% sure that is desirable for everyone, but I agree it would be more correct behaviour.

Contributor (Author)

I've updated the change to not take scale-up requests into account. I've also removed the scale-down requests, as they are not used anywhere and are effectively dead code.

Contributor (Author)

OK, I did another 180-degree turn.

I've left the scale-up / scale-down logic in calculating the acceptable range. This is used by the IsNodeGroupHealthy method, and I don't think we want to change the behaviour there. I've removed the long unregistered nodes from here, as IMO they should impact the node pool healthiness.

I am, however, including the unregistered nodes when calculating the incorrect node group size. This calculation is used only for correcting the node group size after 15 minutes, so we should be good here.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 28, 2023
@BigDarkClown BigDarkClown force-pushed the fix-cs branch 2 times, most recently from ae0adbf to dc636ec Compare June 28, 2023 15:40
@BigDarkClown BigDarkClown changed the title Include all unregistered nodes in node group acceptable range Include short unregistered nodes in calculation of incorrect node group Jun 28, 2023
@BigDarkClown BigDarkClown force-pushed the fix-cs branch 2 times, most recently from f9238e5 to bf1f123 Compare June 28, 2023 15:46
-	if len(readiness.Registered) > acceptableRange.MaxNodes ||
-		len(readiness.Registered) < acceptableRange.MinNodes {
+	unregisteredNodes := len(readiness.Unregistered) + len(readiness.LongUnregistered)
+	if len(readiness.Registered) > acceptableRange.CurrentTarget ||
Contributor

Why current target and not MaxNodes? It seems like MaxNodes is an obvious fit here: there are no changes to how you calculate it, and I don't see why we shouldn't offset for scale-downs that we're aware of. (TBH I wouldn't implement the whole acceptableRanges logic if we were starting from scratch, but why not use it if it's already there?)

@BigDarkClown (Contributor, Author) commented Jun 29, 2023

The idea is that we don't want the incorrect node group size to depend on ongoing scale-ups and scale-downs; we want to guard only on the expected size. Otherwise we would effectively double our max node provisioning period for this mechanism.

Also, in all fairness - the mechanism for fixing situations when the current size > expected size is not implemented yet. It's possible it will require slightly different edge conditions, but I think it's better to leave figuring those out to whoever implements that mechanism.

Contributor

ack
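
A self-contained Go sketch of how the merged check could look, assuming the truncated half of the condition above compares the registered plus unregistered count against CurrentTarget; the types are trimmed-down stand-ins, so treat this as illustrative rather than as the merged code.

```go
package main

import "fmt"

// Simplified stand-ins for the types referenced in the diff above; the
// real ClusterStateRegistry carries more fields.
type Readiness struct {
	Registered       []string
	Unregistered     []string
	LongUnregistered []string
}

type AcceptableRange struct {
	MinNodes      int
	MaxNodes      int
	CurrentTarget int
}

// hasIncorrectSize sketches the check discussed in this thread: the group
// is considered incorrectly sized when the registered node count, even
// after crediting every unregistered node, still does not reach the
// current target. The undersized comparison is an assumption, since that
// part of the condition is truncated in the diff above.
func hasIncorrectSize(r Readiness, ar AcceptableRange) bool {
	unregisteredNodes := len(r.Unregistered) + len(r.LongUnregistered)
	return len(r.Registered) > ar.CurrentTarget ||
		len(r.Registered)+unregisteredNodes < ar.CurrentTarget
}

func main() {
	// The PR's example: 1 registered node, 1 short-unregistered node,
	// target size 2 - no longer reported as incorrectly sized.
	r := Readiness{Registered: []string{"n1"}, Unregistered: []string{"n2"}}
	fmt.Println(hasIncorrectSize(r, AcceptableRange{CurrentTarget: 2})) // false
}
```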

@MaciekPytel (Contributor)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BigDarkClown, MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2023
@k8s-ci-robot k8s-ci-robot merged commit 973f9fd into kubernetes:master Jun 29, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development


4 participants