CA scale-up delays on clusters with heavy scaling activity #5769

Closed as not planned
Description

@benfain

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: v1.26.1, though we've seen the same behavior in versions 1.24.1 and 1.27.1

What k8s version are you using (kubectl version)?:
1.24

What environment is this in?:
AWS, using kops

What did you expect to happen?:
On k8s clusters with heavy scaling activity, we would expect CA to scale up in a timely manner to clear pending unschedulable pods.

What happened instead?:
There are times when we need CA to process 3k+ pending (unschedulable) pods, and we have seen significant processing delays, sometimes up to 15 minutes, before CA works through the list and scales up nodes. The cluster has several deployments that frequently scale up and down by hundreds of pods.

During these periods, CA metrics show significantly increased latency overall, most notably in the scale-up function, as seen below (in seconds):
[Screenshot: CA function duration metrics (seconds), showing elevated scale-up latency, 2023-05-15]

Below is a screenshot showing the delay in scale-up time. As mentioned above, you can see we peaked above 3k unschedulable pods with no corresponding scaling activity during those periods. We suspect CA is struggling to churn through the list.
[Screenshot: unschedulable pod count peaking above 3k with delayed scale-up, 2023-05-16]
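
For anyone trying to reproduce the measurement, here is a minimal sketch of how this latency could be tracked against the 60s target. It assumes the cluster_autoscaler_function_duration_seconds histogram with a function="scaleUp" label, which is what the versions we run appear to expose; metric and label names may differ between releases:

```yaml
# Sketch of a Prometheus alerting rule for the latency shown above.
# Assumes CA exposes cluster_autoscaler_function_duration_seconds with a
# function="scaleUp" label; adjust to whatever your CA version emits.
groups:
  - name: cluster-autoscaler-latency
    rules:
      - alert: ClusterAutoscalerScaleUpSlow
        # p90 duration of the scaleUp function over the last 10 minutes
        expr: |
          histogram_quantile(0.9,
            sum(rate(cluster_autoscaler_function_duration_seconds_bucket{function="scaleUp"}[10m])) by (le)
          ) > 60
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CA scaleUp p90 latency exceeds 60s"
```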

Anything else we need to know?:

We do not appear to be hitting pod- or node-level resource limits: nothing is OOMing, and pods/nodes are not approaching their limits in general. We use node selectors for pod assignment. As far as we can tell, we are also not being rate-limited on the cloud provider side; there is simply a delay before CA attempts to update the ASGs. Per the defined SLO, we expect CA to scale up within 60s, even on large clusters like ours.

We looked into running multiple replicas to help churn through the list, but by default there can only be one leader. I can't find any documentation on how well running multiple replicas in parallel works; we are under the impression it is not recommended.
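
For context, this is roughly what we experimented with. It's a minimal sketch rather than our actual manifest, assuming the standard --leader-elect flag, and only meant to illustrate that extra replicas act as standbys behind leader election rather than sharing the work:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 2                      # second pod is only a hot standby
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.1
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --leader-elect=true  # only the lease holder runs the scaling loop
```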

Alternatively, we looked into running multiple instances of CA in a single cluster, each focused on separate workloads/resources based on pod labels. I don't believe this is supported in any version of CA at this point?
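
To be concrete about what we had in mind (not something we believe CA supports today): the closest approximation we could sketch is separate CA deployments, each restricted to disjoint node groups via --nodes, with workloads steered to those groups by node selectors. Each instance would still see every unschedulable pod, so this scopes by node group rather than by pod label. The ASG names and sizes below are made up for illustration; these are just the container command fragments for each deployment:

```yaml
# Hypothetical sketch: CA instance A only manages the "batch" ASG.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=0:200:batch-workers-asg
---
# Hypothetical sketch: CA instance B only manages the "web" ASG.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=0:100:web-workers-asg
```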

Labels

area/cluster-autoscaler
area/core-autoscaler (denotes an issue that is related to the core autoscaler and is not specific to any provider)
kind/bug (categorizes issue or PR as related to a bug)
lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)
