CA scale-up delays on clusters with heavy scaling activity #5769

Closed as not planned
Description

@benfain

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: v1.26.1, though we've seen the same behavior in versions 1.24.1 and 1.27.1

What k8s version are you using (kubectl version)?:
1.24

What environment is this in?:
AWS, using kops

What did you expect to happen?:
On k8s clusters with heavy scaling activity, we would expect CA to scale up in a timely manner to clear pending unschedulable pods.

What happened instead?:
There are times when we need CA to process 3k+ pending (unschedulable) pods, and we have seen significant processing delays, sometimes up to 15 minutes, before CA works through the list and scales up nodes. The cluster has several deployments that frequently scale up and down by hundreds of pods.

During these periods, CA metrics show significantly increased latency overall, most notably in the scale-up function, as seen below (in seconds):
[Screenshot: CA function duration metrics (seconds), showing elevated scale-up latency, 2023-05-15]

Below is a screenshot showing the delay in scale-up time. As mentioned above, you can see we peaked above 3k unschedulable pods with no corresponding scaling activity during those periods. We suspect CA is struggling to churn through the list.
[Screenshot: unschedulable pod count peaking above 3k with delayed scale-up, 2023-05-16]
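
For anyone trying to reproduce the measurement, here is a minimal sketch of how this latency could be tracked against the 60s target. It assumes the cluster_autoscaler_function_duration_seconds histogram with a function="scaleUp" label, which is what the versions we run appear to expose; metric and label names may differ between releases:

```yaml
# Sketch of a Prometheus alerting rule for the latency shown above.
# Assumes CA exposes cluster_autoscaler_function_duration_seconds with a
# function="scaleUp" label; adjust to whatever your CA version emits.
groups:
  - name: cluster-autoscaler-latency
    rules:
      - alert: ClusterAutoscalerScaleUpSlow
        # p90 duration of the scaleUp function over the last 10 minutes
        expr: |
          histogram_quantile(0.9,
            sum(rate(cluster_autoscaler_function_duration_seconds_bucket{function="scaleUp"}[10m])) by (le)
          ) > 60
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CA scaleUp p90 latency exceeds 60s"
```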

Anything else we need to know?:

We do not appear to be hitting pod- or node-level resource limits: nothing is OOMing, and pods/nodes are not approaching their limits in general. We use node selectors for pod assignment. As far as we can tell, we are also not being rate-limited on the cloud provider side; there is simply a delay before CA attempts to update the ASGs. Per the defined SLO, we expect CA to scale up within 60s, even on large clusters like ours.

We looked into running multiple replicas to help churn through the list, but by default there can only be one leader. I can't find any documentation on how well running multiple replicas in parallel works; we are under the impression it is not recommended.
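
For context, this is roughly what we experimented with. It's a minimal sketch rather than our actual manifest, assuming the standard --leader-elect flag, and only meant to illustrate that extra replicas act as standbys behind leader election rather than sharing the work:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 2                      # second pod is only a hot standby
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.1
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --leader-elect=true  # only the lease holder runs the scaling loop
```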

Alternatively, we looked into running multiple instances of CA in a single cluster, each focused on separate workloads/resources based on pod labels. I don't believe this is supported in any version of CA at this point?
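
To be concrete about what we had in mind (not something we believe CA supports today): the closest approximation we could sketch is separate CA deployments, each restricted to disjoint node groups via --nodes, with workloads steered to those groups by node selectors. Each instance would still see every unschedulable pod, so this scopes by node group rather than by pod label. The ASG names and sizes below are made up for illustration; these are just the container command fragments for each deployment:

```yaml
# Hypothetical sketch: CA instance A only manages the "batch" ASG.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=0:200:batch-workers-asg
---
# Hypothetical sketch: CA instance B only manages the "web" ASG.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=0:100:web-workers-asg
```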

Labels

area/cluster-autoscaler
area/core-autoscaler (denotes an issue that is related to the core autoscaler and is not specific to any provider)
kind/bug (categorizes issue or PR as related to a bug)
lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)
