feat: scale disruption cost by the node utilization #2028
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: cnmcavoy
cc: @rschalo
@rschalo has been thinking on this problem -- might be worth having a discussion over Slack or in a GH issue to discuss how y'all are both thinking about this
Would love to chat via Slack @cnmcavoy
0374e0d to 9c0d57a
Pull Request Test Coverage Report for Build 13597579341
💛 - Coveralls
Signed-off-by: Cameron McAvoy <[email protected]>
9c0d57a to 6a8062d
Fixes #N/A
Description
Scales the disruption cost of nodes by their utilization of pod resources. A node with 1 pod and 99% utilization should have a higher disruption cost than a node with 1 pod and 10% utilization.
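
To make the idea concrete, here is a minimal sketch of utilization-scaled disruption cost. It is not the code from this PR and does not use Karpenter's types: Resources, Pod, Node, reschedulingCost, utilization, and scaledDisruptionCost are all hypothetical names, the per-pod EvictionCost stands in for whatever the real rescheduling cost computes, and taking the larger of the CPU and memory request fractions as "utilization" is an assumption made for illustration.

```go
package main

// Resources is a hypothetical, simplified resource vector (not a Karpenter type).
type Resources struct {
	MilliCPU    int64
	MemoryBytes int64
}

// Pod is a hypothetical stand-in for a scheduled pod.
type Pod struct {
	Requests Resources
	// EvictionCost approximates a per-pod rescheduling cost;
	// 1.0 is assumed as the baseline for a default-priority pod.
	EvictionCost float64
}

// Node is a hypothetical stand-in for a consolidation candidate.
type Node struct {
	Allocatable Resources
	Pods        []Pod
}

// reschedulingCost sums the per-pod costs, so it grows roughly with pod count.
func reschedulingCost(pods []Pod) float64 {
	total := 0.0
	for _, p := range pods {
		total += p.EvictionCost
	}
	return total
}

// fraction returns used/capacity clamped to [0, 1], or 0 when capacity is not positive.
func fraction(used, capacity int64) float64 {
	if capacity <= 0 {
		return 0
	}
	f := float64(used) / float64(capacity)
	if f < 0 {
		return 0
	}
	if f > 1 {
		return 1
	}
	return f
}

// utilization returns the larger of the requested CPU and memory fractions.
// Using the maximum of the two dimensions is an assumption of this sketch.
func utilization(n Node) float64 {
	var cpu, mem int64
	for _, p := range n.Pods {
		cpu += p.Requests.MilliCPU
		mem += p.Requests.MemoryBytes
	}
	c := fraction(cpu, n.Allocatable.MilliCPU)
	m := fraction(mem, n.Allocatable.MemoryBytes)
	if c > m {
		return c
	}
	return m
}

// scaledDisruptionCost multiplies the pod-count-like cost by node utilization,
// so a nearly full node is more expensive to disrupt than a nearly empty one.
func scaledDisruptionCost(n Node) float64 {
	return reschedulingCost(n.Pods) * utilization(n)
}
```

Under these assumptions, a node with one pod requesting 990m of 1000m CPU has an unscaled cost of 1 and a scaled cost of about 0.99, while a node with five pods each requesting 10m has an unscaled cost of 5 and a scaled cost of about 0.25. The unscaled ordering favors disrupting the nearly full single-pod node; the scaled ordering favors the mostly empty one.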
How was this change tested?
We've been looking at improving utilization of resources in our clusters, and noticed that Karpenter tends to prefer consolidating underutilized nodes with fewer pods rather than nodes with the most wasted resources. It seems like this can happen because disruptionutils.ReschedulingCost(ctx, pods) produces a value that is roughly equivalent to the pod count (scaled by pod priority class). So nodes with fewer pods and higher utilization will be candidates for consolidation, while underutilized nodes with many smaller pods are not.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.