Patches based implementation for DRA snapshot. #8090
Conversation
Skipping CI for Draft Pull Request.
/assign towca
merged[key] = value
}

for key := range patch.Deleted {
Is this meant to address map access race conditions between:
- enumerate through the set of patches and begin copying into a new, merged map
- one or more patches in the set begin a delete operation
func (p *patch[K, V]) Delete(key K) {
p.Deleted[key] = true
delete(p.Modified, key)
}
- ensure that our source patch wasn't copied from a state in between the 1st of the above 2 statements by double-checking the same key against any corresponding existence in the Deleted map?
(If so have we considered the tradeoffs of inverting the order of operations in the Delete() method?)
I would say that Snapshot is not thread-safe in general, but that's a fair call - I'll change the order of operations in the Delete() method so that it becomes consistent with the other data-manipulation functions. If we ever need it to be truly suitable for concurrent usage (and judging by the simulations code, we don't), we'd probably need to use sync.Map to actually store the data; in the current implementation we use bare maps purely for the sake of the performance gain.
The reason why we account for Deleted keys here is to handle the following situation:
Prerequisite: the type in the example is PatchSet[int, int]
Patch#1: Modified: {1: 1, 2: 2, 3: 3}, Deleted: {}
Patch#2: Modified: {}, Deleted: {1, 2}
Patch#3: Modified: {1: 5}, Deleted: {}
The result of the AsMap() call on the PatchSet holding these 3 patches should be {1: 5, 3: 3}, because keys 1 and 2 are deleted in the second patch, but key 1 is reintroduced in Patch#3.
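As a rough Go sketch of that scenario (the newPatchSet helper is hypothetical and the field types are assumed from the snippets above, not the exact API in this PR):

ps := newPatchSet(
	&patch[int, int]{Modified: map[int]int{1: 1, 2: 2, 3: 3}, Deleted: map[int]bool{}},
	&patch[int, int]{Modified: map[int]int{}, Deleted: map[int]bool{1: true, 2: true}},
	&patch[int, int]{Modified: map[int]int{1: 5}, Deleted: map[int]bool{}},
)
// Key 2 stays deleted; key 1 is reintroduced by the last patch.
// Expected: map[int]int{1: 5, 3: 3}
result := ps.AsMap()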
If I misunderstood your comment - LMK
Yeah, we're aligned, do we have a UT case for that example scenario?
I'm not sure I get the difference between the order of operations in Delete(), we're not planning for any concurrency here for now, right?
(I do get the need for iterating over Deleted and deleting from merged here, but not sure how it relates to Jack's comment 😅)
The reason to invert the order in Delete() (actually delete first, take an accounting of that deletion second) is so that we don't have to double-check that the data is really deleted before composing the merge from the data.
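Concretely, the inverted version would look something like this sketch (same fields as the snippet above):

func (p *patch[K, V]) Delete(key K) {
	// Drop any pending modification first, then record the deletion.
	delete(p.Modified, key)
	p.Deleted[key] = true
}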
I agree with @towca's other comment that this is a great addition, and was really fun to review. I did have to try really hard to find something to criticize! 😀😂
At least at this point in time I don't see Snapshot being used concurrently anywhere - and if it ever is, I think the DRA Snapshot would be the only blocker there, so let's keep it as is at least for now, WDYT?
I agree with @towca's other comment that this is a great addition, and was really fun to review. I did have to try really hard to find something to criticize! 😀😂
I can't remember the last time I got to implement something this technical and fun, so it's coming from both sides :P
At least at this point in time I don't see Snapshot being used concurrently anywhere - and if it ever is, I think the DRA Snapshot would be the only blocker there, so let's keep it as is at least for now, WDYT?
Yup, the current version looks good to me! I was just curious about the comment.
Reviewed everything but the tests (I'll do that in a second pass). In general the PR looks very good, the only bigger comments are about the caching strategy.
Two general comments:
- The patchSet is a very elegant solution, I'm a big fan! (also yay generics, this would be a nightmare otherwise..)
- (for future reference) Breaking the PR down into smaller, meaningful commits would have seriously helped the review here. For example a structure like: change Snapshot to be a pointer (this just pollutes the diff) -> introduce patch&patchset -> add caching to patchset -> use patchset in Snapshot -> use the new dra.Snapshot in ClusterSnapshot implementations -> add benchmark.
@@ -22,18 +22,16 @@ import (
 	resourceapi "k8s.io/api/resource/v1beta1"
 )

-type snapshotClassLister Snapshot
+type snapshotClassLister struct {
Why change this (here and for the other subtypes)? IMO the previous pattern was more readable - no need for redundant methods on Snapshot.
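For context, the two patterns being compared look roughly like this (the wrapper's field is a guess for illustration; the full definition isn't shown in this excerpt):

// Previous pattern: a named type over Snapshot, so the lister methods
// convert the receiver and work on Snapshot's data directly.
type snapshotClassLister Snapshot

// New pattern: a thin wrapper struct that forwards to methods on Snapshot.
type snapshotClassLister struct {
	snapshot *Snapshot
}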
I don't have a specific preference tbh, but the first-iteration implementation was much harder in terms of interaction with patchSet, and I wanted to keep all interaction with it inside the Snapshot itself. While this introduces some duplication, it lets me think of these wrapper objects as just wrappers, without any logic apart from snapshot function calls.
If you want, I can change it back - right now it doesn't make much of a difference from my point of view.
I don't have a specific preference tbh, but first iteration implementation was way harder in terms of interaction with patchSet
Interesting, I'm really curious why?
If you want - I may change it back, right now it doesn't make a lot of difference from my point of view
I think right now it only introduces some code duplication for no clear reason. I'd prefer to revert to the previous approach unless it's less extensible for the future or something (I still don't understand the limitations here).
Interesting, I'm really curious why?
As I've mentioned previously, the first iteration was based on each operation having a counterpart that cancels its effect - essentially, if you have an Add1 function on the API surface, you should have Remove1 available at least internally. We dropped this idea because of the unoptimized behavior of the Revert operation, which was truly O(N) and much worse than what's implemented right now.
The rationale behind bringing all the logic into the Snapshot was hiding the complexity behind human-readable functions, as each operation essentially looked like this:
// Illustrative pseudocode from the first iteration (type names approximate).
func DoSomething() error {
	return applyPatch(operation{
		Apply:  func() { doSomethingImpl() },
		Revert: func() { undoSomethingImpl() },
	})
}
I wanted to hide this complexity within the Snapshot itself and not bring it into the wrapper objects, so that they would simply call internal handlers made available specifically for them - that was the idea.
@mtrqq Thanks for addressing my comments! I see the following remaining items:
We're planning to cut 1.33 CA on 2025-06-11 and this is the most important change we want to get in. Points 3. and 4. are test-specific and IMO we can address them in a follow-up PR after the release. Points 1. and 2. should be fairly quick to address so that we can merge this PR before the cut, right?
// Set marks a key-value pair as modified in the current patch.
// If the key was previously marked as deleted, the deletion mark is removed.
func (p *patch[K, V]) Set(key K, value V) {
Why the public methods on the private type here? I don't see us using this as the fulfillment of an interface. Do we want Set|Delete|Get|IsDeleted instead of set|delete|get|isDeleted? (same comment for private patchSet type)
Right now it doesn't serve any interface fulfilment; instead I'm using this notation to separate the public and private API surfaces of the structs. Lower-case methods signal that they are not supposed to be used to interact with the object by an external entity, while Set|Delete|Get|IsDeleted more explicitly tell that they are meant to be used. WDYT?
I don't want to slow down progress, but in my view, until "external entity" actually means an external entity, it's preferable to keep the methods private to express the fact that they are currently implementation details underneath the publicly facing DRA business logic.
If we anticipate reusability we can put it into a common utils directory and import it?
cc @towca for thoughts as well
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: mtrqq. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@mtrqq: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Instead of exposing a DeepCloning API, dynamicresources.Snapshot now exposes Fork/Commit/Revert methods mimicking the operations in ClusterSnapshot. Instead of storing full-scale copies of the DRA snapshot, we now store a single object with a list of patches inside, which allows very efficient Clone/Fork/Revert operations while sacrificing some performance and memory allocation on fetch requests and in-place object modifications (ResourceClaims).
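Conceptually, the patch-layering idea works roughly like the sketch below (simplified; the Modified/Deleted fields follow the patch type discussed in the review above, but the exact types and method set in the PR differ):

type patch[K comparable, V any] struct {
	Modified map[K]V
	Deleted  map[K]bool
}

type patchSet[K comparable, V any] struct {
	patches []*patch[K, V] // the last patch is the writable "current" layer
}

// Fork pushes a new empty patch; later writes only touch that layer.
func (ps *patchSet[K, V]) Fork() {
	ps.patches = append(ps.patches, &patch[K, V]{Modified: map[K]V{}, Deleted: map[K]bool{}})
}

// Revert drops the topmost patch, cheaply undoing everything since the last Fork.
func (ps *patchSet[K, V]) Revert() {
	if len(ps.patches) > 0 {
		ps.patches = ps.patches[:len(ps.patches)-1]
	}
}

// Commit folds the topmost patch into the layer below it.
func (ps *patchSet[K, V]) Commit() {
	if len(ps.patches) < 2 {
		return
	}
	top := ps.patches[len(ps.patches)-1]
	below := ps.patches[len(ps.patches)-2]
	for k, v := range top.Modified {
		below.Modified[k] = v
		delete(below.Deleted, k)
	}
	for k := range top.Deleted {
		delete(below.Modified, k)
		below.Deleted[k] = true
	}
	ps.patches = ps.patches[:len(ps.patches)-1]
}

Because Fork/Commit/Revert only push, pop, or fold a single patch, they stay cheap regardless of how many objects the snapshot tracks; the cost shifts to reads, which have to consult the patch stack.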
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR improves the performance of the DRA snapshot, which directly impacts scheduling-simulation speed and the cluster-autoscaler decision-making process overall. In this PR we change its state-management approach from a deep-copy-based one to a patch-based one.
Which issue(s) this PR fixes:
Fixes #7681
Special notes for your reviewer:
This PR removes the original implementation of dynamicresources.Snapshot and replaces it with the patch-based approach. We could keep the original implementation for the sake of safety and the ability to switch the store implementation in a running cluster autoscaler, but that would require maintaining two implementations. I attempted to use the clone-based Snapshot as a baseline for the new changes, but it only resulted in complex code while yielding minimal benefits.
In the change you may find a benchmark test which uses an exaggerated scheduling scenario to compare the performance of the two implementations. What I've found is that the patch-based option is roughly 50x faster in terms of overall runtime while allocating 40x less memory on the heap, primarily because Fork/Commit/Revert operations are used A LOT in the suite.
Here's a few profiling insights in differences:
CPU Profile / Forking
CPU Profile / GC
Memory Profile / Allocated Space
Memory Profile / Allocated Objects
Grab a copy of profiling samples -> Profiles.zip
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: