Introduce payload size metrics #6745

Merged
tubignat merged 3 commits into cadence-workflow:master from payload-size-metrics
Mar 25, 2025

Conversation

tubignat
Contributor

What changed?

Added payload size metrics per operation

Why?

To track DB response sizes per operation

@@ -67,6 +68,209 @@ func (r GetAllHistoryTreeBranchesResponse) Len() int {
return len(r.Branches)
}

// For responses that require metrics for payload size EstimatePayloadSizeInBytes() int should be defined.

func (r *GetReplicationTasksResponse) EstimatePayloadSizeInBytes() int {
Member

I think @timl3136 is introducing a Size interface for this that is probably worth adopting

}

total := int(unsafe.Sizeof(*r)) + len(r.NextPageToken)
for _, v := range r.HistoryEventBlobs {
Member

fyi: this will be big
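
For context, a minimal sketch of the estimation pattern in this hunk: the fixed struct footprint via unsafe.Sizeof, plus the length of every variable-sized field. The DataBlob and ReadHistoryResponse types below are hypothetical stand-ins, not the real persistence types, and the result is an estimate rather than an exact wire size.

```go
package main

import "unsafe"

// DataBlob is a hypothetical stand-in for a persistence blob type.
type DataBlob struct {
	Encoding string
	Data     []byte
}

// ReadHistoryResponse is a hypothetical response shaped like the one in the hunk.
type ReadHistoryResponse struct {
	HistoryEventBlobs []*DataBlob
	NextPageToken     []byte
}

// EstimatePayloadSizeInBytes returns a rough size: the fixed footprint of the
// struct plus the bytes held by its variable-length fields.
func (r *ReadHistoryResponse) EstimatePayloadSizeInBytes() int {
	if r == nil {
		return 0
	}
	total := int(unsafe.Sizeof(*r)) + len(r.NextPageToken)
	for _, b := range r.HistoryEventBlobs {
		if b == nil {
			continue // skip nil entries instead of panicking
		}
		total += int(unsafe.Sizeof(*b)) + len(b.Encoding) + len(b.Data)
	}
	return total
}
```

Because history blobs dominate the total, the loop is where most of the reported size comes from.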

}

binariesSize := 0
for key, value := range info.BadBinaries.Binaries {
Member

FYI: I don't think anything domain related will be very big, but the BadBinaries can be huge for the executions table (for concrete / Type 1 executions).
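
As an illustration of the map walk in this hunk, here is a hedged sketch that sums key and value sizes while tolerating nil containers and nil entries. BadBinaries and BadBinaryInfo are simplified, hypothetical shapes (the real types may differ, for example in whether BadBinaries is a pointer), and counting 8 bytes for the timestamp is an assumption.

```go
package main

// BadBinaryInfo is a simplified, hypothetical value type.
type BadBinaryInfo struct {
	Reason          string
	Operator        string
	CreatedTimeNano int64
}

// BadBinaries is a simplified, hypothetical container keyed by binary checksum.
type BadBinaries struct {
	Binaries map[string]*BadBinaryInfo
}

// estimateBadBinariesSize sums the key lengths and the variable-sized fields
// of each entry. Ranging over a nil map is safe in Go, so only the pointer
// dereferences need explicit nil checks.
func estimateBadBinariesSize(b *BadBinaries) int {
	if b == nil {
		return 0
	}
	total := 0
	for key, value := range b.Binaries {
		total += len(key)
		if value == nil {
			continue
		}
		// 8 bytes accounts for the int64 timestamp.
		total += len(value.Reason) + len(value.Operator) + 8
	}
	return total
}
```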

"time"
)

func TestGetReplicationTaskResponseEstimatePayloadSize(t *testing.T) {
Member

So, this is pretty hard to test for because a lot of the structs can get pretty deep, and it's pretty easy to miss a nil check.

I think there are two tools maybe worth taking a look at:

  1. There's a nil-checker lint (@Groxx is probably the better person to speak to that), which might be helpful for this kind of code to avoid surprise panics in prod.
  2. Generating data (including nils) can be done with a fuzzer library we have (in common/testing/testdatagen; it's a Google lib, but we added support for a few types that rely on enums being valid). It creates fuzzed, filled-out structs (with optional nils) to ensure that the payload estimation can be called safely. I'd really suggest using it to generate some test data like this:
assert.NotPanics(t, func() {
	for i := 0; i < 100; i++ {
		fuzzer := testdatagen.NewWithNilChance(t, seed, 25)

		execution := &persistence.WorkflowMutableState{}
		fuzzer.Fuzz(&execution)

		_ = execution.GetPayloadSize()
	}
})
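
The same idea can be sketched with only the standard library, which may help when reasoning about what such a test exercises. This is not the testdatagen API: the types, randomState, and neverPanics below are all hypothetical, and math/rand stands in for the fuzzer. Each pointer is left nil with a configurable probability, and the estimator is called under recover.

```go
package main

import "math/rand"

// Blob, ExecutionInfo and MutableState are hypothetical, simplified shapes.
type Blob struct{ Data []byte }

type ExecutionInfo struct {
	WorkflowID string
	Memo       map[string][]byte
}

type MutableState struct {
	ExecutionInfo  *ExecutionInfo
	BufferedEvents []*Blob
}

// EstimatePayloadSizeInBytes checks every pointer before dereferencing it.
func (m *MutableState) EstimatePayloadSizeInBytes() int {
	if m == nil {
		return 0
	}
	total := 0
	if m.ExecutionInfo != nil {
		total += len(m.ExecutionInfo.WorkflowID)
		for k, v := range m.ExecutionInfo.Memo {
			total += len(k) + len(v)
		}
	}
	for _, b := range m.BufferedEvents {
		if b != nil {
			total += len(b.Data)
		}
	}
	return total
}

// randomState builds a state in which each pointer is nil with probability nilChance.
func randomState(rng *rand.Rand, nilChance float64) *MutableState {
	if rng.Float64() < nilChance {
		return nil
	}
	m := &MutableState{}
	if rng.Float64() >= nilChance {
		m.ExecutionInfo = &ExecutionInfo{WorkflowID: "wf"}
	}
	for i, n := 0, rng.Intn(5); i < n; i++ {
		if rng.Float64() < nilChance {
			m.BufferedEvents = append(m.BufferedEvents, nil)
			continue
		}
		m.BufferedEvents = append(m.BufferedEvents, &Blob{Data: make([]byte, rng.Intn(64))})
	}
	return m
}

// neverPanics reports whether the estimator survives the given number of random inputs.
func neverPanics(seed int64, iterations int, nilChance float64) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < iterations; i++ {
		_ = randomState(rng, nilChance).EstimatePayloadSizeInBytes()
	}
	return true
}
```

A fixed seed keeps failures reproducible; removing any one nil check in the estimator should make neverPanics return false for a high enough iteration count.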

@@ -244,6 +268,44 @@ var emptyCountedMethods = map[string]struct {
},
}

var payloadSizeEmittingMethods = map[string]struct {
Member

Remark / feel free to ignore, I'm just wondering aloud:

I kind of wish we didn't need to maintain such a map. I feel like there's an opportunity for the List/Get/Start wrapper structs to adhere to an interface like MetricName() (Scope int, MetricName string), so that they describe their scope themselves rather than being listed in one big centralized map.
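
A rough sketch of what that could look like, purely as an illustration of the remark and not something this PR implements. The scope constants, the metric name string, and the describe helper are hypothetical; only the MetricName() method shape comes from the comment.

```go
package main

// Hypothetical scope identifiers; real scopes would come from the metrics package.
const (
	scopeGetReplicationTasks = iota + 1
	scopeReadHistoryBranch
)

// metricDescriber lets a response describe its own scope and metric name.
type metricDescriber interface {
	MetricName() (scope int, metricName string)
}

// GetReplicationTasksResponse is a hypothetical response that opts in.
type GetReplicationTasksResponse struct{}

func (GetReplicationTasksResponse) MetricName() (int, string) {
	return scopeGetReplicationTasks, "payload_size"
}

// untrackedResponse does not implement the interface and is skipped.
type untrackedResponse struct{}

// describe returns the self-reported scope and metric name, or false when the
// response has not opted in. No central map is consulted.
func describe(resp interface{}) (int, string, bool) {
	d, ok := resp.(metricDescriber)
	if !ok {
		return 0, "", false
	}
	scope, name := d.MetricName()
	return scope, name, true
}
```

The trade-off is that the list of instrumented methods is no longer visible in one place.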

@@ -67,6 +68,209 @@ func (r GetAllHistoryTreeBranchesResponse) Len() int {
return len(r.Branches)
}

// For responses that require metrics for payload size EstimatePayloadSizeInBytes() int should be defined.

func (r *GetReplicationTasksResponse) EstimatePayloadSizeInBytes() int {
Member

Let's rename these functions to Size() uint64 and implement it for each response type and its underlying types. This would be consistent with the new size-based cache interface.
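
To make the suggestion concrete, a small sketch under stated assumptions: the sizer interface name, the simplified types, and recordPayloadSize are hypothetical, and only the Size() uint64 signature comes from the comment. Each type reports its own size, the response sums its parts, and the metrics wrapper discovers support through a type assertion.

```go
package main

// sizer is a hypothetical name for the shared size interface.
type sizer interface {
	Size() uint64
}

// DataBlob is a simplified, hypothetical underlying type.
type DataBlob struct{ Data []byte }

// Size is safe to call on a nil *DataBlob.
func (b *DataBlob) Size() uint64 {
	if b == nil {
		return 0
	}
	return uint64(len(b.Data))
}

// GetReplicationTasksResponse is a simplified, hypothetical response type.
type GetReplicationTasksResponse struct {
	Tasks         []*DataBlob
	NextPageToken []byte
}

// Size delegates to the underlying types instead of inspecting their fields.
func (r *GetReplicationTasksResponse) Size() uint64 {
	if r == nil {
		return 0
	}
	total := uint64(len(r.NextPageToken))
	for _, t := range r.Tasks {
		total += t.Size() // nil-safe by construction
	}
	return total
}

// recordPayloadSize passes the size to record when resp implements sizer and
// reports whether anything was recorded.
func recordPayloadSize(resp interface{}, record func(uint64)) bool {
	s, ok := resp.(sizer)
	if !ok {
		return false
	}
	record(s.Size())
	return true
}
```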

@tubignat tubignat force-pushed the payload-size-metrics branch from 69d4126 to 1380306 Compare March 24, 2025 17:14
Member

@davidporter-id-au left a comment

Ty for the update. I talked to @timl3136; I think he's also changing the interface to be ByteSize. I have no real opinion, but that sounds good.

@tubignat tubignat merged commit b227fdf into cadence-workflow:master Mar 25, 2025
22 checks passed
3 participants