Skip to content

Fix bugs in silences #3892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

grobinson-grafana
Copy link
Collaborator

This commit fixes a number of bugs that can occur when creating and updating silences:

  1. When creating a silence fails because the silence is invalid or would exceed limits, subsequent calls for the same silence fail even when the validation or limits are fixed. This happens because the silence is assigned an ID that does not exist. This commit fixes that issue as it assigns a new ID to silences that do not exist.

  2. When updating an existing silence, and the update changes how alerts are matched or the time window in which the silence is active (excluding time extensions), then the existing silence must be expired and a new silence created. However, if the new silence is invalid or would exceed limits, the existing silence is expired, but the new silence is not created. This commit fixes that issue such that existing silences are not expired unless the replacing silence is both valid and within limits.

Fixes #3877

@@ -518,14 +518,10 @@ func matchesEmpty(m *pb.Matcher) bool {
}

func validateSilence(s *pb.Silence) error {
if s.Id == "" {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed checks for the Id and UpdatedAt fields from validateSilence. There are two reasons for this:

  1. I have moved validation of silences to the start of the function before the Id and UpdatedAt fields are assigned. This is to fix the issue where an existing silence is expired but the new silence is not created.
  2. These fields are supposed to be set in silences.Set(*pb.Silence), not by the calling code. We assert that these are present within tests, not validation code that covers both internal and external fields.

s := *sil
return &s
// Expire the silence with the given ID immediately.
func (s *Silences) Expire(id string) error {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes here, just been moved up in the file.

msil, ok := s.st[id]
if !ok {
return nil, false
// Set inserts a new silence or updates an existing silence. It inserts a
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes to existing behavior here, this is just documentation.

// time extensions), the existing silence is expired and replaced with a
// new silence with a different ID.
//
// When updating an existing silence, the existing silence must be cloned
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again no changes to existing behavior here, this is just documentation. The requirement that silences are cloned before being updated was always there (see silence_test.go for examples of this). What I have done here is documented the fact and added an explicit Clone() function to deep copy a silence.

// When updating an existing silence, the existing silence must be cloned
// using Clone() before making any modifications to the silence. This
// prevents situations where the stored silence is modified because both
// pointers reference the same memory. If Clone() is not used before
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clone() is required otherwise functions like canUpdate break. Again, no changes here, this is just documentation.

sil3 := cloneSilence(sil2)
sil3.EndsAt = start3.Add(100 * time.Minute)

// Can update silence without modifications.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an extra test case here.

require.NoError(t, s.Set(sil3))
require.Equal(t, sil2.Id, sil3.Id)

// Can update silence with new comment.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another test case here.

require.NoError(t, s.Set(sil4))
require.Equal(t, sil3.Id, sil4.Id)

// Extend sil4 to expire at a later time. This should not expire the
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are existing test cases, but the variable names have been updated since I added two new cases above.

@@ -479,23 +502,18 @@ func TestSilenceLimits(t *testing.T) {
StartsAt: time.Now(),
EndsAt: time.Now().Add(5 * time.Minute),
}
require.EqualError(t, s.Set(sil2), "exceeded maximum number of silences: 1 (limit: 1)")
require.Equal(t, "", sil2.Id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check the id here, removed.

require.NotEqual(t, "", sil3.Id)
require.EqualError(t, s.Set(sil3), fmt.Sprintf("exceeds limits: maximum size: %d bytes (limit: 4096 bytes)", s.toMeshSilence(sil3).Size()))

// Should be able to insert sil4 and then update it without modifications.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional test cases to show that the fixes work.

This commit fixes a number of bugs that can occur when creating
and updating silences:

1. When creating a silence fails because the silence is invalid
   or would exceed limits, subsequent calls for the same silence
   fail even when the validation or limits are fixed. This happens
   because the silence is assigned an ID that does not exist. This
   commit fixes that issue as it assigns a new ID to silences that
   do not exist.

2. When updating an existing silence, and the update changes
   how alerts are matched or the time window in which the silence
   is active (excluding time extensions), then the existing silence
   must be expired and a new silence created. However, if the new
   silence is invalid or would exceed limits, the existing silence
   is expired, but the new silence is not created. This commit fixes
   that issue such that existing silences are not expired unless the
   replacing silence is both valid and within limits.

Signed-off-by: George Robinson <[email protected]>
@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-silences-set branch from 026afc4 to 1e18844 Compare June 22, 2024 14:52
This commit updates the TestPostSilencesHandler acceptance tests
to expect 200 when making a POST request for a silence that does
not exist. In this case, a new silence is created with a different
ID.

Signed-off-by: George Robinson <[email protected]>
@@ -240,11 +240,13 @@ func TestPostSilencesHandler(t *testing.T) {
expectedCode int
}{
{
"with an non-existent silence ID - it returns 404",
// This can happen when updating an expired silence that has been GC'd.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are new semantics for silences, so please review with care. If considered problematic then I will reconsider how silences.Set works for missing IDs.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-silences-set branch from 51305d9 to 9f4a9a7 Compare June 22, 2024 17:04
@grobinson-grafana grobinson-grafana marked this pull request as draft June 24, 2024 12:40
@grobinson-grafana grobinson-grafana changed the title Issue #3878: Fix bugs in silences Fix bugs in silences Jun 24, 2024
@grobinson-grafana grobinson-grafana deleted the grobinson/fix-silences-set branch June 25, 2024 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant