
Memory leak when repeating silences #2659

Open
@ngc104

Description


What did you do?

I'm using Kthxbye. When an alert fires and I add a silence with Kthxbye, the memory usage of Alertmanager increases.

You can reproduce this without Kthxbye:

1/ Generate an alert (or use any alert already sent by Prometheus), for example PrometheusNotIngestingSamples.

2/ With Alertmanager, generate silences like this:

while true; do
  date # useless; just for tracing...
  amtool --alertmanager.url http://localhost:9093 silence add alertname=PrometheusNotIngestingSamples -a "MemoryLeakLover" -c "Test memory leak in Alertmanager" -d "1m"
  sleep 50
done

Note: the behaviour of Kthxbye is similar, but its default interval is 15 minutes instead of 1 minute. Reproducing with amtool alone shows that Kthxbye itself has nothing to do with this bug.
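
In case it helps reproduce this from scratch, here is a rough sketch of how the growth can be watched while the loop above runs. It assumes your amtool version has the alert add subcommand (so no real Prometheus alert is needed) and that Alertmanager exposes the standard Go process metrics on /metrics; the alert labels are just examples.

# Optional: create a synthetic alert instead of waiting for Prometheus
# (assumes your amtool version has the "alert add" subcommand).
amtool --alertmanager.url http://localhost:9093 alert add \
  alertname=PrometheusNotIngestingSamples severity=warning

# Watch Alertmanager's own memory and silence-related metrics while the
# silence loop runs (metric names may vary slightly between versions).
while true; do
  date
  curl -s http://localhost:9093/metrics \
    | grep -E '^(process_resident_memory_bytes|go_memstats_heap_inuse_bytes|alertmanager_silences)'
  sleep 300
done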

What did you expect to see?

Nothing interesting (no abnormal memory increase)

What did you see instead? Under which circumstances?

Follow the metric container_memory_working_set_bytes for Alertmanager. After a few hours you can see it slowly but steadily grow.

Here is a screenshot of the above test, covering a little more than 12 hours: the test started at 12:20 and finished at 9:00 the next day.

[screenshot: container_memory_working_set_bytes for Alertmanager, growing steadily over the test period]
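
For anyone wanting to reproduce the graph, a query along these lines against the Prometheus that scrapes cAdvisor should show the same trend (the URL and label selectors below are placeholders for my setup):

# Query the working set of the Alertmanager container from Prometheus
# (URL and label selectors are placeholders; adjust for your cluster).
curl -s 'http://prometheus.example:9090/api/v1/query' \
  --data-urlencode 'query=container_memory_working_set_bytes{container="alertmanager"}'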

My Alertmanager is running with the default --data.retention=120h. I guessed that after 5 days the memory usage would stop increasing. Wrong guess: it only stops increasing when the container is OOM-killed.

[screenshot: Alertmanager memory usage over several days; it only drops when the pod restarts]
The above graph was made with Kthxbye running. The pod restarts after an OOM (left side) or after a kubectl delete pod (right side).
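
To see what is actually accumulating, counting the silences that Alertmanager still keeps after they expire is a quick check. The sketch below assumes the --expired flag of amtool silence query and the v2 API; jq is only used for convenience.

# Expired silences still known to Alertmanager (output includes a header line).
amtool --alertmanager.url http://localhost:9093 silence query --expired | wc -l

# Same check through the HTTP API, counting silences by state.
curl -s http://localhost:9093/api/v2/silences \
  | jq '[.[] | select(.status.state == "expired")] | length'

If that count keeps growing well past the 120h retention, the expired silences themselves are a good suspect.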

Environment

/alertmanager $ alertmanager --version
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  build user:       root@dee35927357f
  build date:       20200617-08:54:02
  go version:       go1.14.4
• Alertmanager configuration file:
/alertmanager $ cat /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
receivers:
- name: rocketchat
  webhook_configs:
  - send_resolved: true
    url: https://xxxx.rocketchat.xxxx/hooks/xxxxxx/xxxxxxxxx
route:
  group_by:
  - xxxxxxx
  - yyyyyyy
  - alertname
  group_interval: 5m
  group_wait: 30s
  receiver: rocketchat
  repeat_interval: 5m
  routes:
  - continue: true
    receiver: rocketchat
templates:
- /etc/alertmanager/*.tmpl

  • Logs:
➜ k -n monitoring logs caascad-alertmanager-0 
level=info ts=2021-07-30T09:09:46.139Z caller=main.go:216 msg="Starting Alertmanager" version="(version=0.21.0, branch=HEAD, revision=4c6c03ebfe21009c546e4d1e9b92c371d67c021d)"
level=info ts=2021-07-30T09:09:46.139Z caller=main.go:217 build_context="(go=go1.14.4, user=root@dee35927357f, date=20200617-08:54:02)"
level=info ts=2021-07-30T09:09:46.171Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/alertmanager.yml
level=info ts=2021-07-30T09:09:46.171Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/alertmanager.yml
level=info ts=2021-07-30T09:09:46.174Z caller=main.go:485 msg=Listening address=:9093
level=warn ts=2021-07-30T12:29:49.530Z caller=notify.go:674 component=dispatcher receiver=rocketchat integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://xxxx.rocketchat.xxx/hooks/xxxxxx/xxxxxxxxx\": dial tcp x.x.x.x: connect: connection refused"
level=info ts=2021-07-30T12:32:17.213Z caller=notify.go:685 component=dispatcher receiver=rocketchat integration=webhook[0] msg="Notify success" attempts=13
