Drop some of the metrics exposed by prometheus-adapter #1409

dgrisonnet · 2021-09-29T11:13:47Z

Description

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue.

The current implementation of prometheus-adapter exposes a lot of
metrics about the health of its aggregated apiserver. The issue is that
some of these metrics are not very useful in the context of
prometheus-adapter, and we currently can't avoid exposing them since
they are registered to the Kubernetes global Prometheus registry. Until
this is improved in upstream Kubernetes, we could benefit from dropping
the metrics that are not very useful.

Before this change, a default kube-prometheus installation had 800+
series for prometheus-adapter, against 400+ after it, so we roughly
halved the number of series while focusing on the most valuable metrics
for prometheus-adapter.

Type of change

What type of changes does your code introduce to kube-prometheus? Put an x in the boxes that apply.

CHANGE (fix or feature that would cause existing functionality to not work as expected)
FEATURE (non-breaking change which adds functionality)
BUGFIX (non-breaking change which fixes an issue)
ENHANCEMENT (non-breaking change which improves existing functionality)
NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Changelog entry

Please put a one-line changelog entry below. Later this will be copied to the changelog file.

Drop some of the prometheus-adapter metrics that are inherited from the apiserver code but aren't useful in the context of prometheus-adapter.

Signed-off-by: Damien Grisonnet <[email protected]>

dgrisonnet · 2021-09-29T11:14:55Z

@fpetkovski @prashbnair can you please have a look at these changes and let me know if they make sense? Also, if you have any other ideas for metrics that we could drop, let me know.

fpetkovski · 2021-09-29T11:30:44Z

I am not familiar with which metrics are exposed by prometheus-adapter, but would it make sense to make an allowlist instead of a denylist?

dgrisonnet · 2021-09-29T11:58:16Z

I don't think so, since it would mean that we would have to remember to update the allowlist whenever we add new health metrics to prometheus-adapter.
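To illustrate the trade-off, here is a minimal Python sketch that is not taken from this pull request. The regex patterns and the `new_adapter_health_metric` name are hypothetical, chosen only for the example, and the helper functions are made up for illustration. It mimics the fully anchored matching that Prometheus relabeling applies to metric names.

```python
import re

# Hypothetical patterns, chosen only for illustration; they are not the
# lists used in this pull request.
DENYLIST = [r"apiserver_storage_.*", r"workqueue_.*"]
ALLOWLIST = [r"apiserver_request_.*", r"authentication_.*"]


def _matches(name, patterns):
    # Prometheus relabeling anchors regexes at both ends, hence fullmatch.
    return any(re.fullmatch(pattern, name) for pattern in patterns)


def keep_with_denylist(names, denylist=DENYLIST):
    # Drop only the names that match a denylist pattern; keep everything else.
    return [name for name in names if not _matches(name, denylist)]


def keep_with_allowlist(names, allowlist=ALLOWLIST):
    # Keep only the names that match an allowlist pattern; drop everything else.
    return [name for name in names if _matches(name, allowlist)]


scraped = [
    "apiserver_request_total",
    "apiserver_storage_objects",
    "workqueue_depth",
    "authentication_attempts",
    "new_adapter_health_metric",  # hypothetical metric added in a later release
]

print(keep_with_denylist(scraped))
print(keep_with_allowlist(scraped))
```

With the denylist, the hypothetical new metric is kept without any configuration change; with the allowlist, it is silently dropped until someone adds a matching pattern.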

fpetkovski · 2021-09-29T12:23:43Z

Sounds good, then this lgtm

prashbnair · 2021-09-29T14:01:45Z

Do we need the ones related to authentication?

dgrisonnet · 2021-09-29T14:28:59Z

I never used them personally, but I thought that they might be useful if somehow there is an issue with the authentication since the authentication process of aggregated APIs is quite complex and these are the only metrics that we have to investigate. As far as I can tell, we don't have any intel from the apiserver itself since the requests are proxied. That's why I would be reluctant to remove them even though they are responsible for a big part of the series.

Maybe @s-urbaniak can chime in here since he has knowledge on both prometheus-adapter and kubernetes authentication.

Essentially the metrics that I think are worth keeping are:

But I am in no way an expert on that topic, so I don't really know if these metrics really make sense for an aggregated API.

prashbnair · 2021-09-29T15:26:29Z

lgtm
Let's merge this, and we can remove the others if needed.

s-urbaniak · 2021-09-30T12:21:49Z

/lgtm

dgrisonnet · 2021-09-30T15:45:12Z

Thanks everyone for the reviews.

dgrisonnet merged commit 374413f into prometheus-operator:main Sep 30, 2021

dgrisonnet deleted the drop-pa-metrics branch September 30, 2021 15:45

ArthurSens mentioned this pull request Oct 4, 2021

Add grafana ldap support #1145

Closed
