Drop some of the metrics exposed by prometheus-adapter #1409
The current implementation of prometheus-adapter exposes a lot of metrics about the health of its aggregated apiserver. The issue is that some of these metrics are not very useful in the context of prometheus-adapter, and we currently can't avoid exposing them since they are registered to the Kubernetes global Prometheus registry. Until this is improved in upstream Kubernetes, we could benefit from dropping some of the metrics that are not very useful.

Before this change, a default kube-prometheus installation had 800+ series for prometheus-adapter; after it, there are 400+. This roughly halves the number of series while focusing on the most valuable metrics for prometheus-adapter.

Signed-off-by: Damien Grisonnet <[email protected]>
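To illustrate the idea of dropping series by metric name, here is a minimal Python sketch of how a denylist of name patterns filters scraped series. It mimics the fully anchored regex matching that Prometheus relabeling applies. The patterns and metric names below are illustrative assumptions, not the list chosen in this pull request.

```python
import re

# Illustrative denylist only; these patterns are assumptions made for the
# example and are not taken from the actual change.
DROP_PATTERNS = [r"apiserver_storage_.*", r"workqueue_.*"]


def keep_series(metric_name, drop_patterns=DROP_PATTERNS):
    """Return False when the metric name fully matches any drop pattern.

    re.fullmatch is used because Prometheus relabeling regexes are
    anchored at both ends.
    """
    return not any(re.fullmatch(p, metric_name) for p in drop_patterns)


def filter_series(metric_names, drop_patterns=DROP_PATTERNS):
    """Keep only the metric names that no drop pattern matches."""
    return [name for name in metric_names if keep_series(name, drop_patterns)]


if __name__ == "__main__":
    scraped = [
        "apiserver_storage_objects",
        "workqueue_depth",
        "apiserver_request_total",
    ]
    print(filter_series(scraped))
```

Because matching is anchored, a name such as `my_workqueue_depth` would not be dropped by the `workqueue_.*` pattern in this sketch.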
@fpetkovski @prashbnair can you please have a look at these changes and let me know if they make sense? Also, if you have any other ideas for metrics that we could drop, let me know.
I am not familiar with which metrics are exposed by prometheus-adapter, but would it make sense to make an allowlist instead of a denylist?
I don't think so, since it would mean that we would have to remember to update the allowlist whenever we add new health metrics to prometheus-adapter.
Sounds good, then this lgtm
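The allowlist versus denylist trade-off discussed above can be sketched in a few lines of Python. This is only an illustration under assumed names: the patterns and the new metric name are hypothetical and do not come from prometheus-adapter.

```python
import re


def denylist_keeps(name, deny_patterns):
    """A denylist keeps every metric except those matching a pattern."""
    return not any(re.fullmatch(p, name) for p in deny_patterns)


def allowlist_keeps(name, allow_patterns):
    """An allowlist keeps only the metrics matching a pattern."""
    return any(re.fullmatch(p, name) for p in allow_patterns)


# Hypothetical patterns, chosen only for the example.
deny_patterns = [r"workqueue_.*"]
allow_patterns = [r"apiserver_request_.*"]

# Hypothetical health metric added in a later release.
new_metric = "adapter_new_health_metric"

if __name__ == "__main__":
    # With a denylist, the new metric is kept without any config change.
    print(denylist_keeps(new_metric, deny_patterns))
    # With an allowlist, the new metric is dropped until the list is updated.
    print(allowlist_keeps(new_metric, allow_patterns))
```

With a denylist, a newly added metric is exposed by default; with an allowlist, it stays hidden until someone remembers to add it, which is the maintenance cost raised in the reply.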
Do we need the ones related to authentication?
I have never used them personally, but I thought that they might be useful if there is an issue with authentication, since the authentication process of aggregated APIs is quite complex and these are the only metrics that we have to investigate it. As far as I can tell, we don't get any information from the apiserver itself since the requests are proxied. That's why I would be reluctant to remove them, even though they are responsible for a big part of the series. Maybe @s-urbaniak can chime in here since he has knowledge of both prometheus-adapter and Kubernetes authentication. Essentially the metrics that I think are worth keeping are:
But I am in no way an expert on that topic, so I don't really know if these metrics make sense for an aggregated API.
lgtm
/lgtm
Thanks everyone for the reviews.
Type of change
What type of changes does your code introduce to kube-prometheus? Put an x in the box that applies.
CHANGE (fix or feature that would cause existing functionality to not work as expected)
FEATURE (non-breaking change which adds functionality)
BUGFIX (non-breaking change which fixes an issue)
ENHANCEMENT (non-breaking change which improves existing functionality)
NONE (if none of the other choices apply. Example: tooling, build system, CI, docs, etc.)
Changelog entry
Please put a one-line changelog entry below. Later this will be copied to the changelog file.