[BUG] Envoy Gateway Metrics Reported As Custom Metrics #37505

Open
crhuber opened this issue May 30, 2025 · 4 comments

Comments

@crhuber

crhuber commented May 30, 2025

We have installed envoy-gateway 1.4, which of course runs Envoy. Since Datadog has the Envoy integration, we would expect these metrics to be ingested as part of the Envoy integration.
Instead they are being ingested as custom metrics.

Agent Environment
gcr.io/datadoghq/agent:7.65.1

Describe what happened:
I tried forcing the deployment to use the Envoy check by adding this annotation, but it didn't seem to work:

            ad.datadoghq.com/envoy.checks: |
              {
                "envoy": {
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:19001/stats/prometheus",
                      "namespace": "envoygateway"
                    }
                  ]
                }
              }

Instead the metrics show up like this in metrics explorer

[Screenshot: the metrics as shown in Metrics Explorer]

Describe what you expected:
Envoy gateway metrics ingested as envoy integration metrics

Additional environment details (Operating System, Cloud provider, etc):
Kubernetes GKE

@steveny91
Contributor

steveny91 commented May 30, 2025

@crhuber Thanks for reaching out, and sorry you're having this issue. Is there a chance you can omit the namespace config here? The namespace config renames all the metrics and makes them appear under a different namespace. I think that has some ramifications for standard metric recognition (we should probably try to block this, actually).

If you would like to distinguish between the data sets, then I think tags are the suggested approach.
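For illustration, the same annotation with the namespace key dropped and an instance-level tag added instead might look something like this (the tag key/value here is just a placeholder):

            ad.datadoghq.com/envoy.checks: |
              {
                "envoy": {
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:19001/stats/prometheus",
                      "tags": ["gateway:external"]
                    }
                  ]
                }
              }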

Actually, the metrics here might be collected by something else, because there isn't a namespace on the screenshotted metric at all. If you disable the envoy check here, does the metric go away? Specifically for this host, for envoy_cluster_upstream_rq.count? I'm wondering if the metric is being collected from somewhere else, like an openmetrics check with a wildcard match for metrics or something.

@crhuber
Author

crhuber commented Jun 2, 2025

@steveny91 I did remove the namespace but it didn't make a difference. I did see this in the Datadog agent logs though, which indicates it's trying to scrape Envoy on the wrong port (8877) and the wrong path (/metrics):

2025-06-02 12:44:18 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:143 in LogMessage) | envoy:f573b3ff744d74d | (base.py:74) | There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))
2025-06-02 12:44:18 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:71 in Error) | check:envoy | Error running check: [{"message":"There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))","traceback":"Traceback (most recent call last):\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/base.py", line 1332, in run\n self.check(instance)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/envoy/check.py", line 177, in check\n super(EnvoyCheckV2, self).check(None)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/v2/base.py", line 75, in check\n raise type(e)("There was an error scraping endpoint {}: {}".format(endpoint, e)) from None\nrequests.exceptions.ConnectTimeout: There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))\n"}]

@steveny91
Contributor

Hmm, on the agent pod/container, can you run agent configcheck and provide the output? There might be some other instance that is causing this issue. Since it's scraping a different port, it might be picking up a config from somewhere else.
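If it's easier, from outside the agent container that would be something along the lines of this (namespace, pod, and container names here are placeholders; the main container in the Helm chart is usually named agent):

    kubectl exec -n <datadog-namespace> <datadog-agent-pod> -c agent -- agent configcheck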

@crhuber
Author

crhuber commented Jun 3, 2025

@steveny91
agent configcheck returns overlapping checks for both envoy and openmetrics

=== envoy check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
Config for instance ID: envoy:9630324ddc476956
openmetrics_endpoint: http://10.11.13.36:19001/stats/prometheus
tags:
  - image_name:docker.io/envoyproxy/envoy
  - image_tag:distroless-v1.34.1
  - kube_app_component:proxy
  - kube_app_managed_by:envoy-gateway
  - kube_app_name:envoy
  - kube_container_name:envoy
  - kube_deployment:envoy-envoy-gateway-system-external-7d05cf23
  - kube_namespace:envoy-gateway-system
  - kube_ownerref_kind:replicaset
  - kube_qos:Burstable
  - kube_replica_set:envoy-envoy-gateway-system-external-7d05cf23-594888b6bb
  - kube_service:envoy-envoy-gateway-system-external-7d05cf23
  - pod_phase:running
  - short_image:envoy
~
Init Config:
{}
Auto-discovery IDs:
* containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
===


=== openmetrics check ===
Configuration provider: prometheus-pods
Configuration source: prometheus_pods:containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
Config for instance ID: openmetrics:4952e36e5b45d05
histogram_buckets_as_distributions: true
ignore_tags:
  - image_id
  - image_name
  - kube_qos
  - kube_ownerref_kind
max_returned_metrics: 15000
metrics:
  - .*
namespace: ""
openmetrics_endpoint: http://10.11.13.36:19001/stats/prometheus
tag_by_endpoint: false
tags:
  - image_name:docker.io/envoyproxy/envoy
  - image_tag:distroless-v1.34.1
  - kube_app_component:proxy
  - kube_app_managed_by:envoy-gateway
  - kube_app_name:envoy
  - kube_container_name:envoy
  - kube_deployment:envoy-envoy-gateway-system-external-7d05cf23
  - kube_namespace:envoy-gateway-system
  - kube_ownerref_kind:replicaset
  - kube_qos:Burstable
  - kube_replica_set:envoy-envoy-gateway-system-external-7d05cf23-594888b6bb
  - kube_service:envoy-envoy-gateway-system-external-7d05cf23
  - pod_phase:running
  - short_image:envoy
~
Init Config:
{}
Auto-discovery IDs:
* containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
===
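So the second check comes from the prometheus-pods provider with a metrics: .* wildcard, which I assume is what is sending everything in as custom metrics. If I understand it correctly, that provider reacts to prometheus.io/* pod annotations, which on this pod would presumably be something like the following (values inferred from the endpoint above, not copied from the actual manifest):

    prometheus.io/scrape: "true"
    prometheus.io/port: "19001"
    prometheus.io/path: "/stats/prometheus"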
