[BUG] Envoy Gateway Metrics Reported As Custom Metrics #37505

Open
crhuber opened this issue May 30, 2025 · 4 comments

Comments

@crhuber

crhuber commented May 30, 2025

We have installed envoy-gateway 1.4, which of course runs Envoy. Since Datadog has the Envoy integration, we would expect these metrics to be ingested as part of the Envoy integration.
Instead they are being ingested as custom metrics.

Agent Environment
gcr.io/datadoghq/agent:7.65.1

Describe what happened:
I tried forcing the deployment to use the Envoy check by adding this annotation, but it didn't seem to work:

            ad.datadoghq.com/envoy.checks: |
              {
                "envoy": {
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:19001/stats/prometheus",
                      "namespace": "envoygateway"
                    }
                  ]
                }
              }

Instead the metrics show up like this in metrics explorer

[Screenshot: the metrics as shown in Metrics Explorer]

Describe what you expected:
Envoy gateway metrics ingested as envoy integration metrics

Additional environment details (Operating System, Cloud provider, etc):
Kubernetes GKE

@steveny91
Contributor

steveny91 commented May 30, 2025

@crhuber Thanks for reaching out, and sorry you're having this issue. Is there a chance you can omit the namespace config here? The namespace config renames all the metrics and makes them appear under a different namespace. I think that has some ramifications for standard metric recognition (we should probably try to block this, actually).

If you would like to distinguish between the data sets, then I think tags are the suggested approach.
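For illustration, the same annotation with the namespace key dropped and an instance-level tag added instead might look something like this (the tag key/value here is just a placeholder):

            ad.datadoghq.com/envoy.checks: |
              {
                "envoy": {
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:19001/stats/prometheus",
                      "tags": ["gateway:external"]
                    }
                  ]
                }
              }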

Actually, the metrics here might be collected by something else, because there isn't a namespace on the screenshotted metric at all. If you disable the envoy check here, does the metric go away? Specifically for this host, for envoy_cluster_upstream_rq.count? I'm wondering if the metric is being collected from somewhere else, like an openmetrics check with a wildcard match for metrics or something.

@crhuber
Author

crhuber commented Jun 2, 2025

@steveny91 I did remove the namespace but it didn't make a difference. I did see this in the Datadog agent logs though, which indicates it's trying to scrape Envoy on the wrong port (8877) and the wrong path (/metrics):

2025-06-02 12:44:18 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:143 in LogMessage) | envoy:f573b3ff744d74d | (base.py:74) | There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))
2025-06-02 12:44:18 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:71 in Error) | check:envoy | Error running check: [{"message":"There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))","traceback":"Traceback (most recent call last):\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/base.py", line 1332, in run\n self.check(instance)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/envoy/check.py", line 177, in check\n super(EnvoyCheckV2, self).check(None)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/v2/base.py", line 75, in check\n raise type(e)("There was an error scraping endpoint {}: {}".format(endpoint, e)) from None\nrequests.exceptions.ConnectTimeout: There was an error scraping endpoint http://10.11.13.34:8877/metrics: HTTPConnectionPool(host='10.11.13.34', port=8877): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7951b1d6da60>, 'Connection to 10.11.13.34 timed out. (connect timeout=10.0)'))\n"}]

@steveny91
Contributor

Hmm, on the agent pod/container, can you run agent configcheck and provide the output? There might be some other instance that is causing this issue. Since it's scraping a different port, it might be picking up a config from somewhere else.
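If it's easier, from outside the agent container that would be something along the lines of this (namespace, pod, and container names here are placeholders; the main container in the Helm chart is usually named agent):

    kubectl exec -n <datadog-namespace> <datadog-agent-pod> -c agent -- agent configcheck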

@crhuber
Author

crhuber commented Jun 3, 2025

@steveny91
agent configcheck returns overlapping checks for both envoy and openmetrics

=== envoy check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
Config for instance ID: envoy:9630324ddc476956
openmetrics_endpoint: http://10.11.13.36:19001/stats/prometheus
tags:
  - image_name:docker.io/envoyproxy/envoy
  - image_tag:distroless-v1.34.1
  - kube_app_component:proxy
  - kube_app_managed_by:envoy-gateway
  - kube_app_name:envoy
  - kube_container_name:envoy
  - kube_deployment:envoy-envoy-gateway-system-external-7d05cf23
  - kube_namespace:envoy-gateway-system
  - kube_ownerref_kind:replicaset
  - kube_qos:Burstable
  - kube_replica_set:envoy-envoy-gateway-system-external-7d05cf23-594888b6bb
  - kube_service:envoy-envoy-gateway-system-external-7d05cf23
  - pod_phase:running
  - short_image:envoy
~
Init Config:
{}
Auto-discovery IDs:
* containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
===


=== openmetrics check ===
Configuration provider: prometheus-pods
Configuration source: prometheus_pods:containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
Config for instance ID: openmetrics:4952e36e5b45d05
histogram_buckets_as_distributions: true
ignore_tags:
  - image_id
  - image_name
  - kube_qos
  - kube_ownerref_kind
max_returned_metrics: 15000
metrics:
  - .*
namespace: ""
openmetrics_endpoint: http://10.11.13.36:19001/stats/prometheus
tag_by_endpoint: false
tags:
  - image_name:docker.io/envoyproxy/envoy
  - image_tag:distroless-v1.34.1
  - kube_app_component:proxy
  - kube_app_managed_by:envoy-gateway
  - kube_app_name:envoy
  - kube_container_name:envoy
  - kube_deployment:envoy-envoy-gateway-system-external-7d05cf23
  - kube_namespace:envoy-gateway-system
  - kube_ownerref_kind:replicaset
  - kube_qos:Burstable
  - kube_replica_set:envoy-envoy-gateway-system-external-7d05cf23-594888b6bb
  - kube_service:envoy-envoy-gateway-system-external-7d05cf23
  - pod_phase:running
  - short_image:envoy
~
Init Config:
{}
Auto-discovery IDs:
* containerd://f67430240e42d09c3f4dedd6fd9f7e4e48da4007a70b5a6b82abedb9ff134840
===
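So the second check comes from the prometheus-pods provider with a metrics: .* wildcard, which I assume is what is sending everything in as custom metrics. If I understand it correctly, that provider reacts to prometheus.io/* pod annotations, which on this pod would presumably be something like the following (values inferred from the endpoint above, not copied from the actual manifest):

    prometheus.io/scrape: "true"
    prometheus.io/port: "19001"
    prometheus.io/path: "/stats/prometheus"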
