
Replace telegraf prometheus plugin #1709


Merged


@dricross (Contributor) commented May 30, 2025

Description of the issue

The CloudWatch Agent does not support pushing Prometheus exponential histograms to CloudWatch Logs via EMF; the agent silently drops them. This is because the Telegraf-based Prometheus receiver doesn't support exponential histograms. The OpenTelemetry-based Prometheus receiver does, so we are migrating all uses of the Prometheus receiver to the OpenTelemetry plugin.
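For readers unfamiliar with the format: in the OpenTelemetry metrics data model, an exponential histogram stores counts in buckets whose boundaries grow geometrically, with base = 2^(2^-scale), and bucket index i covers the range (base^i, base^(i+1)]. The sketch below is for illustration only and is not the agent's code; `bucketIndex` and `bucketBounds` are hypothetical helpers that map a positive value to its bucket using plain floating-point math.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the index i of the exponential-histogram bucket that
// holds a positive value v at the given scale. Per the OpenTelemetry data
// model, base = 2^(2^-scale) and bucket i covers (base^i, base^(i+1)].
func bucketIndex(scale int, v float64) int {
	// log_base(v) = log2(v) / log2(base) = log2(v) * 2^scale
	logBase := math.Log2(v) * math.Exp2(float64(scale))
	// Subtract one because buckets are upper-inclusive.
	return int(math.Ceil(logBase)) - 1
}

// bucketBounds returns the (lower, upper] boundaries of bucket i at a scale.
func bucketBounds(scale, i int) (float64, float64) {
	base := math.Exp2(math.Exp2(float64(-scale)))
	return math.Pow(base, float64(i)), math.Pow(base, float64(i+1))
}

func main() {
	for _, v := range []float64{3, 4, 5} {
		i := bucketIndex(0, v)
		lo, hi := bucketBounds(0, i)
		fmt.Printf("scale 0: value %v -> bucket %d (%v, %v]\n", v, i, lo, hi)
	}
}
```

The naive logarithm can misplace values that sit exactly on a bucket boundary at non-zero scales; production implementations use more careful mappings.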

Description of changes

Staging some changes in a non-personal feature branch for replacing the Telegraf-based Prometheus receiver with the OpenTelemetry-based Prometheus receiver.

  • Create the Prometheus adapter processor
  • [Translator] Update the Prometheus -> EMF pipeline to use the OpenTelemetry plugin plus the new Prometheus adapter processor. Remove the TOML translation for the Prometheus receiver

The Prometheus adapter processor will translate the metrics emitted by the OpenTelemetry Prometheus receiver so that they match what the Telegraf Prometheus receiver would have output, preserving backwards compatibility.
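The PR does not spell out which translations the adapter performs. As a rough, hypothetical illustration of the adapter idea only, the sketch below uses a simplified `metric` struct instead of the collector's `pmetric.Metrics` and applies one made-up rule: copying each metric's type into a `metric_type` attribute. The struct, the attribute key, and the rule are all assumptions.

```go
package main

import "fmt"

// metric is a simplified, hypothetical stand-in for an OpenTelemetry metric.
// The real processor operates on pmetric.Metrics from the collector's pdata module.
type metric struct {
	Name       string
	Type       string // source metric type, e.g. "gauge" or "counter"
	Attributes map[string]string
}

// adaptMetrics returns copies of the input metrics reshaped toward what an
// older pipeline expects. The single rule here (adding a hypothetical
// "metric_type" attribute) is illustrative, not the PR's actual translation.
func adaptMetrics(in []metric) []metric {
	out := make([]metric, 0, len(in))
	for _, m := range in {
		// Copy attributes so the caller's maps are not mutated.
		attrs := make(map[string]string, len(m.Attributes)+1)
		for k, v := range m.Attributes {
			attrs[k] = v
		}
		attrs["metric_type"] = m.Type
		out = append(out, metric{Name: m.Name, Type: m.Type, Attributes: attrs})
	}
	return out
}

func main() {
	in := []metric{{Name: "prometheus_test_counter", Type: "counter"}}
	for _, m := range adaptMetrics(in) {
		fmt.Println(m.Name, m.Attributes)
	}
}
```

Returning new values rather than editing in place keeps the sketch easy to test; a real collector processor typically mutates the pdata structures it receives.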

This change "breaks" the current Prometheus -> EMF pipeline in the sense that the metrics do not show up as they used to, so these changes are not yet ready to merge to main.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Ran a Prometheus server to vend Prometheus metrics, then configured the agent to collect them and push them to CloudWatch Logs using the configuration below. The metrics show up in CloudWatch.

{
  "agent": {
    "metrics_collection_interval": 10,
    "run_as_user": "root",
    "debug": true,
    "logfile": ""
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "/home/ec2-user/prometheus/prometheus.yaml",
        "log_group_name": "prometheusadapter",
        "emf_processor": {
          "metric_namespace": "prometheusadapter",
          "metric_declaration": [
            {
              "source_labels": [
                "include"
              ],
              "label_matcher": "^yes$",
              "dimensions": [
                [
                  "prom_type"
                ],
                [
                  "prom_type",
                  "quantile"
                ]
              ],
              "metric_selectors": [
                "^prometheus_test*"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
}

Requirements

Before committing the code, please do the following steps:

  1. Run make fmt and make fmt-sh
  2. Run make lint

@dricross requested a review from a team as a code owner May 30, 2025 14:58
@dricross changed the title from "Replace telegrah prometheus plugin" to "Replace telegraf prometheus plugin" May 30, 2025
Comment on lines 15 to 16
// Validate does not check for unsupported dimension key-value pairs, because those
// get silently dropped and ignored during translation.
Contributor

nit: I'm guessing this was copied from somewhere. Consider removing it for now.

Contributor Author

yeah, copied from somewhere else. I'll remove the comments

)

type prometheusAdapaterProcessor struct {
*Config
Contributor

nit: Is there a reason we want to embed the Config?

Contributor Author

No particular reason. Following convention from other processors (kueueattributes, gpuattributes, ec2tagger)
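On the embedding question: embedding `*Config` in the processor struct promotes `Config`'s fields and methods, so `p.Verbose` is shorthand for `p.Config.Verbose`, whereas a named field requires the explicit path. A small illustration using a hypothetical `Verbose` setting (not a field from this PR):

```go
package main

import "fmt"

// Config holds processor settings; Verbose is a hypothetical field.
type Config struct {
	Verbose bool
}

// embeddedProcessor embeds *Config, so Config's fields and methods are
// promoted: e.Verbose is shorthand for e.Config.Verbose.
type embeddedProcessor struct {
	*Config
}

// namedProcessor stores the config in a named field, so access is explicit.
type namedProcessor struct {
	cfg *Config
}

func main() {
	cfg := &Config{Verbose: true}
	e := embeddedProcessor{Config: cfg}
	n := namedProcessor{cfg: cfg}
	// All three expressions read the same underlying field.
	fmt.Println(e.Verbose, e.Config.Verbose, n.cfg.Verbose)
}
```

One trade-off: any methods on `Config` (for example a `Validate` method) also become part of the processor's method set when embedded. In both forms, reading a field through a nil `*Config` panics.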

"go.uber.org/zap"
)

type prometheusAdapaterProcessor struct {
Contributor

nit: typo

Suggested change
- type prometheusAdapaterProcessor struct {
+ type prometheusAdapterProcessor struct {

"go.uber.org/zap"
)

func TestProcessMetricsForKueueMetrics(t *testing.T) {
Contributor

nit: Looks like this was copied from the kueueattributes processor?

Contributor Author

yeah it was. I'll fix the name

- batch/prometheus/cloudwatchlogs
receivers:
- telegraf_prometheus
- prometheus
Contributor

We'll want to add a name to this receiver (e.g. prometheus/cloudwatchlogs) so we can differentiate it from the one configured in the metrics section. These can and should be distinct prometheus receivers unless we find a way to dedup them.

Contributor Author

gotcha, thanks. will update
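The reviewer's suggestion relies on the collector's component naming convention, where a key has the form `type[/name]`: `prometheus` and `prometheus/cloudwatchlogs` share a receiver type but are two separate instances. A simplified, stdlib-only sketch of that split; `parseComponentID` is hypothetical, and the collector's real parser also validates which characters each part may contain.

```go
package main

import (
	"fmt"
	"strings"
)

// componentID mirrors the collector's "type[/name]" convention for component keys.
type componentID struct {
	Type string
	Name string
}

// parseComponentID splits a key such as "prometheus/cloudwatchlogs" at the
// first slash. Keys without a slash get an empty name.
func parseComponentID(key string) componentID {
	typ, name, _ := strings.Cut(key, "/")
	return componentID{Type: typ, Name: name}
}

func main() {
	seen := map[componentID]bool{}
	for _, key := range []string{"prometheus", "prometheus/cloudwatchlogs"} {
		id := parseComponentID(key)
		seen[id] = true
		fmt.Printf("%-28s type=%q name=%q\n", key, id.Type, id.Name)
	}
	// Same type, different names: two independent receiver instances.
	fmt.Println("distinct instances:", len(seen))
}
```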

@dricross merged commit ae49b00 into replace-telegrapf-prom-plugin Jun 9, 2025
@dricross deleted the dricross/replace-telegraf-prom-plugin branch June 9, 2025 21:02
dricross added a commit that referenced this pull request Jun 11, 2025
2 participants