[BUG]: Updating to 2.14.0 led to high memory usage and termination of karafka worker #4626

Open
@wahlg

Tracer Version(s)

2.14.0

Ruby Version(s)

ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]

Relevant Library and Version(s)

No response

Bug Report

We have a Rails app that uses the karafka gem for kafka message consumption.

We have been using karafka's built-in Datadog tracing, set up as follows:

# Initialize the listener 
dd_logger_listener = Karafka::Instrumentation::Vendors::Datadog::LoggerListener.new do |config| 
  config.client = Datadog::Tracing 
end
Karafka.monitor.subscribe(dd_logger_listener)

We recently upgraded the datadog gem to 2.14.0. We did NOT enable the karafka instrumentation that this version provides (i.e. we did NOT add c.tracing.instrument :karafka to our datadog initializer).

However, after deploying the app, the container running karafka showed very high memory usage and was ultimately killed repeatedly for running out of memory. We also saw traces recorded by the worker that lasted upwards of 90 minutes.

Once we disabled the karafka tracing listener shown above, memory usage returned to normal. However, we now have no traces, and we did not expect this datadog upgrade to be a breaking change for our application.

Is there a way we can disable any instrumentation that the datadog gem is providing, and just use the tracing packaged with the karafka gem?

Thanks

Reproduction Code

Add this to karafka.rb, without enabling any karafka tracing in Datadog.configure:

# Initialize the listener 
dd_logger_listener = Karafka::Instrumentation::Vendors::Datadog::LoggerListener.new do |config| 
  config.client = Datadog::Tracing 
end
Karafka.monitor.subscribe(dd_logger_listener)
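
To make the memory growth easier to observe while reproducing, something like the following could be logged periodically from the karafka process. This is only a sketch: `memory_snapshot` is a hypothetical helper, not part of karafka or the datadog gem, and it assumes either a Linux `/proc` filesystem or a `ps` binary that supports `-o rss=`.

```ruby
# Hypothetical helper (not part of karafka or datadog): returns the resident
# set size of the current process plus a few GC counters.
def memory_snapshot
  rss_kb =
    if File.readable?('/proc/self/status')
      # Linux: VmRSS is reported in kB.
      File.read('/proc/self/status')[/^VmRSS:\s+(\d+)/, 1].to_i
    else
      # macOS and other Unix-likes: ps reports RSS in kilobytes.
      `ps -o rss= -p #{Process.pid}`.to_i
    end

  gc = GC.stat
  {
    rss_kb: rss_kb,
    heap_live_slots: gc[:heap_live_slots],
    total_allocated_objects: gc[:total_allocated_objects]
  }
end

# Example usage (the 30 second interval and Rails.logger are assumptions):
# Thread.new do
#   loop do
#     Rails.logger.info("memory_snapshot #{memory_snapshot}")
#     sleep 30
#   end
# end
```

Comparing snapshots with the listener subscribed and with it removed should show whether the growth is in Ruby heap objects (`heap_live_slots`) or only in overall process memory (`rss_kb`).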

Configuration Block

Datadog.configure do |c|
  c.profiling.enabled = true
  c.service = 'service'
  c.version = ENV.fetch('GIT_SHA', 'none')
  c.env = Rails.env
  c.diagnostics.debug = false
  c.diagnostics.startup_logs.enabled = false
  c.telemetry.enabled = true
  c.tracing.log_injection = true
  c.tracing.enabled = true
  c.tracing.instrument :rails
  c.tracing.instrument :concurrent_ruby
  c.tracing.instrument :faraday, split_by_domain: true
  c.tracing.instrument :graphql, service_name: 'ruby-graphql'
  c.tracing.instrument :grpc, { distributed_tracing: true }
  c.tracing.instrument :pg, {
    comment_propagation: 'full',
  }
  c.tracing.instrument :rack
  c.tracing.instrument :ethon, split_by_domain: true
  c.logger.instance = Rails.logger
end

Error Logs

No response

Operating System

No response

How does Datadog help you?

No response

Labels: bug, community