[service/internal/graph] Measure telemetry as it is passed between pipeline components #12812
Conversation
Codecov Report

Attention: Patch coverage is 86.89%.

❌ Your patch check has failed because the patch coverage (86.89%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

@@           Coverage Diff            @@
##             main   #12812      +/-  ##
==========================================
- Coverage   91.65%   91.56%    -0.10%
==========================================
  Files         499      499
  Lines       27426    27801     +375
==========================================
+ Hits        25138    25456     +318
- Misses       1809     1847      +38
- Partials      479      498      +19
Some initial thoughts; overall the implementation seems very sane to me, though. Thank you!
Thanks for moving this work forward @djaglowski, just a few questions.
obsConsumer := obsconsumer.NewTraces(fanoutconsumer.NewTraces(consumers), tb.ReceiverProducedItems)
n.Component, err = builder.CreateTraces(ctx, set, obsConsumer)
Not a blocking comment:
Suggested change:
n.Component, err = builder.CreateTraces(ctx, set, obsconsumer.NewTraces(fanoutconsumer.NewTraces(consumers), tb.ReceiverProducedItems))
Thanks for the reviews. #12817 implements a subset of this PR. If we can get that merged in first, I'll rebase to reduce the scope of this one substantially.
I think this looks really good 👍🏻
As shared in the other PR, I'd really like to see bytes counters too, but this is a great start.
Subset of #12812. This internal package defines wrappers around consumers. These are useful for instrumenting the component graph, so that we can generate telemetry describing data as it is passed between components. Currently this supports only a single counter metric, but in the near future it can be enhanced to automatically capture multiple metrics (e.g. item count and size), and potentially spans and/or logs as well.
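To make the idea concrete, here is a minimal sketch of what such a counting traces wrapper could look like. This is an illustration only, not the actual obsconsumer API: the type and field names are invented, and it assumes the standard consumer.Traces interface plus an OTel metric.Int64Counter for the item count.

```go
// Illustrative sketch of a counting consumer wrapper; names are hypothetical.
package obsconsumer

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
	"go.opentelemetry.io/otel/metric"
)

// obsTraces wraps the next traces consumer and counts items as they pass through.
type obsTraces struct {
	next        consumer.Traces
	itemCounter metric.Int64Counter // e.g. the receiver.produced.items counter
}

// NewTraces returns a consumer that records the span count on itemCounter
// before forwarding the data to the next consumer.
func NewTraces(next consumer.Traces, itemCounter metric.Int64Counter) consumer.Traces {
	return &obsTraces{next: next, itemCounter: itemCounter}
}

func (c *obsTraces) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
	c.itemCounter.Add(ctx, int64(td.SpanCount()))
	return c.next.ConsumeTraces(ctx, td)
}

// Capabilities delegates to the wrapped consumer so pipeline behavior is unchanged.
func (c *obsTraces) Capabilities() consumer.Capabilities {
	return c.next.Capabilities()
}
```

Analogous wrappers for metrics, logs, and profiles would follow the same shape, differing only in the pdata type and the item-count accessor.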
I am pretty stumped on the datadog exporter integration test failure. I don't see anything explicitly indicating that it is related to this change, but it does appear to fail consistently on this PR. The panic indicates a relation to Prometheus, which made me suspect the new metric naming format (using …).

@mx-psi, is there anyone at DataDog who might be able to provide more insight into this test?
if err != nil {
	return fmt.Errorf("failed to create %q processor, in pipeline %q: %w", set.ID, n.pipelineID.String(), err)
}
n.consumer = obsconsumer.NewProfiles(n.Component.(xconsumer.Profiles), tb.ProcessorConsumedItems)
This could be done once after the switch statement.
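One way to read that suggestion, sketched here under the assumption that each case of the switch repeats the same error check: hoist the shared check out of the individual cases so it runs exactly once. The variable and builder names below mirror the snippet above, but the surrounding structure is assumed rather than taken from the actual graph code.

```go
// Hypothetical restructuring; the real switch in the graph package may differ.
var err error
switch n.pipelineID.Signal() {
case pipeline.SignalTraces:
	n.Component, err = builder.CreateTraces(ctx, set, next) // next: the wrapped downstream consumer
case pipeline.SignalMetrics:
	n.Component, err = builder.CreateMetrics(ctx, set, next)
	// ... cases for logs and profiles elided ...
}
// Shared error handling performed once, rather than repeated in every case.
if err != nil {
	return fmt.Errorf("failed to create %q processor, in pipeline %q: %w", set.ID, n.pipelineID.String(), err)
}
```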
receiver.produced.size:
  prefix: otelcol.
  enabled: false
  description: Size of items emitted from the receiver.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
processor.consumed.size:
  prefix: otelcol.
  enabled: false
  description: Size of items passed to the processor.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
processor.produced.size:
  prefix: otelcol.
  enabled: false
  description: Size of items emitted from the processor.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
connector.consumed.size:
  prefix: otelcol.
  enabled: false
  description: Size of items passed to the connector.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
connector.produced.size:
  prefix: otelcol.
  enabled: false
  description: Size of items emitted from the connector.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
exporter.consumed.size:
  prefix: otelcol.
  enabled: false
  description: Size of items passed to the exporter.
  unit: "By"
  sum:
    value_type: int
    monotonic: true
If the size metrics aren't used yet, I'd suggest removing these from the PR.
@djaglowski not sure if it's helpful, but I ran into a similar issue back in the PR that updated the collector to use the otel config package from otel-go, and @songy23 commented on the failure: #11611 (comment)
I built the collector using the branch and, when running it, this is what I see at …
Testing this further, this appears to be the current behaviour in main; it looks like something broke in the last 20 commits.
Looks like things have been unhappy since this change: dc8e2dd
The PR LGTM. However, I'd like to get agreement on #12916 before releasing this.
I just tested and saw the same; internal metrics are broken at mainline head.
Yes, reverting that commit fixes it: https://github.com/open-telemetry/opentelemetry-collector/pull/12917
Depends on #12856
Resolves #12676
This is a reboot of #11311, incorporating metrics defined in the component telemetry RFC and attributes added in #12617.
The basic pattern is: wrap the consumer(s) handed to each component in an obsconsumer wrapper, so that item counts (and, later, other measurements) are recorded as data is passed from one component to the next.
TODO:
Next Steps: extend the obsconsumer wrappers to capture size (bytes) metrics, and potentially spans and/or logs describing data as it flows between components.