Commit 3957bfb

jmacd and Bogdan Drutu authored
Remove the Metric Gauge instrument, recommend use of other instruments (open-telemetry#80)
* Remove the Metric Gauge instrument, recommend exclusive use of Observer instrument
* Typos
* More text

Co-authored-by: Bogdan Drutu <[email protected]>
1 parent fd81fda commit 3957bfb

1 file changed: +130 -0 lines changed

oteps/0080-remove-metric-gauge.md

@@ -0,0 +1,130 @@
# Remove the Metric API Gauge instrument

The [Observer instrument](./0072-metric-observer.md) is semantically
identical to the metric Gauge instrument, except that it is reported
via a callback instead of synchronous API calls. Implementation
experience has shown that Gauge instruments are difficult to reason
about because the semantics of a "last value" Aggregator have to
address questions about statefulness--the SDK's ability to recall old
values. Observer instruments avoid some of these concerns because they
are reported once per collection period, making it easier to reason
about "all values" in an aggregator.

## Motivation

Observer instruments improve on our ability to compute well-defined
sum and average-value aggregations over a set of last-value aggregated
data, compared with the existing Gauge instrument. Using data from an
Observer instrument, we can easily pose queries about the sum of all
current values as well as the number of distinct values, which
together define the average value.

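As a rough illustration of this point, the sketch below treats the output of one collection period as a mapping from label set to last value and derives the sum, count, and average from it. The `observe_queue_depths` callback, its labels, and the `summarize` helper are hypothetical names chosen for this example; they are not part of any OpenTelemetry API.

```python
from typing import Callable, Dict, Tuple

# A label set is modeled as a tuple of (key, value) pairs so it can be a dict key.
LabelSet = Tuple[Tuple[str, str], ...]


def observe_queue_depths() -> Dict[LabelSet, float]:
    """Hypothetical Observer callback: one last value per distinct label set."""
    return {
        (("queue", "a"),): 4.0,
        (("queue", "b"),): 10.0,
        (("queue", "c"),): 1.0,
    }


def summarize(callback: Callable[[], Dict[LabelSet, float]]) -> Tuple[float, int, float]:
    """Invoke the callback once, as a collection pass would, and derive
    the sum, the number of distinct values, and the average."""
    snapshot = callback()
    total = sum(snapshot.values())
    count = len(snapshot)
    average = total / count if count else 0.0
    return total, count, average


total, count, average = summarize(observe_queue_depths)
```

Because the callback returns the complete set of current values, no state from earlier periods is needed to answer any of the three queries.
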
To do the same with synchronous Gauge instruments, the SDK would
potentially be required to maintain state outside a single collection
window, which complicates memory management. Specifically, the SDK
would have to maintain state about all distinct label sets over the
query evaluation interval.

The question is: how long should the SDK remember a gauge value?
Observer instruments do not pose this complication, because
observations are synchronized with collection instead of with the
application.

Unlike Gauge instruments, Observer instruments naturally define
the current set of all values for a single collection period, making
sum and average-value aggregations possible without mention of the
query evaluation interval, and without the implied additional state
management.

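The statefulness question can be made concrete with a toy model. The sketch below assumes a hypothetical `ToySyncGauge` whose last-value aggregator either forgets everything after each collection or remembers every label set indefinitely; neither behavior is taken from a real SDK.

```python
from typing import Dict, Tuple


class ToySyncGauge:
    """Hypothetical synchronous gauge with a last-value aggregator."""

    def __init__(self) -> None:
        self.last: Dict[str, float] = {}

    def set(self, label: str, value: float) -> None:
        # The application decides when to call set(); the SDK only stores it.
        self.last[label] = value

    def collect(self, forget: bool) -> Dict[str, float]:
        snapshot = dict(self.last)
        if forget:
            # Stateless: label sets not updated next period will disappear.
            self.last.clear()
        return snapshot


def run(forget: bool) -> Tuple[float, float]:
    """Two collection periods; label "b" is only set during the first one."""
    gauge = ToySyncGauge()
    gauge.set("a", 1.0)
    gauge.set("b", 2.0)
    first = gauge.collect(forget)
    gauge.set("a", 3.0)
    second = gauge.collect(forget)
    return sum(first.values()), sum(second.values())


stateless_sums = run(forget=True)
stateful_sums = run(forget=False)
```

In the stateless run the second sum silently drops "b"; in the stateful run it includes a value for "b" that may be stale, and the map can only grow. An Observer callback sidesteps the choice by reporting exactly the values that exist at collection time.
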
## Explanation

The Gauge instrument's most significant feature is that its
measurement interval is arbitrary -- controlled by the application
through explicit, synchronous calls to `Set()`. It is used to report
a current value in a synchronous context, meaning the metric event is
associated with a label set determined by some "request".

This proposal asserts that synchronous reporting of Gauge values can
always be accomplished using one of the three other kinds of
instrument.

It was _already_ recommended in the specification that if the
instrument reports values you would naturally sum, you should have
used a Counter in the first place. These are not really "current"
values when reported; they are current contributions to the sum. We
still recommend Counters in this case.

If the gauge reports values for which you would naturally average the
last value across distinct label sets, use a Measure instrument.
Configure the instrument for last-value aggregation. Since last-value
aggregation is not the default for Measure instruments, this will be
non-standard and require extra configuration.

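A minimal sketch of this recommendation follows, assuming a hypothetical `ToyMeasure` class with an `aggregation` option. The option name, and the choice to keep every recorded value when it is not set, are assumptions made for illustration and do not describe how any real SDK is configured.

```python
from typing import Dict, List


class ToyMeasure:
    """Hypothetical Measure instrument with a configurable aggregation."""

    def __init__(self, aggregation: str = "all") -> None:
        if aggregation not in ("all", "last_value"):
            raise ValueError(f"unknown aggregation: {aggregation}")
        self.aggregation = aggregation
        self.values: Dict[str, List[float]] = {}

    def record(self, label: str, value: float) -> None:
        if self.aggregation == "last_value":
            # Keep only the most recent value per label set.
            self.values[label] = [value]
        else:
            self.values.setdefault(label, []).append(value)

    def collect(self) -> Dict[str, List[float]]:
        # Hand over this period's data and start the next period empty.
        snapshot, self.values = self.values, {}
        return snapshot


temperature = ToyMeasure(aggregation="last_value")
temperature.record("sensor-1", 20.0)
temperature.record("sensor-1", 22.0)
temperature.record("sensor-2", 18.0)

snapshot = temperature.collect()
last_values = [values[-1] for values in snapshot.values()]
average = sum(last_values) / len(last_values)
```

The average is taken over one last value per label set, which is the aggregation described above.
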
If the gauge reports values for which you would naturally sum the last
value across distinct label sets, use an Observer instrument. Each
member of the current set of entities (e.g., shards, active users,
etc.) contributes a last value that should be summed. These are
different from Counter instruments because we are not interested in a
sum across time; we are interested in a sum across distinct instances.

### Example: Reporting per-request CPU usage

Use a Counter to report a quantity that is naturally summed over time,
such as CPU usage.

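For instance, a request handler might add the CPU time each request consumed to a counter keyed by route. The `ToyCounter` class and the route names below are hypothetical, and the sketch assumes a counter that only accepts non-negative increments.

```python
from collections import defaultdict
from typing import DefaultDict


class ToyCounter:
    """Hypothetical Counter: accumulates a running sum per label set."""

    def __init__(self) -> None:
        self.sums: DefaultDict[str, float] = defaultdict(float)

    def add(self, label: str, value: float) -> None:
        if value < 0:
            raise ValueError("this sketch only accepts non-negative increments")
        self.sums[label] += value


cpu_seconds = ToyCounter()

# Each tuple stands for one finished request and the CPU seconds it used.
for route, used in [("/search", 0.25), ("/search", 0.5), ("/login", 0.125)]:
    cpu_seconds.add(route, used)
```

Each call reports a contribution to the sum rather than a current value, which is why a Counter fits here and a gauge does not.
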
### Example: Reporting per-shard memory holdings

A widely-used library maintains a number of shards, each holding a
variable amount of memory. Observe the current allocation per
shard using an Observer instrument. These can be aggregated across
hosts to compute cluster-wide memory holdings by shard, for example.

It does not make sense to compute a sum of memory holdings over
multiple periods, as these are not additive quantities. It does make
sense to sum the last value across hosts.

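The cross-host aggregation can be sketched as follows, assuming each host's Observer callback yields one value per shard for the same collection period. The host names, shard names, and byte counts are made up for illustration.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical last values observed on each host during one collection period.
host_observations: Dict[str, Dict[str, int]] = {
    "host-1": {"shard-0": 512, "shard-1": 256},
    "host-2": {"shard-0": 128, "shard-1": 1024},
}


def memory_by_shard(observations: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the last value per shard across hosts, within a single period."""
    totals: Dict[str, int] = defaultdict(int)
    for per_shard in observations.values():
        for shard, bytes_held in per_shard.items():
            totals[shard] += bytes_held
    return dict(totals)


cluster_memory = memory_by_shard(host_observations)
```

The sum runs over hosts within one period; adding the results of successive periods together would not produce a meaningful quantity.
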
### Example: Reporting a per-request finishing account balance

Consider a number that rises and falls, such as a bank account
balance, which was previously reported with `Set()` at the finish of
every transaction. Replace the Gauge with a Measure instrument and
`Record()` the last value.

Similar cases: report a CPU load, specific temperature, fan speed, or
altitude measurement associated with a request.

## Internal details

The Gauge instrument will be removed from the specification at the
same time the Observer instrument is added. This will make the
transition easier because in many cases, Observer instruments simply
replace Gauge instruments in the text.

## Trade-offs and mitigations

Not much is lost to the user from removing Gauge instruments.

There may be situations where an Observer instrument is the natural
choice but it is undesirable to be interrupted by the Metric SDK in
order to execute an Observer callback. Situations where Observer
semantics are correct (not Counter, not Measure) but a synchronous API
is more acceptable are expected to be very rare.

To address such rare cases, here are two possibilities:

1. Implement a Gauge Set instrument backed by an Observer instrument.
The Gauge Set's job is to maintain the current set of label sets
(e.g., explicitly managed or by time-limit) and their last value, to
be reported by the Observer at each collection interval.
2. Implement an application-specific metric collection API that would
allow the application to synchronize with the SDK on collection
intervals. For example, a transactional API allowing the application
to BEGIN and END synchronously reporting Observer instrument
observations.

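The first possibility could look roughly like the sketch below. `ToyGaugeSet`, its `ttl_seconds` parameter, and the injectable `clock` are all hypothetical; the proposal does not define such an API, and this is only one way the "explicitly managed or by time-limit" idea might be realized.

```python
import time
from typing import Callable, Dict, Tuple


class ToyGaugeSet:
    """Hypothetical Gauge Set: synchronous set(), reported via a callback."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # label -> (last value, time it was set)
        self.entries: Dict[str, Tuple[float, float]] = {}

    def set(self, label: str, value: float) -> None:
        self.entries[label] = (value, self.clock())

    def remove(self, label: str) -> None:
        # Explicit management: the application retires a label set itself.
        self.entries.pop(label, None)

    def observe(self) -> Dict[str, float]:
        """Callback an Observer would run once per collection interval."""
        now = self.clock()
        # Time-limit management: drop entries older than the limit.
        self.entries = {
            label: (value, set_at)
            for label, (value, set_at) in self.entries.items()
            if now - set_at <= self.ttl
        }
        return {label: value for label, (value, _) in self.entries.items()}


# Usage with a fake clock so the expiry behavior is deterministic.
now = [0.0]
gauges = ToyGaugeSet(ttl_seconds=10.0, clock=lambda: now[0])
gauges.set("a", 1.0)
now[0] = 8.0
gauges.set("b", 2.0)
first = gauges.observe()
now[0] = 15.0
second = gauges.observe()
```

The time limit makes explicit the policy decision that a plain synchronous Gauge leaves open: how long a value is remembered after it was last set.
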
## Prior art and alternatives

Many existing Metric libraries support both synchronous and
asynchronous Gauge-like instruments.

See the initial discussion in [Spec issue
412](https://github.com/open-telemetry/opentelemetry-specification/issues/412).