-
Notifications
You must be signed in to change notification settings - Fork 251
add a negative cache to regexp decoder #527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
786f90d
to
67101f0
Compare
decoder/regexp.go
Outdated
|
||
if r.cache == nil { | ||
r.cache = map[string]*regexp.Regexp{} | ||
} | ||
if conf.LruCacheSize > 0 && r.outputCache == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a cache for label sets, shouldn't it cover this too and at a higher level?
ebpf_exporter/decoder/decoder.go
Lines 96 to 100 in b62525b
cache, ok := s.cache[name] | |
if !ok { | |
cache = map[string][]string{} | |
s.cache[name] = cache | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobrik Yes, you are right. The LRU cache does not make sense. I think this should have been a negative cache instead, because the majority of overhead comes from matching input that would have produced ErrSkipLabelSet.
I've adjusted the implementation as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to limit the size here, as we don't do it for positive lookups. The most likely scenario is that you see the whole possible set of values on every scrape, so having an LRU that's one element too small will make it evict on every call and never hit anything. We don't have metrics, so it's easy to get this wrong and get somewhat higher CPU usage as an outcome.
Given that the decoding process is supposed to be deterministic (and we fully cache successes already), I think we need to cache errors as well and in the same place:
ebpf_exporter/decoder/decoder.go
Lines 108 to 111 in b62525b
values, err := s.decodeLabels(in, labels) | |
if err != nil { | |
return nil, err | |
} |
This will be both faster (as we do while set instead of one decoder kind) and universal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it some more: we skip things we don't want in metrics and they can in fact be higher cardinality than the thing we want to retain. Perhaps we do want an LRU cache after all, but it still makes sense to have it at the global level.
Not sure what to do about the sizing. Maybe log a message if too many errors are produced and hit rate is bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i think we can lift the cache to be the same level as global, and make this a default behavior with configurable size. The question is whether we want multiple negative LRU cache, or only a single one. In practice, i think only regexp currently skips labels with potentially high cardinalities, so i think a single global LRU cache should be ok. It's simpler to configure too.
Regarding hit rates, I guess we can produce a prometheus metrics for this, and it's user responsibility to determine the desirable size by looking at the metrics and also load of the exporter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
33fbe97
to
3fd6913
Compare
Often, some decoders such as regexp can run repeatedly on the same input and skip them with regexp filter. An common example is matching cgroup path in a chain like so: ``` - name: cgroup - name: regexp regexps: - ^.*(system.slice).*$ ``` Anything that is not in system.slice cgroup will be skipped. When only a small subset of inputs is matched, the overhead of regexp matching can often be noticable. We add a skip cache here to test for input that would produce ErrSkipLabelSet and skip regex matching on them to reduce the work done on regexp matching. The cache size is customizable with the flag `config.skip-cache-size` Signed-off-by: Daniel Dao <[email protected]>
Often, some decoders such as regexp can run repeatedly on the same input and skip them
with regexp filter. An common example is matching cgroup path in a chain like so:
Anything that is not in system.slice cgroup will be skipped. When only a small
subset of inputs is matched, the overhead of regexp matching can often be noticable.
We add a skip cache here to test for input that would produce ErrSkipLabelSet and
skip regex matching on them to reduce the work done on regexp matching.
The cache size is customizable with the flag
config.skip-cache-size