add a negative cache to regexp decoder #527

Open
dqminh wants to merge 1 commit into master from regexp-cache

Conversation

dqminh (Contributor) commented Feb 26, 2025

Often, decoders such as regexp run repeatedly on the same inputs only to skip them
via the regexp filter. A common example is matching a cgroup path in a chain like so:

```
- name: cgroup
- name: regexp
  regexps:
  - ^.*(system.slice).*$
```

Anything that is not in the system.slice cgroup will be skipped. When only a small
subset of inputs matches, the overhead of regexp matching can be noticeable.

We add a skip cache that remembers inputs that would produce ErrSkipLabelSet and
bypasses regexp matching for them, reducing the matching work.
The cache size is customizable with the flag `config.skip-cache-size`.
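
For illustration, a minimal sketch of the negative-cache idea (the names regexpDecoder, skipCache, and errSkipLabelSet are assumptions made for this sketch, not the exporter's actual types):

```
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errSkipLabelSet stands in for the exporter's ErrSkipLabelSet sentinel.
var errSkipLabelSet = errors.New("skip label set")

// regexpDecoder is a simplified stand-in for the real decoder: compiled
// regexps plus a negative ("skip") cache of inputs already known not to match.
type regexpDecoder struct {
	regexps   []*regexp.Regexp
	skipCache map[string]struct{}
}

func (d *regexpDecoder) decode(in string) (string, error) {
	// Negative cache hit: this input previously produced a skip,
	// so return immediately without running any regexp.
	if _, skip := d.skipCache[in]; skip {
		return "", errSkipLabelSet
	}

	for _, re := range d.regexps {
		if m := re.FindStringSubmatch(in); m != nil {
			if len(m) > 1 {
				return m[1], nil // first capture group, as in the cgroup example
			}
			return m[0], nil
		}
	}

	// No regexp matched: remember the input so future calls skip the matching.
	d.skipCache[in] = struct{}{}
	return "", errSkipLabelSet
}

func main() {
	d := &regexpDecoder{
		regexps:   []*regexp.Regexp{regexp.MustCompile(`^.*(system.slice).*$`)},
		skipCache: map[string]struct{}{},
	}
	fmt.Println(d.decode("/sys/fs/cgroup/system.slice/sshd.service"))  // matched
	fmt.Println(d.decode("/sys/fs/cgroup/user.slice/user-1000.slice")) // skipped and now cached
}
```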

dqminh force-pushed the regexp-cache branch 2 times, most recently from 786f90d to 67101f0 on February 26, 2025 at 16:15

```
if r.cache == nil {
	r.cache = map[string]*regexp.Regexp{}
}
if conf.LruCacheSize > 0 && r.outputCache == nil {
```
Contributor:
There's already a cache for label sets, shouldn't it cover this too and at a higher level?

```
cache, ok := s.cache[name]
if !ok {
	cache = map[string][]string{}
	s.cache[name] = cache
}
```

Contributor Author (dqminh):

@bobrik Yes, you are right. The LRU cache does not make sense. I think this should have been a negative cache instead, because the majority of the overhead comes from matching input that would have produced ErrSkipLabelSet. I've adjusted the implementation accordingly.

Contributor:

I don't think we need to limit the size here, as we don't do it for positive lookups. The most likely scenario is that you see the whole possible set of values on every scrape, so having an LRU that's one element too small will make it evict on every call and never hit anything. We don't have metrics, so it's easy to get this wrong and get somewhat higher CPU usage as an outcome.

Given that the decoding process is supposed to be deterministic (and we fully cache successes already), I think we need to cache errors as well and in the same place:

```
values, err := s.decodeLabels(in, labels)
if err != nil {
	return nil, err
}
```

This will be both faster (as we cache the whole label set instead of one decoder kind) and universal.
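
A hedged sketch of what caching errors alongside successes at that call site could look like (cacheEntry, labelCache, and decodeLabels here are illustrative stand-ins, not the exporter's code):

```
package main

import (
	"errors"
	"fmt"
)

var errSkipLabelSet = errors.New("skip label set")

// cacheEntry records the outcome of a decode, so successful label sets and
// ErrSkipLabelSet results are both served from the cache.
type cacheEntry struct {
	values []string
	err    error
}

// labelCache caches decode results by raw input, positive and negative alike.
type labelCache struct {
	entries map[string]cacheEntry
	decode  func(in string) ([]string, error) // the real, deterministic decoder chain
}

func (c *labelCache) decodeLabels(in string) ([]string, error) {
	if e, ok := c.entries[in]; ok {
		// Cache hit: return the remembered result, including a remembered error.
		return e.values, e.err
	}
	values, err := c.decode(in)
	// Decoding is deterministic, so remembering errors is as safe as
	// remembering successes.
	c.entries[in] = cacheEntry{values: values, err: err}
	return values, err
}

func main() {
	c := &labelCache{
		entries: map[string]cacheEntry{},
		decode: func(in string) ([]string, error) {
			if in == "skip-me" {
				return nil, errSkipLabelSet
			}
			return []string{in}, nil
		},
	}
	fmt.Println(c.decodeLabels("keep-me")) // decoded and cached
	fmt.Println(c.decodeLabels("skip-me")) // error decoded and cached
	fmt.Println(c.decodeLabels("skip-me")) // served from cache, decoder not called
}
```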

Contributor:

Thinking about it some more: we skip things we don't want in metrics, and they can in fact be of higher cardinality than the things we want to retain. Perhaps we do want an LRU cache after all, but it still makes sense to have it at the global level.

Not sure what to do about the sizing. Maybe log a message if too many errors are produced and the hit rate is bad?
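
One possible shape for a bounded global negative cache that warns on a poor hit rate; this is only a sketch under assumptions (the negativeCache type, the use of hashicorp/golang-lru, and the 10% threshold are not part of the PR):

```
package negcache

import (
	"log"

	lru "github.com/hashicorp/golang-lru"
)

// negativeCache is a bounded, global skip cache with basic hit/miss
// accounting so a poor hit rate can at least be logged.
// Note: the counters are not safe for concurrent use; a real implementation
// would need atomics or a mutex.
type negativeCache struct {
	lru          *lru.Cache
	hits, misses uint64
}

func newNegativeCache(size int) (*negativeCache, error) {
	c, err := lru.New(size)
	if err != nil {
		return nil, err
	}
	return &negativeCache{lru: c}, nil
}

// isSkipped reports whether the key was previously marked as skipped.
func (c *negativeCache) isSkipped(key string) bool {
	if _, ok := c.lru.Get(key); ok {
		c.hits++
		return true
	}
	c.misses++
	// Warn when the cache keeps missing: the configured size is probably too
	// small for the observed cardinality.
	if total := c.hits + c.misses; total%10000 == 0 && c.hits*10 < total {
		log.Printf("negative cache hit rate below 10%% after %d lookups; consider a larger size", total)
	}
	return false
}

// markSkipped remembers that decoding this key produced ErrSkipLabelSet.
func (c *negativeCache) markSkipped(key string) {
	c.lru.Add(key, struct{}{})
}
```

In the exporter this would sit in front of the regexp decoder: isSkipped gates the match, and markSkipped records inputs that produced ErrSkipLabelSet.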

Contributor Author (dqminh):

Yes, I think we can lift the cache to the global level and make this the default behavior with a configurable size. The question is whether we want multiple negative LRU caches or only a single one. In practice, I think only regexp currently skips labels with potentially high cardinality, so a single global LRU cache should be OK. It's simpler to configure too.

Regarding hit rates, I guess we can produce a Prometheus metric for this, and it's the user's responsibility to determine the desirable size by looking at the metric and the load of the exporter.
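
A sketch of the kind of metric that could expose the hit rate (the metric name, package, and helper below are assumptions for illustration, not metrics the exporter actually exports):

```
package metrics

import "github.com/prometheus/client_golang/prometheus"

// skipCacheLookups counts negative-cache lookups by result, so users can
// derive a hit rate from the exporter's own /metrics output and size the
// cache accordingly.
var skipCacheLookups = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ebpf_exporter_skip_cache_lookups_total",
		Help: "Negative (skip) cache lookups by result (hit or miss).",
	},
	[]string{"result"},
)

func init() {
	prometheus.MustRegister(skipCacheLookups)
}

// recordLookup is called on every cache lookup from the decoder path.
func recordLookup(hit bool) {
	if hit {
		skipCacheLookups.WithLabelValues("hit").Inc()
		return
	}
	skipCacheLookups.WithLabelValues("miss").Inc()
}
```

The hit rate could then be computed in PromQL as the ratio of the "hit" series to the total.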

Contributor:

SGTM

dqminh changed the title from "add a lru cache to regexp decoder" to "add a negative cache to regexp decoder" on Mar 3, 2025
dqminh force-pushed the regexp-cache branch 2 times, most recently from 33fbe97 to 3fd6913 on March 3, 2025 at 17:03