Performance improvements to use RWLock to access LRUQueryCache #13306

Merged
merged 15 commits into apache:main on May 10, 2024

Conversation

boicehuang (Contributor) commented Apr 15, 2024

Elasticsearch (which is based on Lucene) can automatically infer field types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender, age, or status, they often use numbers to represent the values; ES then infers these fields as long, and ES uses the BKD tree to index long fields.

As #541 describes, when the data volume grows, building the result set for low-cardinality fields drives CPU usage and load very high, even when the low-cardinality fields appear only in the filter clauses of a boolean query.

One reason is that LRUQueryCache limits access with a single ReentrantLock. When QPS and query costs are high, attempts to acquire the lock frequently fail when reading the cache, resulting in low concurrency on cache access.

So I replaced the ReentrantLock with a ReentrantReadWriteLock: the read lock is taken when looking up a cached query, and the write lock only when the cache is mutated, as shown in the sketch below.
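A minimal, self-contained sketch of this locking pattern; the class and member names are illustrative, not the actual LRUQueryCache code:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative read/write-locked cache (hypothetical names, not the
// real LRUQueryCache members).
class ReadWriteLockedCache<K, V> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<K, V> cache = new IdentityHashMap<>();

  // Lookups take the shared read lock, so many search threads can
  // probe the cache concurrently instead of serializing on one lock.
  V get(K key) {
    lock.readLock().lock();
    try {
      return cache.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Mutations (insert/evict) still take the exclusive write lock.
  void put(K key, V value) {
    lock.writeLock().lock();
    try {
      cache.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```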

I benchmarked this optimization by indexing random LongPoint values and querying them with a PointInSetQuery inside a boolean filter clause.

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|-----------|-------------------|-------------------|--------------|---------------|-----------------|
| 30000000  | 10                | 1                 | 2481         | 5102          | 105.6%          |
| 30000000  | 1000000           | 1                 | 6396         | 6596.48       | 3.1%            |

I think this change can help filter queries over low-cardinality fields.

boicehuang (Contributor, Author) commented:

This optimization also benefits high-cost queries, such as a terms query with 10000 terms, by reading from the cache more often instead of searching the inverted index.

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|-----------|-------------------|-------------------|--------------|---------------|-----------------|
| 30000000  | 1000000           | 10000             | 160          | 473           | 191.9%          |

```diff
@@ -628,7 +632,7 @@ private class LeafCache implements Accountable {

     LeafCache(Object key) {
       this.key = key;
-      cache = new IdentityHashMap<>();
+      cache = Collections.synchronizedMap(new IdentityHashMap<>());
```
Member commented:

I don't understand this. Aren't all accesses to LeafCache protected by the LRUQueryCache read/write locks?

Since LeafCache isn't a static class, it should have access to the enclosing class's lock.

For testing safety, putIfAbsent, remove, onDocIdSetCache, and onDocIdSetEviction should all do an assert writeLock.isHeldByCurrentThread();
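A sketch of that guard, assuming a writeLock field of type ReentrantReadWriteLock.WriteLock and a simplified method body (not the actual LRUQueryCache implementation):

```java
// Assumed field on the enclosing cache class:
// private final ReentrantReadWriteLock.WriteLock writeLock;
protected void onDocIdSetEviction(Object readerCoreKey, int numEntries, long sumRamBytesUsed) {
  // Fails fast (with -ea enabled) if a mutation path forgot to take
  // the exclusive lock before touching shared state.
  assert writeLock.isHeldByCurrentThread();
  // ... update RAM accounting and drop the per-leaf entries ...
}
```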

boicehuang (Contributor, Author) replied Apr 30, 2024:

Sorry, I misunderstood this earlier. Since accesses are already protected by the read/write locks, concurrent access to the IdentityHashMap does not require an additional synchronized wrapper. I updated the code per your suggestion.

benwtrent (Member) commented:

@boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement.

boicehuang (Contributor, Author) commented May 6, 2024:

> @boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement.

| doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage | notes |
|-----------|-------------------|-------------|--------------|---------------|-----------------|-------|
| 30000000  | 10                | 1           | 2481         | 4428          | 78%             | using LongAdder; uniqueQueries to be Collections.synchronizedMap; cache to be IdentityHashMap |

It still shows a benefit for high-frequency queries over low-cardinality fields.

jpountz (Contributor) left a comment:

Query caching in Lucene is designed to update the content of the cache infrequently (e.g. because we wait until we've seen a query multiple times before caching it), so it makes sense that switching to a RWLock reduced contention.
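In other words, reads vastly outnumber writes because a query must be seen several times before it is admitted. A hypothetical sketch of such an admission check (Lucene's actual heuristics live in UsageTrackingQueryCachingPolicy; the names and threshold here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical admission policy: only cache a query once it has been
// seen a few times, so cache writes stay rare relative to cache reads.
class FrequencyCachingPolicy {
  private static final int MIN_FREQUENCY_TO_CACHE = 5; // illustrative threshold
  private final Map<String, Integer> frequencies = new HashMap<>();

  boolean shouldCache(String queryKey) {
    int seen = frequencies.merge(queryKey, 1, Integer::sum);
    return seen >= MIN_FREQUENCY_TO_CACHE;
  }
}
```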

The change looks correct; I just left minor comments.

```diff
-  private volatile long hitCount;
-  private volatile long missCount;
+  private volatile LongAdder hitCount;
+  private volatile LongAdder missCount;
```
Contributor commented:

These fields could be made final and non-volatile now.
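That is, something like the following sketch. LongAdder is internally thread-safe and the references never change after construction, so final suffices and volatile adds nothing:

```java
private final LongAdder hitCount = new LongAdder();
private final LongAdder missCount = new LongAdder();
```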

boicehuang (Contributor, Author) replied:

ok

```diff
@@ -911,15 +919,15 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
     }

     // If the lock is already busy, prefer using the uncached version than waiting
-    if (lock.tryLock() == false) {
+    if (writeLock.tryLock() == false) {
```
Contributor commented:

Why are we using the write lock here? It looks like we're only reading so we could use the read lock?

boicehuang (Contributor, Author) replied:

Yes, the read lock can be used here as well.
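Applied to the diff above, the check might become the following (a sketch; readLock is assumed to be the ReentrantReadWriteLock's read lock):

```java
// A cache lookup only needs the shared lock; if even that is contended,
// fall back to the uncached scorer rather than blocking the search thread.
if (readLock.tryLock() == false) {
  return in.bulkScorer(context);
}
```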

boicehuang changed the title from "Performance improvements to use read lock to access LRUQueryCache" to "Performance improvements to use RWLock to access LRUQueryCache" on May 8, 2024
boicehuang (Contributor, Author) commented May 8, 2024:

Benchmark numbers for the currently submitted version of the code:

| doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage | notes |
|-----------|-------------------|-------------|--------------|---------------|-----------------|-------|
| 30000000  | 10                | 1           | 2481         | 4408          | 78%             | using LongAdder; uniqueQueries to be Collections.synchronizedMap; cache to be IdentityHashMap |

benwtrent (Member) left a comment:

I think this is ready for merging. I can do the merging, but won't backport to 9.x until we see nightlies. They might catch something we missed.

@boicehuang could you add a changes entry to 9.11 under optimizations?

boicehuang (Contributor, Author) replied:

> I think this is ready for merging. I can do the merging, but won't backport to 9.x until we see nightlies. They might catch something we missed.
>
> @boicehuang could you add a changes entry to 9.11 under optimizations?

Finished

benwtrent merged commit 5ac88c7 into apache:main May 10, 2024
3 checks passed
benwtrent pushed a commit that referenced this pull request May 14, 2024