-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Performance improvements to use RWLock to access LRUQueryCache #13306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This optimization also benefits high-cost querys such as terms query with 10000 terms, by reading cache more frequently instead of searching inverted index
|
@@ -628,7 +632,7 @@ private class LeafCache implements Accountable { | |||
|
|||
LeafCache(Object key) { | |||
this.key = key; | |||
cache = new IdentityHashMap<>(); | |||
cache = Collections.synchronizedMap(new IdentityHashMap<>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. Aren't all accesses to LeafCache
protected by the LRUQueryCache read/write locks?
Since LeafCache
isn't a static class, it should have access to the enclosing class's lock.
For testing safety, putIfAbsent, remove, onDocIdSetCache, and onDocIdSetEviction should all do a assert writeLock.isHeldByCurrentThread();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misunderstood here before. Since there is a read lock, concurrent access to IdentityHashMap does not require additional synchronized lock. I optimized the code as per your suggestion.
@boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement. |
It still has benefits for high-frequency queries with low cardinality fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query caching in Lucene is designed to update the content of the cache infrequently (e.g. because we wait until we've seen a query multiple times before caching it), so it makes sense that switching to a RWLock reduced contention.
The change looks correct, I just left minor comments.
private volatile long hitCount; | ||
private volatile long missCount; | ||
private volatile LongAdder hitCount; | ||
private volatile LongAdder missCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields could be made final and non volative now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -911,15 +919,15 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { | |||
} | |||
|
|||
// If the lock is already busy, prefer using the uncached version than waiting | |||
if (lock.tryLock() == false) { | |||
if (writeLock.tryLock() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using the write lock here? It looks like we're only reading so we could use the read lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, read lock can also be used here
Improvement number of currently submitted code version.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready for merging. I can do the merging, but won't back port to 9x until we see nightlies. They might catch something we missed.
@boicehuang could you add a changes entry to 9.11 under optimizations?
Finished |
Elasticsearch (which based on lucene) can automatically infer types for users with its dynamic mapping feature. When users index some low cardinality fields, such as gender / age / status... they often use some numbers to represent the values, while ES will infer these fields as long, and ES uses BKD as the index of long fields. Just as #541 said, when the data volume grows, building the result set of low-cardinality fields will make the CPU usage and load very high even if we use a boolean query with filter clauses for low-cardinality fields. One reason is that it uses a ReentrantLock to limit accessing LRUQueryCache. QPS and costs of their queries are often high, which often causes trying locking failures when obtaining the cache, resulting in low concurrency in accessing the cache. So I replace the ReentrantLock with a ReentrantReadWriteLock. I only use the read lock when I need to get the cache for a query,
Elasticsearch (which based on lucene) can automatically infer types for users with its dynamic mapping feature. When users index some low cardinality fields, such as gender / age / status... they often use some numbers to represent the values, while ES will infer these fields as long, and ES uses BKD as the index of long fields.
Just as #541 said, when the data volume grows, building the result set of low-cardinality fields will make the CPU usage and load very high even if we use a boolean query with filter clauses for low-cardinality fields.
One reason is that it uses a ReentrantLock to limit accessing LRUQueryCache. QPS and costs of their queries are often high, which often causes trying locking failures when obtaining the cache, resulting in low concurrency in accessing the cache.
So I replace the ReentrantLock with a ReentrantReadWriteLock. I only use the read lock when I need to get the cache for a query,
I benchmarked this optimization by mocking some random LongPoint and querying them with one PointInSetQuery with bool filter.
I think this change can help filter queries that need to query low-cardinality fields.