Skip to content

BooleanQuery rewrite for must_not RangeQuery clauses #17655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 4, 2025

Conversation

peteralfonsi
Copy link
Contributor

@peteralfonsi peteralfonsi commented Mar 21, 2025

Description

This PR automatically rewrites boolean queries which have a must_not RangeQuery clause to instead use a should clause of the complement of that range. This can be 2-30x faster depending on the query. See #17586 where this is described in more detail.

Example original query (on nyc_taxis):

"bool" : { 
  "must_not": [ 
    {"range" : {"dropoff_datetime" : {"gte": "01/07/2015", "lte": "01/09/2015", "format": "dd/MM/yyyy"}}}
  ]
}

Rewritten query:

"bool": { 
  "must":{
    "bool":{
      "should": [
        {"range" : {"dropoff_datetime" : {"lt": "01/07/2015", "format": "dd/MM/yyyy"}}},
        {"range" : {"dropoff_datetime" : {"gt": "01/09/2015", "format": "dd/MM/yyyy"}}}
      ]
    }
  }
}

Some benchmark numbers from http_logs and nyc_taxis (excluded ranges are on @timestamp and dropoff_datetime fields respectively). "Originally written as" means whether the query was sent to OpenSearch with a must_not clause, or if it was sent already rewritten with should clauses. Ideally, after the changes are applied, these p50s should be the same.

Excluded range Originally written as Dataset p50 before changes (ms) p50 after changes (ms)
6/10 - 6/13 must_not http_logs 259 38.2
6/10 - 6/13 should http_logs 34.2 39.5
6/9 - 6/10 must_not http_logs 269 30.8
6/9 - 6/10 should http_logs 26.3 30.8
7/1 - 9/1 must_not nyc_taxis 873 408
7/1 - 9/1 should nyc_taxis 427 405
1/1 - 9/1 must_not nyc_taxis 1214 111
1/1 - 9/1 should nyc_taxis 116 111
1/1 12:00 - 1/1 12:01 must_not nyc_taxis 599 19.5
1/1 12:00 - 1/1 12:01 should nyc_taxis 19.3 20.2

I believe the small differences between runs (for example, 7/1-9/1 should going from 427 -> 405 ms, when we'd expect no change) is just due to variation between different runs/instances. This is expected from what I've seen in tiered caching benchmarks. I've done a few runs and the direction/magnitude of the changes vary.

Related Issues

Part of #17586

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 2 commits March 20, 2025 09:55
Peter Alfonsi added 2 commits March 21, 2025 13:51
Copy link
Contributor

❌ Gradle check result for d9eee10: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <[email protected]>
Copy link
Contributor

✅ Gradle check result for 25367bb: SUCCESS

@peteralfonsi
Copy link
Contributor Author

Sorry for the very long delay, was caught up with other things. Looks like the method described by @froh is actually doable at QueryBuilder rewrite time, since we have access to LeafReaderContexts via IndexSearcher via QueryShardContext. I've added this check. I also reran the benchmarks to make sure getting PointValues wasn't slowing things down, but there was no change in latencies.

Copy link
Contributor

❕ Gradle check result for fc49ddd: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label May 18, 2025
Copy link
Contributor

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to this one. I accidentally lost track of it.

The additional changes look safer, thanks! I just had a small clean-up suggestion on getting the LeafReaderContexts.

Signed-off-by: Peter Alfonsi <[email protected]>
@peteralfonsi peteralfonsi requested a review from a team as a code owner May 19, 2025 19:45
Copy link
Contributor

❌ Gradle check result for feed2e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <[email protected]>
Copy link
Contributor

❕ Gradle check result for c273159: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@peteralfonsi
Copy link
Contributor Author

Hey @msfroh , any further comments on this one?

Copy link
Contributor

❌ Gradle check result for c4d29f2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Copy link
Contributor Author

Flaky test: #18302

Signed-off-by: Peter Alfonsi <[email protected]>
Copy link
Contributor

✅ Gradle check result for 58d9a45: SUCCESS

@peteralfonsi
Copy link
Contributor Author

Hey @msfroh , just bumping on this

@msfroh msfroh merged commit a6eb368 into opensearch-project:main Jun 4, 2025
30 checks passed
Gagan6164 pushed a commit to Gagan6164/OpenSearch that referenced this pull request Jun 8, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
Gagan6164 pushed a commit to Gagan6164/OpenSearch that referenced this pull request Jun 8, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Jun 9, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 9, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Co-authored-by: Peter Alfonsi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants