-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add highlighting for match_only_text type #17101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add highlighting for match_only_text type #17101
Conversation
❌ Gradle check result for 52b5bcf: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
52b5bcf
to
10aeff3
Compare
❌ Gradle check result for 10aeff3: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/search/fetch/subphase/highlight/HighlightPhase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etolbakov -- could you please add a test to https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/yamlRestTest/resources/rest-api-spec/test/search.query/11_match_field_match_only_text.yml ?
We have some other highlighting-specific tests (albeit with ngram token filters) over at https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/yamlRestTest/resources/rest-api-spec/test/search.query/30_ngram_highligthing_field_match_only_text.yml, which may serve as examples.
It looks like your change specifically affects highlighting when you specify the fields to highlight via a wildcard pattern, right? I think if a match_only_text
field is mentioned specifically, it has worked all along -- please let me know if I'm wrong about that.
Can you please also add an entry to CHANGELOG.md so we remember to include this in the release notes for 2.19?
Thanks a lot!
Hi @msfroh! Regarding the usecase - yes, you are correct. Sample response is hidden here ⬇️ "took": 61,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 90016,
"max_score": null,
"hits": [
{
"_index": "my_index",
"_id": "5a008d341645f42e2b067c813ca6eb04410f50b4f9ab513cef52b29aa5067c1f",
"_version": 1,
"_score": null,
"_size": 4045,
"_source": {
<omitted>
},
"fields": {
<omitted>
},
"highlight": {
"my_free_text_field": [
<omitted>
]
},
"sort": [
1737709464709557000
]
},
<omitted>
] As soon as I changed it to UPD: a side note - I've noticed that |
Signed-off-by: Eugene Tolbakov <[email protected]>
❌ Gradle check result for 9f00822: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Eugene Tolbakov <[email protected]>
❌ Gradle check result for 432a6bd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Eugene Tolbakov <[email protected]>
❌ Gradle check result for 71ca068: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Eugene Tolbakov <[email protected]>
❌ Gradle check result for 6e908cc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@msfroh I see the |
Buried in the console output, I found the following:
I don't think that's a result of your change, so I'm going to retry the build. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17101 +/- ##
============================================
- Coverage 72.41% 72.38% -0.03%
- Complexity 65626 65738 +112
============================================
Files 5306 5319 +13
Lines 304927 305703 +776
Branches 44257 44352 +95
============================================
+ Hits 220804 221290 +486
- Misses 66007 66335 +328
+ Partials 18116 18078 -38 ☔ View full report in Codecov by Sentry. |
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17101-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1bf8b9c3d01417f2877b803ddf8b45432408e662
# Push it to GitHub
git push --set-upstream origin backport/backport-17101-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
@msfroh |
Thank you for fixing this, @etolbakov. It looks like the backport had a merge conflict, almost certainly on the CHANGELOG.md file, probably because we just cut the 2.19 branch and removed all of the pending "unreleased 2.x" lines. If you can follow the above manual backport steps ☝️ and cut a backport PR, that would be great. Otherwise, I'm happy to take care of it. You've done the useful work, so fighting with our branching nonsense is entirely optional. 😀 |
Sure, I'll do it. |
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.19
# Create a new branch
git switch --create backport/backport-17101-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1bf8b9c3d01417f2877b803ddf8b45432408e662
# Push it to GitHub
git push --set-upstream origin backport/backport-17101-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.19 Then, create a pull request where the |
D'oh... it looks like it also doesn't auto-backport to the 2.19 branch. I'll take care of that one. |
--------- Signed-off-by: Eugene Tolbakov <[email protected]>
Description
Following the guidance provided in this helpful blog post(Optimize storage and performance with the MatchOnlyText field in OpenSearch), I changed the data type of several fields from "text" to "match_only_text".
However, I noticed that the highlighting functionality, which was previously working well and proved to be quite convenient, seems to be no longer available after the field type change.
I'm aware that "match_only_text" type has it's limitations (scoring, frequencies and positions, interval/span queries) but I failed to find highlighting being a tradeoff.
OpenSearch Version: v 2.17.0
Could you please consider this PR!
Thank you in advance,
Eugene
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
elastic/elasticsearch#85493
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.