Skip to content

Add highlighting for match_only_text type #17101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jan 30, 2025

Conversation

etolbakov
Copy link
Contributor

@etolbakov etolbakov commented Jan 23, 2025

Description

Following the guidance provided in this helpful blog post(Optimize storage and performance with the MatchOnlyText field in OpenSearch), I changed the data type of several fields from "text" to "match_only_text".

However, I noticed that the highlighting functionality, which was previously working well and proved to be quite convenient, seems to be no longer available after the field type change.
I'm aware that "match_only_text" type has it's limitations (scoring, frequencies and positions, interval/span queries) but I failed to find highlighting being a tradeoff.

OpenSearch Version: v 2.17.0

Could you please consider this PR!
Thank you in advance,
Eugene

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

elastic/elasticsearch#85493

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 52b5bcf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@etolbakov etolbakov force-pushed the match_only_text-highlighting branch from 52b5bcf to 10aeff3 Compare January 23, 2025 16:30
Copy link
Contributor

❌ Gradle check result for 10aeff3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etolbakov -- could you please add a test to https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/yamlRestTest/resources/rest-api-spec/test/search.query/11_match_field_match_only_text.yml ?

We have some other highlighting-specific tests (albeit with ngram token filters) over at https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/yamlRestTest/resources/rest-api-spec/test/search.query/30_ngram_highligthing_field_match_only_text.yml, which may serve as examples.

It looks like your change specifically affects highlighting when you specify the fields to highlight via a wildcard pattern, right? I think if a match_only_text field is mentioned specifically, it has worked all along -- please let me know if I'm wrong about that.

Can you please also add an entry to CHANGELOG.md so we remember to include this in the release notes for 2.19?

Thanks a lot!

@etolbakov
Copy link
Contributor Author

etolbakov commented Jan 24, 2025

Hi @msfroh!
Thank you very much for the review, feedback and quick turnaround!
I'll try addressing your comments ASAP.

Regarding the usecase - yes, you are correct.
Let me provide more details: I'm using OpenSearch Dashboards. When my field has the text type and the highlighting worked fine.
In the Inspect -> Response tab, I saw the exact response from backend and every document had the highlight section in it.

Sample response is hidden here ⬇️
 "took": 61,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 90016,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_id": "5a008d341645f42e2b067c813ca6eb04410f50b4f9ab513cef52b29aa5067c1f",
        "_version": 1,
        "_score": null,
        "_size": 4045,
        "_source": {
          <omitted>
        },
        "fields": {
          <omitted>
        },
        "highlight": {
          "my_free_text_field": [
            <omitted>
          ]
        },
        "sort": [
          1737709464709557000
        ]
      },
  <omitted>
]

As soon as I changed it to match_only_text the highlight part does not present in the response.

UPD: a side note - I've noticed that 30_ngram_highligthing.yml and 30_ngram_highligthing_field_match_only_text.yml file names have typo in highlighting word (no typos within those files though). I'm happy to fix that in the scope of the current PR if you don't mind.

Copy link
Contributor

❌ Gradle check result for 9f00822: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Eugene Tolbakov <[email protected]>
Copy link
Contributor

❌ Gradle check result for 432a6bd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Eugene Tolbakov <[email protected]>
Copy link
Contributor

❌ Gradle check result for 71ca068: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Eugene Tolbakov <[email protected]>
Copy link
Contributor

❌ Gradle check result for 6e908cc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@etolbakov
Copy link
Contributor Author

@msfroh
Could you please give me a hand with the build when you have time?

I see the FAILURE: Build failed with an exception. in output
https://build.ci.opensearch.org/job/gradle-check/52931/console
However, could not find any failing tests 🤷‍♂️

@msfroh
Copy link
Contributor

msfroh commented Jan 30, 2025

Buried in the console output, I found the following:

FAILURE: Build failed with an exception.

* What went wrong:
Gradle build daemon has been stopped: since the JVM garbage collector is thrashing

I don't think that's a result of your change, so I'm going to retry the build.

Copy link
Contributor

✅ Gradle check result for 6e908cc: SUCCESS

Copy link

codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.38%. Comparing base (cd149a9) to head (6e908cc).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
...earch/fetch/subphase/highlight/HighlightPhase.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17101      +/-   ##
============================================
- Coverage     72.41%   72.38%   -0.03%     
- Complexity    65626    65738     +112     
============================================
  Files          5306     5319      +13     
  Lines        304927   305703     +776     
  Branches      44257    44352      +95     
============================================
+ Hits         220804   221290     +486     
- Misses        66007    66335     +328     
+ Partials      18116    18078      -38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@msfroh msfroh merged commit 1bf8b9c into opensearch-project:main Jan 30, 2025
29 of 30 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17101-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1bf8b9c3d01417f2877b803ddf8b45432408e662
# Push it to GitHub
git push --set-upstream origin backport/backport-17101-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-17101-to-2.x.

@etolbakov etolbakov deleted the match_only_text-highlighting branch January 30, 2025 20:30
@etolbakov
Copy link
Contributor Author

@msfroh
Michael, thank you for your review and assistance with this PR!
Please let me know if there's anything required for backporting it

@msfroh
Copy link
Contributor

msfroh commented Jan 30, 2025

Thank you for fixing this, @etolbakov.

It looks like the backport had a merge conflict, almost certainly on the CHANGELOG.md file, probably because we just cut the 2.19 branch and removed all of the pending "unreleased 2.x" lines. If you can follow the above manual backport steps ☝️ and cut a backport PR, that would be great.

Otherwise, I'm happy to take care of it. You've done the useful work, so fighting with our branching nonsense is entirely optional. 😀

@etolbakov
Copy link
Contributor Author

Sure, I'll do it.
I will use PR "17210" / "17023" pair (deliberately copied ids instead of links to avoid mentioning) as an example and tag you once ready.

@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.19
# Create a new branch
git switch --create backport/backport-17101-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1bf8b9c3d01417f2877b803ddf8b45432408e662
# Push it to GitHub
git push --set-upstream origin backport/backport-17101-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-17101-to-2.19.

@msfroh
Copy link
Contributor

msfroh commented Jan 30, 2025

D'oh... it looks like it also doesn't auto-backport to the 2.19 branch. I'll take care of that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport 2.19 backport-failed v2.19.0 Issues and PRs related to version 2.19.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.