Description
What is the bug?
Receive an error when omitting the model_id param in a nested neural_sparse query after configuring a default_model_id for the index. This seems to happen only for nested queries; I cannot reproduce it with non-nested queries.
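For contrast, a non-nested query of the following shape does pick up the default_model_id without error (content_embedding here is a hypothetical top-level rank_features field on an index configured the same way as below):
POST /_search
{
  "query": {
    "neural_sparse": {
      "content_embedding": {
        "query_text": "contract"
      }
    }
  }
}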
How can one reproduce the bug?
Create an index called my_index
PUT /my_index
{
  "settings": {
    "index": {"knn": true},
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "default": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "chunks": {
        "type": "nested",
        "properties": {
          "chunk_id": {"type": "keyword"},
          "chunked_content": {"type": "text"},
          "chunked_content_embedding": {"type": "rank_features"}
        }
      }
    }
  }
}
Update cluster settings
PUT /_cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "allow_registering_model_via_url": "true",
        "only_run_on_ml_node": "false",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99"
      }
    }
  }
}
Create the neural sparse model
POST /_plugins/_ml/model_groups/_register
{
  "name": "my_model_group",
  "description": "Models for search"
}
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_group_id": <model_group_id>,
  "description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search.",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_ENCODING",
  "model_content_size_in_bytes": 492184214,
  "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip",
  "created_time": 1696913667239
}
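Registering with ?deploy=true returns a task_id; the model_id used below can be read from the completed task (task_id is a placeholder here):
GET /_plugins/_ml/tasks/<task_id>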
Create the search pipeline with a neural query enricher
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": <model_id>
      }
    }
  ]
}
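Optionally, fetch the pipeline back to confirm the enricher and its default_model_id are in place:
GET /_search/pipeline/my_pipeline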
Update the index settings with the default pipeline
PUT /my_index/_settings
{
  "index.search.default_pipeline": "my_pipeline"
}
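For completeness, a document can be ingested so the query below has something to match (the values here are made up for illustration, and the embedding field is omitted for brevity):
PUT /my_index/_doc/1
{
  "id": "doc-1",
  "chunks": [
    {
      "chunk_id": "doc-1-chunk-0",
      "chunked_content": "This contract covers payment terms."
    }
  ]
}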
Search the index. Note that the nested neural_sparse clause omits model_id:
POST /_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "nested": {
                "path": "chunks",
                "score_mode": "max",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "match": {
                          "chunks.chunked_content": {
                            "query": "contract"
                          }
                        }
                      },
                      {
                        "neural_sparse": {
                          "chunks.chunked_content_embedding": {
                            "query_text": "contract"
                          }
                        }
                      }
                    ]
                  }
                },
                "inner_hits": {
                  "_source": [
                    "chunks.chunk_id",
                    "chunks.chunked_content"
                  ]
                }
              }
            }
          ]
        }
      },
      "score_mode": "sum",
      "min_score": 0.0
    }
  },
  "_source": {
    "excludes": [
      "_index",
      "chunks.chunked_content_embedding",
      "chunks.chunked_content",
      "chunks.chunk_id"
    ]
  },
  "size": 100,
  "explain": true
}
Receive error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "query_text and model_id cannot be null"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "query_text and model_id cannot be null"
  },
  "status": 400
}
What is the expected behavior?
There is no error, and the default_model_id configured on the search pipeline is used to embed the query.
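Explicitly supplying model_id inside the nested clause (standard neural_sparse usage) should avoid the error, so a clause like the following can serve as a temporary workaround:
{
  "neural_sparse": {
    "chunks.chunked_content_embedding": {
      "query_text": "contract",
      "model_id": <model_id>
    }
  }
}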
What is your host/environment?
macOS Sonoma 14.3, Docker Desktop 4.28.0, OpenSearch 2.16.0
Do you have any screenshots?
N/A
Do you have any additional context?
Might be due to this conditional not taking the default_model_id into account.