[BUG] Cannot make use of default_model_id in neural_sparse query type #871

Closed
@jdnvn

What is the bug?

I receive an error when I omit the model_id parameter from a nested neural_sparse query after configuring a default_model_id for the index. This seems to happen only for nested queries; I cannot reproduce it for non-nested queries.

How can one reproduce the bug?

Create an index called my_index

PUT /my_index

{
	"settings": {
		"index": {"knn": true},
		"number_of_shards": 1,
		"number_of_replicas": 1,
		"analysis": {
			"analyzer": {
				"default": {
					"type": "standard"
				}
			}
		}
	},
	"mappings": {
		"properties": {
			"id": {"type": "keyword"},
			"chunks": {
				"type": "nested",
				"properties": {
					"chunk_id": {"type": "keyword"},
					"chunked_content": {"type": "text"},
					"chunked_content_embedding": {"type": "rank_features"}
				}
			}
		}
	}
}

Update cluster settings

PUT /_cluster/settings
{
	"persistent": {
		"plugins": {
			"ml_commons": {
				"allow_registering_model_via_url": "true",
				"only_run_on_ml_node": "false",
				"model_access_control_enabled": "true",
				"native_memory_threshold": "99"
			}
		}
	}
}

Create the neural sparse model

POST /_plugins/_ml/model_groups/_register
{
	"name": "my_model_group",
	"description": "Models for search"
}

POST /_plugins/_ml/models/_register?deploy=true
{
	"name": "neural-sparse/opensearch-neural-sparse-encoding-v1",
	"version": "1.0.1",
	"model_group_id": "<model_group_id>",
	"description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search.",
	"model_format": "TORCH_SCRIPT",
	"function_name": "SPARSE_ENCODING",
	"model_content_size_in_bytes": 492184214,
	"model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
	"url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip",
	"created_time": 1696913667239
}

Create the search pipeline with a neural query enricher

PUT /_search/pipeline/my_pipeline
{
	"request_processors": [
		{
			"neural_query_enricher": {
				"default_model_id": "<model_id>"
			}
		}
	]
}

Update the index settings with the default pipeline

PUT /my_index/_settings
{
  "index.search.default_pipeline" : "my_pipeline"
}

Search the index

POST /_search

{
	"query": {
		"function_score": {
			"query": {
				"bool": {
					"should": [
						{
							"nested": {
								"path": "chunks",
								"score_mode": "max",
								"query": {
									"bool": {
										"must": [
											{
												"match": {
													"chunks.chunked_content": {
														"query": "contract"
													}	
												}	
											},
											{
												"neural_sparse": {
													"chunks.chunked_content_embedding": {
														"query_text": "contract" # NO MODEL ID!
													}
												}
											}
										]
									}
								},
								"inner_hits": {
									"_source": [
										"chunks.chunk_id",
										"chunks.chunked_content"
									]
								}
							}
						}
					]
				}
			},
			"score_mode": "sum",
			"min_score": 0.0
		}
	},
	"_source": {
		"excludes": [
			"_index",
			"chunks.chunked_content_embedding",
			"chunks.chunked_content",
			"chunks.chunk_id"
		]
	},
	"size": 100,
	"explain": "true"
}

Receive error:

{
	"error": {
		"root_cause": [
			{
				"type": "illegal_argument_exception",
				"reason": "query_text and model_id cannot be null"
			}
		],
		"type": "illegal_argument_exception",
		"reason": "query_text and model_id cannot be null"
	},
	"status": 400
}

What is the expected behavior?

There is no error and the default_model_id configured on the search pipeline is used to embed the query.
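
To make the expected behavior concrete, here is a minimal Python sketch of a recursive walk that fills in a missing model_id on every neural clause, however deeply it is nested. This is not the neural-search plugin's code. The function names are hypothetical, and treating `neural_sparse` and `neural` as the only clause types that take a model_id is an assumption of this sketch. It can also serve as a client-side workaround until nested queries are handled.

```python
from copy import deepcopy

# Clause types assumed to accept a model_id (an assumption of this sketch).
NEURAL_QUERY_TYPES = ("neural_sparse", "neural")


def apply_default_model_id(query, default_model_id):
    """Return a copy of `query` in which every neural clause that omits
    model_id gets `default_model_id`, at any nesting depth."""
    patched = deepcopy(query)  # leave the caller's query untouched
    _fill(patched, default_model_id)
    return patched


def _fill(node, default_model_id):
    # Hypothetical helper: walks dicts and lists of the query DSL in place.
    if isinstance(node, dict):
        for key, value in node.items():
            if key in NEURAL_QUERY_TYPES and isinstance(value, dict):
                # `value` maps a field name to that clause's parameters.
                for params in value.values():
                    if isinstance(params, dict):
                        # An explicit model_id always wins over the default.
                        params.setdefault("model_id", default_model_id)
            else:
                _fill(value, default_model_id)
    elif isinstance(node, list):
        for item in node:
            _fill(item, default_model_id)


# Trimmed version of the failing query from the reproduction steps.
nested_query = {
    "nested": {
        "path": "chunks",
        "query": {
            "bool": {
                "must": [
                    {"match": {"chunks.chunked_content": {"query": "contract"}}},
                    {
                        "neural_sparse": {
                            "chunks.chunked_content_embedding": {
                                "query_text": "contract"
                            }
                        }
                    },
                ]
            }
        },
    }
}

# "my-model-id" is a placeholder, not a real model ID.
patched = apply_default_model_id(nested_query, "my-model-id")
```

The sketch matches clause types by key name only, so a field that happens to be named `neural` or `neural_sparse` inside another clause would be misread; a real implementation would need to work on the parsed query tree instead.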

What is your host/environment?

macOS Sonoma 14.3, Docker 4.28.0, OpenSearch 2.16.0

Do you have any screenshots?

N/A

Do you have any additional context?

Might be due to this conditional not taking the default model id into account
