### What is the bug?
PPL join fails with a circuit breaker exception even after raising the circuit breaker limits. The join operation appears to consume excessive memory when joining large datasets on timestamp fields.
### How can one reproduce the bug?
Steps to reproduce the behavior:
1. Enable the Calcite query engine:

```sh
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d'{
  "persistent": {
    "plugins.calcite.enabled": true
  }
}'
```
2. Increase the circuit breaker limits close to their maximum:

```sh
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d'{
  "persistent": {
    "indices.breaker.fielddata.limit": "95%",
    "indices.breaker.total.limit": "95%",
    "indices.breaker.request.limit": "90%",
    "indices.breaker.fielddata.overhead": "1.03",
    "indices.breaker.request.overhead": "1.0"
  }
}'
```
3. Execute a simple JOIN query:

```sh
curl -XPOST "http://localhost:9200/_plugins/_ppl/" \
  -H "Content-Type: application/json" \
  -d'{
  "query": "source = big5 | left join on @timestamp = @timestamp [source = big5 | where `event.id` = '\''ERROR'\'' | stats count() by span(@timestamp, 1h)]"
}'
```
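The nested `'\''` escaping in the curl body is easy to get wrong; as a sanity check, a small Python sketch (using only the standard `json` module) builds the same request payload programmatically instead:

```python
import json

# PPL query: left join big5 against a pre-aggregated subsearch on @timestamp.
# A plain Python string avoids the shell's single-quote escaping dance.
ppl = ("source = big5 | left join on @timestamp = @timestamp "
       "[source = big5 | where `event.id` = 'ERROR' "
       "| stats count() by span(@timestamp, 1h)]")

payload = json.dumps({"query": ppl})

# POST `payload` to http://localhost:9200/_plugins/_ppl/ with
# Content-Type: application/json (via curl -d @file, urllib, etc.).
print(payload)
```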
4. Observe the circuit breaker error:

```json
{
  "error": {
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: OpenSearchException[java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [859053756/819.2mb], which is larger than the limit of [858993459/819.1mb]]]",
    "type": "SearchPhaseExecutionException"
  },
  "status": 500
}
```
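The byte counts in the exception are worth decoding. The reported limit of 858,993,459 bytes works out to exactly 40% of a 2 GiB heap (equivalently 80% of 1 GiB); 40% is the default `indices.breaker.fielddata.limit`, which suggests (an inference, not confirmed) that the raised 95% setting was not the one in effect when the query ran. A quick arithmetic check:

```python
# Decode the byte counts from the CircuitBreakingException message.
limit = 858_993_459       # "larger than the limit of [858993459/819.1mb]"
attempted = 859_053_756   # "data for [_id] would be [859053756/819.2mb]"

gib = 1024 ** 3

# The limit is exactly 40% of a 2 GiB heap -- the default fielddata
# breaker limit -- rather than the 95% configured above.
assert limit == int(0.40 * 2 * gib)

# The request overshot the limit by only ~59 KiB.
overshoot = attempted - limit
print(overshoot)
```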
### What is the expected behavior?
- The join executes successfully without tripping the circuit breaker
- Large datasets are handled efficiently
- Joins run with memory-efficient execution
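On the memory-efficiency point: a standard technique (a sketch of a classic hash join, not the plugin's actual execution plan) is to materialize a hash table only for the smaller, pre-aggregated side of the join and stream the large side row by row, so peak memory is proportional to the subquery result rather than the full index:

```python
from collections import defaultdict
from typing import Iterable, Iterator

def left_hash_join(
    probe_rows: Iterable[dict],  # large side, streamed (e.g. big5 docs)
    build_rows: Iterable[dict],  # small side (pre-aggregated subquery)
    key: str,
) -> Iterator[dict]:
    """Left outer hash join; memory is bounded by the build side only."""
    table = defaultdict(list)
    for row in build_rows:        # materialize only the SMALL side
        table[row[key]].append(row)
    for row in probe_rows:        # stream the LARGE side
        matches = table.get(row[key])
        if matches:
            for m in matches:
                yield {**m, **row}  # probe-side fields win on conflict
        else:
            yield dict(row)         # left join keeps unmatched rows

# Example: join raw events against hourly error counts on "ts".
counts = [{"ts": "01:00", "errors": 7}]
events = [{"ts": "01:00", "id": "a"}, {"ts": "02:00", "id": "b"}]
for joined in left_hash_join(events, counts, "ts"):
    print(joined)
```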
### What is your host/environment?
- OS: Linux
- Version: 3.1
- Plugins
### Do you have any additional context?
Using the standard big5 index from OpenSearch Benchmark (OSB):

```sh
$ curl localhost:9200/_cat/indices
green open big5 Ta16685cTqeehNeEZ86wmw 1 0 116000000 0 25.8gb 25.8gb
```