Describe the bug
Observed panics due to segmentation faults in the ruler.
To Reproduce
Steps to reproduce the behavior:
Run Cortex 1.10.0 with the ruler target (target: ruler)
Expected behavior
Ruler should not panic
Environment:
- Infrastructure: kubernetes - AKS
- Deployment tool: customized yaml manifests
Storage Engine
- Blocks
- Chunks
Additional Context
We are seeing consistent panics from the ruler, with errors like the following:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e20df3]

goroutine 14595 [running]:
github.com/cortexproject/cortex/pkg/querier.querier.Select(0x2bdf130, 0xc0047b2820, 0xc0046fc440, 0x2, 0x2, 0x28908e0, 0x2bdeaa0, 0xc00206be60, 0x17ba6ec5d74, 0x17ba7234bf4, ...)
    /__w/cortex/cortex/pkg/querier/querier.go:323 +0x193
github.com/cortexproject/cortex/pkg/querier/lazyquery.LazyQuerier.Select.func1(0xc0020e13e0, 0x2be0da0, 0xc000157c00, 0xc004122900, 0x0, 0xc004122900, 0xa, 0x10)
    /__w/cortex/cortex/pkg/querier/lazyquery/lazyquery.go:52 +0x72
created by github.com/cortexproject/cortex/pkg/querier/lazyquery.LazyQuerier.Select
    /__w/cortex/cortex/pkg/querier/lazyquery/lazyquery.go:51 +0xad
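For context on why this takes down the whole ruler rather than failing a single rule evaluation: the trace shows the nil dereference happening inside a goroutine created by LazyQuerier.Select, and in Go an unrecovered panic in any goroutine terminates the process. The following is a minimal, self-contained sketch of that failure mode. It is not Cortex code; the names store, result, and selectAsync are hypothetical and exist only for illustration. The deferred recover is there so the example can report the panic as an error instead of crashing.

```go
package main

import "fmt"

// store is a hypothetical backend. A nil *store models a dependency
// that was never initialised before the query ran.
type store struct{ series []string }

// selectSeries reads a field without a nil check, so calling it on a
// nil *store triggers a nil pointer dereference.
func (s *store) selectSeries() []string { return s.series }

// result carries either the selected series or an error.
type result struct {
	series []string
	err    error
}

// selectAsync runs the query in its own goroutine, similar in shape to
// the "created by ... LazyQuerier.Select" frame in the trace above.
// Without the deferred recover, a panic here would kill the process.
func selectAsync(s *store) <-chan result {
	out := make(chan result, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				out <- result{err: fmt.Errorf("select panicked: %v", r)}
			}
		}()
		out <- result{series: s.selectSeries()}
	}()
	return out
}

func main() {
	ok := <-selectAsync(&store{series: []string{"up"}})
	fmt.Println(ok.series, ok.err)

	// A nil store reproduces the dereference; the error is returned
	// to the caller instead of terminating the program.
	bad := <-selectAsync(nil)
	fmt.Println(bad.series, bad.err)
}
```

Removing the deferred function from selectAsync makes the second call abort the program with the same "invalid memory address or nil pointer dereference" message seen in the ruler logs.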
Below is the configuration diff from the defaults, as emitted by the ruler.
Note that I also tried with blocks_storage.bucket_store.index_header_lazy_loading_enabled: false
and experienced the same error.
alertmanager:
  enable_api: true
  external_url: https://alertmanager.cluster-monitor.*******.com/alertmanager
  sharding_enabled: true
  sharding_ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-alertmanagers/
      store: etcd
alertmanager_storage:
  s3:
    access_key_id: ******
    bucket_name: cortex-alertmanager
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
api:
  response_compression_enabled: true
blocks_storage:
  bucket_store:
    bucket_index:
      enabled: true
    chunks_cache:
      backend: memcached
      memcached:
        addresses: dnssrv+_memcached._tcp.chunks-cache.cluster-monitor-cortex.svc.cluster.local
    index_cache:
      backend: memcached
      memcached:
        addresses: dnssrv+_memcached._tcp.index-cache.cluster-monitor-cortex.svc.cluster.local
    index_header_lazy_loading_enabled: true
    metadata_cache:
      backend: memcached
      bucket_index_content_ttl: 2m0s
      memcached:
        addresses: dnssrv+_memcached._tcp.metadata-cache.cluster-monitor-cortex.svc.cluster.local
      metafile_doesnt_exist_ttl: 2m0s
      tenant_blocks_list_ttl: 2m0s
    sync_interval: 5m0s
  s3:
    access_key_id: *****
    bucket_name: cortex
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
  tsdb:
    close_idle_tsdb_timeout: 15m0s
    dir: /var/cortex/tsdb
    max_exemplars: 1000
compactor:
  block_deletion_marks_migration_enabled: false
  cleanup_interval: 5m0s
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-ha-tracker/
      store: etcd
  ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-collectors/
      store: etcd
  shard_by_all_labels: true
frontend:
  grpc_client_config:
    grpc_compression: snappy
  log_queries_longer_than: 1s
  query_stats_enabled: true
frontend_worker:
  frontend_address: query-frontend.cluster-monitor-cortex.svc.cluster.local:9095
  grpc_client_config:
    grpc_compression: snappy
    max_send_msg_size: 33554432
ingester:
  lifecycler:
    availability_zone: westeurope-2
    observe_period: 3s
    ring:
      kvstore:
        etcd:
          endpoints:
          - client.etcd.svc.cluster.local:2379
        prefix: cortex-collectors/
        store: etcd
  walconfig:
    wal_enabled: true
ingester_client:
  grpc_client_config:
    grpc_compression: snappy
limits:
  accept_ha_samples: true
  ingestion_burst_size: 75000
  ingestion_rate: 55000
  max_series_per_metric: 70000
querier:
  at_modifier_enabled: true
  query_store_for_labels_enabled: true
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      memcached:
        expiration: 12h0m0s
      memcached_client:
        addresses: dnssrv+_memcached._tcp.index-cache.cluster-monitor-cortex.svc.cluster.local
  split_queries_by_interval: 24h0m0s
ruler:
  alertmanager_url: http://alertmanager.cluster-monitor-cortex.svc.cluster.local:3100/alertmanager
  enable_api: true
  enable_sharding: true
  external_url: https://alertmanager.cluster-monitor.******.com
  ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-rulers/
      store: etcd
  ruler_client:
    grpc_compression: snappy
ruler_storage:
  s3:
    access_key_id: ********
    bucket_name: cortex-ruler
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
server:
  http_listen_port: 3100
  log_level: debug
storage:
  engine: blocks
store_gateway:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-collectors/
      store: etcd
    zone_awareness_enabled: true
target: ruler