
[BUG] Infinite loop of S3 GET/LIST requests during segment replication recovery after node loss/restart #18605

Open
@fullykubed

Description


Describe the bug

After S3 remote-backed storage was fixed in v3, we enabled it with segment replication. After doing so, we noticed our OpenSearch cluster began issuing millions of S3 ListObjectsV2 and GetObject requests per hour to the configured S3 bucket.

This behavior appears pathological and is triggered only after the shutdown of one OpenSearch node, leading to:

  • Persistent get and list requests to the /<index_uuid>/0/segments/metadata/metadata prefixes across all indices
  • Sustained high CPU usage on one surviving node (likely due to the looping requests)

This causes a HUGE spike in cluster costs ($1,000+ / month) even for our very small dataset (<100MB), since over the course of a month this has generated 200M requests to S3 (and we got fairly lucky, since this wasn't occurring all the time).

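As a rough sanity check on that figure (a back-of-the-envelope sketch; the per-request rate below is an assumption based on published S3 Standard pricing for us-east-2, where LIST requests cost about $0.005 per 1,000):

# Rough cost check; the $0.005 per 1,000 LIST rate is an assumption taken
# from published S3 Standard pricing in us-east-2.
# 200,000,000 requests / 1,000 * $0.005 = ~$1,000
echo "200000000 / 1000 * 0.005" | bc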

In our debugging, here is what we have observed (a sketch of the diagnostic calls we would use follows this list):

  • This behavior is not observed during normal operations, only when a node is lost.
  • It doesn't appear to matter whether the lost node is a leader or follower.
  • In a three-node cluster, it only impacts one of the remaining two nodes.
  • It doesn't occur all the time (~25% incident rate for us).
  • The issue cannot be fixed without deleting the malfunctioning node.
  • Oddly, the cluster itself appears to be functioning fine even while this is occurring -- even for indices with shards on the degenerate node.
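
For anyone trying to triage this, a minimal sketch of the read-only calls we would use to observe the behavior while the loop is happening (the host and index name are placeholders, not values from this report):

# Hedged diagnostic sketch; the host and "my-index" are placeholders.
# Per-shard segment replication status (source, target, lag, current stage).
curl -s "http://localhost:9200/_cat/segment_replication?v"

# Remote store upload/download stats for one index.
curl -s "http://localhost:9200/_remotestore/stats/my-index?pretty"

# See what the busy surviving node is actually doing.
curl -s "http://localhost:9200/_nodes/hot_threads"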

Related component

Storage:Remote

To Reproduce

  1. Deploy a 3-node OpenSearch cluster on Kubernetes using these opensearch.yml settings:
"cluster.allocator.existing_shards_allocator.batch_enabled": true
"cluster.indices.replication.strategy": "SEGMENT"
"cluster.initial_cluster_manager_nodes":
  - "opensearch-7d0b-0"
  - "opensearch-7d0b-1"
  - "opensearch-7d0b-2"
"cluster.name": "opensearch-7d0b"
"cluster.remote_store.publication.enabled": true
"cluster.remote_store.routing_table.path.prefix": "routing/"
"cluster.remote_store.state.enabled": true
"cluster.remote_store.state.path.prefix": "state/"
"cluster.routing.allocation.balance.prefer_primary": true
"cluster.routing.allocation.disk.watermark.high": "90%"
"cluster.routing.allocation.shard_movement_strategy": "PRIMARY_FIRST"
"discovery.seed_hosts":
  - "opensearch-7d0b-headless"
"indices.recovery.max_bytes_per_sec": "0mb"
"logger._root": "WARN"
"logger.org.opensearch.alerting.util.destinationmigration.DestinationMigrationCoordinator": "WARN"
"network.host": "0.0.0.0"
"node.attr.remote_store.repository.s3.settings.bucket": "default-opensearch-7d0b-storage-15cf133175f73b63"
"node.attr.remote_store.repository.s3.settings.region": "us-east-2"
"node.attr.remote_store.repository.s3.type": "s3"
"node.attr.remote_store.routing_table.repository": "s3"
"node.attr.remote_store.segment.repository": "s3"
"node.attr.remote_store.state.repository": "s3"
"node.attr.remote_store.translog.repository": "s3"
"node.roles":
  - "cluster_manager"
  - "ingest"
  - "data"
  - "remote_cluster_client"
"s3.client.default.endpoint": "s3.us-east-2.amazonaws.com"
"s3.client.default.region": "us-east-2"
"segrep.pressure.enabled": true
  2. Launch each node with the following bash script:
#!/usr/bin/env bash

set -euo pipefail
./bin/opensearch-plugin install -b repository-s3

echo "$AWS_ACCESS_KEY_ID" | ./bin/opensearch-keystore add --stdin --force s3.client.default.access_key
echo "$AWS_SECRET_ACCESS_KEY" | ./bin/opensearch-keystore add --stdin --force s3.client.default.secret_key

# Set JVM heap size dynamically to 50% of container memory request
HEAP_SIZE=$((CONTAINER_MEMORY_REQUEST / 2000000))
export OPENSEARCH_JAVA_OPTS="-Xmx${HEAP_SIZE}M -Xms${HEAP_SIZE}M --enable-native-access=ALL-UNNAMED"

./bin/opensearch \
  -Enode.name="$POD_NAME" \
  -Eplugins.query.datasources.encryption.masterkey="$OPENSEARCH_MASTER_KEY"
  3. Deploy the demo data that comes with OpenSearch Dashboards.

  4. Restart a node (doesn't occur on every restart -- looks to occur ~25% of the time for us); see the sketch below.
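
For step 4, a node restart on Kubernetes amounts to deleting the pod and letting the controller recreate it (this assumes the pods are managed by a StatefulSet; the namespace is a placeholder):

# Pod name follows the node names above; namespace is a placeholder.
kubectl delete pod opensearch-7d0b-1 -n <namespace>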

Expected behavior

The infinite loop of requests should not occur.

Additional Details

Plugins

  • repository-s3

Screenshots


Host/Environment (please complete the following information):

  • Environment: AWS EKS 1.30
  • OS: Bottlerocket
  • Version: 3.0.0

Additional context

Count of the LIST requests that occurred in a single minute, based on the CloudTrail event logs:

   1220 1vO1bkUERFWvJp2p9vNo9w/0/segments/metadata/metadata
   1195 eFMZ2857RDKzqWOKoEhjmg/0/segments/metadata/metadata
   1166 LZcXfsUHQoOfZ-VAe82Smw/0/segments/metadata/metadata
   1142 UZA4K5ruT-Opx6EmbOKbvg/0/segments/metadata/metadata
   1138 TaxiBYmZS4ypJ2DrGdxZEw/0/segments/metadata/metadata
   1105 NlAaRqFzRku2H2A7ABCYlA/0/segments/metadata/metadata
   1088 EkPLvxHwTZOWIWkS6_UgEA/0/segments/metadata/metadata
   1077 tJPUeD9ESA6X1XsPZxUxcg/0/segments/metadata/metadata
   1068 04mQ2jgvSMejOlTaUB9k6Q/0/segments/metadata/metadata
   1050 OJMqg9FsSy2m76o3vxZvVQ/0/segments/metadata/metadata

I also captured all the DEBUG logs across the entire cluster during a 5-minute window when this issue occurred. Due to the sheer volume of requests, the log file is quite large (2GB): log file.
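
For reference, a sketch of how per-prefix counts like those above can be derived from exported CloudTrail S3 data-event logs (the file path and field layout are assumptions about the export format, and S3 data events must be enabled for the bucket):

# Count ListObjectsV2 calls per requested prefix from exported CloudTrail
# data-event files; paths and field names are assumptions about the export.
cat cloudtrail-logs/*.json \
  | jq -r '.Records[] | select(.eventName == "ListObjectsV2") | .requestParameters.prefix // empty' \
  | sort | uniq -c | sort -rn | head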
