Skip to content

Add metrics for tracking total disconnected time and reconnection attempts #3220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

ggivo
Copy link
Contributor

@ggivo ggivo commented Mar 18, 2025

Description:
Introduces two new metrics to track the total time a connection remains disconnected until it is successfully reconnected and the number of reconnection attempts. The changes include:

  • New Metrics:
    • lettuce.reconnection.inactive.duration
      • Description: Measures the time taken for a successful reconnection after a disconnection.
      • Type: Timer
    • lettuce.reconnection.attempts
      • Description: Tracks the number of reconnection attempts made during a disconnection.
      • Implementation: Counter

Impact:

  • Provides better insights into connection stability and reconnection performance.

@ggivo ggivo force-pushed the lettuce-observability branch from d41a0f7 to e3fc47d Compare March 18, 2025 09:00
@ggivo ggivo marked this pull request as draft March 20, 2025 05:22
@ggivo
Copy link
Contributor Author

ggivo commented Mar 20, 2025

Considering adding a few more metrics
e.g

  • endpoint.command.queue - Gauge - tracks size of inflight commands (commands written to netty queue but not yet completed)
  • endpoint.command.buffer. - Gauge - tracks size of buffered commands (when autoflush=false)
  • endpoint.disconnected.buffer - Gauge - tracks size of buffered commands during disconnect

Those are implementation-specific and relevant to DefaultEndpoint.

This raises some open questions:
As of now, we have CommandLatencyRecorder responsible for gathering command latency metrics,

Do we continue with separate Metrics recorders, for example, one for ConnectionMonitoring (used by ConnectionWatchdog to track inactive connection time) and another for DefaultEndpoint (tracking the size of internal queues)... or have a single MetricsRecorder for both (ConnectionMonitoring, DefaultEndpoint)?

Do we want to enable/disable only connection-related, and endpoint-related metrics separately?

@tishun any opinion

@ggivo ggivo force-pushed the lettuce-observability branch 3 times, most recently from 8914047 to 37492c4 Compare March 25, 2025 18:10
@ggivo ggivo force-pushed the lettuce-observability branch from b672564 to bdb66b4 Compare April 14, 2025 14:44
@ggivo ggivo force-pushed the lettuce-observability branch 2 times, most recently from f4b049f to baa5183 Compare May 12, 2025 16:32
tishun and others added 21 commits May 19, 2025 14:34
Seems like AttributeMap.attr is not accurate and actually return's  null causing some unit test failures.
This commit adds the ability to listen for MIGRATING and MIGRATED messages
and trigger extended command expiry timeouts during Redis shard migration.

Key changes:
- Enhanced RebindAwareConnectionWatchdog to detect MIGRATING/MIGRATED messages
- RebindAwareExpiryWriter to trigger timeout relaxation whenever MIGRATING message is received

This feature allows commands to have relaxed timeouts during shard migration
operations, preventing unnecessary timeouts when Redis is temporarily busy
with migration tasks.
@ggivo ggivo force-pushed the lettuce-observability branch from 8ea2520 to 8adf0ad Compare June 12, 2025 16:55
@ggivo ggivo force-pushed the lettuce-observability branch from 5153ac3 to 392c406 Compare June 16, 2025 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants