Implement randomizationFactor in ExponentialBackOffWithMaxRetries #3849
Comments
@JoosJuliet This sounds like a generic matter and nothing specific to Spring Kafka. I wonder if we can discuss these types of things in the context of the Spring Retry project. cc @artembilan for his insights.
There is one in Spring Retry, though:
I think the idea of a jitter is OK. We can indeed implement it here as you suggest, but let's see whether moving it up to Spring Framework would be better for a broader number of users.
I went ahead and created the corresponding issue in the Spring Framework repo here: spring-projects/spring-framework#34773. Let me know if there's anything I can do to help drive it forward or make it easier to review.
Thanks.
Based on the direction in spring-framework#22009, it seems that Spring Framework is not planning to include randomization logic directly, and instead recommends using spring-retry for such cases. That said, since spring-kafka already provides its own support for maxAttempts, perhaps it might be worth considering a similar built-in support for randomization, mainly for consistency within the project.
Left comment on that issue.
hello
Expected Behavior
The ExponentialBackOffWithMaxRetries class should allow for randomized backoff intervals to prevent simultaneous retries from multiple instances, which can overload servers. This randomness should be adjustable via a randomizationFactor to provide flexibility in how the backoff intervals are calculated.
Current Behavior
Currently, the ExponentialBackOffWithMaxRetries class calculates backoff intervals based solely on fixed exponential factors without any randomness. This can lead to predictable and synchronized retries in distributed systems, potentially causing spikes in load and collision risks.
Context
To enhance the stability and fairness of our Kafka message processing system, we need to address a key challenge: consumer pod starvation caused by synchronized retries when using RecoveringBatchErrorHandler. The predictable nature of the current retry intervals is the root cause of this synchronization, leading to uneven load distribution.
The core solution is introducing a randomizationFactor within the ExponentialBackOffWithMaxRetries class. This element of randomness adds necessary jitter, fine-tuning each pod's retry interval to prevent simultaneous attempts and distribute them over time.
Consequently, this enhancement will effectively prevent pod starvation and ensure message consumption opportunities are distributed more evenly across all nodes. This stabilizes system load and significantly improves the resilience of our Kafka processing architecture.
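To make the jitter idea concrete, here is a minimal, standalone sketch of how a randomization factor could spread out exponential retry intervals. It is only an illustration under stated assumptions, not the Spring Kafka implementation: the class name JitteredBackOffSketch, its constructor parameters, and the choice to clamp the jittered value to the maximum interval are all hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not part of Spring Kafka): exponential backoff with a
 * bounded number of retries and an optional randomization factor.
 */
final class JitteredBackOffSketch {

    private final long initialInterval;       // first delay, in milliseconds
    private final double multiplier;          // growth factor per attempt
    private final long maxInterval;           // upper bound for any delay
    private final int maxRetries;             // number of retries allowed
    private final double randomizationFactor; // 0.0 = no jitter, 0.5 = +/-50%
    private final Random random;

    JitteredBackOffSketch(long initialInterval, double multiplier, long maxInterval,
                          int maxRetries, double randomizationFactor, Random random) {
        if (randomizationFactor < 0.0 || randomizationFactor > 1.0) {
            throw new IllegalArgumentException("randomizationFactor must be in [0, 1]");
        }
        this.initialInterval = initialInterval;
        this.multiplier = multiplier;
        this.maxInterval = maxInterval;
        this.maxRetries = maxRetries;
        this.randomizationFactor = randomizationFactor;
        this.random = random;
    }

    /**
     * Returns the delay in milliseconds before the given retry attempt
     * (0-based), or -1 once the retries are exhausted.
     */
    long nextBackOff(int attempt) {
        if (attempt >= maxRetries) {
            return -1L;
        }
        // Deterministic exponential interval, capped at maxInterval.
        double base = Math.min(initialInterval * Math.pow(multiplier, attempt), (double) maxInterval);
        if (randomizationFactor == 0.0) {
            return (long) base;
        }
        // Pick a value uniformly in [base - delta, base + delta).
        double delta = base * randomizationFactor;
        double jittered = (base - delta) + random.nextDouble() * (2.0 * delta);
        // Assumption of this sketch: never exceed maxInterval, even with jitter.
        return (long) Math.min(jittered, (double) maxInterval);
    }

    public static void main(String[] args) {
        JitteredBackOffSketch backOff =
                new JitteredBackOffSketch(1000L, 2.0, 10000L, 4, 0.5, new Random());
        for (int attempt = 0; ; attempt++) {
            long delay = backOff.nextBackOff(attempt);
            if (delay < 0) {
                break; // retries exhausted
            }
            System.out.println("attempt " + attempt + ": wait " + delay + " ms");
        }
    }
}
```

With a factor of 0.0 the sketch reproduces the fixed sequence (1000, 2000, 4000, 8000 ms for the values above), so every pod retries at the same moments. With a factor of 0.5 each pod draws its own delay from a window around the same base value, which is what breaks the synchronization described above.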
Proposed Code Changes
Here is the proposed enhancement to the ExponentialBackOffWithMaxRetries class: