Implement randomizationFactor in ExponentialBackOffWithMaxRetries #3849

Open
JoosJuliet opened this issue Apr 17, 2025 · 6 comments

Comments

@JoosJuliet

JoosJuliet commented Apr 17, 2025

hello

Expected Behavior

The ExponentialBackOffWithMaxRetries class should allow for randomized backoff intervals to prevent simultaneous retries from multiple instances, which can overload servers. This randomness should be adjustable via a randomizationFactor to provide flexibility in how the backoff intervals are calculated.

Current Behavior

Currently, the ExponentialBackOffWithMaxRetries class calculates backoff intervals based solely on fixed exponential factors without any randomness. This can lead to predictable and synchronized retries in distributed systems, potentially causing spikes in load and collision risks.
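To make the synchronization concrete, here is a minimal stand-alone sketch. It does not use Spring classes; the `schedule` helper and the sample values (1000 ms initial interval, multiplier 2.0, 10 s cap, 5 retries) are assumptions chosen for illustration. It only reproduces the capped exponential arithmetic, which is enough to show that two instances with the same settings compute the same delays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of spring-kafka.
final class DeterministicBackOffDemo {

    /** Delays of a fixed exponential schedule, each capped at maxInterval. */
    static List<Long> schedule(long initialInterval, double multiplier, long maxInterval, int maxRetries) {
        List<Long> delays = new ArrayList<>();
        long current = Math.min(initialInterval, maxInterval);
        for (int i = 0; i < maxRetries; i++) {
            delays.add(current);
            current = Math.min((long) (current * multiplier), maxInterval);
        }
        return delays;
    }

    public static void main(String[] args) {
        // Two pods configured identically end up with identical retry delays.
        List<Long> podA = schedule(1000L, 2.0, 10_000L, 5);
        List<Long> podB = schedule(1000L, 2.0, 10_000L, 5);
        System.out.println("pod A: " + podA); // [1000, 2000, 4000, 8000, 10000]
        System.out.println("pod B: " + podB); // [1000, 2000, 4000, 8000, 10000]
        System.out.println("identical: " + podA.equals(podB)); // true
    }
}
```

If both pods fail at the same moment, every retry after that also lands at the same moment, which is the load spike described above.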

Context

To enhance the stability and fairness of our Kafka message processing system, we need to address a key challenge: consumer pod starvation caused by synchronized retries when using RecoveringBatchErrorHandler. The predictable nature of the current retry intervals is the root cause of this synchronization, leading to uneven load distribution.

The core solution is introducing a randomizationFactor within the ExponentialBackOffWithMaxRetries class. This element of randomness adds necessary jitter, fine-tuning each pod's retry interval to prevent simultaneous attempts and distribute them over time.

Consequently, this enhancement will effectively prevent pod starvation and ensure message consumption opportunities are distributed more evenly across all nodes. This stabilizes system load and significantly improves the resilience of our Kafka processing architecture.

Proposed Code Changes

Here is the proposed enhancement to the ExponentialBackOffWithMaxRetries class:

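import org.springframework.util.backoff.BackOffExecution;
import org.springframework.util.backoff.ExponentialBackOff;

public class ExponentialBackOffWithMaxRetries extends ExponentialBackOff {

    private final int maxRetries;
    private double randomizationFactor = 0.0; // Default to no randomization for backward compatibility

    public ExponentialBackOffWithMaxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
        calculateMaxElapsed();
    }

    public void setRandomizationFactor(double randomizationFactor) {
        if (randomizationFactor < 0 || randomizationFactor > 1) {
            throw new IllegalArgumentException("Randomization factor must be between 0 and 1");
        }
        this.randomizationFactor = randomizationFactor;
    }

    // Jitter each interval as it is handed out. The delegate execution still
    // tracks the un-jittered intervals, so the retry count is not affected.
    @Override
    public BackOffExecution start() {
        BackOffExecution delegate = super.start();
        return () -> {
            long next = delegate.nextBackOff();
            return (next == BackOffExecution.STOP) ? next : applyRandomization(next);
        };
    }

    // Kept deterministic: a randomized max elapsed time would make the
    // number of retries vary from run to run.
    private void calculateMaxElapsed() {
        long maxInterval = getMaxInterval();
        long maxElapsed = Math.min(getInitialInterval(), maxInterval);
        long current = maxElapsed;
        for (int i = 1; i < this.maxRetries; i++) {
            current = Math.min((long) (current * getMultiplier()), maxInterval);
            maxElapsed += current;
        }
        super.setMaxElapsedTime(maxElapsed);
    }

    // Scales the interval by a factor drawn uniformly from [1 - f, 1 + f).
    private long applyRandomization(long interval) {
        double randomMultiplier = (1 - randomizationFactor) + Math.random() * 2 * randomizationFactor;
        return (long) (interval * randomMultiplier);
    }
}
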
@sobychacko
Contributor

@JoosJuliet This sounds like a generic matter and nothing specific to Spring Kafka. I wonder if we can discuss these types of things in the context of the Spring Retry project. cc @artembilan for his insights.

@artembilan
Member

The ExponentialBackOffWithMaxRetries class is based on the ExponentialBackOff from Spring Framework - nothing to do with Spring Retry.
You can inject your own implementation whenever we ask for a BackOff contract.
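As a rough sketch of what such a custom implementation could look like, the decorator below adds symmetric jitter to any delegate. To keep it runnable without spring-core on the classpath, it declares local stand-in interfaces with the same shape as `org.springframework.util.backoff.BackOff` and `BackOffExecution`; in a real application you would implement the Spring interfaces instead. `JitterBackOff` and `JitterBackOffDemo` are hypothetical names, not existing classes.

```java
import java.util.concurrent.ThreadLocalRandom;

// Local stand-ins (assumption) mirroring Spring's BackOffExecution and BackOff.
interface BackOffExecution {
    long STOP = -1;
    long nextBackOff();
}

interface BackOff {
    BackOffExecution start();
}

/** Hypothetical decorator that applies symmetric jitter to a delegate BackOff. */
final class JitterBackOff implements BackOff {

    private final BackOff delegate;
    private final double randomizationFactor;

    JitterBackOff(BackOff delegate, double randomizationFactor) {
        if (randomizationFactor < 0 || randomizationFactor > 1) {
            throw new IllegalArgumentException("Randomization factor must be between 0 and 1");
        }
        this.delegate = delegate;
        this.randomizationFactor = randomizationFactor;
    }

    @Override
    public BackOffExecution start() {
        BackOffExecution execution = this.delegate.start();
        return () -> {
            long next = execution.nextBackOff();
            if (next == BackOffExecution.STOP) {
                return next; // pass the stop signal through untouched
            }
            // Factor drawn uniformly from [1 - f, 1 + f).
            double factor = (1 - this.randomizationFactor)
                    + ThreadLocalRandom.current().nextDouble() * 2 * this.randomizationFactor;
            return (long) (next * factor);
        };
    }
}

final class JitterBackOffDemo {
    public static void main(String[] args) {
        // Delegate: a fixed 1000 ms delay for three attempts, then STOP.
        BackOff fixed = () -> new BackOffExecution() {
            private int attempts;

            @Override
            public long nextBackOff() {
                return this.attempts++ < 3 ? 1000L : STOP;
            }
        };
        BackOffExecution execution = new JitterBackOff(fixed, 0.5).start();
        for (long delay = execution.nextBackOff(); delay != BackOffExecution.STOP; delay = execution.nextBackOff()) {
            System.out.println("sleep " + delay + " ms"); // somewhere between 500 and 1500
        }
    }
}
```

Because the jitter is applied on top of the delegate's output, the delegate keeps its own bookkeeping (attempt count, elapsed time) unchanged.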

There is a jittered implementation in Spring Retry, though:

/**
 * Implementation of {@link org.springframework.retry.backoff.ExponentialBackOffPolicy}
 * that chooses a random multiple of the interval that would come from a simple
 * deterministic exponential. The random multiple is uniformly distributed between 1 and
 * the deterministic multiplier (so in practice the interval is somewhere between the next
 * and next but one intervals in the deterministic case). This is often referred to as
 * jitter.
 *
 * This has shown to at least be useful in testing scenarios where excessive contention is
 * generated by the test needing many retries. In test, usually threads are started at the
 * same time, and thus stomp together onto the next interval. Using this
 * {@link BackOffPolicy} can help avoid that scenario.
 *
 * Example: initialInterval = 50 multiplier = 2.0 maxInterval = 3000 numRetries = 5
 *
 * {@link ExponentialBackOffPolicy} yields: [50, 100, 200, 400, 800]
 *
 * {@link ExponentialRandomBackOffPolicy} may yield [76, 151, 304, 580, 901] or [53, 190,
 * 267, 451, 815] (random distributed values within the ranges of [50-100, 100-200,
 * 200-400, 400-800, 800-1600])
 *
 * @author Jon Travis
 * @author Dave Syer
 * @author Chase Diem
 */
@SuppressWarnings("serial")
public class ExponentialRandomBackOffPolicy extends ExponentialBackOffPolicy {
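The arithmetic described in that Javadoc can be sketched in a few lines. This is an illustration of the description only, not Spring Retry's source: `RandomMultipleDemo` and `jittered` are hypothetical names, and capping the deterministic base at `maxInterval` before applying the random multiple is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo of "a random multiple between 1 and the multiplier".
final class RandomMultipleDemo {

    static long[] jittered(long initialInterval, double multiplier, long maxInterval,
            int numRetries, Random random) {
        long[] result = new long[numRetries];
        long base = Math.min(initialInterval, maxInterval);
        for (int i = 0; i < numRetries; i++) {
            // Uniform in [1, multiplier): the delay falls between this
            // deterministic interval and the next one.
            double randomMultiple = 1.0 + random.nextDouble() * (multiplier - 1.0);
            result[i] = (long) (base * randomMultiple);
            base = Math.min((long) (base * multiplier), maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same parameters as the Javadoc example; values vary per run
        // but stay within [50-100, 100-200, 200-400, 400-800, 800-1600].
        System.out.println(Arrays.toString(jittered(50L, 2.0, 3000L, 5, new Random())));
    }
}
```

Note the difference from a symmetric randomizationFactor: here the jittered delay is never shorter than the deterministic one.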

I think the idea of jitter is OK. We can indeed implement it here as you suggest, but let's first see whether moving it up to Spring Framework would serve a broader set of users.

@JoosJuliet
Author

I went ahead and created the corresponding issue in the Spring Framework repo here: spring-projects/spring-framework#34773.

Let me know if there's anything I can do to help drive it forward or make it easier to review.

@artembilan
Member

Thanks.
Subscribed.
We can decide what to do here once that issue reaches a conclusion.

@JoosJuliet
Author

Based on the direction in spring-framework#22009, it seems that Spring Framework is not planning to include randomization logic directly, and instead recommends using spring-retry for such cases.

That said, since spring-kafka already provides its own support for maxAttempts, perhaps it might be worth considering a similar built-in support for randomization—mainly for consistency within the project.

@artembilan
Member

Left a comment on that issue.
Until a decision is made there, there is no reason to rush here and risk duplicating API and work.
