KAFKA-19367: Fix InitProducerId with TV2 double-increments epoch if ongoing transaction is aborted #19910

Open
wants to merge 4 commits into
base: trunk

rreddy-22 (Contributor)

When InitProducerId is handled on the transaction coordinator, the producer epoch is incremented (so that stale requests are fenced); if a transaction was ongoing at that time, it is then aborted. With transaction version 2 (TV2, a.k.a. KIP-890 part 2), the abort increments the producer epoch again, because the bump is part of the new abort/commit protocol, so the epoch ends up incremented twice.

In most cases this is benign, but when the epoch of the ongoing transaction is 32766, the first increment takes it to 32767, the maximum value of a short. The second increment then overflows to a negative value, which causes an illegal argument exception.
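The arithmetic behind the overflow can be reproduced in isolation. The following is a minimal sketch, not Kafka code: the class `EpochOverflowSketch` and the method `naiveBump` are hypothetical names used only to show how a Java `short` behaves when it is incremented twice starting from 32766.

```java
// Hypothetical sketch, not Kafka code: shows what happens to a Java short
// when it is incremented twice starting from 32766.
class EpochOverflowSketch {

    // Increment an epoch the naive way, letting the short wrap around.
    static short naiveBump(short epoch) {
        return (short) (epoch + 1);
    }

    public static void main(String[] args) {
        short epoch = 32766;
        short afterInit = naiveBump(epoch);       // 32767 == Short.MAX_VALUE
        short afterAbort = naiveBump(afterInit);  // wraps around to Short.MIN_VALUE
        System.out.println(afterInit + " -> " + afterAbort); // prints "32767 -> -32768"
    }
}
```

A single increment from 32766 stays within range; only the second one wraps, which is why the problem shows up only at this one starting epoch.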

To fix this, we avoid bumping the epoch a second time.
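The effect of the fix can be modelled with a small, simplified sketch. Everything here is an assumption made for illustration: `InitProducerIdSketch`, `bump`, and `epochAfterFencingAbort` are hypothetical names, and the validation in `bump` only stands in for whatever check raises the exception in the coordinator. The `abortBumpsAgain` flag models the behaviour before the fix (true) and after it (false).

```java
// Hypothetical model of the two behaviours; names are illustrative only
// and do not correspond to the actual coordinator code.
class InitProducerIdSketch {

    // Bump once and reject a negative result, standing in for the
    // validation that fails when the epoch overflows.
    static short bump(short epoch) {
        short next = (short) (epoch + 1);
        if (next < 0) {
            throw new IllegalArgumentException("Illegal epoch after bump: " + next);
        }
        return next;
    }

    // Epoch after InitProducerId fences and aborts an ongoing transaction.
    // abortBumpsAgain = true models the double increment described above;
    // false models the fix, where the abort reuses the already bumped epoch.
    static short epochAfterFencingAbort(short ongoingEpoch, boolean abortBumpsAgain) {
        short fenced = bump(ongoingEpoch); // bump that fences stale requests
        return abortBumpsAgain ? bump(fenced) : fenced;
    }

    public static void main(String[] args) {
        // With the fix, the boundary case ends at the maximum epoch.
        System.out.println(epochAfterFencingAbort((short) 32766, false));
        try {
            epochAfterFencingAbort((short) 32766, true);
        } catch (IllegalArgumentException e) {
            System.out.println("double bump failed: " + e.getMessage());
        }
    }
}
```

In this model the single-bump path handles every starting epoch up to 32766, while the double-bump path fails at exactly the boundary case from the description.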

github-actions bot added the labels triage (PRs from the community), core (Kafka Broker), and small (Small PRs) on Jun 6, 2025.
github-actions bot removed the triage (PRs from the community) label on Jun 6, 2025.

if (!clientTransactionVersion.supportsEpochBump()) {
  // For TV1, manually bump the epoch in the transaction metadata we are about to append.
  txnMetadata.producerEpoch = producerEpoch
  txnMetadata.lastProducerEpoch = RecordBatch.NO_PRODUCER_EPOCH
}

Contributor

Could you check whether we need to reset lastProducerEpoch even for TV2?

Contributor Author

For TV2, it looks like the last producer epoch is set immediately in prepareAbortOrCommit, so we don't need to set it here.
