Clarification of topic configuration 'retention.bytes' #19051
The Kafka docs describe the 'retention.bytes' configuration in a misleading way.
I believe a clarification is needed to make this configuration easier to tune and to prevent serious confusion.
'retention.bytes' is currently described as follows:
This configuration controls the maximum size a partition (which consists of log segments) can grow to before we will discard old log segments to free up space...
There are several problems in this description:
because old log segments won't be discarded before this size.
only if this "minimum" size could be guaranteed.
Here is an example that shows the difference between the expected and the actual behavior:
A topic with only 1 partition, retention.bytes=1GB, and segment.bytes=512MB.
The behavior I EXPECTED: the topic would use about 1GB of storage.
Once the partition reached 1GB (or slightly more),
old segments would be deleted as long as the total partition size was BIGGER THAN 1GB,
so the partition size would stay roughly between 512MB and just over 1GB.
The ACTUAL behavior: the topic uses about 1.5GB of storage or more.
The partition size is guaranteed NOT to drop BELOW 1GB,
so when the partition reaches 1GB, no segments are deleted.
Only when it reaches 1.5GB can one 512MB segment be deleted, which leaves a partition of 1GB or slightly more.
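To make the example concrete, here is a minimal sketch that simulates a single partition under both readings. The actual-rule branch is intended to mirror, as far as we understand it, the broker's size-based check: the oldest segment is deleted only while `size - retention.bytes` still covers that entire segment. The function `simulate`, the 1MB write size, running the check after every write, and never deleting the active segment are simplifying assumptions for illustration; a real broker only checks every `log.retention.check.interval.ms`, so the real peak can be higher.

```python
MB = 1024 ** 2
GB = 1024 * MB


def simulate(retention_bytes, segment_bytes, total_written, chunk, actual):
    """Simulate one partition (hypothetical helper); return (min, max) size in bytes.

    min is measured only after retention_bytes have been written.
    actual=False: the EXPECTED reading (delete while size > retention.bytes).
    actual=True:  the ACTUAL check (delete a segment only if the partition
                  stays >= retention.bytes afterwards).
    """
    segments = [0]  # segment sizes in bytes, oldest first; last = active segment
    lo, hi, written = None, 0, 0
    while written < total_written:
        # Roll a new active segment when the next write would exceed segment.bytes.
        if segments[-1] + chunk > segment_bytes:
            segments.append(0)
        segments[-1] += chunk
        written += chunk
        hi = max(hi, sum(segments))  # peak size, seen just before the retention check

        # Retention check; simplification: the active (last) segment is never deleted.
        if actual:
            diff = sum(segments) - retention_bytes
            while len(segments) > 1 and diff - segments[0] >= 0:
                diff -= segments.pop(0)
        else:
            while len(segments) > 1 and sum(segments) > retention_bytes:
                segments.pop(0)

        if written >= retention_bytes:
            size = sum(segments)
            lo = size if lo is None else min(lo, size)
    return lo, hi


# The example above: retention.bytes=1GB, segment.bytes=512MB, 4GB written in 1MB chunks.
for actual in (False, True):
    lo, hi = simulate(GB, 512 * MB, 4 * GB, MB, actual)
    label = "actual check" if actual else "expected reading"
    print(f"{label}: {lo // MB}MB..{hi // MB}MB")
# prints:
# expected reading: 513MB..1025MB
# actual check: 1024MB..1536MB
```

Under these assumptions the expected reading keeps the partition between about 512MB and 1GB, while the actual check keeps it between 1GB and 1.5GB, matching the behavior described above.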
At a larger scale, the difference can reach tens to hundreds of gigabytes, which can lead to out-of-storage errors and a lot of confusion.
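A rough back-of-the-envelope estimate of that gap follows. It assumes each partition can peak at about `retention.bytes + segment.bytes` before the oldest segment is dropped, and it ignores check-interval lag and index files. The helper names and the partition counts are hypothetical; 1GB is Kafka's default `segment.bytes`.

```python
GB = 1024 ** 3


def worst_case_partition_bytes(retention_bytes: int, segment_bytes: int) -> int:
    """Approximate peak size of one partition (hypothetical helper).

    Segments are removed only while the partition stays at or above
    retention.bytes, so it can grow to about retention.bytes + segment.bytes.
    """
    return retention_bytes + segment_bytes


def topic_footprint(partitions: int, retention_bytes: int, segment_bytes: int) -> tuple[int, int]:
    """Return (naive estimate, approximate worst case) in bytes for a topic."""
    naive = partitions * retention_bytes
    worst = partitions * worst_case_partition_bytes(retention_bytes, segment_bytes)
    return naive, worst


# Hypothetical topic: 100 partitions, retention.bytes=10GB, segment.bytes=1GB.
naive, worst = topic_footprint(partitions=100, retention_bytes=10 * GB, segment_bytes=GB)
print(f"planned for ~{naive // GB} GB, may need ~{worst // GB} GB ({(worst - naive) // GB} GB more)")
# prints: planned for ~1000 GB, may need ~1100 GB (100 GB more)
```

The gap grows linearly with both the partition count and `segment.bytes`, which is why it becomes significant on large topics.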
Therefore, I believe this configuration can be clarified easily, which may prevent future issues.