
Clarification of topic configuration 'retention.bytes' #19051

Open · 282ori wants to merge 2 commits into trunk
Conversation


@282ori commented Feb 27, 2025

The Kafka docs describe the 'retention.bytes' configuration in a misleading way.
I believe a clarification is needed to make tuning this configuration easier and to prevent serious confusion.

'retention.bytes' is currently described like this:
This configuration controls the maximum size a partition (which consists of log segments) can grow to before we will discard old log segments to free up space...

There are several problems with this description:

  1. It reads as if this is the minimum size of a partition,
    because old log segments won't be discarded before this size is reached.
  2. Even when the partition reaches this size, old segments won't be deleted automatically as we would expect;
    a segment is deleted only if this "minimum" size can still be guaranteed afterwards.

I will give an example of the difference and the misunderstanding:
A topic with only 1 partition, retention.bytes=1GB, and segment.bytes=512MB.
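
For reproducibility, such a topic could be created with Kafka's Java Admin client. This is a minimal sketch, assuming a broker at localhost:9092 and a hypothetical topic name retention-demo:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateRetentionDemoTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (Admin admin = Admin.create(props)) {
            // 1 partition, replication factor 1, with the configs from the example.
            NewTopic topic = new NewTopic("retention-demo", 1, (short) 1)
                    .configs(Map.of(
                            "retention.bytes", String.valueOf(1024L * 1024 * 1024), // 1GB
                            "segment.bytes", String.valueOf(512L * 1024 * 1024)));  // 512MB
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```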

The behavior I EXPECTED: the topic would reserve about 1GB of storage.
Once the partition reaches 1GB (or a bit more),
old segments would be deleted as long as the total partition size is BIGGER THAN 1GB,
meaning the partition size would stay approximately between 512MB and 1GB+.

The ACTUAL behavior: the topic reserves about 1.5+GB of storage.
There is a guarantee that the partition size won't drop BELOW 1GB,
meaning that when the partition reaches 1GB, segments won't be deleted;
only when it reaches 1.5GB can one 512MB segment be deleted, leaving us with a size of 1GB+.
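
To make the rule explicit, here is a minimal, self-contained sketch of the deletion decision as I understand it. This is my own model, not Kafka's actual code (in reality the active segment is also never eligible for deletion): a segment may be deleted only while the size remaining after deleting it still meets retention.bytes.

```java
import java.util.List;

public class RetentionSketch {
    // Returns how many bytes of old segments may be deleted. A segment is
    // deletable only while the size remaining AFTER deleting it is still
    // at least retentionBytes, i.e. retention.bytes acts as a floor, not a cap.
    static long deletableBytes(List<Long> oldestFirstSegmentSizes, long retentionBytes) {
        long total = oldestFirstSegmentSizes.stream().mapToLong(Long::longValue).sum();
        long diff = total - retentionBytes; // headroom above the floor
        long deletable = 0;
        for (long segment : oldestFirstSegmentSizes) {
            if (diff < segment) break; // deleting this would drop below retention.bytes
            diff -= segment;
            deletable += segment;
        }
        return deletable;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024;
        long retention = 1024 * MB; // retention.bytes = 1GB, segments of 512MB as above

        // Partition at 1GB (two closed 512MB segments): nothing is deletable yet.
        System.out.println(deletableBytes(List.of(512 * MB, 512 * MB), retention)); // 0
        // Partition at 1.5GB: exactly one 512MB segment becomes deletable, leaving ~1GB.
        System.out.println(deletableBytes(List.of(512 * MB, 512 * MB, 512 * MB), retention)); // 536870912
    }
}
```

Under this rule, the steady-state partition size oscillates roughly between retention.bytes and retention.bytes + segment.bytes, which is exactly the 1GB to 1.5GB range in the example above.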

At larger topic scales, we are talking about a difference of tens to hundreds of gigabytes, which can cause storage exhaustion (out-of-disk errors) and a great deal of confusion.

Therefore, I believe we can clarify this configuration's description very easily and perhaps prevent future issues.

@github-actions bot added labels: triage (PRs from the community), storage (Pull requests that target the storage module), tiered-storage (Related to the Tiered Storage feature), clients, small (Small PRs) on Feb 27, 2025