Doc: Hint implications on use write.data.path and orphan removal together #12890

Open
wants to merge 1 commit into
base: main

Conversation

dramaticlly
Contributor

@dramaticlly dramaticlly commented Apr 24, 2025

Today, orphan file removal does not respect write.data.path, and I believe we should make users aware of the potential implications when the two are used together. Karu attempted a fix in #12278, but we should at least call out the risk. This is particularly important for GDPR compliance and other privacy-related data needs.

See local build (screenshot attached).
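
To make the risk described above concrete, here is a minimal, self-contained sketch. It is a toy model, not Iceberg's implementation: `ToyTable`, `find_orphans`, and the `s3://bucket/...` paths are all hypothetical, and the model assumes the cleanup scans only the table location unless told otherwise. The real action also applies a file-age threshold, which is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not an Iceberg class."""
    location: str                                   # table location
    data_path: str                                  # plays the role of write.data.path
    referenced: set = field(default_factory=set)    # files tracked by table metadata


def find_orphans(table, storage, scan_root=None):
    """Return unreferenced files under scan_root (defaults to the table location)."""
    root = (scan_root or table.location).rstrip("/") + "/"
    return sorted(
        path for path in storage
        if path.startswith(root) and path not in table.referenced
    )


# Toy object store: data files live outside the table location.
storage = {
    "s3://bucket/warehouse/db/t/metadata/v1.metadata.json",
    "s3://bucket/custom/data/live.parquet",
    "s3://bucket/custom/data/leftover.parquet",  # written but never committed
}

table = ToyTable(
    location="s3://bucket/warehouse/db/t",
    data_path="s3://bucket/custom/data",
    referenced={
        "s3://bucket/warehouse/db/t/metadata/v1.metadata.json",
        "s3://bucket/custom/data/live.parquet",
    },
)

# Scanning only the table location never sees the leftover data file.
default_scan = find_orphans(table, storage)

# Pointing the scan at the data path does find it.
explicit_scan = find_orphans(table, storage, scan_root=table.data_path)

print("default scan:", default_scan)
print("explicit scan:", explicit_scan)
```

In this model the leftover file is found only when the scan is pointed at the data path explicitly, which is why data that must be purged can survive a routine cleanup.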

@github-actions github-actions bot added the docs label Apr 24, 2025
@Fokko
Contributor

Fokko commented Apr 24, 2025

I would never recommend setting write.data.path outside of the table location. I'm not sure we want to go down a path where we give hints on how to fix things in a way that isn't recommended. I think this is a slippery slope. For example, the hint does not include write.metadata.path, which could also lead to orphaned files.

@dramaticlly
Contributor Author

> I would never recommend setting write.data.path outside of the table location. I'm not sure we want to go down a path where we give hints on how to fix things in a way that isn't recommended. I think this is a slippery slope. For example, the hint does not include write.metadata.path, which could also lead to orphaned files.

Thanks @Fokko, I think that's the better way to structure this hint. We got much the same message from the AWS S3 folks: write.data.path and write.metadata.path allow for customization, but it is generally recommended to keep both within the table location. Do you think it would be helpful to highlight this recommendation somewhere in the documentation? I am thinking of the configuration page: https://iceberg.apache.org/docs/nightly/configuration/

@Fokko
Contributor

Fokko commented Apr 24, 2025

Thanks for the context @dramaticlly. After thinking about it a bit more, I think it makes more sense to put this in the orphan files cleanup routine, including a warning that it will delete files that are not associated with the table. My main concern is that people would run orphan file cleanup on shared locations, where it might delete data that should not be deleted.
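
The shared-location concern can be illustrated with a small toy model. This is a sketch under stated assumptions rather than Iceberg code: `orphan_candidates`, the two tables, and the `s3://bucket/shared` path are hypothetical, and the cleanup is modelled as "every file under the scanned location that this table's metadata does not reference".

```python
def orphan_candidates(location, referenced, storage):
    """Hypothetical helper: files under `location` not referenced by one table."""
    root = location.rstrip("/") + "/"
    return sorted(
        path for path in storage
        if path.startswith(root) and path not in referenced
    )


SHARED = "s3://bucket/shared"  # two tables write into the same directory

storage = {
    f"{SHARED}/a-1.parquet",             # live file of table A
    f"{SHARED}/b-1.parquet",             # live file of table B
    f"{SHARED}/a-failed-write.parquet",  # genuine orphan left by table A
}

table_a_refs = {f"{SHARED}/a-1.parquet"}
table_b_refs = {f"{SHARED}/b-1.parquet"}

# Cleanup run for table A only knows about table A's metadata.
to_delete_for_a = orphan_candidates(SHARED, table_a_refs, storage)

# Table B's live file is flagged too, because table A does not reference it.
collateral = [path for path in to_delete_for_a if path in table_b_refs]

print("flagged for deletion:", to_delete_for_a)
print("live files of table B among them:", collateral)
```

From table A's point of view, table B's live file is indistinguishable from a real orphan, which is the case a warning on the cleanup routine would need to cover.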
