Deferred hashing #82

Open
wants to merge 1 commit into
base: master

smiklosovic
Collaborator

Defer hashing of files until they actually need to be uploaded; otherwise skip hashing entirely. This should make frequent backups with large SSTables much faster.
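
A minimal sketch of the idea, not the code in this PR. It assumes a file counts as "already there" when its remote key exists; `backup`, `sha256_of`, `remote_keys`, and `upload` are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so a large file is never held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(files, remote_keys, upload):
    """Upload only files whose key is missing remotely, and hash only those.

    files:       mapping of remote key -> local Path (assumed layout)
    remote_keys: set of keys already present in remote storage
    upload:      callable(key, path, file_hash), stands in for the real uploader
    Returns the keys that were skipped without being hashed.
    """
    skipped = []
    for key, path in files.items():
        if key in remote_keys:
            skipped.append(key)  # deferred: no hash is computed for this file
            continue
        upload(key, path, sha256_of(path))
    return skipped
```

The skipped files are exactly the ones that end up without a hash, so whatever writes the manifest has to get their hashes from somewhere else.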

@smiklosovic
Collaborator Author

It is a little more complicated than it seems. The problem is that we save hashes in the manifest we upload. When we take a backup and determine that a file is already there, we do not hash it, but then we upload the snapshot's manifest without a hash for that file either. The hash will then be missing when we restore.

The solution is to have some kind of "register" that holds all hashes, so that we get a hash by looking it up in the existing list of hashes instead of computing it.
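
One way such a register could look, as a rough sketch only. The `HashRegister` class, its JSON file format, and keying by remote path are all assumptions; nothing like this exists in the PR yet.

```python
import hashlib
import json
from pathlib import Path


class HashRegister:
    """Hypothetical key -> hash store persisted as a single JSON file."""

    def __init__(self, location: Path):
        self.location = location
        # Load previously recorded hashes, or start empty on first use.
        self.hashes = json.loads(location.read_text()) if location.exists() else {}

    def get_or_compute(self, key: str, path: Path) -> str:
        # Reuse a recorded hash; read and hash the file only on a miss.
        if key not in self.hashes:
            digest = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    digest.update(chunk)
            self.hashes[key] = digest.hexdigest()
        return self.hashes[key]

    def save(self) -> None:
        self.location.write_text(json.dumps(self.hashes, indent=2))
```

With this shape, the manifest writer would call `get_or_compute` for every file, and files skipped during upload would hit the register instead of being re-read from disk. Where the register itself lives (locally, or next to the manifests in remote storage) is left open here.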

I am not completely sure about the architecture of that. A naive solution would be to go over all existing manifests and try to match the file we are about to upload with an entry in some manifest, just to get its hash (it had to be hashed at least once), but I do not think this is robust enough.
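
For reference, the naive variant would be roughly the following. The manifest layout is assumed here (one JSON file per snapshot, each a list of entries with `path` and `hash` fields), and `find_hash_in_manifests` is a hypothetical helper, not an existing function.

```python
import json
from pathlib import Path
from typing import Optional


def find_hash_in_manifests(manifest_dir: Path, key: str) -> Optional[str]:
    """Linear scan over every manifest; return the first recorded hash for key."""
    for manifest in sorted(manifest_dir.glob("*.json")):
        for entry in json.loads(manifest.read_text()):
            # Entries written after a skipped upload may carry no hash at all.
            if entry.get("path") == key and entry.get("hash"):
                return entry["hash"]
    return None
```

Every lookup reads every manifest, and it only works as long as at least one manifest that still holds the hash has not been deleted, which is part of why this does not feel robust.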
