Notes and exploration of disk space usage by the Wyng archive format
Possible sub-topics:
Impact of dest filesystem metadata (incl. chunk sizes, links and end-tying)
Deduplication & Compression effectiveness
Pruning parameters and possible tweaks
Archive metadata size
etc.
Initial observations:
There is a three-way tradeoff among chunk size, compression and deduplication. The Wyng defaults try to strike a balance for typical use cases. For example, a smaller chunk size lets dedup match more data, but it increases metadata usage on the dest filesystem (and within the archive itself); it also makes compression slightly less effective.
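To put rough numbers on the metadata side of that tradeoff, here is a minimal sketch that counts how many fixed-size chunks a volume breaks into at different chunk sizes. It assumes every chunk is stored and costs one metadata entry, which ignores sparse or all-zero regions; `chunk_count` is a hypothetical helper, not part of Wyng.

```python
KIB = 1024

def chunk_count(volume_bytes: int, chunk_bytes: int) -> int:
    """Number of fixed-size chunks needed to cover a volume.

    Uses integer ceiling division; the last chunk may be partial.
    """
    return -(-volume_bytes // chunk_bytes)

# Illustrative volume size (10.8 GB, decimal), matching the larger img below.
volume = 10_800_000_000

for size_kib in (64, 128, 256):
    n = chunk_count(volume, size_kib * KIB)
    # Assumption: one metadata entry per chunk, no sparse/zero-chunk skipping.
    print(f"{size_kib:>4} KiB chunks -> {n:,} chunk entries")
```

Halving the chunk size roughly doubles the number of entries that both the archive metadata and the dest filesystem have to track.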
Dedup Anecdote: The default 128KB chunk size can yield great dedup results for distantly-related volumes. A pair of Qubes template root imgs, one a basic Debian img and the other a fancy, large KDE variant, diverged years ago (and have been upgraded twice since). In a test send performed today, they enjoyed a 21% dedup savings when the basic/small img was already in the archive and the large KDE img was then added to it. The raw on-disk usage of these imgs is 5.4GB and 10.8GB, respectively, which means that a very large portion of the small volume was utilized in the Wyng dedup process. This should be representative, and it's worth noting that these two volumes have never been internally defragged or otherwise re-packed or re-organized, so they are about as randomly arrayed as one could expect for an Ext4 root fs.
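As a rough illustration of how a savings figure like this can be measured, the sketch below splits two byte strings into fixed 128KB chunks and reports what fraction of the incoming chunks already exist in the first. It is a toy model, not Wyng's code: it hashes raw (uncompressed) chunks with SHA-256 for simplicity, and `chunk_hashes` and `dedup_savings` are hypothetical names.

```python
import hashlib
import os

CHUNK = 128 * 1024  # the default chunk size mentioned above

def chunk_hashes(data: bytes, chunk_size: int = CHUNK) -> list:
    """Hash each fixed-size chunk of data (the last chunk may be short)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]

def dedup_savings(existing: bytes, incoming: bytes, chunk_size: int = CHUNK) -> float:
    """Fraction of incoming chunks whose hash is already present in existing."""
    known = set(chunk_hashes(existing, chunk_size))
    new = chunk_hashes(incoming, chunk_size)
    if not new:
        return 0.0
    matched = sum(1 for h in new if h in known)
    return matched / len(new)

# Toy data: the "large" volume reuses 2 of the "small" volume's 4 chunks
# and adds 6 chunks of unrelated random data.
shared = [os.urandom(CHUNK) for _ in range(4)]
small = b"".join(shared)
large = b"".join(shared[:2] + [os.urandom(CHUNK) for _ in range(6)])

print(f"dedup savings for the large volume: {dedup_savings(small, large):.0%}")
```

In this sketch a chunk only matches when identical data sits on the same chunk boundaries in both volumes, which is why a good hit rate on never-defragged volumes is notable.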
Even without --dedup, the incremental send mode functions like a very simple dedup. Interestingly, this form of dedup incurs zero metadata overhead, both within the archive and on the dest fs.
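Changes in the compressor implementation (such as upgrading the compression library to a newer version) can result in a large reduction in dedup effectiveness, since the chunks are compressed before being hashed: identical source data that compresses to different bytes produces a different hash and no longer matches chunks already in the archive.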
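The effect of hashing after compression can be seen in a small sketch. Here two zlib compression levels stand in for "old" and "new" compressor behavior, and SHA-256 stands in for the chunk hash; both choices are assumptions for illustration, and `chunk_fingerprint` is a hypothetical helper rather than anything in Wyng.

```python
import hashlib
import zlib

def chunk_fingerprint(chunk: bytes, level: int) -> str:
    """Compress first, then hash the compressed bytes."""
    return hashlib.sha256(zlib.compress(chunk, level)).hexdigest()

# One chunk of identical source data.
chunk = b"wyng chunk payload " * 4096

# Assumption: different zlib levels model a changed compressor implementation.
old = chunk_fingerprint(chunk, 1)
new = chunk_fingerprint(chunk, 9)

# Both compressed forms decompress to the same data...
same_plaintext = (
    zlib.decompress(zlib.compress(chunk, 1))
    == zlib.decompress(zlib.compress(chunk, 9))
)

# ...but the fingerprints differ, so a hash lookup would miss the stored chunk.
print("same plaintext:", same_plaintext)
print("same fingerprint:", old == new)
```

Hashing the uncompressed chunk instead would keep fingerprints stable across compressor changes; that is a design alternative, not a description of what Wyng does.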