Optimization - Archive space efficiency #238

Open
tasket opened this issue Feb 22, 2025 · 0 comments

tasket commented Feb 22, 2025

Notes and exploration of disk space usage by the Wyng archive format

Possible sub-topics:

  • Impact of dest filesystem metadata (incl. chunk sizes, links and end-tying)
  • Deduplication & Compression effectiveness
  • Pruning parameters and possible tweaks
  • Archive metadata size
  • etc.

Initial observations:

There is a 3-way tradeoff between the impacts of chunk size, compression and deduplication. The Wyng defaults try to strike a balance for typical use cases. For example, a smaller chunk size allows more data to be deduplicated, but it increases dest filesystem (and internal archive) metadata usage; it also makes compression slightly less effective.
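
As a rough illustration of the metadata side of that tradeoff, the sketch below estimates how chunk count (and therefore per-chunk overhead) scales with chunk size for a 10GB volume. The per-chunk byte costs are made-up placeholders, not Wyng's actual metadata sizes; only the scaling relationship is the point.

```python
# Hypothetical back-of-envelope: per-chunk overhead vs. chunk size.
# The per_chunk_* byte costs are placeholders, not Wyng's real figures.
GIB = 1024**3

def chunk_overhead(volume_bytes, chunk_size, per_chunk_meta=64, per_chunk_fs=256):
    """Return (chunk count, est. archive metadata bytes, est. dest-fs metadata bytes)."""
    chunks = -(-volume_bytes // chunk_size)   # ceiling division
    return chunks, chunks * per_chunk_meta, chunks * per_chunk_fs

for size_kb in (64, 128, 256, 512):
    n, meta, fs_meta = chunk_overhead(10 * GIB, size_kb * 1024)
    print(f"{size_kb:>3}KB chunks: {n:>7} chunks, "
          f"~{meta / 2**20:.1f} MiB archive meta, ~{fs_meta / 2**20:.1f} MiB dest-fs meta")
```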

Dedup anecdote: The default 128KB chunk size can yield great dedup results for distantly-related volumes. In a test send performed today, a pair of Qubes template root imgs (one basic Debian img and a fancy, large KDE variant which diverged years ago and has since been upgraded twice) enjoyed a 21% dedup savings when the basic/small img was already in the archive and the large KDE img was then added to it. The raw on-disk usage of these imgs is 5.4GB and 10.8GB, respectively, which means that a very large portion of the small volume was utilized in the Wyng dedup process. This should be representative, and it's worth noting that these two volumes have never been internally defragged or otherwise re-packed or re-organized, so they are about as randomly arrayed as one could expect for an Ext4 root fs.
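
One way to read those numbers, assuming the 21% savings is measured against the newly added 10.8GB KDE img (an interpretation, not something reported by Wyng itself):

```python
# Rough interpretation of the anecdote above. Assumes the 21% savings is
# relative to the newly added 10.8GB KDE img; this is a reading of the
# numbers, not output from Wyng.
small_img, large_img, savings = 5.4, 10.8, 0.21

deduped = large_img * savings      # KDE img data matched against existing chunks
print(f"~{deduped:.1f} GB of the KDE img deduplicated")
print(f"~{deduped / small_img:.0%} of the small img's data matched")
```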

Even without --dedup, the incremental send mode functions like a very simple dedup. Interestingly, this form of dedup incurs zero metadata overhead, both within the archive and on the dest fs.
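
A minimal sketch of that idea: if the snapshot layer reports which byte ranges changed since the last session, only the chunks overlapping those ranges need to be stored, and unchanged chunks are carried forward with no new data or per-chunk metadata. This illustrates the concept only; it is not Wyng's actual delta code.

```python
# Illustration: map changed byte ranges to the chunk indexes that must be
# re-sent; every other chunk is implicitly reused from the prior session.
CHUNK = 128 * 1024   # default 128KB chunk size

def chunks_to_send(changed_ranges, chunk_size=CHUNK):
    """Return sorted chunk indexes overlapping any (start, length) changed range."""
    send = set()
    for start, length in changed_ranges:
        first = start // chunk_size
        last = (start + length - 1) // chunk_size
        send.update(range(first, last + 1))
    return sorted(send)

# e.g. two small edits touching four 128KB chunks out of the whole volume:
print(chunks_to_send([(0, 4096), (1_000_000, 200_000)]))   # -> [0, 7, 8, 9]
```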

Changes in the compressor implementation (such as upgrading the compression library to a newer version) can result in a large reduction in dedup effectiveness, since chunks are compressed before being hashed.
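
A small illustration of why that happens, with zlib standing in for whatever compressor and hash Wyng actually uses: the stored hash fingerprints the compressed bytes, so any change in compressed output breaks the dedup match even though the source data is identical.

```python
# zlib/sha256 stand in here for Wyng's actual compressor and hash.
import hashlib, zlib

chunk = b"identical source data " * 5000

old = zlib.compress(chunk, 6)   # "old" compressor behavior
new = zlib.compress(chunk, 9)   # stand-in for an upgraded/changed implementation

print(hashlib.sha256(old).digest() == hashlib.sha256(new).digest())  # False: hashes differ, no dedup match
print(zlib.decompress(old) == zlib.decompress(new))                  # True: same source data either way
```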
