Difference between COW filesystems and non-COW when using file based dedupe #310
Can you offer a short summary of the implementation differences when using a file-based dedupe tool like jdupes/duperemove on a COW filesystem vs something like ext4?

When I first came to btrfs I was expecting in-band dedupe. So when I realized it was out of band, I couldn't understand what benefits a COW fs offered if you still have to use a userspace utility to link/delete dupes after the fact. After all, I was using fdupes on standard ext2/3/4 filesystems with essentially the same results.

So what advantages (at least specifically for dedupe/COW) does btrfs provide over legacy filesystems?
CoW is not for dedupe. Rather, it allows for snapshots and safe file updates: data is never written in place, so if a file is updated and the system crashes you get either all of the new data or none of it; the old version of the file is kept intact until all new data has been successfully written. This is different from ext4, which writes data in place (it has a data-journal mode, but that cuts write speeds in half).

Something like dedupe is a side effect of CoW: blocks of different files can become shared in the same physical on-disk block. If one of the files is updated, a copy of that block is made (copy on write), keeping both files intact.

ext2/3/4 dedupe works differently: it creates hardlinks of whole files. Firstly, the granularity is coarser: you either dedupe the whole file or nothing. Secondly, both file names of a hardlink reference the same data on disk: modify one file's content and the other file changes, too. This can come as a surprise: hardlinks are not dedupes. CoW filesystems just work for dedupe without such surprises or limitations.

xfs supports something similar but isn't really CoW: it can share file blocks for dedupe just like btrfs, and it will unshare those blocks on modification, just like CoW. But it is still a filesystem that changes file blocks in place, like ext4.

zfs is also a CoW filesystem; it supports snapshots, redundancy, self-healing, pooling, and virtual block devices. ReFS in Windows is probably a lot more like zfs or btrfs: it supports CoW, snapshots, redundancy, self-healing, pooling, and tiering. NTFS in Windows supports snapshots and probably some sort of tiering, but it cannot dedupe, despite obviously using some simple CoW features for snapshots. Thus it's not a CoW filesystem, despite supporting snapshots. btrfs is somewhere all over that area, too: snapshots, simple self-healing, redundancy (just mirroring currently as the stable option), and pooling. ext4 offers none of those options. So btrfs is a lot more than just a single-device filesystem. There are probably more filesystems that have one or another of such features:
In theory, you can pool any filesystem in Linux through lvm/md (and RAID, too), but this is implemented at a separate layer. zfs and btrfs support native pooling/redundancy without an intermediate layer.
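To make the reflink-vs-hardlink difference concrete, here is a minimal shell sketch (file names are placeholders; the reflink copy assumes btrfs or another reflink-capable filesystem):

```sh
# Reflink copy (btrfs/xfs/bcachefs): shares extents, but the copies stay independent.
cp --reflink=always original.bin reflink-copy.bin
echo change >> reflink-copy.bin    # CoW unshares only the written block; original.bin is untouched

# Hardlink (any filesystem): just a second name for the very same data.
ln original.bin hardlink-copy.bin
echo change >> hardlink-copy.bin   # original.bin shows the change too: same inode, same blocks
```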
Maybe I'm thinking of dedupe in a different way, then? I understand the file/block-level granularity, but if a hardlink is not de-duping an identical file, then what is it? If file-level de-duping on a btrfs system with something like duperemove or fdupes uses something available only to a btrfs filesystem, then if I were to rsync an entire btrfs filesystem (with maximally deduped file data) to an ext4 filesystem, I would expect the resulting ext4 system to be larger, since it doesn't support the dedupe features that btrfs does. However, if I had a btrfs filesystem and just ran a de-duping userspace tool that de-duped by creating hard links, I could rsync that btrfs filesystem to an ext4 filesystem and the resulting size would be exactly the same, no?
It creates a second file name for the same file object.
In your thinking, yes. But rsync is not aware of shared file blocks; it only knows hardlinks. So it would unshare all the files anyway, even if the target is btrfs.

You need to think of "files" as two distinct things: a file consists of its file name and its file contents. The name is only a pointer. Symlinks and hardlinks can point to the same file data. Btrfs goes a bit further: it supports symlinks and hardlinks the same way other filesystems do, and, as with other filesystems, the file data is a list of extents. In reality, a file name points to this list of extents. Btrfs can reference such an extent from multiple files. But since it never overwrites an extent, on any write it creates a new extent and swaps the pointer in the file to that new extent. So even if two files shared an extent, after a write you end up with two extents: one in the original file, and one in the modified file.
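One way to watch that extent swap happen is with `filefrag` (a sketch; `a.img` and `b.img` are placeholder files on a btrfs mount):

```sh
cp --reflink=always a.img b.img
filefrag -v a.img b.img    # both files list the same physical extent, flagged "shared"

# Rewrite one 4 KiB block of b.img without truncating the file:
dd if=/dev/urandom of=b.img bs=4K count=1 conv=notrunc
sync
filefrag -v b.img          # the rewritten range now points at a new, unshared extent
```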
Yes, if you tell rsync to detect and recreate hardlinks. However, "deduping" via hardlink is usually not what you want: if you modify such a file, you modify the other hardlinks, too. Think of it like this: imagine you had a complex project and wanted to make complicated or experimental modifications. In case something breaks, you create a backup copy first.

With both filesystems, you could create a full backup copy of the project. But if you want to save space later: on ext4 your only option is hardlinking identical files, and then any edit to the working copy silently edits the "backup" as well; on btrfs you can deduplicate the shared blocks (or make the copy with reflinks in the first place), and each copy still behaves as an independent file.
Yes, that's exactly what happens. CoW filesystems like Btrfs (and to some extent XFS) can share physical blocks across different files even when their logical block structures are distinct. Tools like rsync operate purely at the file level and don't understand the distinction between logical and physical blocks, so when you copy a Btrfs filesystem with deduplicated blocks to ext4, rsync will write each shared block separately. The result is a larger destination filesystem, because ext4 lacks the underlying block-sharing mechanism.

The key difference is in how deduplication is supported. ext4 only supports file-level deduplication via hardlinks: if two files are identical, they can be hardlinked, and that's it. There's no concept of sharing data at a finer granularity. Filesystems like XFS (with reflink) and Btrfs allow block-level deduplication, where logically distinct files or blocks can reference the same underlying data without needing to be identical across the entire file.

This also affects tool behavior. Tools like fdupes or fclones work on any filesystem that supports hardlinks because they operate at the file level; they only look for whole-file duplicates. Their behavior is mostly independent of the filesystem in use. In contrast, duperemove and bees operate at the block level. They can find and deduplicate partial matches between files, something that's only meaningful on filesystems that support block-level sharing. fclones in clone mode can replace an entire duplicate file with reflinks to another, but it still works at file granularity.

What bees brings to the table is continuous, incremental, block-level deduplication at the lowest layers of the filesystem. It's not the same as running duperemove in a cron job; it's closer in spirit to ZFS's approach: you write your data and forget about it, and deduplication happens automatically in the background. The implementation is completely different, but the user experience of "just write the data and let the system optimize it later" is quite similar.
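For reference, a rough sketch of how the file-level and block-level tools above are invoked (paths are placeholders):

```sh
# File-level: list whole-file duplicates (works on any filesystem).
fdupes -r /data

# Block-level: hash extents and submit dedupe requests to the kernel (btrfs/xfs).
# -d actually performs the dedupe; without it, duperemove only reports matches.
duperemove -dr --hashfile=/tmp/dr.hash /data

# bees, by contrast, runs as a daemon over the whole filesystem and dedupes as data arrives.
```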
Thanks for the explanations, both of you. So for a casual user like me who doesn't care about the more "enterprise" features like snapshotting, and who was only interested in the dedupe features for the expected storage savings on things like rsync'ed backups, it seems that the way I was expecting it to work is definitely not going to give me the space savings I was expecting.

Not to diminish the features of btrfs or say that it's overblown, but some of that does seem like marketing speak with respect to dedupe support. I get the in-band/out-of-band distinction, but claiming that btrfs "supports" deduping with external tools like bees is kind of a given; ext4 also "supports" deduping with the support of external tools like fdupes.

With respect to how btrfs implements block-level dedupe, I assume the filesystem simply provides more granular metadata that userspace utils can then utilize. Would that mean that at some point in the future tools like rsync could implement features to approximate "in-band" dedupe support?
That would be a misleading statement. "Deduplication" as a term of art generally refers to a filesystem feature that separates the logical and physical storage layers, so that modifying one logical copy of shared data doesn't affect the others. Hardlinks, by contrast, are a legacy of the original Unix design (dating back over 50 years), where a file could have multiple names but only one physical instance. Writing to any hardlink modifies the shared file. Hardlink-based deduplication tools have existed since the late 1980s, but they typically rely on users keeping files read-only afterward to avoid unintended data corruption. There's no built-in mechanism to preserve the independence of logically identical files. By contrast, Btrfs supports true deduplication through three core mechanisms:

1. shared extents: multiple files (or multiple ranges within one file) can reference the same physical extent on disk;
2. copy-on-write: writing through one reference allocates a new extent, leaving every other reference to the old data intact;
3. a kernel dedupe interface (the `FIDEDUPERANGE` ioctl) that lets userspace tools ask the filesystem to verify that two ranges are identical and share them atomically.

Component (1) is what Btrfs, XFS (with reflink), ZFS, and bcachefs have. ext4 does not.
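Mechanism (3) can even be exercised by hand with `xfs_io`, whose `dedupe` command wraps that ioctl and, despite the name, works on btrfs too (a sketch, assuming `a.img` and `b.img` are placeholder files that begin with 1 MiB of identical data):

```sh
# dedupe <src-file> <src-offset> <dst-offset> <length>, applied to the opened file:
xfs_io -c "dedupe a.img 0 0 1M" b.img
# The kernel compares both ranges byte for byte and shares them only if identical;
# if the ranges differ, the call fails rather than corrupting data.
```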
That's certainly possible, and there have been efforts to add that kind of support to rsync. It's also worth distinguishing rsync-then-bees from rsync-with-reflink:

- rsync-then-bees writes the full data set to disk first, so the write I/O and the temporary space for all duplicates are paid up front; bees then reclaims the duplicate space in the background.
- rsync-with-reflink would share already-present data at write time, so duplicate file data never has to be written (or temporarily stored) at all, though only at whole-file granularity.

By comparison, ZFS does this in-band at block granularity: every written block is checked against a dedupe table at write time, which is also why its memory requirements are so steep.
Thank you so much for the in-depth explanation, I appreciate it. I have one more question/thought experiment related to another aspect of deduping.

Let's say I have 1TB of data that I want to back up to a 3TB btrfs partition nightly, into a directory named from the current timestamp, so essentially every night I would rsync the full source into a new timestamped directory.

In the current state of btrfs/bees, I would literally be transferring 1TB of data every single night, and then bees would consolidate the duplicated blocks to conserve storage space. But even so, my nightly transfer and disk write activity is going to be the full 1TB. So essentially, my destination btrfs partition would always need to be as large as the full, not-yet-deduped data set, because I have to write all of the data first before it can be deduped?

The alternative is rsync-with-reflink, or some other yet-to-be-created userspace tool, which could essentially dedupe at write time (but still only at the file, not block, level). That would mean I could theoretically back up 5 nights (5TB) of data to a 3TB drive (assuming no data changes enough to actually need 5 full TB to represent the single 1TB data set). I also assume that this is essentially what zfs provides through in-band dedupe (whose system requirements are beyond reasonable for my simple scenarios).
I handle this by using btrfs snapshots to avoid extra copies:

```sh
# one-time setup:
btrfs sub create staging

# repeatable backup step:
rsync -aHS --del src staging && btrfs sub snap -r staging "$(date)"
```

For added safety during unreliable transfers:

```sh
if rsync -aHS --del src staging; then
    btrfs sub snap -r staging "$(date)-good"
else
    btrfs sub snap -r staging "$(date)-bad"
fi
```

This way, you retain both successful and failed transfers, clearly labeled, which is useful if something goes wrong and you need to inspect partial results. If your source is ext4, xfs, or any filesystem without data checksums, you can add `--checksum` to the rsync options so rsync compares file contents instead of trusting size and mtime, catching changes (or corruption) that didn't update the metadata.

One nice property here is that rsync only rewrites files that changed, so everything it leaves alone in `staging` remains shared with all the earlier snapshots for free; only the nightly delta consumes new space.
I did something similar, but my staging area was called scratch... ;-)

About using rsync together with bees: I can confirm that bees is extremely fast at keeping pace with rsync. If using a scratch area with rsync, it may actually be better to use `--inplace`, so that rsync updates changed files in place instead of writing a whole new copy and renaming it over the old one; that way the unchanged blocks of a changed file stay shared with the previous snapshots.
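A sketch of what that could look like (assuming a local transfer, where `--no-whole-file` is needed because rsync disables its delta algorithm for local copies by default):

```sh
rsync -aH --del --inplace --no-whole-file src staging
btrfs sub snap -r staging "$(date +%F)"
```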
This reminds me to add a disclaimer, especially for anybody who lands here from a search: please don't cut+paste the commands as I have written them; `src` and `staging` are placeholders, and options like `--del` will delete files on the destination if you point them at the wrong directory.
No. Snapshotting isn't an enterprise feature. It's a feature that helps a lot of ordinary users to recover data from failed upgrades, mistyped deletions, unexpected command results, etc.
No. Thinking of hard links as deduping is harmful and may cause you data loss, unless your filesystem is squashfs or iso9660 (i.e., read-only). What do you expect to happen when you modify one of the hard links? If you expect all of them to update, beware: many programs update a file by writing a new file and then replacing the old one via `rename()`, which silently detaches the other hard links from the new contents.
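A quick shell illustration of that failure mode (file names are placeholders):

```sh
printf 'v1\n' > target.txt
ln target.txt alias.txt           # "dedupe" via hardlink: one inode, two names

printf 'v2\n' > target.txt.tmp    # the write-then-rename update pattern
mv target.txt.tmp target.txt      # rename() atomically replaces the *name* only

cat target.txt                    # v2
cat alias.txt                     # still v1: alias.txt kept the old inode,
                                  # so the "dedupe" silently dissolved
```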
Indeed. Snapshotting in btrfs is merely a faster, atomic way of doing what `cp -a --reflink=always` does to an entire subvolume.
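Side by side (a sketch; both paths must live on the same btrfs filesystem):

```sh
btrfs sub snap /data /backup-snap           # instant and atomic: shares every extent
cp -a --reflink=always /data /backup-copy   # same on-disk sharing, but walks every file
```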