Feature request: Support BTRFS and XFS Reflink source volumes #75
Comments
The two interesting parts about this are the suggestion that Thin LVM is less reliable than Btrfs (this might be accurate), and the point about providing authentication (which might not be accurate). I could make a point about perceived efficiency and speed for Thin LVM vs Btrfs, the main one being that no one ever seems to actually compare them with benchmarks, not even @michaellarabel. My experience says that Btrfs would lag behind Thin LVM in overall use, but that is just my impression. I also saw a tendency for Btrfs to "blow up", where metadata use would suddenly skyrocket when reflinking large image files in combination with snapshotting the parent (sub)volume; this was with the late 3.x kernels, so ymmv. It's worth noting WRT the future of Linux storage that Red Hat appears to actively dislike both Thin LVM and Btrfs, and they are reported to be building a successor flexible storage system called Stratis. Since interest in backups on Qubes (at least incremental backups) is not high, a change to using Btrfs as the Qubes default would not impact Wyng greatly. But also, adding Btrfs support to Wyng should not be a huge undertaking if people want it. |
A quick note about Stratis... It appears to be a configuration management system for "storage pools", where a pool is an XFS filesystem spanning one or more block devices. XFS is used in reflink mode to manage disk image files and "snapshots" containing online shrink-capable filesystems. Red Hat claims to be doing this because the Btrfs code tree was supposedly not maintainable for enterprise environments. The only tangible benefit I'd expect is a performance advantage over Btrfs (it would be interesting to compare XFS and Btrfs for hosting large reflinked disk image files). |
@tasket @tlaurion Would you be willing to comment on QubesOS/qubes-issues#6476? That is a mere proposal, not a final decision, and commentary (including by those who are not QubesOS users!) would be greatly appreciated. I am no expert whatsoever on the Linux storage stack. |
I am still going to wait for detailed benchmark comparisons before supporting this. As it stands now, the general wisdom and experience is that Btrfs can be slow, and large disk image files with snapshots are exactly its worst performance case. Even ZFS created a special mode (ZVOLs) to handle disk images efficiently. I would wager that the best way to wring performance from Btrfs with disk image snapshots is to flag them nodatacow and add them to separate subvolumes, instead of using reflinks. If that's the case, it would mean a) Qubes getting a refactored Btrfs driver, and b) quite different coding details when adding Btrfs to Wyng. |
Snapshots automatically turn CoW back on, so nodatacow will not help. |
IIRC nodatacow can be set for individual disk image files that are sitting in a subvolume. So the files only experience a data CoW-like event after a subvol snapshot, not on a second-by-second basis whenever any data is written. |
In Qubes OS, all persistent volumes have at least one snapshot, by default. So the only difference would be second and further writes to the same extent after qube startup. |
Stratis uses device-mapper thin volumes (without LVM) to store its XFS filesystems. |
Yes, so the difference in performance should be somewhere between the cases shown in these benchmarks. We still need benchmarks that are performed in a Qubes environment. In relation to Wyng, Stratis mapping should be very similar since the current thin-pool method is to ask LVM what the dm-thin device ID is, then use the dm-thin tools on that device. |
get_reflink_deltas() and update_delta_digest_reflink()
Work has begun on Btrfs reflink volume support. The algorithms needed to obtain metadata and find differences between two snapshots were added, however at present the code needed to recognize and snapshot reflink vols still needs to be written to make this usable. A side-effect of the approach I took (using simple FIEMAP tables obtained via |
Remove or mark unconverted lvm code issue #75
To continue a line of thought from code comments: It's worth noting that file extent maps have 4KB blocks, which is an order of magnitude more detail than the most detailed thin lvm map with 64KB chunks. So 'do it in Python' is a big maybe here, as even Python libs tend to fall down on either speed or memory requirements. Using Linux commands to pre-process the maps gives me delta lists (to use in Python) that are much smaller than the input maps, and they're fast and work on data streams instead of in memory. Python's difflib does look interesting, though. I would love to see an alternate implementation using that or something similar to see how it performs. Right now the Wyng alpha work in progress is balancing different qualities like low dependency count, CPU portability (as in use I'd also like to note that our systems are based on the same Linux commands that I'm invoking from Wyng, and I'm being pretty conservative in my choices. I would consider custom re-implementation of those commands' functions or replacement with 3rd-party libs to be as much or more of a security risk. |
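For anyone who wants to try the alternate, pure-Python route mentioned above, here is a minimal sketch of a streaming comparison of two extent maps. It is not Wyng's code (Wyng pre-processes the maps with Linux commands, as described). The `Extent` tuple, `segments()` and `changed_ranges()` are hypothetical names, and the sketch assumes each map is already parsed into sorted, non-overlapping extents that lie within the file size. It walks both maps in lockstep with generators, so memory use stays flat even though a 4KB-granularity map can hold up to 16 times as many entries as a 64KB-chunk map.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Extent(NamedTuple):
    """Hypothetical parsed extent-map row (not a Wyng data structure)."""
    logical: int   # byte offset within the file
    physical: int  # byte address in the filesystem's logical address space
    length: int    # extent length in bytes


Segment = Tuple[int, int, Optional[int]]  # (start, stop, physical or None)


def segments(extents: Iterable[Extent], size: int) -> Iterator[Segment]:
    """Yield segments covering [0, size); holes get physical=None."""
    pos = 0
    for e in extents:
        if e.logical > pos:
            yield pos, e.logical, None          # unmapped gap before extent
        yield e.logical, e.logical + e.length, e.physical
        pos = e.logical + e.length
    if pos < size:
        yield pos, size, None                   # trailing hole


def changed_ranges(old: Iterable[Extent], new: Iterable[Extent],
                   size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) ranges whose physical mapping differs."""
    a, b = segments(old, size), segments(new, size)
    sa, sb = next(a, None), next(b, None)
    pos, run_start = 0, None
    while sa is not None and sb is not None:
        stop = min(sa[1], sb[1])
        # Physical address each map assigns to byte `pos` (None for a hole).
        pa = None if sa[2] is None else sa[2] + (pos - sa[0])
        pb = None if sb[2] is None else sb[2] + (pos - sb[0])
        if pa != pb and run_start is None:
            run_start = pos                     # a differing run begins
        elif pa == pb and run_start is not None:
            yield run_start, pos - run_start    # a differing run ends
            run_start = None
        pos = stop
        if sa[1] == stop:
            sa = next(a, None)
        if sb[1] == stop:
            sb = next(b, None)
    if run_start is not None:
        yield run_start, pos - run_start


if __name__ == "__main__":
    old_map = [Extent(0, 40960, 16384)]
    new_map = [Extent(0, 40960, 4096),
               Extent(4096, 81920, 4096),       # rewritten block
               Extent(8192, 49152, 8192)]       # still shared with old_map
    print(list(changed_ranges(old_map, new_map, 16384)))
```

Comparing physical addresses byte-for-byte means that an extent which was merely split in the newer map, but still points at the same blocks, is not reported as changed. Whether this would beat the command-line pipeline on real maps is untested.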
The Linux FIEMAP ioctl output doesn't carry block device numbers, Edit: On further inspection, Btrfs may be synthesizing its own singular address space to account for multiple devices. So we are seeing the numbers from Btrfs' internal raid. If this is true, then the resulting FIEMAP data may be good enough to reliably show where reflinked files have the same blocks. Edit 2: The issue/solution is explained in a Linux bugzilla record. |
I've added close checking of the column layout to the Also checked the The next hurdle will be getting Wyng to recognize & access regular files as logical volumes. At that point, this feature will be ready to test. |
OK, so over in filefrag land, a prominent Linux dev doesn't want me to use filefrag with Btrfs because:
Egads. The FIEMAP describes the data composition of the file. But he is implying the ioctl strips something important from FIEMAP data (it doesn't because Btrfs virtual addresses encompass multiple devices). Plus meaningless hand waving about Btrfs subvolumes (as if this were the debate about Btrfs inodes) and total lack of concern about filefrag used on other raid-like storage, and I get the impression Btrfs is not exactly TT's area. IOW, this looks like get-off-my-lawn bs. Unless a Btrfs dev says an extent address is not unique within a Btrfs filesystem, I consider the question settled. |
Update: Since I've been lured into combing Btrfs dev notes and source code to address spurious claims about the supposed deep, dark messy pit that is Btrfs internals, I keep seeing details that are actually reassuring. Btrfs does indeed use logical extent addresses (claiming it doesn't is weird), they are a crucial part of the disk format itself, and – the really good part – they are one of the higher-level abstractions in the format. What the Btrfs design is telling me so far is that they wanted to insulate extent All this is making me eager to start testing Wyng on multi-device Btrfs setups. And if big issues do arise, there is still XFS as a way to do reflink snapshots. |
Local storage abstraction classes including ReflinkVolume have been added. Most required functions are now there, including the ability to make read-only Btrfs subvolume snapshots and monitor fs maintenance incursions via the snapshot's transaction generation property. This changes Wyng's model of local storage from collections of Lvm_VolGroups containing tables of Lvm_Volumes and pools to a single LocalStorage class pointed at the archive's local storage location. The resulting 'storage' object's lvols dict is populated with objects based on relevant volume and snapshot names (which may or may not exist). The next steps will be:
Also to do:
|
@tasket: what advantages will Wyng have over e.g. |
Edit: One could tongue-in-cheek say that the reasons for using Wyng are the reasons why Edit: |
@tlaurion @DemiMarie Wyng now has basically a full implementation of reflink support and is ready to try out on Btrfs for anyone curious enough at this stage (note: it still has not yet returned to alpha). The prerequisite for using Wyng with Btrfs is to make the
Since we are now accessing local filesystem objects, you must be mindful of directory structure. In fact, the current implementation treats subdirectories as part of the Archive volume's name. To demonstrate,
You don't have to specify
It also raises the question of whether users might want to set aside a special dir where they create symlinks to the image files they want to back up, and then point Wyng at that special dir. This would be interesting to try. |
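As a starting point for that experiment, here is a minimal sketch that populates such a directory with symlinks. The function name `stage_symlinks()` and the paths in the usage example are hypothetical. Whether Wyng follows symlinks when reading image files is not established in this thread, so treat this only as setup for a test.

```python
from pathlib import Path
from typing import Iterable, List


def stage_symlinks(images: Iterable[str], staging_dir: str) -> List[Path]:
    """Create one symlink per image file inside staging_dir.

    Links are named after the image's file name, so two images with the
    same name in different directories would collide; the later one wins.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    links = []
    for image in map(Path, images):
        link = staging / image.name
        if link.is_symlink():
            link.unlink()                  # refresh a stale link
        link.symlink_to(image.resolve())   # absolute target path
        links.append(link)
    return links


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for made in stage_symlinks(["/tmp/example-a.img", "/tmp/example-b.img"],
                               "/tmp/backup-links"):
        print(made)
```

One open question this layout raises is how the subdirectory-as-volume-name behaviour described above would interact with links that all sit in one flat directory.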
Allow multi-vol receive with reflink --local Update Readme
Optimize: do not init_dedup_index if no vol changes
Btrfs reflink and LVM have now been tested and are working. |
Convert remove_local_metadata() issue #75
Thoughts?
QubesOS/qubes-issues#6476
https://btrfs.wiki.kernel.org/index.php/Incremental_Backup