Skip to content

incusd/storage/zfs: Fix ZFS CreateVolume deletes pre-existing data on failure #1749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 13, 2025

Conversation

mrstux
Copy link

@mrstux mrstux commented Mar 12, 2025

The issue was first reported by TrueNAS 25.04 beta test users. TrueNAS 25.04 includes Incus as a major feature

When the ZFS driver's CreateVolume function is unable to create the desired volume due to the existence of a pre-existing dataset, it would fail and proceed to recursively destroy the pre-existing dataset, potentially leading to unexpected data loss.

This has occurred to users who have been experimenting with trying to "import" their existing zvols into an incus zfs storage pool.

All other drivers were surveyed and they do not erase pre-existing data on failure, unless they first succeed in creating a volume or directory. The user is then responsible for correcting the situation in order to enable the CreateVolume call to succeed

The fix splits and moves the current revert.add(DeleteVolume) to only be used after a dataset that needs to be destroyed on reversion has been created or restored.


The issue and resolution are demonstrated below

Setup:

# create zfs storage pool
incus storage create zfsPool zfs
>Storage pool zfsPool created

# create pre-existing data (yes, users should not do this)
zfs create zfsPool/containers/lxc-blowaway
zfs create zfsPool/custom/default_blowaway_fs
zfs create zfsPool/custom/default_blowaway_block
zfs create zfsPool/virtual-machines/vm-blowaway
zfs create zfsPool/virtual-machines/vm-blowaway.block

# verify dataset existance
zfs list -o name -r zfsPool | grep blowaway
>zfsPool/containers/lxc-blowaway
>zfsPool/custom/default_blowaway_block
>zfsPool/custom/default_blowaway_fs
>zfsPool/virtual-machines/vm-blowaway
>zfsPool/virtual-machines/vm-blowaway.block

Without fix:

stux@incusdev:~/git/incus$ incus storage volume create zfsPool blowaway_fs
Error: Failed to run: zfs create -o mountpoint=legacy -o canmount=noauto zfsPool/custom/default_blowaway_fs: exit status 1 (cannot create 'zfsPool/custom/default_blowaway_fs': dataset already exists)

stux@incusdev:~/git/incus$ incus storage volume create zfsPool blowaway_block --type block
Error: Failed to run: zfs create -s -V 10737418240 -o volmode=none -o incus:content_type=block -o sync=disabled zfsPool/custom/default_blowaway_block: exit status 1 (cannot create 'zfsPool/custom/default_blowaway_block': dataset already exists)

stux@incusdev:~/git/incus$ incus create --empty lxc-blowaway --storage zfsPool
Creating lxc-blowaway
Error: Failed creating instance: Failed to run: zfs create -o mountpoint=legacy -o canmount=noauto zfsPool/containers/lxc-blowaway: exit status 1 (cannot create 'zfsPool/containers/lxc-blowaway': dataset already exists)

stux@incusdev:~/git/incus$ incus create --empty vm-blowaway --vm --storage zfsPool
Creating vm-blowaway
Error: Failed creating instance: Failed to run: zfs create -s -V 10737418240 -o volmode=none -o primarycache=metadata -o secondarycache=metadata -o sync=disabled zfsPool/virtual-machines/vm-blowaway.block: exit status 1 (cannot create 'zfsPool/virtual-machines/vm-blowaway.block': dataset already exists)

root@incusdev:~# zfs list -o name -r zfsPool | grep blowaway
root@incusdev:~# 

The pre-existing datasets are unexpectedly erased, and its important to note that none of the other Incus storage drivers behave this way.

With the fix:

stux@incusdev:~/git/incus$ incus storage volume create zfsPool blowaway_fs
Error: Failed to run: zfs create -o mountpoint=legacy -o canmount=noauto zfsPool/custom/default_blowaway_fs: exit status 1 (cannot create 'zfsPool/custom/default_blowaway_fs': dataset already exists)

stux@incusdev:~/git/incus$ incus storage volume create zfsPool blowaway_block --type block
Error: Failed to run: zfs create -s -V 10737418240 -o volmode=none -o incus:content_type=block -o sync=disabled zfsPool/custom/default_blowaway_block: exit status 1 (cannot create 'zfsPool/custom/default_blowaway_block': dataset already exists)

stux@incusdev:~/git/incus$ incus create --empty lxc-blowaway --storage zfsPool
Creating lxc-blowaway
Error: Failed creating instance: Failed to run: zfs create -o mountpoint=legacy -o canmount=noauto zfsPool/containers/lxc-blowaway: exit status 1 (cannot create 'zfsPool/containers/lxc-blowaway': dataset already exists)

stux@incusdev:~/git/incus$ incus create --empty vm-blowaway --vm --storage zfsPool
Creating vm-blowaway
Error: Failed creating instance: Failed to run: zfs create -s -V 10737418240 -o volmode=none -o primarycache=metadata -o secondarycache=metadata -o sync=disabled zfsPool/virtual-machines/vm-blowaway.block: exit status 1 (cannot create 'zfsPool/virtual-machines/vm-blowaway.block': dataset already exists)

root@incusdev:~# zfs list -o name -r zfsPool | grep blowaway
zfsPool/containers/lxc-blowaway
zfsPool/custom/default_blowaway_block
zfsPool/custom/default_blowaway_fs
zfsPool/virtual-machines/vm-blowaway
zfsPool/virtual-machines/vm-blowaway.block

The pre-existing datasets are not erased.

Cleanup:

stux@incusdev:~/git/incus$ incus storage rm zfsPool
Storage pool zfsPool deleted

When the ZFS driver's CreateVolume function is unable to create the
desired volume due to the existance of a pre-existing dataset, it would
fail and proceed to recursively destroy the pre-existing dataset,
potentially leading to unexpected data loss.

This has occurred to users who have been experimenting with trying to
"import" their existing zvols into an incus zfs storage pool.

All other drivers were surveyed and they do not erase pre-existing data
on failure, unless they first succeed in creating a volume or directory.
The user is then responsible for correcting the situation in order to
enable the CreateVolume to succeed

The fix splits and moves the current revert.add(DeleteVolume) to only be
used after a dataset that needs to be destroyed on reversion has been
created or restored.

Signed-off-by: Stuart Espey <[email protected]>
@mrstux mrstux requested a review from stgraber as a code owner March 12, 2025 07:56
@mrstux mrstux changed the title incus/drivers: Fix ZFS CreateVolume deletes pre-existing data on failure incusd/storage/zfs: Fix ZFS CreateVolume deletes pre-existing data on failure Mar 12, 2025
@stgraber stgraber enabled auto-merge March 13, 2025 04:56
@stgraber stgraber merged commit be0301a into lxc:main Mar 13, 2025
36 checks passed
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 31, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [lxc/incus](https://github.com/lxc/incus) | minor | `v6.10.1` -> `v6.11.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>lxc/incus (lxc/incus)</summary>

### [`v6.11.0`](https://github.com/lxc/incus/releases/tag/v6.11.0): Incus 6.11

[Compare Source](lxc/incus@v6.10.1...v6.11.0)

### Announcement

https://discuss.linuxcontainers.org/t/incus-6-11-has-been-released/23322

#### What's Changed

-   Allow ICMP and low ports for unprivileged users in OCI containers by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1706
-   doc: Clarify virtiofsd requirements by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1718
-   Fix generate-database usage for incusd/db by [@&#8203;breml](https://github.com/breml) in lxc/incus#1719
-   Do not allow mounting of custom block volume snapshots by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1720
-   generate-database: Abstract db connection / db transaction by [@&#8203;breml](https://github.com/breml) in lxc/incus#1721
-   Fix snapshot size handling in cross-pool copy/move by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1717
-   generate-database: Accept interface in PrepareStmts by [@&#8203;breml](https://github.com/breml) in lxc/incus#1725
-   Simplify `evaluateShorthandFilter` by reducing nesting levels by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1727
-   incusd/storage: Don't use sparse writer on thick LVM by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1729
-   generate-database: Add support for marshal to JSON by [@&#8203;breml](https://github.com/breml) in lxc/incus#1731
-   Fixed incus edk2 path overwrite issue by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1726
-   Do not download instance types if cache loadable by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1732
-   Clarify security.secureboot setting by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1740
-   Fix DNS for isolated OVN networks by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1738
-   Allow announcing extra routes through DHCPv4 by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1734
-   Fix link parsing failure on non-ethernet devices by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1742
-   Fix revert on OCI container creation failure by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1744
-   generate-database: Handle non tx DB connections by [@&#8203;breml](https://github.com/breml) in lxc/incus#1745
-   incus file edit extension by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1746
-   Cleanup internal API endpoints by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1747
-   Tweak help message for rebuild by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1754
-   Use lego binary for DNS-01 challenge by [@&#8203;accuser](https://github.com/accuser) in lxc/incus#1753
-   incusd/storage/zfs: Fix ZFS CreateVolume deletes pre-existing data on failure by [@&#8203;mrstux](https://github.com/mrstux) in lxc/incus#1749
-   incus/file: Always use 1MB chunks for SFTP by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1758
-   Use the correct path for ingesting DNS-01 challenge certificate outputs by [@&#8203;accuser](https://github.com/accuser) in lxc/incus#1759
-   incusd/bgp: Rework start/stop logic by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1761
-   incusd/network/ovn: Skip existing static routes by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1762
-   incusd/instance/qemu: Set caching-mode with intel-iommu by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1772
-   incus-agent: Improve SFTP performance by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1773
-   incusd/network/ovn: Keep getting router name when network none by [@&#8203;diegofernandes](https://github.com/diegofernandes) in lxc/incus#1771
-   make `incus copy --device xx,type=none` drop remaining device properties by [@&#8203;schnoddelbotz](https://github.com/schnoddelbotz) in lxc/incus#1764
-   incusd/instance/qemu: rtc base localtime for windows by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1767
-   Add option to configure DNS server for bridge and OVN networks by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1739
-   Use lego binary for http 01 challenge by [@&#8203;accuser](https://github.com/accuser) in lxc/incus#1770
-   Handle live migration between QEMU versions by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1775
-   incusd/instance/qemu: Skip to link nvram to itself by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1760
-   Switch to new MAC address prefix by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1776
-   client: Fix spelling errors found by codespell by [@&#8203;cjwatson](https://github.com/cjwatson) in lxc/incus#1777
-   Add ipv4.dhcp.expiry option for ovn networks by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1781
-   Configure DHCP on existing instance interfaces when it is enabled on a network by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1780
-   incusd/instance/edk2: Select SecureBoot capable firmware on Debian by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1782
-   Fix some `go  vet` warnings by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1784
-   Clear gofumpt by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1803
-   Fix some BGP issues by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1805
-   incusd/instance/qemu: bad pid check by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1806
-   Fix spelling errors and run codespell automatically by [@&#8203;cjwatson](https://github.com/cjwatson) in lxc/incus#1778
-   incus/file: Properly handle relative source paths by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1809
-   cmd/storage:  incorrect CLI syntax in storage pool creation examples by [@&#8203;ViniRodrig](https://github.com/ViniRodrig) in lxc/incus#1810
-   Improve DB performance by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1811
-   incusd/network/ovn: Fix default DNS IPv4 server by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1812
-   Extend OS detection logic by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1813
-   Add allocated CPU time to instance state by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1807
-   incusd/certificates: Properly handle bad PEM data by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1816
-   Extra `generate-database` features by [@&#8203;masnax](https://github.com/masnax) in lxc/incus#1817
-   incusd/network/common: Handle missing BGP peer by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1818
-   incusd/cluster/evacuate: Don't live-migrate stopped instances by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1819
-   Fix generator table pluralization by [@&#8203;masnax](https://github.com/masnax) in lxc/incus#1823
-   incusd/instance/qemu enable s4 by default by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1820
-   Add support for USB NICs by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1814
-   incusd/storage/s3 Fixed minio client mc too ambious issue by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1821
-   incusd/networks: Validate configuration on join too by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1824
-   Update gomod for go-jwt vulnerability by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1825
-   cmd/generate-database/db: Fix GetNames spacing by [@&#8203;masnax](https://github.com/masnax) in lxc/incus#1826
-   github: Rework issue templates by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1827
-   Update Debian installation documentation by [@&#8203;gibmat](https://github.com/gibmat) in lxc/incus#1830
-   Extend minio client naming by [@&#8203;gibmat](https://github.com/gibmat) in lxc/incus#1829
-   Various fixes from address set MR by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1831
-   incusd/instance/lxc: Cleanup OCI mount paths by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1834
-   Add `io.bus=usb` for disks by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1835
-   golangci: Upgrade to version 2 by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1836
-   golangci: Disable STI005 error checks by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1841
-   Standalone changes from the Linstor branch by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1842
-   incusd/storage/s3 minio client check enhancement by [@&#8203;nanjj](https://github.com/nanjj) in lxc/incus#1839
-   incusd/network/ovn: Remove internal routes to forward/load-balancers by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1843
-   incusd/instance/edk2: Always prefer the EDK2 override by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1847
-   Fixes from Linstor branch by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1846
-   Add `linstor` storage driver by [@&#8203;luissimas](https://github.com/luissimas) in lxc/incus#1621
-   Add `linstor.remove_snapshots` config option by [@&#8203;luissimas](https://github.com/luissimas) in lxc/incus#1848
-   doc/support: Update feature release version by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1853
-   incusd/instance: Don't enforce device/config validation on snapshots by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1854
-   OCI entrypoint configuration by [@&#8203;gwenya](https://github.com/gwenya) in lxc/incus#1845

#### New Contributors

-   [@&#8203;mrstux](https://github.com/mrstux) made their first contribution in lxc/incus#1749
-   [@&#8203;diegofernandes](https://github.com/diegofernandes) made their first contribution in lxc/incus#1771
-   [@&#8203;schnoddelbotz](https://github.com/schnoddelbotz) made their first contribution in lxc/incus#1764
-   [@&#8203;cjwatson](https://github.com/cjwatson) made their first contribution in lxc/incus#1777
-   [@&#8203;ViniRodrig](https://github.com/ViniRodrig) made their first contribution in lxc/incus#1810
-   [@&#8203;masnax](https://github.com/masnax) made their first contribution in lxc/incus#1817

**Full Changelog**: lxc/incus@v6.10.1...v6.11.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yMTguMSIsInVwZGF0ZWRJblZlciI6IjM5LjIxOC4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants