
Today's update to ca0141f325ec706d38a06f9aeb8e5eb6c6a8d09a (almost identical to current 2.3.0 RC) caused permanent pool corruption #16631

Closed
@Rudd-O

Description


One of my machines (FC40) recently received two updates simultaneously:

  • the kernel, from kernel-6.8.11-300.fc40.x86_64 to kernel-6.10.12-200.fc40.x86_64, and
  • ZFS (DKMS + Dracut, master lineage), from commit 02c5aa9 to ca0141f.

This took place earlier today. Until then the pool had been healthy, in active use, and recently scrubbed multiple times, with no errors reported anywhere, neither in the kernel log nor in the journal.
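
For reference, a minimal sketch of how the running versions can be double-checked after such an update (standard commands, shown generically rather than as output from this machine):

# running kernel
uname -r

# version reported by the loaded ZFS module
cat /sys/module/zfs/version

# DKMS build state of the zfs module against the installed kernels
dkms status

# metadata of the zfs module installed for the running kernel
modinfo zfs | grep -E '^(version|srcversion)'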

Mere minutes after I rebooted into the new kernel and ZFS module, my Prometheus setup alerted me to 30 checksum errors, several write errors, and 4 data errors. Upon inspection:

[root@penny ~]# zpool status -v
  pool: chest
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
...abbreviated...

	NAME                                                                                                STATE     READ WRITE CKSUM
...abbreviated...

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x16>
        <metadata>:<0x3c>
        <metadata>:<0x44>
        <metadata>:<0x594>

The kernel ring buffer also showed no hardware errors.

I rebooted into the older kernel and ZFS module and started a scrub. It is still ongoing, but so far it has found no problems and produced no WRITE or CKSUM errors.
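
For completeness, the checks above amount to something like the following (a generic sketch using the pool name chest; not literal output from the box):

# look for hardware-level errors in the kernel ring buffer and journal
dmesg --level=err,warn
journalctl -k -p err

# start a scrub and watch its progress
zpool scrub chest
zpool status -v chest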

Interestingly, neither the cache device nor the ZIL device showed any of these errors, and the boot drive also seemed unaffected.

To me this points at a software issue, probably related to the LUKS write path (we have had those before) or to mirroring: only the pool with the mirrored drives was hit, while the single boot drive was not, despite being configured the same in every other respect.

The affected LUKS2 devices are all whole-disk volumes formatted with a 4K sector size, and the pool uses ashift=12. (The unaffected root pool, as well as the cache and ZIL devices of the affected pool, are not formatted with a 4K LUKS sector size.)
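
To illustrate, the sector sizes and the pool ashift can be cross-checked roughly like this (the device and mapping names below are placeholders, not the real ones from this pool):

# sector size exposed by an opened LUKS2 mapping (placeholder name)
cryptsetup status luks-chest-a

# physical and logical sector sizes of the underlying disk (placeholder device)
blockdev --getpbsz --getss /dev/sda

# ashift actually in use on the pool
zpool get ashift chest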

In case it makes a difference, the affected LUKS devices are tuned with the following persistent flags:

Flags:       	allow-discards same-cpu-crypt submit-from-crypt-cpus no-read-workqueue no-write-workqueue 
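
For reference, flags like these are normally applied persistently with cryptsetup refresh and can be read back from the LUKS2 header; a sketch with placeholder names:

# apply the performance flags persistently to an already-open mapping (placeholder name)
cryptsetup refresh --persistent \
    --allow-discards \
    --perf-same_cpu_crypt \
    --perf-submit_from_crypt_cpus \
    --perf-no_read_workqueue \
    --perf-no_write_workqueue \
    luks-chest-a

# confirm the flags stored in the LUKS2 header (placeholder device)
cryptsetup luksDump /dev/sda | grep '^Flags:'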

The pool has the following properties and features:

NAME   PROPERTY                       VALUE                          SOURCE
chest  size                           10.9T                          -
chest  capacity                       67%                            -
chest  altroot                        -                              default
chest  health                         ONLINE                         -
chest  guid                           2537396116593781450            -
chest  version                        -                              default
chest  bootfs                         -                              default
chest  delegation                     on                             default
chest  autoreplace                    off                            default
chest  cachefile                      -                              default
chest  failmode                       wait                           default
chest  listsnapshots                  off                            default
chest  autoexpand                     off                            default
chest  dedupratio                     1.00x                          -
chest  free                           3.55T                          -
chest  allocated                      7.34T                          -
chest  readonly                       off                            -
chest  ashift                         12                             local
chest  comment                        -                              default
chest  expandsize                     -                              -
chest  freeing                        0                              -
chest  fragmentation                  9%                             -
chest  leaked                         0                              -
chest  multihost                      off                            default
chest  checkpoint                     -                              -
chest  load_guid                      16604087848420727134           -
chest  autotrim                       off                            default
chest  compatibility                  off                            default
chest  bcloneused                     0                              -
chest  bclonesaved                    0                              -
chest  bcloneratio                    1.00x                          -
chest  feature@async_destroy          enabled                        local
chest  feature@empty_bpobj            active                         local
chest  feature@lz4_compress           active                         local
chest  feature@multi_vdev_crash_dump  enabled                        local
chest  feature@spacemap_histogram     active                         local
chest  feature@enabled_txg            active                         local
chest  feature@hole_birth             active                         local
chest  feature@extensible_dataset     active                         local
chest  feature@embedded_data          active                         local
chest  feature@bookmarks              enabled                        local
chest  feature@filesystem_limits      enabled                        local
chest  feature@large_blocks           enabled                        local
chest  feature@large_dnode            enabled                        local
chest  feature@sha512                 enabled                        local
chest  feature@skein                  enabled                        local
chest  feature@edonr                  enabled                        local
chest  feature@userobj_accounting     active                         local
chest  feature@encryption             enabled                        local
chest  feature@project_quota          active                         local
chest  feature@device_removal         enabled                        local
chest  feature@obsolete_counts        enabled                        local
chest  feature@zpool_checkpoint       enabled                        local
chest  feature@spacemap_v2            active                         local
chest  feature@allocation_classes     enabled                        local
chest  feature@resilver_defer         enabled                        local
chest  feature@bookmark_v2            enabled                        local
chest  feature@redaction_bookmarks    enabled                        local
chest  feature@redacted_datasets      enabled                        local
chest  feature@bookmark_written       enabled                        local
chest  feature@log_spacemap           active                         local
chest  feature@livelist               enabled                        local
chest  feature@device_rebuild         enabled                        local
chest  feature@zstd_compress          enabled                        local
chest  feature@draid                  enabled                        local
chest  feature@zilsaxattr             disabled                       local
chest  feature@head_errlog            disabled                       local
chest  feature@blake3                 disabled                       local
chest  feature@block_cloning          disabled                       local
chest  feature@vdev_zaps_v2           disabled                       local
chest  feature@redaction_list_spill   disabled                       local
chest  feature@raidz_expansion        disabled                       local

This is where my pool currently sits:

[root@penny ~]# zpool status
  pool: chest
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Oct  9 22:37:54 2024
	900G / 7.35T scanned at 950M/s, 136G / 7.35T issued at 143M/s
	0B repaired, 1.80% done, 14:39:54 to go
config:

	NAME                                                                                                STATE     READ WRITE CKSUM
	chest                                                                                               ONLINE       0     0     0
	  mirror-0                                                                                          ONLINE       0     0     0
	    dm-uuid-CRYPT-LUKS2-sda  ONLINE       0     0     0
	    dm-uuid-CRYPT-LUKS2-sdb  ONLINE       0     0     0
	  mirror-3                                                                                          ONLINE       0     0     0
	    dm-uuid-CRYPT-LUKS2-sdc  ONLINE       0     0     0
	    dm-uuid-CRYPT-LUKS2-sdd  ONLINE       0     0     0
	logs	
	  dm-uuid-CRYPT-LUKS2-sde    ONLINE       0     0     0
	cache
	  dm-uuid-CRYPT-LUKS2-sdf    ONLINE       0     0     0

errors: 4 data errors, use '-v' for a list

Update: good news! After reverting to the prior kernel and ZFS commit mentioned above, I am very happy to report that the scrub found no errors, and the data errors listed previously simply disappeared. Not a single bit of data was lost!

The less good news: this strongly indicates that, under this use case, there is a software defect in OpenZFS.
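
For anyone who ends up in a similar state: once a scrub on the reverted kernel completes cleanly, the error list can be re-checked and any stale counters reset, roughly as follows:

# confirm the scrub finished with no errors and the error list is empty
zpool status -v chest

# if stale READ/WRITE/CKSUM counters remain after a clean scrub, clear them
zpool clear chest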

Metadata

Assignees: No one assigned

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
