Description
System information
Type | Version/Name |
---|---|
Distribution Name | Debian GNU/Linux |
Distribution Version | unstable |
Kernel Version | 6.12.17 / 6.12.25 |
Architecture | x86_64 |
OpenZFS Version | 2.3.1 / 2.3.2 |
Describe the problem you're observing
Accessing certain parts of the file system with ZFS 2.3.2 reproducibly causes kernel panics and I/O hangs, while everything apparently works flawlessly with ZFS 2.3.1 and earlier.
I also reported this to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104724
Even if the diagnosis that my dataset is somehow corrupted is correct:
- I wonder why it apparently worked flawlessly with all earlier versions of ZFS for over 1.5 years.
- I think ZFS should not react that harshly when a dataset really is corrupt.
Describe how to reproduce the problem
Until May 1st, I was using kernel 6.12.17 and ZFS 2.3.1; everything worked fine.
On May 1st, I booted into kernel 6.12.25, still with ZFS 2.3.1; everything worked fine.
On May 2nd, the upgrade to ZFS 2.3.2 was installed. I did not reboot, so zfs-kmod was still at 2.3.1; everything worked fine.
On May 5th, I rebooted, and from then on the system misbehaved strangely. For example, I could no longer open a "fish" shell, because any access to ~/.config and ~/.cache would reproducibly result in the I/O hangs listed below.
Luckily, I can still boot into 6.12.17/2.3.1 for a fully working environment; booting into 6.12.25/2.3.2 immediately breaks with the symptoms above.
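Between May 2nd and May 5th, the userland tools were already at 2.3.2 while the loaded kernel module was still 2.3.1, so it matters which module version is actually loaded when reproducing. A minimal sketch for checking this, assuming `zfs version` prints one `zfs-<version>` line for userland and one `zfs-kmod-<version>` line for the loaded module (the helper names `parse_zfs_version` and `versions_match` are hypothetical):

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_zfs_version(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (userland, kmod) version strings from `zfs version` output.

    Assumption: lines look like 'zfs-2.3.2-1' and 'zfs-kmod-2.3.1-1'.
    """
    userland: Optional[str] = None
    kmod: Optional[str] = None
    for line in output.splitlines():
        line = line.strip()
        # Check the kmod line first, since 'zfs-kmod-...' would also match 'zfs-...'.
        m = re.fullmatch(r"zfs-kmod-(\S+)", line)
        if m:
            kmod = m.group(1)
            continue
        m = re.fullmatch(r"zfs-(\S+)", line)
        if m:
            userland = m.group(1)
    return userland, kmod


def base_version(v: str) -> str:
    # Drop the packaging revision: '2.3.2-1' -> '2.3.2'
    return v.split("-", 1)[0]


def versions_match(output: str) -> bool:
    """True if the userland and loaded kernel module report the same upstream version."""
    userland, kmod = parse_zfs_version(output)
    if userland is None or kmod is None:
        raise ValueError("could not find both zfs and zfs-kmod lines")
    return base_version(userland) == base_version(kmod)


if __name__ == "__main__":
    result = subprocess.run(["zfs", "version"], capture_output=True, text=True, check=True)
    userland, kmod = parse_zfs_version(result.stdout)
    print(f"userland: {userland}, kernel module: {kmod}")
    print("match" if versions_match(result.stdout) else "MISMATCH between userland and kernel module")
```

With the state from May 2nd to May 5th, this sketch would report a mismatch (userland 2.3.2, module 2.3.1); after the reboot on May 5th, both would report 2.3.2.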
Include any warning/errors/backtraces from the system logs
Kernel panic:
kernel: PANIC: zroot: blkptr at ffffb9932045c080 has no valid DVAs
kernel: Showing stack for process 1931
kernel: CPU: 14 UID: 0 PID: 1931 Comm: z_wr_iss Tainted: P OE 6.12.25-amd64 #1 Debian 6.12.25-1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1205 06/14/2023
kernel: Call Trace:
kernel: <TASK>
kernel: dump_stack_lvl+0x5d/0x80
kernel: vcmn_err.cold+0x54/0x7f [spl]
kernel: zfs_panic_recover+0x79/0xa0 [zfs]
kernel: zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel: zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel: ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel: dbuf_write_ready+0xf5/0x410 [zfs]
kernel: arc_write_ready+0xe9/0x560 [zfs]
kernel: ? mutex_lock+0x12/0x30
kernel: zio_ready+0x4b/0x400 [zfs]
kernel: zio_execute+0x8f/0x130 [zfs]
kernel: taskq_thread+0x352/0x6f0 [spl]
kernel: ? __pfx_default_wake_function+0x10/0x10
kernel: ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel: kthread+0xcf/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x31/0x50
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>
I/O hang 1:
kernel: INFO: task txg_sync:811 blocked for more than 120 seconds.
kernel: Tainted: P OE 6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:txg_sync state:D stack:0 pid:811 tgid:811 ppid:2 flags:0x00004000
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x505/0xbf0
kernel: schedule+0x27/0xf0
kernel: schedule_timeout+0x9e/0x160
kernel: ? __pfx_process_timeout+0x10/0x10
kernel: io_schedule_timeout+0x51/0x70
kernel: __cv_timedwait_common+0x138/0x170 [spl]
kernel: ? __pfx_autoremove_wake_function+0x10/0x10
kernel: __cv_timedwait_io+0x19/0x20 [spl]
kernel: zio_wait+0x14e/0x2f0 [zfs]
kernel: dsl_pool_sync+0xf2/0x510 [zfs]
kernel: spa_sync+0x577/0x1070 [zfs]
kernel: ? spa_txg_history_init_io+0x115/0x120 [zfs]
kernel: txg_sync_thread+0x20a/0x3b0 [zfs]
kernel: ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
kernel: ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
kernel: thread_generic_wrapper+0x5a/0x70 [spl]
kernel: kthread+0xcf/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x31/0x50
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>
I/O hang 2:
kernel: INFO: task z_wr_iss:970 blocked for more than 120 seconds.
kernel: Tainted: P OE 6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:z_wr_iss state:D stack:0 pid:970 tgid:970 ppid:2 flags:0x00004000
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x505/0xbf0
kernel: schedule+0x27/0xf0
kernel: vcmn_err.cold+0x69/0x7f [spl]
kernel: zfs_panic_recover+0x79/0xa0 [zfs]
kernel: zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel: zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel: ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel: dbuf_write_ready+0xf5/0x410 [zfs]
kernel: arc_write_ready+0xe9/0x560 [zfs]
kernel: ? mutex_lock+0x12/0x30
kernel: zio_ready+0x4b/0x400 [zfs]
kernel: zio_execute+0x8f/0x130 [zfs]
kernel: taskq_thread+0x352/0x6f0 [spl]
kernel: ? __pfx_default_wake_function+0x10/0x10
kernel: ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel: kthread+0xcf/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x31/0x50
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>
I/O hang 3:
kernel: INFO: task exe:2620 blocked for more than 120 seconds.
kernel: Tainted: P OE 6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:exe state:D stack:0 pid:2620 tgid:2609 ppid:2489 flags:0x00000002
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x505/0xbf0
kernel: ? arc_buf_alloc_impl.isra.0+0x28e/0x300 [zfs]
kernel: schedule+0x27/0xf0
kernel: schedule_preempt_disabled+0x15/0x30
kernel: __mutex_lock.constprop.0+0x3d0/0x6d0
kernel: dbuf_dirty+0x4d/0x9b0 [zfs]
kernel: dbuf_dirty+0x78b/0x9b0 [zfs]
kernel: dnode_setdirty+0x96/0xf0 [zfs]
kernel: dbuf_dirty+0x8b1/0x9b0 [zfs]
kernel: sa_attr_op+0x27a/0x3d0 [zfs]
kernel: sa_bulk_update_impl+0x62/0x100 [zfs]
kernel: sa_bulk_update+0x50/0x90 [zfs]
kernel: zfs_dirty_inode+0x2ab/0x3a0 [zfs]
kernel: zpl_dirty_inode+0x2b/0x40 [zfs]
kernel: __mark_inode_dirty+0x54/0x350
kernel: generic_update_time+0x4e/0x60
kernel: touch_atime+0xed/0x120
kernel: zpl_iter_read+0x17b/0x190 [zfs]
kernel: vfs_read+0x299/0x370
kernel: __x64_sys_pread64+0x98/0xd0
kernel: do_syscall_64+0x82/0x190
kernel: ? eventfd_write+0xe2/0x210
kernel: ? aa_file_perm+0x122/0x4d0
kernel: ? cgroup_rstat_updated+0x69/0x220
kernel: ? kmem_cache_alloc_noprof+0x106/0x2f0
kernel: ? posix_lock_inode+0x516/0xa40
kernel: ? fcntl_setlk+0x272/0x400
kernel: ? __lruvec_stat_mod_folio+0x83/0xd0
kernel: ? do_fcntl+0x5e9/0x740
kernel: ? __x64_sys_fcntl+0x87/0xe0
kernel: ? syscall_exit_to_user_mode+0x4d/0x210
kernel: ? do_syscall_64+0x8e/0x190
kernel: ? __count_memcg_events+0x53/0xf0
kernel: ? count_memcg_events.constprop.0+0x1a/0x30
kernel: ? handle_mm_fault+0x1bb/0x2c0
kernel: ? do_user_addr_fault+0x36c/0x620
kernel: ? exc_page_fault+0x7e/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x414cae
kernel: RSP: 002b:000000c000051750 EFLAGS: 00000212 ORIG_RAX: 0000000000000011
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000414cae
kernel: RDX: 0000000000000040 RSI: 000000c00031e680 RDI: 0000000000000003
kernel: RBP: 000000c000051790 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 000000c00031e680
kernel: R13: 0000000000000080 R14: 000000c000002380 R15: 0000000000000000
kernel: </TASK>
I/O hang 4:
kernel: INFO: task fish:2909 blocked for more than 120 seconds.
kernel: Tainted: P OE 6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:fish state:D stack:0 pid:2909 tgid:2909 ppid:1 flags:0x00000006
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x505/0xbf0
kernel: schedule+0x27/0xf0
kernel: schedule_preempt_disabled+0x15/0x30
kernel: __mutex_lock.constprop.0+0x3d0/0x6d0
kernel: dbuf_find+0xe1/0x250 [zfs]
kernel: dbuf_hold_impl+0x6f/0x7e0 [zfs]
kernel: ? dbuf_find+0x1b6/0x250 [zfs]
kernel: dbuf_hold_impl+0x4d4/0x7e0 [zfs]
kernel: dbuf_hold+0x31/0x60 [zfs]
kernel: dnode_hold_impl+0x100/0x1310 [zfs]
kernel: ? zfs_znode_hold_enter+0x118/0x170 [zfs]
kernel: dmu_bonus_hold+0x3c/0x90 [zfs]
kernel: zfs_zget+0x70/0x290 [zfs]
kernel: zfs_dirent_lock+0x42b/0x6c0 [zfs]
kernel: zfs_dirlook+0xb4/0x320 [zfs]
kernel: ? zfs_zaccess+0x26f/0x450 [zfs]
kernel: zfs_lookup+0x264/0x410 [zfs]
kernel: zpl_lookup+0xd9/0x2d0 [zfs]
kernel: lookup_one_qstr_excl+0x6f/0xa0
kernel: filename_create+0xc6/0x1a0
kernel: do_mkdirat+0x61/0x180
kernel: __x64_sys_mkdir+0x46/0x70
kernel: do_syscall_64+0x82/0x190
kernel: ? syscall_exit_to_user_mode+0x4d/0x210
kernel: ? do_syscall_64+0x8e/0x190
kernel: ? exc_page_fault+0x7e/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f36923ac687
kernel: RSP: 002b:00007ffff6008618 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
kernel: RAX: ffffffffffffffda RBX: 00007ffff6008620 RCX: 00007f36923ac687
kernel: RDX: 0000000000000019 RSI: 00000000000001c0 RDI: 00007ffff6008620
kernel: RBP: 00007ffff60088e0 R08: fffefffefffcfcff R09: 632e2f6e6f6c6c65
kernel: R10: 8080808080808080 R11: 0000000000000246 R12: 000055a4a068ef70
kernel: R13: 00007ffff6008910 R14: 000055a4a67cd110 R15: 0000000000000019
kernel: </TASK>