2.3.2 causing kernel panic and I/O hangs, 2.3.1 works on same dataset #17307

Open
@sbellon

System information

Type                  | Version/Name
Distribution Name     | Debian GNU/Linux
Distribution Version  | unstable
Kernel Version        | 6.12.17 / 6.12.25
Architecture          | x86_64
OpenZFS Version       | 2.3.1 / 2.3.2

Describe the problem you're observing

Accessing certain parts of the file system with ZFS 2.3.2 reproducibly causes kernel panics and I/O hangs, while the same dataset apparently works flawlessly with ZFS 2.3.1 and earlier.

I also reported this to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104724

Even if the diagnosis that my dataset is somehow corrupted turns out to be correct:

  1. I wonder why it apparently worked flawlessly with all earlier versions of ZFS for over 1.5 years.
  2. I think ZFS should not react this harshly when a dataset really is corrupt.

Describe how to reproduce the problem

Until May 1st I was using kernel 6.12.17 and ZFS 2.3.1, everything working fine.

On May 1st, I booted into kernel 6.12.25, still ZFS 2.3.1, everything working fine.

On May 2nd, the upgrade to ZFS 2.3.2 was installed, but I did not reboot, so the loaded zfs-kmod was still 2.3.1; everything kept working fine.

On May 5th, I rebooted, and from then on the system misbehaved: for example, I could no longer open a "fish" shell, because any access to ~/.config and ~/.cache reproducibly resulted in the I/O hangs shown below.

Luckily I still have 6.12.17/2.3.1 available and can boot into a fully working environment; booting into 6.12.25/2.3.2 immediately breaks with the above symptoms.
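
The May 2nd state (userland already upgraded, kernel module still the old one) can be told apart from the May 5th state by comparing the two lines that `zfs version` prints. The following is only a minimal sketch of that comparison: the helper names `parse_zfs_version` and `describe` are made up for illustration, and the sample string is an assumption modelled on the timeline above, not output captured from the affected machine.

```python
import re

def parse_zfs_version(output: str) -> dict:
    """Split `zfs version` style output into userland and kernel-module versions."""
    versions = {}
    for line in output.splitlines():
        # Lines look like "zfs-2.3.2-1" (userland) or "zfs-kmod-2.3.1-1" (module).
        m = re.match(r"zfs-(kmod-)?(\d+(?:\.\d+)*)", line.strip())
        if m:
            versions["kmod" if m.group(1) else "userland"] = m.group(2)
    return versions

def describe(versions: dict) -> str:
    """Summarise whether userland and the loaded module agree."""
    user, kmod = versions.get("userland"), versions.get("kmod")
    if user is None or kmod is None:
        return "could not determine both versions"
    if user == kmod:
        return f"userland and kernel module both at {user}"
    return f"userland {user} but kernel module {kmod} (reboot or module reload pending)"

# Assumed sample text, mirroring the state between May 2nd and May 5th.
sample = "zfs-2.3.2-1\nzfs-kmod-2.3.1-1\n"
print(describe(parse_zfs_version(sample)))
```

On a live system the same text could be obtained from `zfs version`, and the loaded module version is also exposed in `/sys/module/zfs/version`.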

Include any warning/errors/backtraces from the system logs

Kernel panic:

kernel: PANIC: zroot: blkptr at ffffb9932045c080 has no valid DVAs
kernel: Showing stack for process 1931
kernel: CPU: 14 UID: 0 PID: 1931 Comm: z_wr_iss Tainted: P           OE      6.12.25-amd64 #1  Debian 6.12.25-1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1205 06/14/2023
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x5d/0x80
kernel:  vcmn_err.cold+0x54/0x7f [spl]
kernel:  zfs_panic_recover+0x79/0xa0 [zfs]
kernel:  zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel:  zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel:  ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel:  dbuf_write_ready+0xf5/0x410 [zfs]
kernel:  arc_write_ready+0xe9/0x560 [zfs]
kernel:  ? mutex_lock+0x12/0x30
kernel:  zio_ready+0x4b/0x400 [zfs]
kernel:  zio_execute+0x8f/0x130 [zfs]
kernel:  taskq_thread+0x352/0x6f0 [spl]
kernel:  ? __pfx_default_wake_function+0x10/0x10
kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>
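
When comparing this trace with the hang traces further down, it can help to reduce each one to its verified ZFS/SPL frames. In kernel stack dumps, entries prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain. The following is a rough sketch of such a filter; the function names `parse_frames` and `reliable_zfs_path` are hypothetical, and the sample is a shortened copy of lines from the trace above.

```python
import re

# One frame per line: "kernel:  [? ]func+0xOFF/0xSIZE [module]"
FRAME = re.compile(
    r"^kernel:\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?$"
)

def parse_frames(log: str) -> list:
    """Return (function, module, reliable) tuples for every stack frame line."""
    frames = []
    for line in log.splitlines():
        m = FRAME.match(line.rstrip())
        if m:
            module = m.group("module") or "kernel"  # no [module] tag: core kernel
            frames.append((m.group("func"), module, m.group("unreliable") is None))
    return frames

def reliable_zfs_path(log: str) -> list:
    """Keep only verified frames that belong to the zfs or spl modules."""
    return [func for func, module, ok in parse_frames(log)
            if ok and module in ("zfs", "spl")]

sample = """\
kernel:  vcmn_err.cold+0x54/0x7f [spl]
kernel:  zfs_panic_recover+0x79/0xa0 [zfs]
kernel:  zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel:  zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel:  ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel:  dbuf_write_ready+0xf5/0x410 [zfs]
kernel:  kthread+0xcf/0x100
"""
for func in reliable_zfs_path(sample):
    print(func)
```

Applied to the panic trace and to "I/O hang 2" below, this kind of filter makes it easier to see that both pass through the same `dbuf_write_ready` to `zfs_blkptr_verify` path.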

I/O hang 1:

kernel: INFO: task txg_sync:811 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:txg_sync        state:D stack:0     pid:811   tgid:811   ppid:2      flags:0x00004000
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  schedule_timeout+0x9e/0x160
kernel:  ? __pfx_process_timeout+0x10/0x10
kernel:  io_schedule_timeout+0x51/0x70
kernel:  __cv_timedwait_common+0x138/0x170 [spl]
kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
kernel:  __cv_timedwait_io+0x19/0x20 [spl]
kernel:  zio_wait+0x14e/0x2f0 [zfs]
kernel:  dsl_pool_sync+0xf2/0x510 [zfs]
kernel:  spa_sync+0x577/0x1070 [zfs]
kernel:  ? spa_txg_history_init_io+0x115/0x120 [zfs]
kernel:  txg_sync_thread+0x20a/0x3b0 [zfs]
kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
kernel:  thread_generic_wrapper+0x5a/0x70 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>

I/O hang 2:

kernel: INFO: task z_wr_iss:970 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:z_wr_iss        state:D stack:0     pid:970   tgid:970   ppid:2      flags:0x00004000
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  vcmn_err.cold+0x69/0x7f [spl]
kernel:  zfs_panic_recover+0x79/0xa0 [zfs]
kernel:  zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel:  zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel:  ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel:  dbuf_write_ready+0xf5/0x410 [zfs]
kernel:  arc_write_ready+0xe9/0x560 [zfs]
kernel:  ? mutex_lock+0x12/0x30
kernel:  zio_ready+0x4b/0x400 [zfs]
kernel:  zio_execute+0x8f/0x130 [zfs]
kernel:  taskq_thread+0x352/0x6f0 [spl]
kernel:  ? __pfx_default_wake_function+0x10/0x10
kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>

I/O hang 3:

kernel: INFO: task exe:2620 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:exe             state:D stack:0     pid:2620  tgid:2609  ppid:2489   flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  ? arc_buf_alloc_impl.isra.0+0x28e/0x300 [zfs]
kernel:  schedule+0x27/0xf0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x3d0/0x6d0
kernel:  dbuf_dirty+0x4d/0x9b0 [zfs]
kernel:  dbuf_dirty+0x78b/0x9b0 [zfs]
kernel:  dnode_setdirty+0x96/0xf0 [zfs]
kernel:  dbuf_dirty+0x8b1/0x9b0 [zfs]
kernel:  sa_attr_op+0x27a/0x3d0 [zfs]
kernel:  sa_bulk_update_impl+0x62/0x100 [zfs]
kernel:  sa_bulk_update+0x50/0x90 [zfs]
kernel:  zfs_dirty_inode+0x2ab/0x3a0 [zfs]
kernel:  zpl_dirty_inode+0x2b/0x40 [zfs]
kernel:  __mark_inode_dirty+0x54/0x350
kernel:  generic_update_time+0x4e/0x60
kernel:  touch_atime+0xed/0x120
kernel:  zpl_iter_read+0x17b/0x190 [zfs]
kernel:  vfs_read+0x299/0x370
kernel:  __x64_sys_pread64+0x98/0xd0
kernel:  do_syscall_64+0x82/0x190
kernel:  ? eventfd_write+0xe2/0x210
kernel:  ? aa_file_perm+0x122/0x4d0
kernel:  ? cgroup_rstat_updated+0x69/0x220
kernel:  ? kmem_cache_alloc_noprof+0x106/0x2f0
kernel:  ? posix_lock_inode+0x516/0xa40
kernel:  ? fcntl_setlk+0x272/0x400
kernel:  ? __lruvec_stat_mod_folio+0x83/0xd0
kernel:  ? do_fcntl+0x5e9/0x740
kernel:  ? __x64_sys_fcntl+0x87/0xe0
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? __count_memcg_events+0x53/0xf0
kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
kernel:  ? handle_mm_fault+0x1bb/0x2c0
kernel:  ? do_user_addr_fault+0x36c/0x620
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x414cae
kernel: RSP: 002b:000000c000051750 EFLAGS: 00000212 ORIG_RAX: 0000000000000011
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000414cae
kernel: RDX: 0000000000000040 RSI: 000000c00031e680 RDI: 0000000000000003
kernel: RBP: 000000c000051790 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 000000c00031e680
kernel: R13: 0000000000000080 R14: 000000c000002380 R15: 0000000000000000
kernel:  </TASK>

I/O hang 4:

kernel: INFO: task fish:2909 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:fish            state:D stack:0     pid:2909  tgid:2909  ppid:1      flags:0x00000006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x3d0/0x6d0
kernel:  dbuf_find+0xe1/0x250 [zfs]
kernel:  dbuf_hold_impl+0x6f/0x7e0 [zfs]
kernel:  ? dbuf_find+0x1b6/0x250 [zfs]
kernel:  dbuf_hold_impl+0x4d4/0x7e0 [zfs]
kernel:  dbuf_hold+0x31/0x60 [zfs]
kernel:  dnode_hold_impl+0x100/0x1310 [zfs]
kernel:  ? zfs_znode_hold_enter+0x118/0x170 [zfs]
kernel:  dmu_bonus_hold+0x3c/0x90 [zfs]
kernel:  zfs_zget+0x70/0x290 [zfs]
kernel:  zfs_dirent_lock+0x42b/0x6c0 [zfs]
kernel:  zfs_dirlook+0xb4/0x320 [zfs]
kernel:  ? zfs_zaccess+0x26f/0x450 [zfs]
kernel:  zfs_lookup+0x264/0x410 [zfs]
kernel:  zpl_lookup+0xd9/0x2d0 [zfs]
kernel:  lookup_one_qstr_excl+0x6f/0xa0
kernel:  filename_create+0xc6/0x1a0
kernel:  do_mkdirat+0x61/0x180
kernel:  __x64_sys_mkdir+0x46/0x70
kernel:  do_syscall_64+0x82/0x190
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f36923ac687
kernel: RSP: 002b:00007ffff6008618 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
kernel: RAX: ffffffffffffffda RBX: 00007ffff6008620 RCX: 00007f36923ac687
kernel: RDX: 0000000000000019 RSI: 00000000000001c0 RDI: 00007ffff6008620
kernel: RBP: 00007ffff60088e0 R08: fffefffefffcfcff R09: 632e2f6e6f6c6c65
kernel: R10: 8080808080808080 R11: 0000000000000246 R12: 000055a4a068ef70
kernel: R13: 00007ffff6008910 R14: 000055a4a67cd110 R15: 0000000000000019
kernel:  </TASK>
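
The four hung-task reports above can be summarised from their `task:` lines alone. The sketch below is an illustration rather than an established tool; the name `hung_task_summary` is made up, and the sample lines are copied from the reports above. It relies on two general Linux facts: state `D` means uninterruptible sleep, and kernel threads are children of kthreadd (PID 2).

```python
import re

# Matches the per-task summary line printed by the hung task detector.
TASK = re.compile(
    r"task:(?P<comm>\S+)\s+state:(?P<state>\S)\s+stack:\d+\s+"
    r"pid:(?P<pid>\d+)\s+tgid:(?P<tgid>\d+)\s+ppid:(?P<ppid>\d+)"
)

def hung_task_summary(log: str) -> list:
    """Return (name, pid, state, kind) for each blocked task in the log."""
    result = []
    for m in TASK.finditer(log):
        # ppid 2 is kthreadd, so such tasks are kernel threads.
        kind = "kernel thread" if int(m["ppid"]) == 2 else "user process"
        result.append((m["comm"], int(m["pid"]), m["state"], kind))
    return result

sample = """\
kernel: task:txg_sync        state:D stack:0     pid:811   tgid:811   ppid:2      flags:0x00004000
kernel: task:z_wr_iss        state:D stack:0     pid:970   tgid:970   ppid:2      flags:0x00004000
kernel: task:exe             state:D stack:0     pid:2620  tgid:2609  ppid:2489   flags:0x00000002
kernel: task:fish            state:D stack:0     pid:2909  tgid:2909  ppid:1      flags:0x00000006
"""
for name, pid, state, kind in hung_task_summary(sample):
    print(f"{name:10s} pid={pid:<5d} state={state} {kind}")
```

Read this way, two ZFS kernel threads (`txg_sync`, `z_wr_iss`) and two user processes (`exe`, `fish`) are all stuck in uninterruptible sleep.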
