VERIFY3(counts[index] + inner_size <= size) failed (8192 <= 4096) #15604

Closed
guxing1841 opened this issue Nov 29, 2023 · 6 comments · Fixed by #17180
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@guxing1841

guxing1841 commented Nov 29, 2023

System information

Type Version/Name
Distribution Name fedora
Distribution Version 39
Kernel Version 6.5.12-300.fc39.x86_64
Architecture x86_64
OpenZFS Version zfs-2.2.1-1, zfs-kmod-2.2.1-1

Describe the problem you're observing

Kernel panic with ZFS on root.
Rebooting the system hangs after "dracut warning: unmounted /oldroot".

Nov 29 13:57:23 fedora kernel: VERIFY3(counts[index] + inner_size <= size) failed (8192 <= 4096)
Nov 29 13:57:23 fedora kernel: PANIC at vdev_indirect_mapping.c:528:vdev_indirect_mapping_increment_obsolete_count()
Nov 29 13:57:23 fedora kernel: Showing stack for process 741
Nov 29 13:57:23 fedora kernel: CPU: 1 PID: 741 Comm: z_indirect_cond Tainted: P           OE      6.5.12-300.fc39.x86_64 #1
Nov 29 13:57:23 fedora kernel: Hardware name: LENOVO 20WE/LNVNB161216, BIOS FNCN40WW(V2.10) 09/14/2022
Nov 29 13:57:23 fedora kernel: Call Trace:
Nov 29 13:57:23 fedora kernel:  <TASK>
Nov 29 13:57:23 fedora kernel:  dump_stack_lvl+0x47/0x60
Nov 29 13:57:23 fedora kernel:  spl_panic+0x100/0x120 [spl]
Nov 29 13:57:23 fedora kernel:  vdev_indirect_mapping_increment_obsolete_count+0xe2/0x120 [zfs]
Nov 29 13:57:23 fedora kernel:  load_obsolete_sm_callback+0x20/0x30 [zfs]
Nov 29 13:57:23 fedora kernel:  space_map_iterate+0x195/0x410 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_load_obsolete_sm_callback+0x10/0x10 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Nov 29 13:57:23 fedora kernel:  vdev_indirect_mapping_load_obsolete_spacemap+0x47/0x90 [zfs]
Nov 29 13:57:23 fedora kernel:  spa_condense_indirect_thread+0xdf/0x560 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __slab_free+0xf1/0x330
Nov 29 13:57:23 fedora kernel:  ? set_next_entity+0xe2/0x160
Nov 29 13:57:23 fedora kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Nov 29 13:57:23 fedora kernel:  zthr_procedure+0x137/0x150 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_zthr_procedure+0x10/0x10 [zfs]
Nov 29 13:57:23 fedora kernel:  thread_generic_wrapper+0x5b/0x70 [spl]
Nov 29 13:57:23 fedora kernel:  kthread+0xe5/0x120
Nov 29 13:57:23 fedora kernel:  ? __pfx_kthread+0x10/0x10
Nov 29 13:57:23 fedora kernel:  ret_from_fork+0x31/0x50
Nov 29 13:57:23 fedora kernel:  ? __pfx_kthread+0x10/0x10
Nov 29 13:57:23 fedora kernel:  ret_from_fork_asm+0x1b/0x30
Nov 29 13:57:23 fedora kernel:  </TASK>
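
For context, the failing check is the VERIFY3 at vdev_indirect_mapping.c:528 in vdev_indirect_mapping_increment_obsolete_count(): each entry of an indirect vdev's mapping keeps a running count of how many bytes within it have become obsolete, and the assertion requires that this count never exceed the number of bytes the entry actually maps. The values in the panic (8192 <= 4096 failing) mean twice the entry's mapped size was about to be counted obsolete, i.e. the same mapped range was being marked obsolete more than once. Below is a minimal sketch of that invariant only; the type and helper names are illustrative stand-ins, not the actual OpenZFS structures.

/*
 * Simplified illustration of the invariant behind the failed VERIFY3.
 * The real code lives in module/zfs/vdev_indirect_mapping.c; the type
 * and helper below are hypothetical stand-ins, not the OpenZFS API.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t src_offset; /* offset on the removed (now indirect) vdev */
    uint64_t size;       /* number of bytes this mapping entry covers */
} mapping_entry_t;

/*
 * counts[index] tracks how many bytes of entry 'index' have become
 * obsolete.  Marking the same range obsolete twice (e.g. by remapping
 * one block through two different DVAs) pushes the counter past the
 * entry size and trips the check, as in the panic above.
 */
static void
increment_obsolete_count(const mapping_entry_t *entries, uint32_t *counts,
    uint64_t index, uint64_t inner_size)
{
    uint64_t size = entries[index].size;

    /* The spirit of: VERIFY3(counts[index] + inner_size <= size) */
    assert(counts[index] + inner_size <= size);
    counts[index] += inner_size;
}

int
main(void)
{
    mapping_entry_t entries[1] = { { .src_offset = 0, .size = 4096 } };
    uint32_t counts[1] = { 0 };

    increment_obsolete_count(entries, counts, 0, 4096); /* fine: 4096 <= 4096 */
    printf("obsolete: %u of %llu bytes\n", counts[0],
        (unsigned long long)entries[0].size);
    increment_obsolete_count(entries, counts, 0, 4096); /* 8192 <= 4096: aborts */
    return (0);
}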

Describe how to reproduce the problem

Boot with command line: single init=/bin/bash

zpool import -N <pool>

or

Booting with ZFS on root, when dracut imports the pool.

Include any warning/errors/backtraces from the system logs

@guxing1841 guxing1841 added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 29, 2023
@guxing1841
Author

I don't know what the problem is. The pool in question came from the previous system and had been working normally; the problem appeared when I set up root on ZFS. Exporting and destroying the pool got no response, so in the end I could only delete all of the pool's partitions.

I have since created a new pool, copied over the original data, and set up ZFS on root again, and now there are no problems.

@alexdrl

alexdrl commented Feb 19, 2024

I have the exact same issue on TrueNAS SCALE, on the same ZFS version I assume. For some reason (there was an indirect vdev left after removing a mirrored vdev) I can't see the indirect device now, and I am seeing these problems. I also need to remove a missing SLOG device, and I get this error :( #8748

@snajpa
Contributor

snajpa commented Nov 28, 2024

@guxing1841 @alexdrl

Two questions:

  • Can you please verify whether this is still a problem with more recent releases of the 2.2.x branch?
  • If it isn't, do you think we could figure out a reproducer for this? It would probably entail creating the pool on an older release, right? What else?

@pnc87

pnc87 commented Jan 13, 2025

Hi,

I'm having a very similar issue. About 7 or 8 days ago, one of the hard drives in one of my mirrored vdevs went down. All the other drives were reporting back as good, so I decided to offline the drive and remove the vdev entirely, since my overall pool has enough storage space to absorb the data. I reassigned the now-extra drive as a spare vdev for the pool. Everything seemed to run fine; I did a SMART test to make sure all the other drives were good, and a scrub to make sure the pool was still healthy. All checks came back with no errors.

Then last night I was trying to access my SMB share and noticed it was inaccessible, so I tried to log in to the TrueNAS server and realized it was completely frozen. This is the first time this has ever happened, so I rebooted it, and when it was starting it went directly into a kernel panic.

I've tried to research what may be causing this issue, but I haven't been able to find much aside from one issue that seemed unrelated. So far the only troubleshooting I could think of is running a MemTest86 test to make sure the memory is good, but aside from that I don't know what else to do.

[Attached photo of the kernel panic screen: IMG20250112123917]

@pnc87

pnc87 commented Jan 14, 2025

Hi! For anyone running into a similar issue and looking for further assistance, please see this thread: https://forums.truenas.com/t/kernel-panic-with-strange-error/30832/17

Shout out to the TrueNAS community for ALL the help!

@marcelfarres

Hi!

Another instance of the same problem.

I added a new vdev of 2 drives and removed the original vdev using the TrueNAS GUI.

It worked for a couple of days, with reboots and no problems, and today the server just went offline and got stuck in a boot loop with the same kernel panic.

amotin added a commit to amotin/zfs that referenced this issue Mar 26, 2025
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Signed-off-by:  Alexander Motin <[email protected]>
Sponsored by:   iXsystems, Inc.
Fixes openzfs#15604
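
In other words, per the commit message: the Block Reference Table (BRT) used by block cloning is keyed by the vdev id and offset taken from DVA[0] of a block pointer. If the device-removal remap path reaches a cloned block through a different DVA, the (vdev, offset) pair it derives no longer matches the key under which the clone reference was recorded, so the reference count is effectively missed. A rough sketch of that key mismatch follows, using hypothetical types and a toy lookup rather than the real brt.c interfaces.

/*
 * Toy illustration of the BRT lookup-key mismatch described in the
 * commit message above.  brt_key_t/brt_lookup() are hypothetical
 * stand-ins, not the actual Block Reference Table code in brt.c.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t vdev;   /* vdev id taken from a DVA */
    uint64_t offset; /* offset taken from the same DVA */
} brt_key_t;

typedef struct {
    brt_key_t key;     /* recorded from DVA[0] when the block was cloned */
    uint64_t refcount; /* number of extra clone references */
} brt_entry_t;

/* Linear lookup over a tiny "table"; returns 0 if no entry is found. */
static uint64_t
brt_lookup(const brt_entry_t *table, size_t n, brt_key_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].key.vdev == key.vdev &&
            table[i].key.offset == key.offset)
            return (table[i].refcount);
    }
    return (0); /* the block looks "not cloned" to the caller */
}

int
main(void)
{
    /* Clone reference recorded under DVA[0] = (vdev 1, offset 0x1000). */
    brt_entry_t table[1] = { { { .vdev = 1, .offset = 0x1000 }, 2 } };

    brt_key_t via_dva0 = { .vdev = 1, .offset = 0x1000 };
    brt_key_t via_dva1 = { .vdev = 2, .offset = 0x8000 }; /* same block, other copy */

    printf("lookup via DVA[0]: refcount %llu\n",
        (unsigned long long)brt_lookup(table, 1, via_dva0));
    printf("lookup via DVA[1]: refcount %llu (clone reference missed)\n",
        (unsigned long long)brt_lookup(table, 1, via_dva1));
    return (0);
}

That is why the fix simply skips blocks that might be cloned when handling block-pointer remaps after device removal, rather than risking updating or freeing them through the wrong key.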
behlendorf pushed a commit that referenced this issue Mar 26, 2025
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:  Alexander Motin <[email protected]>
Sponsored by:   iXsystems, Inc.
Closes #15604
Closes #17180