VERIFY3(counts[index] + inner_size <= size) failed (8192 <= 4096) #15604

Closed
guxing1841 opened this issue Nov 29, 2023 · 6 comments · Fixed by #17180
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@guxing1841

guxing1841 commented Nov 29, 2023

System information

Type Version/Name
Distribution Name fedora
Distribution Version 39
Kernel Version 6.5.12-300.fc39.x86_64
Architecture x86_64
OpenZFS Version zfs-2.2.1-1, zfs-kmod-2.2.1-1

Describe the problem you're observing

Kernel panic with ZFS on root.
Rebooting the system hangs after "dracut warning: unmounted /oldroot".

Nov 29 13:57:23 fedora kernel: VERIFY3(counts[index] + inner_size <= size) failed (8192 <= 4096)
Nov 29 13:57:23 fedora kernel: PANIC at vdev_indirect_mapping.c:528:vdev_indirect_mapping_increment_obsolete_count()
Nov 29 13:57:23 fedora kernel: Showing stack for process 741
Nov 29 13:57:23 fedora kernel: CPU: 1 PID: 741 Comm: z_indirect_cond Tainted: P           OE      6.5.12-300.fc39.x86_64 #1
Nov 29 13:57:23 fedora kernel: Hardware name: LENOVO 20WE/LNVNB161216, BIOS FNCN40WW(V2.10) 09/14/2022
Nov 29 13:57:23 fedora kernel: Call Trace:
Nov 29 13:57:23 fedora kernel:  <TASK>
Nov 29 13:57:23 fedora kernel:  dump_stack_lvl+0x47/0x60
Nov 29 13:57:23 fedora kernel:  spl_panic+0x100/0x120 [spl]
Nov 29 13:57:23 fedora kernel:  vdev_indirect_mapping_increment_obsolete_count+0xe2/0x120 [zfs]
Nov 29 13:57:23 fedora kernel:  load_obsolete_sm_callback+0x20/0x30 [zfs]
Nov 29 13:57:23 fedora kernel:  space_map_iterate+0x195/0x410 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_load_obsolete_sm_callback+0x10/0x10 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Nov 29 13:57:23 fedora kernel:  vdev_indirect_mapping_load_obsolete_spacemap+0x47/0x90 [zfs]
Nov 29 13:57:23 fedora kernel:  spa_condense_indirect_thread+0xdf/0x560 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __slab_free+0xf1/0x330
Nov 29 13:57:23 fedora kernel:  ? set_next_entity+0xe2/0x160
Nov 29 13:57:23 fedora kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Nov 29 13:57:23 fedora kernel:  zthr_procedure+0x137/0x150 [zfs]
Nov 29 13:57:23 fedora kernel:  ? __pfx_zthr_procedure+0x10/0x10 [zfs]
Nov 29 13:57:23 fedora kernel:  thread_generic_wrapper+0x5b/0x70 [spl]
Nov 29 13:57:23 fedora kernel:  kthread+0xe5/0x120
Nov 29 13:57:23 fedora kernel:  ? __pfx_kthread+0x10/0x10
Nov 29 13:57:23 fedora kernel:  ret_from_fork+0x31/0x50
Nov 29 13:57:23 fedora kernel:  ? __pfx_kthread+0x10/0x10
Nov 29 13:57:23 fedora kernel:  ret_from_fork_asm+0x1b/0x30
Nov 29 13:57:23 fedora kernel:  </TASK>
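
For context, the failing check is the VERIFY3 at vdev_indirect_mapping.c:528 in vdev_indirect_mapping_increment_obsolete_count(): each entry of an indirect vdev's mapping keeps a running count of how many bytes within it have become obsolete, and the assertion requires that this count never exceed the number of bytes the entry actually maps. The values in the panic (8192 <= 4096 failing) mean twice the entry's mapped size was about to be counted obsolete, i.e. the same mapped range was being marked obsolete more than once. Below is a minimal sketch of that invariant only; the type and helper names are illustrative stand-ins, not the actual OpenZFS structures.

/*
 * Simplified illustration of the invariant behind the failed VERIFY3.
 * The real code lives in module/zfs/vdev_indirect_mapping.c; the type
 * and helper below are hypothetical stand-ins, not the OpenZFS API.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t src_offset; /* offset on the removed (now indirect) vdev */
    uint64_t size;       /* number of bytes this mapping entry covers */
} mapping_entry_t;

/*
 * counts[index] tracks how many bytes of entry 'index' have become
 * obsolete.  Marking the same range obsolete twice (e.g. by remapping
 * one block through two different DVAs) pushes the counter past the
 * entry size and trips the check, as in the panic above.
 */
static void
increment_obsolete_count(const mapping_entry_t *entries, uint32_t *counts,
    uint64_t index, uint64_t inner_size)
{
    uint64_t size = entries[index].size;

    /* The spirit of: VERIFY3(counts[index] + inner_size <= size) */
    assert(counts[index] + inner_size <= size);
    counts[index] += inner_size;
}

int
main(void)
{
    mapping_entry_t entries[1] = { { .src_offset = 0, .size = 4096 } };
    uint32_t counts[1] = { 0 };

    increment_obsolete_count(entries, counts, 0, 4096); /* fine: 4096 <= 4096 */
    printf("obsolete: %u of %llu bytes\n", counts[0],
        (unsigned long long)entries[0].size);
    increment_obsolete_count(entries, counts, 0, 4096); /* 8192 <= 4096: aborts */
    return (0);
}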

Describe how to reproduce the problem

Boot with command line: single init=/bin/bash

zpool import -N <pool>

or

Booting with ZFS on root, when dracut imports the pool.

Include any warning/errors/backtraces from the system logs

@guxing1841 guxing1841 added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 29, 2023
@guxing1841
Author

I don't know what the problem is. The pool in question came from the previous system and had been working normally; the problem appeared when I set up root on ZFS. Exporting and destroying the pool got no response, so in the end I could only delete all of the pool's partitions.

I have since created a new pool, copied over the original data, and set up ZFS on root again, and now there are no problems.

@alexdrl

alexdrl commented Feb 19, 2024

I have the exact same issue on TrueNAS SCALE, on the same ZFS version I assume. For some reason (there was an indirect vdev left after removing a mirrored vdev) I can't see the indirect device now, and I am seeing these problems. I also need to remove a missing SLOG device, and I get this error :( #8748

@snajpa
Contributor

snajpa commented Nov 28, 2024

@guxing1841 @alexdrl

Two questions:

  • Can you please verify whether this is still a problem with more recent releases of the 2.2.x branch?
  • If it isn't, do you think we could figure out a reproducer for this? It would probably entail creating the pool on an older release, right? What else?

@pnc87

pnc87 commented Jan 13, 2025

Hi,

I'm having a very similar issue. About 7 or 8 days ago, one of the hard drives in one of my mirrored vdevs went down. All the other drives were reporting back as good, so I decided to offline the drive and remove the vdev entirely, since my overall pool has enough storage space to absorb the data. I reassigned the now-extra drive as a spare vdev for the pool. Everything seemed to run fine; I did a SMART test to make sure all the other drives were good, and a scrub to make sure the pool was still healthy. All checks came back with no errors.

Then last night I was trying to access my SMB share and noticed it was inaccessible, so I tried to log in to the TrueNAS server and realized it was completely frozen. This is the first time this has ever happened, so I rebooted it, and when it was starting it went directly into a kernel panic.

I've tried to research what may be causing this issue, but I haven't been able to find much aside from one issue that seemed unrelated. So far the only troubleshooting I could think of is running a MemTest86 test to make sure the memory is good, but aside from that I don't know what else to do.

[Attached photo of the kernel panic screen: IMG20250112123917]

@pnc87

pnc87 commented Jan 14, 2025

Hi! For anyone running into a similar issue and looking for further assistance, please see this thread: https://forums.truenas.com/t/kernel-panic-with-strange-error/30832/17

Shout out to the TrueNAS community for ALL the help!

@marcelfarres

Hi!

Another instance of the same problem.

I added a new vdev of 2 drives and removed the original vdev using the TrueNAS GUI.

It worked for a couple of days, with reboots and no problems, and today the server just went offline and got stuck in a boot loop with the same kernel panic.

amotin added a commit to amotin/zfs that referenced this issue Mar 26, 2025
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Signed-off-by:  Alexander Motin <[email protected]>
Sponsored by:   iXsystems, Inc.
Fixes openzfs#15604
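
In other words, per the commit message: the Block Reference Table (BRT) used by block cloning is keyed by the vdev id and offset taken from DVA[0] of a block pointer. If the device-removal remap path reaches a cloned block through a different DVA, the (vdev, offset) pair it derives no longer matches the key under which the clone reference was recorded, so the reference count is effectively missed. A rough sketch of that key mismatch follows, using hypothetical types and a toy lookup rather than the real brt.c interfaces.

/*
 * Toy illustration of the BRT lookup-key mismatch described in the
 * commit message above.  brt_key_t/brt_lookup() are hypothetical
 * stand-ins, not the actual Block Reference Table code in brt.c.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t vdev;   /* vdev id taken from a DVA */
    uint64_t offset; /* offset taken from the same DVA */
} brt_key_t;

typedef struct {
    brt_key_t key;     /* recorded from DVA[0] when the block was cloned */
    uint64_t refcount; /* number of extra clone references */
} brt_entry_t;

/* Linear lookup over a tiny "table"; returns 0 if no entry is found. */
static uint64_t
brt_lookup(const brt_entry_t *table, size_t n, brt_key_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].key.vdev == key.vdev &&
            table[i].key.offset == key.offset)
            return (table[i].refcount);
    }
    return (0); /* the block looks "not cloned" to the caller */
}

int
main(void)
{
    /* Clone reference recorded under DVA[0] = (vdev 1, offset 0x1000). */
    brt_entry_t table[1] = { { { .vdev = 1, .offset = 0x1000 }, 2 } };

    brt_key_t via_dva0 = { .vdev = 1, .offset = 0x1000 };
    brt_key_t via_dva1 = { .vdev = 2, .offset = 0x8000 }; /* same block, other copy */

    printf("lookup via DVA[0]: refcount %llu\n",
        (unsigned long long)brt_lookup(table, 1, via_dva0));
    printf("lookup via DVA[1]: refcount %llu (clone reference missed)\n",
        (unsigned long long)brt_lookup(table, 1, via_dva1));
    return (0);
}

That is why the fix simply skips blocks that might be cloned when handling block-pointer remaps after device removal, rather than risking updating or freeing them through the wrong key.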
behlendorf pushed a commit that referenced this issue Mar 26, 2025
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:  Alexander Motin <[email protected]>
Sponsored by:   iXsystems, Inc.
Closes #15604
Closes #17180