
Commit ebac8f0

robn authored and lundman committed
zio: lock parent zios when updating wait counts on reexecute
As zios are reexecuted after resume from suspension, their ready and wait states need to be propagated to wait counts on all their parents. It's possible for those parents to have active children passing through READY or DONE, which then end up in zio_notify_parent(), take their parent's lock, and decrement the wait count. Without also taking a lock here, it's possible for an increment race to occur, which leads to either there being no references left (tripping the assert in zio_notify_parent()), or a parent waiting forever for a nonexistent child to complete.

To protect against this, we simply take the appropriate zio locks in zio_reexecute() before updating the wait counts.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17016
1 parent 0d91f3b commit ebac8f0


1 file changed: +17 -1 lines changed


module/zfs/zio.c

Lines changed: 17 additions & 1 deletion
@@ -23,7 +23,7 @@
  * Copyright (c) 2011, 2022 by Delphix. All rights reserved.
  * Copyright (c) 2011 Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2017, Intel Corporation.
- * Copyright (c) 2019, 2023, 2024, Klara Inc.
+ * Copyright (c) 2019, 2023, 2024, 2025, Klara, Inc.
  * Copyright (c) 2019, Allan Jude
  * Copyright (c) 2021, Datto, Inc.
  * Copyright (c) 2021, 2024 by George Melikov. All rights reserved.
@@ -2537,13 +2537,29 @@ zio_reexecute(void *arg)
 	pio->io_state[ZIO_WAIT_READY] = (pio->io_stage >= ZIO_STAGE_READY) ||
 	    (pio->io_pipeline & ZIO_STAGE_READY) == 0;
 	pio->io_state[ZIO_WAIT_DONE] = (pio->io_stage >= ZIO_STAGE_DONE);
+
+	/*
+	 * It's possible for a failed ZIO to be a descendant of more than one
+	 * ZIO tree. When reexecuting it, we have to be sure to add its wait
+	 * states to all parent wait counts.
+	 *
+	 * Those parents, in turn, may have other children that are currently
+	 * active, usually because they've already been reexecuted after
+	 * resuming. Those children may be executing and may call
+	 * zio_notify_parent() at the same time as we're updating our parent's
+	 * counts. To avoid races while updating the counts, we take
+	 * gio->io_lock before each update.
+	 */
 	zio_link_t *zl = NULL;
 	while ((gio = zio_walk_parents(pio, &zl)) != NULL) {
+		mutex_enter(&gio->io_lock);
 		for (int w = 0; w < ZIO_WAIT_TYPES; w++) {
 			gio->io_children[pio->io_child_type][w] +=
 			    !pio->io_state[w];
 		}
+		mutex_exit(&gio->io_lock);
 	}
+
 	for (int c = 0; c < ZIO_CHILD_TYPES; c++)
 		pio->io_child_error[c] = 0;
 
