
Commit a2c1c57

workqueue: consider work function when searching for busy work items
To avoid executing the same work item concurrently, workqueue hashes currently busy workers according to their current work items and looks up the table when it wants to execute a new work item. If there already is a worker which is executing the new work item, the new item is queued to the found worker so that it gets executed only after the current execution finishes.

Unfortunately, a work item may be freed while being executed and thus recycled for different purposes. If it gets recycled for a different work item and queued while the previous execution is still in progress, workqueue may make the new work item wait for the old one although the two aren't related in any way.

In extreme cases, this false dependency may lead to deadlock, although it's extremely unlikely given that there aren't many self-freeing work item users and they usually don't wait for other work items.

To alleviate the problem, record the current work function in each busy worker and match it together with the work item address in find_worker_executing_work(). While this isn't complete, it ensures that unrelated work items don't interact with each other, and in the very unlikely case where a twisted wq user triggers it, the dependency is always onto itself, making the culprit easy to spot.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Andrey Isakov <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51701
Cc: [email protected]
1 parent 42f8570 commit a2c1c57
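
For readers unfamiliar with the failure mode, the sketch below is a minimal, hypothetical example of a self-freeing work item; it is not part of this commit, and foo, foo_work_fn and queue_foo are made-up names. The embedding structure is kfree()'d from inside its own work function, so the allocation can be recycled for an unrelated work item while the original work function is still running.

/* Hypothetical illustration only -- not from this patch. */
#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo {
	struct work_struct work;
	/* ... payload ... */
};

static void foo_work_fn(struct work_struct *work)
{
	struct foo *f = container_of(work, struct foo, work);

	/* ... consume the payload ... */
	kfree(f);	/* the allocation may be reused from here on ... */
	/* ... while this work function is still executing. */
}

static void queue_foo(struct workqueue_struct *wq)
{
	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

	if (!f)
		return;
	INIT_WORK(&f->work, foo_work_fn);
	queue_work(wq, &f->work);
}

If an unrelated work item is later allocated at the same address and queued before foo_work_fn() returns, a lookup keyed only on the work item's address would treat it as already executing and defer it behind foo_work_fn(); matching the work function as well confines any such false match to reuses of the same function.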

File tree

1 file changed: +31, -8 lines


kernel/workqueue.c

Lines changed: 31 additions & 8 deletions
@@ -137,6 +137,7 @@ struct worker {
 	};
 
 	struct work_struct	*current_work;	/* L: work being processed */
+	work_func_t		current_func;	/* L: current_work's fn */
 	struct cpu_workqueue_struct *current_cwq; /* L: current_work's cwq */
 	struct list_head	scheduled;	/* L: scheduled works */
 	struct task_struct	*task;		/* I: worker task */
@@ -861,9 +862,27 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
  * @gcwq: gcwq of interest
  * @work: work to find worker for
  *
- * Find a worker which is executing @work on @gcwq. This function is
- * identical to __find_worker_executing_work() except that this
- * function calculates @bwh itself.
+ * Find a worker which is executing @work on @gcwq by searching
+ * @gcwq->busy_hash which is keyed by the address of @work. For a worker
+ * to match, its current execution should match the address of @work and
+ * its work function. This is to avoid unwanted dependency between
+ * unrelated work executions through a work item being recycled while still
+ * being executed.
+ *
+ * This is a bit tricky. A work item may be freed once its execution
+ * starts and nothing prevents the freed area from being recycled for
+ * another work item. If the same work item address ends up being reused
+ * before the original execution finishes, workqueue will identify the
+ * recycled work item as currently executing and make it wait until the
+ * current execution finishes, introducing an unwanted dependency.
+ *
+ * This function checks the work item address, work function and workqueue
+ * to avoid false positives. Note that this isn't complete as one may
+ * construct a work function which can introduce dependency onto itself
+ * through a recycled work item. Well, if somebody wants to shoot oneself
+ * in the foot that badly, there's only so much we can do, and if such
+ * deadlock actually occurs, it should be easy to locate the culprit work
+ * function.
  *
  * CONTEXT:
  * spin_lock_irq(gcwq->lock).
@@ -878,8 +897,10 @@ static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
 	struct worker *worker;
 	struct hlist_node *tmp;
 
-	hash_for_each_possible(gcwq->busy_hash, worker, tmp, hentry, (unsigned long)work)
-		if (worker->current_work == work)
+	hash_for_each_possible(gcwq->busy_hash, worker, tmp, hentry,
+			       (unsigned long)work)
+		if (worker->current_work == work &&
+		    worker->current_func == work->func)
 			return worker;
 
 	return NULL;
@@ -2114,7 +2135,6 @@ __acquires(&gcwq->lock)
 	struct worker_pool *pool = worker->pool;
 	struct global_cwq *gcwq = pool->gcwq;
 	bool cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE;
-	work_func_t f = work->func;
 	int work_color;
 	struct worker *collision;
 #ifdef CONFIG_LOCKDEP
@@ -2154,6 +2174,7 @@ __acquires(&gcwq->lock)
 	debug_work_deactivate(work);
 	hash_add(gcwq->busy_hash, &worker->hentry, (unsigned long)worker);
 	worker->current_work = work;
+	worker->current_func = work->func;
 	worker->current_cwq = cwq;
 	work_color = get_work_color(work);
 
@@ -2186,7 +2207,7 @@ __acquires(&gcwq->lock)
 	lock_map_acquire_read(&cwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
 	trace_workqueue_execute_start(work);
-	f(work);
+	worker->current_func(work);
 	/*
 	 * While we must be careful to not use "work" after this, the trace
 	 * point will only record its address.
@@ -2198,7 +2219,8 @@ __acquires(&gcwq->lock)
 	if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
 		pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d\n"
 		       "      last function: %pf\n",
-		       current->comm, preempt_count(), task_pid_nr(current), f);
+		       current->comm, preempt_count(), task_pid_nr(current),
+		       worker->current_func);
 		debug_show_held_locks(current);
 		dump_stack();
 	}
@@ -2212,6 +2234,7 @@ __acquires(&gcwq->lock)
 	/* we're done with it, release */
 	hash_del(&worker->hentry);
 	worker->current_work = NULL;
+	worker->current_func = NULL;
 	worker->current_cwq = NULL;
 	cwq_dec_nr_in_flight(cwq, work_color);
 }
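
For context, find_worker_executing_work() is consumed by process_one_work(); the excerpt below is a paraphrased sketch of how the collision result was used in the workqueue code of this era, and is not part of this diff.

	/*
	 * A single work item shouldn't be executed concurrently by
	 * multiple workers on a single cpu: if somebody is already
	 * executing this work item, defer ours to that worker.
	 */
	collision = find_worker_executing_work(gcwq, work);
	if (unlikely(collision)) {
		move_linked_works(work, &collision->scheduled, NULL);
		return;
	}

With the stricter match introduced here, a recycled but unrelated work item no longer takes this deferral path behind the stale execution.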
