
Commit 2dcb96b

KAGA-KOKO authored and davem330 committed
net: core: Correct the sock::sk_lock.owned lockdep annotations
lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex; it is merely lockdep-wise equivalent. In fact it is an open-coded trivial mutex implementation with some interesting features.

sock::sk_lock.slock is a regular spinlock protecting the 'mutex' representation sock::sk_lock.owned, which is a plain boolean. If 'owned' is true, then some other task holds the 'mutex', otherwise it is uncontended. As this locking construct is endangered by lock ordering issues like any other locking primitive, it got lockdep annotated via a dedicated dependency map, sock::sk_lock.dep_map, which has to be updated at the lock and unlock sites.

lock_sock_nested() is a straightforward 'mutex' lock operation:

  might_sleep();
  spin_lock_bh(sock::sk_lock.slock)
  while (!try_lock(sock::sk_lock.owned)) {
      spin_unlock_bh(sock::sk_lock.slock);
      wait_for_release();
      spin_lock_bh(sock::sk_lock.slock);
  }

The lockdep annotation for sock::sk_lock.owned is for unknown reasons placed _after_ the lock has been acquired, i.e. after the code block above and after releasing sock::sk_lock.slock, but inside the bottom-halves-disabled region:

  spin_unlock(sock::sk_lock.slock);
  mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
  local_bh_enable();

The placement after the unlock is obvious because otherwise the mutex_acquire() would nest into the spinlock-held region. But from the lockdep perspective it is still the wrong place:

 1) The mutex_acquire() is issued _after_ the successful acquisition, which is pointless because in a deadlock scenario this point is never reached. That means if the deadlock is the first instance of exposing the wrong lock order, lockdep has no chance to detect it.

 2) It only works because lockdep is rather lax about the context from which the mutex_acquire() is issued. Acquiring a mutex inside a bottom-halves and therefore non-preemptible region is obviously invalid, except for a trylock, which is clearly not the case here.

    This 'works' stops working on RT-enabled kernels, where the bottom-halves serialization is done via a local lock. That exposes this misplacement because the 'mutex' and the local lock nest the wrong way around, and lockdep rightfully complains about a lock inversion.

The placement has been wrong since the initial commit a5b5bb9 ("[PATCH] lockdep: annotate sk_locks") which introduced it.

Fix it by moving the mutex_acquire() in front of the actual lock acquisition, which is what the regular mutex_lock() operation does as well.

lock_sock_fast() is not that straightforward. At first glance it looks like a convoluted trylock operation:

  spin_lock_bh(sock::sk_lock.slock)
  if (!sock::sk_lock.owned)
      return false;
  while (!try_lock(sock::sk_lock.owned)) {
      spin_unlock_bh(sock::sk_lock.slock);
      wait_for_release();
      spin_lock_bh(sock::sk_lock.slock);
  }
  spin_unlock(sock::sk_lock.slock);
  mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
  local_bh_enable();
  return true;

But that's not the case: lock_sock_fast() is an interesting optimization for short critical sections which can run with bottom halves disabled and sock::sk_lock.slock held. This shortcuts the 'mutex' operation in the non-contended case by preventing other lockers from acquiring sock::sk_lock.owned, because they are blocked on sock::sk_lock.slock. That in turn avoids the overhead of doing the heavy processing in release_sock(), including waking up wait queue waiters.

In the contended case, i.e. when sock::sk_lock.owned == true, the behavior is the same as lock_sock_nested().

Semantically this shortcut means that the task acquired the 'mutex' even though it does not touch the sock::sk_lock.owned field in the non-contended case. Not telling lockdep about this shortcut acquisition hides potential lock ordering violations in the fast path. As a consequence, the same reasoning as for the lock_sock_nested() case above applies to the placement of the lockdep annotation.

The current placement of the lockdep annotation was simply copied from the original lock_sock() implementation, now renamed to lock_sock_nested().

Fix this by moving the mutex_acquire() in front of the actual lock acquisition and adding the corresponding mutex_release() to unlock_sock_fast(). Also document the fast-path return case with a comment.

Reported-by: Sebastian Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
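
For reference, the calling convention the commit message relies on: lock_sock_fast() returns true when it had to fall back to the full 'mutex' (the slow path), and that value must be handed back to unlock_sock_fast() so the matching release path is taken. A minimal sketch of a caller follows; the surrounding function example_read_sk_err() is hypothetical, only the lock_sock_fast()/unlock_sock_fast() pair is the actual kernel API.

  /* Hypothetical caller illustrating the fast-lock pattern for a short
   * critical section.
   */
  static int example_read_sk_err(struct sock *sk)
  {
          bool slow;
          int err;

          /* true means the 'mutex' was contended and the slow path
           * (full lock_sock() semantics) is in effect.
           */
          slow = lock_sock_fast(sk);

          err = sk->sk_err;       /* short critical section */

          /* Pass the value back so unlock_sock_fast() knows whether to
           * go through release_sock() or just drop slock.
           */
          unlock_sock_fast(sk, slow);
          return err;
  }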
1 parent 48e6d08 commit 2dcb96b

2 files changed: 24 additions, 14 deletions


include/net/sock.h

Lines changed: 1 addition & 0 deletions
@@ -1640,6 +1640,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
                 release_sock(sk);
                 __release(&sk->sk_lock.slock);
         } else {
+                mutex_release(&sk->sk_lock.dep_map, _RET_IP_);
                 spin_unlock_bh(&sk->sk_lock.slock);
         }
 }
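
Assembled from the hunk above, the inline helper reads roughly as follows with this change applied (sparse annotations elided). Only the mutex_release() line is new; the rest is the unchanged context shown in the diff.

  static inline void unlock_sock_fast(struct sock *sk, bool slow)
  {
          if (slow) {
                  /* Slow path: the 'mutex' was really taken, go through
                   * release_sock() which also wakes up waiters.
                   */
                  release_sock(sk);
                  __release(&sk->sk_lock.slock);
          } else {
                  /* Fast path: drop the lockdep 'mutex' annotation taken
                   * in lock_sock_fast(), then just release slock.
                   */
                  mutex_release(&sk->sk_lock.dep_map, _RET_IP_);
                  spin_unlock_bh(&sk->sk_lock.slock);
          }
  }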

net/core/sock.c

Lines changed: 23 additions & 14 deletions
@@ -3179,17 +3179,15 @@ EXPORT_SYMBOL(sock_init_data);
 
 void lock_sock_nested(struct sock *sk, int subclass)
 {
+        /* The sk_lock has mutex_lock() semantics here. */
+        mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
+
         might_sleep();
         spin_lock_bh(&sk->sk_lock.slock);
         if (sk->sk_lock.owned)
                 __lock_sock(sk);
         sk->sk_lock.owned = 1;
-        spin_unlock(&sk->sk_lock.slock);
-        /*
-         * The sk_lock has mutex_lock() semantics here:
-         */
-        mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
-        local_bh_enable();
+        spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL(lock_sock_nested);

@@ -3227,24 +3225,35 @@ EXPORT_SYMBOL(release_sock);
  */
 bool lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock)
 {
+        /* The sk_lock has mutex_lock() semantics here. */
+        mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+
         might_sleep();
         spin_lock_bh(&sk->sk_lock.slock);
 
-        if (!sk->sk_lock.owned)
+        if (!sk->sk_lock.owned) {
                 /*
-                 * Note : We must disable BH
+                 * Fast path return with bottom halves disabled and
+                 * sock::sk_lock.slock held.
+                 *
+                 * The 'mutex' is not contended and holding
+                 * sock::sk_lock.slock prevents all other lockers to
+                 * proceed so the corresponding unlock_sock_fast() can
+                 * avoid the slow path of release_sock() completely and
+                 * just release slock.
+                 *
+                 * From a semantical POV this is equivalent to 'acquiring'
+                 * the 'mutex', hence the corresponding lockdep
+                 * mutex_release() has to happen in the fast path of
+                 * unlock_sock_fast().
                  */
                 return false;
+        }
 
         __lock_sock(sk);
         sk->sk_lock.owned = 1;
-        spin_unlock(&sk->sk_lock.slock);
-        /*
-         * The sk_lock has mutex_lock() semantics here:
-         */
-        mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
         __acquire(&sk->sk_lock.slock);
-        local_bh_enable();
+        spin_unlock_bh(&sk->sk_lock.slock);
         return true;
 }
 EXPORT_SYMBOL(lock_sock_fast);
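
For readability, roughly how lock_sock_fast() reads with the patch applied, assembled from the hunk above (the long fast-path comment is abbreviated here).

  bool lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock)
  {
          /* The sk_lock has mutex_lock() semantics here. */
          mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);

          might_sleep();
          spin_lock_bh(&sk->sk_lock.slock);

          if (!sk->sk_lock.owned) {
                  /* Fast path return with BH disabled and slock held;
                   * the matching mutex_release() happens in the fast
                   * path of unlock_sock_fast().
                   */
                  return false;
          }

          __lock_sock(sk);
          sk->sk_lock.owned = 1;
          __acquire(&sk->sk_lock.slock);
          spin_unlock_bh(&sk->sk_lock.slock);
          return true;
  }
  EXPORT_SYMBOL(lock_sock_fast);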
