Commit 305d568

RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
on the same id_priv. While this cannot happen without going through the
work, it violates the invariant that the same address resolution
background request cannot be active twice.

       CPU 1                                  CPU 2

rdma_resolve_addr():
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)  #1

                         process_one_req(): for #1
                          addr_handler():
                            RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                            mutex_unlock(&id_priv->handler_mutex);
                            [.. handler still running ..]

rdma_resolve_addr():
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)
    !! two requests are now on the req_list

rdma_destroy_id():
 destroy_id_handler_unlock():
  _destroy_id():
   cma_cancel_operation():
    rdma_addr_cancel()

                          // process_one_req() self removes it
                          spin_lock_bh(&lock);
                           cancel_delayed_work(&req->work);
                           if (!list_empty(&req->list)) == true

      ! rdma_addr_cancel() returns after process_one_req #1 is done

   kfree(id_priv)

                         process_one_req(): for #2
                          addr_handler():
                            mutex_lock(&id_priv->handler_mutex);
                            !! Use after free on id_priv

rdma_addr_cancel() expects there to be one req on the list and only
cancels the first one. The self-removal behavior of the work only happens
after the handler has returned. This yields a situation where the
req_list can have two reqs for the same "handle" but rdma_addr_cancel()
only cancels the first one.

The second req remains active beyond rdma_destroy_id() and will
use-after-free id_priv once it inevitably triggers.

Fix this by remembering if the id_priv has called rdma_resolve_ip() and
always cancel before calling it again. This ensures the req_list never
gets more than one item in it and doesn't cost anything in the normal flow
that never uses this strange error path.

Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: e51060f ("IB: IP address based RDMA connection manager")
Reported-by: [email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
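The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `req_list_model`, `id_model`, `queue_req()`, `cancel_first()` and the other helpers are hypothetical names invented for illustration, and the model assumes nothing about the real request list beyond what the message states, namely that a cancel removes only the first request matching a handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 4

/* Hypothetical stand-in for req_list: each non-NULL slot is one pending
 * request, identified by the "handle" it was queued for. */
struct req_list_model {
	const void *handle[MAX_REQS];
};

/* Hypothetical stand-in for id_priv; only the new flag is modeled. */
struct id_model {
	unsigned char used_resolve_ip;
};

/* Queue one request for the given handle in the first free slot. */
static void queue_req(struct req_list_model *list, const void *handle)
{
	for (size_t i = 0; i < MAX_REQS; i++) {
		if (list->handle[i] == NULL) {
			list->handle[i] = handle;
			return;
		}
	}
	assert(false && "model req_list is full");
}

/* Like the cancel described above: removes only the FIRST matching req. */
static void cancel_first(struct req_list_model *list, const void *handle)
{
	for (size_t i = 0; i < MAX_REQS; i++) {
		if (list->handle[i] == handle) {
			list->handle[i] = NULL;
			return;
		}
	}
}

/* Count the requests still pending for a handle. */
static size_t pending_for(const struct req_list_model *list, const void *handle)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_REQS; i++)
		if (list->handle[i] == handle)
			n++;
	return n;
}

/* Pre-fix behavior: issue a new request without canceling the old one. */
static void resolve_unfixed(struct req_list_model *list, struct id_model *id)
{
	queue_req(list, id);
}

/* Post-fix behavior: cancel before every request except the first. */
static void resolve_fixed(struct req_list_model *list, struct id_model *id)
{
	if (id->used_resolve_ip)
		cancel_first(list, id);
	else
		id->used_resolve_ip = 1;
	queue_req(list, id);
}

/* Resolve twice, run the single cancel of the destroy path, and report
 * how many requests still reference the (about to be freed) id. */
static size_t leftover_after_destroy(bool fixed)
{
	struct req_list_model list = { { NULL } };
	struct id_model id = { 0 };

	for (int i = 0; i < 2; i++) {
		if (fixed)
			resolve_fixed(&list, &id);
		else
			resolve_unfixed(&list, &id);
	}
	cancel_first(&list, &id);
	return pending_for(&list, &id);
}
```

In this model `leftover_after_destroy(false)` leaves one request behind, which corresponds to the req that later touches the freed id_priv, while `leftover_after_destroy(true)` leaves none.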
1 parent bc0bdc5 commit 305d568

2 files changed: 24 additions, 0 deletions

drivers/infiniband/core/cma.c

Lines changed: 23 additions & 0 deletions
@@ -1783,6 +1783,14 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
 {
 	switch (state) {
 	case RDMA_CM_ADDR_QUERY:
+		/*
+		 * We can avoid doing the rdma_addr_cancel() based on state,
+		 * only RDMA_CM_ADDR_QUERY has a work that could still execute.
+		 * Notice that the addr_handler work could still be exiting
+		 * outside this state, however due to the interaction with the
+		 * handler_mutex the work is guaranteed not to touch id_priv
+		 * during exit.
+		 */
 		rdma_addr_cancel(&id_priv->id.route.addr.dev_addr);
 		break;
 	case RDMA_CM_ROUTE_QUERY:
@@ -3425,6 +3433,21 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	if (dst_addr->sa_family == AF_IB) {
 		ret = cma_resolve_ib_addr(id_priv);
 	} else {
+		/*
+		 * The FSM can return back to RDMA_CM_ADDR_BOUND after
+		 * rdma_resolve_ip() is called, eg through the error
+		 * path in addr_handler(). If this happens the existing
+		 * request must be canceled before issuing a new one.
+		 * Since canceling a request is a bit slow and this
+		 * oddball path is rare, keep track once a request has
+		 * been issued. The track turns out to be a permanent
+		 * state since this is the only cancel as it is
+		 * immediately before rdma_resolve_ip().
+		 */
+		if (id_priv->used_resolve_ip)
+			rdma_addr_cancel(&id->route.addr.dev_addr);
+		else
+			id_priv->used_resolve_ip = 1;
 		ret = rdma_resolve_ip(cma_src_addr(id_priv), dst_addr,
 				      &id->route.addr.dev_addr,
 				      timeout_ms, addr_handler,
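
The comment in the second hunk notes that the extra cancel is only paid on the rare path where the FSM returns to RDMA_CM_ADDR_BOUND. A rough userspace sketch of that cost argument follows. Everything in it is hypothetical and simplified: the `CM_*` states, `fsm_model`, `model_resolve_addr()` and `model_addr_handler()` are illustrative names, the "resolved" success state is an assumption not shown in this diff, and the counters merely stand in for calls to rdma_resolve_ip() and rdma_addr_cancel().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical subset of the connection manager states. */
enum cm_state { CM_IDLE, CM_ADDR_QUERY, CM_ADDR_BOUND, CM_ADDR_RESOLVED };

struct fsm_model {
	enum cm_state state;
	unsigned char used_resolve_ip;  /* mirrors the new flag */
	unsigned int requests_issued;   /* stands in for rdma_resolve_ip() */
	unsigned int cancels_issued;    /* stands in for rdma_addr_cancel() */
};

/* Models the non-AF_IB branch of the resolve path: enter the query state,
 * cancel first if a request was ever issued, then issue a new request.
 * Returns false if the current state does not allow a resolve. */
static bool model_resolve_addr(struct fsm_model *id)
{
	if (id->state != CM_IDLE && id->state != CM_ADDR_BOUND)
		return false;
	id->state = CM_ADDR_QUERY;

	if (id->used_resolve_ip)
		id->cancels_issued++;
	else
		id->used_resolve_ip = 1;
	id->requests_issued++;
	return true;
}

/* Models the handler: on error the FSM goes back to the bound state,
 * which is what allows a second resolve on the same id. */
static void model_addr_handler(struct fsm_model *id, bool error)
{
	if (id->state != CM_ADDR_QUERY)
		return;
	id->state = error ? CM_ADDR_BOUND : CM_ADDR_RESOLVED;
}
```

Under these assumptions, a resolve followed by a successful handler run issues one request and no cancel, while a resolve, a failed handler run, and a second resolve issues two requests and exactly one cancel. The flag is never cleared, matching the "permanent state" wording in the comment.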

drivers/infiniband/core/cma_priv.h

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ struct rdma_id_private {
 	u8 afonly;
 	u8 timeout;
 	u8 min_rnr_timer;
+	u8 used_resolve_ip;
 	enum ib_gid_type gid_type;
 
 	/*