New vader SEGV possibly PR 5829 #5842

Closed
@gpaulsen

@amckinstry commented on #5829

Original issue was: #5638

Since that was merged to all release branches, this is a blocker on all release branches.

Unfortunately, we now see a related crash in other codes (LAMMPS):

#0 0x0000000000000000 in ()
#1 0x00007f7a700f8a5f in mca_btl_vader_poll_handle_frag (hdr=0x7f7a6a1bf049, endpoint=endpoint@entry=0x55e02df87bd0) at btl_vader_component.c:603
#2 0x00007f7a700f8f83 in mca_btl_vader_check_fboxes () at btl_vader_fbox.h:225
#3 0x00007f7a700f8f83 in mca_btl_vader_component_progress () at btl_vader_component.c:702
#4 0x00007f7a828fb1cc in opal_progress () at runtime/opal_progress.c:228
#5 0x00007f7a8d816ebd in ompi_request_wait_completion (req=0x55e02df8d900) at ../ompi/request/request.h:413
#6 0x00007f7a8d816ebd in ompi_request_default_wait (req_ptr=0x7fffa70c3708, status=0x7fffa70c3710) at request/req_wait.c:42
#7 0x00007f7a8d86ec41 in ompi_coll_base_sendrecv_actual (sendbuf=sendbuf@entry=0x55e02df97de0, scount=scount@entry=1, sdatatype=sdatatype@entry=0x55e02cace1e0 <ompi_mpi_int>, dest=dest@entry=0, stag=stag@entry=-12, recvbuf=recvbuf@entry=0x7fffa70c3934, rcount=1, rdatatype=0x55e02cace1e0 <ompi_mpi_int>, source=0, rtag=-12, comm=0x55e02cacf700 <ompi_mpi_comm_world>, status=0x0) at base/coll_base_util.c:59

It looks like hdr->tag is invalid, hence the segfault.
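
Frame #0 sits at address 0x0, which is what a call through a NULL function pointer looks like. The sketch below shows how a bad tag could produce that, assuming receive handlers are looked up in a table indexed by the tag from the fragment header. All names here (sketch_triggers, sketch_register, sketch_dispatch, sketch_on_message) are hypothetical and this is not the btl_vader source; it only illustrates the failure mode and a defensive check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch, not Open MPI code: a callback table indexed by tag. */
#define SKETCH_TAG_MAX 256

typedef void (*sketch_cb_fn)(uint8_t tag, const void *payload, size_t len);

typedef struct {
    sketch_cb_fn cbfunc;   /* NULL until a handler is registered for the tag */
} sketch_trigger_t;

/* File-scope array: zero-initialized, so every cbfunc starts out NULL. */
sketch_trigger_t sketch_triggers[SKETCH_TAG_MAX];

/* Counts handler invocations so callers can observe a dispatch. */
int sketch_calls = 0;

/* Example handler: only records that it ran. */
void sketch_on_message(uint8_t tag, const void *payload, size_t len)
{
    (void) tag;
    (void) payload;
    (void) len;
    sketch_calls++;
}

/* Register a handler for one tag. Always returns 0 in this sketch. */
int sketch_register(uint8_t tag, sketch_cb_fn fn)
{
    sketch_triggers[tag].cbfunc = fn;
    return 0;
}

/*
 * Dispatch a received fragment by tag.
 * Calling sketch_triggers[tag].cbfunc unconditionally would jump to
 * address 0 whenever the tag has no handler, e.g. when the header was
 * read before it was fully written. The NULL check turns that crash
 * into an error return. Returns 0 on dispatch, -1 if no handler exists.
 */
int sketch_dispatch(uint8_t tag, const void *payload, size_t len)
{
    sketch_cb_fn fn = sketch_triggers[tag].cbfunc;

    if (NULL == fn) {
        fprintf(stderr, "no handler registered for tag %u\n", (unsigned) tag);
        return -1;
    }

    fn(tag, payload, len);
    return 0;
}
```

Without the NULL check, dispatching an unregistered tag would reproduce the shape of the backtrace above: the caller's frame followed by a frame at 0x0. The check only hides the symptom, though; if the tag itself is garbage, the underlying problem is whatever let the reader see an incomplete or corrupted header.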
