Cisco just added OSHMEM testing to its MTT 2 weeks ago (at the Dallas engineering meeting).
We're seeing a large failure rate on v2.x with OSHMEM testing using TCP,vader,self. For example: https://mtt.open-mpi.org/index.php?do_redir=2347
This shows 1,624 failures and 6,546 passes, i.e., a failure rate of nearly 20% (1,624 of 8,170 results). 😱
Many of the failures show this kind of error message:
[mpi006:31821] Error base/memheap_base_mkey.c:162 - memheap_attach_segment() tr_id: 1 key 54ba0015
attach failed: errno = 12
Does anyone know what this means?
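As a starting point for decoding the message: errno values are platform-specific, but on Linux 12 is ENOMEM. A minimal sketch for confirming that on one of the test nodes is below; `describe_errno` is a hypothetical helper written for this illustration, not part of Open MPI or MTT. It only translates the number and does not tell us which call inside memheap_attach_segment() failed or why the memory was unavailable.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value on this platform."""
    # errno.errorcode maps numeric values to symbolic names (e.g. 12 -> 'ENOMEM').
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable text for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 12 is the value reported in the "attach failed: errno = 12" line.
    print(describe_errno(12))
```

If this does report ENOMEM, one possible next step (an assumption, not something confirmed by the logs above) is to check shared-memory limits and leftover segments on the failing nodes.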