OSHMEM yoda spml failures: need to update to BTL v3.0 #2028

Closed
@jsquyres


Cisco just added OSHMEM testing to its MTT 2 weeks ago (at the Dallas engineering meeting).

We're seeing a large failure rate on v2.x in OSHMEM tests run with the tcp,vader,self BTLs. For example: https://mtt.open-mpi.org/index.php?do_redir=2347

This shows 1,624 failures and 6,546 passes, i.e., a failure rate of nearly 20%. 😱

Many of the failures show this kind of error message:

[mpi006:31821] Error base/memheap_base_mkey.c:162 - memheap_attach_segment() tr_id: 1 key 54ba0015
attach failed: errno = 12

Does anyone know what this means?
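
As a starting point for triage, the errno value in the message can be decoded into its symbolic name. This is only a minimal sketch: it assumes the value printed by memheap_attach_segment() is a standard system errno from the test host, and `decode_errno` is a hypothetical helper name, not part of Open MPI or OSHMEM.

```python
import errno
import os


def decode_errno(code):
    """Return (symbolic name, human-readable message) for an errno value.

    Hypothetical helper for illustration; it only consults the local
    platform's errno table, which is assumed to match the test host.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 12 is the value taken from "attach failed: errno = 12" above.
    name, message = decode_errno(12)
    print(f"errno 12 -> {name}: {message}")
```

On Linux, 12 maps to ENOMEM, which suggests the attach ran out of memory or hit a related resource limit. It does not by itself say which underlying call failed or why.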

@artpol84 @jladd-mlnx
