Device to Device transfers don't work with OpenMPI + LinkX provider on AMD GPUs #13048

Open
@angainor

OpenMPI 5.0.6 with the shm+cxi:lnx provider fails to perform Device-to-Device transfers on the LUMI system (AMD GPUs) when running the OSU benchmark. Host-to-Host transfers work as expected for both intra-node and inter-node cases. For Device-to-Device transfers, OpenMPI fails with the following error:

export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: D D

# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
--------------------------------------------------------------------------
Open MPI failed to register your buffer.
This error is fatal, your job will abort

  Buffer Type: rocm
  Buffer Address: 0x154beaa00000
  Buffer Length: 131072
  Error: Required key not available (4294967030)
--------------------------------------------------------------------------
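
The error value 4294967030 in the output above looks like a negative error code printed as an unsigned 32-bit integer. A minimal sketch of decoding it follows; the variable names `raw` and `signed` are illustrative only. The result, -266, appears to correspond to libfabric's `FI_ENOKEY`, whose message string matches "Required key not available", but that mapping is an assumption and is not stated in the report.

```shell
# Value copied from the "Error:" line of the Open MPI message.
raw=4294967030

# Reinterpret it as a signed 32-bit integer: values of 2^31 or more
# wrap around to negative numbers when 2^32 is subtracted.
if [ "$raw" -ge 2147483648 ]; then
  signed=$(( raw - 4294967296 ))
else
  signed=$raw
fi

echo "$signed"
```

If the assumption holds, the failure would be a memory registration key problem in the provider rather than a transfer error as such.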

@hppritcha identified the problem as related to #11076. A fix for that issue was made in #12290, but it was not merged into the 5.x branch.
