Open
Description
OpenMPI 5.0.6 with shm+cxi:lnx
fails to perform Device - Device transfers on LUMI system (AMD GPUs) with OSU benchmark. Host - Host transfers work as expected for intra- and inter-node transfers. For Device - Device transfers OpenMPI fails with
export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: D D
# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
--------------------------------------------------------------------------
Open MPI failed to register your buffer.
This error is fatal, your job will abort
Buffer Type: rocm
Buffer Address: 0x154beaa00000
Buffer Length: 131072
Error: Required key not available (4294967030)
--------------------------------------------------------------------------
@hppritcha identified the problem to be related to #11076. There was a fix for this issue in #12290, but it was not merged to the 5.x branch.