Closed
Description
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.0.x
master
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
./configure --prefix=`pwd`/install --enable-orterun-prefix-by-default --with-slurm --with-pmi --with-ucx
Please describe the system on which you are running
- Operating system/version: RedHat 7.2
- Computer hardware: Intel dual socket Broadwell
- Network type: IB
Details of the problem
Running on nodes node1,node2 works well, but changing the order of the nodes to node2,node1 results in a failure:
ssh node1
mpirun --bind-to core --map-by node -H node2,node1 -np 2 $HPCX_MPI_DIR/tests/osu-micro-benchmarks-5.3.2/osu_allreduce
--------------------------------------------------------------------------
[node2:13941] Error: pml_yalla.c:95 - recv_ep_address() Failed to receive EP address
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
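To make the comparison between the two host orderings easier to repeat, here is a minimal sketch of a reproduction script. It reuses the mpirun options and benchmark path from the command above; the `MPIRUN` and `BENCH` variables and the `try_host_order` helper are hypothetical names introduced only for this illustration, and the script assumes it is run from node1 as in the report.

```shell
#!/bin/sh
# Launcher and benchmark are overridable (hypothetical variables);
# the defaults are taken from the command in this report.
MPIRUN="${MPIRUN:-mpirun}"
BENCH="${BENCH:-$HPCX_MPI_DIR/tests/osu-micro-benchmarks-5.3.2/osu_allreduce}"

# Run the benchmark with the given -H host list and print PASS or FAIL
# depending on the launcher's exit status.
try_host_order() {
    hosts="$1"
    if "$MPIRUN" --bind-to core --map-by node -H "$hosts" -np 2 "$BENCH" >/dev/null 2>&1; then
        echo "PASS $hosts"
    else
        echo "FAIL $hosts"
    fi
}

# Try both orderings; only the host order differs between the two runs.
for hosts in node1,node2 node2,node1; do
    try_host_order "$hosts"
done
```

The output is discarded so that only the pass/fail summary is printed; removing the `>/dev/null 2>&1` redirection shows the full error text for the failing ordering.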