Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.7
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Obtained from https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.gz
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
N/A
Please describe the system on which you are running
- Operating system/version: Ubuntu 24.04 LTS
- Computer hardware: x86_64
- Network type: single node, single GPU
Details of the problem
I'm really struggling to run CUDA-aware MPI on just one node. I want to do this so that I can test my code locally before deploying to a cluster. I've reproduced this on fresh installs of Ubuntu 24.04 on two different machines.
Here are my install steps:
- Install the NVIDIA CUDA toolkit from https://developer.nvidia.com/cuda-downloads
- Run the commands below to build Open MPI from source:
tar xf openmpi-5.0.7.tar.gz
cd openmpi-5.0.7
mkdir build
cd build
../configure --with-cuda=/usr/local/cuda --prefix=/opt/openmpi | tee config.out
make -j$(nproc) all | tee make.out
sudo make install
Next, I built a very simple test program:
// mpi_check.c
#include "mpi.h"
#include <stdio.h>

#if !defined(OPEN_MPI) || !OPEN_MPI
#error This source code uses an Open MPI-specific extension
#endif

/* Needed for MPIX_Query_cuda_support(), below */
#include "mpi-ext.h"

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    printf("Compile time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("This MPI library has CUDA-aware support.\n");
#else
    printf("This MPI library does not have CUDA-aware support.\n");
#endif /* MPIX_CUDA_AWARE_SUPPORT */

    printf("Run time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    if (1 == MPIX_Query_cuda_support()) {
        printf("This MPI library has CUDA-aware support.\n");
    } else {
        printf("This MPI library does not have CUDA-aware support.\n");
    }
#endif /* MPIX_CUDA_AWARE_SUPPORT */

    MPI_Finalize();
    return 0;
}
This was built and run with:
/opt/openmpi/bin/mpicc mpi_check.c -o mpi_check
/opt/openmpi/bin/mpirun -n 1 ./mpi_check
This produces the following output:
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.
However, if I run ./mpi_check directly (i.e. without mpirun), I get this output:
Authorization required, but no authorization protocol specified
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.
There are no other MPI installations on either machine, and the problem reproduces on both of them independently.
Perhaps I'm missing a step or some configuration, but I've tried many variations of each of the commands above without success, and, as far as I can tell, I've followed the installation instructions in the documentation correctly. So I believe this is a bug.
If I'm missing something, please let me know. Also, let me know if you'd like the config.out and make.out log files.