Description
MPI and System
Open MPI from the NVIDIA HPC SDK; as far as I understand, it was built from the HPC-X 2.15 sources.
ompi_info
Package: Open MPI qa@sky1 Distribution
Open MPI: 4.1.5rc2
Open MPI repo revision: v4.1.5rc1-16-g5980bac
Open MPI release date: Unreleased developer copy
Open RTE: 4.1.5rc2
Open RTE repo revision: v4.1.5rc1-16-g5980bac
Open RTE release date: Unreleased developer copy
OPAL: 4.1.5rc2
OPAL repo revision: v4.1.5rc1-16-g5980bac
OPAL release date: Unreleased developer copy
MPI API: 3.1.0
Ident string: 4.1.5rc2
Prefix: /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi
Configured architecture: x86_64-pc-linux-gnu
Configure host: sky1
Configured by: qa
Configured on: Wed May 10 16:39:18 UTC 2023
Configure host: sky1
Configure command line: 'CC=gcc' 'CXX=g++' 'FC=nvfortran'
'LDFLAGS=-Wl,-rpath-link=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/ucx/lib
-Wl,-rpath-link=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/hcoll/lib'
'--with-platform=../contrib/platform/nvhpc/optimized'
'--enable-mpi1-compatibility'
'--with-libevent=internal' '--without-xpmem'
'--with-slurm'
'--with-cuda=/proj/cuda/12.1/Linux_x86_64'
'--with-hcoll=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/hcoll'
'--with-ucc=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/ucc'
'--with-ucx=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/ucx'
'--prefix=/proj/nv/libraries/Linux_x86_64/23.5/hpcx-12/229172-rel-1/comm_libs/12.1/hpcx/hpcx-2.15/ompi'
Built by: qa
Built on: Wed May 10 16:43:47 UTC 2023
Built host: sky1
C bindings: yes
C++ bindings: no
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the nvfortran compiler and/or Open
MPI, does not support the following: array
subsections, direct passthru (where possible) to
underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.8.5
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: nvfortran
Fort compiler abs: /proj/nv/Linux_x86_64/23.5/compilers/bin/nvfortran
Fort ignore TKR: yes (!DIR$ IGNORE_TKR)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: no
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, ORTE progress: yes, Event lib:
yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: never
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
IPv6 support: no
MPI1 compatibility: yes
MPI extensions: affinity, cuda, pcollreq
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.5)
MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.1.5)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.5)
MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
v4.1.5)
MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.1.5)
MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v4.1.5)
MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v4.1.5)
MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA schizo: jsm (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: han (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: ucc (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA coll: adapt (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: hcoll (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA op: avx (MCA v2.1.0, API v1.0.0, Component v4.1.5)
MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
v4.1.5)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.5)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.1.5)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.1.5)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
v4.1.5)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
v4.1.5)
- Operating system/version: Ubuntu 22.04, kernel (uname -r): 5.4.0-84-generic
- Computer hardware: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, single node, no HCA.
- Network type: N/A.
Details of the problem
Using MPI I/O to write to a file from a device-only memory allocation (e.g., one allocated with cudaMalloc) fails. Allocating the same memory in a host-accessible way, e.g., with cudaMallocManaged, works.
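For context, here is a minimal sketch (not part of the reproducer) contrasting the two allocation flavors: only the managed buffer can be dereferenced directly from host code, which is presumably what the OMPIO write path needs here.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
  const int N = 10;

  int* dev = nullptr;      // device-only allocation: the failing case in this report
  int* managed = nullptr;  // managed allocation: the working case

  if (cudaMalloc(&dev, sizeof(int) * N) != cudaSuccess) std::abort();
  if (cudaMallocManaged(&managed, sizeof(int) * N) != cudaSuccess) std::abort();

  managed[0] = 7;          // fine: managed memory is directly accessible from the host
  // dev[0] = 7;           // invalid: a device-only pointer cannot be dereferenced on the host

  std::printf("managed[0] = %d\n", managed[0]);

  cudaFree(dev);
  cudaFree(managed);
  return 0;
}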
The full reproducer, reproducer.cpp:
#include <cstdlib>
#include <iostream>
#include <mpi.h>
#include <cuda_runtime_api.h>

int main(int argc, char* argv[]) {
  int N = 10;
  int* p;
  // Device-only allocation: this is what triggers the failure below.
  if (auto e = cudaMalloc(&p, sizeof(int) * N); e != cudaSuccess) std::cerr << __LINE__, abort();
  if (auto e = cudaMemset(p, (int)'7', sizeof(int) * N); e != cudaSuccess) std::cerr << __LINE__, abort();
  if (auto e = cudaDeviceSynchronize(); e != cudaSuccess) std::cerr << __LINE__, abort();

  int mt = -1;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mt);
  if (mt != MPI_THREAD_MULTIPLE) std::cerr << __LINE__, abort();

  int nranks, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_File f;
  MPI_File_open(MPI_COMM_WORLD, "output", MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &f);

  // Each rank writes its N ints at a byte offset of rank * N * sizeof(int).
  MPI_Offset bytes = sizeof(int) * (MPI_Offset)N;
  MPI_Offset total_bytes = bytes * (MPI_Offset)nranks;
  MPI_Offset off = bytes * (MPI_Offset)rank;
  MPI_File_set_size(f, total_bytes);

  MPI_Request req;
  // Non-blocking write directly from the device pointer; the count is N elements of MPI_INT.
  MPI_File_iwrite_at(f, off, p, N, MPI_INT, &req);
  MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);

  MPI_File_close(&f);
  cudaFree(p);
  MPI_Finalize();
  return 0;
}
Compile it with any CUDA C++ compiler, e.g., nvcc or nvc++, and run it:
OMPI_CXX=nvc++ mpicxx -std=c++20 -stdpar=gpu -o mpi_io_bug mpi_io_bug.cpp
mpirun -np 2 ./mpi_io_bug
The run fails with this error:
The call to cuMemcpyAsync failed. This is a unrecoverable error and will
cause the program to abort.
cuMemcpyAsync(0x1b1b5f8, 0x7f25f4a00000, 160) returned value 1
The expected behavior is for the write from device memory to complete successfully.
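A possible workaround (a sketch not taken from the reproducer, with a made-up helper name, assuming a contiguous buffer) is to stage the data through a host buffer before the MPI I/O call:

#include <cstdlib>
#include <vector>
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical helper: copy a contiguous device buffer of `count` ints to host
// memory and issue the non-blocking file write from the host copy instead.
void write_ints_via_host(MPI_File file, MPI_Offset offset, const int* dev_buf, int count) {
  std::vector<int> host(count);
  if (cudaMemcpy(host.data(), dev_buf, sizeof(int) * (size_t)count,
                 cudaMemcpyDeviceToHost) != cudaSuccess) std::abort();

  MPI_Request req;
  MPI_File_iwrite_at(file, offset, host.data(), count, MPI_INT, &req);
  // The host buffer must outlive the request, so complete it before returning.
  MPI_Wait(&req, MPI_STATUS_IGNORE);
}

In the reproducer this would replace the MPI_File_iwrite_at/MPI_Waitall pair with something like write_ints_via_host(f, off, p, N).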
Full Error Message
--------------------------------------------------------------------------
The call to cuMemcpyAsync failed. This is a unrecoverable error and will
cause the program to abort.
cuMemcpyAsync(0x1b1b5f8, 0x7f25f4a00000, 160) returned value 1
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
[ipp2-0153.nvidia.com:00861] CUDA: Error in cuMemcpy: res=-1, dest=0x1b1b5f8, src=0x7f25f4a00000, size=160
[ipp2-0153:00861] *** Process received signal ***
[ipp2-0153:00861] Signal: Aborted (6)
[ipp2-0153:00861] Signal code: (-6)
[ipp2-0153.nvidia.com:00860] CUDA: Error in cuMemcpy: res=-1, dest=0x30f1908, src=0x7fc2f6a00000, size=160
[ipp2-0153:00860] *** Process received signal ***
[ipp2-0153:00860] Signal: Aborted (6)
[ipp2-0153:00860] Signal code: (-6)
[ipp2-0153:00861] [ 0] [ipp2-0153:00860] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f281ae1a520]
[ipp2-0153:00861] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f281ae6ea7c]
[ipp2-0153:00861] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc51ca1a520]
[ipp2-0153:00860] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fc51ca6ea7c]
[ipp2-0153:00860] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f281ae1a476]
[ipp2-0153:00861] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fc51ca1a476]
[ipp2-0153:00860] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f281ae007f3]
[ipp2-0153:00861] [ 4] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fc51ca007f3]
[ipp2-0153:00860] [ 4] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(+0x55829)[0x7f281a655829]
[ipp2-0153:00861] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(opal_convertor_pack+0x18f)[0x7f281a647bcf]
[ipp2-0153:00861] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite+0x281)[0x7f25c340aae1]
[ipp2-0153:00861] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite_at+0x49)[0x7f25c340ae39]
[ipp2-0153:00861] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/openmpi/mca_io_ompio.so(mca_io_ompio_file_iwrite_at+0x26)[0x7f25c3805b56]
[ipp2-0153:00861] [ 9] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(+0x55829)[0x7fc51c255829]
[ipp2-0153:00860] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(opal_convertor_pack+0x18f)[0x7fc51c247bcf]
[ipp2-0153:00860] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite+0x281)[0x7fc2c140aae1]
[ipp2-0153:00860] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite_at+0x49)[0x7fc2c140ae39]
[ipp2-0153:00860] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/openmpi/mca_io_ompio.so(mca_io_ompio_file_iwrite_at+0x26)[0x7fc2c9405b56]
[ipp2-0153:00860] [ 9] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmpi.so.40(PMPI_File_iwrite_at+0x5e)[0x7f281f2679ce]
[ipp2-0153:00861] [10] ./mpi_io_bug[0x4024c6]
[ipp2-0153:00861] [11] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmpi.so.40(PMPI_File_iwrite_at+0x5e)[0x7fc520e679ce]
[ipp2-0153:00860] [10] ./mpi_io_bug[0x4024c6]
[ipp2-0153:00860] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f281ae01d90]
[ipp2-0153:00861] [12] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fc51ca01d90]
[ipp2-0153:00860] [12] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f281ae01e40]
[ipp2-0153:00861] [13] ./mpi_io_bug[0x402295]
[ipp2-0153:00861] *** End of error message ***
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc51ca01e40]
[ipp2-0153:00860] [13] ./mpi_io_bug[0x402295]
[ipp2-0153:00860] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ipp2-0153 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[ipp2-0153.nvidia.com:00856] 1 more process has sent help message help-mpi-common-cuda.txt / cuMemcpyAsync failed
[ipp2-0153.nvidia.com:00856] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages