Deadlock when using MPI with Python packages and DFTK #1067

Open · simonganne01 opened this issue Feb 24, 2025 · 5 comments
Labels: documentation (Correcting or expanding documentation)

Comments

@simonganne01

Hello everyone,

I'm having problems parallelizing DFTK. I run the script attached to this message. When I run it multithreaded vs. single-threaded, I get the following output:

julia -tauto --project=. run_DFTK.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 22
│   n_DFTK = 22
│   n_fft = 1
└   n_blas = 22
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188027919356e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -157.3406674143                   -0.25    6.2    32.0s
  2   -157.8496486906       -0.29       -0.98    6.1    25.4s
  3   -157.8596014620       -2.00       -1.40    8.2    34.4s
  4   -157.8603026113       -3.15       -2.15    2.0    10.5s
  5   -157.8603796767       -4.11       -2.81    3.9    16.6s
  6   -157.8603823541       -5.57       -3.01    3.3    16.4s
  7   -157.8603833636       -6.00       -3.21    1.6    6.82s
  8   -157.8603838541       -6.31       -3.48    2.1    11.7s
  9   -157.8603838612       -8.15       -3.53    1.9    10.3s
 10   -157.8603838773       -7.79       -3.69    1.1    7.30s
 11   -157.8603838863       -8.05       -3.93    1.8    10.5s
 12   -157.8603838916       -8.28       -4.34    2.0    11.8s
 13   -157.8603838933       -8.76       -5.20    2.9    14.0s
 14   -157.8603838934      -10.04       -5.46    4.2    19.2s
 15   -157.8603838934      -11.38       -5.90    1.6    8.68s
 16   -157.8603838934      -13.07       -6.53    2.8    11.2s
 17   -157.8603838934      -12.55       -6.92    4.0    17.8s
 18   -157.8603838934      -12.70       -7.53    2.2    11.6s
 19   -157.8603838934   +  -12.94       -7.98    3.6    17.3s
 20   -157.8603838934      -12.77       -8.31    3.0    14.4s

for the multithreaded run, and

julia --project=. run_DFTK.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188008275448e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -157.3405698695                   -0.25    6.3    7.14s
  2   -157.8497105405       -0.29       -0.98    5.8    7.89s
  3   -157.8596028516       -2.00       -1.40    8.1    6.89s
  4   -157.8603060979       -3.15       -2.16    2.0    3.75s
  5   -157.8603800028       -4.13       -2.80    4.0    4.67s
  6   -157.8603822929       -5.64       -3.00    3.5    4.79s
  7   -157.8603833368       -5.98       -3.20    1.4    3.34s
  8   -157.8603838472       -6.29       -3.46    2.1    3.36s
  9   -157.8603838637       -7.78       -3.53    2.3    3.54s
 10   -157.8603838781       -7.84       -3.69    1.1    2.84s
 11   -157.8603838878       -8.01       -3.91    1.6    3.25s
 12   -157.8603838924       -8.34       -4.55    2.2    3.54s
 13   -157.8603838933       -9.04       -5.16    3.9    4.63s
 14   -157.8603838934      -10.22       -5.40    3.4    4.63s
 15   -157.8603838934      -10.94       -5.99    2.2    3.93s
 16   -157.8603838934   +  -12.22       -6.18    3.8    4.42s
 17   -157.8603838934   +  -12.43       -6.75    1.2    3.23s
 18   -157.8603838934   +    -Inf       -7.11    3.4    4.16s
 19   -157.8603838934      -13.25       -7.90    2.9    3.86s
 20   -157.8603838934   +  -12.43       -8.11    4.3    5.02s

for the single-threaded run. As you can see, the single-threaded run is faster. Any idea what is wrong?

For MPI I disable multithreading and run with mpiexecjl, but startup is very slow. I get this output and after that it seems to be stuck:

/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"

run_DFTK.jl:

# using MPI
using DFTK
using Unitful
using UnitfulAtomic
using AtomsIO        # Enables only Julia-based parsers
using AtomsIOPython  # Enable python-based parsers as well
using PseudoPotentialData

disable_threading()
DFTK.mpi_master() || (redirect_stdout(); redirect_stderr())


# setup_threading()
# println("Multithreading enabled")

system = load_system("POSCAR")


family_upf = PseudoFamily("dojo.nc.sr.lda.v0_4_1.standard.upf");

pseudopotentials = load_psp(family_upf, system)

temperature = 0.01
smearing = DFTK.Smearing.FermiDirac()


# 2. Select model and basis
# Had to change something with respect to tutorial
model = model_LDA(system, temperature=temperature, smearing=smearing, pseudopotentials=pseudopotentials)
kgrid = [7, 7, 7]     # k-point grid (Regular Monkhorst-Pack grid)
Ecut = 1000.0u"eV"              # kinetic energy cutoff
# n_alg = AdaptiveBands(model, n_bands_converge=20)
basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid, use_symmetries_for_kpoint_reduction=true)


# 3. Run the SCF procedure to obtain the ground state
scfres = self_consistent_field(basis, tol=1e-8)
# scfres = DFTK.unfold_bz(scfres)

Thanks in advance for the help!

POSCAR.txt

@mfherbst (Member)

For threading that's just the way it is: for small problems threading incurs overhead, so fewer threads can sometimes be faster than more.

Regarding your MPI issue, that looks like a precompilation issue, which can cause MPI-based executions to deadlock. @Technici4n can probably comment whether these symptoms match his experience.
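
One way to rule this out (an untested sketch, assuming the project lives in the current directory) is to precompile the project and load the Python-backed packages once in a single, non-MPI process, so that precompilation and the CondaPkg resolution have already happened before the four ranks start and race for the lock:

# untested sketch: precompile and resolve CondaPkg once, in a single process
julia --project=. -e 'using Pkg; Pkg.precompile(); using AtomsIOPython'

# then launch the MPI run as before
/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. run_DFTK.jl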

@Technici4n we should probably make a note about this in the parallelisation docs.

@Technici4n (Contributor)

For precompilation, you can make sure that all packages are precompiled by running Julia with --compiled-modules=strict. Other reasonable options might be no or existing.

For multithreading you can try fewer threads, for example -t2 or -t4, and see if that improves performance, as in the commands sketched below.
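
For example, keeping the project and script names from above:

julia -t2 --project=. run_DFTK.jl
julia -t4 --project=. run_DFTK.jl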

mfherbst changed the title from "Multitheading and MPI" to "Deadlock when using MPI with Python packages and DFTK" on Feb 24, 2025
mfherbst added the documentation label on Feb 24, 2025
@mfherbst (Member)

mfherbst commented Feb 24, 2025

OK, let's see if this helps @simonganne01 avoid the deadlock. In any case, this should be documented better.

@simonganne01 (Author)

like this?

/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl

because now I get this:

sganne@UG-FSRPF54:~/VSC/docker/DFTK$ /home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
┌ Warning: Negative ρcore detected: -2.5519188019003375e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
┌ Error: The MPI process failed
│   proc = Process(setenv(`/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/bin/mpiexec -np 4 julia --compiled-modules=strict run_DFTK.jl`,["OPENBLAS_MAIN_FREE=1", "PATH=/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/bin:/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib/mpich/bin:/home/sganne/.vscode-server/extensions/ms-python.python-2025.0.0-linux-x64/python_files/deactivate/bash:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/VSC/quantum/bin:/opt/bin/:/opt/bin/:/home/sganne/.local/bin:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/.vscode-server/extensions/ms-python.python-2025.0.0-linux-x64/python_files/deactivate/bash:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/VSC/quantum/bin:/home/sganne/.vscode-server/bin/e54c774e0add60467559eb0d1e229c6452cf8447/bin/remote-cli:/home/sganne/.local/bin:/opt/bin/:/opt/bin/:/home/sganne/.local/bin:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/bin:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/libnvvp:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/libnvvp:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/ProgramData/chocolatey/bin:/mnt/c/Program Files/dotnet/:/mnt/c/Program Files/MATLAB/R2024b/bin:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/Git/cmd:/Docker/host/bin:/mnt/c/Program Files/NVIDIA Corporation/Nsight Compute 2025.1.0/:/mnt/c/Program Files/PuTTY/:/mnt/c/Users/sganne/AppData/Local/Programs/Python/Launcher/:/mnt/c/Users/sganne/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/sganne/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/sganne/AppData/Local/Programs/MiKTeX/miktex/bin/x64/:/mnt/c/Users/sganne/AppData/Local/GitHubDesktop/bin:/mnt/c/Users/sganne/AppData/Local/Microsoft/WinGet/Packages/simonmichael.hledger_Microsoft.Winget.Source_8wekyb3d8bbwe:/mnt/c/Users/sganne/AppData/Local/Programs/Ollama:/mnt/c/Users/sganne/.cache/lm-studio/bin:/snap/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin", "ESPRESSO_TMPDIR=/tmp", "WSLENV=VSCODE_WSL_EXT_LOCATION/up", "WAYLAND_DISPLAY=wayland-0", "MPITRAMPOLINE_MPIEXEC=/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib/mpich/bin/mpiexec", "NAME=UG-FSRPF54", "LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib:/home/sganne/packages/julias/julia-1.11/bin/../lib/julia:/home/sganne/packages/julias/julia-1.11/bin/../lib", "DEBUGPY_ADAPTER_ENDPOINTS=/home/sganne/.vscode-server/extensions/ms-python.debugpy-2025.0.1-linux-x64/.noConfigDebugAdapterEndpoints/endpoint-9a04f020623d20df.txt", 
"GIT_ASKPASS=/home/sganne/.vscode-server/bin/e54c774e0add60467559eb0d1e229c6452cf8447/extensions/git/dist/askpass.sh"  …  "WSL_INTEROP=/run/WSL/1310_interop", "PS1=\\[\e]633;A\a\\](quantum) \\[\\e]0;\\u@\\h: \\w\\a\\]\${debian_chroot:+(\$debian_chroot)}\\[\\033[01;32m\\]\\u@\\h\\[\\033[00m\\]:\\[\\033[01;34m\\]\\w\\[\\033[00m\\]\\\$ \\[\e]633;B\a\\]", "ESPRESSO_PSEUDO=/home/sganne/QE-2019/pseudo", "ALF_DIR=/home/sganne/VSC/Software/ALF", "HOME=/home/sganne", "TERM=xterm-256color", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:", "COLORTERM=truecolor", "VIRTUAL_ENV=/home/sganne/VSC/quantum", "HOSTTYPE=x86_64"]), ProcessExited(1))
└ @ Main none:7

@Technici4n (Contributor)

Yes exactly. --compiled-modules=strict means that importing a package that is not precompiled will throw an error. I think it's probably what you want for optimal performance, but of course you have to remember to precompile. :)
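
For completeness, the precompilation itself can be done up front with the standard Pkg API (same project directory as above):

julia --project=. -e 'using Pkg; Pkg.precompile()'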

Now it seems you are getting into the fun of trying to figure out what might be wrong with MPI. Without additional logging output it will be difficult. 😅 I would recommend adding print statements at different stages in your code to understand what is failing (possibly even around the imports); see the sketch below.
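
A minimal sketch of what such rank-tagged prints could look like (this assumes MPI.jl is available as a direct dependency of your project; note also that the redirect_stdout() line in your script hides output from non-master ranks, so you may want to comment it out while debugging):

using MPI
# DFTK may already have initialized MPI, so only initialize if needed
MPI.Initialized() || MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

println("[rank $rank] packages loaded"); flush(stdout)
# ... build model and basis ...
println("[rank $rank] basis built, starting SCF"); flush(stdout)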
