"Rejecting cache file" on a heterogeneous cluster, leading to repeated precompilation #48579

Open
@jishnub

I am using a freshly downloaded nightly on a Slurm cluster, and the package cache is invalidated every time I switch node types, which leads to repeated precompilation.
On the login node:

julia> versioninfo()
Julia Version 1.10.0-DEV.524
Commit 2c619b77e04 (2023-02-07 12:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7742 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = /home/user/lib:/lib::/home/user/.local/lib
  JULIA_DEPOT_PATH = /scratch/user/.julia
  JULIA_REVISE_POLL = 1
  JULIA_NUM_PRECOMPILE_TASKS = 1
  JULIA_DEBUG = loading

and on the compute node:

julia> versioninfo()
Julia Version 1.10.0-DEV.524
Commit 2c619b77e04 (2023-02-07 12:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 40 × Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 40 virtual cores
Environment:
  LD_LIBRARY_PATH = /home/user/lib:/lib:/home/user/lib:/lib::/home/user/.local/lib
  JULIA_DEPOT_PATH = /scratch/user/.julia
  JULIA_REVISE_POLL = 1
  JULIA_NUM_PRECOMPILE_TASKS = 1
  JULIA_DEBUG = loading

I start by deleting my .julia directory to avoid clashes:

rm -rf /scratch/user/.julia

After this, on the login node, I generate a simple package with FillArrays.jl as its only dependency. Instantiating the package on the login node gives:

(Testpkg) pkg> instantiate
  Installing known registries into `/scratch/user/.julia`
   Installed FillArrays ─ v0.13.7
Precompiling environment...
  7 dependencies successfully precompiled in 4 seconds
  2 dependencies had warnings during precompilation:
┌ FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/Statistics/ERcPL_ty4bU.so for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2]
│  └ @ Base loading.jl:1004
└  
┌ LibSSH2_jll [29816b5a-b9ab-546f-933c-edad1886dfa8]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/MbedTLS_jll/u5NEn_ty4bU.so for MbedTLS_jll [c8ffd9c3-330d-5841-b78e-0817d7145fa1]
│  └ @ Base loading.jl:1004
└  

(Testpkg) pkg> precompile

(Testpkg) pkg>

So far, so good: the second `precompile` is a no-op, so the package does not precompile twice on the same node. Now I switch to the compute node and find that the package precompiles again:

(Testpkg) pkg> precompile
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/Statistics/ERcPL_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/Zlib_jll/xjq3Q_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/SuiteSparse_jll/ME9At_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/MbedTLS_jll/u5NEn_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
Precompiling environment...
  7 dependencies successfully precompiled in 5 seconds
  2 dependencies had warnings during precompilation:
┌ FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/Statistics/ERcPL_ty4bU.so for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2]
│  └ @ Base loading.jl:1004
└  
┌ LibSSH2_jll [29816b5a-b9ab-546f-933c-edad1886dfa8]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/MbedTLS_jll/u5NEn_ty4bU.so for MbedTLS_jll [c8ffd9c3-330d-5841-b78e-0817d7145fa1]
│  └ @ Base loading.jl:1004

If I then go back to the login node and precompile the package again, I find:

(Testpkg) pkg> precompile
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/Statistics/ERcPL_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/Zlib_jll/xjq3Q_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/SuiteSparse_jll/ME9At_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
┌ Debug: Rejecting cache file /scratch/user/.julia/compiled/v1.10/MbedTLS_jll/u5NEn_ty4bU.ji for  [top-level] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2710
Precompiling environment...
  7 dependencies successfully precompiled in 5 seconds
  2 dependencies had warnings during precompilation:
┌ FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/Statistics/ERcPL_ty4bU.so for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2]
│  └ @ Base loading.jl:1004
└  
┌ LibSSH2_jll [29816b5a-b9ab-546f-933c-edad1886dfa8]
│  ┌ Debug: Loading object cache file /scratch/user/.julia/compiled/v1.10/MbedTLS_jll/u5NEn_ty4bU.so for MbedTLS_jll [c8ffd9c3-330d-5841-b78e-0817d7145fa1]
│  └ @ Base loading.jl:1004

Every time I switch between the login node and the compute node, the package requires a fresh round of precompilation, which can be quite time-consuming. Would it be possible to keep two sets of cache files so that one doesn't invalidate the other?
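
One possible workaround, offered as an untested sketch rather than a confirmed fix: since Julia saves newly compiled cache files into the first entry of `JULIA_DEPOT_PATH` while still searching later entries for installed packages, each node type could be given its own leading depot. The helper names `cpu_tag` and `depot_path_for` and the `.julia-<tag>` directory layout below are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: turn a CPU model string into a directory-safe tag
# by replacing every run of non-alphanumeric characters with one underscore.
cpu_tag() {
    printf '%s' "$1" | tr -cs 'A-Za-z0-9' '_' | sed 's/^_//; s/_$//'
}

# Hypothetical helper: build a depot path whose first entry is node-specific.
# Assumption: caches are written to the first depot, while packages already
# installed in the shared depot (second entry) are still found.
depot_path_for() {
    printf '%s/.julia-%s:%s/.julia' "$1" "$(cpu_tag "$2")" "$1"
}

# Example with the login node's CPU string from the versioninfo() output above.
printf '%s\n' "$(depot_path_for /scratch/user 'AMD EPYC 7742 64-Core Processor')"
```

In a job script, the model string could be read from the node itself (for example from `/proc/cpuinfo` on Linux) and the result exported as `JULIA_DEPOT_PATH` before starting Julia. This trades disk space for avoiding the back-and-forth invalidation; it does not address why the cache file is rejected in the first place.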

Labels: needs docs, packages, pkgimage
