
Extract the install prefix from the shared library. #52


Open
wants to merge 58 commits into
base: master
from

Conversation

bosilca
Member

@bosilca bosilca commented Mar 17, 2025

This is part of a multi-project effort; a similar PR will be created in OpenPMIX and OMPI. The goal of each of these changes is the same: instead of using a build-time generated prefix, which ignores a project rebase, take the prefix from the shared library of each project and derive the necessary paths from it. The user can, however, override this using environment variables and the configuration files.
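
As an illustration only, the derivation step might look like the sketch below. It assumes the full path of the loaded shared library has already been obtained (on many platforms `dladdr()` can provide it; that call is not shown), and that the library is installed as `<prefix>/lib/<name>`. The function name `derive_prefix`, the example path, and the environment variable `EXAMPLE_PREFIX` are hypothetical and are not taken from this PR.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the full path of a shared library such as
 * "/opt/prrte/lib/libprrte.so", strip the file name and the directory
 * holding it to obtain "/opt/prrte". Returns 0 on success, -1 if no
 * prefix directory can be isolated or the buffer is too small. */
int derive_prefix(const char *libpath, char *out, size_t outlen)
{
    assert(libpath != NULL && out != NULL);

    /* An explicit user setting wins over the derived value.
     * EXAMPLE_PREFIX is an assumed name used only for illustration. */
    const char *override = getenv("EXAMPLE_PREFIX");
    if (override != NULL && override[0] != '\0') {
        if (strlen(override) >= outlen) {
            return -1;
        }
        strcpy(out, override);
        return 0;
    }

    if (strlen(libpath) >= outlen) {
        return -1;
    }
    strcpy(out, libpath);

    /* First pass drops the file name, second pass drops "/lib". */
    for (int i = 0; i < 2; i++) {
        char *slash = strrchr(out, '/');
        if (slash == NULL || slash == out) {
            return -1;
        }
        *slash = '\0';
    }
    return 0;
}
```

Because the prefix is computed from where the library actually sits at run time, moving the whole installation tree changes the result without a rebuild, while the environment check keeps the user in control.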

jsquyres and others added 30 commits March 31, 2024 09:03
Homebrew has broken something and I cannot figure
out how to fix it.

Signed-off-by: Ralph Castain <[email protected]>
(cherry picked from commit 2ac45f3)
Fix the issues with the MacOS builds so that they work again in Github
Action environments.

Signed-off-by: Jeff Squyres <[email protected]>
It has been reported (and confirmed) that building against
one version of PMIx and then running with another version
will cause PRRTE to segfault. This isn't a universal rule.
For example, one can switch v5.0 and master without a
problem. However, switching v5.0 and v4.2 is a definite
segfault.

The root cause of the problem is a change in the layout
of the base pmix_object_t definition. This renders all
PMIx objects binary incompatible when crossing between
the v5 and v4 (and below) series.

Changing the v5 definition back to match v4 is an
overly complex task. The changes were required to
accommodate the new shared memory support that
was introduced in v5.

So instead, we check the runtime version of PMIx against
the build version. If the runtime version is incompatible
with the build version, then we print an explanatory
error message and error out.

Signed-off-by: Ralph Castain <[email protected]>

bot:notacherrypick
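
A minimal sketch of such a check, assuming the build-time and runtime major version numbers are already available as integers (how they are obtained is not shown). The function names are hypothetical. The rule encoded here, that objects are incompatible only when crossing between the v5 series and v4 or below, is taken from the commit message above; treating every major version of 5 or higher as sharing the new layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* First series that uses the changed pmix_object_t layout,
 * per the commit message. */
#define NEW_LAYOUT_MAJOR 5

/* Hypothetical helper: true when both versions sit on the same side of
 * the layout change, so objects are expected to be binary compatible. */
bool layout_compatible(int build_major, int runtime_major)
{
    assert(build_major >= 0 && runtime_major >= 0);
    bool build_new = build_major >= NEW_LAYOUT_MAJOR;
    bool runtime_new = runtime_major >= NEW_LAYOUT_MAJOR;
    return build_new == runtime_new;
}

/* Hypothetical startup check: explain the problem and fail cleanly
 * instead of segfaulting later. Returns 0 if OK, -1 otherwise. */
int check_runtime_version(int build_major, int runtime_major)
{
    if (!layout_compatible(build_major, runtime_major)) {
        fprintf(stderr,
                "Built against PMIx v%d.x but running with v%d.x: "
                "these series are not binary compatible.\n",
                build_major, runtime_major);
        return -1;
    }
    return 0;
}
```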

dd

Signed-off-by: Ralph Castain <[email protected]>
In some recent Slurm versions, the Slurm runtime silently inserts
custom arguments into the PRRTE launcher's `srun` cmd line. In many
cases this is harmless, but where the user or the system admin
needs/wants particular cmd line arguments used, it can cause
problems precisely because the user is never told.

Make this visible when it happens, and provide a mechanism by
which the user/admin can override it. Provide a fairly long
help message explaining what happened and offering advice on
resolution, along with a param for disabling the warning. Add
a param for overriding the "args" param if necessary, along
with a caution as to possible consequences.

Signed-off-by: Ralph Castain <[email protected]>
Enables build against v1.11.8 and above.

Signed-off-by: Ralph Castain <[email protected]>
Add a new cmd line option that corresponds to this
attribute. Add the attribute to the prun payload.
When received, it is included by default in the
job info for the spawned job. Add query support for it.

Signed-off-by: Ralph Castain <[email protected]>
(cherry picked from commit 3957789)
Changes will need to be made to Open MPI to parse the contents of
the OMPI_MCA_mpi_memory_alloc_kinds environment variable to
determine how to use the user supplied memory-alloc-kinds information.

See section 11.4.3 of the MPI 4.1 standard.

Signed-off-by: Howard Pritchard <[email protected]>
Provide a warning of potentially unknown Slurm params
MPI 4.1: add support for memory-alloc-kinds
If we use one cpu from an object, then we will get a NULL
response if we ask for the next object of that type within
the remaining cpuset since not all of the cpus in the object
are still available. This problem resulted from the recent
change to only use available cpus in PRRTE topologies.

So instead scan across the available cpus and check whether each
one is inside the object of interest: if so, we can bind
to that cpu; if not, we keep searching.

Signed-off-by: Ralph Castain <[email protected]>
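
The scan described above can be sketched as follows. This is not the PRRTE code: the real implementation works on hwloc bitmaps, whereas this sketch stands in plain 64-bit masks for them, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Bit i of `available` is set if cpu i can still be
 * used; bit i of `object_cpus` is set if cpu i belongs to the object of
 * interest. Returns the lowest available cpu inside the object, or -1
 * if the object has no available cpu left. */
int find_cpu_in_object(uint64_t available, uint64_t object_cpus, int ncpus)
{
    assert(ncpus >= 0 && ncpus <= 64);

    for (int cpu = 0; cpu < ncpus; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;
        if ((available & bit) == 0) {
            continue;               /* cpu already in use: skip it */
        }
        if ((object_cpus & bit) != 0) {
            return cpu;             /* inside the object: bind here */
        }
        /* available but outside the object: keep searching */
    }
    return -1;
}
```

Scanning per cpu avoids asking for "the next object" within the remaining cpuset, which is the lookup the commit message reports as returning NULL once an object is only partially available.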
If we are trying to bind to an HWLOC object type that is not
defined on a given node, then (a) if the binding policy was
specified by user, then error out; and (b) if we are using
a default binding policy, then simply do not bind.

Signed-off-by: Ralph Castain <[email protected]>
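
The two cases can be written as a small decision function. This is a sketch only; the enum and function names are hypothetical and do not come from the PRRTE sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the binding policy came from. */
typedef enum { POLICY_DEFAULT, POLICY_USER } policy_source_t;

/* What to do for the requested object type on this node. */
typedef enum { BIND_PROCEED, BIND_SKIP, BIND_ERROR } bind_action_t;

/* Hypothetical helper implementing cases (a) and (b) above. */
bind_action_t resolve_missing_type(bool type_on_node, policy_source_t source)
{
    assert(source == POLICY_DEFAULT || source == POLICY_USER);

    if (type_on_node) {
        return BIND_PROCEED;        /* nothing missing: bind as requested */
    }
    /* (a) user asked for it explicitly: error out;
     * (b) it was only a default: simply do not bind. */
    return (source == POLICY_USER) ? BIND_ERROR : BIND_SKIP;
}
```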
Protect against missing HWLOC object types
Fix a segfault when no arguments are provided
We currently do not support the LTO optimizer
as it is incompatible with our plugin component
architecture. So detect it has been specified
in configure and error out with an explanation.

Includes suggestions from @jsquyres

Signed-off-by: Ralph Castain <[email protected]>
Break the multi-loop thru loading of param files
that caused us to overwrite values. Defer to the
PMIx pmdl components for obtaining envars and for
checking MCA param overlaps across projects.

Signed-off-by: Ralph Castain <[email protected]>
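Python 3.12 now emits a SyntaxWarning for unrecognized backslash escapes (such as "\d") in ordinary string literals, which regular expressions commonly contain. Instead, use "r" strings.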

Signed-off-by: Ralph Castain <[email protected]>
Use the PMIx functions to check params
Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit e204c73)
The formatting is messed up in places, so try and fix it.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit b4f5884)
We don't really have stale issues, so we can close anything
old manually.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 2215b29)
See open-mpi/ompi#12829 for an explanation

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit bff20a8)
RTD is rolling out some changes. Per
https://about.readthedocs.com/blog/2024/07/addons-by-default/, these are the changes we need to make.

Port of open-mpi/ompi#12687

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 584845f)
hppritcha and others added 26 commits October 1, 2024 10:03
Restore processing of interface include and exclude
parameters - not sure where it went, but bring it
back anyway.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 99e4e24)
Protect against the envar version of the Slurm
custom args MCA param. This is an unfortunate
hack that hopefully will eventually go away.
See both of the following for detailed
explanations and discussion:

openpmix#1974
open-mpi/ompi#12471

Orgs/users wanting to add custom args to the
internal "srun" command used to spawn the
PRRTE daemons must do so via the default MCA
param files (system or user), or via the
prterun (or its proxy) cmd line

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 28432ed)
Pay attention to interface include/exclude params
Protect against the envar version of the Slurm custom args param
We have deprecated the "socket" object in favor of
"package", so we need to extend that treatment to
the resource qualifier in the `--map-by ppr` directive.
Detect both "socket" and the "skt" shorthand for
backward-compatibility reasons.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 7f507df)
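
A sketch of the name handling, assuming the resource qualifier has already been split out of the directive (for example the last field of `ppr:2:socket`) and that matching is exact and case-sensitive. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map the deprecated "socket" name and its "skt"
 * shorthand to "package". Any other name is returned unchanged.
 * *deprecated is set to 1 when a deprecated spelling was seen, so the
 * caller can emit a deprecation warning. */
const char *normalize_ppr_object(const char *name, int *deprecated)
{
    assert(name != NULL && deprecated != NULL);

    if (strcmp(name, "socket") == 0 || strcmp(name, "skt") == 0) {
        *deprecated = 1;
        return "package";
    }
    *deprecated = 0;
    return name;
}
```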
Fix deprecation warnings for ppr on socket objects
We played with alternative implementations over the years,
but nothing ever really stuck. So let's simplify the code
by removing the framework and the associated loopbacks
that were built into it (e.g., if a message cannot be sent,
then loop it back into the OOB base to see if another
component is available that can send it).

The code badly needs reorganization as I've made no
attempt to do so here. A pass to see if event steps
can be eliminated would also be good - I've cleaned
up a few of them, but what remains could use another
pair of eyes.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 7b54c48)
We no longer support all the way to pre-StoneAge
versions.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit cd8b2f7)
Not sure the compiler is correct in its complaint, but
just in case, let's ensure that the variables being used
are always initialized.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit d5e580a)
Fix typo that locked the target to rank=0 and instead
use the rank provided by user.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit b0d461d)
It isn't perfect and it does raise precedence
concerns, but let the OMPI schizo component
parse the OMPI param files. Use the PMIx
pmdl functions to check for relevant params
to convert to PRRTE and PMIx. Restore the
overlap detection for when params are set
for OMPI frameworks that have a corollary
to frameworks in PRRTE and PMIx.

This still raises precedence questions. For
now, what this will do is have values in the
OMPI param files give way to corresponding
values specified in PRRTE and PMIx param
files. In other words, values in PMIx and
PRRTE param files are written into the
environment, not overwriting any previously
existing envar. OMPI param files are then
read, and any params that correspond to
PRRTE and PMIx params are written into
the environment - without overwriting
any that already exist.

So if you have a param specified in a
PRRTE param file, and you also specify it
in the OMPI param file, the PRRTE param file
value will "win".

User beware - someone is bound to be very
unhappy.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit fcbefcc)
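
The "first writer wins" behavior described above follows directly from setting environment variables without overwriting. A minimal sketch using POSIX `setenv()` with its `overwrite` argument set to 0; the helper name and the variable name used in the usage note are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: export a param into the environment only if no
 * value is present yet. With overwrite == 0, setenv() leaves an existing
 * value untouched and still reports success. Returns 0 on success,
 * -1 on failure. */
int export_param_if_unset(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    return setenv(name, value, 0);
}
```

If the PRRTE and PMIx param files are processed first and the OMPI param files second, calling `export_param_if_unset("EXAMPLE_PARAM", ...)` for each source leaves the PRRTE/PMIx value in place, and any value the user had already exported takes precedence over both.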
Raise min required HWLOC version to 2.1.0
See long explanation on openpmix#2025

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 7de93e9)
No longer any challengers to hwloc, so streamline the
binding support by re-integrating it back into the
odls framework.

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit add324c)
Get takes a (pmix_value_t**), so don't cast it to (void**)

Signed-off-by: Ralph Castain <[email protected]>
(from upstream commit 16ce9a8)
Signed-off-by: George Bosilca <[email protected]>
This is part of a multi-project effort; a similar PR will be created in
OpenPMIX and OMPI. The goal of each of these changes is the same:
instead of using a build-time generated prefix, which ignores a project
rebase, take the prefix from the shared library of each project and
derive the necessary paths from it. The user can, however, override this
using environment variables and the configuration files.

Signed-off-by: George Bosilca <[email protected]>
@rhc54

rhc54 commented Mar 17, 2025

I doubt this will merge cleanly up here - I transferred it to the upstream repo: openpmix#2167

@bosilca
Member Author

bosilca commented Mar 17, 2025

Thanks. But we still need to bring it back here as this is the official submodule now.

5 participants