forked from openpmix/prrte
Extract the install prefix from the shared library. #52
Open: bosilca wants to merge 58 commits into open-mpi:master from bosilca:topic/runtime_prefix
Signed-off-by: Jeff Squyres <[email protected]>
Homebrew has broken something and I cannot figure out how to fix it. Signed-off-by: Ralph Castain <[email protected]> (cherry picked from commit 2ac45f3)
Fix the issues with the MacOS builds so that they work again in Github Action environments. Signed-off-by: Jeff Squyres <[email protected]>
Fix macos tests
It has been reported (and confirmed) that building against one version of PMIx and then running with another version will cause PRRTE to segfault. This isn't a universal rule. For example, one can switch v5.0 and master without a problem. However, switching v5.0 and v4.2 is a definite segfault. The root cause of the problem is a change in the layout of the base pmix_object_t definition. This renders all PMIx objects binary incompatible when crossing between the v5 and v4 (and below) series. Changing the v5 definition back to match v4 is an overly complex task. The changes were required to accommodate the new shared memory support that was introduced in v5. So instead, we check the runtime version of PMIx against the build version. If the runtime version is incompatible with the build version, then we print an explanatory error message and error out. Signed-off-by: Ralph Castain <[email protected]> bot:notacherrypick dd Signed-off-by: Ralph Castain <[email protected]>
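The check described in this commit message can be sketched as follows. This is only an illustration and not the code from the commit: the helper names are hypothetical, the versions are passed in as plain "major.minor.release" strings, and the compatibility rule is reduced to the single assumption stated above, namely that the object layout changed between the v4 and v5 series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: extract the major number from a version string
 * such as "5.0.3". Returns -1 if the string does not start with a number. */
static int major_of(const char *version)
{
    int major = -1;
    assert(version != NULL);
    if (sscanf(version, "%d", &major) != 1) {
        return -1;
    }
    return major;
}

/* Assumption: two versions are binary compatible only when both sit on the
 * same side of the v4/v5 boundary where the base object layout changed. */
static bool pmix_versions_compatible(const char *build, const char *runtime)
{
    int b = major_of(build);
    int r = major_of(runtime);
    if (b < 0 || r < 0) {
        return false;
    }
    return (b >= 5) == (r >= 5);
}

/* Print an explanatory message and return -1 on a mismatch, 0 otherwise. */
static int check_runtime_version(const char *build, const char *runtime)
{
    if (!pmix_versions_compatible(build, runtime)) {
        fprintf(stderr,
                "PMIx version mismatch: built against %s, running with %s\n",
                build, runtime);
        return -1;
    }
    return 0;
}
```

Under these assumptions, a caller would run `check_runtime_version()` once at startup and abort on a non-zero return, before any PMIx object crosses the library boundary.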
In some recent Slurm versions, the Slurm runtime is inserting custom arguments to the PRRTE launcher's `srun` cmd line without the user being aware of it. In many cases, this may not be a problem - but in some cases (where the user or the system admin needs/wants particular cmd line arguments used) this can cause problems as it happens silently, without the user being aware of it. Make this visible when it happens, and provide a mechanism by which the user/admin can override it. Provide a fairly long help message explaining what happened and offering advice on resolution, along with a param for disabling the warning. Add a param for overriding the "args" param if necessary, along with a caution as to possible consequences. Signed-off-by: Ralph Castain <[email protected]>
Enables build against v1.11.8 and above. Signed-off-by: Ralph Castain <[email protected]>
Add a new cmd line option that corresponds to this attribute. Add the attribute to the prun payload. When received, it will default to including in the job info for the spawned job. Add query support for it. Signed-off-by: Ralph Castain <[email protected]> (cherry picked from commit 3957789)
Changes will need to be made to Open MPI to parse the contents of the OMPI_MCA_mpi_memory_alloc_kinds environment variable to determine how to use the user supplied memory-alloc-kinds information. See section 11.4.3 of the MPI 4.1 standard. Signed-off-by: Howard Pritchard <[email protected]>
Check the runtime version of PMIx
Provide a warning of potentially unknown Slurm params
MPI 4.1: add support for memory-alloc-kinds
If we use one cpu from an object, then we will get a NULL response if we ask for the next object of that type within the remaining cpuset since not all of the cpus in the object are still available. This problem resulted from the recent change to only use available cpus in PRRTE topologies. So instead scan across the cpus, check to see if it is inside the object of interest - if so, then we can bind to that cpu, if not then we keep searching. Signed-off-by: Ralph Castain <[email protected]>
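The scan described in this commit message can be illustrated with a small sketch. It does not use hwloc: a 64-bit mask stands in for an hwloc cpuset, and the type and function names are hypothetical. The point is only the search order, which walks the cpus and tests membership in the object instead of asking for the "next object" within a partially consumed cpuset.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for an hwloc cpuset: bit i set means cpu i is a member. */
typedef uint64_t cpumask_t;

/* Return the lowest-numbered cpu that is both still available and inside
 * the object of interest, or -1 if no such cpu exists. */
static int find_cpu_in_object(cpumask_t available, cpumask_t object, int ncpus)
{
    assert(ncpus >= 0 && ncpus <= 64);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        cpumask_t bit = (cpumask_t)1 << cpu;
        if (!(available & bit)) {
            continue;            /* cpu is already in use or unavailable */
        }
        if (object & bit) {
            return cpu;          /* inside the object: we can bind here */
        }
        /* available but outside the object: keep searching */
    }
    return -1;
}
```

With this shape, using one cpu of an object no longer hides the object's remaining cpus, because the object itself is never required to be fully available.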
Repair the binding algorithm
If we are trying to bind to an HWLOC object type that is not defined on a given node, then (a) if the binding policy was specified by user, then error out; and (b) if we are using a default binding policy, then simply do not bind. Signed-off-by: Ralph Castain <[email protected]>
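The two cases in this commit message amount to a small decision table, sketched below. The enum and function names are hypothetical and the real code works on PRRTE's own binding-policy flags; this only shows the rule as stated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum { BIND_PROCEED, BIND_SKIP, BIND_ERROR } bind_action_t;

/* Decide what to do when asked to bind to an object type on a node.
 * type_present:   the node's topology defines this object type
 * user_specified: the binding policy came from the user, not a default */
static bind_action_t resolve_binding(bool type_present, bool user_specified,
                                     const char *type_name)
{
    assert(type_name != NULL);
    if (type_present) {
        return BIND_PROCEED;
    }
    if (user_specified) {
        /* (a) the user explicitly asked for it: report and error out */
        fprintf(stderr, "binding target \"%s\" is not defined on this node\n",
                type_name);
        return BIND_ERROR;
    }
    /* (b) default policy: silently do not bind */
    return BIND_SKIP;
}
```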
Protect against missing HWLOC object types
Signed-off-by: Luke Robison <[email protected]>
Fix a segfault when no arguments are provided
We currently do not support the LTO optimizer as it is incompatible with our plugin component architecture. So detect it has been specified in configure and error out with an explanation. Includes suggestions from @jsquyres Signed-off-by: Ralph Castain <[email protected]>
Break the multi-loop thru loading of param files that caused us to overwrite values. Defer to the PMIx pmdl components for obtaining envars and for checking MCA param overlaps across projects. Signed-off-by: Ralph Castain <[email protected]>
Python 3.12 no longer allows escapes in regular expressions. Instead, use "r" strings. Signed-off-by: Ralph Castain <[email protected]>
Protect against LTO optimizer
docs: update for Python 3.12
Use the PMIx functions to check params
Signed-off-by: Ralph Castain <[email protected]> (from upstream commit e204c73)
The formatting is messed up in places, so try and fix it. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit b4f5884)
We don't really have stale issues, so we can close anything old manually. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 2215b29)
See open-mpi/ompi#12829 for an explanation Signed-off-by: Ralph Castain <[email protected]> (from upstream commit bff20a8)
RTD is rolling out some changes. Per https://about.readthedocs.com/blog/2024/07/addons-by-default/, these are the changes we need to make. Port of open-mpi/ompi#12687 Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 584845f)
Cleanup show_help formatting
Remove unused yaml
Read The Docs updates
Restore processing of interface include and exclude parameters - not sure where it went, but bring it back anyway. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 99e4e24)
Protect against the envar version of the Slurm custom args MCA param. This is an unfortunate hack that hopefully will eventually go away. See both of the following for detailed explanations and discussion: openpmix#1974 open-mpi/ompi#12471 Orgs/users wanting to add custom args to the internal "srun" command used to spawn the PRRTE daemons must do so via the default MCA param files (system or user), or via the prterun (or its proxy) cmd line Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 28432ed)
Pay attention to interface include/exclude params
Protect against the envar version of the Slurm custom args param
We have deprecated the "socket" object in favor of "package", so we need to extend that treatment to the resource qualifier in the `--map-by ppr` directive. Detect both "socket" and the "skt" shorthand for backward-compatibility reasons. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 7f507df)
Fix deprecation warnings for ppr on socket objects
We played with alternative implementations over the years, but nothing ever really stuck. So let's simplify the code by removing the framework and the associated loopbacks that were built into it (e.g., if a message cannot be sent, then loop it back into the OOB base to see if another component is available that can send it). The code badly needs reorganization as I've made no attempt to do so here. A pass to see if event steps can be eliminated would also be good - I've cleaned up a few of them, but what remains could use another pair of eyes. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 7b54c48)
We no longer support all the way to pre-StoneAge versions. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit cd8b2f7)
Not sure the compiler is correct in its complaint, but just in case, let's ensure that the variables being used are always initialized. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit d5e580a)
Fix typo that locked the target to rank=0 and instead use the rank provided by user. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit b0d461d)
It isn't perfect and it does raise precedence concerns, but let the OMPI schizo component parse the OMPI param files. Use the PMIx pmdl functions to check for relevant params to convert to PRRTE and PMIx. Restore the overlap detection for when params are set for OMPI frameworks that have a corollary to frameworks in PRRTE and PMIx. This still raises precedence questions. For now, what this will do is have values in the OMPI param files give way to corresponding values specified in PRRTE and PMIx param files. In other words, values in PMIx and PRRTE param files are written into the environment, not overwriting any previously existing envar. OMPI param files are then read, and any params that correspond to PRRTE and PMIx params are written into the environment - without overwriting any that already exist. So if you have a param specified in a PRRTE param file, and you also specify it in the OMPI param file, the PRRTE param file value will "win". User beware - someone is bound to be very unhappy. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit fcbefcc)
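The precedence rule in this commit message follows directly from writing envars without overwriting. A minimal sketch, assuming a POSIX `setenv()` and using hypothetical parameter names and a hypothetical `param_t` type (the real code reads param files through the schizo and pmdl components):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv()/unsetenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one "name = value" entry from a param file. */
typedef struct {
    const char *name;
    const char *value;
} param_t;

/* Export each param as an envar without overwriting an existing value.
 * Files processed earlier therefore take precedence over later ones. */
static int export_params(const param_t *params, size_t n)
{
    assert(params != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* third argument 0: keep the current value if the envar exists */
        if (setenv(params[i].name, params[i].value, 0) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Calling `export_params()` first on the PRRTE and PMIx entries and then on the OMPI entries gives the behavior described: a param present in both keeps the PRRTE value, and anything the user already had in the environment is left alone.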
Collapse the OOB framework
Unlock stdin target
Raise min required HWLOC version to 2.1.0
See long explanation on openpmix#2025 Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 7de93e9)
No longer any challengers to hwloc, so streamline the binding support by re-integrating it back into the odls framework. Signed-off-by: Ralph Castain <[email protected]> (from upstream commit add324c)
Remove the rtc framework
Get takes a (pmix_value_t**), so don't cast it to (void**) Signed-off-by: Ralph Castain <[email protected]> (from upstream commit 16ce9a8) Signed-off-by: George Bosilca <[email protected]>
Restore NUMA tracking
Restore parsing of OMPI param files
Protect against uninitialized var
Silence warning
This is part of a multi-project effort; a similar PR will be created in OpenPMIx and OMPI. The goal of each of these changes is the same: instead of using a build-time generated prefix that ignores a project rebase, take the prefix from the shared library of each project and derive the necessary paths from it. The user can, however, override this using environment variables and the configuration files. Signed-off-by: George Bosilca <[email protected]>
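One way to take the prefix from the shared library is to ask the dynamic loader where the library was loaded from and strip the trailing `lib/<file>` components. The sketch below is not the code in this PR. It assumes a platform that provides `dladdr()` (glibc, BSD, macOS), assumes the library is installed one directory below the prefix, and uses a hypothetical `EXAMPLE_PREFIX` envar to stand in for the user override.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* dladdr() is an extension, not ISO C or POSIX */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Strip the last two path components:
 * "/opt/x/lib/libfoo.so" becomes "/opt/x".
 * Returns 0 on success, -1 if the buffer is too small or the path has
 * fewer than two directory levels above the file. */
static int prefix_from_library_path(const char *libpath, char *out, size_t outlen)
{
    assert(libpath != NULL && out != NULL);
    size_t len = strlen(libpath);
    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, libpath, len + 1);
    for (int i = 0; i < 2; i++) {
        char *slash = strrchr(out, '/');
        if (slash == NULL || slash == out) {
            return -1;
        }
        *slash = '\0';
    }
    return 0;
}

/* Any object that lives inside the library works as a lookup anchor. */
static int anchor;

/* Resolve the install prefix at run time; the envar (hypothetical name)
 * takes precedence over the location of the shared library. */
static int runtime_prefix(char *out, size_t outlen)
{
    const char *override = getenv("EXAMPLE_PREFIX");
    if (override != NULL && override[0] != '\0') {
        int n = snprintf(out, outlen, "%s", override);
        return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
    }
    Dl_info info;
    if (dladdr(&anchor, &info) == 0 || info.dli_fname == NULL) {
        return -1;
    }
    return prefix_from_library_path(info.dli_fname, out, outlen);
}
```

Paths such as the help-file or plugin directories would then be built relative to the returned prefix. Note that `dli_fname` is whatever path the loader recorded, which may be relative, so real code would likely canonicalize it first; on older glibc the program also needs to link with `-ldl`.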
I doubt this will merge cleanly up here - I transferred it to the upstream repo: openpmix#2167
Thanks. But we still need to bring it back here as this is the official submodule now.