This repository was archived by the owner on Apr 2, 2025. It is now read-only.

hpcrun should identify when hpcfnbounds fails to start #75

Closed
@jmellorcrummey

Description


Today at LLNL, we encountered a problem where hpcfnbounds was linked against a copy of libdw that had been built with spack. The libdw copy was correctly installed in lib/hpctoolkit/ext-libs. However, spack had hardcoded a dependence on a version of libbz2.so.1.0 located in the spack install tree.

As a consequence of LLNL's default security settings, that spack repository, which was built in /usr/workspace/.../${LOGNAME}, wasn't readable by other users. The version of libbzip that libdw was linked against had version numbers incompatible with the ones publicly available in /lib64/libbzip.so.1 and /lib64/libbzip.so.1.0.6. As a result, hpcfnbounds-bin silently failed to run because it couldn't load a dependent library.

hpcrun, however, kept spawning copies of hpcfnbounds that failed in the same way, until the whole execution keeled over with an error message about something else, which didn't help diagnose the problem.

hpcrun should verify that hpcfnbounds has started correctly (a handshake before using it for the first time would do this). If something is amiss, then rather than failing inscrutably after causing a fork bomb, hpcrun should exit with a helpful error message such as:

hpcrun: unable to start hpcfnbounds to analyze application load modules
     check permissions of the script, the executable in libexec/hpctoolkit/hpcfnbounds-bin, 
     and its dependent libraries
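A minimal sketch of the proposed handshake, in C. It assumes, hypothetically, that the helper writes the token `READY\n` on its stdout once it has started; the token, the function name `start_with_handshake`, and the timeout parameter are illustrative and are not hpcrun's actual interface. If the helper cannot be exec'd, or the dynamic loader cannot resolve one of its dependent libraries, the child exits without writing the token, so the parent sees EOF (or a timeout) and can report the failure once instead of spawning more copies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical token the helper is assumed to print once it is running. */
#define HANDSHAKE_TOKEN "READY\n"

/* Start argv[0] with its stdout connected to a pipe and wait for
 * HANDSHAKE_TOKEN. Each poll() call waits at most timeout_ms.
 * On success, returns the child's pid and stores the read end of the pipe
 * in *out_fd. On failure, kills and reaps the child and returns -1. */
static pid_t start_with_handshake(char *const argv[], int timeout_ms, int *out_fd)
{
    assert(argv != NULL && argv[0] != NULL && out_fd != NULL);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then exec the helper. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* exec failed: the parent will see EOF */
    }

    /* Parent: keep only the read end so EOF is seen when the child exits. */
    close(fds[1]);

    const size_t want = sizeof HANDSHAKE_TOKEN - 1;
    char buf[sizeof HANDSHAKE_TOKEN];
    size_t got = 0;
    while (got < want) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break; /* poll error or timeout */
        ssize_t n = read(fds[0], buf + got, want - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break; /* EOF: the child died before completing the handshake */
        got += (size_t)n;
    }

    if (got == want && memcmp(buf, HANDSHAKE_TOKEN, want) == 0) {
        *out_fd = fds[0];
        return pid;
    }

    /* Handshake failed: clean up this child rather than retrying. */
    close(fds[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}
```

On a `-1` return, hpcrun would print a message like the one suggested above and exit, rather than launching further copies of hpcfnbounds. Reading the token through a pipe also means a helper that starts but prints something unexpected is treated as a failure, not just one that never starts.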
