Skip to content

master: opal/stacktrace improvements #2772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 26, 2017

Conversation

jjhursey
Copy link
Member

The first commit fixes a bug that I saw with this test program:

If I run the program with Open MPI, it catches the first SIGFPE and prints a stack through the opal/stacktrace mechanism, but let's the program continue executing 😞 . When it hits the second SIGFPE then the system's default signal handler is invoked and kills the program.

I would expect Open MPI to print the stack and kill the program on the first instance of this signal. And if I run Open MPI with -mca opal_signal 6,7,11 (skipping 8 = SIGFPE) then it does exactly that (exits the program at the first encounter).

So the first commit makes sure that we re-raise the signal after processing the stacktrace.

The second commit adds some more flexibility to the stacktrace mechanism in particular where the stacktrace output is directed. Default is still stderr, but now we have options for none, stdout, and a set of per-process files.

See the commit messages for some more detail.

@jjhursey
Copy link
Member Author

Adding @hjelmn since they were seeing sometime similar to the problem I attempted a fix for in this PR.

I found the discussion here helpful:

@jsquyres
Copy link
Member

@jjhursey @hjelmn We're going to make an RC for 2.0.2 tomorrow. If you want this in 2.0.2, please merge to master and PR to v2.0.2 ASAP. Thanks.

 - This prevents us for accidentally masking a signal that was meant to
   terminate the application.

Signed-off-by: Joshua Hursey <[email protected]>
 - New MCA option: opal_stacktrace_output
   - Specifies where the stack trace output stream goes.
   - Accepts: none, stdout, stderr, file[:filename]
   - Default filename 'stacktrace'
     - Filename will be `stacktrace.PID`, or if VPID is available,
       then the filename will be `stacktrace.VPID.PID`
 - Update util/stacktrace to allow for different output avenues
   including files. Previously this was hardcoded to 'stderr'.
 - Since opal_backtrace_print needs to be signal safe, passing it a
   FILE object that actually represents a file stream is difficult. This
   is because we cannot open the file in the signal handler using
   `fopen` (not safe), but have to use `open` (safe). Additionally, we
   cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
   is also not signal safe.
   - I did not want to break the backtrace.h API so I introduced a new
     rule (documented in `backtrace.c`) that if the `FILE *file`
     argument is `NULL` then look for the `opal_stacktrace_output_fileno`
     variable to tell you which file descriptor to use for output.

Signed-off-by: Joshua Hursey <[email protected]>
@jjhursey jjhursey force-pushed the topic/stacktrace-improv branch from 332d633 to 6d98559 Compare January 26, 2017 17:55
Copy link
Member

@hjelmn hjelmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jsquyres jsquyres changed the title opal/stacktrace improvements master: opal/stacktrace improvements Jan 26, 2017
@jsquyres jsquyres merged commit 2c277a6 into open-mpi:master Jan 26, 2017
@jsquyres
Copy link
Member

Refs #2801

@jjhursey jjhursey deleted the topic/stacktrace-improv branch January 26, 2017 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants