master: opal/stacktrace improvements #2772

jjhursey · 2017-01-20T22:10:51Z

The first commit fixes a bug that I saw with this test program:

https://gist.github.com/jjhursey/1af4dac993f11f4d6d3e9b4247b96475

If I run the program with Open MPI, it catches the first SIGFPE and prints a stack through the opal/stacktrace mechanism, but let's the program continue executing 😞 . When it hits the second SIGFPE then the system's default signal handler is invoked and kills the program.

I would expect Open MPI to print the stack and kill the program on the first instance of this signal. And if I run Open MPI with -mca opal_signal 6,7,11 (skipping 8 = SIGFPE) then it does exactly that (exits the program at the first encounter).

So the first commit makes sure that we re-raise the signal after processing the stacktrace.

The second commit adds some more flexibility to the stacktrace mechanism in particular where the stacktrace output is directed. Default is still stderr, but now we have options for none, stdout, and a set of per-process files.

See the commit messages for some more detail.

jjhursey · 2017-01-25T21:06:23Z

Adding @hjelmn since they were seeing sometime similar to the problem I attempted a fix for in this PR.

I found the discussion here helpful:

https://www.gnu.org/software/libc/manual/html_node/Termination-in-Handler.html#Termination-in-Handler

jsquyres · 2017-01-26T01:31:46Z

@jjhursey @hjelmn We're going to make an RC for 2.0.2 tomorrow. If you want this in 2.0.2, please merge to master and PR to v2.0.2 ASAP. Thanks.

- This prevents us for accidentally masking a signal that was meant to terminate the application. Signed-off-by: Joshua Hursey <[email protected]>

- New MCA option: opal_stacktrace_output - Specifies where the stack trace output stream goes. - Accepts: none, stdout, stderr, file[:filename] - Default filename 'stacktrace' - Filename will be `stacktrace.PID`, or if VPID is available, then the filename will be `stacktrace.VPID.PID` - Update util/stacktrace to allow for different output avenues including files. Previously this was hardcoded to 'stderr'. - Since opal_backtrace_print needs to be signal safe, passing it a FILE object that actually represents a file stream is difficult. This is because we cannot open the file in the signal handler using `fopen` (not safe), but have to use `open` (safe). Additionally, we cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it is also not signal safe. - I did not want to break the backtrace.h API so I introduced a new rule (documented in `backtrace.c`) that if the `FILE *file` argument is `NULL` then look for the `opal_stacktrace_output_fileno` variable to tell you which file descriptor to use for output. Signed-off-by: Joshua Hursey <[email protected]>

hjelmn

👍

jsquyres · 2017-01-26T18:52:09Z

Refs #2801

jjhursey added bug enhancement labels Jan 20, 2017

jjhursey assigned hjelmn Jan 25, 2017

jjhursey mentioned this pull request Jan 26, 2017

v2.0.x: opal/stacktace: Raise the signal after processing #2846

Merged

jjhursey force-pushed the topic/stacktrace-improv branch from e54ecf3 to 332d633 Compare January 26, 2017 17:20

jjhursey mentioned this pull request Jan 26, 2017

v1.10: opal/stacktace: Raise the signal after processing #2848

Merged

jjhursey added 2 commits January 26, 2017 11:55

opal/stacktace: Raise the signal after processing

f8918e3

- This prevents us for accidentally masking a signal that was meant to terminate the application. Signed-off-by: Joshua Hursey <[email protected]>

jjhursey force-pushed the topic/stacktrace-improv branch from 332d633 to 6d98559 Compare January 26, 2017 17:55

hjelmn approved these changes Jan 26, 2017

View reviewed changes

jsquyres changed the title ~~opal/stacktrace improvements~~ master: opal/stacktrace improvements Jan 26, 2017

jsquyres merged commit 2c277a6 into open-mpi:master Jan 26, 2017

jjhursey mentioned this pull request Jan 26, 2017

v2.x: opal/stacktrace improvements #2856

Merged

jjhursey deleted the topic/stacktrace-improv branch January 26, 2017 20:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

master: opal/stacktrace improvements #2772

master: opal/stacktrace improvements #2772

Uh oh!

jjhursey commented Jan 20, 2017

Uh oh!

jjhursey commented Jan 25, 2017

Uh oh!

jsquyres commented Jan 26, 2017

Uh oh!

hjelmn left a comment

Uh oh!

jsquyres commented Jan 26, 2017

Uh oh!

Uh oh!

master: opal/stacktrace improvements #2772

master: opal/stacktrace improvements #2772

Uh oh!

Conversation

jjhursey commented Jan 20, 2017

Uh oh!

jjhursey commented Jan 25, 2017

Uh oh!

jsquyres commented Jan 26, 2017

Uh oh!

hjelmn left a comment

Choose a reason for hiding this comment

Uh oh!

jsquyres commented Jan 26, 2017

Uh oh!

Uh oh!