intermittent issue with tests/kernel/fatal #7291

andrewboie · 2018-05-01T21:08:07Z

starting test - test_fatal
test alt thread 1: generic CPU exception
***** CPU exception 6
Current thread ID = 0x00400060
eax: 0x00000000, ebx: 0x00000000, ecx: 0x80000000, edx: 0x0040e040
esi: 0x00000000, edi: 0x00004d85, ebp: 0x00405fd0, esp: 0x00405fd0
eflags: 0x00000206 cs: 0x0008
call trace:
eip: 0x00001322
     0x00004d95 (0x0)
Caught system error -- reason 6
test alt thread 2: initiate kernel oops
***** Kernel OOPS! *****
Current thread ID = 0x00400060
eax: 0x00000206, ebx: 0x00000000, ecx: 0x80000000, edx: 0x0040e040
esi: 0x00000000, edi: 0x00004d85, ebp: 0x00405fd0, esp: 0x00405fcc
eflags: 0x00000006 cs: 0x0008
call trace:
eip: 0x0000133a
     0x00004d95 (0x0)
Caught system error -- reason 7
test alt thread 3: initiate kernel panic
***** Kernel Panic! *****
Current thread ID = 0x00400060
eax: 0x00000206, ebx: 0x00000000, ecx: 0x80000000, edx: 0x0040e040
esi: 0x00000000, edi: 0x00004d85, ebp: 0x00405fd0, esp: 0x00405fcc
eflags: 0x00000006 cs: 0x0008
call trace:
eip: 0x00001344
     0x00004d95 (0x0)
Caught system error -- reason 8
test stack overflow - timer irq
***** Stack Check Fail! *****
Current thread ID = 0x00400060
eax: 0x00004d95, ebx: 0x00000000, ecx: 0x80000000, edx: 0x0040e040
esi: 0x00000000, edi: 0x00004d85, ebp: 0x00405fc8, esp: 0x00404fc8
eflags: 0x00000202 cs: 0x0008
call trace:
eip: 0x00001730
     0x00001764 (0x405fe8)
     0x00004d95 (0x0)
Caught system error -- reason 4249512

    Assertion failed at /projects/zephyr/tests/kernel/fatal/src/main.c:224: test_fatal: (crash_reason not equal to expected_reason)
bad reason code got 4249512 expected 4
FAIL - test_fatal.

Seems to be some kind of race? This only happens sometimes.
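For reference, the bad reason code is easier to read in hex: 4249512 is 0x0040D7A8. That falls in the same 0x0040xxxx range as the thread ID and the ebp/esp values in the dump above, so it looks more like a stack or data address than a reason code. The C sketch below only checks that arithmetic. `same_64k_window()` is a hypothetical helper, and comparing the upper 16 bits is a rough heuristic rather than anything the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* The bogus reason code from the failing run, written in hex. */
static_assert(4249512u == 0x0040D7A8u, "4249512 is 0x0040D7A8");

/* Rough heuristic (hypothetical helper, not a Zephyr API): treat two 32-bit
 * values as belonging to the same memory area if they share the upper
 * 16 bits, i.e. they lie in the same 64 KiB window. */
static int same_64k_window(uint32_t a, uint32_t b)
{
	return (a >> 16) == (b >> 16);
}
```

For example, `same_64k_window(4249512u, 0x00400060u)` is true (bogus reason vs. thread ID), while a genuine reason code such as 4 is not in that window.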


nashif · 2018-05-29T21:20:27Z

Moving to low priority, since this is intermittent and not easily reproducible.

spoorthik · 2018-07-16T07:35:20Z

I also hit a similar crash when running sanitycheck:

Latest Commit ID: afad09d

***** Booting Zephyr OS v1.12.0-790-gafad09d *****
Running test suite fatal
===================================================================
starting test - test_fatal
test alt thread 1: generic CPU exception
***** CPU exception 6
Current thread ID = 0x00400060
eax: 0x00000000, ebx: 0x00000000, ecx: 0x80000000, edx: 0x00413060
esi: 0x00000000, edi: 0x000016fe, ebp: 0x00406fd0, esp: 0x00406fd0
eflags: 0x00000216 cs: 0x0008
call trace:
eip: 0x00001231
     0x0000170e (0x0)
Caught system error -- reason 6
test alt thread 2: initiate kernel oops
***** Kernel OOPS! *****
Current thread ID = 0x00400060
eax: 0x00000216, ebx: 0x00000000, ecx: 0x80000000, edx: 0x00413060
esi: 0x00000000, edi: 0x000016fe, ebp: 0x00406fd0, esp: 0x00406fcc
eflags: 0x00000016 cs: 0x0008
call trace:
eip: 0x00001249
     0x0000170e (0x0)
Caught system error -- reason 7
test alt thread 3: initiate kernel panic
***** Kernel Panic! *****
Current thread ID = 0x00400060
eax: 0x00000216, ebx: 0x00000000, ecx: 0x80000000, edx: 0x00413060
esi: 0x00000000, edi: 0x000016fe, ebp: 0x00406fd0, esp: 0x00406fcc
eflags: 0x00000016 cs: 0x0008
call trace:
eip: 0x00001253
     0x0000170e (0x0)
Caught system error -- reason 8
test stack overflow - timer irq
***** General Protection Fault
***** Exception code: 0x60
Current thread ID = 0x00400060
eax: 0x004127f0, ebx: 0x00000000, ecx: 0x00000000, edx: 0x00000216
esi: 0x00000000, edi: 0x00000216, ebp: 0x004127fc, esp: 0x004127d0
eflags: 0x00000046 cs: 0x0008
call trace:
eip: 0x00002d1f
Caught system error -- reason 6

    Assertion failed at zephyr.git/tests/kernel/fatal/src/main.c:234: test_fatal: (crash_reason not equal to _N
bad reason code got 6 expected 4

FAIL - test_fatal
===================================================================
===================================================================
PROJECT EXECUTION FAILED
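This second run failed differently: a General Protection Fault with exception code 0x60 instead of a stack check failure. Assuming that non-zero #GP error code uses the IA-32 selector error-code format (EXT in bit 0, IDT in bit 1, TI in bit 2, descriptor index in bits 3-15), 0x60 refers to GDT entry 12 (selector 0x60) with neither the external-event nor the IDT flag set. Which descriptor that is depends on this build's GDT layout, which isn't shown here. A small decoder sketch, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an IA-32 selector error code (Intel SDM, "Error Code"). */
struct selector_error {
	unsigned ext;   /* bit 0: event external to the program (e.g. an IRQ) */
	unsigned idt;   /* bit 1: index refers to a gate in the IDT */
	unsigned ti;    /* bit 2: 1 = LDT, 0 = GDT (only used when idt == 0) */
	unsigned index; /* bits 3-15: descriptor index */
};

/* Hypothetical helper: split a raw error code into its fields. */
static struct selector_error decode_selector_error(uint32_t code)
{
	struct selector_error e;

	e.ext = code & 0x1u;
	e.idt = (code >> 1) & 0x1u;
	e.ti = (code >> 2) & 0x1u;
	e.index = (code >> 3) & 0x1FFFu;
	return e;
}
```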

andrewboie · 2019-02-13T05:32:06Z

I think this is an unlucky timer interrupt arriving while the double-fault handler's bottom half is using the interrupt stack, which corrupts the reason code. The bottom half doesn't ensure that interrupts are locked, so an interrupt that fires at that point will clobber its context.
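A toy, single-threaded C model of that failure mode, purely illustrative: the names (`toy_cpu`, `toy_timer_isr`, `toy_df_bottom_half`) are hypothetical, the `if_flag` field stands in for EFLAGS.IF, and the simulated ISR simply overwrites the whole shared stack. With interrupts enabled, the reason code stored on the interrupt stack comes back as whatever the ISR left there; with interrupts locked, it survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy model (not Zephyr code): a shared interrupt stack plus a flag that
 * plays the role of EFLAGS.IF. */
struct toy_cpu {
	uint32_t irq_stack[STACK_WORDS];
	bool if_flag;
};

/* Simulated timer ISR: it reuses the same interrupt stack and overwrites
 * whatever the interrupted code had stored there. */
static void toy_timer_isr(struct toy_cpu *cpu, uint32_t frame_word)
{
	for (int i = 0; i < STACK_WORDS; i++) {
		cpu->irq_stack[i] = frame_word;
	}
}

/* Simulated bottom half: saves the reason on the interrupt stack, may be
 * interrupted, then reads the reason back and returns it. */
static uint32_t toy_df_bottom_half(struct toy_cpu *cpu, uint32_t reason,
				   bool timer_pending, uint32_t isr_frame_word)
{
	cpu->irq_stack[STACK_WORDS - 1] = reason;

	/* A pending timer IRQ is only delivered if interrupts are enabled. */
	if (timer_pending && cpu->if_flag) {
		toy_timer_isr(cpu, isr_frame_word);
	}
	return cpu->irq_stack[STACK_WORDS - 1];
}
```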

Sent a patch to zero EFLAGS while the double-fault bottom half runs; previously it inherited whatever flags were set when the double fault happened.
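Zeroing EFLAGS is enough to lock interrupts because IF, the flag that lets the CPU accept maskable interrupts, is bit 9 (0x200). Below is a hypothetical helper for checking that bit against the EFLAGS values printed in the dumps above. Note that those are the faulting contexts' flags, not the bottom half's.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS.IF (bit 9): when set, the CPU accepts maskable interrupts. */
#define EFLAGS_IF (1u << 9)

/* Hypothetical helper, not a Zephyr API. */
static int irqs_enabled(uint32_t eflags)
{
	return (eflags & EFLAGS_IF) != 0;
}
```

For instance, 0x00000206 (CPU exception dump) and 0x00000202 (stack check fail dump) have IF set, 0x00000006 (oops/panic dumps) does not, and a zeroed EFLAGS obviously does not.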

In the event of a double fault, we do a HW task switch to a special _df_tss hardware task which resets the stack pointer to the interrupt stack and otherwise restores the main hardware task to a runnable state so that _df_handler_bottom() can run.

However, we need to make sure that _df_handler_bottom() runs with interrupts locked, otherwise another IRQ could corrupt the interrupt stack resulting in undefined behavior. We have very little stack space to work with in this context, just zero it. It's a fatal error for the thread in any event.

Fixes: #7291
Signed-off-by: Andrew Boie <[email protected]>
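One way to picture the change described in the commit message, as a sketch rather than Zephyr's actual code: the double-fault task edits the saved state of the main task in its TSS before switching back to it. The struct follows the 104-byte IA-32 TSS layout from the Intel SDM, with EFLAGS at offset 0x24. `struct x86_tss32` and `redirect_to_bottom_half()` are hypothetical names, and the real handler may set these fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IA-32 task-state segment layout (Intel SDM Vol. 3, "32-Bit Task-State
 * Segment"). Each 16-bit selector is followed by 16 reserved bits. */
struct x86_tss32 {
	uint16_t link, r0;
	uint32_t esp0;
	uint16_t ss0, r1;
	uint32_t esp1;
	uint16_t ss1, r2;
	uint32_t esp2;
	uint16_t ss2, r3;
	uint32_t cr3, eip, eflags;
	uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
	uint16_t es, r4, cs, r5, ss, r6, ds, r7, fs, r8, gs, r9;
	uint16_t ldt, r10;
	uint16_t t_flag;     /* bit 0: debug trap (T) flag */
	uint16_t iomap_base; /* offset of the I/O permission bitmap */
};

static_assert(offsetof(struct x86_tss32, eflags) == 0x24, "EFLAGS at 0x24");
static_assert(sizeof(struct x86_tss32) == 104, "TSS is 104 bytes");

/* Hypothetical top-half step: resume the main task in the bottom half on
 * the interrupt stack, with EFLAGS cleared so IF = 0 and maskable
 * interrupts stay locked once the CPU switches back to this task. */
static void redirect_to_bottom_half(struct x86_tss32 *main_tss,
				    uint32_t bottom_half_eip,
				    uint32_t irq_stack_top)
{
	main_tss->eip = bottom_half_eip;
	main_tss->esp = irq_stack_top;
	main_tss->eflags = 0;
}
```

Clearing EFLAGS wholesale rather than only IF matches the commit's reasoning: the thread is dead either way, so none of the interrupted flags need to be preserved.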

nashif added the bug label May 2, 2018

MaureenHelm added the priority: medium and area: Kernel labels May 4, 2018

nashif added the priority: low label and removed the priority: medium label May 29, 2018

nashif mentioned this issue Jan 30, 2019 in "List of tests that keep failing sporadically" #12553 (closed)

andrewboie mentioned this issue Feb 13, 2019 in "x86: clear EFLAGS on double fault" #13339 (merged)

andrewboie self-assigned this Feb 13, 2019

andrewboie closed this as completed in #13339 Feb 13, 2019