xtensa: context returns to thread after kernel oops #15037

Closed
andrewboie opened this issue Mar 29, 2019 · 1 comment
Assignees: andrewboie
Labels: area: Xtensa, bug, priority: medium

Comments

@andrewboie
Contributor

andrewboie commented Mar 29, 2019

Describe the bug
See CI failures for #15032

A thread overflows its stack, which triggers a call to z_except_reason(). k_thread_abort() is then called, and the thread should never return to its faulting context.

Instead, the thread returns to where it faulted! After the "Caught system error" line in the output below, observe that the thread continues to run, failing the test case.

test stack sentinel overflow - timer irq
posting 1024 bytes of junk to stack...
waiting for tick advance...
@ /home/apboie/projects/zephyr3/zephyr/kernel/thread.c:259:
***** Stack Check Fail! *****
Current thread ID = 0x60005b78
Faulting instruction address = 0xdeaddead
Thread 0x60005b78 Caught system error -- reason 2
FAIL - stack_sentinel_timer@160. should never see this

    Assertion failed at /home/apboie/projects/zephyr3/zephyr/tests/kernel/fatal/src/main.c:207: check_stack_overflow: (rv equal to TC_FAIL)
thread 0x60005b78 was not aborted
andrewboie added the bug and area: Xtensa labels Mar 29, 2019
andrewboie changed the title from "xtensa: context returns to thread after fatal exception" to "xtensa: context returns to thread after kernel oops" Mar 29, 2019
@andrewboie
Contributor Author

andrewboie commented Mar 29, 2019

Some more details:

  1. The timer interrupt fires. On the way out of the timer interrupt, xtensa_excint1_c() in the xtensa arch code makes a call to z_get_next_switch_handle().
  2. z_get_next_switch_handle() in kernel/sched.c calls z_check_stack_sentinel()
  3. Since we blew the stack, z_check_stack_sentinel() calls z_except_reason(_NANO_ERR_STACK_CHECK_FAIL)
  4. Xtensa has no arch-specific Z_ARCH_EXCEPT() implementation, so no real exception is generated. Instead, z_except_reason() falls back to the default implementation in include/kernel.h, which just calls z_NanoFatalErrorHandler() directly.
  5. z_NanoFatalErrorHandler() runs and calls z_SysFatalErrorHandler(), which is a custom implementation in tests/kernel/fatal/src/main.c
  6. z_SysFatalErrorHandler() calls k_thread_abort()
  7. It appears the z_reschedule() call in z_impl_k_thread_abort() returns to the caller on Xtensa. The rest of the interrupt unwinds and then the faulting thread is run again even though it shouldn't be.
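
Steps 3 to 7 hinge on the fallback being an ordinary function call rather than a CPU exception. The following is a minimal, self-contained sketch of that pattern, not the actual contents of include/kernel.h. Every `toy_` / `TOY_` name is hypothetical, and the reason value 2 is taken only from the "reason 2" line in the log above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reason code; 2 matches the "reason 2" line in the log. */
#define TOY_ERR_STACK_CHECK_FAIL 2u

/* Counts how often the stand-in fatal handler ran. */
static int toy_fatal_calls;

/*
 * Hypothetical stand-in for the fatal error handler. It is a plain
 * function: once it has done its work, control returns to the caller.
 */
static void toy_fatal_handler(unsigned int reason)
{
	toy_fatal_calls++;
	printf("Caught system error -- reason %u\n", reason);
}

/*
 * An arch may supply TOY_ARCH_EXCEPT() to raise a real exception.
 * Without it, the fallback below is just a direct call.
 */
#ifdef TOY_ARCH_EXCEPT
#define toy_except_reason(reason) TOY_ARCH_EXCEPT(reason)
#else
#define toy_except_reason(reason) do { \
		printf("@ %s:%d:\n", __FILE__, __LINE__); \
		toy_fatal_handler(reason); \
	} while (false)
#endif

/* Returns true if execution continued past the "fatal" error. */
bool toy_faulting_code_resumes(void)
{
	toy_except_reason(TOY_ERR_STACK_CHECK_FAIL);
	/* Reached only because the fallback is an ordinary call. */
	return true;
}
```

In this sketch, `toy_faulting_code_resumes()` returns `true` when `TOY_ARCH_EXCEPT` is undefined, which mirrors the symptom described here: unless the handler itself switches away from the thread, the faulting code keeps running.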

My guess is that, due to some implementation detail on Xtensa, calling z_except_reason() in that particular context (on the way out of an interrupt) doesn't have the desired effect.

andrewboie added the priority: medium label Mar 29, 2019
andrewboie pushed a commit to andrewboie/zephyr that referenced this issue Mar 29, 2019
Checking the stack sentinel may abort the current thread,
make this check before we determine what the next thread
to run is.

Fixes: zephyrproject-rtos#15037

Signed-off-by: Andrew Boie <[email protected]>
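
The ordering described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in kernel/sched.c: the `toy_` names, the two-thread setup, and the sentinel value are all hypothetical, and the sentinel check is assumed to return to its caller after marking the thread aborted, as observed on Xtensa in this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel value, chosen only for this sketch. */
#define TOY_SENTINEL 0xF0F0F0F0u

struct toy_thread {
	const char *name;
	uint32_t sentinel; /* word that a stack overflow would clobber */
	bool aborted;
};

/* Keep running `current` unless it has been aborted; else run `idle`. */
static struct toy_thread *toy_pick_next(struct toy_thread *current,
					struct toy_thread *idle)
{
	assert(current != NULL && idle != NULL);
	return current->aborted ? idle : current;
}

/*
 * Models a sentinel check whose fatal path returns to the caller:
 * the thread is marked aborted, then control comes straight back.
 */
static void toy_check_sentinel(struct toy_thread *t)
{
	if (t->sentinel != TOY_SENTINEL) {
		t->aborted = true;
	}
}

/* Check after choosing: the stale choice is still returned. */
struct toy_thread *toy_next_check_last(struct toy_thread *current,
				       struct toy_thread *idle)
{
	struct toy_thread *next = toy_pick_next(current, idle);

	toy_check_sentinel(current);
	return next;
}

/* Check before choosing: an aborted thread is never selected. */
struct toy_thread *toy_next_check_first(struct toy_thread *current,
					struct toy_thread *idle)
{
	toy_check_sentinel(current);
	return toy_pick_next(current, idle);
}
```

With a thread whose sentinel has been overwritten, `toy_next_check_last()` still hands back that thread, while `toy_next_check_first()` hands back the idle thread, which is the behavior the reordering aims for.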
andrewboie self-assigned this Mar 29, 2019
nashif pushed a commit that referenced this issue Mar 30, 2019
Checking the stack sentinel may abort the current thread,
make this check before we determine what the next thread
to run is.

Fixes: #15037

Signed-off-by: Andrew Boie <[email protected]>