Skip to content

ARM NXP hardware stack overflows do not report _NANO_ERR_STACK_CHK_FAIL or provide MPU fault information #7706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andrewboie opened this issue May 22, 2018 · 7 comments
Assignees
Labels
area: ARM ARM (32-bit) Architecture area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Milestone

Comments

@andrewboie
Copy link
Contributor

andrewboie commented May 22, 2018

When stack guard pages are enabled and a thread overflows its stack, the error reported is as follows, causing tests/kernel/fatal to work improperly:

test stack overflow - timer irq
***** BUS FAULT *****
  Executing thread ID (thread): 0x2000002c
  Faulting instruction address:  0x1b6e
  Imprecise data bus error
Caught system error -- reason 0

    Assertion failed at /home/apboie/projects/zephyr/tests/kernel/fatal/src/main.c:224: test_fatal: )
bad reason code got 0 expected 2

Reason code 0 is _NANO_ERR_CPU_EXCEPTION, the expected code of 2 is _NANO_ERR_STACK_CHK_FAIL.

I reproduced this on frdm_k64f.

We need to fix this:

  • There is logic in fault.c for dumping out MPU faults, but it is never called even though this is an MPU access violation. The user will not know that the fault was due to illegal memory access.
  • If the MPU fault was due to access violation within the region used for the stack guard, the reason code provided to _NanoFatalErrorHandler should be _NANO_ERR_STACK_CHK_FAIL and not _NANO_ERR_HW_EXCEPTION.

I'm committing a FIXME workaround to the fatal test until this is fixed.

@andrewboie andrewboie added bug The issue is a bug, or the PR is fixing a bug area: ARM ARM (32-bit) Architecture area: Memory Protection labels May 22, 2018
andrewboie pushed a commit that referenced this issue May 22, 2018
An errant commit accidentally disabled all testing of
hardware-based stack protection. Restore it, and work
around a problem with how these kinds of exceptions are
reported on ARM until #7706 is fixed.

Signed-off-by: Andrew Boie <[email protected]>
andrewboie pushed a commit that referenced this issue May 22, 2018
An errant commit accidentally disabled all testing of
hardware-based stack protection. Restore it, and work
around a problem with how these kinds of exceptions are
reported on ARM until #7706 is fixed.

We need to globally disable user mode due to how the
select statements in Kconfig work, the stack sentinel
is incompatible with user mode.

Some build warnings when compiling as native_posix
fixed.

Signed-off-by: Andrew Boie <[email protected]>
@ioannisg ioannisg self-assigned this May 22, 2018
@andrewboie
Copy link
Contributor Author

per @MaureenHelm this may be specific to the NXP MPU

@MaureenHelm
Copy link
Member

The NXP MPU throws a bus fault on an access violation, whereas the ARM MPU throws a MemManage fault. Running this test on the mimxrt1050_evk board, which has the ARM MPU, gives the result:

test stack overflow - timer irq
posting 2048 bytes of junk to stack...
***** MPU FAULT *****
  Executing thread ID (thread): 0x2000002c
  Faulting instruction address:  0xb7c
  Data Access Violation
  Address: 0x20001000
Caught system error -- reason 0

andrewboie pushed a commit to andrewboie/zephyr that referenced this issue May 22, 2018
An errant commit accidentally disabled all testing of
hardware-based stack protection. Restore it, and work
around a problem with how these kinds of exceptions are
reported on ARM until zephyrproject-rtos#7706 is fixed.

We need to globally disable user mode due to how the
select statements in Kconfig work, the stack sentinel
is incompatible with user mode.

Some build warnings when compiling as native_posix
fixed.

Signed-off-by: Andrew Boie <[email protected]>
@andrewboie andrewboie changed the title ARM hardware stack overflows do not report _NANO_ERR_STACK_CHK_FAIL or provide MPU fault information ARM NXP hardware stack overflows do not report _NANO_ERR_STACK_CHK_FAIL or provide MPU fault information May 22, 2018
andrewboie pushed a commit that referenced this issue May 22, 2018
An errant commit accidentally disabled all testing of
hardware-based stack protection. Restore it, and work
around a problem with how these kinds of exceptions are
reported on ARM until #7706 is fixed.

We need to globally disable user mode due to how the
select statements in Kconfig work, the stack sentinel
is incompatible with user mode.

Some build warnings when compiling as native_posix
fixed.

Signed-off-by: Andrew Boie <[email protected]>
@ioannisg
Copy link
Member

ioannisg commented May 23, 2018

If I understand correctly, @andrewboie , _NANO_ERR_HW_EXCEPTION is the sole error code used in arm's fault.c, so this part of the problem does not look like it is only an NXP issue, right?

Looking at @MaureenHelm 's log, "reason" is still 0.

@MaureenHelm
Copy link
Member

If I understand correctly, @andrewboie , _NANO_ERR_HW_EXCEPTION is the sole error code used in arm's fault.c, so this part of the problem does not look like it is only an NXP issue, right?

Right. The NXP-specific issue is @andrewboie 's first bullet about not dumping MPU faults on parts like k64f that have the NXP MPU, because they take a bus fault instead. Parts like i.mx rt1050 that have the ARM MPU do dump MPU faults as expected.

In both cases, the reason code is still 0.

@ioannisg
Copy link
Member

ioannisg commented Feb 5, 2019

Should we try to get this into 1.14, @MaureenHelm ? I might have some time for that, but only after Friday (bu that is ok for bug fixes, right?)

@andrewboie
Copy link
Contributor Author

we have plenty of time to address bug fixes for the LTS

@andrewboie
Copy link
Contributor Author

we tripped over this again in #13166

@ioannisg ioannisg added this to the v1.14.0 milestone Feb 12, 2019
ioannisg added a commit to ioannisg/zephyr that referenced this issue Feb 12, 2019
This commit does the following:
- Refactors the Stack Corruption reporting in ARM platforms,
  by moving the evaluation of stack corruption out of the
  fault.c file.
- Fixes the issue of NXP_MPU-based platforms not reporting
  stack corruption failures properly.
- Modify the #ifdef guards around ARMv8-M stack overflow
  capturing in UsageFault handler, so the BUILTIN_STACK_GUARD
  macro is used instead of the (generic) HW_STACK_PROTETION.

Fixes zephyrproject-rtos#7706

Signed-off-by: Ioannis Glaropoulos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants