MPU fault on performing fifo operations #13110

Closed
karthikprabhu17 opened this issue Feb 6, 2019 · 4 comments

@karthikprabhu17

Describe the bug
Whenever I try to insert a fifo element inside the MQTT async evt callback, I get an MPU fault on my sam_e70_board.

***** MPU FAULT *****
Data Access Violation
MMFAR Address: 0x401779
***** Hardware exception *****
Current thread ID = 0x20409254
Faulting instruction address = 0x41cbd8
Fatal fault in ISR! Spinning...

Steps taken for diagnosis:
I called k_is_in_isr() inside client->evt_cb. It returns 1, which means the callback runs in interrupt context.
I tried offloading the work from this ISR to the system workqueue thread, and that did not cause an MPU fault. However, the receiving thread (the main thread) still did not get my fifo item.

To Reproduce
Steps to reproduce the behavior:

  1. mkdir build
  2. cd build
  3. cmake -DWEST_DIR=/Users/kprabhuv/new_zephyr/west ../
  4. make flash
  5. Trigger an MQTT_PUBLISH evt. client->evt_cb is called, but the fifo_put operation inside it fails.

Expected behavior
According to the documentation, an ISR or any thread should be able to perform k_fifo_put operations, and the receiving thread should be able to receive the item. There should be no MPU faults.

Impact
It is stopping us from moving our MQTT application to an MQTT_SOCK implementation. I didn't face these issues with MQTT legacy. Also, legacy has been removed, so we are in a race against time.

karthikprabhu17 added the bug label Feb 6, 2019

@andrewboie
Contributor

How confident are you that you are passing valid data to the k_fifo APIs?

A fifo is just a k_queue, which is just a set of memory blocks whose first 32 bits are reserved for linked-list pointers; these pointers create the ordered queue.

It looks to me like this is a memory corruption issue where the pointers are being modified incorrectly somehow. Depending on how they are corrupted, it could truncate the queue (if it gets zeroed) or cause invalid memory to be accessed as the next queue item when the API is called.
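
The queue layout described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `struct item`, `struct model_fifo`, and the `model_fifo_*` helpers are hypothetical names invented for illustration, and this is not the Zephyr `k_queue` implementation. It shows why the first word of every item belongs to the queue while the item is enqueued, and how zeroing that word truncates the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item type: the first word is reserved for the queue's link. */
struct item {
    void *reserved; /* next pointer, owned by the queue while enqueued */
    int payload;    /* application data starts after the reserved word */
};

/* Hypothetical host-side queue model (not the Zephyr implementation). */
struct model_fifo {
    struct item *head;
    struct item *tail;
};

/* Append an item; the queue overwrites the item's reserved first word. */
static void model_fifo_put(struct model_fifo *q, struct item *it)
{
    assert(q != NULL && it != NULL);
    it->reserved = NULL;
    if (q->tail != NULL) {
        q->tail->reserved = it;
    } else {
        q->head = it;
    }
    q->tail = it;
}

/* Remove and return the oldest item, or NULL if the queue is empty. */
static struct item *model_fifo_get(struct model_fifo *q)
{
    struct item *it = q->head;

    if (it != NULL) {
        q->head = it->reserved;
        if (q->head == NULL) {
            q->tail = NULL;
        }
    }
    return it;
}

/* Count the items reachable by following the reserved words from the head. */
static size_t model_fifo_len(const struct model_fifo *q)
{
    size_t n = 0;

    for (const struct item *it = q->head; it != NULL; it = it->reserved) {
        n++;
    }
    return n;
}
```

In this model, putting three items `a`, `b`, `c` and then setting `b.reserved = NULL` (standing in for a stray write) leaves only two items reachable. If the stray write stored an arbitrary value instead of zero, the next traversal would dereference that value as an item address, which is the kind of invalid access the comment describes.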

You might want to look at the queue items in a debugger and see where the 0x401779 address is coming from.

It's possible, but seems very unlikely, that the culprit is somewhere in kernel/. I would exhaustively investigate any garbage-in-garbage-out scenarios first.

@karthikprabhu17
Author

@andrewboie: @andyross helped me diagnose the issue. I was able to solve this by bumping the CONFIG_ISR_STACK_SIZE from 2048 to 4096.

The instruction was trying to write to an address that was not writable.

For the future, is gdb the only tool I need to diagnose such issues?

I will close this. We still need #12711.
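
Since the fix was a larger ISR stack, one generic way to estimate how much of a stack region is actually used is "stack painting": fill the region with a known pattern before use, then count how much of the pattern survives. The following is a minimal host-side sketch of that idea, assuming a descending stack. `PAINT_BYTE`, `stack_paint`, `stack_unused`, and `simulate_use` are hypothetical names for illustration and are not Zephyr APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xAAu /* assumption: arbitrary fill pattern */

/* Fill the whole stack region with the pattern before it is used. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, PAINT_BYTE, size);
}

/*
 * On a descending stack the lowest addresses are touched last, so the
 * number of intact pattern bytes counted from the low end is the unused
 * headroom.
 */
static size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == PAINT_BYTE) {
        n++;
    }
    return n;
}

/* Host-side stand-in for real stack usage: overwrite `depth` bytes from the top. */
static void simulate_use(uint8_t *stack, size_t size, size_t depth)
{
    assert(depth <= size);
    memset(stack + size - depth, 0x00, depth);
}
```

With a 2048-byte region and a simulated depth of 1500 bytes, `stack_unused` reports 548 bytes of headroom. A result of 0 means the region was used completely, so an overflow cannot be ruled out. The estimate can be too optimistic if the code happens to store bytes equal to the pattern.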

@ioannisg
Member

ioannisg commented Feb 6, 2019

@andrewboie it would have been nice to be able to detect ISR stack overflow on ARM. Is there a way to do it?
Stack overflow detection is available on ARMv8-M, though :)

@andrewboie
Contributor

On x86, we have a non-present guard page immediately preceding the interrupt stack.
It should be doable to have something similar on ARM, but without doing something clever it will eat up an MPU region.
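
As a rough software analogue of the guard idea above, a known word can be placed just below the stack and checked for damage. This is only a host-side sketch under stated assumptions: `GUARD_WORD`, `struct guarded_stack`, and the helper functions are hypothetical names, and it is not how Zephyr, the x86 guard page, or an MPU region works. Unlike a guard page or MPU region, which fault on the offending write, a guard word only reveals the overflow after the fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORD 0xDEADBEEFu  /* assumption: arbitrary sentinel value */
#define MODEL_STACK_WORDS 512u  /* 512 words model a 2048-byte stack */

/* mem[0] is the guard word; mem[1..MODEL_STACK_WORDS] model the stack. */
struct guarded_stack {
    uint32_t mem[MODEL_STACK_WORDS + 1u];
};

/* Install the sentinel immediately below the modelled stack. */
static void guarded_stack_init(struct guarded_stack *s)
{
    s->mem[0] = GUARD_WORD;
}

/* Report whether the sentinel was overwritten. */
static bool guarded_stack_overflowed(const struct guarded_stack *s)
{
    return s->mem[0] != GUARD_WORD;
}

/*
 * Model a descending stack: pushes start at the highest index and move
 * toward mem[0]. One push beyond the stack size lands on the guard word.
 */
static void simulate_push(struct guarded_stack *s, size_t n_words)
{
    assert(n_words <= MODEL_STACK_WORDS + 1u);
    for (size_t i = 0; i < n_words; i++) {
        s->mem[MODEL_STACK_WORDS - i] = 0u;
    }
}
```

Pushing exactly `MODEL_STACK_WORDS` words leaves the guard intact, while pushing one more word overwrites it and makes `guarded_stack_overflowed` return true.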
