Skip to content

mqtt_socket connect is hung on sam_e70_xplained #12945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
karthikprabhu17 opened this issue Jan 31, 2019 · 14 comments · Fixed by #13204
Closed

mqtt_socket connect is hung on sam_e70_xplained #12945

karthikprabhu17 opened this issue Jan 31, 2019 · 14 comments · Fixed by #13204
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug platform: Microchip SAM Microchip SAM Platform (formerly Atmel SAM) priority: medium Medium impact/importance bug

Comments

@karthikprabhu17
Copy link

Describe the bug
The mqtt socket connect is hung while trying to connect to a broker after this commit: 6ad7e13 on a sam_e70_xplained board. I did a git bisect and zeroed on the commit.

After this commit, the tcp_connect perpetually waits on a semaphore on sam_e70: thttps://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/ip/tcp.c#L2655

commit 6ad7e13
Author: Aurelien Jarno [email protected]
Date: Mon Jan 28 23:04:54 2019 +0100

ext: hal: atmel: sam: update to version 2.3.98

This is an import of Atmel SAM E70 HAL version 2.3.98, and only for the
revision A of the chip. The files have been passed through dos2unix to
minimize the differences with the previous version which seems to also
have been imported that way. All the patches that have been applied on
the previous version have been removed.

Signed-off-by: Aurelien Jarno <[email protected]>

What have you tried to diagnose or workaround this issue?

To Reproduce
Steps to reproduce the behavior:

  1. Go to samples/mqtt_publisher and make changes for broker_addr, username, password and client IP settings are right.
  2. mkdir build; cd build
  3. export BOARD=sam_e70_xplained
  4. cmake -DWEST_DIR=/Users/PATH/new_zephyr/west ../
  5. make flash

Expected behavior
Mqtt connect should just not be hung and connect if broker_addr, username, password and client IP settings are right

Impact
This is a showstopper on sam_e70 board

Screenshots or console output
If applicable, add a screenshot (drag-and-drop an image), or console logs
(cut-and-paste text and put a code fence (```) before and after, to help
explain the issue.

Environment (please complete the following information):

  • OS: MAC
  • Toolcha: ZEPHYR_TOOLCHAIN_VARIANT=gnuarmemb
  • Commit SHA or Version used: 6ad7e13

Additional context
Add any other context about the problem here.

@karthikprabhu17 karthikprabhu17 added the bug The issue is a bug, or the PR is fixing a bug label Jan 31, 2019
@nashif
Copy link
Member

nashif commented Jan 31, 2019

@aurel32 can you take a look?

@aurel32
Copy link
Collaborator

aurel32 commented Jan 31, 2019

@karthikprabhu17 Thanks for reporting the bug. I am using TCP/IP with this board, and for me this commit doesn't introduce new issues, although I haven't tried it with MQTT yet. I don't have an MQTT setup ready to try that, but i'll do that tomorrow.

That said, I wonder if it is just another instance of issue #12560, which causes TCP to randomly work or not work. It might be worth trying to apply #12561 to check if the issue you observe disappear. Note that this PR introduces issues, however it should be enough to diagnose the issue.

@karthikprabhu17
Copy link
Author

@aurel32 : Thanks for that pointer.. I will apply patch #12561 and let you know if it worked. Please do let me know about your mqtt test.

@nashif Thanks for following up.

@karthikprabhu17
Copy link
Author

@aurel32: Applying that patch worked for me. Thanks. I had to chat about another issue. Will keep it brief. I realized the irc channels are no longer used.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 4, 2019

@karthikprabhu17: If #12561 fixed for you, please add comment there.

IRC channel in welcome message should have a link to Slack channel: https://tinyurl.com/yarkuemx

@karthikprabhu17
Copy link
Author

@pfalcon : Thanks..Done!

@nashif nashif added the priority: medium Medium impact/importance bug label Feb 6, 2019
@nashif
Copy link
Member

nashif commented Feb 6, 2019

@karthikprabhu17 try this patch #12711

@karthikprabhu17
Copy link
Author

@nashif: Tried this on top of master right now... mqtt_connect works fine. fifo_put still gives

***** BUS FAULT *****
Imprecise data bus error
***** Hardware exception *****
Current thread ID = 0x204092b0
Faulting instruction address = 0x3a
Fatal fault in essential thread! Spinning...

@nashif
Copy link
Member

nashif commented Feb 6, 2019

@karthikprabhu17 I have no idea what fifo_put is, you really need to provide some more details, this is probably worth its own bug.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

Looking at the HAL changes, I have noticed this changed in gmac.h:

-/* -------- GMAC_ISRPQ : (GMAC Offset: 0x400) (R/ 32) Interrupt Status Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_ISRPQ : (GMAC Offset: 0x3fc) (R/ 32) Interrupt Status Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_TBQBAPQ : (GMAC Offset: 0x440) (R/W 32) Transmit Buffer Queue Base Address Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_TBQBAPQ : (GMAC Offset: 0x43c) (R/W 32) Transmit Buffer Queue Base Address Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_RBQBAPQ : (GMAC Offset: 0x480) (R/W 32) Receive Buffer Queue Base Address Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_RBQBAPQ : (GMAC Offset: 0x47c) (R/W 32) Receive Buffer Queue Base Address Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_RBSRPQ : (GMAC Offset: 0x4a0) (R/W 32) Receive Buffer Size Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_RBSRPQ : (GMAC Offset: 0x49c) (R/W 32) Receive Buffer Size Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_IERPQ : (GMAC Offset: 0x600) (/W 32) Interrupt Enable Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_IERPQ : (GMAC Offset: 0x5fc) (/W 32) Interrupt Enable Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_IDRPQ : (GMAC Offset: 0x620) (/W 32) Interrupt Disable Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_IDRPQ : (GMAC Offset: 0x61c) (/W 32) Interrupt Disable Register Priority Queue (index = 1) 0 -------- */

-/* -------- GMAC_IMRPQ : (GMAC Offset: 0x640) (R/W 32) Interrupt Mask Register Priority Queue  (index = 1) 0 -------- */
+/* -------- GMAC_IMRPQ : (GMAC Offset: 0x63c) (R/W 32) Interrupt Mask Register Priority Queue (index = 1) 0 -------- */

All the queue registers have been shifted by 4 bytes, I guess it's because the priority queues are numbered starting counting at 1, the queue 0 being the main queue. That way GMAC_ISRPQ[1] actually correspond to the queue 1 or the first priority queue. The previous version of the HAL probably numbered them starting counting at 0, that's why we have plenty of queue_idx - 1 in our code.

@nashif and @jukkar tested the patch from PR #12711 using this branch https://github.com/jukkar/zephyr/tree/try-sam-e70-rev-b which restore the previous numbering.

I therefore believe we should remove the - 1 from the gmac driver. I say believe because I am not able to reproduce the issue.

@aurel32 aurel32 added the platform: Microchip SAM Microchip SAM Platform (formerly Atmel SAM) label Feb 6, 2019
@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

Well except those are still defined as an array of 2 entries (e.g uint32_t GMAC_ISRPQ[2]) so GCC complains about "array subscript is above array bounds".

@karthikprabhu17
Copy link
Author

karthikprabhu17 commented Feb 6, 2019

@nashif So, Now 12711 on of top master commit da44946 helps resolve the issue. Thanks

@jukkar
Copy link
Member

jukkar commented Feb 7, 2019

tested the patch from PR #12711 using this branch https://github.com/jukkar/zephyr/tree/try-sam-e70-rev-b which restore the previous numbering.

Note that while resolving the merge conflict, I took the version that the PR #12711 was providing. Perhaps I should have used the one that was provided by the new HAL. My merge resolution was mainly mechanical when I was resolving the conflict in HAL. I can do more testing if you have a patch that needs to be run in rev A.

aurel32 added a commit to aurel32/zephyr that referenced this issue Feb 7, 2019
The latest update of the SAM E70 HAL (commit 6ad7e13) shifted the
GMAC priority queues related registers by 4 bytes. It is not cleared
yet, if it is change is a mistake or a change from 0 based indexing
to 1 based indexing. Indeed the main queue is the number 0, and the
first priority queue is therefore the number 1.

In the meantime revert that change, and use the old registers addresses
matching the datasheet.

Fixes zephyrproject-rtos#12945

Signed-off-by: Aurelien Jarno <[email protected]>
nashif pushed a commit that referenced this issue Feb 7, 2019
The latest update of the SAM E70 HAL (commit 6ad7e13) shifted the
GMAC priority queues related registers by 4 bytes. It is not cleared
yet, if it is change is a mistake or a change from 0 based indexing
to 1 based indexing. Indeed the main queue is the number 0, and the
first priority queue is therefore the number 1.

In the meantime revert that change, and use the old registers addresses
matching the datasheet.

Fixes #12945

Signed-off-by: Aurelien Jarno <[email protected]>
dgloeck added a commit to dgloeck/zephyr that referenced this issue Feb 9, 2019
Commit 0a7b45d ("ext: hal: atmel: sam: fix GMAC priority queues
related registers") tried to correct register offsets, but failed to do
so in the structure that is used by our driver to access the registers.

Fixes: zephyrproject-rtos#12945

Signed-off-by: Daniel Glöckner <[email protected]>
@dgloeck
Copy link
Contributor

dgloeck commented Feb 9, 2019

I believe we need the changes from #13204 to really fix this

nashif pushed a commit that referenced this issue Feb 15, 2019
Commit 0a7b45d ("ext: hal: atmel: sam: fix GMAC priority queues
related registers") tried to correct register offsets, but failed to do
so in the structure that is used by our driver to access the registers.

Fixes: #12945

Signed-off-by: Daniel Glöckner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug platform: Microchip SAM Microchip SAM Platform (formerly Atmel SAM) priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants