BUS FAULT when running nmap towards echo_async sample #54140

Closed
hjuul opened this issue Jan 26, 2023 · 3 comments · Fixed by #54171
Labels: area: Networking, bug, platform: STM32

Comments

@hjuul
Contributor

hjuul commented Jan 26, 2023

Describe the bug
I am running Zephyr sample samples/net/sockets/echo_async on stm32f769i_disco.
I can telnet into the target and it echoes back as expected.

But when I do a port scan with nmap on the IP address of the target, I get a bus fault (see logs below).

This happens both on Zephyr v3.2.0 and on main branch.

To Reproduce
Steps to reproduce the behavior:

  1. west build samples/net/sockets/echo_async --board=stm32f769i_disco -p
  2. west -v flash -d build/ -r jlink
  3. telnet 192.0.2.1 4242 -> it works
  4. nmap 192.0.2.1 -> nmap fails. Note: do not run with sudo (see explanation below)
  5. telnet 192.0.2.1 4242 -> FAILS

This fails without any modifications other than changing the IP address in prj.conf.
Add CONFIG_LOG=y and CONFIG_LOG_MODE_DEFERRED=y to see the fault in the log.

Expected behavior
nmap should report the open port (4242).
Application should continue to work regardless of nmap activity.

Impact
Hard-faulting when subjected to a port scan is a critical failure.
This prevents me from upgrading my application from Zephyr v3.0.0 to v3.2.0
(this didn't happen with Zephyr v3.0.0).

Logs and console output
This is the output when I've set CONFIG_LOG=y and CONFIG_LOG_MODE_DEFERRED=y.

*** Booting Zephyr OS build zephyr-v3.2.0-3990-g06d53b1343ba ***
Connection #0 from 10.42.68.123 fd=2
Connection fd=2 closed
Connection #1 from 10.42.68.123 fd=2
[00:00:22.904,000] <err> os: ***** BUS FAULT *****
[00:00:22.904,000] <err> os:   Precise data bus error
[00:00:22.904,000] <err> os:   BFAR Address: 0x5dddfe44
[00:00:22.904,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x5dddfe40  r2/a3:  0x00000001
[00:00:22.904,000] <err> os: r3/a4:  0x3dce0000 r12/ip:  0x00000025 r14/lr:  0x080148bd
[00:00:22.904,000] <err> os:  xpsr:  0x210e2c00
[00:00:22.904,000] <err> os: Faulting instruction address (r15/pc): 0x08014604
[00:00:22.904,000] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:00:22.904,000] <err> os: Current thread: 0x20022138 (unknown)
[00:00:22.966,000] <err> os: Halting system

Looking up in the map file, thread 0x20022138 turns out to be z_main_thread (which runs the socket code).

Environment

  • OS: Linux
  • Toolchain: zephyr-sdk-0.15.2
  • Commit SHA or Version used: v3.2.0 and latest commit on main branch (06d53b1)

Additional context

  • I first saw this in my own application running on custom hardware (stm32f777ni)
  • This started happening when I upgraded from Zephyr v3.0.0 to v3.2.0

Note: if you run nmap as a privileged user (sudo), the application doesn't fail. This is because port scanning behaves differently in unprivileged mode. The nmap manual has a lot to say about this, for example:

On Unix boxes, only the privileged user root is generally able to send and receive raw TCP packets.
For unprivileged users, a workaround is automatically employed whereby the connect system call is
initiated against each target port. This has the effect of sending a SYN packet to the target host, in
an attempt to establish a connection.

This should provide some hints as to why it fails. I've tried different nmap scan types and briefly looked at Wireshark logs, but I haven't been able to pinpoint exactly why this happens.

hjuul added the bug label Jan 26, 2023
@nordicjm
Collaborator

So isn't this just a memory buffer issue because essentially you're trying to open too many sockets on the board? Have you tried increasing buffer/stack sizes?

@hjuul
Contributor Author

hjuul commented Jan 27, 2023

@nordicjm, I didn't try this on the sample application, but on my application I have plenty of buffers and generous buffer sizes, and I also tried tripling the stack size.
With the sample application I tried to create more than five connections through telnet, and the application behaves as expected: when trying to establish the sixth connection, telnet stalls until one of the first five connections is closed, then it successfully establishes the new connection. So I don't think it is related to that.

@rlubos
Collaborator

rlubos commented Jan 27, 2023

@hjuul @nordicjm I've identified the problem and opened #54171 to fix the issue.

What nmap does in unprivileged mode is send SYN packets to consecutive ports. If it succeeds in opening a TCP connection, it immediately sends a RST packet to abort it. This did not play well with the over-optimised net_context structure.
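
For anyone who wants to reproduce this traffic pattern without nmap, here is a rough sketch of a single probe using plain POSIX sockets. It is only an approximation of the behaviour described above, not nmap's code. The function name probe_port is made up for illustration, and the RST on close relies on the usual SO_LINGER behaviour (linger enabled with a zero timeout), which I would expect on Linux hosts. The connect() call is blocking, so a filtered port can stall for a while.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Probe one TCP port roughly the way an unprivileged scanner would.
 * Hypothetical helper: returns 1 if the connection was accepted,
 * 0 if it was not, and -1 on a local error.
 */
static int probe_port(const char *ipv4, uint16_t port)
{
	struct sockaddr_in addr;
	struct linger abort_on_close = { .l_onoff = 1, .l_linger = 0 };
	int fd, is_open;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
		return -1;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		return -1;
	}

	/* connect() makes the kernel send the SYN and complete the handshake. */
	is_open = (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0);

	/* Linger on with a zero timeout: close() aborts the connection with RST. */
	(void)setsockopt(fd, SOL_SOCKET, SO_LINGER,
			 &abort_on_close, sizeof(abort_on_close));
	close(fd);

	return is_open;
}
```

Calling probe_port("192.0.2.1", p) in a loop over a range of ports, including 4242, should give the target the same connect-then-abort sequence on the listening port.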

The net_context assumed that it could safely share memory between the FIFO reserved space and the user data. The nmap case proved that this is not always safe. TCP uses the user data to report errors to upper layers, and a received RST packet is treated as an error condition. We therefore ended up in a situation where the user data pointer was overwritten while the net_context could still be waiting on the accept queue (which is simply a FIFO). This corrupted the FIFO reserved memory, leading to a crash.
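
To make the failure mode concrete, here is a minimal host-side sketch of the hazard. All names (struct ctx, fifo_put, report_error, demo_link_clobbered) are hypothetical, and the union only models the sharing described above; it is not the real net_context layout or the Zephyr FIFO implementation.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a context object. The first word doubles as
 * the FIFO link and as the user data pointer (assumption for illustration).
 */
struct ctx {
	union {
		struct ctx *fifo_next; /* used by the queue while the ctx is queued */
		void *user_data;       /* used by the upper layer otherwise */
	};
	int id;
};

/* Minimal intrusive FIFO: the link lives inside the queued object itself. */
struct fifo {
	struct ctx *head;
	struct ctx *tail;
};

static void fifo_put(struct fifo *q, struct ctx *c)
{
	c->fifo_next = NULL;
	if (q->tail != NULL) {
		q->tail->fifo_next = c;
	} else {
		q->head = c;
	}
	q->tail = c;
}

/* Stand-in for the TCP layer recording an error, for example on RST. */
static void report_error(struct ctx *c, void *status)
{
	c->user_data = status; /* also overwrites fifo_next: same storage */
}

/* Returns 1 when the link of the head element was clobbered, else 0. */
static int demo_link_clobbered(void)
{
	static int status = -1; /* placeholder for an error code */
	struct fifo accept_q = { NULL, NULL };
	struct ctx a = { .id = 1 };
	struct ctx b = { .id = 2 };

	fifo_put(&accept_q, &a);
	fifo_put(&accept_q, &b);   /* a.fifo_next now points at b */

	report_error(&a, &status); /* RST arrives while a is still queued */

	return accept_q.head->fifo_next != &b;
}
```

In this model, any later walk of the queue would follow a pointer that no longer refers to a queued object. Giving each role its own field removes the overlap; the thread does not spell out whether #54171 takes exactly that approach.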
