BUS FAULT when running nmap towards echo_async sample #54140

hjuul · 2023-01-26T15:40:51Z

Describe the bug
I am running Zephyr sample samples/net/sockets/echo_async on stm32f769i_disco.
I can telnet into the target and it echos back as expected.

But when I do a port scan with nmap on the IP address of the target, I get a bus fault (see logs below).

This happens both on Zephyr v3.2.0 and on the main branch.

To Reproduce
Steps to reproduce the behavior:

west build samples/net/sockets/echo_async --board=stm32f769i_disco -p
west -v flash -d build/ -r jlink
telnet 192.0.2.1 4242 -> it works
nmap 192.0.2.1 -> nmap fails (note: don't run it with sudo, see explanation below)
telnet 192.0.2.1 4242 -> FAILS

This fails without any modifications other than changing the IP address in prj.conf.
Add CONFIG_LOG=y and CONFIG_LOG_MODE_DEFERRED=y to see the fault details in the log.

Expected behavior
nmap should report the open port (4242).
Application should continue to work regardless of nmap activity.

Impact
Hard-faulting when subjected to a port scan is a critical failure.
This prevents me from upgrading my application from Zephyr v3.0.0 to v3.2.0
(this didn't happen with Zephyr v3.0.0).

Logs and console output
This is the output when I've set CONFIG_LOG=y and CONFIG_LOG_MODE_DEFERRED=y.

*** Booting Zephyr OS build zephyr-v3.2.0-3990-g06d53b1343ba ***
Connection #0 from 10.42.68.123 fd=2
Connection fd=2 closed
Connection #1 from 10.42.68.123 fd=2
[00:00:22.904,000] <err> os: ***** BUS FAULT *****
[00:00:22.904,000] <err> os:   Precise data bus error
[00:00:22.904,000] <err> os:   BFAR Address: 0x5dddfe44
[00:00:22.904,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x5dddfe40  r2/a3:  0x00000001
[00:00:22.904,000] <err> os: r3/a4:  0x3dce0000 r12/ip:  0x00000025 r14/lr:  0x080148bd
[00:00:22.904,000] <err> os:  xpsr:  0x210e2c00
[00:00:22.904,000] <err> os: Faulting instruction address (r15/pc): 0x08014604
[00:00:22.904,000] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:00:22.904,000] <err> os: Current thread: 0x20022138 (unknown)
[00:00:22.966,000] <err> os: Halting system

Looking up 0x20022138 in the map file, the thread turns out to be z_main_thread (which runs the socket code).

Environment

OS: Linux
Toolchain: zephyr-sdk-0.15.2
Commit SHA or Version used: v3.2.0 and latest commit on main branch (06d53b1)

Additional context

I first saw this in my own application running on custom hardware (stm32f777ni).
This started happening when I upgraded from Zephyr v3.0.0 to v3.2.0.

Note! If you run nmap as privileged user (sudo), the application doesn't fail. This is because the port scanning behaves differently in unprivileged mode. The nmap manual has a lot to say about this, for example:

On Unix boxes, only the privileged user root is generally able to send and receive raw TCP packets.
For unprivileged users, a workaround is automatically employed whereby the connect system call is
initiated against each target port. This has the effect of sending a SYN packet to the target host, in
an attempt to establish a connection.

This should provide some hints to why it fails. I've tried using different nmap scan types and I've also briefly looked at Wireshark logs, but I haven't been able to pinpoint exactly why this happens.


nordicjm · 2023-01-27T08:16:03Z

So isn't this just a memory buffer issue because essentially you're trying to open too many sockets on the board? Have you tried increasing buffer/stack sizes?

hjuul · 2023-01-27T08:27:54Z

@nordicjm, I didn't try this with the sample application, but in my application I have plenty of buffers and generous buffer sizes, and I also tried tripling the stack size.
With the sample application I tried to create more than five connections through telnet, and the application behaves as expected: when trying to establish the sixth connection, telnet stalls until one of the first five connections is closed, and then it successfully establishes the new connection. So I don't think it is related to that.

rlubos · 2023-01-27T16:02:23Z

@hjuul @nordicjm I've identified the problem and opened #54171 to fix the issue.

In unprivileged mode, nmap sends SYN packets to consecutive ports. If it succeeds in opening a TCP connection, it immediately sends an RST packet to abort it. This did not play well with the over-optimised net_context structure.
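To reproduce this traffic pattern without nmap, the following C sketch approximates it with plain POSIX sockets: it connects to consecutive ports and, when a handshake completes, closes the socket with SO_LINGER set to a zero timeout, which makes the host's TCP stack abort the connection with an RST instead of a FIN. This is an assumption-based approximation of the behaviour described above, not nmap's own implementation; the function names probe_and_reset and scan_range are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port and, if the handshake completes,
 * abort the connection with RST.
 * Returns 1 if the port accepted the connection, 0 if it did not,
 * -1 on a local error (bad address, no socket). */
static int probe_and_reset(const char *ip, uint16_t port)
{
	struct sockaddr_in addr;
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };
	int fd, ret;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
		return -1;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		return -1;
	}

	/* SYN goes out here; success means SYN-ACK/ACK completed. */
	ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 ? 1 : 0;
	if (ret == 1) {
		/* Linger enabled with a zero timeout: close() sends RST. */
		(void)setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
	}
	close(fd);
	return ret;
}

/* Hypothetical helper: probe consecutive ports, like a connect scan. */
static int scan_range(const char *ip, uint16_t first, uint16_t last)
{
	int open_ports = 0;

	assert(first <= last);
	for (uint32_t p = first; p <= last; p++) {
		if (probe_and_reset(ip, (uint16_t)p) == 1) {
			open_ports++;
		}
	}
	return open_ports;
}
```

For example, scan_range("192.0.2.1", 4240, 4250) against the echo_async target would complete one handshake on port 4242 and immediately reset it, which should exercise the same accept-then-RST path without needing nmap.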

The net_context assumed that it could safely share memory between the FIFO reserved space and the user data. The nmap case proved that this was not always true. Because TCP uses the user data to notify upper layers of errors, and receiving an RST packet is treated as an error condition, we ended up in a situation where the user data pointer was overwritten while the net_context could still be waiting in the accept queue (which is simply a FIFO). This corrupted the FIFO reserved memory, leading to the crash.
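The failure mode can be illustrated with a small, self-contained C model. This is a simplified sketch, not Zephyr's actual net_context or k_fifo code. It assumes an intrusive FIFO that, like Zephyr's k_fifo, uses the first word of each queued item as its "next" link. In the shared layout, user_data aliases that link, so reporting an error through user_data while the item is still queued detaches the rest of the queue and leaves a bogus pointer as the new head. In the separate layout, which mirrors the idea of the fix in #54171, the two fields no longer overlap. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical intrusive FIFO: the first word of every item is the link. */
struct fifo {
	void *head;
	void *tail;
};

static void fifo_put(struct fifo *q, void *item)
{
	assert(item != NULL);
	*(void **)item = NULL; /* item's reserved first word = next link */
	if (q->tail != NULL) {
		*(void **)q->tail = item;
	} else {
		q->head = item;
	}
	q->tail = item;
}

static void *fifo_get(struct fifo *q)
{
	void *item = q->head;

	if (item != NULL) {
		q->head = *(void **)item; /* follow the reserved link */
		if (q->head == NULL) {
			q->tail = NULL;
		}
	}
	return item;
}

/* "Over-optimised" layout: the link and user_data share storage. */
struct ctx_shared {
	union {
		void *fifo_reserved;
		void *user_data;
	};
	int id;
};

/* Fixed layout: user_data has its own field. */
struct ctx_separate {
	void *fifo_reserved; /* must stay first: owned by the FIFO */
	void *user_data;
	int id;
};

/* Returns 1 if the second queued context is still reachable after the
 * first one reports an error while queued, 0 if the queue was corrupted. */
static int second_survives_shared(void)
{
	struct fifo q = { NULL, NULL };
	struct ctx_shared a = { .id = 1 };
	struct ctx_shared b = { .id = 2 };
	int err_status = -ECONNRESET;

	fifo_put(&q, &a);
	fifo_put(&q, &b);
	a.user_data = &err_status; /* RST while queued: overwrites the link */
	(void)fifo_get(&q);        /* dequeue 'a'; head now = &err_status */
	return q.head == (void *)&b;
}

static int second_survives_separate(void)
{
	struct fifo q = { NULL, NULL };
	struct ctx_separate a = { .id = 1 };
	struct ctx_separate b = { .id = 2 };
	int err_status = -ECONNRESET;

	fifo_put(&q, &a);
	fifo_put(&q, &b);
	a.user_data = &err_status; /* link is untouched */
	(void)fifo_get(&q);
	return q.head == (void *)&b;
}
```

In the shared case, a later fifo_get() would treat &err_status as a queued context and dereference it, which is the kind of wild memory access that shows up as the precise bus fault in the log above.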

hjuul added the bug The issue is a bug, or the PR is fixing a bug label Jan 26, 2023

nordicjm assigned rlubos Jan 27, 2023

nordicjm added area: Networking platform: STM32 ST Micro STM32 labels Jan 27, 2023

rlubos mentioned this issue Jan 27, 2023

net: context: Separate user data pointer from FIFO reserved space #54171

Merged

fabiobaltieri closed this as completed in #54171 Jan 30, 2023

This was referenced Jan 30, 2023

[Backport v3.2-branch] net: context: Separate user data pointer from FIFO reserved space #54215

Merged

[Backport v2.7-branch] net: context: Separate user data pointer from FIFO reserved space #54216

Merged
