net/ethernet: Remove inserted L2 header buffer #12563

tbursztyka · 2019-01-18T07:29:10Z

The packet can be referenced somewhere else and letting the newly added
L2 header will generate corrupt packet. If the same packet is being
resent, ethernet will add again its L2 header. Thus the need to remove
such L2 header every time a packet has been sent, successfully or not.

Fixes #12560

Signed-off-by: Tomasz Bursztyka [email protected]

codecov-io · 2019-01-18T08:05:42Z

Codecov Report

Merging #12563 into master will increase coverage by 0.13%.
The diff coverage is 92.3%.

@@            Coverage Diff             @@
##           master   #12563      +/-   ##
==========================================
+ Coverage   48.63%   48.76%   +0.13%     
==========================================
  Files         313      313              
  Lines       46439    46452      +13     
  Branches    10710    10710              
==========================================
+ Hits        22585    22652      +67     
+ Misses      19398    19312      -86     
- Partials     4456     4488      +32

Impacted Files	Coverage Δ
subsys/net/ip/net_pkt.c	`65.33% <100%> (+0.07%)`	⬆️
subsys/net/l2/ethernet/ethernet.c	`53.07% <87.5%> (-0.85%)`	⬇️
subsys/net/ip/ipv6.c	`56.57% <0%> (-0.8%)`	⬇️
subsys/net/ip/net_core.c	`63.22% <0%> (-0.65%)`	⬇️
subsys/net/ip/net_if.c	`61.04% <0%> (+0.64%)`	⬆️
include/net/net_pkt.h	`85.65% <0%> (+0.89%)`	⬆️
subsys/net/ip/ipv6_nbr.c	`34.22% <0%> (+6.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d9bbdf...9884ffb. Read the comment docs.

jukkar · 2019-01-18T08:10:12Z

subsys/net/l2/ethernet/ethernet.c

 	if (ret != 0) {
 		eth_stats_update_errors_tx(iface);
 		goto error;
 	}
 #if defined(CONFIG_NET_STATISTICS_ETHERNET)
 	ethernet_update_tx_stats(iface, pkt);
 #endif
-	ret = net_pkt_get_len(pkt);
+	ret = net_pkt_get_len(pkt) + sizeof(struct net_eth_hdr);


The ret value will be incorrect if the packet has VLAN header.

jukkar · 2019-01-18T08:11:27Z

subsys/net/l2/ethernet/ethernet.c

@@ -560,14 +571,17 @@ static int ethernet_send(struct net_if *iface, struct net_pkt *pkt)
 	}

 	ret = api->send(net_if_get_device(iface), pkt);
+
+	ethernet_remove_l2_header(pkt);
+
 	if (ret != 0) {
 		eth_stats_update_errors_tx(iface);
 		goto error;
 	}
 #if defined(CONFIG_NET_STATISTICS_ETHERNET)
 	ethernet_update_tx_stats(iface, pkt);


The statistics will be incorrect now as the ethernet_update_tx_stats() will call net_pkt_get_len(pkt) to get the length. Better would be to remove the l2 header just before we return from the function.

pfalcon · 2019-01-18T11:35:40Z

Commit message:

The packet can be referenced somewhere else and letting the newly added
L2 header will generate corrupt packet.

The word "letting" seems to be mis-chosen here (or another word is lost), so it's hard to grasp the meaning.

pfalcon · 2019-01-18T11:38:41Z

subsys/net/l2/ethernet/ethernet.c

+{
+	struct net_buf *buf;
+
+	/* Let's remove the buffer added in ethernet_fill_header() */


Can the comment be more formal here? E.g. "First fragment is Ethernet header added in ethernet_fill_header()". (Btw, ethernet_fill_header() better be called ethernet_add_header(), ethernet_insert_header(), or ethernet_prepend_header())

I don't see why we would need to change that function name. moreover in that PR.

I don't see why we would need to change that function name. moreover in that PR.

Well, at least I'd suggest to remove "Let's" from the comment, or it gives a feeling that we can either remove it or not, but ok, decided to remove it after all ;-).

pfalcon · 2019-01-18T11:42:06Z

tests/net/context/src/main.c

@@ -1020,16 +1020,12 @@ static int tester_send(struct device *dev, struct net_pkt *pkt)
 		udp_hdr->dst_port = port;
 		net_udp_set_hdr(pkt, udp_hdr);

-		if (net_recv_data(net_pkt_iface(pkt), pkt) < 0) {
+		if (net_recv_data(net_pkt_iface(pkt),


I'm at once not sure why a few tests now require these clones. You fixed issue, everything should just work as it should, right? Perhaps, comments added would be helpful. Is it related to comment below about unref'ing pkt? (On TX path?)

Before it was referencing the pkt, pushing it back. It was working because the l2 header was still there. Now that we remove it, it's much easier to just clone the whole packet.

This is a hack only for the very few tests that make the packet looping back to the core from the fake ethernet device. It's not nice, but well...

aurel32 · 2019-01-18T12:40:05Z

subsys/net/l2/ethernet/ethernet.c

@@ -562,12 +573,14 @@ static int ethernet_send(struct net_if *iface, struct net_pkt *pkt)
 	ret = api->send(net_if_get_device(iface), pkt);
 	if (ret != 0) {
 		eth_stats_update_errors_tx(iface);
+		ethernet_remove_l2_header(pkt);


Let me ask the same question than in the other PR.

I wonder if it is correct to just pull the L2 header, while the ethernet driver still holds a reference of the packet which has just been queued in the DMA engine, and maybe not yet sent. This is the case for the SAM E70 GMAC driver for example. Judging by the comments (https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/ethernet/eth_mcux.c#L90-L104) the MCUX driver might also do that on the future.

I can't test now, but I'll test tonight.

would be really nice if you could try.

Then either the driver would need to ref that buffer again, or we could do something like #11873 : l2 would know that driver will keep an internal queue. (It would only unlink it from pkt->frags, but would not touch the reference)

As it is the very first fragment, it's "frags" attribute will point to the rest of the list, which one is going to be pkt->frags after this removal. So maybe such driver need to store that very first buf pointer in addition to the pkt.

What about driver doing ref for all those net_buf's that it would need to keep a bit longer? So after DMA transfer is finished, it would need to unref the net_buf's that were sent. This way all this housekeeping would be done in the driver and upper layers would not know about this.

If this issue is found in 1 specific driver then yes. If it's in 2+ drivers it's worth factorizing it.

we almost need a separate set of fragments for the headers that doesn't form a complete packet from end to end. The total packet is the aggregation of the headers plus data. This would keep the all of this isolated. The extra work would be having to iterate over a list of lists to get all of the data on transfer.

This would keep the all of this isolated.

Well, the whole idea of the refactor was to allow to reuse the same header buffer for all packets. That was a brave refactor by @tbursztyka, and well, sometimes things don't go the easy way it initially seemed. (No fingerpointing, we just should accept (I did) that if we don't adopt well-known solutions like lwIP, then we're effectively experimenting out, and experiments sometimes fail.)

Reusing the same ll header buffer would work only on eth device that don't queue/dma anything. But it is possible. (thus the PR #11873) Though there are not completely solved issue around gptp. Anyway, that's an optimization for small systems, so it's not the priority atm.

I pushed a new version of the PR. But I bet @aurel32 you'll find issues about queuing so that will need to be addressed next.

Reusing the same ll header buffer would work only on eth device that don't queue/dma anything. But it is possible. (thus the PR #11873)

I wonder if we should really design the network stack with the option of supporting DMA or not. Most (all?) SoCs with Ethernet have DMA support. I agree that not all drivers have to use DMA, and that it's a good idea to keep a few very simple drivers for now to test and validate the network stack. Having DMA support in all Ethernet drivers should still be a long term goal for Zephyr. So I really think the network stack should be able to support DMA by default.

I pushed a new version of the PR. But I bet @aurel32 you'll find issues about queuing so that will need to be addressed next.

I confirm that there are issues, actually just a few seconds after booting:

[00:00:00.000,069] <inf> i2c_sam_twihs: Device I2C_0 initialized [00:00:00.001,249] <inf> eth_sam: MAC: fc:c2:3d:12:e9:21 [00:00:00.001,385] <inf> eth_sam: Queue 0 activated [00:00:00.001,390] <inf> eth_sam_phy: Soft Reset of ETH PHY uart:~$ [00:00:00.140,016] <inf> eth_sam_phy: PHYID: 0x221561 at addr: 0 [00:00:00.140,016] <inf> eth_sam_phy: PHYID: 0x221561 at addr: 0 uart:~$ ***** Booting Zephyr OS zephyr-v1.13.0-3341-gc18fe74b53 ***** [00:00:02.820,019] <inf> eth_sam_phy: common abilities: speed 100 Mb, full duplex [00:00:02.820,221] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,240] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.820,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.820,449] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,566] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569) [00:00:02.820,583] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,700] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569) [00:00:02.820,719] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.821,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.821,453] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) 0:02.826,257] <inf> net_config: Initializing network 00:00:02.820,019] <inf> eth_sam_phy: common abilities: speed 100 Mb, full duplex uart:~$ [00:00:02.826,281] <inf> net_config: IPv4 address: 10.243.120.200 [00:00:02.820,221] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,240] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.820,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.820,449] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,566] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569) [00:00:02.820,583] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.820,700] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569) [00:00:02.820,719] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.821,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569) [00:00:02.821,453] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569) [00:00:02.826,257] <inf> net_config: Initializing network [00:00:02.826,281] <inf> net_config: IPv4 address: 10.243.120.200 uart:~$ Single-threaded dumb HTTP server waits for a connection on port 8080...

The driver, when queuing, should reference the pkt once more, but also ref the first net_buf, and keep it by its own means (out of pkt, since l2 is going to remove it from the list).

pfalcon · 2019-01-18T14:51:28Z

See #12560 (comment) for the intro into how I do testing.

So, with this patch, situation with dumb_http_server is much more lively (no errors reported with just master):

Connection #76 from 192.0.2.2
Connection from 192.0.2.2 closed
[00:00:30.802,750] <err> net_tcp: NULL TCP header!
--- 661 messages dropped ---
[00:00:30.802,757] <err> net_tcp: pkt 0x20003548 has no TCP header
[00:00:30.802,766] <err> net_pkt: NULL fragment data!
[00:00:30.802,775] <err> net_tcp: NULL TCP header!
[00:00:30.802,781] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,834] <err> net_pkt: NULL fragment data!
[00:00:30.810,841] <err> net_tcp: NULL TCP header!
[00:00:30.810,867] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,876] <err> net_pkt: NULL fragment data!
[00:00:30.810,884] <err> net_tcp: NULL TCP header!
[00:00:30.810,891] <err> net_tcp: pkt 0x20003548 has no TCP header
...

Errors repeat indefinitely.

jukkar · 2019-01-18T15:08:31Z

@tbursztyka This patch should fix the IPv6 test which is failing

0001-tests-net-ipv6-Fix-the-DAD-failure-in-the-test.patch.txt

tbursztyka · 2019-01-18T17:33:23Z

@pfalcon so basically tcp pkt is fully fucked up with this PR? What's the setup so I can try to reproduce it?

agross-oss · 2019-01-18T17:35:29Z

@tbursztyka I think his issues went away when he restricted the frdm ethernet driver to only 1 RX buffer.

pfalcon · 2019-01-18T19:22:52Z

so basically tcp pkt is fully fucked up with this PR?

Well, I didn't say that. I mostly trying to find the answer to question posed by @agross-linaro in #12560 - how on earth nobody noticed those issues before him? So, I'm trying different things, and well, while I see various issues, I have to conclude that those look like different issues than @agross-linaro's, i.e. my standard testing process doesn't hit cases as used by him :-I.

What's the setup so I can try to reproduce it?

My test plans are described in https://github.com/pfalcon/zephyr/wiki/NetworkingTestPlan

aurel32 · 2019-01-18T22:14:19Z

See #12560 (comment) for the intro into how I do testing.

So, with this patch, situation with dumb_http_server is much more lively (no errors reported with just master):

Connection #76 from 192.0.2.2
Connection from 192.0.2.2 closed
[00:00:30.802,750] <err> net_tcp: NULL TCP header!
--- 661 messages dropped ---
[00:00:30.802,757] <err> net_tcp: pkt 0x20003548 has no TCP header
[00:00:30.802,766] <err> net_pkt: NULL fragment data!
[00:00:30.802,775] <err> net_tcp: NULL TCP header!
[00:00:30.802,781] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,834] <err> net_pkt: NULL fragment data!
[00:00:30.810,841] <err> net_tcp: NULL TCP header!
[00:00:30.810,867] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,876] <err> net_pkt: NULL fragment data!
[00:00:30.810,884] <err> net_tcp: NULL TCP header!
[00:00:30.810,891] <err> net_tcp: pkt 0x20003548 has no TCP header
...

Errors repeat indefinitely.

When applying the patch, I also get the same issue on an RT1020 board, even after setting CONFIG_ETH_MCUX_RX_BUFFERS=1 and CONFIG_ETH_MCUX_TX_BUFFERS=1.

net_pkt_clone() was missing some attributes. Signed-off-by: Tomasz Bursztyka <[email protected]>

tbursztyka · 2019-02-06T09:12:13Z

@aurel32 There was an obvious mistake in how ethernet removed the inserted net_buf. Can you retry?

tbursztyka · 2019-02-06T09:35:32Z

@pfalcon can you retry?

pfalcon · 2019-02-06T09:46:59Z

@tbursztyka: Note that I wasn't able to reproduce any particular adverse effects in my usual testing. So I guess, it's up to @agross-linaro (and @aurel32 ?) to check whether it improves things for them. I'm happy to +1 it, but would wait for feedback from them. I'll also try to test this PR later anyway.

The packet can be referenced somewhere else and letting the newly added L2 header will generate corrupt packet. If the same packet is being resent, ethernet will add again its L2 header. Thus the need to remove such L2 header every time a packet has been sent, successfully or not. Fixes zephyrproject-rtos#12560 Signed-off-by: Tomasz Bursztyka <[email protected]>

The DAD was failing because we received the DAD network packet even when should not had received it. Fix it by discarding all the ICMPv6 NS packets that we receive. Signed-off-by: Jukka Rissanen <[email protected]>

aurel32 · 2019-02-06T13:50:47Z

Another fresh thought: the whole initial invariant was that driver's ->send() is blocking, i.e. it returns when the packet has actually gone on the wire, right? Well, let's enforce that (for now and foreseeable future). Then eth_sam_gmac should actually block on a semaphore to wait when DMA actually has sent the packet out.

That won't work for PTP. And the GMAC driver is relatively easy to "fix" if we agree the packet can be modified, by referencing the fragments instead of the packet. I'll try to work on that tonight.

aurel32 · 2019-02-06T13:52:37Z

Hmm, if we change the GMAC driver to use the fragments instead of the packet, I have the impression that the only real user of the reference counter in net_pkt will be the PTP code. Otherwise the reference counter will only be 1.

tbursztyka · 2019-02-06T14:08:42Z

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

@aurel32 I was thinking of such signature int (*send)(struct device *dev, struct net_pkt *pkt, struct net_buf *l2_hdr);

But that will look very hackish on driver side: they'll need first to read from the l2_hdr, then only read from the net_pkt to fill in their local memory.

Let's give it a bit more thoughts.

aurel32 · 2019-02-06T14:11:49Z

@tbursztyka One solution for the PTP part is to clone the packet when PTP is used. This is also very hackish, though I don't know which solution is the more hackish.

tbursztyka · 2019-02-06T14:18:37Z

@aurel32 cloning is costly also.

Too bad we cannot guaranty running the whole TX thread before TS one.

pfalcon · 2019-02-06T14:40:40Z

@tbursztyka

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

Of course it can! Yes, that's not too efficient, but we need to make stuff work first, optimize later. There're too many moving parts in there, we need to fix something.

@aurel32:

I have the impression that the only real user of the reference counter in net_pkt will be the PTP code. Otherwise the reference counter will only be 1.

Here's another emerging usecase fo refcounts: #12564 (comment) . Note that it's on receive path. And as we can see, there're widely different requirements and issues to address on TX vs RX paths.

aurel32 · 2019-02-06T14:45:35Z

@tbursztyka

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

Of course it can! Yes, that's not too efficient, but we need to make stuff work first, optimize later. There're too many moving parts in there, we need to fix something.

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

tbursztyka · 2019-02-06T14:47:17Z

yes, let's find a way to properly fix this. We have almost 2months ahead of us, it should be enough.

pfalcon · 2019-02-06T14:49:45Z

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

Well, I already told @mnkp on another occasion that gmac driver is far to advanced for the current state of Zephyr and needs to scale down. Story repeats itself.

If you think you can handle that all together, ok.

aurel32 · 2019-02-06T14:54:15Z

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

Well, I already told @mnkp on another occasion that gmac driver is far to advanced for the current state of Zephyr and needs to scale down. Story repeats itself.

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

If you think you can handle that all together, ok.

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part. I don't know about the PTP part, but that's not specific to the GMAC driver.

Now using your arguments that Zephyr needs to scale down, maybe we can just remove PTP support from the Ethernet drivers?

The SAM GMAC Ethernet driver uses scatter gather DMA to transmit data. Each fragment of a network packet is mapped from a set of descriptors that is used by the controller to do the DMA transfer. This means that the packet is not necessary sent when the send() function returns. For that reason the driver calls net_pkt_ref() on the packet to prevent it from being freed. It is then unreferenced with net_pkt_unref() in the TX ISR when the packet has effectively been sent. However this doesn't work if the packet is modified in the meantime, like it will be done in PR zephyrproject-rtos#12563 to remove the Ethernet header contained in the first fragment. To avoid that, call net_pkt_frag_ref() on each fragment of the packet, and unreferenced them with net_pkt_frag_unref() in the TX ISR when the packet has effectively been sent. Signed-off-by: Aurelien Jarno <[email protected]>

pfalcon · 2019-02-07T12:25:44Z

@aurel32:

To not leave this without reply (because it's important to be on the same line, or at least understand approaches of other folks):

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

No, we just do things in stages. We currently have a broken stack which pretends to support DMA (but as it's broken, it's hard to tell how well it does). The idea is to fix up moving parts for the time being and fix the stack. We can revisit "get everything out of DMA" case after that (which may require bigger changes, e.g. add a dichotomy of memory overhead optimized drivers vs performance-optimized drivers). I doubt we can do it right in haste.

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part.

That's absolutely great that we have people that want to fix things right here, right now! But I'm fixing up things here for 3rd year, and it's still broken. Hint-hint - just patching things around may be not the best option ;-).

I don't know about the PTP part, but that's not specific to the GMAC driver.

Yes, that's my point - gmac driver is a sole white crow. Whereas gPTP is a well-know little PITA, behind a gate (disable it and stop caring about it).

Now using your arguments that Zephyr needs to scale down, maybe we can just remove PTP support from the Ethernet drivers?

Well, I very rarely argue for code removal. But code gets removed anyway (e.g. hasty net_app case). No, I don't hint there's a chance for gPTP to be removed. But @jukkar invited me to look to adding it and VLAN support to some drivers, I looked into it, and ... decided I won't do anything about it. Because otherwise it requires to start another crusade about the need to refactor them. For example, raw gPTP is sprinkled/duplicated from one driver to another. I have no idea how it ended up like that. It needs to be reworked for sure. Or... just left alone (if you don't need that gPTP, which I thankfully don't so far).

Just my 2 cents ;-)

aurel32 · 2019-02-07T15:27:33Z

@aurel32:

To not leave this without reply (because it's important to be on the same line, or at least understand approaches of other folks):

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

No, we just do things in stages. We currently have a broken stack which pretends to support DMA (but as it's broken, it's hard to tell how well it does).

Well it used to work with DMA, and the recent refactoring actually requires to do changes which makes it incompatible with DMA. It means that this stage went into the wrong direction wrt DMA.

The idea is to fix up moving parts for the time being and fix the stack. We can revisit "get everything out of DMA" case after that (which may require bigger changes, e.g. add a dichotomy of memory overhead optimized drivers vs performance-optimized drivers). I doubt we can do it right in haste.

Well the point of DMA is performance, but also avoid copying data, so I don't get your point about the dichotomy. If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part.

That's absolutely great that we have people that want to fix things right here, right now! But I'm fixing up things here for 3rd year, and it's still broken. Hint-hint - just patching things around may be not the best option ;-).

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent. It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver. But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not. If that's the way to go, I can provide a patch for the GMAC driver and open bugs against the other drivers.

OTOH the RX path of the driver should be almost free of requirements, given the RX workqueue basically serializes the reception of packets. Instead the requirements are on the other side of the RX stack when the data reach the application.

pfalcon · 2019-02-07T15:44:47Z

It means that this stage went into the wrong direction wrt DMA.

Exactly my point. It went its own way, and everything now needs to be reworked. And instead of concentrating on reworking the code, we're spread thin to keep DMA case from falling apart. Just forget about it for some time, fix the single experimental driver to work like all other drivers do.

If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

And that's the big issue with that driver. It need tons of buffers, and instead of making its own, it still those buffers from system (== all other drivers, and user applications). Until that is fixed, that driver will always be a PITA.

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent.

If you guys can't fix it well enough during next week, I'd say that's really good plan to follow.

It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver.

The main point - no need to remove any code from GMAC driver. Just need to introduce a blocking on mutex/sema at the end of its send() - i.e. return not when DMA request is set up and started, but when it's finished. The DMA itself will be used, and later the whole matter can be revisited.

But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not.

Exactly my point either. Instead of trying to franken-patch DMA support and dig rabbit wholes where refcounts on packets are separate from refcounts on the underlying buffers, let's better brush up and document the requirements for the baseline networking drivers. Once that is there, we can think how to to add "advanced drivers" model in addition to that.

OTOH the RX path of the driver should be almost free of requirements

Hopefully so. Again, worth thinking. As mentioned above, there's a usecase of putting the same packet in different queues. If we can't get to work with refcounts, it may be good time to question whether we need all those refcounts at all.

aurel32 · 2019-02-07T16:28:30Z

It means that this stage went into the wrong direction wrt DMA.

Exactly my point. It went its own way, and everything now needs to be reworked. And instead of concentrating on reworking the code, we're spread thin to keep DMA case from falling apart. Just forget about it for some time, fix the single experimental driver to work like all other drivers do.

Well "like all other drivers do" is not really something that exists, they all are different and use different assumptions. At least one other will be broken by that L2 header removal in the PTP case.

If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

And that's the big issue with that driver. It need tons of buffers, and instead of making its own, it still those buffers from system (== all other drivers, and user applications). Until that is fixed, that driver will always be a PITA.

I disagree here. First of all making its own buffer, means they have to be copied at some point, which means more memory. Secondly this driver requires more buffers than other drivers, because it enforce at compile time the amount of buffers needed to receive a full Ethernet frame. I am not sure how the other drivers handle the situation of not having enough RX buffers to receive a frame (well I am almost sure they all behave differently), but I guess we can just remove the compile time check and just drop the frame when it happens. Now I still believe the current situation is better than other drivers that just need more stack than the default TX stack size to be able to send a full Ethernet frame.

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent.

If you guys can't fix it well enough during next week, I'd say that's really good plan to follow.

Well I have already proposed a PR that fixes the non PTP case already. As already said the PTP case has to be fixed for both MCUX and GMAC, so we probably want a common solution here.

It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver.

The main point - no need to remove any code from GMAC driver. Just need to introduce a blocking on mutex/sema at the end of its send() - i.e. return not when DMA request is set up and started, but when it's finished. The DMA itself will be used, and later the whole matter can be revisited.

I kind of disagree there. You end-up with a lot of code that is useless but is still executed. For example there is no need to ref the packet and unref it in the ISR. In addition this will just bit-rot and will likely explode once the mutex is removed.

But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not.

Exactly my point either. Instead of trying to franken-patch DMA support and dig rabbit wholes where refcounts on packets are separate from refcounts on the underlying buffers, let's better brush up and document the requirements for the baseline networking drivers. Once that is there, we can think how to to add "advanced drivers" model in addition to that.

OTOH the RX path of the driver should be almost free of requirements

Hopefully so. Again, worth thinking. As mentioned above, there's a usecase of putting the same packet in different queues. If we can't get to work with refcounts, it may be good time to question whether we need all those refcounts at all.

I get your point of using refcounts to put the same packet in different queues, that said I am really not convinced this should be done at the Ethernet driver level. Therefore this is an issue of the IP stack.

agross-oss

Tested on frdm_k64f w/ google iot application

The SAM GMAC Ethernet driver uses scatter gather DMA to transmit data. Each fragment of a network packet is mapped from a set of descriptors that is used by the controller to do the DMA transfer. This means that the packet is not necessary sent when the send() function returns. For that reason the driver calls net_pkt_ref() on the packet to prevent it from being freed. It is then unreferenced with net_pkt_unref() in the TX ISR when the packet has effectively been sent. However this doesn't work if the packet is modified in the meantime, like it will be done in PR #12563 to remove the Ethernet header contained in the first fragment. To avoid that, call net_pkt_frag_ref() on each fragment of the packet, and unreferenced them with net_pkt_frag_unref() in the TX ISR when the packet has effectively been sent. Signed-off-by: Aurelien Jarno <[email protected]>

Scale down the TX path of the GMAC driver by waiting for a packet to be fully sent before returning from the send function. This has a small performance impact, but has a few advantages: - It allows the Ethernet code to modify the packet afterward, fixing PTP support on this board (see PR zephyrproject-rtos#12563). - It returns an error to the IP stack in case of a transmit failure. - It doesn't require net_buf to be thread safe. This change can be reverted by changing GMAC_MULTIPLE_TX_PACKETS from 0 to 1. Signed-off-by: Aurelien Jarno <[email protected]>

Scale down the TX path of the GMAC driver by waiting for a packet to be fully sent before returning from the send function. This has a small performance impact, but has a few advantages: - It allows the Ethernet code to modify the packet afterward, fixing PTP support on this board (see PR #12563). - It returns an error to the IP stack in case of a transmit failure. - It doesn't require net_buf to be thread safe. This change can be reverted by changing GMAC_MULTIPLE_TX_PACKETS from 0 to 1. Signed-off-by: Aurelien Jarno <[email protected]>

Wait in the send callback for the packet to be actually sent. After this change, only one TX packet will be handled at once. This is needed because of the way the TX packets are currently handled in L2 after this PR: zephyrproject-rtos#12563 This is similar to what zephyrproject-rtos#13167 did for the SAM GMAC on SAM E-70. Without this, packet time-stamping does not work with the current stack. This commit is minimalistic on purpose to make it easily revertible when the network stack is able to properly handle DMA drivers for TX packets again. Signed-off-by: Tomasz Gorochowik <[email protected]>

Wait in the send callback for the packet to be actually sent. After this change, only one TX packet will be handled at once. This is needed because of the way the TX packets are currently handled in L2 after this PR: #12563 This is similar to what #13167 did for the SAM GMAC on SAM E-70. Without this, packet time-stamping does not work with the current stack. This commit is minimalistic on purpose to make it easily revertible when the network stack is able to properly handle DMA drivers for TX packets again. Signed-off-by: Tomasz Gorochowik <[email protected]>

tbursztyka requested review from jukkar and agross-oss January 18, 2019 07:29

jukkar requested changes Jan 18, 2019

View reviewed changes

jukkar added the area: Networking label Jan 18, 2019

tbursztyka force-pushed the eth_hdr_rem branch from b3dca21 to 0047973 Compare January 18, 2019 11:01

tbursztyka requested a review from pfalcon as a code owner January 18, 2019 11:01

pfalcon reviewed Jan 18, 2019

View reviewed changes

aurel32 reviewed Jan 18, 2019

View reviewed changes

agross-oss mentioned this pull request Jan 24, 2019

Using TCP w/ wired NIC results in mismanagement of buffers due to ACK accounting error #12560

Closed

tbursztyka mentioned this pull request Jan 24, 2019

net: pkt: fix race condition in packet reference counting #12709

Merged

jukkar added this to the v1.14.0 milestone Jan 30, 2019

tbursztyka force-pushed the eth_hdr_rem branch from ec9e267 to 2ff3a22 Compare February 5, 2019 13:11

net/pkt: Make sure all attributes are copied while cloning

7af73b4

net_pkt_clone() was missing some attributes. Signed-off-by: Tomasz Bursztyka <[email protected]>

tbursztyka force-pushed the eth_hdr_rem branch from 2ff3a22 to 8649f68 Compare February 6, 2019 09:11

jukkar mentioned this pull request Feb 6, 2019

net: ethernet: Fix accounting of Ethernet packets #12561

Closed

Tomasz Bursztyka and others added 2 commits February 6, 2019 12:17

tests: net: ipv6: Fix the DAD failure in the test

9884ffb

The DAD was failing because we received the DAD network packet even when should not had received it. Fix it by discarding all the ICMPv6 NS packets that we receive. Signed-off-by: Jukka Rissanen <[email protected]>

tbursztyka force-pushed the eth_hdr_rem branch from 8649f68 to 9884ffb Compare February 6, 2019 11:18

This was referenced Feb 6, 2019

drivers: eth: gmac: keep a reference to the packet fragments #13112

Merged

net subsystem needs documentation on how net_pkt and net_buf can be used #13114

Closed

agross-oss approved these changes Feb 7, 2019

View reviewed changes

jukkar approved these changes Feb 8, 2019

View reviewed changes

aurel32 mentioned this pull request Feb 8, 2019

driver: eth: gmac: scale down the TX path #13167

Merged

jukkar merged commit 8de351c into zephyrproject-rtos:master Feb 8, 2019

tgorochowik mentioned this pull request Feb 20, 2019

eth: eth_mcux: wait in send for the packet to be sent [REVERT ME] #13556

Merged

tbursztyka deleted the eth_hdr_rem branch March 12, 2019 14:42

net/ethernet: Remove inserted L2 header buffer #12563

net/ethernet: Remove inserted L2 header buffer #12563

Uh oh!

Conversation

tbursztyka commented Jan 18, 2019

Uh oh!

codecov-io commented Jan 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Jan 18, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Jan 18, 2019

Uh oh!

jukkar commented Jan 18, 2019

Uh oh!

tbursztyka commented Jan 18, 2019

Uh oh!

agross-oss commented Jan 18, 2019

Uh oh!

pfalcon commented Jan 18, 2019

Uh oh!

aurel32 commented Jan 18, 2019

Uh oh!

tbursztyka commented Feb 6, 2019

Uh oh!

tbursztyka commented Feb 6, 2019

Uh oh!

pfalcon commented Feb 6, 2019

Uh oh!

aurel32 commented Feb 6, 2019

Uh oh!

aurel32 commented Feb 6, 2019

Uh oh!

tbursztyka commented Feb 6, 2019

Uh oh!

aurel32 commented Feb 6, 2019

Uh oh!

tbursztyka commented Feb 6, 2019

Uh oh!

pfalcon commented Feb 6, 2019

Uh oh!

aurel32 commented Feb 6, 2019

Uh oh!

tbursztyka commented Feb 6, 2019

Uh oh!

pfalcon commented Feb 6, 2019

Uh oh!

codecov-io commented Jan 18, 2019 •

edited

Loading