Skip to content

net/ethernet: Remove inserted L2 header buffer #12563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 8, 2019

Conversation

tbursztyka
Copy link
Collaborator

The packet can be referenced somewhere else and letting the newly added
L2 header will generate corrupt packet. If the same packet is being
resent, ethernet will add again its L2 header. Thus the need to remove
such L2 header every time a packet has been sent, successfully or not.

Fixes #12560

Signed-off-by: Tomasz Bursztyka [email protected]

@codecov-io
Copy link

codecov-io commented Jan 18, 2019

Codecov Report

Merging #12563 into master will increase coverage by 0.13%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12563      +/-   ##
==========================================
+ Coverage   48.63%   48.76%   +0.13%     
==========================================
  Files         313      313              
  Lines       46439    46452      +13     
  Branches    10710    10710              
==========================================
+ Hits        22585    22652      +67     
+ Misses      19398    19312      -86     
- Partials     4456     4488      +32
Impacted Files Coverage Δ
subsys/net/ip/net_pkt.c 65.33% <100%> (+0.07%) ⬆️
subsys/net/l2/ethernet/ethernet.c 53.07% <87.5%> (-0.85%) ⬇️
subsys/net/ip/ipv6.c 56.57% <0%> (-0.8%) ⬇️
subsys/net/ip/net_core.c 63.22% <0%> (-0.65%) ⬇️
subsys/net/ip/net_if.c 61.04% <0%> (+0.64%) ⬆️
include/net/net_pkt.h 85.65% <0%> (+0.89%) ⬆️
subsys/net/ip/ipv6_nbr.c 34.22% <0%> (+6.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d9bbdf...9884ffb. Read the comment docs.

if (ret != 0) {
eth_stats_update_errors_tx(iface);
goto error;
}
#if defined(CONFIG_NET_STATISTICS_ETHERNET)
ethernet_update_tx_stats(iface, pkt);
#endif
ret = net_pkt_get_len(pkt);
ret = net_pkt_get_len(pkt) + sizeof(struct net_eth_hdr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ret value will be incorrect if the packet has VLAN header.

@@ -560,14 +571,17 @@ static int ethernet_send(struct net_if *iface, struct net_pkt *pkt)
}

ret = api->send(net_if_get_device(iface), pkt);

ethernet_remove_l2_header(pkt);

if (ret != 0) {
eth_stats_update_errors_tx(iface);
goto error;
}
#if defined(CONFIG_NET_STATISTICS_ETHERNET)
ethernet_update_tx_stats(iface, pkt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statistics will be incorrect now as the ethernet_update_tx_stats() will call net_pkt_get_len(pkt) to get the length. Better would be to remove the l2 header just before we return from the function.

@pfalcon
Copy link
Collaborator

pfalcon commented Jan 18, 2019

Commit message:

The packet can be referenced somewhere else and letting the newly added
L2 header will generate corrupt packet.

The word "letting" seems to be mis-chosen here (or another word is lost), so it's hard to grasp the meaning.

{
struct net_buf *buf;

/* Let's remove the buffer added in ethernet_fill_header() */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the comment be more formal here? E.g. "First fragment is Ethernet header added in ethernet_fill_header()". (Btw, ethernet_fill_header() better be called ethernet_add_header(), ethernet_insert_header(), or ethernet_prepend_header())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we would need to change that function name. moreover in that PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we would need to change that function name. moreover in that PR.

Well, at least I'd suggest to remove "Let's" from the comment, or it gives a feeling that we can either remove it or not, but ok, decided to remove it after all ;-).

@@ -1020,16 +1020,12 @@ static int tester_send(struct device *dev, struct net_pkt *pkt)
udp_hdr->dst_port = port;
net_udp_set_hdr(pkt, udp_hdr);

if (net_recv_data(net_pkt_iface(pkt), pkt) < 0) {
if (net_recv_data(net_pkt_iface(pkt),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm at once not sure why a few tests now require these clones. You fixed issue, everything should just work as it should, right? Perhaps, comments added would be helpful. Is it related to comment below about unref'ing pkt? (On TX path?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before it was referencing the pkt, pushing it back. It was working because the l2 header was still there. Now that we remove it, it's much easier to just clone the whole packet.

This is a hack only for the very few tests that make the packet looping back to the core from the fake ethernet device. It's not nice, but well...

@@ -562,12 +573,14 @@ static int ethernet_send(struct net_if *iface, struct net_pkt *pkt)
ret = api->send(net_if_get_device(iface), pkt);
if (ret != 0) {
eth_stats_update_errors_tx(iface);
ethernet_remove_l2_header(pkt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me ask the same question than in the other PR.

I wonder if it is correct to just pull the L2 header, while the ethernet driver still holds a reference of the packet which has just been queued in the DMA engine, and maybe not yet sent. This is the case for the SAM E70 GMAC driver for example. Judging by the comments (https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/ethernet/eth_mcux.c#L90-L104) the MCUX driver might also do that on the future.

I can't test now, but I'll test tonight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be really nice if you could try.

Then either the driver would need to ref that buffer again, or we could do something like #11873 : l2 would know that driver will keep an internal queue. (It would only unlink it from pkt->frags, but would not touch the reference)

As it is the very first fragment, it's "frags" attribute will point to the rest of the list, which one is going to be pkt->frags after this removal. So maybe such driver need to store that very first buf pointer in addition to the pkt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about driver doing ref for all those net_buf's that it would need to keep a bit longer? So after DMA transfer is finished, it would need to unref the net_buf's that were sent. This way all this housekeeping would be done in the driver and upper layers would not know about this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this issue is found in 1 specific driver then yes. If it's in 2+ drivers it's worth factorizing it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we almost need a separate set of fragments for the headers that doesn't form a complete packet from end to end. The total packet is the aggregation of the headers plus data. This would keep the all of this isolated. The extra work would be having to iterate over a list of lists to get all of the data on transfer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would keep the all of this isolated.

Well, the whole idea of the refactor was to allow to reuse the same header buffer for all packets. That was a brave refactor by @tbursztyka, and well, sometimes things don't go the easy way it initially seemed. (No fingerpointing, we just should accept (I did) that if we don't adopt well-known solutions like lwIP, then we're effectively experimenting out, and experiments sometimes fail.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the same ll header buffer would work only on eth device that don't queue/dma anything. But it is possible. (thus the PR #11873) Though there are not completely solved issue around gptp. Anyway, that's an optimization for small systems, so it's not the priority atm.

I pushed a new version of the PR. But I bet @aurel32 you'll find issues about queuing so that will need to be addressed next.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the same ll header buffer would work only on eth device that don't queue/dma anything. But it is possible. (thus the PR #11873)

I wonder if we should really design the network stack with the option of supporting DMA or not. Most (all?) SoCs with Ethernet have DMA support. I agree that not all drivers have to use DMA, and that it's a good idea to keep a few very simple drivers for now to test and validate the network stack. Having DMA support in all Ethernet drivers should still be a long term goal for Zephyr. So I really think the network stack should be able to support DMA by default.

I pushed a new version of the PR. But I bet @aurel32 you'll find issues about queuing so that will need to be addressed next.

I confirm that there are issues, actually just a few seconds after booting:

[00:00:00.000,069] <inf> i2c_sam_twihs: Device I2C_0 initialized
[00:00:00.001,249] <inf> eth_sam: MAC: fc:c2:3d:12:e9:21
[00:00:00.001,385] <inf> eth_sam: Queue 0 activated
[00:00:00.001,390] <inf> eth_sam_phy: Soft Reset of ETH PHY


uart:~$ [00:00:00.140,016] <inf> eth_sam_phy: PHYID: 0x221561 at addr: 0
[00:00:00.140,016] <inf> eth_sam_phy: PHYID: 0x221561 at addr: 0
uart:~$ ***** Booting Zephyr OS zephyr-v1.13.0-3341-gc18fe74b53 *****
[00:00:02.820,019] <inf> eth_sam_phy: common abilities: speed 100 Mb, full duplex
[00:00:02.820,221] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,240] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.820,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.820,449] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,566] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569)
[00:00:02.820,583] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,700] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569)
[00:00:02.820,719] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.821,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.821,453] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
0:02.826,257] <inf> net_config: Initializing network
00:00:02.820,019] <inf> eth_sam_phy: common abilities: speed 100 Mb, full duplex
uart:~$ [00:00:02.826,281] <inf> net_config: IPv4 address: 10.243.120.200
[00:00:02.820,221] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,240] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.820,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.820,449] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,566] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569)
[00:00:02.820,583] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.820,700] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already by tx_completed():569 (tx_completed():569)
[00:00:02.820,719] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.821,431] <err> net_pkt: *** ERROR *** frag 0x2040ca10 is freed already (tx_completed():569)
[00:00:02.821,453] <err> net_pkt: *** ERROR *** frag 0x2040ca28 is freed already (tx_completed():569)
[00:00:02.826,257] <inf> net_config: Initializing network
[00:00:02.826,281] <inf> net_config: IPv4 address: 10.243.120.200
uart:~$ Single-threaded dumb HTTP server waits for a connection on port 8080...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver, when queuing, should reference the pkt once more, but also ref the first net_buf, and keep it by its own means (out of pkt, since l2 is going to remove it from the list).

@pfalcon
Copy link
Collaborator

pfalcon commented Jan 18, 2019

See #12560 (comment) for the intro into how I do testing.

So, with this patch, situation with dumb_http_server is much more lively (no errors reported with just master):

Connection #76 from 192.0.2.2
Connection from 192.0.2.2 closed
[00:00:30.802,750] <err> net_tcp: NULL TCP header!
--- 661 messages dropped ---
[00:00:30.802,757] <err> net_tcp: pkt 0x20003548 has no TCP header
[00:00:30.802,766] <err> net_pkt: NULL fragment data!
[00:00:30.802,775] <err> net_tcp: NULL TCP header!
[00:00:30.802,781] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,834] <err> net_pkt: NULL fragment data!
[00:00:30.810,841] <err> net_tcp: NULL TCP header!
[00:00:30.810,867] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,876] <err> net_pkt: NULL fragment data!
[00:00:30.810,884] <err> net_tcp: NULL TCP header!
[00:00:30.810,891] <err> net_tcp: pkt 0x20003548 has no TCP header
...

Errors repeat indefinitely.

@jukkar
Copy link
Member

jukkar commented Jan 18, 2019

@tbursztyka This patch should fix the IPv6 test which is failing

0001-tests-net-ipv6-Fix-the-DAD-failure-in-the-test.patch.txt

@tbursztyka
Copy link
Collaborator Author

@pfalcon so basically tcp pkt is fully fucked up with this PR? What's the setup so I can try to reproduce it?

@agross-oss
Copy link
Collaborator

@tbursztyka I think his issues went away when he restricted the frdm ethernet driver to only 1 RX buffer.

@pfalcon
Copy link
Collaborator

pfalcon commented Jan 18, 2019

so basically tcp pkt is fully fucked up with this PR?

Well, I didn't say that. I mostly trying to find the answer to question posed by @agross-linaro in #12560 - how on earth nobody noticed those issues before him? So, I'm trying different things, and well, while I see various issues, I have to conclude that those look like different issues than @agross-linaro's, i.e. my standard testing process doesn't hit cases as used by him :-I.

What's the setup so I can try to reproduce it?

My test plans are described in https://github.com/pfalcon/zephyr/wiki/NetworkingTestPlan

@aurel32
Copy link
Collaborator

aurel32 commented Jan 18, 2019

See #12560 (comment) for the intro into how I do testing.

So, with this patch, situation with dumb_http_server is much more lively (no errors reported with just master):

Connection #76 from 192.0.2.2
Connection from 192.0.2.2 closed
[00:00:30.802,750] <err> net_tcp: NULL TCP header!
--- 661 messages dropped ---
[00:00:30.802,757] <err> net_tcp: pkt 0x20003548 has no TCP header
[00:00:30.802,766] <err> net_pkt: NULL fragment data!
[00:00:30.802,775] <err> net_tcp: NULL TCP header!
[00:00:30.802,781] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,834] <err> net_pkt: NULL fragment data!
[00:00:30.810,841] <err> net_tcp: NULL TCP header!
[00:00:30.810,867] <err> net_tcp: pkt 0x20003598 has no TCP header
[00:00:30.810,876] <err> net_pkt: NULL fragment data!
[00:00:30.810,884] <err> net_tcp: NULL TCP header!
[00:00:30.810,891] <err> net_tcp: pkt 0x20003548 has no TCP header
...

Errors repeat indefinitely.

When applying the patch, I also get the same issue on an RT1020 board, even after setting CONFIG_ETH_MCUX_RX_BUFFERS=1 and CONFIG_ETH_MCUX_TX_BUFFERS=1.

net_pkt_clone() was missing some attributes.

Signed-off-by: Tomasz Bursztyka <[email protected]>
@tbursztyka
Copy link
Collaborator Author

@aurel32 There was an obvious mistake in how ethernet removed the inserted net_buf. Can you retry?

@tbursztyka
Copy link
Collaborator Author

@pfalcon can you retry?

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 6, 2019

@tbursztyka: Note that I wasn't able to reproduce any particular adverse effects in my usual testing. So I guess, it's up to @agross-linaro (and @aurel32 ?) to check whether it improves things for them. I'm happy to +1 it, but would wait for feedback from them. I'll also try to test this PR later anyway.

Tomasz Bursztyka and others added 2 commits February 6, 2019 12:17
The packet can be referenced somewhere else and letting the newly added
L2 header will generate corrupt packet. If the same packet is being
resent, ethernet will add again its L2 header. Thus the need to remove
such L2 header every time a packet has been sent, successfully or not.

Fixes zephyrproject-rtos#12560

Signed-off-by: Tomasz Bursztyka <[email protected]>
The DAD was failing because we received the DAD network packet
even when should not had received it. Fix it by discarding all
the ICMPv6 NS packets that we receive.

Signed-off-by: Jukka Rissanen <[email protected]>
@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

Another fresh thought: the whole initial invariant was that driver's ->send() is blocking, i.e. it returns when the packet has actually gone on the wire, right? Well, let's enforce that (for now and foreseeable future). Then eth_sam_gmac should actually block on a semaphore to wait when DMA actually has sent the packet out.

That won't work for PTP. And the GMAC driver is relatively easy to "fix" if we agree the packet can be modified, by referencing the fragments instead of the packet. I'll try to work on that tonight.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

Hmm, if we change the GMAC driver to use the fragments instead of the packet, I have the impression that the only real user of the reference counter in net_pkt will be the PTP code. Otherwise the reference counter will only be 1.

@tbursztyka
Copy link
Collaborator Author

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

@aurel32 I was thinking of such signature int (*send)(struct device *dev, struct net_pkt *pkt, struct net_buf *l2_hdr);

But that will look very hackish on driver side: they'll need first to read from the l2_hdr, then only read from the net_pkt to fill in their local memory.

Let's give it a bit more thoughts.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

@tbursztyka One solution for the PTP part is to clone the packet when PTP is used. This is also very hackish, though I don't know which solution is the more hackish.

@tbursztyka
Copy link
Collaborator Author

@aurel32 cloning is costly also.

Too bad we cannot guaranty running the whole TX thread before TS one.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 6, 2019

@tbursztyka

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

Of course it can! Yes, that's not too efficient, but we need to make stuff work first, optimize later. There're too many moving parts in there, we need to fix something.

@aurel32:

I have the impression that the only real user of the reference counter in net_pkt will be the PTP code. Otherwise the reference counter will only be 1.

Here's another emerging usecase fo refcounts: #12564 (comment) . Note that it's on receive path. And as we can see, there're widely different requirements and issues to address on TX vs RX paths.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

@tbursztyka

@pfalcon synchronous send would be nice, but that cannot happen on driver that have hw tx queues.

Of course it can! Yes, that's not too efficient, but we need to make stuff work first, optimize later. There're too many moving parts in there, we need to fix something.

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

@tbursztyka
Copy link
Collaborator Author

yes, let's find a way to properly fix this. We have almost 2months ahead of us, it should be enough.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 6, 2019

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

Well, I already told @mnkp on another occasion that gmac driver is far to advanced for the current state of Zephyr and needs to scale down. Story repeats itself.

If you think you can handle that all together, ok.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 6, 2019

Well we should not break stuff first. If A was working, and B breaks it, I don't see why we should de-optimize A to fix that.

Well, I already told @mnkp on another occasion that gmac driver is far to advanced for the current state of Zephyr and needs to scale down. Story repeats itself.

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

If you think you can handle that all together, ok.

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part. I don't know about the PTP part, but that's not specific to the GMAC driver.

Now using your arguments that Zephyr needs to scale down, maybe we can just remove PTP support from the Ethernet drivers?

aurel32 added a commit to aurel32/zephyr that referenced this pull request Feb 7, 2019
The SAM GMAC Ethernet driver uses scatter gather DMA to transmit data.
Each fragment of a network packet is mapped from a set of descriptors
that is used by the controller to do the DMA transfer. This means that
the packet is not necessary sent when the send() function returns. For
that reason the driver calls net_pkt_ref() on the packet to prevent it
from being freed. It is then unreferenced with net_pkt_unref() in the
TX ISR when the packet has effectively been sent.

However this doesn't work if the packet is modified in the meantime,
like it will be done in PR zephyrproject-rtos#12563 to remove the Ethernet header
contained in the first fragment. To avoid that, call net_pkt_frag_ref()
on each fragment of the packet, and unreferenced them with
net_pkt_frag_unref() in the TX ISR when the packet has effectively been
sent.

Signed-off-by: Aurelien Jarno <[email protected]>
@pfalcon
Copy link
Collaborator

pfalcon commented Feb 7, 2019

@aurel32:

To not leave this without reply (because it's important to be on the same line, or at least understand approaches of other folks):

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

No, we just do things in stages. We currently have a broken stack which pretends to support DMA (but as it's broken, it's hard to tell how well it does). The idea is to fix up moving parts for the time being and fix the stack. We can revisit "get everything out of DMA" case after that (which may require bigger changes, e.g. add a dichotomy of memory overhead optimized drivers vs performance-optimized drivers). I doubt we can do it right in haste.

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part.

That's absolutely great that we have people that want to fix things right here, right now! But I'm fixing up things here for 3rd year, and it's still broken. Hint-hint - just patching things around may be not the best option ;-).

I don't know about the PTP part, but that's not specific to the GMAC driver.

Yes, that's my point - gmac driver is a sole white crow. Whereas gPTP is a well-know little PITA, behind a gate (disable it and stop caring about it).

Now using your arguments that Zephyr needs to scale down, maybe we can just remove PTP support from the Ethernet drivers?

Well, I very rarely argue for code removal. But code gets removed anyway (e.g. hasty net_app case). No, I don't hint there's a chance for gPTP to be removed. But @jukkar invited me to look to adding it and VLAN support to some drivers, I looked into it, and ... decided I won't do anything about it. Because otherwise it requires to start another crusade about the need to refactor them. For example, raw gPTP is sprinkled/duplicated from one driver to another. I have no idea how it ended up like that. It needs to be reworked for sure. Or... just left alone (if you don't need that gPTP, which I thankfully don't so far).

Just my 2 cents ;-)

@aurel32
Copy link
Collaborator

aurel32 commented Feb 7, 2019

@aurel32:

To not leave this without reply (because it's important to be on the same line, or at least understand approaches of other folks):

Well, we are designing an IP stack that is not DMA friendly. We are in 2019 almost all MCUs have a DMA, and Zephyr should be able to handle that (which doesn't mean that all drivers should use DMA).

No, we just do things in stages. We currently have a broken stack which pretends to support DMA (but as it's broken, it's hard to tell how well it does).

Well it used to work with DMA, and the recent refactoring actually requires to do changes which makes it incompatible with DMA. It means that this stage went into the wrong direction wrt DMA.

The idea is to fix up moving parts for the time being and fix the stack. We can revisit "get everything out of DMA" case after that (which may require bigger changes, e.g. add a dichotomy of memory overhead optimized drivers vs performance-optimized drivers). I doubt we can do it right in haste.

Well the point of DMA is performance, but also avoid copying data, so I don't get your point about the dichotomy. If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

Yes, I'll work on a patch tonight for the GMAC driver, excluding the PTP part.

That's absolutely great that we have people that want to fix things right here, right now! But I'm fixing up things here for 3rd year, and it's still broken. Hint-hint - just patching things around may be not the best option ;-).

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent. It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver. But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not. If that's the way to go, I can provide a patch for the GMAC driver and open bugs against the other drivers.

OTOH the RX path of the driver should be almost free of requirements, given the RX workqueue basically serializes the reception of packets. Instead the requirements are on the other side of the RX stack when the data reach the application.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 7, 2019

It means that this stage went into the wrong direction wrt DMA.

Exactly my point. It went its own way, and everything now needs to be reworked. And instead of concentrating on reworking the code, we're spread thin to keep DMA case from falling apart. Just forget about it for some time, fix the single experimental driver to work like all other drivers do.

If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

And that's the big issue with that driver. It need tons of buffers, and instead of making its own, it still those buffers from system (== all other drivers, and user applications). Until that is fixed, that driver will always be a PITA.

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent.

If you guys can't fix it well enough during next week, I'd say that's really good plan to follow.

It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver.

The main point - no need to remove any code from GMAC driver. Just need to introduce a blocking on mutex/sema at the end of its send() - i.e. return not when DMA request is set up and started, but when it's finished. The DMA itself will be used, and later the whole matter can be revisited.

But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not.

Exactly my point either. Instead of trying to franken-patch DMA support and dig rabbit wholes where refcounts on packets are separate from refcounts on the underlying buffers, let's better brush up and document the requirements for the baseline networking drivers. Once that is there, we can think how to to add "advanced drivers" model in addition to that.

OTOH the RX path of the driver should be almost free of requirements

Hopefully so. Again, worth thinking. As mentioned above, there's a usecase of putting the same packet in different queues. If we can't get to work with refcounts, it may be good time to question whether we need all those refcounts at all.

@aurel32
Copy link
Collaborator

aurel32 commented Feb 7, 2019

It means that this stage went into the wrong direction wrt DMA.

Exactly my point. It went its own way, and everything now needs to be reworked. And instead of concentrating on reworking the code, we're spread thin to keep DMA case from falling apart. Just forget about it for some time, fix the single experimental driver to work like all other drivers do.

Well "like all other drivers do" is not really something that exists, they all are different and use different assumptions. At least one other will be broken by that L2 header removal in the PTP case.

If you refer about the amount of RX buffers required by the GMAC driver, you also have to take into account that contrary to other drivers, it doesn't require a static buffer to hold the RX or TX data (either in the driver or hidden into the HAL).

And that's the big issue with that driver. It need tons of buffers, and instead of making its own, it still those buffers from system (== all other drivers, and user applications). Until that is fixed, that driver will always be a PITA.

I disagree here. First of all making its own buffer, means they have to be copied at some point, which means more memory. Secondly this driver requires more buffers than other drivers, because it enforce at compile time the amount of buffers needed to receive a full Ethernet frame. I am not sure how the other drivers handle the situation of not having enough RX buffers to receive a frame (well I am almost sure they all behave differently), but I guess we can just remove the compile time check and just drop the frame when it happens. Now I still believe the current situation is better than other drivers that just need more stack than the default TX stack size to be able to send a full Ethernet frame.

Ok, let's go that way, let's add a rule that drivers must not return from the send function until the frame is sent.

If you guys can't fix it well enough during next week, I'd say that's really good plan to follow.

Well I have already proposed a PR that fixes the non PTP case already. As already said the PTP case has to be fixed for both MCUX and GMAC, so we probably want a common solution here.

It allows the L2 ethernet code to remove the L2 header like proposed in this PR, without breaking PTP. That will remove quite a bit of code from the GMAC driver.

The main point - no need to remove any code from GMAC driver. Just need to introduce a blocking on mutex/sema at the end of its send() - i.e. return not when DMA request is set up and started, but when it's finished. The DMA itself will be used, and later the whole matter can be revisited.

I kind of disagree there. You end-up with a lot of code that is useless but is still executed. For example there is no need to ref the packet and unref it in the ISR. In addition this will just bit-rot and will likely explode once the mutex is removed.

But let's list the rules and apply the same rules to all drivers, just not when there is something that looks broken. When we have rules, it's easy by looking at the code to know if it is followed or not.

Exactly my point either. Instead of trying to franken-patch DMA support and dig rabbit wholes where refcounts on packets are separate from refcounts on the underlying buffers, let's better brush up and document the requirements for the baseline networking drivers. Once that is there, we can think how to to add "advanced drivers" model in addition to that.

OTOH the RX path of the driver should be almost free of requirements

Hopefully so. Again, worth thinking. As mentioned above, there's a usecase of putting the same packet in different queues. If we can't get to work with refcounts, it may be good time to question whether we need all those refcounts at all.

I get your point of using refcounts to put the same packet in different queues, that said I am really not convinced this should be done at the Ethernet driver level. Therefore this is an issue of the IP stack.

Copy link
Collaborator

@agross-oss agross-oss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on frdm_k64f w/ google iot application

jukkar pushed a commit that referenced this pull request Feb 8, 2019
The SAM GMAC Ethernet driver uses scatter gather DMA to transmit data.
Each fragment of a network packet is mapped from a set of descriptors
that is used by the controller to do the DMA transfer. This means that
the packet is not necessary sent when the send() function returns. For
that reason the driver calls net_pkt_ref() on the packet to prevent it
from being freed. It is then unreferenced with net_pkt_unref() in the
TX ISR when the packet has effectively been sent.

However this doesn't work if the packet is modified in the meantime,
like it will be done in PR #12563 to remove the Ethernet header
contained in the first fragment. To avoid that, call net_pkt_frag_ref()
on each fragment of the packet, and unreferenced them with
net_pkt_frag_unref() in the TX ISR when the packet has effectively been
sent.

Signed-off-by: Aurelien Jarno <[email protected]>
@jukkar jukkar merged commit 8de351c into zephyrproject-rtos:master Feb 8, 2019
aurel32 added a commit to aurel32/zephyr that referenced this pull request Feb 12, 2019
Scale down the TX path of the GMAC driver by waiting for a packet to be
fully sent before returning from the send function. This has a small
performance impact, but has a few advantages:
- It allows the Ethernet code to modify the packet afterward, fixing PTP
  support on this board (see PR zephyrproject-rtos#12563).
- It returns an error to the IP stack in case of a transmit failure.
- It doesn't require net_buf to be thread safe.

This change can be reverted by changing GMAC_MULTIPLE_TX_PACKETS from 0
to 1.

Signed-off-by: Aurelien Jarno <[email protected]>
nashif pushed a commit that referenced this pull request Feb 15, 2019
Scale down the TX path of the GMAC driver by waiting for a packet to be
fully sent before returning from the send function. This has a small
performance impact, but has a few advantages:
- It allows the Ethernet code to modify the packet afterward, fixing PTP
  support on this board (see PR #12563).
- It returns an error to the IP stack in case of a transmit failure.
- It doesn't require net_buf to be thread safe.

This change can be reverted by changing GMAC_MULTIPLE_TX_PACKETS from 0
to 1.

Signed-off-by: Aurelien Jarno <[email protected]>
tgorochowik added a commit to antmicro/zephyr that referenced this pull request Feb 20, 2019
Wait in the send callback for the packet to be actually sent.
After this change, only one TX packet will be handled at once.
This is needed because of the way the TX packets are currently handled
in L2 after this PR: zephyrproject-rtos#12563

This is similar to what zephyrproject-rtos#13167 did for the SAM GMAC on SAM E-70.

Without this, packet time-stamping does not work with the current stack.

This commit is minimalistic on purpose to make it easily revertible when
the network stack is able to properly handle DMA drivers for TX packets
again.

Signed-off-by: Tomasz Gorochowik <[email protected]>
nashif pushed a commit that referenced this pull request Feb 21, 2019
Wait in the send callback for the packet to be actually sent.
After this change, only one TX packet will be handled at once.
This is needed because of the way the TX packets are currently handled
in L2 after this PR: #12563

This is similar to what #13167 did for the SAM GMAC on SAM E-70.

Without this, packet time-stamping does not work with the current stack.

This commit is minimalistic on purpose to make it easily revertible when
the network stack is able to properly handle DMA drivers for TX packets
again.

Signed-off-by: Tomasz Gorochowik <[email protected]>
@tbursztyka tbursztyka deleted the eth_hdr_rem branch March 12, 2019 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants