Skip to content

frdm_k64f: samples/net/sockets/echo_server doesn't work #13301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Feb 12, 2019 · 15 comments
Closed

frdm_k64f: samples/net/sockets/echo_server doesn't work #13301

ghost opened this issue Feb 12, 2019 · 15 comments
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: medium Medium impact/importance bug

Comments

@ghost
Copy link

ghost commented Feb 12, 2019

samples/net/sockets/echo_server shows no reaction on frdm_k64f board with default setting

as suggested by @jarz-nordic it runs with additional CONFIG_LOG_PRINTK=y enabled

Another observation:
I keept all the features as they were AND I've addded one more:
CONFIG_LOG_PRINTK=y and it solved the problem.

We might have a problem with simultanous access to UART. @nordic-krch suggested that: Logger - shell
and printk are compeating for UART access at the same time. Maybe one is breaking anothers transfer?
When I've set CONFIG_LOG_PRINTK=y all parties accessing UART are synchronized.

@ghost ghost added the bug The issue is a bug, or the PR is fixing a bug label Feb 12, 2019
@rlubos
Copy link
Collaborator

rlubos commented Feb 12, 2019

Do you have stack sentinel enabled? Running the samples on nrf52840_pca10056 with overlay-802154 shows that logger_thread stack overflows on both, echo_server and echo_client:

***** Stack Check Fail! *****
Current thread ID = 0x20000aa4
Faulting instruction address = 0x15fba

Increasing LOG_PROCESS_THREAD_STACK_SIZE to 1024 worked for me.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 12, 2019

We might have a problem with simultanous access to UART. @nordic-krch suggested that: Logger - shell
and printk are compeating for UART access at the same time. Maybe one is breaking anothers transfer?

Oh really? One may only wonder how did we live before. printk() was used concurrently from everywhere - from multiple threads, from ISRs, whatever, and nothing deadlocked.

@jakub-uC
Copy link
Collaborator

jakub-uC commented Feb 12, 2019

I can smell an irony here...

What options we have to improve situation?

@nordic-krch
Copy link
Collaborator

@pfalcon you have shell talking to uart in the interrupt mode and printk in polling. That will clash eventually. As now we have option to have synchronous logs i think that the best is to redirect printk to log. If we want in future to use printk through the same medium as shell/logger but in parallel then we need to implement tty interace imo.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 12, 2019

@jarz-nordic:

What options we have to improve situation?

Well, please let me finish with #13260, proposal for resolving which includes reverting #12747 (the nature of which appear to be the opposite - make different thread output to deadlock on each other), and then I'll look into retesting with new data available here and come out with options, though I bet it will be the well-known one - IMMEDIATE should be default for logging.

@ghost
Copy link
Author

ghost commented Feb 12, 2019

echo_server runs with CONFIG_STACK_SENTINEL=y and CONFIG_LOG_PRINTK=n

@jakub-uC
Copy link
Collaborator

jakub-uC commented Feb 12, 2019

@pfalcon : But reverting #12747 will not improve situation in any way it will only make it worse. What this PR does is preventing shell from breaking other transfer started by shell.

It adds a mutex to shell_fprint function.

@pfalcon
Copy link
Collaborator

pfalcon commented Feb 12, 2019

It adds a mutex to shell_fprint function.

Let me translate that: "mutex" means "deadlock". More detailed account comes in #13260, I'd appreciate if you could look there. The only reason it pops up here is because we have deadlocking around. (Causes may be absolutely different, so let's not discuss that case here.)

@agansari
Copy link
Collaborator

agansari commented Feb 12, 2019

I've made some changes to eth_mcux to handle kinetis interrupts similar to i.mx:

@@ -965,8 +965,10 @@ static void eth_mcux_rx_isr(void *p)
 {
        struct device *dev = p;
        struct eth_context *context = dev->driver_data;
+       int irq_lock_key = irq_lock();
 
        ENET_ReceiveIRQHandler(ENET, &context->enet_handle);
+       irq_unlock(irq_lock_key);
 }
 #endif
 
@@ -975,8 +977,10 @@ static void eth_mcux_tx_isr(void *p)
 {
        struct device *dev = p;
        struct eth_context *context = dev->driver_data;
+       int irq_lock_key = irq_lock();
 
        ENET_TransmitIRQHandler(ENET, &context->enet_handle);
+       irq_unlock(irq_lock_key);
 }
 #endif
 
@@ -985,12 +989,14 @@ static void eth_mcux_error_isr(void *p)
 {
        struct device *dev = p;
        struct eth_context *context = dev->driver_data;
+       int irq_lock_key = irq_lock();
        u32_t pending = ENET_GetInterruptStatus(ENET);
 
        if (pending & ENET_EIR_MII_MASK) {
                k_work_submit(&context->phy_work);
                ENET_ClearInterruptStatus(ENET, kENET_MiiInterrupt);
        }
+       irq_unlock(irq_lock_key);
 }
 #endif

This will prevent the deadlock by letting Ethernet interrupts run their course without anyone interrupting. I ran without CONFIG_LOG_PRINTK=y.
This maybe just hiding the issue under the rug.

@ghost
Copy link
Author

ghost commented Feb 12, 2019

echo_server seems to run with the changes to eth_mcux.c for the first moment, but it gets totally instable.
Pings/Packages drop after a few minutes and finally there is no reaction any more.

edit:
it get's instable with the eth_mcux.c from 8d86773 as well, but it seems to take much longer (>20min).

@agansari
Copy link
Collaborator

I'm still investigating this issue after the merge #13260, looks to me it's something similar to what @pfalcon mentioned a conflict between ethernet and uart interrupts/configurations. I'll continue tomorrow to debug this issue.

@agansari agansari self-assigned this Feb 12, 2019
@nashif nashif added the priority: medium Medium impact/importance bug label Feb 12, 2019
@rlubos
Copy link
Collaborator

rlubos commented Feb 13, 2019

I don't have an ethernet board to reproduce, but I can add here my observations from running the samples on qemu_x86. Both, echo_client and echo_server stopped at assertions (after exchanging ~27k TCP packets).
Echo client failed here (FYI @tbursztyka, from what I've seen you've recently reworked the code where this function is called):

ASSERTION FAIL [iface] @ /home/rolu/repos/zephyrproject/zephyr/subsys/net/ip/net_stats.h:83

UPDATE_STAT(iface, stats.ipv6.drop++);

And echo server failed here:

ASSERTION FAIL [context->tcp] @ /home/rolu/repos/zephyrproject/zephyr/subsys/net/ip/tcp.c:1480

I've also ran the samples over 802.15.4 on nrf52840_pca10056 overnight, and they ran with no issues (beside of the logger stack problem described above). But in this configuration it's UDP only.

@agansari
Copy link
Collaborator

@i8080 looks like this is no longer an issue at this moment.

nashif pushed a commit that referenced this issue Feb 21, 2019
The sending of the final FIN message was being put on the sent_list
to be handled by the retry logic and being sent directly. But in
the case of a socket that never had a connection, the logic after
the direct send would decrement the reference on the packet which
would cause the retry logic to eventually have a situation where
the pkt is on the sent_list queue but the buffer associated with
the packet is freed back to slab. The code would then get a null
pointer to the tcp header and would fault when setting the crc to
zero on frdm-k64f platform.

Fixes #13489, #13301

Signed-off-by: david leach <[email protected]>
@pfalcon
Copy link
Collaborator

pfalcon commented Feb 27, 2019

looks like this is no longer an issue at this moment.

Sorry for the delay with getting back to test this, but various fixes applied in the meantime indeed appear to have positive effect, and I confirm that as of master ddf744d, samples/net/sockets/echo_server works out of the box. I test all for combinations of (IPv4, IPv6) x (TCP, UDP).

@i8080, I'm closing this issue, as we need to maintain actual picture of the bugs for the release. If you still see an issue, please comment/reopen.

@pfalcon pfalcon closed this as completed Feb 27, 2019
@pfalcon
Copy link
Collaborator

pfalcon commented Feb 27, 2019

For logging issues (which are no longer visible by default, but pop up with CONFIG_LOG_IMMEDIATE), I opened #13897

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

6 participants